CN108389069A - Method and device for identifying top-tier customers based on random forest and logistic regression - Google Patents

Method and device for identifying top-tier customers based on random forest and logistic regression

Info

Publication number
CN108389069A
Authority
CN
China
Prior art keywords
tier customer
sample
logistic regression
random forest
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810028009.1A
Other languages
Chinese (zh)
Inventor
李云亭
张洪利
荣以平
朱伟义
刘霄慧
尹明立
粱波
王伟
姜云
刘昳娟
王鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Shandong Electric Power Co Ltd filed Critical State Grid Shandong Electric Power Co Ltd
Priority to CN201810028009.1A priority Critical patent/CN108389069A/en
Publication of CN108389069A publication Critical patent/CN108389069A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Electricity, gas or water supply

Abstract

The invention discloses a method and a device for identifying top-tier customers based on random forest and logistic regression. The method comprises the following steps: obtaining the value features of sample customers and labelling each sample as top-tier or not; using the sample customer data to build a top-tier customer identification model based on the random forest and logistic regression algorithms; and taking the value features of a customer to be identified as input and determining, with the top-tier customer identification model, whether that customer is a top-tier customer. The invention uses big data to pinpoint top-tier customers accurately.

Description

Method and device for identifying top-tier customers based on random forest and logistic regression
Technical field
The invention belongs to the field of machine learning, and in particular relates to a method and a device for identifying top-tier customers based on random forest and logistic regression.
Background
As electric power reform deepens and the electricity retail market is fully opened, the power companies at all levels of the State Grid Corporation of China face market competition. To improve the profitability and competitiveness of power grid enterprises and to increase the loyalty, satisfaction and stickiness of top-tier customers, enterprises must, while continuing to provide universal service to society as a whole, provide good service to top-tier customers; this will be the main means by which electricity retailers compete for such customers. Formulating targeted, competitive service strategies, concentrating limited service resources on top-tier customers, and establishing stable supply relationships with them is therefore the inevitable choice for power grid enterprises seeking long-term sustainable development.
With the explosive growth of data volumes and steadily rising business requirements, traditional service system architectures find it increasingly difficult to meet operational requirements. There is international consensus that big data is an important strategic resource; as a foundational strategic resource, data supports the analysis of customer demand and the provision of targeted services.
How to pinpoint top-tier customers accurately on the basis of big data is therefore a technical problem that urgently needs to be solved.
Summary of the invention
To overcome the above deficiencies of the prior art, the invention provides a method and a device for identifying top-tier customers on the electricity retail side based on random forest and logistic regression. Drawing on mass data such as the electricity-use attributes, electricity-use behaviour and electricity-use characteristics of grid company customers, the method establishes a multi-dimensional customer evaluation index system and uses a customer evaluation model built by data analysis to give each customer a comprehensive score, thereby pinpointing top-tier customers accurately.
To achieve the above object, the invention adopts the following technical solution:
A method for identifying top-tier customers based on random forest and logistic regression, comprising the following steps:
Step 1: Obtain the value features of sample customers, and label each sample as top-tier or not;
Step 2: Using the sample customer data, build a top-tier customer identification model based on the random forest and logistic regression algorithms;
Step 3: Taking the value features of a customer to be identified as input, determine with the top-tier customer identification model whether that customer is a top-tier customer.
Further, step 1 includes:
Step 1.1: Build a customer value evaluation feature index system from the items of electricity-use information collected for each user;
Step 1.2: Compile the value features of the sample users according to the index system, and label each sample user as top-tier or not.
Further, the value features in step 1 include each user's basic attributes, economic value, load value, growth value, credit value and industry value data.
Further, step 2 includes:
Step 2.1: Preprocess the sample user data;
Step 2.2: Train a top-tier customer judgment model with the random forest method;
Step 2.3: Build a top-tier customer grade judgment model with the logistic regression algorithm;
Step 2.4: Combine the top-tier customer judgment model and the top-tier customer grade judgment model to obtain the top-tier customer identification model.
Further, step 2.1 includes: data cleaning, quantification of feature factors, feature expansion, feature selection and outlier handling.
Further, step 2.2 includes:
Full-feature training: all sample user data are used as the sample, and all business indices are used as model inputs;
Important-feature training: all sample user data are used as the sample, and the model inputs are the indices ranked in the top 40% by importance;
Full-feature cross training: the sample user data are split evenly into 10 parts; in each round 9 parts are used as training samples and the remaining part as prediction samples, for 10 rounds in total; the model inputs are all business indices;
Important-feature cross training: the sample user data are split evenly into 10 parts; in each round 9 parts are used as training samples and the remaining part as prediction samples, for 10 rounds in total; the model inputs are the indices ranked in the top 40% by importance.
Further, step 2.3 includes: giving the top-tier customers identified by the top-tier customer judgment model a comprehensive score with the logistic regression model; and setting several comprehensive-score intervals to obtain the top-tier customer grade judgment model.
Further, the method also includes: integrating the trained model, collecting user feature data through a data interface, and periodically judging each customer's top-tier grade.
According to a second object of the invention, there is also provided a top-tier customer identification device based on random forest and logistic regression, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method when executing the program.
According to a third object of the invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, performs the method for identifying top-tier customers based on random forest and logistic regression.
Beneficial effects of the invention
1. Drawing on mass data such as the electricity-use attributes, electricity-use behaviour and electricity-use characteristics of grid company customers, the invention uses machine learning to identify top-tier customers. This underpins the provision of good service to top-tier customers and helps improve the competitiveness of power grid enterprises.
2. The invention trains the customer identification model by combining random forest and logistic regression. Beyond identifying whether a customer is top-tier, the model can also judge the customer's top-tier grade, which pinpoints top-tier customers more precisely.
3. The invention establishes a long-term mechanism for upgrading and optimising the top-tier customer identification model: the supervising professionals analyse the validity of the model's judgments from time to time, and the model is retrained on the basis of that analysis, so that the model version is upgraded and optimised.
Brief description of the drawings
The accompanying drawings, which form part of this application, provide a further understanding of the application. The illustrative embodiments and their descriptions explain the application and do not unduly limit it.
Fig. 1 is a flow chart of the method for identifying top-tier customers based on random forest and logistic regression;
Fig. 2 is a flow chart of building the top-tier customer identification model;
Fig. 3 is a schematic diagram of the customer grade trend formed with logistic regression.
Detailed description
It should be noted that the following detailed description is exemplary and is intended to explain the application further. Unless otherwise indicated, all technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the technical field of the application.
It should be noted that the terms used here serve only to describe specific embodiments and are not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular form is intended to include the plural form. It should also be understood that the terms "comprising" and "including", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
Where no conflict arises, the embodiments of the application and the features in them may be combined with one another.
Embodiment one
This embodiment discloses a method for identifying top-tier customers based on random forest and logistic regression which, as shown in Fig. 1, includes the following steps:
(1) Data preparation stage
1. Establish the customer value evaluation feature index system:
User profile information, economic value information, load value information, growth value information, credit value information and industry value information are collected, the factors that affect overall customer value are analysed comprehensively, and a customer value evaluation feature index system is established. Through focused customer discussions and customer surveys, sample users in each prefecture-level city are labelled as top-tier or not, providing the data basis for model training.
Starting from the kinds of value that top-tier customers bring to the grid company, each customer's electricity-use indices are sorted out and classified from the perspective of customer value to build the customer evaluation index system. The indices are standardised and summarised along multiple dimensions, providing the data basis for judging whether a customer is top-tier.
2. Determine the model training samples:
Using the top-tier customer index system agreed with prefecture-level experts, and drawing on the marketing business application system and the electricity-use information acquisition system, the basic attributes, economic value, load value, growth value, credit value and industry value data of the sample users are compiled as the model training samples. In this embodiment, experts judged the electricity-use behaviour feature data of 474,000 sample customers and labelled each as top-tier or not.
User attributes: account number, account name, industry classification, whether the user is a high-energy-consuming user, and electricity-use category.
Economic value: the profit that the customer's electricity use brings to the power supply enterprise, for example customers with a higher average sale price, larger consumption and higher electricity fees. Indices: current-period average sale price, current-period electricity fee, current-period consumption, cumulative average sale price, cumulative electricity fee, cumulative consumption, contract capacity and operating capacity.
Load value: the load value shown in the customer's electricity use, for example customers with a larger power factor, a higher average load rate and a better valley-period consumption ratio. Indices: average daily load rate, peak-period consumption ratio, valley-period consumption ratio and power factor adjustment coefficient.
Growth value: customers whose own electricity use is developing well and who will contribute more in future, bringing the company a lasting profit contribution. Indices: current-period consumption growth rate, consumption growth rate over the last 3 months, over the last 6 months and over the last year, number of capacity increases and number of capacity reductions.
Credit value: credit is the basic guarantee that lets both the supplier and the user complete the transaction; this covers customers who use electricity lawfully and pay their fees on time. Indices: electricity fee prepayment carry-over rate, overdue days of fee payment, number of overdue fee payments, fee payment period, number of bounced cheques, and number of contract breaches and electricity thefts.
Industry value: the development prospects of the customer's industry, favouring industries whose overall electricity use is developing well. Indices: industry consumption growth rate, industry major class consumption growth rate and industry group consumption growth rate.
In the data preparation stage, criteria for the supervision source (the source of the labels) are also formulated, that is, which business conditions a valid supervision source must satisfy. Only labels produced under those conditions are regarded as valid and used for supervised learning.
(2) Data processing stage
Current databases are susceptible to noise, missing data and inconsistent data. They are very large and mostly draw on multiple heterogeneous data sources, so data quality is low, and low-quality data make the results of data analysis inaccurate. Data preprocessing is therefore required before model training. In this scheme, preprocessing mainly covers quantification of feature factors, outlier handling and handling of continuous variables.
1. Data cleaning
Data are cleaned by out-of-range checks, feature validity checks and null-value checks.
Out-of-range check: records in which both electricity consumption and the electricity fee or price are 0 are found and deleted. Both being 0 means that the user used no electricity, that is, was not in production, so the other features carry no information either.
Feature validity check: find features whose values are too uniform, such as the user importance field, where only a few users are marked as important.
Null-value check: find fields that are entirely empty or largely missing, such as the suspension-days field and the overdue-days field for fee payment. An entirely empty suspension-days field means that suspension days are missing for all users. For the overdue-days field, empty values were found to correspond to payments that were not actually overdue.
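The out-of-range and null-value checks above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names `consumption_kwh`, `fee` and `overdue_days` are hypothetical, and filling an empty overdue-days field with 0 is an assumption drawn from the remark that empty values correspond to payments that were not overdue.

```python
def clean_records(records):
    """Drop zero-usage records and fill empty overdue-days fields.

    Field names are hypothetical; the fill value 0 is an assumption.
    """
    cleaned = []
    for rec in records:
        # Out-of-range check: both consumption and fee are 0, so the user
        # used no electricity and the record carries no information.
        if rec.get("consumption_kwh", 0) == 0 and rec.get("fee", 0) == 0:
            continue
        rec = dict(rec)  # copy so the caller's record is not modified
        # Null-value check: an empty overdue-days field is treated as
        # "not overdue" (assumption).
        if rec.get("overdue_days") is None:
            rec["overdue_days"] = 0
        cleaned.append(rec)
    return cleaned


records = [
    {"id": 1, "consumption_kwh": 0, "fee": 0, "overdue_days": None},
    {"id": 2, "consumption_kwh": 120.0, "fee": 80.0, "overdue_days": None},
    {"id": 3, "consumption_kwh": 50.0, "fee": 30.0, "overdue_days": 4},
]
cleaned = clean_records(records)
```

Here the first record is dropped and the second has its empty overdue-days field set to 0, while the third is kept unchanged.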
2. Quantification of feature factors
Information such as customer profiles, holidays and weather obtained from the marketing system or other systems is expressed as text or codes, and such variables need to be converted to numerical form.
There are 42 field features, such as account name, account number, industry, industry group, industry major class, high-energy-consuming industry and importance level. They fall into the following classes: 1) customer attribute information; 2) economic value; 3) load value; 4) growth value; 5) credit value; 6) industry value.
Factor conversion (expressed with integer codes 0/1/2/3...): industry, industry group, industry major class, high-energy-consuming industry, importance level, electricity-use category, voltage level, region, investment scale, capacity size, load nature.
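The factor conversion described above maps each distinct text value to an integer code. A minimal sketch, assuming codes are assigned in order of first appearance (the source does not say how codes are assigned); `factorize` and the example values are hypothetical.

```python
def factorize(values):
    """Map each distinct category to an integer code 0, 1, 2, ...

    Codes follow the order of first appearance (an assumption).
    """
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


industry = ["steel", "textile", "steel", "chemical"]
encoded, mapping = factorize(industry)
```

The returned mapping should be stored with the model so that the same codes are applied to customers scored later.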
3. Feature expansion:
1) Normalisation (scale the values into [0, 1] and use the result as a feature): electricity fee, contract capacity, average electricity sales over the last year, average electricity sales over the last 6 months, average electricity sales over the last 3 months, operating capacity;
2) Discretisation (bin the values by size and use the bin as a feature): electricity fee, contract capacity, average electricity sales over the last year, average electricity sales over the last 6 months, average electricity sales over the last 3 months, operating capacity;
3) Rank features (sort the values by size and use the rank as a feature): electricity fee, contract capacity, average electricity sales over the last year, average electricity sales over the last 6 months, average electricity sales over the last 3 months, operating capacity;
4) Encoding of low-cardinality data (one-hot 0/1 codes): number of capacity increases, number of capacity reductions, long-overdue arrears, share of long-overdue arrears, number of contract breaches and electricity thefts.
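The four expansions listed above can be illustrated with plain Python. This is a sketch under assumptions: the bin edges, the zero-based ranks and the function names are all hypothetical choices, since the source does not specify them.

```python
def min_max(values):
    """Normalisation: scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges):
    """Discretisation: bin index = number of edges the value reaches."""
    return [sum(v >= e for e in edges) for v in values]


def rank(values):
    """Rank feature: 0 for the smallest value, 1 for the next, and so on."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def one_hot(values):
    """One-hot encoding: one 0/1 column per distinct value."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


fees = [100.0, 300.0, 200.0, 500.0]
normalised = min_max(fees)                  # [0.0, 0.5, 0.25, 1.0]
binned = discretize(fees, [150.0, 400.0])   # [0, 1, 1, 2]
ranked = rank(fees)                         # [0, 2, 1, 3]
expansions = one_hot([0, 2, 0, 1])          # columns for the values 0, 1, 2
```

Each derived column is added alongside the original field, so the feature count grows rather than the original value being replaced.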
4. Feature selection:
For the user attribute features, examine how evenly the data are distributed and make a preliminary analysis of how these dimensions affect whether a customer is top-tier and whether it is important.
For the five classes of value features, examine how evenly the data are distributed and make a preliminary analysis of how these dimensions affect whether a customer is top-tier and whether it is important. Check whether the features are correlated.
Combined dimensionality reduction: try several methods and combine their results to reduce dimensionality.
5. Outlier handling
Collected data may be missing or abnormal, and profile data may also have gaps, so missing values in these data must be handled. Different handling methods are chosen according to the business rules:
Default-value replacement: for profile fields such as load nature and voltage level, a default value set by general business rules is used in calculations.
Case deletion: if missing values make up only a small proportion and the attribute concerned is important, the record is deleted. For example, if the user id is missing from the user profile information, the record is removed.
Mean substitution: if the missing value is numeric, it is filled with the mean of the preceding and following data.
If the missing value is non-numeric, it is filled with the mode of the attribute.
Hot-deck imputation: the object in the data set most similar to the object with missing data is selected, and its value replaces the missing value.
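Three of the imputation methods above can be sketched as follows. The similarity measure used for hot-deck imputation (squared Euclidean distance over chosen numeric fields) is an assumption, as are all function and field names; the source only says "most similar".

```python
from collections import Counter


def fill_with_neighbour_mean(series):
    """Mean substitution: fill a numeric gap with the mean of the nearest
    non-missing values before and after it."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            prev = next((x for x in reversed(series[:i]) if x is not None), None)
            nxt = next((x for x in series[i + 1:] if x is not None), None)
            neighbours = [x for x in (prev, nxt) if x is not None]
            if neighbours:
                filled[i] = sum(neighbours) / len(neighbours)
    return filled


def fill_with_mode(series):
    """Fill a non-numeric gap with the most frequent observed value."""
    observed = [v for v in series if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in series]


def hot_deck(target, donors, key_fields, missing_field):
    """Hot-deck imputation: copy the value from the most similar donor.

    Similarity is squared Euclidean distance on key_fields (assumption).
    """
    def distance(donor):
        return sum((target[f] - donor[f]) ** 2 for f in key_fields)

    candidates = [d for d in donors if d[missing_field] is not None]
    return min(candidates, key=distance)[missing_field]


monthly_kwh = fill_with_neighbour_mean([10.0, None, 30.0])
load_nature = fill_with_mode(["industrial", None, "industrial", "commercial"])
donors = [
    {"capacity": 400, "consumption": 300, "voltage": "35kV"},
    {"capacity": 110, "consumption": 55, "voltage": "10kV"},
]
target = {"capacity": 100, "consumption": 50, "voltage": None}
voltage = hot_deck(target, donors, ["capacity", "consumption"], "voltage")
```

In practice the key fields would be normalised first so that one large-valued field does not dominate the distance.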
(3) Model training stage
This embodiment trains the model with random forest and logistic regression, as shown in Fig. 2.
1. Train the top-tier customer judgment model with the random forest method
Selection of important indices
Important indices are selected with two methods: one based on out-of-bag (OOB) error, called MDA (Mean Decrease Accuracy), and one based on Gini impurity, called MDG (Mean Decrease Gini). In both, a larger value indicates a more important variable. Model training yields the index importance results; the important indices obtained by the two methods are compared in the following table:
Table 1
Rank  MDA                                        MDG
1     Cumulative consumption                     Cumulative consumption
2     Cumulative electricity fee                 Cumulative electricity fee
3     Current-period electricity fee             Current-period electricity fee
4     Current-period consumption                 Current-period consumption
5     Operating capacity                         Operating capacity
6     Number of bounced cheques                  Power factor adjustment coefficient
7     Cumulative average sale price              Industry major class consumption growth rate
8     Industry major class consumption growth rate   Annual average daily load rate
9     Power factor adjustment coefficient        Fee payment period
10    Cumulative electricity price growth rate   Industry group consumption growth rate
Combining the two rankings above, 13 indices are determined to be important indices, as follows:
Table 2
No.  Important index                               Data column
1    Cumulative consumption                        7
2    Cumulative electricity fee                    8
3    Current-period electricity fee                5
4    Current-period consumption                    4
5    Operating capacity                            10
6    Power factor adjustment coefficient           15
7    Number of bounced cheques                     35
8    Cumulative average sale price                 9
9    Industry major class consumption growth rate  39
10   Cumulative electricity price growth rate      24
11   Annual average daily load rate                11
12   Fee payment period                            34
13   Industry group consumption growth rate        38
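The 13 indices in Table 2 are exactly the union of the two top-10 lists in Table 1: seven indices appear in both rankings and each ranking contributes three more. The short check below uses English index names chosen for this illustration; `ordered_union` is a hypothetical helper, and the order it produces is not claimed to match Table 2.

```python
mda_top10 = [
    "cumulative consumption", "cumulative fee", "current fee",
    "current consumption", "operating capacity", "bounced cheques",
    "cumulative average sale price", "industry major class growth rate",
    "power factor adjustment coefficient", "cumulative price growth rate",
]
mdg_top10 = [
    "cumulative consumption", "cumulative fee", "current fee",
    "current consumption", "operating capacity",
    "power factor adjustment coefficient", "industry major class growth rate",
    "annual average daily load rate", "fee payment period",
    "industry group growth rate",
]


def ordered_union(*rankings):
    """Merge rankings, keeping the first occurrence of each index."""
    merged = []
    for ranking in rankings:
        for name in ranking:
            if name not in merged:
                merged.append(name)
    return merged


important = ordered_union(mda_top10, mdg_top10)
shared = set(mda_top10) & set(mdg_top10)
print(len(important), len(shared))  # 13 7
```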
The training data are trained and tuned with the random forest method to find the relationship between electricity-use behaviour feature values and whether the user is top-tier, producing a model that judges whether a customer is top-tier.
Preferably, the following training procedure is used to adjust the model step by step, and model validity is analysed in terms of stability and accuracy:
Full-feature training: all 474,000 users are used as the sample, and all business indices are used as model inputs;
Important-feature training: all 474,000 users are used as the sample, and the model inputs are the indices ranked in the top 40% by importance;
Full-feature cross training: all samples are split evenly into 10 parts; in each round 9 parts are used as training samples and the remaining part as prediction samples, for 10 rounds in total; the model inputs are all business indices;
Important-feature cross training: all samples are split evenly into 10 parts; in each round 9 parts are used as training samples and the remaining part as prediction samples, for 10 rounds in total; the model inputs are the indices ranked in the top 40% by importance.
Noise is identified from the significance coefficient p of each model input variable, and noise variables are not included in the model.
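The two building blocks of the four training schemes above, keeping the top 40% of indices by importance and splitting the samples into 10 folds, can be sketched without any modelling library. The function names, the rounding rule (round up) and the interleaved fold assignment are assumptions; the random forest training itself is omitted.

```python
import math


def top_fraction(importances, fraction=0.4):
    """Return the names of the top `fraction` of indices by importance.

    Rounding up to a whole number of indices is an assumption.
    """
    keep = max(1, math.ceil(len(importances) * fraction))
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:keep]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each fold is held out exactly once.

    Samples are assigned to folds in an interleaved pattern; a real run
    would normally shuffle the samples first.
    """
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


# Hypothetical importance scores for five indices.
importances = {"a": 0.30, "b": 0.05, "c": 0.25, "d": 0.10, "e": 0.30}
selected = top_fraction(importances)   # two indices out of five
splits = list(k_fold_indices(25, k=10))
```

In each of the 10 rounds, a model would be fitted on `train` and evaluated on `test`; comparing the 10 scores gives the stability view, and their mean gives the accuracy view.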
In this embodiment, 474,000 records were collected in total, and data cleaning removed 39,400 sample users. Model training used about 435,000 samples in total, of which 100,600 were top-tier customers and 333,900 were non-top-tier customers, a ratio of top-tier to non-top-tier samples of about 0.3 to 1.
2. Build the top-tier customer grade judgment model with the logistic regression algorithm
The logistic regression algorithm gives the probability P that a user is a top-tier customer and a comprehensive score Y, where P = 1 / (1 + exp(-Y)) is a nonlinear function of Y. Y is a continuous variable, so setting different score intervals gives a numerical basis for further subdividing top-tier grades. All top-tier customers are scored with the logistic regression model and sorted by Y from high to low to form a customer grade trend chart. They are then divided by quartiles into four top-tier grade score intervals (see Fig. 3), forming the top-tier customer grading standard. The logistic regression model is used to calculate the Y value of each existing top-tier customer, and the customer's top-tier grade is judged from its Y value.
The top-tier customer identification model divides all high-voltage customers into five classes: non-top-tier customers, grade 1 top-tier customers (low), grade 2 top-tier customers (relatively low), grade 3 top-tier customers (relatively high) and grade 4 top-tier customers (high).
On the current 474,000 training samples, customers with probability P greater than 0.5 are classified as top-tier and those with P less than or equal to 0.5 as non-top-tier; the classification accuracy of the model based on important features reaches 99.1%. Because P = 1 / (1 + exp(-Y)) increases monotonically with Y, the threshold P > 0.5 is equivalent to Y > 0, and Y itself serves as the numerical basis for the quartile grading described above.
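The scoring and grading logic above can be sketched as follows: P is computed from Y with the logistic function, customers with P at or below 0.5 are non-top-tier, and the remaining customers are split into four grades at the quartiles of Y. The way the quartile cut points are taken from the sorted scores is an assumption, and all names and score values are hypothetical.

```python
import math


def probability(y):
    """Logistic function: P = 1 / (1 + exp(-Y))."""
    return 1.0 / (1.0 + math.exp(-y))


def quartile_cutoffs(scores):
    """Three cut points splitting the sorted scores into four groups.

    Taking the elements at positions n/4, n/2 and 3n/4 is an assumption.
    """
    ordered = sorted(scores)
    n = len(ordered)
    return [ordered[n * q // 4] for q in (1, 2, 3)]


def grade(y, cutoffs, threshold=0.5):
    """Return 0 for non-top-tier, otherwise the grade 1 (low) to 4 (high)."""
    if probability(y) <= threshold:
        return 0
    return 1 + sum(y >= c for c in cutoffs)


# Hypothetical Y scores of customers already judged top-tier.
top_tier_scores = [0.2, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
cutoffs = quartile_cutoffs(top_tier_scores)           # [1.0, 2.0, 3.0]
grades = [grade(y, cutoffs) for y in top_tier_scores]
```

With these eight scores each grade receives two customers, and a customer with Y = -1.0 or Y = 0.0 is graded 0 because P does not exceed 0.5.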
Evaluation of a single customer:
For an individual top-tier customer, logistic regression is used as an aid to explain the judgment for that user. Analysis of the sample data yields the model coefficient K of each index. The size of the product Hi of K and the feature value represents the index's contribution to the customer's top-tier degree, which identifies the main factors that make the customer top-tier, that is, the customer's top-tier traits.
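The per-index contribution described above is the product of the coefficient K and the feature value for each index. A minimal sketch with made-up coefficients and feature values; the index names and numbers are hypothetical and are not taken from the patent.

```python
def contributions(coefficients, features):
    """Return (index, H) pairs sorted from largest to smallest contribution,
    where H = K * feature value for each index."""
    products = {name: coefficients[name] * features[name] for name in coefficients}
    return sorted(products.items(), key=lambda item: item[1], reverse=True)


# Hypothetical coefficients K and normalised feature values for one customer.
coefficients = {"cumulative_consumption": 1.5, "overdue_count": -2.0, "load_rate": 0.5}
customer = {"cumulative_consumption": 0.8, "overdue_count": 0.1, "load_rate": 0.6}

ranked = contributions(coefficients, customer)
main_factor = ranked[0][0]
```

Because the contributions are only comparable when features share a scale, this reading assumes the feature values were normalised beforehand.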
(4) Iterative model optimisation
A long-term mechanism for upgrading and optimising the model version is established. The supervising professionals correct the model's judgments and analyse their validity from time to time; the model is retrained on the basis of that analysis, so that the model version is upgraded and optimised.
(5) Model evaluation
The accuracy and recall of the best model are tested against the expert-labelled data to evaluate the model.
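Accuracy and recall against the expert labels can be computed as in the sketch below, where 1 marks a top-tier customer and 0 a non-top-tier customer. The labels and predictions are made-up illustration data.

```python
def accuracy_and_recall(expert_labels, predictions):
    """Accuracy = correct / all; recall = found top-tier / actual top-tier."""
    pairs = list(zip(expert_labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    correct = sum(1 for y, p in pairs if y == p)
    accuracy = correct / len(pairs)
    actual_pos = true_pos + false_neg
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


expert_labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, recall = accuracy_and_recall(expert_labels, predictions)
print(accuracy, recall)  # 0.75 0.75
```

With roughly 0.3 top-tier samples per non-top-tier sample, recall is worth reporting alongside accuracy, since accuracy alone can look high even when many top-tier customers are missed.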
(6) Model deployment
The trained model is integrated, user feature data are collected through a data interface, and customers' top-tier grades are judged periodically.
Embodiment two
The purpose of this embodiment is to provide a computing device.
A top-tier customer identification device based on random forest and logistic regression, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the program:
Step 1: Obtain the value features of sample customers, and label each sample as top-tier or not;
Step 2: Using the sample customer data, build a top-tier customer identification model based on the random forest and logistic regression algorithms;
Step 3: Taking the value features of a customer to be identified as input, determine with the top-tier customer identification model whether that customer is a top-tier customer.
Embodiment three
The purpose of this embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps:
Step 1: Obtain the value features of sample customers, and label each sample as top-tier or not;
Step 2: Using the sample customer data, build a top-tier customer identification model based on the random forest and logistic regression algorithms;
Step 3: Taking the value features of a customer to be identified as input, determine with the top-tier customer identification model whether that customer is a top-tier customer.
The steps involved in embodiments two and three correspond to those of method embodiment one; for their specific implementation, see the relevant description in embodiment one. The term "computer-readable storage medium" should be understood to include a single medium or multiple media holding one or more instruction sets. It should also be understood to include any medium capable of storing, encoding or carrying an instruction set that is executed by a processor and causes the processor to perform any of the methods of the invention.
Beneficial effects of the present invention
1. Based on large volumes of power grid company customer data, such as electricity consumption attributes, electricity consumption behavior, and electricity consumption characteristics, the present invention uses machine learning to identify top-tier customers. This safeguards the provision of high-quality service to top-tier customers and helps improve the competitiveness of power grid enterprises.
2. The present invention trains the customer identification model by combining random forest and logistic regression. Beyond identifying whether a customer is top-tier, the identification model can judge the customer's quality grade, which further enables top-tier customers to be targeted precisely.
3. The present invention establishes a long-term mechanism for upgrading and optimizing the top-tier customer identification model: the validity of the model's judgment results is analyzed from time to time using the expert supervision method, and the top-tier customer identification model is retrained on the basis of the analysis results, thereby upgrading and optimizing the model version.
Those skilled in the art will understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be fabricated as individual integrated circuit modules, or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solutions of the present invention without creative effort still fall within the protection scope of the present invention.

Claims (10)

1. A top-tier customer identification method based on random forest and logistic regression, characterized by comprising the following steps:
Step 1: Obtain the value features of sample customers, and determine whether each sample customer is top-tier;
Step 2: Using the sample customer data, build a top-tier customer identification model based on the random forest and logistic regression algorithms;
Step 3: Taking the value features of a customer to be identified as input, judge, based on the top-tier customer identification model, whether the customer is a top-tier customer.
2. The top-tier customer identification method based on random forest and logistic regression according to claim 1, characterized in that step 1 comprises:
Step 1.1: Build a customer value evaluation feature index system from the various items of acquired user electricity information;
Step 1.2: Compile the value features of the sample users according to the index system, and determine whether each sample user is top-tier.
3. The top-tier customer identification method based on random forest and logistic regression according to claim 1 or 2, characterized in that the value features in step 1 include data corresponding to the user's basic attributes, economic value, load value, dynamogenetic value, credit value, and industry value.
4. The top-tier customer identification method based on random forest and logistic regression according to claim 1, characterized in that step 2 comprises:
Step 2.1: Preprocess the sample user data;
Step 2.2: Train a top-tier customer judgment model based on the random forest method;
Step 2.3: Build a top-tier customer grade judgment model using the logistic regression algorithm;
Step 2.4: Combine the top-tier customer judgment model and the top-tier customer grade judgment model to obtain the top-tier customer identification model.
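One way the combination in step 2.4 might be wired together is sketched below. The two models are passed in as plain callables; the stand-in `judge` and `grade` functions and their thresholds are hypothetical placeholders for a trained random forest and a logistic-regression grade model, and grading only customers judged top-tier is an assumption about how the two models are chained.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]


def make_identifier(
    is_top_tier: Callable[[Features], bool],
    grade_of: Callable[[Features], str],
) -> Callable[[Features], Tuple[bool, Optional[str]]]:
    """Chain a yes/no judgment model with a grade model."""

    def identify(x: Features) -> Tuple[bool, Optional[str]]:
        # Only customers judged top-tier are passed on for grading.
        if not is_top_tier(x):
            return False, None
        return True, grade_of(x)

    return identify


# Stand-in models (hypothetical thresholds), used only to exercise the wiring.
def judge(x: Features) -> bool:
    return sum(x) >= 3.0


def grade(x: Features) -> str:
    return "A" if sum(x) >= 5.0 else "B"


identify = make_identifier(judge, grade)
print(identify([1.0, 1.0]), identify([2.0, 1.5]), identify([3.0, 2.5]))
```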
5. The top-tier customer identification method based on random forest and logistic regression according to claim 4, characterized in that step 2.1 comprises: data cleaning, quantization of feature factors, feature expansion, feature selection, and outlier handling.
6. The top-tier customer identification method based on random forest and logistic regression according to claim 4, characterized in that step 2.2 comprises:
Full-feature training: the sample is all of the sample user data, and the model inputs are all of the business indices;
Important-feature training: the sample is all of the sample user data, and the model inputs are the indices in the top 40% by importance;
Full-feature cross training: the sample user data is split evenly into 10 parts; each time, 9 parts are selected as training samples and the remaining part is used as the prediction sample, iterating 10 times; the model inputs are all of the business indices;
Important-feature cross training: the sample user data is split evenly into 10 parts; each time, 9 parts are selected as training samples and the remaining part is used as the prediction sample, iterating 10 times; the model inputs are the indices in the top 40% by importance.
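The two building blocks shared by these four schemes, selecting the top 40% of indices by importance and splitting the samples into 10 folds, can be sketched as follows. The importance scores are hypothetical, rounding the 40% cut-off up is an assumption, and the strided, unshuffled fold assignment is only one possible way to split the data evenly.

```python
import math


def top_fraction(importances, fraction=0.4):
    """Return the names of the most important indices (top `fraction`)."""
    keep = max(1, math.ceil(len(importances) * fraction))
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:keep]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical importance scores, e.g. as reported by a fitted random forest.
importances = {"f1": 0.30, "f2": 0.15, "f3": 0.25, "f4": 0.10, "f5": 0.20}
selected = top_fraction(importances)
splits = list(k_fold_indices(20, k=10))
print(selected, len(splits))
```

Full-feature schemes would skip `top_fraction`, and the non-cross schemes would train once on all samples instead of iterating over `splits`.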
7. The top-tier customer identification method based on random forest and logistic regression according to claim 4, characterized in that step 2.3 comprises: computing, with the logistic regression model, a comprehensive score for the top-tier customers obtained by the top-tier customer judgment model; and setting multiple comprehensive score intervals to obtain the top-tier customer grade judgment model.
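A minimal sketch of scoring and interval-based grading follows. Scaling the logistic-regression probability to a 0-100 score, the interval edges, and the grade names are all assumptions; the text only states that multiple comprehensive score intervals are set.

```python
import math
from bisect import bisect_right


def comprehensive_score(coefs, intercept, x):
    """Logistic-regression probability scaled to a 0-100 score (assumption)."""
    z = intercept + sum(k * v for k, v in zip(coefs, x))
    return 100.0 / (1.0 + math.exp(-z))


# Hypothetical interval edges and grade names.
BAND_EDGES = [60.0, 80.0]  # [0, 60) -> C, [60, 80) -> B, [80, 100] -> A
GRADES = ["C", "B", "A"]


def grade_for(score):
    """Map a score to the grade of the interval it falls in."""
    return GRADES[bisect_right(BAND_EDGES, score)]


score = comprehensive_score([1.0, 0.5], 0.0, [0.0, 0.0])
print(score, grade_for(score))
```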
8. The top-tier customer identification method based on random forest and logistic regression according to claim 1, characterized in that the method further comprises: integrating the trained models, collecting user feature data through a data interface, and periodically judging the customer's quality grade.
9. A top-tier customer identification device based on random forest and logistic regression, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1-8 when executing the program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, performs the top-tier customer identification method based on random forest and logistic regression according to any one of claims 1-8.
CN201810028009.1A 2018-01-11 2018-01-11 Top-tier customer recognition methods based on random forest and logistic regression and device Pending CN108389069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810028009.1A CN108389069A (en) 2018-01-11 2018-01-11 Top-tier customer recognition methods based on random forest and logistic regression and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810028009.1A CN108389069A (en) 2018-01-11 2018-01-11 Top-tier customer recognition methods based on random forest and logistic regression and device

Publications (1)

Publication Number Publication Date
CN108389069A true CN108389069A (en) 2018-08-10

Family

ID=63076097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810028009.1A Pending CN108389069A (en) 2018-01-11 2018-01-11 Top-tier customer recognition methods based on random forest and logistic regression and device

Country Status (1)

Country Link
CN (1) CN108389069A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255629A (en) * 2018-08-22 2019-01-22 阳光财产保险股份有限公司 A kind of customer grouping method and device, electronic equipment, readable storage medium storing program for executing
CN109242323A (en) * 2018-09-18 2019-01-18 深圳市元征科技股份有限公司 A kind of Automobile Service Factory's methods of marking and relevant apparatus
CN109559146A (en) * 2018-09-25 2019-04-02 国家电网有限公司客户服务中心 Electricity customer service center accesses data center's optimization method based on the provinces and cities of logistic model prediction potential user's quantity
CN109559146B (en) * 2018-09-25 2022-11-04 国家电网有限公司客户服务中心 Provincial and municipal access data center optimization method for predicting number of potential users by electric power customer service center based on logistic model
CN109344906A (en) * 2018-10-24 2019-02-15 中国平安人寿保险股份有限公司 Consumer's risk classification method, device, medium and equipment based on machine learning
CN109754157A (en) * 2018-11-30 2019-05-14 畅捷通信息技术股份有限公司 A kind of methods of marking and system for reflecting enterprise's health management, financing and increasing letter
CN110033307A (en) * 2019-01-04 2019-07-19 国网浙江省电力有限公司电力科学研究院 A kind of electric power top-tier customer screening technique based on machine learning model
CN110059749A (en) * 2019-04-19 2019-07-26 成都四方伟业软件股份有限公司 Screening technique, device and the electronic equipment of important feature
CN110059749B (en) * 2019-04-19 2020-05-19 成都四方伟业软件股份有限公司 Method and device for screening important features and electronic equipment
CN112001570A (en) * 2020-09-29 2020-11-27 北京嘀嘀无限科技发展有限公司 Data processing method and device, electronic equipment and readable storage medium
CN113762619A (en) * 2021-09-08 2021-12-07 国家电网有限公司 Power distribution internet of things user load identification method, system, equipment and storage medium
CN113762619B (en) * 2021-09-08 2023-07-28 国家电网有限公司 Distribution Internet of things user load identification method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180810