CN108989096A - Broadband user churn prediction method and system - Google Patents
Broadband user churn prediction method and system
- Publication number: CN108989096A (application number CN201810691994.4A)
- Authority: CN (China)
- Prior art keywords: user, data, historical, prediction, test sample
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04L41/147: Network analysis or design for predicting network behaviour
  (H: Electricity > H04: Electric communication technique > H04L: Transmission of digital information, e.g. telegraphic communication > H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks > H04L41/14: Network analysis or design)
- H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
  (same hierarchy, under H04L41/14: Network analysis or design)
Abstract
The embodiments of the present invention disclose a broadband user churn prediction method and system in the field of communication technology. They process the unbalanced distribution of historical user data and thereby improve the precision and recall of a user churn prediction model when it predicts future user churn. The method comprises the following steps. Obtain historical user data, which includes historical user behavior data and historical user status data. Select training samples and test samples from the historical user data. Learn from the training samples with a predetermined algorithm to generate a user churn prediction model. Input the historical user behavior data of the test samples into the model to obtain predicted user status data. If any test sample is mispredicted, regenerate the model. Finally, input historical user behavior data obtained in a predetermined time period into the regenerated model to obtain future user status data. The embodiments of the present invention are applied to network systems.
Description
Technical field
The embodiments of the present invention relate to the field of communications, and in particular to a broadband user churn prediction method and system.
Background
Broadband service holds an indispensable position in operators' full-service competition. As competition in broadband services intensifies, broadband user churn is becoming increasingly severe. Operators therefore urgently need to identify broadband users who are about to churn, and to retain them in time. Broadband user churn analysis mainly examines the historical data of users who churned in the past. It mines the features that may lead to churn, so that appropriate measures can be taken in time to reduce churn. This is of great significance for cutting an enterprise's operating costs and improving its business performance.
The sample data in a user churn prediction project consists of two parts: churn sample data and non-churn sample data. Churn samples make up a very small share of the overall data, usually only about 2%, which is clearly unfavorable for building a churn prediction model. To address this imbalance, the prior art usually applies methods such as undersampling and oversampling, both of which are forms of random sampling.

Undersampling selects a small number of non-churn samples from the majority class and combines them with the minority-class churn samples to form a new training set. When there are very few minority-class churn samples, the training set becomes more balanced overall but also too small. User information is lost and some churn features are not adequately represented, which leads to underfitting.

Oversampling works the other way. It randomly samples from the minority-class churn data to enlarge the number of churn samples. The larger churn sample volume increases the computational cost of training, and the approach leads to overfitting.

Either underfitting or overfitting reduces the precision and recall of the churn prediction model when it predicts future user churn.
Summary of the invention
The embodiments of the present invention provide a broadband user churn prediction method and system that process the unbalanced distribution of historical user data. This lets the user churn prediction model fully learn the features of churned users, and it improves the precision and recall of the model's predictions of future user churn.
In a first aspect, a broadband user churn prediction method is provided, comprising the following steps.

Obtain historical user data. The historical user data includes historical user behavior data and historical user status data. The historical user behavior data includes the user's traffic usage and usage duration and complaint data. The historical user status data includes user churn status data and user non-churn status data. The historical user status data in a first predetermined time period corresponds to the historical user behavior data in a second predetermined time period after the first predetermined time period.

Select training samples from the historical user data and learn from them according to a predetermined algorithm to generate a user churn prediction model.

Select test samples from the historical user data and input their historical user behavior data into the user churn prediction model to obtain predicted user status data.

If the predicted user status data is inconsistent with the historical user status data of the test samples, and all historical user status data in the test samples is non-churn status, add a first predetermined proportion of the mispredicted test samples to the training samples and regenerate the user churn prediction model.

If the predicted user status data is inconsistent with the historical user status data of the test samples, and the historical user status data in the test samples contains churn status, add two sets of test samples to the training samples and regenerate the user churn prediction model. The first set is a second predetermined proportion of the mispredicted test samples. The second set is a third predetermined proportion of the correctly predicted test samples.

Input historical user behavior data obtained in a predetermined time period into the regenerated user churn prediction model to obtain future user status data.
In the above broadband user churn prediction method, historical user data is obtained first. It includes historical user behavior data and historical user status data, and the status data in the first predetermined time period corresponds to the behavior data in the second predetermined time period after the first predetermined time period. Training samples are then selected from the historical user data and learned according to the predetermined algorithm to generate a user churn prediction model. In addition, test samples are selected from the historical user data, and their historical user behavior data is input into the model to obtain predicted user status data. The predicted user status data is compared with the historical user status data of the test samples, and the prediction model is regenerated according to the comparison result. Historical user behavior data obtained in a predetermined time period is then input into the regenerated model to obtain future user status data. By processing the unbalanced distribution of historical user data, the method lets the model fully learn the features of churned users, and it improves the precision and recall of the model's predictions of future user churn.
Optionally, after the historical user data is obtained, the method further includes preprocessing the historical user data.
Optionally, preprocessing the historical user data includes performing one or more of the following operations on it: attribute variable reconstruction, data auditing, data classification, attribute variable reduction, and missing value filling.
Optionally, user churn prediction models are obtained for the historical user data of at least two time periods before the current time. The at least two time periods all end at the current time and differ in length. The at least two user churn prediction models are evaluated, and a target user churn prediction model is obtained according to their assessment results.
Optionally, the assessment results include one or more of the following: precision, recall and F1 score. Precision is the number of correctly predicted churned users as a percentage of the number of users predicted to churn. Recall is the number of correctly predicted churned users as a percentage of the number of users who actually churned. The F1 score is 2 × precision × recall / (precision + recall).
In a second aspect, a broadband user churn prediction system is provided, comprising:
An acquisition module, configured to obtain historical user data, which includes historical user behavior data and historical user status data. The historical user behavior data includes the user's traffic usage and usage duration and complaint data. The historical user status data includes user churn status data and user non-churn status data. The historical user status data in a first predetermined time period corresponds to the historical user behavior data in a second predetermined time period after the first predetermined time period.
A training module, configured to select training samples from the historical user data obtained by the acquisition module and learn from them according to a predetermined algorithm to generate a user churn prediction model.
A test module, configured to select test samples from the historical user data obtained by the acquisition module and input their historical user behavior data into the user churn prediction model to obtain predicted user status data.
A processing module, configured to act when the predicted user status data is inconsistent with the historical user status data of the test samples and all historical user status data in the test samples is non-churn status. In that case it adds a first predetermined proportion of the mispredicted test samples to the training samples and regenerates the user churn prediction model.
The processing module is further configured to act when the predicted user status data is inconsistent with the historical user status data of the test samples and the historical user status data in the test samples contains churn status. In that case it adds two sets of test samples to the training samples and regenerates the user churn prediction model. The first set is a second predetermined proportion of the mispredicted test samples, and the second set is a third predetermined proportion of the correctly predicted test samples.
A prediction module, configured to input historical user behavior data obtained by the acquisition module in a predetermined time period into the regenerated user churn prediction model to obtain future user status data.
Optionally, the processing module is further configured to preprocess the historical user data obtained by the acquisition module.
Optionally, the processing module is configured to perform one or more of the following operations on the historical user data obtained by the acquisition module: attribute variable reconstruction, data auditing, data classification, attribute variable reduction, and missing value filling.
Optionally, the training module is configured to generate user churn prediction models for the historical user data, obtained by the acquisition module, of at least two time periods before the current time. The at least two time periods all end at the current time and differ in length. An evaluation module is configured to evaluate the at least two models generated by the training module and to obtain a target user churn prediction model according to their assessment results.
Optionally, the assessment results include one or more of the following: precision, recall and F1 score. Precision is the number of correctly predicted churned users as a percentage of the number of users predicted to churn. Recall is the number of correctly predicted churned users as a percentage of the number of users who actually churned. The F1 score is 2 × precision × recall / (precision + recall).
It can be understood that the broadband user churn prediction system provided above executes the method of the first aspect. For the beneficial effects it can achieve, refer to the method of the first aspect and to the corresponding solutions in the specific embodiments below; they are not repeated here.
Description of the drawings
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a broadband user churn prediction method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the stage of obtaining historical user data and preprocessing it, provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of processing the unbalanced distribution of historical user data, provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of optimizing the user churn prediction model, provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of an exemplary optimization of the user churn prediction model, provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a broadband user churn prediction system provided by an embodiment of the present invention.
Detailed description of the embodiments
Churned users make up a very different share of the user data samples than non-churned users do. In practice, this imbalance between churned and non-churned user data is generally handled by sampling: the samples are processed so that an unbalanced data set becomes a balanced one, which improves the final result in most cases.

Sampling methods fall into two main types, oversampling (up-sampling) and undersampling (down-sampling). Up-sampling randomly selects and duplicates samples from the minority class of churned users. Down-sampling randomly removes some samples from the majority class of non-churned users, so that only part of that class is kept.

Random sampling's biggest advantage is simplicity, but its drawbacks are also obvious. After up-sampling, some samples appear repeatedly in the data set, so the trained model tends to overfit. After down-sampling, part of the user data is discarded, so the model learns only part of the overall data pattern. Both methods therefore reduce the precision and recall of churn prediction.

As shown in Fig. 1, an embodiment of the present invention provides a broadband user churn prediction method, comprising:
101. Obtain historical user data, including historical user behavior data and historical user status data. The historical user behavior data includes the user's traffic usage and usage duration and complaint data. The historical user status data includes user churn status data and user non-churn status data. The historical user status data in a first predetermined time period corresponds to the historical user behavior data in a second predetermined time period after the first predetermined time period.
In addition, after the historical user data is obtained, the method further includes preprocessing it. Specifically, preprocessing includes performing one or more of the following operations on the historical user data: attribute variable reconstruction, data auditing, data classification, attribute variable reduction, and missing value filling.
Illustratively, the historical user behavior data may also include, but is not limited to, the following: the user's network access time, AAA (Authentication, Authorization and Accounting) data, customer relationship management (CRM) system data, and fault report data.
102. Select training samples from the historical user data and learn from them according to a predetermined algorithm to generate a user churn prediction model.
103. Select test samples from the historical user data and input their historical user behavior data into the user churn prediction model to obtain predicted user status data.
104. Suppose the predicted user status data is inconsistent with the historical user status data of the test samples, and all historical user status data in the test samples is non-churn status. Add a first predetermined proportion of the mispredicted test samples to the training samples and regenerate the user churn prediction model.
105. Suppose the predicted user status data is inconsistent with the historical user status data of the test samples, and the historical user status data in the test samples contains churn status. Add a second predetermined proportion of the mispredicted test samples and a third predetermined proportion of the correctly predicted test samples to the training samples, then regenerate the user churn prediction model.
106. Input historical user behavior data obtained in a predetermined time period into the regenerated user churn prediction model to obtain future user status data.
For a better understanding, step 101 is described in detail with reference to Fig. 2. Step 201 is the stage of obtaining historical user data, and steps 202 to 206 are the stage of preprocessing it. The details are as follows:
201. Obtain historical user data, including historical user behavior data and historical user status data.
In the business understanding stage, the source data systems and the historical user data to be obtained are confirmed. The first task of the data preparation stage is then to extract the historical user data completely from the source data systems. The extracted data is finally loaded into a data table called the "user wide table", and preprocessing is performed on that table.
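The patent does not say how the user wide table is assembled. The sketch below is a minimal version under one assumption: each source system (for example AAA or CRM) exports a mapping from a user ID to a dictionary of columns, and these exports are merged into one row per user. The column names `online_hours` and `tenure_months` and the function `build_wide_table` are hypothetical.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def build_wide_table(*sources: Dict[str, Row]) -> Dict[str, Row]:
    """Merge per-system records into one flat row per user ID.

    Each source maps user_id -> {column: value}. When two sources share a
    column name, the later source wins. Columns a source does not provide stay
    absent and are left for the missing-value step to handle.
    """
    wide: Dict[str, Row] = {}
    for source in sources:
        for user_id, columns in source.items():
            # Create the user's row on first sight, then add this source's columns.
            wide.setdefault(user_id, {}).update(columns)
    return wide


# Hypothetical exports from two source systems.
aaa = {"u1": {"online_hours": 120}, "u2": {"online_hours": 5}}
crm = {"u1": {"tenure_months": 30}, "u3": {"tenure_months": 2}}
wide_table = build_wide_table(aaa, crm)
```

Users who appear in only one system, such as `u2` and `u3` here, still get a row. The gaps in those rows are exactly what step 206 later fills.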
202. Reconstruct the attribute variables of the historical user behavior data in the historical user data.
The attribute variables of the historical user behavior data come from two places. Some are obtained directly from existing local services and data source systems. Others are derived variables constructed from those direct attribute variables. Depending on how they are constructed, derived variables fall into trend-type, mean-type and Boolean-type derived variables. Constructing derived variables enriches the input variables available for building the user churn prediction model and improves the accuracy of its predictions.
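To make the three derived-variable types concrete, the sketch below builds one variable of each type from a user's monthly traffic and complaint count. It is illustrative only, and the formulas are assumptions. The trend is taken as the simple change from the first to the last month, although a regression slope would be an equally valid choice. The input names and the function `derive_features` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Union


def derive_features(monthly_traffic_gb: List[float], complaints: int) -> Dict[str, Union[float, bool]]:
    """Build one derived variable of each type named in step 202.

    monthly_traffic_gb: traffic per month, oldest month first (hypothetical input).
    complaints: number of complaints over the same months.
    """
    return {
        # Mean-type: average monthly traffic over the window.
        "traffic_mean": mean(monthly_traffic_gb),
        # Trend-type: change from first to last month (negative means declining use).
        "traffic_trend": monthly_traffic_gb[-1] - monthly_traffic_gb[0],
        # Boolean-type: did the user complain at all in the window?
        "has_complaint": complaints > 0,
    }
```

For example, `derive_features([30, 20, 10], 2)` gives a mean of 20, a trend of -20 and `has_complaint` set to `True`. That combination, falling usage plus a complaint, is the kind of signal a churn model can use.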
203. Audit the historical user data, which includes the historical user behavior data with reconstructed attribute variables.
The historical user data obtained from local services and data source systems comes from different systems, so it has many problems. These include spelling and input errors, illegal values, null values, inconsistent values, abbreviations, multiple representations of the same entity (duplicates), and violations of referential integrity. A data audit function exposes these quality problems clearly and controls the quality of the massive historical user data. The audit focuses on different aspects for numeric data and non-numeric data.
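The sketch below shows one way an audit can treat numeric and non-numeric columns differently. It profiles a single wide-table column. It assumes that missing values appear as `None` or empty strings, and that negative numbers are illegal for quantities such as traffic, duration or complaint counts. The function `audit_column` and its report keys are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List


def audit_column(values: List[Any]) -> Dict[str, Any]:
    """Profile one wide-table column; numeric and non-numeric columns get different checks."""
    present = [v for v in values if v is not None and v != ""]
    report: Dict[str, Any] = {"count": len(values), "missing": len(values) - len(present)}
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        # Numeric: value range, mean, and negatives (assumed illegal for these quantities).
        report.update(kind="numeric", min=min(present), max=max(present),
                      mean=mean(present), negative=sum(1 for v in present if v < 0))
    else:
        # Non-numeric: listing distinct spellings exposes inconsistent values and abbreviations.
        report.update(kind="categorical", distinct=sorted({str(v) for v in present}))
    return report
```

A categorical column containing both `"GD"` and `"gd"` lists both spellings, which flags a likely duplicate representation of one entity for cleanup.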
204. Classify the audited historical user data.
Classifying the audited historical user data is a model-based exploration, and its basic idea is segmentation. The historical user data is divided into different groups according to the values of certain key attributes, and a preliminary analysis follows. The analysis looks at how pronounced the data features are within each group and how much the groups differ from one another. This is a heuristic method for discovering the features of churned users. The preliminary exploration helps identify the general direction of the features for the user churn prediction model.
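A simple form of this segmentation is sketched below. It groups users by the value of one key attribute and compares churn rates across the groups, so that groups with clearly higher churn point to candidate features. The row layout, with a `churned` flag of 1 or 0, and the function name `churn_rate_by_group` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def churn_rate_by_group(rows: List[dict], attribute: str) -> Dict[Hashable, float]:
    """Split users by the value of one key attribute and return each group's churn rate."""
    counts = defaultdict(lambda: [0, 0])  # group value -> [churned users, all users]
    for row in rows:
        group = counts[row[attribute]]
        group[0] += row["churned"]  # 'churned' is assumed to be 1 (churn) or 0 (non-churn)
        group[1] += 1
    return {value: churned / total for value, (churned, total) in counts.items()}
```

Comparing the rates between groups corresponds to the "difference between groups" analysis described above. Analyzing spread within a group would need further statistics, which this sketch leaves out.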
205. Reduce the attribute variables of the historical user behavior data in the classified historical user data.
Not all attribute variables of the historical user behavior data in the user wide table take part in modeling after classification. Too many attribute variables often make the user churn prediction model inaccurate. This step reduces the attribute variables of non-numeric historical user behavior data based on association rules. It reduces the attribute variables of numeric historical user behavior data based on the correlation coefficient method.
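The patent names the correlation coefficient method for numeric attributes but gives no details. The sketch below uses one common reading of it: an attribute is dropped when its absolute Pearson correlation with an already-kept attribute reaches a threshold, because the two then carry largely redundant information. The threshold of 0.9 and the function names are assumptions. Another valid reading would rank attributes by their correlation with the churn label. The association-rule reduction for non-numeric attributes is not sketched here.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def reduce_numeric(columns: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Keep an attribute only if |r| with every already-kept attribute is below threshold."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Attributes are considered in insertion order, so when two attributes are redundant, the earlier one survives. For example, traffic in GB would be kept and the same traffic in MB dropped.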
206. Fill in the missing values of the historical user data, which includes the historical user behavior data after attribute reduction, to obtain the preprocessed historical user data.
The historical user data obtained from local services and data source systems contains many missing values. These prevent the data from taking part in modeling properly and reduce modeling accuracy. Missing value processing fills in the missing values using methods such as the mean and association rules.

Note that steps 202 to 206 above do not prescribe an execution order. In the preprocessing shown in steps 202 to 206, one or more of the following operations need only be performed on the historical user data in some order: attribute variable reconstruction, data auditing, data classification, attribute variable reduction, and missing value filling.
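Mean filling, the first method named above, is sketched below for a single column. It treats both `None` and an absent key as missing and returns new rows instead of modifying the input. The fallback value of 0 for a column with no observed values and the function name are assumptions, and association-rule filling is not shown.

```python
from statistics import mean
from typing import List


def fill_missing_with_mean(rows: List[dict], column: str) -> List[dict]:
    """Return copies of rows with missing (None or absent) values in `column`
    replaced by the column mean; 0 is used if no value was observed at all."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = mean(observed) if observed else 0
    return [
        # Keep observed values; substitute the mean where the value is missing.
        {**r, column: r[column] if r.get(column) is not None else fill}
        for r in rows
    ]
```

The mean is computed only from observed values, so the gaps do not bias it toward zero.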
Steps 102 to 105 are described in detail with reference to Fig. 3.
301. Establish an initial in-window sample area (training samples) and an initial out-of-window sample area (test samples).
Training samples are taken from the historical user data to establish the initial in-window sample area, and test samples are taken to establish the initial out-of-window sample area. For example, 70% of the historical user data can form the in-window sample area as training samples, and the remaining 30% can form the out-of-window sample area as test samples.
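The 70/30 example can be sketched as a shuffled split. The source gives only the proportions, so the random shuffle, the fixed seed and the function name `split_windows` are assumptions. A time-based split, or a split stratified by churn status, would also fit the description.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_windows(samples: Sequence[T], train_ratio: float = 0.7,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle, then split into the in-window (training) and out-of-window (test) areas."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With only about 2% churners, a purely random split may put very few churners in the test area. This is one reason step 305 checks whether the test samples contain any churn status at all.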
302. Learn from the initial in-window sample area according to a predetermined algorithm to generate a user churn prediction model.
Illustratively, the predetermined algorithm includes, but is not limited to, decision trees, random forests, support vector machines and naive Bayes.
303. Test the user churn prediction model generated from the in-window sample area with the historical user behavior data of the out-of-window sample area, and obtain the predicted user status data.
304. Compare the predicted user status data with the historical user status data of the test samples. If they are consistent, the process ends; otherwise, go to step 305.
305. Determine whether the historical user status data in the test samples contains user churn status data. If it does not, go to step 306; if it does, go to step 307.
306. Add a first predetermined proportion of the mispredicted test samples to the training samples, regenerate the user churn prediction model, and return to step 303.
Illustratively, the first predetermined proportion may be 50% of the test samples whose user status was mispredicted.
307. Add a second predetermined proportion of the mispredicted test samples and a third predetermined proportion of the correctly predicted test samples to the training samples, regenerate the user churn prediction model, and return to step 303.
Illustratively, the second predetermined proportion may be 50% of the mispredicted test samples, and the third predetermined proportion may be 50% of the correctly predicted test samples.
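Steps 302 to 307 form a loop, and the sketch below implements that loop with the 50% example proportions. It makes several assumptions. Each sample is reduced to one numeric behavior feature plus a status label. The "predetermined algorithm" is replaced by a hypothetical midpoint-threshold classifier, `fit_threshold`, in place of the decision trees or other algorithms listed in step 302. Proportions are rounded up so that at least one sample is added whenever a pool is non-empty. `max_rounds` is an added safeguard, because the flowchart by itself loops until every test sample is predicted correctly.

```python
import math
import random
from statistics import mean
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (one behavior feature, status: 1 = churn, 0 = non-churn)
Model = Callable[[float], int]


def fit_threshold(train: List[Sample]) -> Model:
    """Hypothetical stand-in for the predetermined algorithm: cut halfway between
    the mean feature value of churned and non-churned training users."""
    churn = [x for x, y in train if y == 1]
    stay = [x for x, y in train if y == 0]
    if not churn or not stay:
        only_label = 1 if churn else 0
        return lambda x: only_label
    cut = (mean(churn) + mean(stay)) / 2
    churn_is_high = mean(churn) > mean(stay)
    return lambda x: int((x > cut) == churn_is_high)


def take(pool: List[Sample], ratio: float, rng: random.Random) -> List[Sample]:
    """Randomly draw ceil(ratio * len(pool)) samples from pool."""
    return rng.sample(pool, math.ceil(ratio * len(pool)))


def retrain_until_consistent(train: List[Sample], test: List[Sample],
                             ratios: Tuple[float, float, float] = (0.5, 0.5, 0.5),
                             max_rounds: int = 20, seed: int = 0) -> Tuple[Model, List[Sample]]:
    rng = random.Random(seed)
    first, second, third = ratios
    model = fit_threshold(train)                                   # step 302
    for _ in range(max_rounds):
        wrong = [s for s in test if model(s[0]) != s[1]]           # steps 303-304
        if not wrong:
            break                                                  # all consistent: stop
        right = [s for s in test if model(s[0]) == s[1]]
        if all(y == 0 for _, y in test):                           # step 305: no churn in test set
            train = train + take(wrong, first, rng)                # step 306
        else:                                                      # step 307
            train = train + take(wrong, second, rng) + take(right, third, rng)
        model = fit_threshold(train)                               # regenerate, back to step 303
    return model, train
```

When the test set contains churners, correctly predicted samples are added alongside the mispredicted ones. This keeps the enlarged training set from drifting toward only the hard cases, which is one plausible reason for the separate third proportion in step 307.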
In addition, Fig. 4 shows the optimization flowchart of the user churn prediction model. The specific steps are as follows:
401. Generate user churn prediction models for the historical user data of at least two time periods before the current time. The at least two time periods all end at the current time and differ in length.
402. Evaluate the at least two user churn prediction models.
403. Obtain a target user churn prediction model according to the assessment results of the at least two user churn prediction models.
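Step 403 says only that the target model is obtained "according to the assessment results". The sketch below assumes the simplest rule: pick the training-window length whose model scores highest on one chosen metric, F1 by default. On a tie it prefers the shorter, more recent window. Both the selection rule and the tie-break are assumptions, and `select_target_model` is a hypothetical name.

```python
from typing import Dict


def select_target_model(results: Dict[int, Dict[str, float]], metric: str = "f1") -> int:
    """Return the training-window length (in months) whose model scored best.

    results maps window length -> assessment results such as
    {"precision": ..., "recall": ..., "f1": ...}. Window lengths are scanned in
    ascending order and max() keeps the first maximum, so ties go to the
    shorter window.
    """
    return max(sorted(results), key=lambda window: results[window][metric])
```

Passing `metric="recall"` instead would favor catching more churners at the cost of more false alarms. Which trade-off to make is a business decision that the patent leaves open.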
Specifically, the assessment results include one or more of the following: precision, recall and F1 score. Precision is the number of correctly predicted churned users as a percentage of the number of users predicted to churn. Recall is the number of correctly predicted churned users as a percentage of the number of users who actually churned. The F1 score is 2 × precision × recall / (precision + recall), i.e.,

$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
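The three definitions translate directly into code. The sketch below computes them from actual and predicted status labels, with 1 meaning churn and 0 meaning non-churn. Returning 0.0 when a denominator is zero is an assumption, since the text does not cover that case.

```python
from typing import Sequence, Tuple


def churn_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the churn class (1 = churn, 0 = non-churn)."""
    # Correctly predicted churned users: actually churned and predicted to churn.
    hits = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    predicted_churn = sum(predicted)   # users predicted to churn
    actual_churn = sum(actual)         # users who actually churned
    precision = hits / predicted_churn if predicted_churn else 0.0
    recall = hits / actual_churn if actual_churn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Suppose a model flags only one of four actual churners, and that one prediction is correct. It then has a precision of 1.0 but a recall of 0.25, and the F1 score of 0.4 reflects that imbalance.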
Specifically, churn prediction must be repeated periodically as time goes on. Because user behavior changes, the user churn prediction model has to be updated and optimized. This matters most around holidays such as the winter and summer vacations, International Labour Day and National Day, when user behavior changes greatly and a model built on holiday data adapts poorly. To minimize the influence of holiday data on the model, the optimization flow of the user churn prediction model is illustrated with reference to Fig. 5, where two adjacent vertical lines represent one period. Assume the churn prediction period is one month (it could also be a week or a day; the principle is the same), so one period here represents one month. The logic for building and updating the model is as follows:
501. Build the user churn prediction model for the first time. At least the most recent four consecutive months of user data are needed to build the first user churn prediction model; here, four months is chosen as the minimum time period. The historical user data includes three months of historical user behavior data and one month of historical user status data. Note that the three months of behavior data correspond to the one month of status data, and that the three months of behavior data precede the one month of status data.
For example, to predict user status data for August, the historical user data from April to July can be used to build the user churn prediction model. The specific steps are as follows:
First, obtain the historical user data from April to July, in which the historical user behavior data from April to June corresponds to the historical user status data of July.
Then, build the user churn prediction model from the April to July historical user data. Building the first user churn prediction model from historical user data is described in detail with reference to Fig. 2 and Fig. 3 and is not repeated here.
Finally, obtain the historical user behavior data of May to July and input it into the churn prediction model to predict the user status data for August.
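The month arithmetic of step 501 can be sketched as below. This is a hedged illustration: `shift` and `plan_windows` are hypothetical helpers, and (year, month) tuples are an assumed representation chosen so that windows crossing a year boundary are also handled.

```python
from typing import List, Tuple

YearMonth = Tuple[int, int]  # (year, month), month in 1..12


def shift(ym: YearMonth, delta: int) -> YearMonth:
    """Move a (year, month) pair by `delta` months, handling year boundaries."""
    idx = ym[0] * 12 + (ym[1] - 1) + delta
    return (idx // 12, idx % 12 + 1)


def plan_windows(forecast: YearMonth, behavior_len: int = 3
                 ) -> Tuple[List[YearMonth], YearMonth, List[YearMonth]]:
    """Return (training behavior months, label month, prediction behavior months).

    Training uses `behavior_len` behavior months followed by one status month,
    the month just before the forecast month. Prediction uses the latest
    `behavior_len` behavior months, i.e. the same window moved forward by one month.
    """
    label = shift(forecast, -1)
    train_behavior = [shift(label, -k) for k in range(behavior_len, 0, -1)]
    predict_behavior = [shift(forecast, -k) for k in range(behavior_len, 0, -1)]
    return train_behavior, label, predict_behavior
```

For a forecast of August 2018, `plan_windows((2018, 8))` gives April to June as training behavior, July as the label month, and May to July as prediction input, which matches the example above.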
502. Update the churn prediction model for the second time. Obtain the most recent four consecutive months of historical user data to build a second churn prediction model; these data comprise three months of historical user behavior data and one month of historical user status data, where the three months of behavior data correspond to, and precede, the one month of status data. Then obtain the most recent five consecutive months of historical user data to build a third churn prediction model; these data comprise four months of historical user behavior data and one month of historical user status data, where the four months of behavior data correspond to, and precede, the one month of status data. Assess the second and third churn prediction models, and obtain the target churn prediction model from their assessment results. Likewise, at the third update the fourth, fifth and sixth churn prediction models are assessed, and the target churn prediction model is obtained from their assessment results.
For example, to predict user status data at the beginning of September, the historical user data from April to August can be used to build the churn prediction models. The specific steps are as follows:
First, obtain the historical user data of May to August to build the second churn prediction model, and then the historical user data of April to August to build the third churn prediction model. Both models are built with the same steps as the first churn prediction model; for the detailed process see Fig. 5, which is not repeated here.
Then, assess the second and third churn prediction models, and obtain the target churn prediction model from their assessment results.
Finally, if the second churn prediction model predicts better than the third, take the second model as the target churn prediction model, obtain the historical user behavior data of June to August, and input it into the model to predict the user status data for September. If the third churn prediction model predicts better than the second, take the third model as the target churn prediction model, obtain the historical user behavior data of May to August, and input it into the model to predict the user status data at the beginning of September.
Note that the assessment result specifically includes one or more of the following: precision, recall and F1 score. Precision is the percentage of users predicted to churn who actually churn (correctly predicted churners divided by predicted churners). Recall is the percentage of users who actually churn that are correctly predicted (correctly predicted churners divided by actual churners). The F1 score is:
$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
In statistics, the F1 score is a measure of the accuracy of a binary classifier. It combines the precision and recall of the churn prediction model and can be regarded as their harmonic mean. Suppose, for example, that the second churn prediction model has the higher precision and the third has the higher recall. If a target churn prediction model with higher precision is needed, the second churn prediction model is selected as the target; if a target churn prediction model with higher recall is needed, the third churn prediction model is selected.
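The metrics and the selection rule just described can be sketched in Python as follows. `evaluate` and `select_target` are hypothetical names; labels are assumed to be 1 for churn and 0 for non-churn, and a metric whose denominator is zero is assumed to be 0.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and F1 for the churn class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correctly predicted churners
    predicted = sum(1 for p in y_pred if p == 1)  # users predicted to churn
    actual = sum(1 for t in y_true if t == 1)     # users who actually churned
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def select_target(results: Dict[str, Dict[str, float]], criterion: str = "f1") -> str:
    """Return the name of the candidate model with the highest value of `criterion`."""
    return max(results, key=lambda name: results[name][criterion])
```

With `y_true = [1, 1, 0, 0, 1, 0]`, a model predicting `[1, 0, 0, 0, 0, 0]` scores precision 1.0, recall 1/3 and F1 0.5, while one predicting `[1, 1, 1, 0, 1, 1]` scores precision 0.6, recall 1.0 and F1 0.75; selecting by precision picks the first and selecting by recall picks the second.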
503. Update the churn prediction model for the third time. Obtain the most recent four consecutive months of historical user data to build a fourth churn prediction model; these data comprise three months of historical user behavior data and one month of historical user status data, where the three months of behavior data correspond to, and precede, the one month of status data. Then obtain the most recent five consecutive months of historical user data to build a fifth churn prediction model; these data comprise four months of historical user behavior data and one month of historical user status data, where the four months of behavior data correspond to, and precede, the one month of status data. Then obtain the most recent six consecutive months of historical user data to build a sixth churn prediction model; these data comprise five months of historical user behavior data and one month of historical user status data, where the five months of behavior data correspond to, and precede, the one month of status data.
For example, to predict user status data at the beginning of October, the historical user data from April to September can be used to build the churn prediction models. The specific steps are as follows:
First, obtain the historical user data of June to September to build the fourth churn prediction model, then the historical user data of May to September to build the fifth churn prediction model, and then the historical user data of April to September to build the sixth churn prediction model. All three models are built with the same steps as the first churn prediction model, which are not repeated here.
Then, assess the fourth, fifth and sixth churn prediction models, and determine the target churn prediction model from the assessment results. The process is the same as the churn prediction for the beginning of September described above and is not detailed again here.
Subsequent updates of the churn prediction model follow the same logic and are not described further here.
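Read as a pattern, steps 501 to 503 train one more candidate model per update, each with one more month of behavior data, and all candidates end at the most recent status month. The sketch below generates those candidate windows under that reading. The continuation beyond step 503 and the upper bound `max_behavior_len` are assumptions, since the document only says the logic continues "and so on"; `candidate_windows` is a hypothetical name, and months are plain integers on a continuous scale.

```python
from typing import List, Tuple


def candidate_windows(round_no: int, latest_status_month: int,
                      min_behavior_len: int = 3, max_behavior_len: int = 12
                      ) -> List[Tuple[List[int], int]]:
    """Candidate (behavior_months, status_month) windows for update round `round_no`.

    Round 1 is the initial build (a single 3-month behavior window); each later
    round adds one candidate with one more month of behavior. Every window ends at
    the latest status month, so the candidates differ only in length.
    """
    if round_no < 1:
        raise ValueError("round_no starts at 1")
    longest = min(min_behavior_len + round_no - 1, max_behavior_len)
    windows = []
    for length in range(min_behavior_len, longest + 1):
        # Behavior months immediately preceding the status month.
        behavior = list(range(latest_status_month - length, latest_status_month))
        windows.append((behavior, latest_status_month))
    return windows
```

With September as status month 9, round 3 yields behavior windows June to August, May to August and April to August, i.e. the fourth, fifth and sixth models of step 503. Each candidate would then be trained and assessed, and the best one kept as the target model.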
In the broadband user churn prediction method above, historical user data are first obtained, comprising historical user behavior data and historical user status data, where the historical user status data in a first predetermined time period correspond to the historical user behavior data in a second predetermined time period following the first predetermined time period. Training samples are then selected from the historical user data and learned according to a predetermined algorithm to generate a churn prediction model. Test samples are also selected from the historical user data, and their historical user behavior data are input into the churn prediction model to obtain predicted user status data. The predicted user status data are compared with the historical user status data of the test samples, the prediction model is regenerated according to the comparison, and the historical user behavior data obtained in a predetermined time period are input into the regenerated churn prediction model. The method has been illustrated in detail above and is not repeated here. By handling the imbalanced distribution of the historical user data, the present application lets the churn prediction model fully learn the characteristics of churned users, improving the precision and recall of its predictions of future churn.
As shown in Fig. 6, an embodiment of the present invention provides a broadband user churn prediction system 60, comprising:
Acquisition module 601, configured to obtain historical user data, which comprise historical user behavior data and historical user status data. The historical user behavior data include the user's traffic usage duration and complaint data; the historical user status data include user churn status data and user non-churn status data; and the historical user status data in a first predetermined time period correspond to the historical user behavior data in a second predetermined time period following the first predetermined time period.
Training module 602, configured to select training samples from the historical user data obtained by the acquisition module 601 and learn from them according to a predetermined algorithm to generate a churn prediction model.
Test module 603, configured to select test samples from the historical user data obtained by the acquisition module 601 and input their historical user behavior data into the churn prediction model to obtain predicted user status data.
Processing module 604, configured to: if the predicted user status data are inconsistent with the historical user status data of the test samples and the historical user status data of the test samples are all non-churn, add a first predetermined ratio of the mispredicted test samples to the training samples and regenerate the churn prediction model.
The processing module 604 is further configured to: if the predicted user status data are inconsistent with the historical user status data of the test samples and the historical user status data of the test samples include churn status, add a second predetermined ratio of the mispredicted test samples and a third predetermined ratio of the correctly predicted test samples to the training samples and regenerate the churn prediction model.
Prediction module 605, configured to input the historical user behavior data of a predetermined time period, obtained by the acquisition module 601, into the regenerated churn prediction model to obtain future user status data.
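The two rebalancing rules of processing module 604 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `augment_training_set`, the sample layout, random selection of the added samples, and the default ratio values are all assumptions, because the document names a first, second and third predetermined ratio without fixing their values or how samples are picked. Retraining the model on the returned set is left out.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (behavior features, status label: 1 churn, 0 non-churn)


def augment_training_set(train: List[Sample], test: Sequence[Sample], predicted: Sequence[int],
                         r1: float = 0.5, r2: float = 0.5, r3: float = 0.1,
                         seed: int = 0) -> List[Sample]:
    """Return the training set enlarged with test samples, following the two rules.

    r1, r2 and r3 stand for the first, second and third predetermined ratios;
    the default values are placeholders only.
    """
    rng = random.Random(seed)
    wrong = [s for s, p in zip(test, predicted) if s[1] != p]
    right = [s for s, p in zip(test, predicted) if s[1] == p]
    if not wrong:
        return list(train)  # predictions fully consistent: nothing to add
    if all(label == 0 for _, label in test):
        # Test samples are all non-churn: add r1 of the mispredicted samples.
        added = rng.sample(wrong, round(r1 * len(wrong)))
    else:
        # Test samples include churners: add r2 of the mispredicted
        # and r3 of the correctly predicted samples.
        added = (rng.sample(wrong, round(r2 * len(wrong)))
                 + rng.sample(right, round(r3 * len(right))))
    return list(train) + added
```

Feeding misclassified samples back into training is one way of pushing the model toward the minority (churn) class; the third ratio keeps some correctly predicted samples so the added data are not made up of errors alone.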
In an exemplary scheme, the processing module 604 is further configured to preprocess the historical user data obtained by the acquisition module 601.
In an exemplary scheme, the processing module 604 is configured to perform one or more of the following operations on the historical user data obtained by the acquisition module 601: reconstruction of attribute variables, data auditing, data classification, reduction of attribute variables, and filling of missing values.
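Two of the listed preprocessing operations, filling of missing values and reduction of attribute variables, can be illustrated with the standard library alone. Median filling and dropping single-valued columns are illustrative choices made here; the document lists the operations but not how each is carried out, and `fill_missing` and `drop_constant` are hypothetical names.

```python
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # one user's attributes; None marks a missing value


def fill_missing(rows: List[Row]) -> List[Row]:
    """Replace missing values with the column median (one simple filling strategy)."""
    columns = sorted({k for row in rows for k in row})
    medians = {}
    for col in columns:
        present = [r[col] for r in rows if r.get(col) is not None]
        medians[col] = median(present) if present else 0.0
    return [{col: r[col] if r.get(col) is not None else medians[col] for col in columns}
            for r in rows]


def drop_constant(rows: List[Row]) -> List[Row]:
    """Simplest attribute reduction: drop columns that take a single value for all users."""
    columns = sorted({k for row in rows for k in row})
    keep = [c for c in columns if len({r.get(c) for r in rows}) > 1]
    return [{c: r.get(c) for c in keep} for r in rows]
```

Running `drop_constant` after `fill_missing` avoids treating a mix of real values and `None` as variation.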
In an exemplary scheme, the training module 602 is configured to generate churn prediction models corresponding to the historical user data of at least two time periods before the current time, obtained by the acquisition module 601, where all of these time periods end at the current time and have different lengths. The evaluation module 606 is configured to assess the at least two churn prediction models generated by the training module 602 and obtain the target churn prediction model from their assessment results.
In an exemplary scheme, the assessment result includes one or more of the following: precision, recall and F1 score, where precision is the percentage of users predicted to churn who are correctly predicted; recall is the percentage of users who actually churn that are correctly predicted; and F1 score = 2 × precision × recall / (precision + recall).
All content relevant to the method embodiments above, including their technical effects and implementation, applies directly to the description of the corresponding functional modules in the system embodiment and is not repeated here.
The steps of the method or algorithm described in connection with the present disclosure may be implemented in hardware or by a processor executing software instructions. For example, the processing module may be implemented by a processor, and the acquisition module by a transceiver or another circuit with a signal-receiving function. An embodiment of the present invention further provides a storage medium, which may include a memory, for storing computer software instructions used by the broadband user churn prediction system, including program code designed to perform the method above. Specifically, the software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor.
An embodiment of the present invention further provides a computer program that can be loaded directly into memory and contains software code; once loaded and executed by a computer, it implements the broadband user churn prediction method above.
Those skilled in the art will appreciate that, in one or more of the examples above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer.
The above are only specific embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (10)
1. a kind of broadband user's attrition prediction method characterized by comprising
Historical use data is obtained, the historical use data includes historical user's behavioral data and historical user's status data;
Wherein, historical user's behavioral data includes user using flow duration and complains data, historical user's status number
According to including customer churn status data and the non-loss status data of user, wherein the historical user in the first predetermined amount of time
Status data is opposite with historical user's behavioral data in described first the second predetermined amount of time after a predetermined period of time
It answers;
Training sample is chosen in the historical use data, the training sample is learnt according to pre-defined algorithm, is generated
Customer churn prediction model;
Test sample is chosen in the historical use data, described in historical user's behavioral data input by the test sample
Customer churn prediction model obtains the User Status data of prediction;
If there are inconsistent for historical user's status data of the User Status data of the prediction and the test sample;Then exist
When all non-attrition status of historical user's status data in the test sample, by the test sample of prediction error
The test sample of one predetermined ratio is added in the training sample and regenerates customer churn prediction model;
If there are inconsistent for historical user's status data of the User Status data of the prediction and the test sample;Then exist
Historical user's status data in the test sample is pre- by second in the test sample of prediction error there are when attrition status
The training is added in the test sample of certainty ratio and the test sample of the third predetermined ratio in the correct test sample of prediction
Customer churn prediction model is regenerated in sample;
The customer churn prediction model that will be regenerated described in the historical user's behavioral data obtained in predetermined amount of time input, is obtained
Obtain future customer status data.
2. The broadband user churn prediction method according to claim 1, characterized in that, after obtaining the historical user data, the method further comprises: preprocessing the historical user data.
3. The broadband user churn prediction method according to claim 2, characterized in that preprocessing the historical user data comprises:
performing one or more of the following operations on the historical user data: reconstruction of attribute variables, data auditing, data classification, reduction of attribute variables, and filling of missing values.
4. The broadband user churn prediction method according to claim 1, characterized in that the method further comprises:
generating churn prediction models corresponding to the historical user data of at least two time periods before the current time, wherein the at least two time periods all end at the current time and have different lengths;
assessing the at least two churn prediction models, and obtaining a target churn prediction model according to the assessment results of the at least two churn prediction models.
5. The broadband user churn prediction method according to claim 4, characterized in that the assessment result includes one or more of the following: precision, recall and F1 score; wherein
the precision is the percentage of users predicted to churn who are correctly predicted;
the recall is the percentage of users who actually churn that are correctly predicted;
the F1 score is 2 × precision × recall / (precision + recall).
6. A broadband user churn prediction system, characterized by comprising:
an acquisition module, configured to obtain historical user data, the historical user data comprising historical user behavior data and historical user status data; wherein the historical user behavior data include the user's traffic usage duration and complaint data, the historical user status data include user churn status data and user non-churn status data, and the historical user status data in a first predetermined time period correspond to the historical user behavior data in a second predetermined time period following the first predetermined time period;
a training module, configured to select training samples from the historical user data obtained by the acquisition module, and to learn from the training samples according to a predetermined algorithm to generate a churn prediction model;
a test module, configured to select test samples from the historical user data obtained by the acquisition module, and to input the historical user behavior data of the test samples into the churn prediction model to obtain predicted user status data;
a processing module, configured to: if the predicted user status data are inconsistent with the historical user status data of the test samples, then, when the historical user status data of the test samples are all non-churn, add a first predetermined ratio of the mispredicted test samples to the training samples and regenerate the churn prediction model;
the processing module being further configured to: if the predicted user status data are inconsistent with the historical user status data of the test samples, then, when the historical user status data of the test samples include churn status, add a second predetermined ratio of the mispredicted test samples and a third predetermined ratio of the correctly predicted test samples to the training samples and regenerate the churn prediction model;
a prediction module, configured to input the historical user behavior data of a predetermined time period, obtained by the acquisition module, into the regenerated churn prediction model to obtain future user status data.
7. The broadband user churn prediction system according to claim 6, characterized in that
the processing module is further configured to preprocess the historical user data obtained by the acquisition module.
8. The broadband user churn prediction system according to claim 7, characterized in that
the processing module is configured to perform one or more of the following operations on the historical user data obtained by the acquisition module: reconstruction of attribute variables, data auditing, data classification, reduction of attribute variables, and filling of missing values.
9. The broadband user churn prediction system according to claim 6, characterized in that
the training module is configured to generate churn prediction models corresponding to the historical user data of at least two time periods before the current time, obtained by the acquisition module, wherein the at least two time periods all end at the current time and have different lengths; and an evaluation module is configured to assess the at least two churn prediction models generated by the training module and to obtain a target churn prediction model according to the assessment results of the at least two churn prediction models.
10. The broadband user churn prediction system according to claim 9, characterized in that the assessment result includes one or more of the following: precision, recall and F1 score; wherein
the precision is the percentage of users predicted to churn who are correctly predicted;
the recall is the percentage of users who actually churn that are correctly predicted;
the F1 score is 2 × precision × recall / (precision + recall).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810691994.4A CN108989096A (en) | 2018-06-28 | 2018-06-28 | A kind of broadband user's attrition prediction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810691994.4A CN108989096A (en) | 2018-06-28 | 2018-06-28 | A kind of broadband user's attrition prediction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108989096A true CN108989096A (en) | 2018-12-11 |
Family ID: 64538804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810691994.4A Pending CN108989096A (en) | 2018-06-28 | 2018-06-28 | A kind of broadband user's attrition prediction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108989096A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170220933A1 (en) * | 2016-01-28 | 2017-08-03 | Facebook, Inc. | Systems and methods for churn prediction |
CN107067033A (en) * | 2017-04-12 | 2017-08-18 | 邹霞 | The local route repair method of machine learning model |
US20180018684A1 (en) * | 2016-07-13 | 2018-01-18 | Urban Airship, Inc. | Churn prediction with machine learning |
CN107832581A (en) * | 2017-12-15 | 2018-03-23 | 百度在线网络技术(北京)有限公司 | Trend prediction method and device |
CN108039977A (en) * | 2017-12-21 | 2018-05-15 | 广州市申迪计算机系统有限公司 | A kind of telecommunication user attrition prediction method and device based on user's internet behavior |
CN108171280A (en) * | 2018-01-31 | 2018-06-15 | 国信优易数据有限公司 | A kind of grader construction method and the method for prediction classification |
- 2018-06-28: application CN201810691994.4A filed in CN (CN108989096A, active, pending)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110570948A (en) * | 2019-09-09 | 2019-12-13 | 深圳市伊欧乐科技有限公司 | User future weight prediction method, device, server and storage medium |
CN112749721A (en) * | 2019-10-31 | 2021-05-04 | 彩虹无线(北京)新技术有限公司 | Driving risk evaluation model training method and device |
CN110930192A (en) * | 2019-11-22 | 2020-03-27 | 携程旅游信息技术(上海)有限公司 | User loss prediction method, system, device and storage medium |
CN113066479A (en) * | 2019-12-12 | 2021-07-02 | 北京沃东天骏信息技术有限公司 | Method and device for evaluating model |
CN112085528A (en) * | 2020-09-08 | 2020-12-15 | 北京深演智能科技股份有限公司 | Data processing method and device |
CN113641912A (en) * | 2021-08-20 | 2021-11-12 | 北京得间科技有限公司 | Information pushing method, computing device and computer storage medium |
CN113641912B (en) * | 2021-08-20 | 2024-02-09 | 北京得间科技有限公司 | Information pushing method, computing device and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181211 |