CN108989096A

CN108989096A - Broadband user churn prediction method and system

Info

Publication number: CN108989096A
Application number: CN201810691994.4A
Authority: CN
Inventors: 王俊锁; 王纯波; 任虎; 张祖国
Original assignee: Yaxin Technology (chengdu) Co Ltd
Current assignee: Yaxin Technology (chengdu) Co Ltd
Priority date: 2018-06-28
Filing date: 2018-06-28
Publication date: 2018-12-11

Abstract

The embodiments of the present invention disclose a broadband user churn prediction method and system in the field of communication technology. The method handles the unbalanced class distribution of historical user data, which improves the precision and recall of a user churn prediction model when it predicts future user churn. The method comprises: obtaining historical user data, which includes historical user behavior data and historical user status data; selecting training samples and test samples from the historical user data, and learning from the training samples with a predetermined algorithm to generate a user churn prediction model; inputting the historical user behavior data of the test samples into the user churn prediction model to obtain predicted user status data; if any test sample is mispredicted, regenerating the user churn prediction model; and inputting historical user behavior data obtained in a predetermined time period into the regenerated user churn prediction model to obtain future user status data. The embodiments of the present invention are applied to network systems.

Description

Broadband user churn prediction method and system

Technical field

The embodiments of the present invention relate to the field of communications, and in particular to a broadband user churn prediction method and system.

Background art

Broadband service occupies an indispensable, critical position in carriers' full-service competition. As competition in broadband services intensifies, broadband user churn is becoming increasingly serious, and identifying and retaining users who are about to churn in a timely manner has become a pressing problem for operators. Broadband user churn analysis mainly analyzes the historical data of users who have churned in the past, mines the features that may lead to churn, and takes appropriate measures in time to reduce it. This is of particular importance for reducing an enterprise's operating costs and improving its business performance.

The sample data in a user churn prediction project consists of two parts: churned-user sample data and non-churned-user sample data. Churned-user samples make up a very small share of all samples, usually only about 2%. Such a low proportion is clearly unfavorable for building a churn prediction model. In the prior art, the imbalance between churned and non-churned samples is usually addressed with methods such as undersampling and oversampling, both of which are random sampling. Undersampling selects a small number of non-churned (majority-class) samples and combines them with the churned (minority-class) samples to form a new training set. When there are very few minority-class churned samples, this makes the training set more balanced overall, but the training set becomes too small: user information is lost, some churn features are not well represented, and the model underfits. Oversampling does the opposite: it randomly samples from the small set of churned samples to increase the number of minority-class samples. The larger volume of churned samples increases the computation of the training process, and the approach leads to overfitting. Both underfitting and overfitting reduce the precision and recall of the churn prediction model when predicting future user churn.
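For illustration only, the two prior-art random sampling methods discussed above can be sketched as follows. This is a minimal sketch, not part of the claimed method; it assumes the samples are in-memory lists with label 1 for a churned user and 0 for a non-churned user, and the function names are hypothetical.

```python
import random

def random_oversample(samples, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority-class (churned) rows until the classes are balanced."""
    rng = random.Random(seed)
    minority_rows = [(x, y) for x, y in zip(samples, labels) if y == minority]
    majority_count = sum(1 for y in labels if y != minority)
    # Draw with replacement: the same churned user can be copied many times.
    extra = [rng.choice(minority_rows) for _ in range(majority_count - len(minority_rows))]
    pairs = list(zip(samples, labels)) + extra
    return [x for x, _ in pairs], [y for _, y in pairs]

def random_undersample(samples, labels, minority=1, seed=0):
    """Keep all minority-class rows plus an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority_rows = [(x, y) for x, y in zip(samples, labels) if y == minority]
    majority_rows = [(x, y) for x, y in zip(samples, labels) if y != minority]
    # Draw without replacement: the remaining non-churned users are discarded.
    kept = rng.sample(majority_rows, len(minority_rows))
    pairs = minority_rows + kept
    return [x for x, _ in pairs], [y for _, y in pairs]
```

With a 2% churn rate (2 churned users out of 100), oversampling yields 196 samples in which the same 2 churned users are repeated many times, while undersampling leaves only 4 samples, which illustrates the overfitting and information-loss concerns described above.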

Summary of the invention

The embodiments of the present invention provide a broadband user churn prediction method and system that handle the unbalanced distribution of historical user data so that the churn prediction model fully learns the features of churned users, improving the precision and recall of the model when predicting future user churn.

In a first aspect, a broadband user churn prediction method is provided, comprising: obtaining historical user data, the historical user data including historical user behavior data and historical user status data, wherein the historical user behavior data includes the user's traffic usage, usage duration, and complaint data, the historical user status data includes user churn status data and user non-churn status data, and the historical user status data in a first predetermined time period corresponds to the historical user behavior data in a second predetermined time period after the first predetermined time period; selecting training samples from the historical user data and learning from the training samples with a predetermined algorithm to generate a user churn prediction model; selecting test samples from the historical user data and inputting the historical user behavior data of the test samples into the user churn prediction model to obtain predicted user status data; if the predicted user status data is inconsistent with the historical user status data of the test samples and all of the historical user status data in the test samples is non-churn status, adding a first predetermined ratio of the mispredicted test samples to the training samples and regenerating the user churn prediction model; if the predicted user status data is inconsistent with the historical user status data of the test samples and the historical user status data in the test samples includes churn status, adding a second predetermined ratio of the mispredicted test samples and a third predetermined ratio of the correctly predicted test samples to the training samples and regenerating the user churn prediction model; and inputting historical user behavior data obtained in a predetermined time period into the regenerated user churn prediction model to obtain future user status data.

In the above broadband user churn prediction method, historical user data is first obtained, including historical user behavior data and historical user status data, where the historical user status data in the first predetermined time period corresponds to the historical user behavior data in the second predetermined time period after the first predetermined time period. Training samples are then selected from the historical user data and learned with a predetermined algorithm to generate a user churn prediction model. Test samples are also selected from the historical user data, and their historical user behavior data is input into the user churn prediction model to obtain predicted user status data. The predicted user status data is compared with the historical user status data of the test samples, the prediction model is regenerated according to the comparison result, and historical user behavior data obtained in a predetermined time period is input into the regenerated user churn prediction model to obtain future user status data. By handling the unbalanced distribution of historical user data, the method provided by the embodiments of the present invention lets the churn prediction model fully learn the features of churned users, improving its precision and recall when predicting future user churn.

Optionally, after the historical user data is obtained, the method further includes: preprocessing the historical user data.

Optionally, preprocessing the historical user data includes performing one or more of the following operations on it: attribute variable reconstruction, data audit, data classification, attribute variable reduction, and missing value filling.

Optionally, user churn prediction models corresponding to historical user data of at least two time periods before the current time are obtained, where the at least two time periods all end at the current time and have different lengths; the at least two user churn prediction models are evaluated, and a target user churn prediction model is obtained according to their evaluation results.

Optionally, the evaluation result includes one or more of the following: precision, recall, and F1 score, where precision = correctly predicted churned users / users predicted to churn (as a percentage); recall = correctly predicted churned users / users who actually churned (as a percentage); and F1 score = 2 × precision × recall / (precision + recall).

In a second aspect, a broadband user churn prediction system is provided, comprising:

An acquisition module, configured to obtain historical user data, the historical user data including historical user behavior data and historical user status data, wherein the historical user behavior data includes the user's traffic usage, usage duration, and complaint data, the historical user status data includes user churn status data and user non-churn status data, and the historical user status data in a first predetermined time period corresponds to the historical user behavior data in a second predetermined time period after the first predetermined time period.

A training module, configured to select training samples from the historical user data obtained by the acquisition module and learn from the training samples with a predetermined algorithm to generate a user churn prediction model.

A test module, configured to select test samples from the historical user data obtained by the acquisition module and input the historical user behavior data of the test samples into the user churn prediction model to obtain predicted user status data.

A processing module, configured to, if the predicted user status data is inconsistent with the historical user status data of the test samples and all of the historical user status data in the test samples is non-churn status, add a first predetermined ratio of the mispredicted test samples to the training samples and regenerate the user churn prediction model.

The processing module is further configured to, if the predicted user status data is inconsistent with the historical user status data of the test samples and the historical user status data in the test samples includes churn status, add a second predetermined ratio of the mispredicted test samples and a third predetermined ratio of the correctly predicted test samples to the training samples and regenerate the user churn prediction model.

A prediction module, configured to input the historical user behavior data of a predetermined time period, obtained by the acquisition module, into the regenerated user churn prediction model to obtain future user status data.

Optionally, the processing module is further configured to preprocess the historical user data obtained by the acquisition module.

Optionally, the processing module is configured to perform one or more of the following operations on the historical user data obtained by the acquisition module: attribute variable reconstruction, data audit, data classification, attribute variable reduction, and missing value filling.

Optionally, the training module is configured to generate, from the data obtained by the acquisition module, user churn prediction models corresponding to historical user data of at least two time periods before the current time, where the at least two time periods all end at the current time and have different lengths; and an evaluation module is configured to evaluate the at least two user churn prediction models generated by the training module and obtain a target user churn prediction model according to their evaluation results.

It can be understood that the broadband user churn prediction system provided above is used to perform the method of the first aspect. For the beneficial effects it can achieve, refer to the method of the first aspect and the corresponding solutions in the specific embodiments below; details are not repeated here.

Brief description of the drawings

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flowchart of a broadband user churn prediction method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of the stage of obtaining historical user data and the stage of preprocessing it according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of handling the unbalanced distribution of historical user data according to an embodiment of the present invention;

Fig. 4 is a schematic flowchart of optimizing a user churn prediction model according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of an exemplary optimization flow of a user churn prediction model according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a broadband user churn prediction system according to an embodiment of the present invention.

Detailed description of the embodiments

The proportion of churned-user samples in the user data differs greatly from the proportion of non-churned-user samples. In practice, this imbalance between churned and non-churned user data is generally handled by sampling: the user data samples are processed so that an unbalanced sample set becomes a balanced one, which in most cases improves the final result. Sampling methods are mainly divided into oversampling (up-sampling) and undersampling (down-sampling). Up-sampling randomly selects minority-class (churned) samples and duplicates them multiple times; down-sampling randomly removes some samples from the majority (non-churned) class, in other words keeps only part of that class. The biggest advantage of random sampling is its simplicity, but its drawbacks are also obvious. After up-sampling, some samples appear repeatedly in the data set, so the trained prediction model tends to overfit; after down-sampling, part of the user data is discarded, so the model learns only part of the overall pattern. Both methods can therefore reduce the precision and recall of churn prediction. As shown in Fig. 1, an embodiment of the present invention provides a broadband user churn prediction method, comprising:

101. Obtain historical user data, including historical user behavior data and historical user status data. The historical user behavior data includes the user's traffic usage, usage duration, and complaint data; the historical user status data includes user churn status data and user non-churn status data. The historical user status data in a first predetermined time period corresponds to the historical user behavior data in a second predetermined time period after the first predetermined time period.

In addition, after the historical user data is obtained, the method further includes preprocessing it. Specifically, the preprocessing includes performing one or more of the following operations on the historical user data: attribute variable reconstruction, data audit, data classification, attribute variable reduction, and missing value filling.

Illustratively, the historical user behavior data may further include, but is not limited to, data such as the user's network access time, AAA (Authentication, Authorization and Accounting) data, customer relationship management (CRM) system data, and fault reports.

102. Select training samples from the historical user data and learn from them with a predetermined algorithm to generate a user churn prediction model.

103. Select test samples from the historical user data and input their historical user behavior data into the user churn prediction model to obtain predicted user status data.

104. If the predicted user status data is inconsistent with the historical user status data of the test samples, and all of the historical user status data in the test samples is non-churn status, add a first predetermined ratio of the mispredicted test samples to the training samples and regenerate the user churn prediction model.

105. If the predicted user status data is inconsistent with the historical user status data of the test samples, and the historical user status data in the test samples includes churn status, add a second predetermined ratio of the mispredicted test samples and a third predetermined ratio of the correctly predicted test samples to the training samples and regenerate the user churn prediction model.

106. Input historical user behavior data obtained in a predetermined time period into the regenerated user churn prediction model to obtain future user status data.

For a better understanding, step 101 is described in detail with reference to Fig. 2, where step 201 is the stage of obtaining historical user data and steps 202 to 206 are the stage of preprocessing it. Details are as follows:

201. Obtain historical user data, including historical user behavior data and historical user status data.

The source data systems and the historical user data to be obtained are confirmed in the business understanding stage; the first task of the data preparation stage is to extract the historical user data completely from the source data systems. The obtained historical user data is finally loaded into a data table called the "user wide table", on which the preprocessing work is then performed.
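As an illustration of loading the obtained data into the user wide table, a minimal sketch follows. It assumes each source system has already been exported as a Python dict keyed by user ID; the function name and the source names used with it (for example `aaa`, `crm`, `complaints`) are hypothetical.

```python
def build_wide_table(*sources):
    """Merge per-user attribute dicts from several source systems into one row per user.

    Each source maps user_id -> {attribute: value}. When two sources share an
    attribute name, the later source wins (an assumption; such conflicts would
    normally be surfaced by the data audit in step 203).
    """
    wide = {}
    for source in sources:
        for user_id, attrs in source.items():
            # Create the user's row on first sight, then add this source's columns.
            row = wide.setdefault(user_id, {"user_id": user_id})
            row.update(attrs)
    return wide
```

Users absent from a source simply lack that source's columns, which is one origin of the missing values handled in step 206.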

202. Reconstruct the attribute variables of the historical user behavior data in the historical user data.

The attribute variables of the historical user behavior data include not only those obtained directly from existing local services and data source systems, but also derived variables constructed from those direct attribute variables. Depending on how they are constructed, derived variables can be divided into trend-type, mean-type, and Boolean-type derived variables. Constructing derived variables enriches the set of input variables used to build the churn prediction model and improves its prediction accuracy.
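A minimal sketch of the three kinds of derived variables follows, assuming a user's monthly traffic and monthly complaint counts are available as non-empty Python lists. The specific definitions (least-squares slope for the trend type, arithmetic mean for the mean type, "any complaint" for the Boolean type) and the variable names are illustrative assumptions; the embodiment does not fix them.

```python
def derive_variables(monthly_traffic, monthly_complaints):
    """Build trend-, mean- and Boolean-type derived variables for one user."""
    n = len(monthly_traffic)
    # Mean type: average monthly traffic.
    mean_traffic = sum(monthly_traffic) / n
    # Trend type: least-squares slope of traffic against month index 0..n-1.
    x_mean = (n - 1) / 2
    cov = sum((x - x_mean) * (v - mean_traffic) for x, v in enumerate(monthly_traffic))
    var = sum((x - x_mean) ** 2 for x in range(n))
    slope = cov / var if var else 0.0
    return {
        "traffic_mean": mean_traffic,
        "traffic_trend": slope,  # negative means usage is declining
        # Boolean type: did the user complain in any month of the window?
        "has_complaint": any(c > 0 for c in monthly_complaints),
    }
```

For traffic of 30, 20 and 10 over three months with one complaint in the last month, this yields a mean of 20.0, a trend of -10.0 per month, and a complaint flag of True.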

203. Perform a data audit on the historical user data, which now includes the historical user behavior data with reconstructed attribute variables.

Because the historical user data obtained from local services and data source systems comes from different systems, it contains many problems such as spelling and input errors, illegal values, null values, inconsistent values, abbreviations, multiple representations of the same entity (duplicates), and violations of referential integrity. To surface these quality problems intuitively, a data audit function is used to implement quality control over the large volume of historical user data. The data audit focuses on different aspects for numeric data and non-numeric data.
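A minimal sketch of a column-level audit that treats numeric and non-numeric columns differently is shown below. It assumes `None` or an empty string marks a null value; the summary fields chosen are assumptions for illustration only.

```python
from collections import Counter

def audit_column(values):
    """Summarize one column of the user wide table."""
    present = [v for v in values if v is not None and v != ""]
    report = {"count": len(values), "missing": len(values) - len(present)}
    is_numeric = present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        # Numeric columns: range and mean expose illegal or extreme values.
        report.update(kind="numeric", min=min(present), max=max(present),
                      mean=sum(present) / len(present))
    else:
        # Non-numeric columns: distinct values expose inconsistent spellings.
        counts = Counter(present)
        report.update(kind="categorical", distinct=len(counts),
                      top=counts.most_common(1)[0][0] if counts else None)
    return report
```

For a non-numeric column such as `["fiber", "FIBER", "fiber"]`, the audit reports two distinct values, exposing an inconsistent representation of the same entity.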

204. Perform data classification on the audited historical user data.

Data classification of the audited historical user data is model-based exploration. The basic idea is user data segmentation: the historical user data is divided into different types for preliminary analysis, that is, into different groups according to the values of certain key attributes, and the significance of the data features within each group and the differences between groups are analyzed. This is a heuristic method for discovering the features of churned users; this preliminary exploration helps identify the general direction for the features of the churn prediction model.
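A minimal sketch of this group-wise exploration is given below. It assumes each row is a dict with a 0/1 `churned` field and that the grouping key (for example a hypothetical `plan` field) is categorical. It compares only the churn rate between groups; a full analysis would also examine other feature distributions within each group.

```python
from collections import defaultdict

def churn_rate_by_group(rows, key):
    """Split rows into groups by one key attribute and return each group's churn rate."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        churned[group] += row["churned"]  # 1 if the user churned, else 0
    return {group: churned[group] / totals[group] for group in totals}
```

A group whose churn rate stands well above the others points to a candidate churn feature worth keeping for modeling.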

205. Reduce the attribute variables of the historical user behavior data in the classified historical user data.

Not all attribute variables of the historical user behavior data in the classified user wide table take part in modeling; too many attribute variables often make the churn prediction model inaccurate. This step reduces the attribute variables of non-numeric historical user behavior data based on association rules, and reduces those of numeric historical user behavior data based on the correlation coefficient method.
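The correlation coefficient method for numeric attribute variables can be sketched as below: an attribute is kept only if its absolute Pearson correlation with every previously kept attribute does not exceed a threshold. The greedy keep-first strategy and the 0.9 threshold are assumptions; the association-rule reduction for non-numeric attributes is not shown.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def reduce_numeric_attributes(columns, threshold=0.9):
    """columns: dict attribute name -> list of values. Returns the names kept."""
    kept = []
    for name in columns:
        # Drop an attribute that is nearly redundant with one already kept.
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For example, traffic measured in GB and the same traffic measured in MB are perfectly correlated, so only the first of the two is kept.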

206. Fill in the missing values of the historical user data, which now includes the historical user behavior data after attribute variable reduction, to obtain the preprocessed historical user data.

The historical user data obtained from the local services and data source systems contains many missing values, which prevent the data from participating well in modeling and reduce modeling accuracy. Missing values in the historical user data are filled using methods such as the mean value and association rules.

Note that the execution order of steps 202 to 206 is not limited: in the preprocessing of historical user data shown in steps 202 to 206, it is only necessary to perform one or more of the following operations in some order: attribute variable reconstruction, data audit, data classification, attribute variable reduction, and missing value filling.
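Mean-value filling, one of the methods mentioned for step 206, can be sketched as follows for a numeric column, assuming `None` marks a missing value. Association-rule-based filling is not shown, and falling back to 0.0 for an entirely empty column is an assumption.

```python
def fill_missing_with_mean(column):
    """Replace None entries of a numeric column with the mean of the present values."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present) if present else 0.0  # fallback is an assumption
    return [mean if v is None else v for v in column]
```

For example, `[10, None, 20]` becomes `[10, 15.0, 20]`.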

Steps 102 to 105 are described in detail below with reference to Fig. 3.

301. Establish an initial window sample region (training samples) and an out-of-initial-window sample region (test samples).

Training samples are obtained from the historical user data to establish the initial window sample region, and test samples are obtained to establish the out-of-initial-window sample region. For example, 70% of the historical user data can be used as training samples for the initial window sample region, and the remaining 30% as test samples for the out-of-window sample region.
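A minimal sketch of the 70%/30% division follows, assuming the samples fit in memory and that a plain random shuffle is acceptable. The embodiment does not specify how the samples are selected, so a stratified split that preserves the churn ratio in both regions would be an equally valid choice; the function name is hypothetical.

```python
import random

def split_window(samples, train_fraction=0.7, seed=0):
    """Shuffle samples and split them into in-window (training) and out-of-window (test) sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]
```

With 100 samples, this returns 70 training samples and 30 test samples, and together they cover every original sample exactly once.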

302. Learn from the initial window sample region with a predetermined algorithm to generate a user churn prediction model.

Illustratively, the predetermined algorithm includes, but is not limited to, decision trees, random forests, support vector machines, and naive Bayes.

303. Use the historical user behavior data of the out-of-initial-window sample region to test the user churn prediction model generated from the initial window sample region, obtaining predicted user status data.

304. Compare the predicted user status data with the historical user status data of the test samples. If they are consistent, the process ends; if not, go to step 305.

305. Determine whether the historical user status data in the test samples includes user churn status data.

If it does not, go to step 306; if it does, go to step 307.

306. Add a first predetermined ratio of the mispredicted test samples to the training samples, regenerate the user churn prediction model, and return to step 303.

Illustratively, the first predetermined ratio may be 50% of the mispredicted test samples.

307. Add a second predetermined ratio of the mispredicted test samples and a third predetermined ratio of the correctly predicted test samples to the training samples, regenerate the user churn prediction model, and return to step 303.

Illustratively, the second predetermined ratio may be 50% of the mispredicted test samples, and the third predetermined ratio may be 50% of the correctly predicted test samples.
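Steps 303 to 307 can be sketched as the loop below. This is one illustrative reading, not the patented implementation. Samples are assumed to be `(features, label)` pairs with label 1 for churn; `fit` is any caller-supplied training routine (for example one of the algorithms listed for step 302) that returns a predictor; the 50% ratios follow the illustrative values above; adding at least one mispredicted sample per round and the `max_rounds` limit are added safeguards, since the flow in Fig. 3 only terminates when every test sample is predicted correctly. Step 305 is read here as checking whether the test set as a whole contains any churned user.

```python
import random

def refine_training_set(train, test, fit, first_ratio=0.5, second_ratio=0.5,
                        third_ratio=0.5, max_rounds=10, seed=0):
    """Illustrative sketch of steps 303-307 (Fig. 3).

    train, test: lists of (features, label) pairs, label 1 = churned, 0 = not churned.
    fit: callable taking a training list and returning model(features) -> label.
    Returns the last model and the enlarged training set.
    """
    rng = random.Random(seed)
    train = list(train)                      # do not mutate the caller's list
    model = fit(train)                       # step 302
    for _ in range(max_rounds):              # safeguard, not part of Fig. 3
        # Step 303: predict the status of every test sample.
        wrong = [s for s in test if model(s[0]) != s[1]]
        right = [s for s in test if model(s[0]) == s[1]]
        if not wrong:                        # step 304: all consistent, stop
            break
        if not any(label == 1 for _, label in test):
            # Step 306: no churned user in the test set; add part of the errors
            # (at least one sample, an assumption to guarantee progress).
            train += rng.sample(wrong, max(1, round(len(wrong) * first_ratio)))
        else:
            # Step 307: add part of the errors and part of the correct predictions.
            train += rng.sample(wrong, max(1, round(len(wrong) * second_ratio)))
            train += rng.sample(right, round(len(right) * third_ratio))
        model = fit(train)                   # regenerate the model, back to step 303
    return model, train
```

Because test samples are not removed from the test set after being added to the training set, the same sample may be added more than once across rounds; the embodiment does not say whether this is intended.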

In addition, Fig. 4 is a flowchart of optimizing the user churn prediction model. The specific steps are as follows:

401. Generate user churn prediction models corresponding to historical user data of at least two time periods before the current time, where the at least two time periods all end at the current time and have different lengths.

402. Evaluate the at least two user churn prediction models.

403. Obtain a target user churn prediction model according to the evaluation results of the at least two user churn prediction models.

Specifically, the evaluation result includes one or more of the following: precision, recall, and F1 score. Precision is the number of correctly predicted churned users as a percentage of the users predicted to churn; recall is the number of correctly predicted churned users as a percentage of the users who actually churned; and the F1 score is

$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
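The three evaluation indicators can be computed from actual and predicted labels as sketched below, assuming label 1 denotes churn. Returning 0.0 when a denominator is zero is an assumption; the embodiment does not address that case.

```python
def churn_metrics(actual, predicted, churn=1):
    """Return (precision, recall, F1) for the churn class."""
    # Correctly predicted churned users.
    tp = sum(1 for a, p in zip(actual, predicted) if a == churn and p == churn)
    predicted_churn = sum(1 for p in predicted if p == churn)
    actual_churn = sum(1 for a in actual if a == churn)
    precision = tp / predicted_churn if predicted_churn else 0.0
    recall = tp / actual_churn if actual_churn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, if 3 users are predicted to churn, 3 actually churn, and 2 of the predictions are correct, precision, recall, and F1 are all 2/3.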

Specifically, churn prediction must be performed periodically over time. Because user behavior changes, the churn prediction model needs to be updated and optimized. In particular, during holidays such as the winter and summer vacations, International Labour Day, and National Day, user behavior changes greatly, and a model built on holiday data adapts poorly. To minimize the influence of holiday data on the model, the optimization flow of the churn prediction model is illustrated with reference to Fig. 5, in which each pair of adjacent vertical lines denotes one period. Assuming the churn prediction period is one month (it may also be a week or a day; the principle is the same), each period here denotes one month. The model building and updating logic is as follows:

501. Build the user churn prediction model for the first time. The user data of at least the most recent four consecutive months is needed to build the first user churn prediction model (four months is chosen here as the minimum time period). The historical user data includes three months of historical user behavior data and one month of historical user status data. Note that the three months of historical user behavior data correspond to the one month of historical user status data, and the three months of behavior data precede the one month of status data.

For example, to predict the user status data at the beginning of August, the historical user data from April to July can be obtained to build the user churn prediction model. The specific steps are as follows:

First, obtain the historical user data from April to July, where the historical user behavior data from April to June corresponds to the historical user status data of July.

Then, build the user churn prediction model from the historical user data of April to July. Building the first user churn prediction model from historical user data is described in detail with reference to Fig. 2 and Fig. 3; details are not repeated here.

Finally, obtain the historical user behavior data from May to July and input it into the user churn prediction model to predict the user status data for August.
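The window arithmetic of steps 501 and 502 can be sketched as follows. Months are assumed to be numbered 1 to 12 within a single year (no wrap-around across years), and the function name is hypothetical. With `behavior_months=3` and a target of August (8), training uses behavior from April to June labelled with July status, and scoring uses behavior from May to July, matching the example above.

```python
def churn_windows(target_month, behavior_months=3):
    """Return the months used to train and score a model that predicts target_month.

    Months are numbered 1-12 within one year (assumption: no year wrap-around).
    """
    label_month = target_month - 1           # latest month with known status
    train_behavior = list(range(label_month - behavior_months, label_month))
    score_behavior = list(range(target_month - behavior_months, target_month))
    return {
        "train_behavior": train_behavior,    # behavior months paired with label_month
        "train_label": label_month,
        "score_behavior": score_behavior,    # most recent behavior, fed to the model
        "score_target": target_month,
    }
```

Calling it with target month 9 and `behavior_months` of 3 or 4 reproduces the windows of the second model (May to August for training, June to August for scoring) and of the third model (April to August for training, May to August for scoring) described in step 502.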

502. Update the user churn prediction model for the second time. Obtain the historical user data of the most recent four consecutive months to build a second user churn prediction model; this data includes three months of historical user behavior data and one month of historical user status data. Note that the three months of behavior data correspond to the one month of status data and precede it. Also obtain the historical user data of the most recent five consecutive months to build a third user churn prediction model; this data includes four months of historical user behavior data and one month of historical user status data. Note that the four months of behavior data correspond to the one month of status data and precede it. Evaluate the second and third user churn prediction models, and obtain a target user churn prediction model according to their evaluation results. Similarly, the fourth, fifth, and sixth user churn prediction models are evaluated, and a target user churn prediction model is obtained according to their evaluation results.

For example, to predict the user status data at the beginning of September, the historical user data for April to August can be used to build customer churn prediction models. The specific steps are as follows:

First, the historical user data for May to August are obtained to build the second customer churn prediction model, and the historical user data for April to August are obtained to build the third customer churn prediction model. The second and third models are built with the same steps as the first customer churn prediction model; for the detailed process see Fig. 5, which is not repeated here.

Then, the second and third customer churn prediction models are evaluated, and the target customer churn prediction model is obtained according to their evaluation results.

Finally, if the prediction performance of the second customer churn prediction model is better than that of the third, the second model is used as the target customer churn prediction model, and the historical user behavior data for June to August are obtained and input into it to predict the user status data for September. If the prediction performance of the third model is better than that of the second, the third model is used as the target customer churn prediction model, and the historical user behavior data for May to August are obtained and input into it to predict the user status data at the beginning of September.

It should be noted that the evaluation result specifically includes one or more of the following: precision, recall and F1 score. Precision is the percentage of correctly predicted churned users among all users predicted to churn; recall is the percentage of correctly predicted churned users among all users who actually churned. The F1 score is:

$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

The F1 score is a statistical metric for the accuracy of a binary classification model. It combines the precision and recall of the customer churn prediction model and can be regarded as the harmonic mean (a kind of weighted average) of the two. For example, suppose the second customer churn prediction model has higher precision and the third has higher recall. If a target model with higher precision is needed, the second model is selected as the target customer churn prediction model; if a target model with higher recall is needed, the third model is selected.
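A minimal sketch of the three evaluation metrics and of picking the target model by a chosen criterion follows. Treating label `1` as churned, returning 0.0 when a denominator is zero, and the helper names `churn_metrics` and `select_target_model` are assumptions for illustration only.

```python
def churn_metrics(y_true, y_pred, churn_label=1):
    """Precision, recall and F1 for the churned class (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == churn_label and p == churn_label)
    predicted = sum(1 for p in y_pred if p == churn_label)  # users predicted to churn
    actual = sum(1 for t in y_true if t == churn_label)     # users who actually churned
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def select_target_model(results, criterion="f1"):
    """results maps model name -> metrics dict; return the name with the highest criterion."""
    return max(results, key=lambda name: results[name][criterion])
```

Passing `criterion="precision"` or `criterion="recall"` mirrors the choice described above between a higher-precision and a higher-recall target model.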

503: Update the customer churn prediction model for the third time. The historical user data for the most recent four consecutive months are obtained to build a fourth customer churn prediction model, where the historical user data comprise three months of historical user behavior data and one month of historical user status data; the three months of behavior data correspond to the one month of status data and precede it. The historical user data for the most recent five consecutive months are then obtained to build a fifth customer churn prediction model, where the historical user data comprise four months of behavior data and one month of status data; the four months of behavior data correspond to the one month of status data and precede it. Finally, the historical user data for the most recent six consecutive months are obtained to build a sixth customer churn prediction model, where the historical user data comprise five months of behavior data and one month of status data; the five months of behavior data correspond to the one month of status data and precede it.

For example, to predict the user status data at the beginning of October, the historical user data for April to September can be used to build customer churn prediction models. The specific steps are as follows:

First, the historical user data for June to September are obtained to build the fourth customer churn prediction model, the historical user data for May to September to build the fifth, and the historical user data for April to September to build the sixth. The fourth, fifth and sixth models are built with the same steps as the first customer churn prediction model, which are not repeated here.

Then, the fourth, fifth and sixth customer churn prediction models are evaluated, and the target customer churn prediction model is determined according to the evaluation results. The process is the same as the churn prediction for the beginning of September described above and is not detailed again here.

Subsequent updates of the customer churn prediction model follow the same logic and are not described further here.

In the broadband user churn prediction method described above, historical user data are first obtained, comprising historical user behavior data and historical user status data, where the historical user status data in a first predetermined period correspond to the historical user behavior data in a second predetermined period after the first predetermined period. Training samples are then selected from the historical user data and learned with a predetermined algorithm to generate a customer churn prediction model. In addition, test samples are selected from the historical user data, and their historical user behavior data are input into the customer churn prediction model to obtain predicted user status data. The predicted user status data are compared with the historical user status data of the test samples, the prediction model is regenerated according to the comparison result, and the historical user behavior data obtained in a predetermined period are input into the regenerated customer churn prediction model. The method has been illustrated with detailed examples above, which are not repeated here. By processing the imbalanced distribution of the historical user data, the present application enables the customer churn prediction model to fully learn the characteristics of churned users, improving the precision and recall of its predictions of future customer churn.

As shown in Fig. 6, an embodiment of the present invention provides a broadband user churn prediction system 60, comprising:

An acquisition module 601, configured to obtain historical user data, the historical user data comprising historical user behavior data and historical user status data. The historical user behavior data include the user's traffic usage duration and complaint data; the historical user status data include churned-status data and non-churned-status data of users. The historical user status data in a first predetermined period correspond to the historical user behavior data in a second predetermined period after the first predetermined period.

A training module 602, configured to select training samples from the historical user data obtained by the acquisition module 601 and learn the training samples with a predetermined algorithm to generate a customer churn prediction model.

A test module 603, configured to select test samples from the historical user data obtained by the acquisition module 601 and input the historical user behavior data of the test samples into the customer churn prediction model to obtain predicted user status data.

A processing module 604, configured to: if the predicted user status data are inconsistent with the historical user status data of the test samples and all historical user status data in the test samples are in the non-churned state, add a first predetermined proportion of the wrongly predicted test samples to the training samples and regenerate the customer churn prediction model.

The processing module 604 is further configured to: if the predicted user status data are inconsistent with the historical user status data of the test samples and the historical user status data in the test samples include the churned state, add a second predetermined proportion of the wrongly predicted test samples and a third predetermined proportion of the correctly predicted test samples to the training samples and regenerate the customer churn prediction model.
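The two sample-augmentation cases of the processing module 604 can be sketched as follows. The source does not state which samples within each proportion are chosen, so uniform random sampling with a fixed seed is an assumption here, as are the function name `augment_training_set`, the `(features, label)` sample format, label `1` meaning churned, and rounding counts with Python's `round` (round half to even).

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (behavior features, status label); 1 = churned (assumed)


def augment_training_set(train: Sequence[Sample], test: Sequence[Sample],
                         predictions: Sequence[int], p1: float, p2: float, p3: float,
                         churn_label: int = 1, seed: int = 0):
    """Return (new_training_set, regenerate_flag) following the two cases of module 604."""
    rng = random.Random(seed)
    wrong = [s for s, p in zip(test, predictions) if s[1] != p]
    right = [s for s, p in zip(test, predictions) if s[1] == p]
    if not wrong:
        # Predictions match the test labels: keep the current model.
        return list(train), False
    if all(label != churn_label for _, label in test):
        # Case 1: every test sample is non-churned -> add p1 of the misclassified samples.
        added = rng.sample(wrong, round(p1 * len(wrong)))
    else:
        # Case 2: churned samples exist -> add p2 of the misclassified and p3 of the correct ones.
        added = (rng.sample(wrong, round(p2 * len(wrong)))
                 + rng.sample(right, round(p3 * len(right))))
    return list(train) + added, True
```

When the flag is `True`, the caller would retrain on the returned set; re-adding correctly predicted samples in the second case keeps the churned class from being swamped by misclassified non-churned users.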

A prediction module 605, configured to input the historical user behavior data obtained by the acquisition module 601 within a predetermined period into the regenerated customer churn prediction model to obtain future user status data.

In an exemplary solution, the processing module 604 is further configured to preprocess the historical user data obtained by the acquisition module 601.

In an exemplary solution, the processing module 604 is configured to perform one or more of at least the following operations on the historical user data obtained by the acquisition module 601: reconstruction of attribute variables, data auditing, data classification, reduction of attribute variables, and filling of missing values.
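Two of the listed preprocessing operations, filling of missing values and reduction of attribute variables, can be illustrated as below. Mean imputation and dropping attributes with fewer than two distinct values are assumed techniques chosen for the sketch; the source does not say how either operation is performed, and the other operations (reconstruction, auditing, classification) are omitted.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # attribute name -> value, None (or absent) if missing


def preprocess(rows: List[Row], min_distinct: int = 2) -> List[Dict[str, float]]:
    """Hypothetical preprocessing: mean-fill missing values, then drop near-constant attributes."""
    columns = sorted({name for row in rows for name in row})
    # Filling of missing values (assumed: column mean; 0.0 if a column is entirely missing).
    means = {}
    for c in columns:
        present = [row[c] for row in rows if row.get(c) is not None]
        means[c] = mean(present) if present else 0.0
    filled = [{c: row[c] if row.get(c) is not None else means[c] for c in columns}
              for row in rows]
    # Reduction of attribute variables (assumed: drop columns with too few distinct values).
    keep = [c for c in columns if len({row[c] for row in filled}) >= min_distinct]
    return [{c: row[c] for c in keep} for row in filled]
```

An attribute that has the same value for every user carries no information for separating churned from non-churned users, which is why it is the first candidate for removal in this sketch.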

In an exemplary solution, the training module 602 is configured to generate, from the historical user data of at least two periods before the current time obtained by the acquisition module 601, a customer churn prediction model for each period, where each of the at least two periods ends at the current time and the periods differ in length. An evaluation module 606 is configured to evaluate the at least two customer churn prediction models generated by the training module 602 and obtain the target customer churn prediction model according to their evaluation results.

In an exemplary solution, the evaluation result includes one or more of the following: precision, recall and F1 score, where precision is the percentage of correctly predicted churned users among users predicted to churn; recall is the percentage of correctly predicted churned users among users who actually churned; and F1 score = 2 × precision × recall / (precision + recall).

The technical effects, related content and implementation of the method embodiment above apply directly to the descriptions of the corresponding functional modules in the system embodiment and are not repeated here.

The steps of the method or algorithm described in connection with the disclosure of the present invention may be implemented in hardware or by a processor executing software instructions. For example, the processing module above may be implemented by a processor, and the acquisition module may be implemented by a transceiver or another circuit with a signal-receiving function. An embodiment of the present invention further provides a storage medium, which may include a memory for storing the computer software instructions used by the broadband user churn prediction system, including the program code designed for executing the paging method. Specifically, the software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor.

An embodiment of the present invention further provides a computer program that can be loaded directly into memory and contains software code; after being loaded and executed by a computer, the computer program implements the broadband user churn prediction method described above.

Those skilled in the art should appreciate that, in one or more of the examples above, the functions described in the present invention may be implemented in hardware, software, firmware or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer.

The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. A broadband user churn prediction method, characterized by comprising:

obtaining historical user data, the historical user data comprising historical user behavior data and historical user status data; wherein the historical user behavior data include the user's traffic usage duration and complaint data, and the historical user status data include churned-status data and non-churned-status data of users, wherein the historical user status data in a first predetermined period correspond to the historical user behavior data in a second predetermined period after the first predetermined period;

selecting training samples from the historical user data and learning the training samples with a predetermined algorithm to generate a customer churn prediction model;

selecting test samples from the historical user data, and inputting the historical user behavior data of the test samples into the customer churn prediction model to obtain predicted user status data;

if the predicted user status data are inconsistent with the historical user status data of the test samples and all historical user status data in the test samples are in the non-churned state, adding a first predetermined proportion of the wrongly predicted test samples to the training samples and regenerating the customer churn prediction model;

if the predicted user status data are inconsistent with the historical user status data of the test samples and the historical user status data in the test samples include the churned state, adding a second predetermined proportion of the wrongly predicted test samples and a third predetermined proportion of the correctly predicted test samples to the training samples and regenerating the customer churn prediction model;

inputting the historical user behavior data obtained in a predetermined period into the regenerated customer churn prediction model to obtain future user status data.

2. The broadband user churn prediction method according to claim 1, characterized in that, after obtaining the historical user data, the method further comprises: preprocessing the historical user data.

3. The broadband user churn prediction method according to claim 2, characterized in that preprocessing the historical user data comprises:

performing one or more of at least the following operations on the historical user data: reconstruction of attribute variables, data auditing, data classification, reduction of attribute variables, and filling of missing values.

4. The broadband user churn prediction method according to claim 1, characterized in that the method further comprises:

generating a customer churn prediction model for each of the historical user data of at least two periods before the current time, wherein each of the at least two periods ends at the current time and the at least two periods differ in length;

evaluating the at least two customer churn prediction models, and obtaining a target customer churn prediction model according to the evaluation results of the at least two customer churn prediction models.

5. The broadband user churn prediction method according to claim 4, characterized in that the evaluation result comprises one or more of the following: precision, recall and F1 score; wherein

the precision is the percentage of correctly predicted churned users among users predicted to churn;

the recall is the percentage of correctly predicted churned users among users who actually churned;

the F1 score is: 2 × precision × recall / (precision + recall).

6. A broadband user churn prediction system, characterized by comprising:

an acquisition module, configured to obtain historical user data, the historical user data comprising historical user behavior data and historical user status data; wherein the historical user behavior data include the user's traffic usage duration and complaint data, and the historical user status data include churned-status data and non-churned-status data of users, wherein the historical user status data in a first predetermined period correspond to the historical user behavior data in a second predetermined period after the first predetermined period;

a training module, configured to select training samples from the historical user data obtained by the acquisition module and learn the training samples with a predetermined algorithm to generate a customer churn prediction model;

a test module, configured to select test samples from the historical user data obtained by the acquisition module and input the historical user behavior data of the test samples into the customer churn prediction model to obtain predicted user status data;

a processing module, configured to: if the predicted user status data are inconsistent with the historical user status data of the test samples and all historical user status data in the test samples are in the non-churned state, add a first predetermined proportion of the wrongly predicted test samples to the training samples and regenerate the customer churn prediction model;

the processing module being further configured to: if the predicted user status data are inconsistent with the historical user status data of the test samples and the historical user status data in the test samples include the churned state, add a second predetermined proportion of the wrongly predicted test samples and a third predetermined proportion of the correctly predicted test samples to the training samples and regenerate the customer churn prediction model;

a prediction module, configured to input the historical user behavior data obtained by the acquisition module in a predetermined period into the regenerated customer churn prediction model to obtain future user status data.

7. The broadband user churn prediction system according to claim 6, characterized in that

the processing module is further configured to preprocess the historical user data obtained by the acquisition module.

8. The broadband user churn prediction system according to claim 7, characterized in that

the processing module is configured to perform one or more of at least the following operations on the historical user data obtained by the acquisition module: reconstruction of attribute variables, data auditing, data classification, reduction of attribute variables, and filling of missing values.

9. The broadband user churn prediction system according to claim 6, characterized in that

the training module is configured to generate a customer churn prediction model for each of the historical user data of at least two periods before the current time obtained by the acquisition module, wherein each of the at least two periods ends at the current time and the at least two periods differ in length; and an evaluation module is configured to evaluate the at least two customer churn prediction models generated by the training module and obtain a target customer churn prediction model according to the evaluation results of the at least two customer churn prediction models.

10. The broadband user churn prediction system according to claim 9, characterized in that the evaluation result comprises one or more of the following: precision, recall and F1 score; wherein,