CN109190796A - Telecom customer churn prediction method, system and electronic device - Google Patents

Telecom customer churn prediction method, system and electronic device

Info

Publication number
CN109190796A
Authority
CN
China
Prior art keywords
client
network
value
sample
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810871287.3A
Other languages
Chinese (zh)
Other versions
CN109190796B (en)
Inventor
陈广西
袁明明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tianyuan Creative Technology Ltd
Original Assignee
Beijing Tianyuan Creative Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tianyuan Creative Technology Ltd filed Critical Beijing Tianyuan Creative Technology Ltd
Priority to CN201810871287.3A priority Critical patent/CN109190796B/en
Publication of CN109190796A publication Critical patent/CN109190796A/en
Application granted granted Critical
Publication of CN109190796B publication Critical patent/CN109190796B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q50/40

Abstract

The present invention provides a telecom customer churn prediction method, system and electronic device. The method comprises: calculating the sample division proportions corresponding to in-network customers and off-network customers in a telecom network, and determining an improved Gini index for the random forest algorithm based on these sample division proportions; and predicting telecom customer churn using the random forest algorithm based on the improved Gini index. Here, the sample division proportion denotes, for a selected network-status class, the ratio of the number of customer samples of that class to the total number of customers of that class. The invention can effectively address the low churn prediction accuracy caused by class imbalance in unbalanced data, in particular the low prediction accuracy for high-value users.

Description

Telecom customer churn prediction method, system and electronic device
Technical field
The present invention relates to the technical field of information management, and more particularly to a telecom customer churn prediction method, system and electronic device.
Background
With the development of the telecom industry, customers have an ever wider choice of telecom products and telecom companies, and customer churn management has become particularly important in an increasingly competitive industry. As the market approaches saturation, the cost of acquiring a new customer is far greater than the cost of retaining an existing one. Predicting potential churners is therefore especially important for a company, and it is necessary to build models that identify customers with a tendency to churn.
At present, customer churn prediction models are mainly built by analyzing the network-related and non-network-related reasons that precede a customer leaving the network, and the model is then used to predict which customers are about to churn. Many such prediction models exist, for example machine learning algorithms including decision trees, neural networks, SVM, random forests and improved algorithms based on them. Classifiers built from these algorithms can predict customer churn.
However, neural networks suffer from problems such as slow convergence and local minima, which can prevent the computed error from decreasing. In addition, existing telecom data is high-dimensional, whereas random forests place low demands on data quality, select variables and samples randomly, can ensure classification accuracy, and tolerate outliers and noise well. Random forest algorithms are therefore most often chosen to predict churn from the behavior of telecom customers before they leave the network.
Existing random forest algorithms predict customer churn with relatively high accuracy and are not prone to overfitting. In telecom churn prediction, however, the churn rate is very low, so active customers in the sample set far outnumber churned customers, and such samples are unbalanced. The Gini index of the original random forest is computed from the sample counts of each class, so when a random forest is used for churn prediction, the predictions are biased toward the class with more samples and churn cannot be predicted well.
Summary of the invention
To overcome the above problems, or at least partially solve them, the present invention provides a telecom customer churn prediction method, system and electronic device that effectively address the low churn prediction accuracy caused by class imbalance in unbalanced data, in particular the low prediction accuracy for high-value users.
In a first aspect, the present invention provides a telecom customer churn prediction method, comprising: calculating the sample division proportions corresponding to in-network customers and off-network customers in a telecom network, and determining an improved Gini index for the random forest algorithm based on these sample division proportions; and predicting telecom customer churn using the random forest algorithm based on the improved Gini index; wherein the sample division proportion denotes, for a selected network-status class, the ratio of the number of customer samples of that class to the total number of customers of that class.
Further, before the step of calculating the sample division proportions corresponding to in-network and off-network customers in the telecom network, the method also comprises: setting different prediction error penalty terms for different telecom customers based on their customer lifetime value. Correspondingly, the step of calculating the sample division proportions further comprises: calculating the sample division proportions corresponding to in-network and off-network customers in combination with the prediction error penalty terms.
Further, before the step of predicting telecom customer churn using the random forest algorithm, the method also comprises: setting a threshold for voting classification in the random forest algorithm. Correspondingly, the step of predicting telecom customer churn using the random forest algorithm further comprises: running the random forest algorithm according to the voting classification threshold to predict telecom customer churn.
Wherein, the step of calculating the sample division proportions corresponding to in-network and off-network customers in the telecom network further comprises: expressing the sample quantity of each split node in the random forest algorithm using the following quantity ratio:
In the formula, AR denotes the sample quantity at split node t; t denotes the left or right node in a random forest tree; k denotes the customer's network-status class, taking the value in-network or off-network; C_tk denotes the number of samples of network-status class k at node t; C_k denotes the number of samples of network-status class k; C_t denotes the number of samples at node t; and λ1 denotes the first adjustment parameter;
Based on the sample quantity of each split node, the sample division proportions corresponding to in-network and off-network customers are calculated as follows:
In the formula, ARP(k|t) denotes the proportion, within node t, of the AR value of the samples of network-status class k at node t; k takes the value in-network or off-network; and AR(k=0|t) and AR(k=1|t) denote the AR values of in-network and off-network customers at node t, respectively.
Wherein, the step of calculating the sample division proportions corresponding to in-network and off-network customers in combination with the prediction error penalty terms further comprises: expressing the sample quantity of each split node in the random forest algorithm, based on customer lifetime value, using the following quantity ratio:
In the formula, VAR denotes the sample quantity at split node t based on customer lifetime value; t denotes the left or right node in a random forest tree; k denotes the customer's network-status class, taking the value in-network or off-network; VC_tk denotes the sum of the lifetime values of the samples of network-status class k at node t; VC_k denotes the sum of the lifetime values of the samples of network-status class k; VC_t denotes the sum of the lifetime values of the samples at node t; and λ2 denotes the second adjustment parameter;
Based on the lifetime-value-based sample quantity of each split node, the sample division proportions corresponding to in-network and off-network customers are calculated as follows:
In the formula, VARP(k|t) denotes the proportion, within node t, of the VAR value of the samples of network-status class k at node t; k takes the value in-network or off-network; and VAR(k=0|t) and VAR(k=1|t) denote the VAR values of in-network and off-network customers at node t, respectively.
Wherein, the step of determining the improved Gini index of the random forest algorithm based on the sample division proportions corresponding to in-network and off-network customers further comprises: taking these sample division proportions as the improved Gini index.
Further, after the step of calculating the sample division proportions corresponding to in-network and off-network customers based on the sample quantity of each split node, the method also comprises: calculating the information gain of the sample quantity of each split node based on these sample division proportions.
Wherein, the step of predicting telecom customer churn using the random forest algorithm based on the improved Gini index further comprises: choosing the smallest of the information gains as the splitting attribute of the random forest algorithm, and creating each node of each tree in the random forest algorithm based on the splitting attribute, thereby generating unpruned trees; and building the random forest from all the unpruned trees, using the random forest to partition the test set, performing voting classification, and determining the churned customers.
In a second aspect, the present invention provides a telecom customer churn prediction system, comprising: a setting module, configured to calculate the sample division proportions corresponding to in-network customers and off-network customers in a telecom network, and to determine an improved Gini index for the random forest algorithm based on these sample division proportions; and a prediction module, configured to predict telecom customer churn using the random forest algorithm based on the improved Gini index; wherein the sample division proportion denotes, for a selected network-status class, the ratio of the number of customer samples of that class to the total number of customers of that class.
In a third aspect, the present invention provides an electronic device, comprising: at least one memory, at least one processor, a communication interface and a bus. The memory, the processor and the communication interface communicate with one another over the bus, and the communication interface is used for information transmission between the electronic device and a customer network-status detection device and a data setting device. The memory stores a computer program that can run on the processor, and when the processor executes the computer program, the telecom customer churn prediction method described above is implemented.
The telecom customer churn prediction method, system and electronic device provided by the invention compute the Gini index of the improved random forest algorithm from the proportion that the split sample quantity represents within its own class, thereby improving random-forest-based telecom customer churn prediction. This effectively addresses the low churn prediction accuracy caused by class imbalance in unbalanced data, in particular the low prediction accuracy for high-value users.
Brief description of the drawings
Fig. 1 is a flow chart of a telecom customer churn prediction method according to an embodiment of the present invention;
Fig. 2 is a preferred processing flow chart of a telecom customer churn prediction method according to an embodiment of the present invention;
Fig. 3 is a structural diagram of a telecom customer churn prediction system according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
To address the shortcomings of existing random forest algorithms on unbalanced data, the embodiments of the present invention propose an improved random forest algorithm: telecom customer churn prediction with random forests is improved by computing an improved Gini index.
As one aspect of the embodiments of the present invention, this embodiment provides a telecom customer churn prediction method. Referring to Fig. 1, a flow chart of the method, it comprises:
S1, calculating the sample division proportions corresponding to in-network customers and off-network customers in a telecom network, and determining an improved Gini index for the random forest algorithm based on these sample division proportions;
S2, predicting telecom customer churn using the random forest algorithm based on the improved Gini index;
wherein the sample division proportion denotes, for a selected network-status class, the ratio of the number of customer samples of that class to the total number of customers of that class.
It should be understood that this embodiment addresses the data imbalance problem that arises when random forests are used for churn prediction. Specifically, when computing the Gini index of the random forest algorithm, the proportion that the split sample quantity represents within its own class is used, so that comparison takes place only within each class, which balances the division between the classes of an unbalanced dataset. Class assignment is thus unaffected by the relative number of examples in the different classes, and the unbalanced distribution between the two classes need not be considered.
It should be understood that the above classes are a classification of all telecom users by network status, for example in-network status and off-network status. Correspondingly, the customers of a given network-status class are either customers in in-network status or customers in off-network status.
Specifically, in step S1, based on the determined number of samples at each split node of the random trees, the sample division proportion of in-network customers and that of off-network customers are first calculated using the given formulas. The Gini index of the random forest algorithm, i.e. the improved Gini index, is then determined from these sample division proportions.
Wherein, in one embodiment, the step of determining the improved Gini index of the random forest algorithm based on the sample division proportions corresponding to in-network and off-network customers further comprises: taking these sample division proportions as the improved Gini index.
Wherein, in another embodiment, the step of calculating the sample division proportions corresponding to in-network and off-network customers in the telecom network further comprises:
First, expressing the sample quantity of each split node in the random forest algorithm using the following quantity ratio:
In the formula, AR denotes the sample quantity at split node t; t denotes the left or right node in a random forest tree; k denotes the customer's network-status class, taking the value in-network or off-network; C_tk denotes the number of samples of network-status class k at node t; C_k denotes the number of samples of network-status class k; C_t denotes the number of samples at node t; and λ1 denotes the first adjustment parameter;
Then, based on the sample quantity of each split node, calculating the sample division proportions corresponding to in-network and off-network customers as follows:
In the formula, ARP(k|t) denotes the proportion, within node t, of the AR value of the samples of network-status class k at node t; k takes the value in-network or off-network; and AR(k=0|t) and AR(k=1|t) denote the AR values of in-network and off-network customers at node t, respectively.
It should be understood that, when computing the Gini index, the proportion that the split sample quantity represents within its own class is used, so that comparison takes place only within each class. The sample quantity of each split node is first expressed with the quantity ratio (Amount Ratio) AR above. The role of λ is to fine-tune the result using the amounts of data of the different classes inside the same node, which improves the noise resistance of the within-class comparison.
Balancing the division between the classes (in-network and off-network) amounts to changing the distribution of the unbalanced dataset so that the algorithm is not unduly affected by it, which improves the accuracy with which off-network customers are classified.
To calculate each information gain Δ on the new data, the proportion that each AR value represents among the sample values (Amount Ratio Proportion, ARP) must be calculated, using the ARP formula above.
If the ARP value of a class is high, the proportion of that class in the predicted classification is also high; correspondingly, the AR is large, and the proportion of customers of that class predicted to belong to it is large. Therefore, whatever the ratio of in-network to off-network customers, the impurity measure under this rule is not affected by the imbalance between the two classes, which overcomes this drawback of the traditional random forest.
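The within-class comparison described above can be illustrated with a minimal Python sketch. The AR formula itself is not reproduced in this text, so the sketch makes a simplifying assumption: AR(k|t) is taken as C_tk / C_k, with the λ1 adjustment omitted, and ARP is taken as each AR value divided by the sum over both classes. The function names and the sample counts are hypothetical.

```python
def class_relative_ratio(node_counts, class_totals):
    """AR-like value per class: share of each class's samples that reach the node.

    Simplifying assumption: AR(k|t) = C_tk / C_k (the lambda_1 term is omitted).
    """
    return {k: node_counts[k] / class_totals[k] for k in node_counts}


def proportions(values):
    """ARP-like value per class: each value divided by the sum over classes."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}


# Hypothetical node t: 90 of 900 in-network (k=0) and 30 of 100 off-network (k=1) samples.
node_counts = {0: 90, 1: 30}
class_totals = {0: 900, 1: 100}

ar = class_relative_ratio(node_counts, class_totals)
arp = proportions(ar)
plain = proportions(node_counts)  # conventional count-based class shares

print("count-based shares:", plain)
print("class-relative shares:", arp)
```

With raw counts the node looks three-quarters in-network, whereas the class-relative proportions put three-quarters of the weight on the off-network class, because a larger share of all off-network samples lands in this node.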
Specifically, in step S2, once the improved Gini index of the random forest algorithm has been determined as above, the random forest algorithm is run with the improved Gini index on the telecom customers whose class is to be predicted, thereby predicting which of them will churn.
The telecom customer churn prediction method provided by this embodiment computes the Gini index of the improved random forest algorithm from the proportion that the split sample quantity represents within its own class, thereby improving random-forest-based telecom customer churn prediction. This effectively addresses the low churn prediction accuracy caused by class imbalance in unbalanced data, in particular the low prediction accuracy for high-value users.
Further, on the basis of the above embodiment, before the step of calculating the sample division proportions corresponding to in-network and off-network customers in the telecom network, the method also comprises: setting different prediction error penalty terms for different telecom customers based on their customer lifetime value;
Correspondingly, the step of calculating the sample division proportions corresponding to in-network and off-network customers in the telecom network further comprises: calculating these sample division proportions in combination with the prediction error penalty terms.
It should be understood that traditional random forest prediction ignores customer lifetime value. In particular, it does not give priority to accuracy for high-value customers when predicting churn, so high-value customers with a tendency to leave the network are easily misclassified and lost, and the loss of high-value customers causes heavy losses to a telecom company. This embodiment therefore assigns different penalty terms to prediction errors for different customers, with the size of the penalty term determined by the customer's lifetime value.
Then, using the same improved algorithm as in the above embodiment, the sample value at each split point is calculated and expressed as the quantity ratio based on customer lifetime value, which balances the division between in-network and off-network customers once customer lifetime value is incorporated.
Wherein, in one embodiment, the step of calculating the sample division proportions corresponding to in-network and off-network customers in combination with the prediction error penalty terms further comprises:
First, expressing the sample quantity of each split node in the random forest algorithm, based on customer lifetime value, using the following quantity ratio:
In the formula, VAR denotes the sample quantity at split node t based on customer lifetime value; t denotes the left or right node in a random forest tree; k denotes the customer's network-status class, taking the value in-network or off-network; VC_tk denotes the sum of the lifetime values of the samples of network-status class k at node t; VC_k denotes the sum of the lifetime values of the samples of network-status class k; VC_t denotes the sum of the lifetime values of the samples at node t; and λ2 denotes the second adjustment parameter;
Second, based on the lifetime-value-based sample quantity of each split node, calculating the sample division proportions corresponding to in-network and off-network customers as follows:
In the formula, VARP(k|t) denotes the proportion, within node t, of the VAR value of the samples of network-status class k at node t; k takes the value in-network or off-network; and VAR(k=0|t) and VAR(k=1|t) denote the VAR values of in-network and off-network customers at node t, respectively.
It should be understood that the improvement to random forests in the above embodiment only addresses the balanced classification of unbalanced data and does not consider customer lifetime value. Customers differ in value, and losing a high-value customer costs the telecom company more.
This embodiment follows the same improvement to calculate the sample value at each split point, expressed as the lifetime-value-based quantity ratio VAR, which balances the division between in-network and off-network customers once customer lifetime value is incorporated. The proportion that each VAR value represents within each node is likewise calculated using this embodiment's formula for the sample division proportions of in-network and off-network customers.
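The lifetime-value weighting can be sketched in the same way. The VAR formula is not reproduced in this text, so the sketch assumes VAR(k|t) = VC_tk / VC_k, with the λ2 adjustment omitted, and takes VARP as each VAR value divided by the sum over both classes. The function names and the lifetime values are hypothetical.

```python
def value_ratio(node, population):
    """VAR-like value per class from (lifetime_value, label) pairs.

    Labels: 0 = in-network, 1 = off-network.
    Simplifying assumption: VAR(k|t) = VC_tk / VC_k (the lambda_2 term is omitted).
    """
    var = {}
    for k in (0, 1):
        vc_tk = sum(v for v, y in node if y == k)        # value of class k at node t
        vc_k = sum(v for v, y in population if y == k)   # total value of class k
        var[k] = vc_tk / vc_k if vc_k else 0.0
    return var


def value_proportions(var):
    """VARP-like value per class: each VAR value divided by the sum over classes."""
    total = sum(var.values())
    return {k: (v / total if total else 0.0) for k, v in var.items()}


# Hypothetical data: four in-network customers and two off-network customers,
# one of the off-network customers being high-value.
population = [(100, 0), (100, 0), (100, 0), (100, 0), (500, 1), (100, 1)]
node = [(100, 0), (500, 1)]  # samples that fall into node t

var = value_ratio(node, population)
varp = value_proportions(var)
print(var, varp)
```

Under the same simplification, counts alone would give this node an off-network share of 0.5 / (0.25 + 0.5) = 2/3, while weighting by lifetime value raises it to 10/13, so the high-value off-network customer pulls the node more strongly toward the off-network class.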
The telecom customer churn prediction method provided by this embodiment, building on the Gini index computed from the within-class proportion of the split sample quantity, proposes a node purity split based on customer lifetime value. Integrating customer lifetime value into the improved Gini index improves prediction accuracy for off-network customers, especially high-value ones, and thus provides guidance for avoiding the loss of high-value customers.
Further, after the step of calculating the sample division proportions corresponding to in-network and off-network customers based on the sample quantity of each split node, the method also comprises: calculating the information gain of the sample quantity of each split node based on these sample division proportions.
It should be understood that, for the above method of computing the improved Gini index, the improved random forest computation also includes calculating the information gain corresponding to the sample quantity AR of each split node, or to the lifetime-value-based sample quantity VAR of each split node.
Specifically, for the sample quantity AR of each split node, the information gain is calculated as follows:
In the formula, Δ_AR denotes the information gain of the sample quantity AR of the split node; ARP(k|t) denotes the proportion, within node t, of the AR value of the samples of network-status class k at node t; k takes the value in-network or off-network; AR(k=0|t=l) and AR(k=1|t=l) denote the AR values of in-network and off-network customers at the left node l, respectively; and AR(k=0|t=r) and AR(k=1|t=r) denote the AR values of in-network and off-network customers at the right node r, respectively.
Specifically, for the lifetime-value-based sample quantity VAR of each split node, the information gain is calculated as follows:
The above improvements balance the division between the classes of unbalanced data and improve prediction accuracy for high-value users.
Optionally, the step of predicting telecom customer churn using the random forest algorithm based on the improved Gini index further comprises:
Choosing the smallest of the information gains as the splitting attribute of the random forest algorithm, and creating each node of each tree in the random forest algorithm based on the splitting attribute, thereby generating an unpruned tree;
Building the random forest from all the unpruned trees, using the random forest to partition the test set, performing voting classification, and determining the churned customers.
It should be understood that, according to the above embodiments, a corresponding information gain is calculated for each AR or VAR value. Across all AR or VAR values this yields a series of information gain values, and the smallest value in the series is chosen as the splitting attribute of the random forest algorithm. Under this splitting attribute, each node of each tree in the random forest is then created, generating an unpruned tree.
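Choosing the split with the smallest value can be sketched as follows. The information gain formula is not reproduced in this text, so the score used here is an assumed stand-in, not the patent's expression: the size-weighted Gini-style impurity 1 - Σ ARP² of the left and right child nodes, with AR simplified to C_tk / C_k. The attribute names and counts are hypothetical.

```python
def arp(node_counts, class_totals):
    """Class-relative proportions at a node (AR simplified to C_tk / C_k)."""
    ar = {k: node_counts[k] / class_totals[k] for k in class_totals}
    total = sum(ar.values())
    return {k: (v / total if total else 0.0) for k, v in ar.items()}


def impurity(node_counts, class_totals):
    """Gini-style impurity 1 - sum(ARP^2); an empty node counts as pure."""
    if sum(node_counts.values()) == 0:
        return 0.0
    return 1.0 - sum(p * p for p in arp(node_counts, class_totals).values())


def split_score(left, right, class_totals):
    """Assumed stand-in for the split measure: size-weighted child impurity."""
    n_left, n_right = sum(left.values()), sum(right.values())
    weighted = n_left * impurity(left, class_totals) + n_right * impurity(right, class_totals)
    return weighted / (n_left + n_right)


class_totals = {0: 900, 1: 100}  # 0 = in-network, 1 = off-network
# Hypothetical candidate attributes, each with (left child, right child) class counts.
candidates = {
    "monthly_fee": ({0: 800, 1: 20}, {0: 100, 1: 80}),
    "call_minutes": ({0: 500, 1: 50}, {0: 400, 1: 50}),
}

scores = {name: split_score(l, r, class_totals) for name, (l, r) in candidates.items()}
best = min(scores, key=scores.get)  # the smallest value is selected, as in the text
print(scores, best)
```

In this example the first attribute separates most off-network samples into the right child, so its weighted impurity is lower and it is selected.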
An unpruned tree is generated for each group of training samples according to the above processing steps, so K rounds of training yield K unpruned trees. The random forest is built from the K unpruned trees, the test set is partitioned with this random forest, the votes are compared with the set threshold to perform voting classification, and the churned customers are determined from the classification results.
Further, on the basis of the above embodiments, before the step of predicting telecom customer churn using the random forest algorithm, the method also comprises: setting a threshold for voting classification in the random forest algorithm;
Correspondingly, the step of predicting telecom customer churn using the random forest algorithm further comprises: running the random forest algorithm according to the voting classification threshold to predict telecom customer churn.
It should be understood that, in general, a telecom company cares most about off-network customers being predicted correctly, and misclassifying an off-network customer costs the company more than misclassifying an in-network customer. The original random forest method is equivalent to a classifier whose threshold is half the number of trees, where the threshold is the bar a predicted class must clear for a sample to be assigned to it. Under this threshold, churned customers are easily classified as in-network customers.
In view of this, after the random forest built from the unpruned trees has been obtained as in the above embodiments, this embodiment sets an appropriate threshold manually when performing voting classification, to determine which class a prediction should be assigned to. That is, with a manually set threshold, a sample is classified as a churned customer sample if the number of trees that classify it as a churned customer exceeds the threshold.
Because the proportion of churned customers is very small, this approach is equivalent to lowering the bar for classifying a sample as a churned customer, so more churned customers are classified correctly.
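The effect of the voting threshold can be shown with a short Python sketch. The vote matrix and the function name are hypothetical; a label of 1 stands for a churned customer and 0 for an in-network customer.

```python
def vote_with_threshold(tree_votes, threshold):
    """Label a sample as churned (1) when more than `threshold` trees vote churn.

    tree_votes: one list of 0/1 predictions per tree, all of the same length.
    """
    n_samples = len(tree_votes[0])
    churn_counts = [sum(tree[i] for tree in tree_votes) for i in range(n_samples)]
    return [1 if count > threshold else 0 for count in churn_counts]


# Hypothetical votes from five trees on three test samples.
tree_votes = [
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]

majority = vote_with_threshold(tree_votes, threshold=len(tree_votes) / 2)
lowered = vote_with_threshold(tree_votes, threshold=1)
print(majority)  # [0, 0, 0]
print(lowered)   # [1, 0, 1]
```

With the default threshold of half the trees, the two samples that received two churn votes each are labeled in-network; lowering the threshold to one vote labels them as churned, which is the trade-off the embodiment describes.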
The telecom customer churn prediction method provided by the embodiment of the invention integrates customer lifetime value into the improved Gini index and sets an appropriate threshold during random forest voting classification. This reduces the impact of imbalanced data on churn prediction accuracy and improves churn prediction accuracy for high-value customers.
To further illustrate the technical solution of the invention, the following preferred process flow is provided, without limiting the scope of protection of the invention.
Referring to Fig. 2, which is a flow chart of a preferred process of a telecom customer churn prediction method according to an embodiment of the invention, the input of the preferred flow is the training samples (X1, Y1), (X2, Y2), ..., (Xn, Yn), and the output is the classification of the test set, decided by voting according to the threshold T. The specific process flow is as follows:
First, the samples are divided into n parts, and cross-validation is used to choose the optimal value of m, the number of attributes randomly selected at each node;
Second, k decision trees are generated in sequence;
Third, n samples are drawn with replacement from the training set of n samples to serve as a training set; m attributes are randomly selected for node splitting, and the information gain is calculated using the information gain formula for the sample size VAR based on customer lifetime value; the attribute with the smallest information gain is selected as the split attribute, and each node of each tree is created in turn, generating an unpruned tree;
Finally, the k generated trees form a random forest, which is used to classify the test set, with voting classification carried out by comparison with the threshold.
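The sampling part of this flow (drawing n samples with replacement and picking m candidate attributes at random) can be sketched as below. This is only an illustration under stated assumptions: the helper names `bootstrap_indices`, `random_attributes`, and `plan_forest` are hypothetical, the attribute draw is shown for the root node of each tree only, and the tree growing itself, which would use the VAR-based information gain, is not shown.

```python
import random
from typing import List, Tuple


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Draw n row indices from range(n) with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def random_attributes(num_attributes: int, m: int, rng: random.Random) -> List[int]:
    """Pick m distinct candidate attributes for one node split."""
    return rng.sample(range(num_attributes), m)


def plan_forest(n: int, num_attributes: int, m: int, k: int,
                seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """For each of k trees, return its bootstrap sample and root-node candidates.

    In a full implementation the attribute draw would be repeated at every
    node of every tree; only the root-node draw is shown here.
    """
    rng = random.Random(seed)
    return [
        (bootstrap_indices(n, rng), random_attributes(num_attributes, m, rng))
        for _ in range(k)
    ]


# Hypothetical sizes: 6 training samples, 5 attributes, m = 2, k = 3 trees.
plan = plan_forest(n=6, num_attributes=5, m=2, k=3)
```

Because sampling is with replacement, a bootstrap sample usually repeats some rows and omits others, which is what makes the k trees differ from one another.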
As another aspect of the embodiments of the invention, the present embodiment provides a telecom customer churn prediction system according to the above embodiments. Referring to Fig. 3, which is a schematic structural diagram of a telecom customer churn prediction system according to an embodiment of the invention, the system includes a setting module 1 and a prediction module 2.
The setting module 1 is used to calculate the sample division proportions corresponding to in-network customers and off-network customers in the telecom network and, based on these proportions, to determine the improved Gini index of the random forest algorithm. The prediction module 2 is used to predict telecom customer churn with the random forest algorithm, based on the improved Gini index. Here, the sample division proportion indicates the ratio of the sample size of customers in the selected network-status category to the total number of customers in that network-status category.
It can be understood that the telecom customer churn prediction system of this embodiment is used to implement the prediction of telecom customer churn described in the above embodiments. Therefore, the descriptions and definitions given for the telecom customer churn prediction method in the above embodiments can be used to understand each execution module in this embodiment.
It can be understood that, in the embodiments of the invention, the relevant functional modules can be implemented by a hardware processor.
The telecom customer churn prediction system provided by the embodiment of the invention, by providing corresponding functional modules, proposes a node purity division based on customer lifetime value, built on computing the Gini index from the proportion that the split sample size accounts for within its category, and integrates customer lifetime value into the improved Gini index. This can improve prediction accuracy for off-network customers, especially high-value customers, and thereby provides guidance for avoiding the loss of high-value customers.
As a further aspect of the embodiments of the invention, the present embodiment provides an electronic device according to the above embodiments. Referring to Fig. 4, which is a structural block diagram of an electronic device according to an embodiment of the invention, the device includes at least one memory 401, at least one processor 402, a communication interface 403, and a bus 404.
The memory 401, the processor 402, and the communication interface 403 communicate with one another through the bus 404. The communication interface 403 is used for information transmission between the electronic device and the customer network-status detection device and the data setting device. The memory 401 stores a computer program that can run on the processor 402, and when the processor 402 executes the computer program, the telecom customer churn prediction method described in the above embodiments is implemented.
It should be understood that the electronic device includes at least the memory 401, the processor 402, the communication interface 403, and the bus 404, and that the memory 401, the processor 402, and the communication interface 403 are connected to one another through the bus 404 and can communicate with one another.
The communication interface 403 provides the communication connection between the electronic device and the customer network-status detection device and the data setting device, and supports information transmission between them, for example obtaining customer network status and user preset parameters through the communication interface 403.
When the electronic device runs, the processor 402 calls the program instructions in the memory 401 to execute the methods provided by the above method embodiments, for example: calculating the sample division proportions corresponding to in-network customers and off-network customers in the telecom network and determining the improved Gini index of the random forest algorithm based on these proportions; and setting different prediction error penalty terms for different telecom customers based on their lifetime value.
In another embodiment of the invention, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the telecom customer churn prediction method described in the above embodiments.
It should be understood that the logic instructions in the memory 401 can be implemented in the form of software functional units and, when sold or used as an independent product, can be stored in a computer-readable storage medium. Alternatively, all or part of the steps of the above method embodiments can be carried out by hardware associated with program instructions; the program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disk.
The embodiment of the electronic device described above is only illustrative. Units described as separate parts may or may not be physically separate, and may be located in one place or distributed over different network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disk, and includes instructions that cause a computer device (such as a personal computer, server, or network device) to execute the methods described in the above method embodiments or in certain parts of them.
The electronic device and the non-transitory computer-readable storage medium provided by the embodiments of the invention propose a node purity division based on customer lifetime value, built on computing the Gini index from the proportion that the split sample size accounts for within its category, and integrate customer lifetime value into the improved Gini index. This can improve prediction accuracy for off-network customers, especially high-value customers, and thereby provides guidance for avoiding the loss of high-value customers.
In addition, those skilled in the art should understand that, in the application documents of the invention, the terms "include", "comprise", or any other variant are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Numerous specific details are set forth in the specification of the invention. It should be understood, however, that the embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification. Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments.
However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (10)

1. A telecom customer churn prediction method, characterized by comprising:
calculating the sample division proportions corresponding to in-network customers and off-network customers in a telecom network, and determining an improved Gini index of a random forest algorithm based on the sample division proportions corresponding to the in-network customers and the off-network customers;
predicting telecom customer churn using the random forest algorithm, based on the improved Gini index;
wherein the sample division proportion indicates the ratio of the sample size of customers in the selected network-status category to the total number of customers in that network-status category.
2. The method according to claim 1, characterized in that, before the step of calculating the sample division proportions corresponding to in-network customers and off-network customers in the telecom network, the method further comprises:
setting different prediction error penalty terms for different telecom customers based on the lifetime value of the telecom customers;
and the step of calculating the sample division proportions corresponding to in-network customers and off-network customers in the telecom network further comprises:
calculating the sample division proportions corresponding to the in-network customers and the off-network customers in combination with the prediction error penalty terms.
3. The method according to claim 1, characterized in that, before the step of predicting telecom customer churn using the random forest algorithm, the method further comprises:
setting the threshold for voting classification in the random forest algorithm;
correspondingly, the step of predicting telecom customer churn using the random forest algorithm further comprises:
running the random forest algorithm according to the threshold for voting classification to predict telecom customer churn.
4. The method according to claim 1, characterized in that the step of calculating the sample division proportions corresponding to in-network customers and off-network customers in the telecom network further comprises:
representing the sample size of each split node in the random forest algorithm using the following quantity ratio:
where AR denotes the sample size of split node t, t denotes the left node or right node in a random forest tree, k denotes the customer's network-status category, taking the value in-network or off-network, C_tk denotes the number of samples at node t whose network-status category is k, C_k denotes the number of samples whose network-status category is k, C_t denotes the number of samples at node t, and λ1 denotes the first adjustment parameter;
based on the sample size of each split node, calculating the sample division proportions corresponding to the in-network customers and the off-network customers as follows:
where ARP(k|t) denotes the proportion, within node t, of the AR value of the samples at node t whose network-status category is k, k takes the value in-network or off-network, and AR(k=0|t) and AR(k=1|t) denote the AR values of in-network customers and off-network customers at node t, respectively.
5. The method according to claim 2, characterized in that the step of calculating the sample division proportions corresponding to the in-network customers and the off-network customers in combination with the prediction error penalty terms further comprises:
representing the sample size of each split node in the random forest algorithm, based on customer lifetime value, using the following quantity ratio:
where VAR denotes the sample size of split node t based on customer lifetime value, t denotes the left node or right node in a random forest tree, k denotes the customer's network-status category, taking the value in-network or off-network, VC_tk denotes the sum of the lifetime values of the samples at node t whose network-status category is k, VC_k denotes the sum of the lifetime values of the samples whose network-status category is k, VC_t denotes the sum of the lifetime values of the samples at node t, and λ2 denotes the second adjustment parameter;
based on the lifetime-value-based sample size of each split node, calculating the sample division proportions corresponding to the in-network customers and the off-network customers as follows:
where VARP(k|t) denotes the proportion, within node t, of the VAR value of the samples at node t whose network-status category is k, k takes the value in-network or off-network, and VAR(k=0|t) and VAR(k=1|t) denote the VAR values of in-network customers and off-network customers at node t, respectively.
6. The method according to claim 4, characterized in that the step of determining the improved Gini index of the random forest algorithm based on the sample division proportions corresponding to the in-network customers and the off-network customers further comprises:
taking the sample division proportions corresponding to the in-network customers and the off-network customers as the improved Gini index.
7. The method according to claim 6, characterized in that, after the step of calculating, based on the sample size of each split node, the sample division proportions corresponding to the in-network customers and the off-network customers, the method further comprises:
calculating the information gain of the sample size of each split node based on the sample division proportions corresponding to the in-network customers and the off-network customers.
8. The method according to claim 7, characterized in that the step of predicting telecom customer churn using the random forest algorithm, based on the improved Gini index, further comprises:
selecting the attribute with the smallest information gain as the split attribute of the random forest algorithm and, based on the split attribute, creating each node of each tree in the random forest algorithm to generate unpruned trees;
forming a random forest from all the unpruned trees, classifying the test set with the random forest, and carrying out voting classification to determine churned customers.
9. A telecom customer churn prediction system, characterized by comprising:
a setting module, used to calculate the sample division proportions corresponding to in-network customers and off-network customers in a telecom network, and to determine an improved Gini index of a random forest algorithm based on the sample division proportions corresponding to the in-network customers and the off-network customers;
a prediction module, used to predict telecom customer churn using the random forest algorithm, based on the improved Gini index;
wherein the sample division proportion indicates the ratio of the sample size of customers in the selected network-status category to the total number of customers in that network-status category.
10. An electronic device, characterized by comprising: at least one memory, at least one processor, a communication interface, and a bus;
the memory, the processor, and the communication interface communicate with one another through the bus, and the communication interface is used for information transmission between the electronic device and a customer network-status detection device and a data setting device;
the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the method according to any one of claims 1 to 8 is implemented.
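For orientation, the per-node quantities named in claims 4 and 5 can be computed as in the sketch below. It is an illustration under stated assumptions, not the claimed formulas: the expressions for AR and VAR (which involve the adjustment parameters λ1 and λ2) are not reproduced in this text and are therefore not implemented, the function names and the sample layout are hypothetical, and the normalization in `proportions` is only an assumed reading of how ARP(k|t) and VARP(k|t) relate to the two per-class values.

```python
from typing import Dict, List, Tuple

# Assumed sample layout: (network status k, lifetime value), with k = 0 for
# in-network and k = 1 for off-network, matching AR(k=0|t) and AR(k=1|t).
Sample = Tuple[int, float]


def node_statistics(node: List[Sample], all_samples: List[Sample],
                    k: int) -> Dict[str, float]:
    """Counts and lifetime-value sums for node t and network-status category k."""
    return {
        "C_tk": sum(1 for status, _ in node if status == k),
        "C_k": sum(1 for status, _ in all_samples if status == k),
        "C_t": len(node),
        "VC_tk": sum(value for status, value in node if status == k),
        "VC_k": sum(value for status, value in all_samples if status == k),
        "VC_t": sum(value for _, value in node),
    }


def proportions(value_in_network: float, value_off_network: float) -> Dict[int, float]:
    """Assumed normalization: each category's value over the sum of both."""
    total = value_in_network + value_off_network
    if total <= 0:
        raise ValueError("the two values must sum to a positive number")
    return {0: value_in_network / total, 1: value_off_network / total}


# Hypothetical data: three in-network customers and one off-network customer.
all_samples = [(0, 10.0), (0, 20.0), (0, 30.0), (1, 100.0)]
node_t = [(0, 10.0), (1, 100.0)]  # samples routed to node t
stats = node_statistics(node_t, all_samples, k=1)

# Hypothetical AR(k=0|t) = 3.0 and AR(k=1|t) = 1.0, normalized to ARP(k|t).
arp = proportions(3.0, 1.0)
```

In this toy node the off-network customer is one of two samples by count but carries most of the lifetime value, which illustrates why a value-based sample size can weight a node differently from a plain count.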
CN201810871287.3A 2018-08-02 2018-08-02 Telecommunication customer loss prediction method, system and electronic equipment Active CN109190796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810871287.3A CN109190796B (en) 2018-08-02 2018-08-02 Telecommunication customer loss prediction method, system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810871287.3A CN109190796B (en) 2018-08-02 2018-08-02 Telecommunication customer loss prediction method, system and electronic equipment

Publications (2)

Publication Number Publication Date
CN109190796A true CN109190796A (en) 2019-01-11
CN109190796B CN109190796B (en) 2021-03-02

Family

ID=64920571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810871287.3A Active CN109190796B (en) 2018-08-02 2018-08-02 Telecommunication customer loss prediction method, system and electronic equipment

Country Status (1)

Country Link
CN (1) CN109190796B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259975A (en) * 2020-01-21 2020-06-09 支付宝(杭州)信息技术有限公司 Method and device for generating classifier and method and device for classifying text
CN112767125A (en) * 2021-01-15 2021-05-07 上海琢学科技有限公司 Customer loss prediction method, device and storage medium
CN113240518A (en) * 2021-07-12 2021-08-10 广州思迈特软件有限公司 Bank-to-public customer loss prediction method based on machine learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006190047A (en) * 2005-01-05 2006-07-20 Nippon Signal Co Ltd:The Vehicle selection system
CN106022505A (en) * 2016-04-28 2016-10-12 华为技术有限公司 Method and device of predicting user off-grid
CN107818376A (en) * 2016-09-13 2018-03-20 中国电信股份有限公司 Customer loss Forecasting Methodology and device
CN108280652A (en) * 2016-12-31 2018-07-13 中国移动通信集团辽宁有限公司 The analysis method and device of user satisfaction

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259975A (en) * 2020-01-21 2020-06-09 支付宝(杭州)信息技术有限公司 Method and device for generating classifier and method and device for classifying text
CN111259975B (en) * 2020-01-21 2022-07-22 支付宝(杭州)信息技术有限公司 Method and device for generating classifier and method and device for classifying text
CN112767125A (en) * 2021-01-15 2021-05-07 上海琢学科技有限公司 Customer loss prediction method, device and storage medium
CN113240518A (en) * 2021-07-12 2021-08-10 广州思迈特软件有限公司 Bank-to-public customer loss prediction method based on machine learning

Also Published As

Publication number Publication date
CN109190796B (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN110119413A (en) The method and apparatus of data fusion
CN113935434A (en) Data analysis processing system and automatic modeling method
CN109190796A (en) A kind of telecom client attrition prediction method, system and electronic equipment
CN110088749A (en) Automated ontology generates
CN109242361A (en) A kind of fire-fighting methods of risk assessment, device and terminal device
CN108038052A (en) Automatic test management method, device, terminal device and storage medium
CN112491854B (en) Multi-azimuth security intrusion detection method and system based on FCNN
CN107590196A (en) Earthquake emergency information screening and evaluating system and system in a kind of social networks
CN106803039B (en) A kind of homologous determination method and device of malicious file
CN108564423A (en) Malice occupy-place recognition methods, system, equipment and the storage medium of ticketing service order
CN109558384A (en) Log classification method, device, electronic equipment and storage medium
CN110222733A (en) The high-precision multistage neural-network classification method of one kind and system
CN115147092A (en) Resource approval method and training method and device of random forest model
CN107256461B (en) Charging facility construction address evaluation method and system
EP3185184A1 (en) The method for analyzing a set of billing data in neural networks
CN111275485A (en) Power grid customer grade division method and system based on big data analysis, computer equipment and storage medium
CN111126627A (en) Model training system based on separation degree index
CN114139931A (en) Enterprise data evaluation method and device, computer equipment and storage medium
CN107871055A (en) A kind of data analysing method and device
Śniegula et al. Study of machine learning methods for customer churn prediction in telecommunication company
CN116402546A (en) Store risk attribution method and device, equipment, medium and product thereof
CN110377809A (en) The resource acquisition qualification generation method and relevant device of pre-set user
CN112100165B (en) Traffic data processing method, system, equipment and medium based on quality assessment
CN114676253A (en) Metadata hierarchical classification method based on machine learning algorithm
CN115310865A (en) Product full-quality life cycle quality control platform based on cosmetic detection technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant