CN106529714A - Method and system predicting user loss - Google Patents

Publication number
CN106529714A
Authority
CN
China
Prior art keywords
user
data
lost
matrix
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610953569.9A
Other languages
Chinese (zh)
Inventor
孙鹏
李承霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Datang Converged Communications Ltd By Share Ltd
Datang Telecom Convergence Communications Co Ltd
Original Assignee
Datang Converged Communications Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Datang Converged Communications Ltd By Share Ltd filed Critical Datang Converged Communications Ltd By Share Ltd
Priority to CN201610953569.9A priority Critical patent/CN106529714A/en
Publication of CN106529714A publication Critical patent/CN106529714A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The invention discloses a method and system for predicting user loss. The method includes utilizing historical user data to build a database, and performing statistical processing on the historical user data, thereby obtaining processed user data; performing machine learning on the processed user data, thereby obtaining a user loss characteristic model; and utilizing the user loss characteristic model to predict existing user data, thereby obtaining users to be lost among existing users and the probability that existing users are to be lost. The method and system for predicting user loss provided by the invention perform systematic analytical statistics on data of lost users, can predict the loss trend and loss probability of users, and provide an effective and scientific reference basis for accurate prediction of lost users.

Description

A method and system for predicting user loss (churn)
Technical field
The present invention relates to the field of broadcast television, and in particular to a method and system for predicting user churn.
Background
In recent years, as three-network convergence has accelerated, competition in the cable television market has become white-hot and competitive pressure keeps growing. User retention has always been a major concern of the broadcasting industry. In the prior art, however, data on retained and churned users has not been studied systematically, and predictions of user churn are neither accurate nor scientific.
Summary of the invention
The object of the invention is to provide a method and system for predicting user churn that studies churned-user data systematically, can predict a user's churn tendency and churn probability, and provides an effective, scientific reference basis for accurately predicting churned users.
To achieve the above object, the invention provides the following scheme:
According to the specific embodiments provided by the invention, the invention discloses the following technical effect: the invention performs systematic analysis, statistics, and machine learning on historical users' viewing behavior data, customer service domain data, and BOSS business domain data to obtain a churned-user feature model and a retained-user feature model. The churned-user feature model is then applied to the data of existing users to identify the existing users who are about to churn and their churn probabilities, which provides a scientific data basis for predicting users who are about to churn.
Brief description of the drawings
To explain the embodiments of the invention or the technical schemes of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the user churn prediction method according to an embodiment of the invention;
Fig. 2 is a diagram of the relation between the number of customer service calls and the numbers of churned and non-churned users according to an embodiment of the invention;
Fig. 3 is a diagram of the relation between broadband use and the numbers of churned and non-churned users according to an embodiment of the invention;
Fig. 4 is a diagram of the strength of the correlation between business type and churn according to an embodiment of the invention;
Fig. 5 is a diagram of the decision tree rule program according to an embodiment of the invention;
Fig. 6 is a structural diagram of the user churn prediction system according to an embodiment of the invention.
Detailed description
The technical schemes in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
The object of the invention is to provide a method and system for predicting user churn that studies churned-user data systematically, can predict a user's churn tendency and churn probability, and provides an effective, scientific reference basis for accurately predicting churned users and for planning the enterprise's next stage of development.
To make the above objects, features, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the user churn prediction method according to an embodiment of the invention. As shown in Fig. 1, the method provided by the invention comprises the following steps:
Step 101: Build a database from historical user data. The user data includes user viewing behavior data, customer service domain data, and business domain data provided by the Business and Operation Support System (BOSS). The BOSS business domain data includes the users' own attributes, such as sex and age; the customer service domain data includes users' complaint data; the viewing behavior data includes users' channel preference data.
The database is built as follows. The historical user data is cleaned and transformed; Spark's distributed computing capability is used to clean and transform the massive data in preparation for model building. The user data comes from several data domains of the broadcaster, mainly the BOSS domain, the customer service domain, and the user viewing behavior domain. The BOSS domain holds basic user attributes (age, region, and so on) as well as status information such as whether the user has churned. The customer service domain holds customer complaint data. The user behavior domain holds information such as how long users watch programs. All of this is structured data, and most of the cleaning and transformation is done with SQL statements. The purpose is to build a feature record for each user, for example: user A, likes watching CCTV5, age 25, 3 complaints, churned.
The cleaned and transformed user data is processed and stored using the distributed file system HDFS and the Spark programming language.
Step 102: Perform statistical processing on the historical user data to obtain the processed user data. The statistical processing is as follows:
Build a user program preference matrix from the user behavior data;
Build a basic user information matrix from the business domain data provided by the Business and Operation Support System BOSS and from the customer service domain data;
Count the users who renewed at expiry and the users who churned among the historical users, and build a churned-user matrix and a retained-user matrix;
The user program preference matrix, the basic user information matrix, the churned-user matrix, and the retained-user matrix constitute the processed user data.
The churned-user matrix and the retained-user matrix are each associated with the users' viewing behavior and other information.
Each user has a unique ID, and this ID is used to associate data across the data domains. For example:
User A: age 25, region A, likes CCTV5, churned.
User B: age 30, region B, likes CCTV5, churned.
User C: age 23, region A, does not like CCTV5, not churned.
Summarizing these records shows that users who like CCTV5 have a somewhat higher churn probability.
If user D also likes CCTV5, we can roughly predict that user D will churn as well.
Similarly, a user's own attributes also bear some relation to the likelihood of churn.
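The association by unique user ID described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the field names (`likes_cctv5`, `churned`), and the helper functions are hypothetical, and the three records simply mirror users A, B, and C from the example. A production pipeline would do this join in Spark SQL over far larger tables.

```python
# Hypothetical per-domain records keyed by unique user ID (values mirror users A, B, C above).
boss_domain = {
    "A": {"age": 25, "region": "A", "churned": True},
    "B": {"age": 30, "region": "B", "churned": True},
    "C": {"age": 23, "region": "A", "churned": False},
}
viewing_domain = {
    "A": {"likes_cctv5": True},
    "B": {"likes_cctv5": True},
    "C": {"likes_cctv5": False},
}


def join_by_user_id(*domains):
    """Merge the records of several data domains that share the same user ID."""
    merged = {}
    for domain in domains:
        for user_id, fields in domain.items():
            merged.setdefault(user_id, {}).update(fields)
    return merged


def churn_rate(users, predicate):
    """Share of churned users among those matching predicate, or None if none match."""
    group = [u for u in users.values() if predicate(u)]
    if not group:
        return None
    return sum(u["churned"] for u in group) / len(group)


users = join_by_user_id(boss_domain, viewing_domain)
rate_likes = churn_rate(users, lambda u: u["likes_cctv5"])
rate_dislikes = churn_rate(users, lambda u: not u["likes_cctv5"])
print(rate_likes, rate_dislikes)
```

With only three records the comparison is purely illustrative; the point is that one joined record per user makes such group comparisons straightforward.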
Step 103: Perform machine learning on the processed user data to obtain a user churn feature model. The processed user data is used as the input of the machine learning, which uses a decision tree algorithm and produces the feature model of churned users and the feature model of retained users.
The machine learning is implemented in the R language and with Spark MLlib.
A decision tree algorithm constructs a decision tree to discover the classification rules contained in the data; how to construct a tree that is both accurate and small is the core concern of such algorithms. Construction proceeds in two steps. The first step is tree generation: a decision tree is generated from a training sample set. In general, the training set is a historical data set, chosen according to actual needs and integrated to some degree, that is used for data analysis. The second step is tree pruning: the tree generated in the previous stage is checked, corrected, and trimmed, mainly by using data from a new sample set (called the test data set) to verify the preliminary rules produced during generation and cutting away the branches that harm prediction accuracy.
C5.0 is one of the classic decision tree algorithms. It can generate multi-branch decision trees, its target variable is categorical, and it can output either a decision tree or a rule set. A C5.0 model splits the sample on the field that yields the largest information gain. Each sample subset produced by the first split is then split again, usually on a different field, and the process repeats until the subsets can no longer be split. Finally, the lowest-level splits are re-examined, and sample subsets that make no significant contribution to the model are removed or pruned.
How C5.0 chooses the splitting variable: the rate at which information entropy decreases is the basis for determining the best splitting variable and the split threshold. A decrease in information entropy means a decrease in the uncertainty of the information.
Information entropy: the mathematical expectation of the information content. It is the average uncertainty before the source sends a message, and is also called the prior entropy.
Messages $u_i$ ($i = 1, 2, \ldots, r$) with occurrence probabilities $P(u_i)$, where $\sum_{i=1}^{r} P(u_i) = 1$, form the mathematical model of the information source. The information content (in bits, with logarithms taken to base 2) and the information entropy are:
$$I(u_i) = \log_2 \frac{1}{P(u_i)} = -\log_2 P(u_i)$$
$$H(U) = \sum_{i=1}^{r} P(u_i)\, I(u_i) = -\sum_{i=1}^{r} P(u_i) \log_2 P(u_i)$$
Properties of the information entropy H(U):
When H(U) = 0, there is only one possibility and no uncertainty;
If the k signals of the source are equally likely to be sent, that is, P(u_i) = 1/k for every u_i, then H(U) reaches its maximum and the uncertainty is greatest;
The smaller the differences among the P(u_i), the larger H(U); the larger the differences, the smaller H(U);
Application of entropy in decision trees:
Let S be a sample set and let the target variable C have K classes. freq(C_i, S) denotes the number of samples in S that belong to class C_i, and |S| denotes the number of samples in S. The information entropy of the set S is then defined as:
$$\mathrm{Info}(S) = -\sum_{i=1}^{K} \frac{\mathrm{freq}(C_i, S)}{|S|} \log_2 \frac{\mathrm{freq}(C_i, S)}{|S|}$$
If an attribute variable T has N categories, and T_i denotes the subset of S that takes the i-th category of T, the conditional entropy after introducing T is defined as:
$$\mathrm{Info}(T) = \sum_{i=1}^{N} \frac{|T_i|}{|S|}\, \mathrm{Info}(T_i)$$
The information gain brought by the attribute variable T is:
$$\mathrm{Gain}(T) = \mathrm{Info}(S) - \mathrm{Info}(T)$$
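The entropy and information gain definitions above translate directly into code. The following is a minimal sketch using only the Python standard library; the function names and the four-sample toy data are illustrative assumptions, not part of the patent, and the sketch computes plain information gain only, not the full C5.0 procedure.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(S): entropy in bits of the class labels of a sample set."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_given(attribute, labels):
    """Info(T): weighted entropy of the subsets produced by splitting on attribute T."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * info(g) for g in groups.values())


def gain(attribute, labels):
    """Gain(T) = Info(S) - Info(T)."""
    return info(labels) - info_given(attribute, labels)


# Toy data: two churned and two retained users (hypothetical).
labels = ["lost", "lost", "kept", "kept"]
separating = ["x", "x", "y", "y"]   # attribute that separates the classes perfectly
unrelated = ["x", "y", "x", "y"]    # attribute that carries no information

print(info(labels), gain(separating, labels), gain(unrelated, labels))
```

An attribute that separates the classes perfectly recovers the whole entropy of the set as gain, while an unrelated attribute yields zero gain, which is why the split is made on the field with the largest gain.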
The invention takes the users whose product expires in July (68,965 users) and labels them as positive or negative samples according to whether they renewed at expiry; non-renewing users account for 69.83%. The data is divided into two parts, with 70% of the users as the training set and 30% as the test set, and a churn early-warning model is built with the C5.0 decision tree method. The model rules are then extracted and used to compute the churn tendency of the users whose products expire in the current month, and the high-risk churn users are output.
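The 70/30 division into training and test sets can be illustrated as follows. This is a sketch under assumptions: the text does not say how users are assigned to each part, so a seeded random shuffle is assumed here, and the function name and integer user IDs are hypothetical.

```python
import random


def split_train_test(user_ids, train_share=0.7, seed=0):
    """Shuffle the user IDs reproducibly and cut them into a training and a test part."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_share)
    return ids[:cut], ids[cut:]


# 68,965 expiring users, as in the embodiment; the IDs themselves are placeholders.
train_ids, test_ids = split_train_test(range(68965))
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the split reproducible, so a rule extracted from the training part can be checked against the same held-out users on every run.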
Data input: the data is imported into SPSS Modeler and read in. After the correct data types are selected, the churn flag is set as the target variable and the customer number is set as an unused (invalid) variable.
Data processing and dimension selection: noisy data such as null values, outliers, and extreme (minimum and maximum) values is cleaned. Each dimension is statistically tested against the churn flag, and the dimensions with a clear relation to churn are selected. Dimensions without a clear correlation with churn are excluded before the decision tree is built.
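The text does not name the statistical test used to check each dimension against churn. One common choice for two categorical variables is Pearson's chi-square test of independence, and the sketch below assumes that choice. The counts are invented for illustration and are not figures from the patent; `CRITICAL_95_DF1` is the standard 5% critical value for one degree of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_95_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, 5% level

# Hypothetical counts: rows are two groups of one dimension, columns are churned / not churned.
related = [[30, 10], [10, 30]]
independent = [[20, 20], [10, 10]]

for name, table in (("related", related), ("independent", independent)):
    stat = chi_square(table)
    print(name, stat, "keep" if stat > CRITICAL_95_DF1 else "exclude")
```

A dimension whose statistic exceeds the critical value would be kept as a candidate input to the tree; one below it would be excluded, matching the selection step described above.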
Indicator analysis: the analysis shows that the dimensions most related to churn include callrf_cnt (number of customer service calls made this month), num (number of products under the customer's name), cm_num (number of broadband subscriptions), and busi_type (business type: 1 high-definition digital, 2 broadband, 11 digital, 31 time-shift customers).
The number of customer service calls made this month is split into two segments: 0 calls and more than 0 calls. Fig. 2 is a diagram of the relation between the number of customer service calls and the numbers of churned and non-churned users according to an embodiment of the invention; in the figure, dark represents churned users and light represents non-churned users. As Fig. 2 shows, users who did not call customer service actually churn at a higher rate.
Users whose broadband count is 0 form the first class and users whose count is greater than 0 form the second class. Fig. 3 is a diagram of the relation between broadband use and the numbers of churned and non-churned users according to an embodiment of the invention; in the figure, dark represents churned users and light represents non-churned users. As Fig. 3 shows, users with a broadband count of 0 have a higher churn risk, from which it can be inferred that installing broadband increases customer stability. Fig. 4 is a diagram of the strength of the correlation between business type and churn according to an embodiment of the invention; thicker lines indicate stronger correlation. As Fig. 4 shows, class 11 (digital users) is very strongly correlated with churn.
Building the decision tree and interpreting its rules: the minimum number of records per branch is set to 20, the pruning severity is set to 75%, and global pruning is used. The resulting decision tree has a depth of 13. Fig. 5 is a diagram of the decision tree rule program according to an embodiment of the invention. As shown in Fig. 5, the rules of the decision tree are extracted for ease of analysis. Of the numbers in parentheses in Fig. 5, the integer is the number of users covered by the rule and the decimal is the confidence of the rule. Two rules are chosen below as examples to explain what the rules mean and to analyze the features of churned users.
As shown in Fig. 5, rule 4: 0 (5726; 0.994)
If callrf_cnt <= 0
And num > 0
And cm_num <= 0
And busi_type in [2 31]
Then 0
Rule 4 means: the user made 0 calls to customer service, has at least 1 product under their name, has a broadband count of 0, and has a product type of broadband or time shift. There are 5,726 such users, with a churn rate of 99.4%; users matching these features have a very high churn tendency.
Rule 6: 0 (17800; 0.749)
If callrf_cnt <= 0
And num > 0
And cm_num <= 0
And in_months > 13
And busi_type in [11]
Then 0
Rule 6 means: the user made 0 calls to customer service, has at least 1 product under their name, has a broadband count of 0, has the digital TV product type, and has been on the network for more than 13 months. There are 17,800 such users, with a churn rate of 74.9%.
Users matching these features have a relatively high churn tendency. It can be inferred from these dimensions that users with broadband are stickier and less inclined to churn, while users of the digital TV product are more inclined to churn; some optimization of that product can follow.
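Extracted rules of this kind can be applied to a user record mechanically. The sketch below encodes rules 4 and 6 exactly as listed, using the field names from the text (callrf_cnt, num, cm_num, busi_type, in_months); the function name, the dictionary record layout, and the sample users are hypothetical. The returned class 0 is the label that the text interprets as churn.

```python
def match_rule(user):
    """Return (rule name, predicted class, confidence) of the first matching rule, or None."""
    shared = user["callrf_cnt"] <= 0 and user["num"] > 0 and user["cm_num"] <= 0
    if shared and user["busi_type"] in (2, 31):
        return ("rule 4", 0, 0.994)
    if shared and user.get("in_months", 0) > 13 and user["busi_type"] == 11:
        return ("rule 6", 0, 0.749)
    return None


# Hypothetical user records for illustration.
time_shift_user = {"callrf_cnt": 0, "num": 1, "cm_num": 0, "busi_type": 31, "in_months": 5}
digital_user = {"callrf_cnt": 0, "num": 2, "cm_num": 0, "busi_type": 11, "in_months": 24}
broadband_owner = {"callrf_cnt": 0, "num": 2, "cm_num": 1, "busi_type": 11, "in_months": 24}

for u in (time_shift_user, digital_user, broadband_owner):
    print(match_rule(u))
```

Only two of the tree's rules are encoded here, so a result of None means "not covered by these two rules", not "predicted to stay".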
Program weight calculation: first compute each user's program viewing index: user program score = the user's viewing duration for the program / the total viewing duration of the program across all users / the user's total viewing duration.
All user program scores are then divided into 5 classes using the median, denoted by the numbers 1 to 5.
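The score formula and the five-class grading can be sketched as below. The score function follows the formula as stated. The text does not spell out the cut points used to form the five classes, so the sketch assumes equal-frequency (rank-based) bins, which is only one possible reading; the function names and the sample numbers are hypothetical.

```python
def program_score(user_program_duration, program_total_duration, user_total_duration):
    """User program score as stated: user's duration / program's total duration / user's total duration."""
    return user_program_duration / program_total_duration / user_total_duration


def to_five_classes(scores):
    """Assign each score a class from 1 (lowest) to 5 (highest) using equal-frequency bins."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    classes = [0] * len(scores)
    for rank, index in enumerate(order):
        classes[index] = rank * 5 // len(scores) + 1
    return classes


# Hypothetical durations in seconds: 600 s on this program, 6000 s program total, 1200 s user total.
example_score = program_score(600, 6000, 1200)
example_classes = to_five_classes([0.5, 0.1, 0.9, 0.3, 0.7])
print(example_score, example_classes)
```

Grading the raw scores into a small number of classes gives the decision tree a categorical preference feature instead of a very small continuous value.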
Model input: Table 1 lists the model input variables; the data period is one month.
Table 1
Step 104: Use the user churn feature model to make predictions on the data of existing users, obtaining the existing users who are about to churn and the probability that existing users will churn.
As a specific embodiment of the invention, a regional broadcaster has about 2 million users. On average about 50,000 users reach expiry each month, of whom roughly 30,000 choose to renew and about 20,000 choose not to renew, that is, they churn.
The system collects the historical data of the most recent year, that is, sample data for roughly 600,000 users.
Spark is used for distributed computation.
First, a viewing behavior matrix is built for each user, as shown in Table 2.
Table 2
          CCTV-1   CCTV-2   CCTV-3
User A    11       22       45
User B    15       34       12
The matrix gives each user's preference value for each channel.
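A matrix of this shape can be assembled from per-channel records with a simple pivot. The sketch below reuses the values of Table 2; the record layout (user, channel, preference value) and the function name are assumptions made for illustration, and at the scale of the embodiment the same pivot would be done in Spark rather than in memory.

```python
CHANNELS = ["CCTV-1", "CCTV-2", "CCTV-3"]

# Hypothetical long-format records (user, channel, preference value), taken from Table 2.
records = [
    ("User A", "CCTV-1", 11), ("User A", "CCTV-2", 22), ("User A", "CCTV-3", 45),
    ("User B", "CCTV-1", 15), ("User B", "CCTV-2", 34), ("User B", "CCTV-3", 12),
]


def build_preference_matrix(records, channels):
    """Pivot (user, channel, value) records into one row of channel values per user."""
    matrix = {}
    for user, channel, value in records:
        row = matrix.setdefault(user, [0] * len(channels))
        row[channels.index(channel)] += value
    return matrix


preference_matrix = build_preference_matrix(records, CHANNELS)
for user, row in preference_matrix.items():
    print(user, row)
```

Channels a user never watched stay at 0, so every row has the same length and the rows can be joined with the other per-user attributes.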
In addition, associating the BOSS business domain data yields other user attributes such as age and sex, and associating the customer service domain data yields attributes such as the user's complaint history.
This produces a user matrix with 600,000 rows and several hundred columns. About 60% of the users in it are in the churned state; the rest are in the renewed state.
Because the matrix is very large, it is stored in HDFS.
With the matrix as input, a decision tree algorithm is used to summarize the regular features of churned users and of retained users and to obtain the corresponding models.
From the BOSS business domain data, the system can identify the users whose subscriptions expire next month and, using the above models, predict their churn tendency.
The invention is based on users' viewing behavior data, supplemented by other broadcasting business domain data, which is cleaned and integrated appropriately. Using big-data concepts and a distributed computing framework, it builds a machine learning model to predict which users are about to churn, making it easier for broadcasting operators to carry out targeted user retention.
The invention performs systematic analysis, statistics, and machine learning on historical users' viewing behavior data, customer service domain data, and BOSS business domain data to obtain a churned-user feature model and a retained-user feature model. The churned-user feature model is then applied to the data of existing users to identify the existing users who are about to churn and their churn probabilities, which provides a scientific data basis for predicting users who are about to churn.
To achieve the above object, the invention also provides a system for predicting user churn. Fig. 6 is a structural diagram of the user churn prediction system according to an embodiment of the invention. As shown in Fig. 6, the system provided by the invention includes:
A database building module 601, for building a database from historical user data. The user data includes user viewing behavior data, customer service domain data, and business domain data provided by the Business and Operation Support System BOSS. The BOSS business domain data includes the users' own attributes, the customer service domain data includes users' complaint data, and the viewing behavior data includes users' channel preference data;
A statistical processing module 602, for performing statistical processing on the historical user data to obtain the processed user data;
A machine learning module 603, for performing machine learning on the processed user data to obtain a user churn feature model;
A prediction module 604, for using the user churn feature model to make predictions on the data of existing users, obtaining the existing users who are about to churn and the probability that existing users will churn.
The database building module 601 specifically includes:
A cleaning and transformation unit, for cleaning and transforming the historical user data;
A processing and storage unit, for processing and storing the cleaned and transformed user data using the distributed file system HDFS and the Spark programming language.
The machine learning module 603 specifically includes:
A machine learning unit, for performing machine learning on the processed user data using the R language and Spark MLlib. The machine learning unit includes a machine learning subunit, which uses the processed user data as the input of the machine learning; the machine learning uses a decision tree algorithm and produces the feature model of churned users and the feature model of retained users.
The statistical processing module 602 specifically includes:
A program preference matrix building unit, for building a user program preference matrix from the user behavior data;
A basic information matrix building unit, for building a basic user information matrix from the business domain data provided by the Business and Operation Support System BOSS and from the customer service domain data;
A user matrix statistics unit, for counting the users who renewed at expiry and the users who churned among the historical users, and building a churned-user matrix and a retained-user matrix;
The user program preference matrix, the basic user information matrix, the churned-user matrix, and the retained-user matrix constitute the processed user data.
The user churn prediction system provided by the invention performs systematic analysis, statistics, and machine learning on historical users' viewing behavior data, customer service domain data, and BOSS business domain data to obtain a churned-user feature model and a retained-user feature model. The churned-user feature model is then applied to the data of existing users to identify the existing users who are about to churn and their churn probabilities, which provides a scientific data basis for predicting users who are about to churn.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for related details, see the description of the method.
Specific examples are used herein to explain the principles and embodiments of the invention; the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and to the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A method for predicting user churn, characterized in that the method comprises:
Building a database from historical user data, wherein the user data includes user viewing behavior data, customer service domain data, and business domain data provided by the Business and Operation Support System BOSS; the business domain data provided by BOSS includes the users' own attributes, the customer service domain data includes users' complaint data, and the user viewing behavior data includes users' channel preference data;
Performing statistical processing on the historical user data to obtain processed user data;
Performing machine learning on the processed user data to obtain a user churn feature model;
Using the user churn feature model to make predictions on the data of existing users, obtaining the existing users who are about to churn and the probability that existing users will churn.
2. The prediction method according to claim 1, characterized in that building a database from historical user data specifically comprises:
Cleaning and transforming the historical user data;
Processing and storing the cleaned and transformed user data using the distributed file system HDFS and the Spark programming language.
3. The prediction method according to claim 1, characterized in that performing machine learning on the processed user data specifically comprises:
Performing machine learning on the processed user data using the R language and Spark MLlib.
4. The prediction method according to claim 1, characterized in that performing statistical processing on the historical user data specifically comprises:
Building a user program preference matrix from the user behavior data;
Building a basic user information matrix from the business domain data provided by the Business and Operation Support System BOSS and from the customer service domain data;
Counting the users who renewed at expiry and the users who churned among the historical users, and building a churned-user matrix and a retained-user matrix;
Wherein the user program preference matrix, the basic user information matrix, the churned-user matrix, and the retained-user matrix constitute the processed user data.
5. The method according to claim 3, characterized in that performing machine learning on the processed user data specifically comprises:
Using the processed user data as the input of the machine learning, wherein the machine learning uses a decision tree algorithm and produces the feature model of churned users and the feature model of retained users.
6. A system for predicting user churn, characterized by comprising:
a database construction module, configured to build a database from historical user data, the user data comprising user viewing behavior data, customer service domain data, and business domain data provided by a business operation support system (BOSS), wherein the business domain data provided by the BOSS comprises the users' own attribute data, the customer service domain data comprises user complaint data, and the user viewing behavior data comprises data on the users' channel preferences;
a statistical processing module, configured to perform statistical processing on the historical user data to obtain processed user data;
a machine learning module, configured to perform machine learning on the processed user data to obtain a user churn feature model;
a prediction module, configured to apply the user churn feature model to existing user data to obtain the users among the existing users who are about to churn and the probability that the existing users will churn.
7. The prediction system according to claim 6, characterized in that the database construction module specifically comprises:
a cleaning and conversion unit, configured to clean and convert the historical user data;
a processing and storage unit, configured to process and store the cleaned and converted user data using the distributed file system HDFS and the Spark programming language.
8. The prediction system according to claim 6, characterized in that the machine learning module specifically comprises:
a machine learning unit, configured to perform machine learning on the processed user data using the R language and the Spark MLlib programming language.
9. The prediction system according to claim 6, characterized in that the statistical processing module specifically comprises:
a program preference matrix construction unit, configured to build a user program preference matrix from the user behavior data;
a basic information matrix construction unit, configured to build a user basic information matrix from the business domain data provided by the business operation support system (BOSS) and the customer service domain data;
a user matrix statistics unit, configured to count, among the historical users, the users who renewed on expiry and the users who churned, and to build a churned-user matrix and a retained-user matrix;
wherein the user program preference matrix, the user basic information matrix, the churned-user matrix and the retained-user matrix constitute the processed user data.
10. The system according to claim 8, characterized in that the machine learning unit specifically comprises:
a machine learning subunit, configured to take the processed user data as the input of the machine learning, the machine learning using a decision tree algorithm, to obtain a feature model of churned users and a feature model of retained users.
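
The pipeline recited in the claims above (preference matrix, basic information matrix, churned/retained labels, decision tree, churn probability) can be illustrated with a small sketch. This is not the applicant's implementation: the claims name R and Spark MLlib, whereas the sketch below uses plain Python and a hand-written Gini-based decision tree so that it runs without dependencies. All data values, field meanings (tenure, complaint count) and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def preference_matrix(viewing_log, channels):
    """Map each user to the share of viewing time spent on each channel."""
    totals = defaultdict(lambda: defaultdict(float))
    for user, channel, minutes in viewing_log:
        totals[user][channel] += minutes
    matrix = {}
    for user, per_channel in totals.items():
        total = sum(per_channel.values()) or 1.0  # avoid division by zero
        matrix[user] = [per_channel.get(c, 0.0) / total for c in channels]
    return matrix

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree; a leaf is the fraction of churned users that reached it."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return sum(labels) / len(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def churn_probability(tree, row):
    """Walk the tree and return the churn fraction stored in the leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical historical data: (user, channel, minutes watched).
channels = ["news", "sports"]
viewing_log = [("u1", "news", 90), ("u1", "sports", 10),
               ("u2", "news", 20), ("u2", "sports", 80),
               ("u3", "news", 85), ("u3", "sports", 15),
               ("u4", "news", 10), ("u4", "sports", 90)]
# Hypothetical basic information: [tenure in years, complaint count].
basic_info = {"u1": [1, 3], "u2": [5, 0], "u3": [2, 2], "u4": [4, 1]}
churned = {"u1", "u3"}   # did not renew on expiry
retained = {"u2", "u4"}  # renewed on expiry

prefs = preference_matrix(viewing_log, channels)
users = sorted(prefs)
rows = [basic_info[u] + prefs[u] for u in users]
labels = [1 if u in churned else 0 for u in users]
tree = build_tree(rows, labels)

# Score two hypothetical existing users with the learned model.
at_risk = churn_probability(tree, [1, 4, 0.95, 0.05])
loyal = churn_probability(tree, [6, 0, 0.30, 0.70])
print(at_risk, loyal)
```

In this sketch a single tree separates the two classes, so the paths ending in high-fraction leaves play the role of the churned-user feature model and the remaining paths that of the retained-user feature model. A production system following the claims would more plausibly train the tree with Spark MLlib on data stored in HDFS.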
CN201610953569.9A 2016-11-03 2016-11-03 Method and system predicting user loss Pending CN106529714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610953569.9A CN106529714A (en) 2016-11-03 2016-11-03 Method and system predicting user loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610953569.9A CN106529714A (en) 2016-11-03 2016-11-03 Method and system predicting user loss

Publications (1)

Publication Number Publication Date
CN106529714A true CN106529714A (en) 2017-03-22

Family

ID=58325503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610953569.9A Pending CN106529714A (en) 2016-11-03 2016-11-03 Method and system predicting user loss

Country Status (1)

Country Link
CN (1) CN106529714A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8775402B2 (en) * 2006-08-15 2014-07-08 Georgia State University Research Foundation, Inc. Trusted query network systems and methods
CN103150697A (en) * 2011-12-07 2013-06-12 北京四达时代软件技术股份有限公司 Method and device of confirming customer churn

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292418A (en) * 2017-05-23 2017-10-24 顺丰科技有限公司 A kind of waybill is detained Forecasting Methodology
CN107220732A (en) * 2017-05-31 2017-09-29 福州大学 A kind of power failure complaint risk Forecasting Methodology based on gradient boosted tree
WO2019020002A1 (en) * 2017-07-24 2019-01-31 Beijing Didi Infinity Technology And Development Co., Ltd. Methods and systems for preventing user churn
CN109544197B (en) * 2017-09-22 2023-09-22 中兴通讯股份有限公司 User loss prediction method and device
CN109544197A (en) * 2017-09-22 2019-03-29 中兴通讯股份有限公司 A kind of customer churn prediction technique and device
CN110020133A (en) * 2017-11-07 2019-07-16 腾讯科技(深圳)有限公司 Commending contents treating method and apparatus, computer equipment and storage medium
CN107749009B (en) * 2017-11-09 2020-12-01 东软集团股份有限公司 Personnel loss prediction method and device
CN107749009A (en) * 2017-11-09 2018-03-02 东软集团股份有限公司 The Forecasting Methodology and device of a kind of personnel
CN108663582A (en) * 2017-11-30 2018-10-16 全球能源互联网研究院有限公司 A kind of fault diagnosis method and system of transformer
CN108156025A (en) * 2017-12-13 2018-06-12 中国联合网络通信集团有限公司 A kind of method and device of user's off-network prediction
CN109962795A (en) * 2017-12-22 2019-07-02 中国移动通信集团广东有限公司 A kind of 4G customer churn method for early warning and system based on multidimensional union variable
CN110019166A (en) * 2017-12-25 2019-07-16 大连楼兰科技股份有限公司 Screen the method and customer defection early warning method of attribute data
CN110147803A (en) * 2018-02-08 2019-08-20 北大方正集团有限公司 Customer churn early-warning processing method and device
CN108377204A (en) * 2018-02-13 2018-08-07 中国联合网络通信集团有限公司 A kind of off-grid prediction technique of user and device
CN108377204B (en) * 2018-02-13 2020-03-24 中国联合网络通信集团有限公司 User off-network prediction method and device
CN109636443A (en) * 2018-11-17 2019-04-16 南京中数媒介研究有限公司 The deep learning method and device of customer churn prediction
CN111242659A (en) * 2018-11-28 2020-06-05 顺丰科技有限公司 Client component quantity prediction method and device, and transaction client early warning method and device
CN109508329A (en) * 2018-12-07 2019-03-22 广州市诚毅科技软件开发有限公司 Customer churn method for early warning, system and storage medium based on broadcasting and TV big data
CN109886755A (en) * 2019-03-04 2019-06-14 深圳微品致远信息科技有限公司 A kind of communication user attrition prediction method and system based on evolution algorithm
CN111724185A (en) * 2019-03-21 2020-09-29 北京沃东天骏信息技术有限公司 User maintenance method and device
CN110766481A (en) * 2019-11-04 2020-02-07 泰康保险集团股份有限公司 Client data processing method and device, electronic equipment and computer readable medium
CN111311318A (en) * 2020-02-12 2020-06-19 上海东普信息科技有限公司 User loss early warning method, device, equipment and storage medium
CN114430489A (en) * 2020-10-29 2022-05-03 武汉斗鱼网络科技有限公司 Virtual prop compensation method and related equipment
CN112449240A (en) * 2020-11-10 2021-03-05 深圳市易平方网络科技有限公司 User loss prediction method and terminal based on Internet television use behaviors
CN112449240B (en) * 2020-11-10 2022-12-06 深圳市易平方网络科技有限公司 User loss prediction method and terminal based on Internet television use behaviors

Similar Documents

Publication Publication Date Title
CN106529714A (en) Method and system predicting user loss
CN106156878B (en) Advertisement click rate correction method and device
CN110417607B (en) Flow prediction method, device and equipment
CN105654198B (en) Brand advertisement effect optimization method capable of realizing optimal threshold value selection
US10825030B2 (en) Methods and apparatus to determine weights for panelists in large scale problems
CN103150697A (en) Method and device of confirming customer churn
CN107241623B (en) The user watched behavior prediction method and system of radio and television
CN105824818A (en) Informationized management method, platform and system
CN108521588A (en) A kind of main broadcaster's arrangement method and system based on time slicing, server and storage medium
CN105913145A (en) Data driving-based AB test method
CN111510368B (en) Family group identification method, device, equipment and computer readable storage medium
CN108777801B (en) Method and device for mining high-quality anchor user, computer storage medium and server
CN105208411A (en) Method and device for realizing digital television target audience statistics
CN114461858A (en) Causal relationship analysis model construction and causal relationship analysis method
CN108038001B (en) Junk file cleaning strategy generation method and device and server
CN111569412B (en) Cloud game resource scheduling method and device
CN112445690A (en) Information acquisition method and device and electronic equipment
CN112148942B (en) Business index data classification method and device based on data clustering
TWI684147B (en) Cloud self-service analysis platform and analysis method thereof
CN104935967B (en) The interest recognition methods of video terminal user a kind of and device
CN110807171A (en) Method and device for analyzing adequacy of seat personnel in business based on weight division
CN110633401A (en) Prediction model of store data and establishment method thereof
CN106126739A (en) A kind of device processing business association data
CN110399399B (en) User analysis method, device, electronic equipment and storage medium
CN110909202A (en) Audio value evaluation method and device and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170322