CN109886756A - Communication user upshift prediction probability recognition methods and system based on integrated model - Google Patents


Publication number
CN109886756A
Authority
CN
China
Prior art keywords
data set
model
data
integrated model
user
Prior art date
Legal status
Pending
Application number
CN201910161182.3A
Other languages
Chinese (zh)
Inventor
周洪峰
雷奥林
邹秋艳
Current Assignee
Shenzhen Microproducts To Mdt Infotech Ltd
Original Assignee
Shenzhen Microproducts To Mdt Infotech Ltd
Application filed by Shenzhen Microproducts To Mdt Infotech Ltd
Priority to CN201910161182.3A
Publication of CN109886756A

Abstract

The present invention provides a method and system, based on an ensemble (integrated) model, for identifying the predicted probability that a communication user will upgrade (upshift) their service plan. The method comprises the following steps: step S1, store the users' basic information data in a database to obtain the user basic information data set S; step S2, perform data detection on the user basic information data set S and standardize it to obtain the data set S_t; step S3, train the ensemble model with the data set S_t as input and the data y_n, indicating whether each user subscribes to a plan upgrade in the following month, as output; step S4, input the updated user basic information data set S_x into the trained ensemble model and output the prediction data set S_y indicating whether each user will subscribe to a plan upgrade in the following month. By detecting and analysing the data in S and training the ensemble model to output the prediction data set S_y, the invention produces predictions that are targeted, relevant, and accurate, with results that are clear at a glance.

Description

Method and system for identifying the predicted probability of communication-user plan upgrades (upshift) based on an ensemble (integrated) model
Technical field
The present invention relates to a method for predicting communication-user plan upgrades, and in particular to a method, based on an ensemble model, for identifying the predicted probability of a communication user's plan upgrade. It also relates to a communication-user plan-upgrade prediction probability identification system that uses this method.
Background
Smart terminals such as smartphones are now ubiquitous, and as digitalization advances, communication users' service plans change accordingly. Existing approaches to identifying the probability of a user's plan upgrade, however, correlate only weakly with the plan the user currently holds. In other words, there is at present no plan-upgrade prediction probability identification system that predicts and analyses on the basis of the user's existing plan. As a result, users cannot easily adjust their plans in a timely and targeted way, and operators face greater management difficulty, which harms their marketing effectiveness and marketing precision.
Summary of the invention
The technical problem to be solved by the present invention is to provide an ensemble-model-based method for identifying the predicted probability of communication-user plan upgrades that lets users view the prediction data set directly, digitalizes the process, makes it possible to adjust plans or adopt different marketing measures in a timely and targeted way, reduces the operator's management difficulty, and effectively improves the operator's marketing effectiveness and marketing precision. A further problem is to provide a communication-user plan-upgrade prediction probability identification system that uses this method.
To this end, the present invention provides an ensemble-model-based method for identifying the predicted probability of communication-user plan upgrades, comprising the following steps:
Step S1: store the users' basic information data in a database to obtain the user basic information data set S;
Step S2: perform data detection on the user basic information data set S and standardize it to obtain the data set S_t;
Step S3: train the ensemble model with the data set S_t as input and the data y_n, indicating whether each user subscribes to a plan upgrade in the following month, as output, to obtain the trained ensemble model;
Step S4: input the updated user basic information data set S_x into the ensemble model trained in step S3, and output and display the prediction data set S_y indicating whether each user will subscribe to a plan upgrade in the following month.
In a further improvement of the present invention, step S2 comprises the following sub-steps:
Step S201: detect the outliers and missing values in the user basic information data set S;
Step S202: set the outliers to missing values;
Step S203: fill the missing values with zeros to obtain the cleaned data source S_;
Step S204: standardize the data source S_ to obtain the data set S_t.
In a further improvement of the present invention, in step S204 the data source S_ is standardized by the formula $x' = \frac{x_n - \bar{x}}{\sigma}$ to obtain the data set S_t, where $x_n$ is the n-th data item of the data set in the data source S_, n is the number of data samples, $\bar{x}$ is the mean of all sample data, and $\sigma$ is the standard deviation of all sample data.
In a further improvement of the present invention, the mean of all sample data is calculated by the formula $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and the standard deviation of all sample data is calculated by the formula $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.
In a further improvement of the present invention, in step S3 the ensemble model comprises a logistic regression model, a decision tree model, and a naive Bayes model. With the data set S_t as input and the data y_n, indicating whether each user subscribes to a plan upgrade in the following month, as output, the logistic regression model, the decision tree model, and the naive Bayes model are trained at the same time, and the weight of each of the three models within the ensemble model is optimized through validation.
In a further improvement of the present invention, step S3 comprises the following sub-steps:
Step S301: split the data set S_t into a training data set S_tA and a test data set S_tB;
Step S302: with the training data set S_tA as input and the corresponding data y_n1, indicating whether each user in S_tA subscribes to a plan upgrade in the following month, as output, train the logistic regression model, the decision tree model, and the naive Bayes model at the same time;
Step S303: during training, use the test data set S_tB and the corresponding data y_n2, indicating whether each user in S_tB subscribes to a plan upgrade in the following month, as validation data to validate the training results of the logistic regression model, the decision tree model, and the naive Bayes model respectively; if a model passes validation, divide its weight by the weight constant d_t, otherwise multiply its weight by the weight constant d_t.
In a further improvement of the present invention, in step S301 the ratio between the training data set S_tA and the test data set S_tB is set to 7:3.
In a further improvement of the present invention, in step S3, when training starts, the weight ratio among the logistic regression model, the decision tree model, and the naive Bayes model is 1:1:1; the weight constant d_t is a random number between 0 and 1.
In a further improvement of the present invention, in step S1 the user basic information data set S includes any one or more of: the user's unique ID, plan fee, data fee, phone bill amounts for the previous three months, number of plan overages, overage data fee, points, user level, age, gender, call tier, time on network, mean voice duration over the previous three months, whether the user is a roaming user, GPRS data usage for the previous three months, mean phone bill over the previous three months, and mean data usage over the previous three months.
The present invention also provides an ensemble-model-based communication-user plan-upgrade prediction probability identification system, which uses the ensemble-model-based method described above.
Compared with the prior art, the beneficial effects of the present invention are as follows. Data detection and analysis are performed on the user basic information data set S and combined with the training of the ensemble model to output the prediction data set S_y. Training (learning) with the ensemble model yields a better prediction effect and stronger generalization ability, so the prediction is highly targeted, relevant, and accurate. The prediction data set S_y can be viewed directly through the user plan-upgrade prediction system, which systematizes the prediction and makes the results clear, timely, and convenient. This provides a sound data basis for more targeted plan-upgrade recommendations, makes it easier to adjust the plans of different customer groups or to adopt different marketing measures in a timely and targeted way, reduces the operator's management difficulty, and effectively improves the operator's marketing effectiveness and marketing precision.
Brief description of the drawings
Fig. 1 is a schematic workflow diagram of an embodiment of the present invention;
Fig. 2 is a schematic diagram of the working principle of an embodiment of the present invention;
Fig. 3 is a schematic diagram of the data flow of an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1 to Fig. 3, this example provides an ensemble-model-based method for identifying the predicted probability of communication-user plan upgrades, comprising the following steps:
Step S1: store the users' basic information data in a database to obtain the user basic information data set S;
Step S2: perform data detection on the user basic information data set S and standardize it to obtain the data set S_t;
Step S3: train the ensemble model with the data set S_t as input and the data y_n, indicating whether each user subscribes to a plan upgrade in the following month, as output, to obtain the trained ensemble model;
Step S4: input the updated user basic information data set S_x into the ensemble model trained in step S3, and output and display the prediction data set S_y indicating whether each user will subscribe to a plan upgrade in the following month.
In this example, steps S1 and S2 preprocess the user basic information data set S. Step S3 uses existing historical data: with the users' data set S_t as input and the historical data y_n, indicating whether each user subscribed to a plan upgrade in the following month, as output, it trains the ensemble model needed for prediction. Step S4 then uses the ensemble model trained in step S3: the updated user basic information data set S_x is input into the trained ensemble model for prediction, and the model outputs and displays the prediction data set S_y indicating whether each user will subscribe to a plan upgrade in the following month. This achieves more accurate identification of the predicted probability of communication-user plan upgrades. In this example, that identification means predicting communication users' demand for a plan upgrade; that is, this example achieves more accurate prediction of communication-user plan upgrades.
In step S1 of this example, the user basic information data set S includes any one or more of: the user's unique ID, plan fee, data fee, phone bill amounts for the previous three months, number of plan overages, overage data fee, points, user level, age, gender, call tier, time on network, mean voice duration over the previous three months, whether the user is a roaming user, GPRS data usage for the previous three months, mean phone bill over the previous three months, and mean data usage over the previous three months.
The updated user basic information data set S_x of this example likewise includes any one or more of the same fields. Preferably, the updated user basic information data are first preprocessed by steps S1 and S2 and only then input into the ensemble model trained in step S3, which improves prediction accuracy.
More specifically, this example first stores the data required for the user basic information data set S in the corresponding database. The basic information in the data set S has the following dimensions: user unique ID, plan fee, data fee, last month's phone bill amount, the phone bill amount of the month before last, the phone bill amount of three months ago, number of plan overages, overage data fee, points, user level, age, gender, call tier, time on network, mean voice duration over the previous three months, whether the user is a roaming user, last month's GPRS data usage, the GPRS data usage of the month before last, the GPRS data usage of three months ago, mean phone bill over the previous three months, mean data usage over the previous three months, and so on. This example refers to the user basic information data set containing these dimensions as S.
The data for the dimension "whether the user subscribes to a plan upgrade in the following month" are then defined as data set A. Data set A is associated with the user basic information data set S through the user unique ID; that is, the prediction data set S_y of this example is associated with the user basic information data set S through the user unique ID. The associated data are then stored in the database. The total number of user data samples selected is n.
Here {s_1, s_2, ..., s_n} ∈ S, where s_n denotes the basic information data of a given user; for the user numbered 1, for example, these are the corresponding plan fee, data fee, last month's phone bill amount, the phone bill amount of the month before last, the phone bill amount of three months ago, number of plan overages, overage data fee, points, user level, age, gender, call tier, time on network, mean voice duration over the previous three months, roaming status, last month's GPRS data usage, the GPRS data usage of the month before last, the GPRS data usage of three months ago, mean phone bill over the previous three months, mean data usage over the previous three months, and so on.
{y_1, y_2, ..., y_n} ∈ A, where y_n indicates whether a given user subscribed to a plan upgrade in the following month according to the historical data. This value is known and is used to train the ensemble model. In y_n, 0 means no plan upgrade is needed and 1 means a plan upgrade is needed. That is, A is the historical data set of whether users subscribed to a plan upgrade in the following month. An example of the data finally stored in the database is shown below:
User unique ID    Age     Time on network    Points    ....    Plan upgrade in the following month
1                 22      3                  324       ....    1
2                 24      2                  456       ....    0
3                 56      1                  245       ....    0
4                 15      6                  786       ....    0
....              ....    ....               ....      ....    ....
n                 76      11                 1025      ....    1
Step S2 of this example comprises the following sub-steps:
Step S201: detect the outliers and missing values in the user basic information data set S;
Step S202: set the outliers to missing values;
Step S203: fill the missing values with zeros to obtain the cleaned data source S_;
Step S204: standardize the data source S_ to obtain the data set S_t.
More specifically, data detection is performed on the stored user basic information data set S, mainly to find its outliers and missing values. An outlier is chiefly a value that contradicts common sense: because of systematic error, human error, or variation inherent in the data, it differs from the overall behaviour, structure, or correlations of the data. This example preferably identifies outliers by drawing a box plot, which requires only the maximum, the minimum, the upper quartile, the median, and the lower quartile. Data above the upper quartile or below the lower quartile are treated as outliers. An age that is negative, for example, is judged to be an outlier. Once step S201 has found an outlier, step S202 sets it to a missing value, and it is then handled by the missing-value treatment of step S203, which is zero filling: every missing value is filled with 0. A field that is empty or has no numerical value is considered a missing value.
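Sub-steps S201 to S203 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names and the `min_valid` parameter are hypothetical, `None` stands in for a missing value, and the outlier fences use the common box-plot convention of 1.5 times the interquartile range beyond the quartiles, which is an assumption swapped in here because the text does not state a whisker length.

```python
import statistics

def mark_outliers_missing(values, min_valid=None):
    """Steps S201/S202: detect outliers and turn them into missing values (None)."""
    present = [v for v in values if v is not None]
    # Lower quartile, median and upper quartile of the observed values.
    q1, _median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    # Assumed convention: fences 1.5 * IQR beyond the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    if min_valid is not None:
        # Optional common-sense bound, e.g. an age can never be negative.
        low = max(low, min_valid)
    return [None if v is None or v < low or v > high else v for v in values]

def fill_missing_with_zero(values):
    """Step S203: zero-fill every missing value."""
    return [0 if v is None else v for v in values]

ages = [22, 24, 56, 15, None, -3, 76, 300]
cleaned = fill_missing_with_zero(mark_outliers_missing(ages, min_valid=0))
```

Keeping the two functions separate mirrors the order of the sub-steps: outliers first become missing values, and only then is a single zero-filling pass applied.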
In step S204 of this example, the data source S_ is standardized by the formula $x' = \frac{x_n - \bar{x}}{\sigma}$ to obtain the data set S_t, where $x_n$ is the n-th data item of the data set in the data source S_, n is the number of data samples, $\bar{x}$ is the mean of all sample data, and $\sigma$ is the standard deviation of all sample data.
This example calculates the mean of all sample data by the formula $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and the standard deviation of all sample data by the formula $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$. The data set S_t obtained after this standardization follows a standard normal distribution, with mean 0 and standard deviation 1.
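The standardization of step S204 can be illustrated with a short Python sketch. It assumes the population form of the standard deviation (dividing by n), which is the form under which the standardized column has mean 0 and standard deviation exactly 1; the function name and the handling of a constant column are illustrative assumptions.

```python
import math

def standardize(column):
    """Z-score one numeric column: subtract the mean, divide by the standard deviation."""
    n = len(column)
    mean = sum(column) / n
    # Population standard deviation (divide by n), assumed here.
    sigma = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if sigma == 0:
        # Assumption: a constant column carries no information, so map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / sigma for x in column]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, standard deviation 2
```

In this sketch each feature column of S_ would be standardized separately, since the columns (age, points, fees, and so on) are on different scales.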
In step S3 of this example, the ensemble model comprises a logistic regression model, a decision tree model, and a naive Bayes model. With the data set S_t as input and the data y_n, indicating whether each user subscribes to a plan upgrade in the following month, as output, the three models are trained at the same time, and the weight of each model within the ensemble model is optimized through validation.
This example uses an ensemble model because it relies on feature-based learning: training and learning through an ensemble model, and predicting by combining several different models, can effectively improve accuracy.
The ensemble model comprises a logistic regression model, a decision tree model, and a naive Bayes model, which are combined by multi-model voting: each model is assigned its own weight, and this weight is adjusted in real time according to the training results in order to improve prediction accuracy.
This example trains and learns with the data set S_t as input and the data y_n, indicating whether each user subscribes to a plan upgrade in the following month, as output. Let x = S_t and y = y_n, where y takes values in [0, 1].
Step S3 of this example comprises the following sub-steps:
Step S301: split the data set S_t into a training data set S_tA and a test data set S_tB;
Step S302: with the training data set S_tA as input and the corresponding data y_n1, indicating whether each user in S_tA subscribes to a plan upgrade in the following month, as output, train the logistic regression model, the decision tree model, and the naive Bayes model at the same time;
Step S303: during training, use the test data set S_tB and the corresponding data y_n2, indicating whether each user in S_tB subscribes to a plan upgrade in the following month, as validation data to validate what the logistic regression model, the decision tree model, and the naive Bayes model have learned from the training data set S_tA and the data y_n1; if a model passes validation, divide its weight by the weight constant d_t, otherwise multiply its weight by the weight constant d_t.
In step S301 of this example, the ratio between the training data set S_tA and the test data set S_tB into which the data set S_t is split is preferably 7:3. When training starts in step S3, the weight ratio among the logistic regression model, the decision tree model, and the naive Bayes model is 1:1:1. The weight constant d_t is a random number between 0 and 1.
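The 7:3 split of step S301 might look like the following Python sketch. The function name, the `seed` parameter, and the decision to shuffle before splitting are assumptions made for illustration; the text only fixes the preferred ratio.

```python
import random

def split_dataset(rows, labels, train_fraction=0.7, seed=0):
    """Split rows and their labels into a training part (S_tA) and a test part (S_tB)."""
    indices = list(range(len(rows)))
    # Shuffling is an assumption; a fixed seed keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test

rows = list(range(10))
labels = [r % 2 for r in rows]
(train_x, train_y), (test_x, test_y) = split_dataset(rows, labels)
```

Splitting by index keeps each row paired with its own label, so y_n1 and y_n2 stay aligned with S_tA and S_tB.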
In this example, the training data set S_tA is the data set used to train the ensemble model, and the test data set S_tB is the data set used to validate it. The weight W_A of the logistic regression model, the weight W_B of the decision tree model, and the weight W_C of the naive Bayes model preferably all default to an initial value of 1 and are adjusted and updated in real time during subsequent training by means of the weight constant d_t. The weight constant d_t is a preset constant, preferably a random number between 0 and 1. These parameter choices are of course only the preferred values of this example; in practical applications they can be customized, modified, and set according to the actual situation.
That is, this example preferably splits the data set S_t into the training data set S_tA and the test data set S_tB at a ratio of S_tA : S_tB = 7:3, and then trains the logistic regression model, the decision tree model, and the naive Bayes model on the training data set S_tA. At the start of training each of the three models is given an initial weight of 1; the weight of the logistic regression model is denoted W_A, that of the decision tree model W_B, and that of the naive Bayes model W_C. During training, if any one of the three models classifies incorrectly, that is, its training result is inconsistent with the corresponding data of the test data set S_tB, the model is judged to have made a classification error and its weight is multiplied by the weight constant d_t, which reduces its weight in the training. If any one of the three models classifies correctly, that is, its training result is consistent with the corresponding data of the test data set S_tB, its weight is divided by the weight constant d_t, which increases its weight in the training.
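The weight adjustment described above can be sketched in Python as follows. The function names, the dictionary layout, and the use of a weighted majority vote to combine the three models are illustrative assumptions; the text itself only states that a correct model's weight is divided by d_t and an incorrect model's weight is multiplied by d_t.

```python
def update_weights(weights, predictions, truth, d_t):
    """Divide a model's weight by d_t if it was right, multiply by d_t if it was wrong."""
    updated = {}
    for name, weight in weights.items():
        if predictions[name] == truth:
            updated[name] = weight / d_t  # 0 < d_t < 1, so the weight grows
        else:
            updated[name] = weight * d_t  # the weight shrinks
    return updated

def weighted_vote(weights, predictions):
    """Combine the 0/1 predictions of the models by summing the weights per label."""
    score = {0: 0.0, 1: 0.0}
    for name, label in predictions.items():
        score[label] += weights[name]
    return 1 if score[1] > score[0] else 0

# W_A, W_B, W_C all start at 1, as in the text.
weights = {"logistic": 1.0, "tree": 1.0, "bayes": 1.0}
weights = update_weights(
    weights, {"logistic": 1, "tree": 0, "bayes": 1}, truth=1, d_t=0.5
)
decision = weighted_vote(weights, {"logistic": 0, "tree": 1, "bayes": 1})
```

Because d_t lies between 0 and 1, dividing raises a weight and multiplying lowers it, so models that validate well gradually dominate the vote.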
The logistic regression model is a binary classification model, that is, the output y of a sample is 0 or 1. Given a data point, the probabilities of class 0 and class 1 are calculated separately, and the class with the larger probability is selected as the predicted class. Let m be the number of dimensions in the data set S_t (for example user unique ID, plan fee, data fee, last month's phone bill amount, the phone bill amount of the month before last, the phone bill amount of three months ago, number of plan overages, overage data fee, points, and so on; see the definition of the data set given earlier). That is, m is the number of dimensions in the user information data set S_t, and $x_m$ is the value of the m-th dimension for a given user in the input data set S_t. The logistic regression decision formula is $z = w_1 x_1 + w_2 x_2 + \dots + w_m x_m + b$.
The result is finally output through the sigmoid function $\frac{1}{1 + e^{-z}}$, and the weight $w_m$ of the m-th dimension and the bias b are calculated from it. This is possible because y is known; that is, the following-month outcome of each user in the input data is known. Using the formula $z = w_1 x_1 + w_2 x_2 + \dots + w_m x_m + b$ with two or more equations, the weights $w_m$ and the bias b can be solved for.
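The forward computation of the logistic regression model, the linear score z followed by the sigmoid, can be sketched in Python as below. The function names and the 0.5 decision threshold are assumptions for illustration, and the sketch covers prediction only, not how the weights and bias are fitted.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def predict_upgrade_probability(x, w, b):
    """Compute z = w_1*x_1 + ... + w_m*x_m + b and map it to a probability."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(z)

def predict_label(x, w, b, threshold=0.5):
    """Return 1 (upgrade) if the probability reaches the assumed threshold, else 0."""
    return 1 if predict_upgrade_probability(x, w, b) >= threshold else 0

p = predict_upgrade_probability([0.0, 0.0], [1.0, -2.0], 0.0)
```

The probability returned for class 1 and its complement for class 0 correspond to the two class probabilities the text compares.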
The decision tree model of this example learns a machine learning classifier that can classify newly appearing objects correctly. A decision tree has to choose among different attributes when making a decision; here attribute selection is defined according to information theory, with the formula $Info(S_t) = -\sum_{i=1}^{m} p_i \log_2 p_i$, where m is the number of values of y described above, so m is preferably 2; $p_i$ is the probability of the i-th class in the data set S_t, calculated as $p_i$ = (number of records in S_t belonging to the i-th class) / |S_t|; and $Info(S_t)$ is the amount of information needed to separate the different classes of the data set S_t.
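The information measure Info(S_t) used for attribute selection can be computed as in the following Python sketch. The function name is hypothetical, and the labels passed in stand for the 0/1 upgrade outcomes of the records in a data set.

```python
import math
from collections import Counter

def info(labels):
    """Entropy of a label list: -sum(p_i * log2(p_i)) over the classes present."""
    total = len(labels)
    counts = Counter(labels)
    # p_i = (records in class i) / (all records); absent classes contribute nothing.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

balanced = info([0, 1, 0, 1])
skewed = info([0, 0, 0, 1])
```

With two classes the value ranges from 0 for a pure set to 1 bit for an evenly split one, so a split that lowers it makes the resulting subsets purer.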
The naive Bayes model of this example uses the data set S_t = {s_t1, s_t2, ..., s_tn} to judge whether the users in the data set subscribe to a plan upgrade in the following month (y), that is, it computes the probability of this group of features under each class. Here $a_i$ denotes a value of y and is the prediction result, so $a_i$ is either 0 or 1. Expressed as a formula, the class probability is $P(a_i \mid S_t) = \frac{P(S_t \mid a_i)\,P(a_i)}{P(S_t)}$.
Following the idea of naive Bayes, each feature of a given sample is assumed to be unrelated to the other features. The class of this group of features is obtained by maximization: $classify(f_1, f_2, \dots, f_n) = \arg\max_{a_i} P(a_i) \prod_{j=1}^{n} P(f_j \mid a_i)$,
where $classify(f_1, f_2, \dots, f_n)$ is the classification result for the data features of the given sample, $f_n$ refers to the user basic information data features in the data set S_t as a whole, and $S_{ti}$ is the i-th data item in the data set S_t.
That is, step S3 of this example uses the ensemble model to identify the predicted probability of plan upgrades for different plan groups on the basis of users' past behaviour. With the data set S_t as input and the data y_n, indicating whether each user subscribes to a plan upgrade in the following month, as output, the ensemble model is trained; the continual weight updates optimize the ensemble model's training, and the trained ensemble model is then output.
Step S4 of this example inputs the updated user basic information data set S_x into the ensemble model trained in step S3, and outputs and displays the prediction data set S_y indicating whether each user will subscribe to a plan upgrade in the following month. More specifically, this example identifies the predicted probability of user plan upgrades on the basis of the ensemble model, using the ensemble model trained in step S3 to make predictions for the updated user basic information data set S_x. The data set S_x comes from the data set associated with the updated user basic information. For example, if the model was trained on the historical user basic information data for April, or for February to April, then at prediction time the updated user basic information data for May are input into the ensemble model trained in step S3, which outputs the prediction result (the prediction data set S_y) of whether a plan upgrade is needed at the May time node. Based on this result, business marketing can then be directed at the users predicted to subscribe to a plan upgrade.
Step S4 of this example preferably comprises two sub-steps: step S401, input the updated user basic information data set S_x into the ensemble model trained in step S3 and output the prediction data set S_y indicating whether each user will subscribe to a plan upgrade in the following month; step S402, display the prediction data set S_y through the user plan-upgrade prediction system. Step S402 provides the display function of the plan-upgrade prediction probability identification system: showing the prediction data set S_y in the user plan-upgrade prediction system lets the operator market directly to the users likely to subscribe to a plan upgrade in the following month, and also makes it easier for users to change their own plans in a targeted way.
This example also provides an ensemble-model-based communication-user plan-upgrade prediction probability identification system, which uses the ensemble-model-based method described above.
In summary, this example performs data detection and analysis on the user basic information data set S and combines this with the training of the ensemble model to output the prediction data set S_y. Training (learning) with the ensemble model yields a better prediction effect and stronger generalization ability, so the prediction is highly targeted, relevant, and accurate. The prediction data set S_y can be viewed directly through the user plan-upgrade prediction system, which systematizes the prediction and makes the results clear, timely, and convenient. This provides a sound data basis for more targeted plan-upgrade recommendations, makes it easier to adjust the plans of different customer groups or to adopt different marketing measures in a timely and targeted way, reduces the operator's management difficulty, and effectively improves the operator's marketing effectiveness and marketing precision.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make a number of simple deductions or substitutions without departing from the inventive concept, and all of these shall be regarded as falling within the protection scope of the invention.

Claims (10)

1. A communication user upshift prediction probability recognition method based on an integrated model, characterized by comprising the following steps:
Step S1: storing user basic information data in a database to obtain a user basic information data set S;
Step S2: performing data detection on the user basic information data set S and, after standardization, obtaining a data set S_t;
Step S3: using the data set S_t as input and the data y_n, which indicate whether each user handled a package upshift in the following month, as output to train the integrated model, obtaining a trained integrated model;
Step S4: inputting the updated user basic information data set S_x into the integrated model trained in step S3, and outputting and displaying the prediction data set S_y indicating whether each user will handle a package upshift in the following month.
2. The communication user upshift prediction probability recognition method based on an integrated model according to claim 1, characterized in that step S2 comprises the following sub-steps:
Step S201: detecting outliers and missing values in the user basic information data set S;
Step S202: setting the outliers as missing values;
Step S203: filling the missing values with zeros to obtain the cleaned data source S_;
Step S204: standardizing the data source S_ to obtain the data set S_t.
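The cleaning sub-steps S201 to S203 can be sketched for a single numeric column as follows. The source does not say how outliers are detected, so the validity bounds `lower` and `upper` and the function name `clean_column` are assumptions made only for this illustration.

```python
import math
from typing import List, Optional


def clean_column(values: List[Optional[float]],
                 lower: float, upper: float) -> List[float]:
    """Steps S201-S203 for one column.

    `lower` and `upper` are assumed validity bounds; the source does not
    specify an outlier-detection rule.
    """
    cleaned: List[float] = []
    for v in values:
        # S201: detect missing values (None or NaN) and out-of-range values.
        is_missing = v is None or math.isnan(v)
        is_outlier = (not is_missing) and not (lower <= v <= upper)
        # S202: an outlier is treated as missing; S203: missing becomes 0.
        cleaned.append(0.0 if (is_missing or is_outlier) else float(v))
    return cleaned


monthly_fee = [58.0, None, 88.0, 9999.0, 128.0]
print(clean_column(monthly_fee, lower=0.0, upper=1000.0))
```

Treating outliers and missing values identically keeps the later standardization step free of special cases, at the cost of pulling the column mean toward zero when many values are replaced.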
3. The communication user upshift prediction probability recognition method based on an integrated model according to claim 2, characterized in that, in step S204, the data source S_ is standardized by the formula $\frac{x_n - \bar{x}}{\sigma}$ to obtain the data set S_t, where $x_n$ represents the n-th data item of the data set in the data source S_, n represents the sample size of the data, $\bar{x}$ represents the mean of all sample data, and $\sigma$ represents the standard deviation of all sample data.
4. The communication user upshift prediction probability recognition method based on an integrated model according to claim 3, characterized in that the mean $\bar{x}$ of all sample data is calculated by the formula $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and the standard deviation $\sigma$ of all sample data is calculated by the formula [formula not preserved in the source].
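The standardization of step S204 can be illustrated with a short worked example. It assumes a z-score (subtract the mean, divide by the standard deviation) and, as a further assumption, the population standard deviation (divisor n); the text here does not preserve which divisor is used.

```python
import statistics
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score each value: (x - mean) / sigma.

    Assumption: sigma is the population standard deviation (divisor n).
    """
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Constant column: every value equals the mean, so map all to 0.
        return [0.0] * len(values)
    return [(v - mean) / sigma for v in values]


# Mean is 5 and population standard deviation is 2 for this sample.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

After standardization each feature has mean 0 and unit spread, so fields measured in different units (fees, minutes, traffic volume) contribute on a comparable scale.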
5. The communication user upshift prediction probability recognition method based on an integrated model according to any one of claims 1 to 4, characterized in that, in step S3, the integrated model comprises a logistic regression model, a decision tree model, and a naive Bayes model; the data set S_t is used as input and the data y_n, which indicate whether each user handled a package upshift in the following month, are used as output to train the logistic regression model, the decision tree model, and the naive Bayes model simultaneously, and the weights of the logistic regression model, the decision tree model, and the naive Bayes model in the integrated model are each optimized by means of verification.
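One way the three weighted models could be combined into a single probability is a weighted average, sketched below. The source states that each model has a weight but does not say how the weighted outputs are merged, so the averaging rule, the function name `ensemble_probability`, and the model keys are assumptions.

```python
from typing import Dict


def ensemble_probability(probs: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Weighted average of per-model upshift probabilities (assumed rule)."""
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * p for name, p in probs.items()) / total


# Hypothetical per-model outputs for one user.
probs = {"logistic_regression": 0.70, "decision_tree": 0.40, "naive_bayes": 0.60}
equal = {"logistic_regression": 1.0, "decision_tree": 1.0, "naive_bayes": 1.0}
tuned = {"logistic_regression": 2.0, "decision_tree": 1.0, "naive_bayes": 1.0}
print(ensemble_probability(probs, equal))
print(ensemble_probability(probs, tuned))
```

Normalizing by the sum of the weights keeps the result in [0, 1] no matter how the individual weights have been scaled during training.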
6. The communication user upshift prediction probability recognition method based on an integrated model according to claim 5, characterized in that step S3 comprises the following sub-steps:
Step S301: splitting the data set S_t into a training data set S_tA and a test data set S_tB;
Step S302: using the training data set S_tA as input and the corresponding data y_n1, which indicate whether each user handled a package upshift in the following month, as output, and training the logistic regression model, the decision tree model, and the naive Bayes model simultaneously;
Step S303: during training, using the test data set S_tB and its corresponding data y_n2, which indicate whether each user handled a package upshift in the following month, as verification data to verify the training of the logistic regression model, the decision tree model, and the naive Bayes model respectively; if a model passes verification, its weight is divided by the weight constant d_t; otherwise, its weight is multiplied by the weight constant d_t.
7. The communication user upshift prediction probability recognition method based on an integrated model according to claim 6, characterized in that, in step S301, the ratio of the training data set S_tA to the test data set S_tB is 7:3.
8. The communication user upshift prediction probability recognition method based on an integrated model according to claim 6, characterized in that, in step S3, when training starts, the weight ratio of the logistic regression model, the decision tree model, and the naive Bayes model is 1:1:1, and the weight constant d_t is a random number between 0 and 1.
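The 7:3 split and the weight update rule described in claims 6 to 8 can be sketched as follows. The source does not define what counts as passing verification, so the `passed` flags are taken as given; the function names and the fixed value of d_t (chosen here for reproducibility, whereas the claim draws it at random between 0 and 1) are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_7_3(rows: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Step S301: shuffle, then split into 70% training and 30% test rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def update_weights(weights: Dict[str, float],
                   passed: Dict[str, bool],
                   d_t: float) -> Dict[str, float]:
    """Step S303: divide by d_t on a passed check, multiply by d_t otherwise."""
    if not 0.0 < d_t < 1.0:
        raise ValueError("d_t must lie strictly between 0 and 1")
    return {name: (w / d_t if passed[name] else w * d_t)
            for name, w in weights.items()}


train, test = split_7_3(range(10))
# Initial weights in the ratio 1:1:1, as in claim 8.
weights = {"logistic_regression": 1.0, "decision_tree": 1.0, "naive_bayes": 1.0}
passed = {"logistic_regression": True, "decision_tree": False, "naive_bayes": True}
print(len(train), len(test))
print(update_weights(weights, passed, d_t=0.5))
```

Because d_t is below 1, dividing by it raises the weight of a model that passes verification and multiplying by it lowers the weight of a model that fails, so the rule shifts influence toward the better-validated models.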
9. The communication user upshift prediction probability recognition method based on an integrated model according to any one of claims 1 to 4, characterized in that, in step S1, the user basic information data set S includes any one or more of: unique user ID, package fee, traffic fee, phone bill amount for the previous three months, number of package overages, package-overage traffic fee, points, user level, age, gender, call category, time on network, mean voice duration over the previous three months, roaming user, GPRS traffic for the previous three months, mean phone bill over the previous three months, and mean traffic over the previous three months.
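A record holding some of these fields could be turned into a model input as sketched below. The column names are hypothetical, since the source lists only descriptive field names and no schema, and excluding the user ID from the feature vector is a design choice made for this illustration.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical column names for a few of the listed fields; the user ID
# is kept out of the features because it only identifies the row.
FEATURE_COLUMNS: List[str] = [
    "package_fee",
    "traffic_fee",
    "bill_prev_3_months",
    "over_package_count",
    "points",
    "user_level",
    "age",
    "months_on_network",
]


def to_feature_vector(record: Dict[str, Any],
                      columns: Sequence[str] = FEATURE_COLUMNS
                      ) -> List[Optional[float]]:
    """Order the selected fields into a vector; absent fields become None."""
    vector: List[Optional[float]] = []
    for name in columns:
        value = record.get(name)
        vector.append(None if value is None else float(value))
    return vector


record = {"user_id": "u001", "package_fee": 58, "age": 34, "points": 1200}
print(to_feature_vector(record, ["package_fee", "age", "user_level"]))
```

Absent fields are returned as `None` rather than filled in here, so that missing values remain visible to the detection and zero-filling of step S2.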
10. A communication user upshift prediction probability recognition system based on an integrated model, characterized in that it uses the communication user upshift prediction probability recognition method based on an integrated model according to any one of claims 1 to 9.
CN201910161182.3A 2019-03-04 2019-03-04 Communication user upshift prediction probability recognition methods and system based on integrated model Pending CN109886756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910161182.3A CN109886756A (en) 2019-03-04 2019-03-04 Communication user upshift prediction probability recognition methods and system based on integrated model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910161182.3A CN109886756A (en) 2019-03-04 2019-03-04 Communication user upshift prediction probability recognition methods and system based on integrated model

Publications (1)

Publication Number Publication Date
CN109886756A true CN109886756A (en) 2019-06-14

Family

ID=66930582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910161182.3A Pending CN109886756A (en) 2019-03-04 2019-03-04 Communication user upshift prediction probability recognition methods and system based on integrated model

Country Status (1)

Country Link
CN (1) CN109886756A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449569A (en) * 2020-11-02 2022-05-06 中国移动通信集团广东有限公司 User traffic usage processing method, network device and service processing system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389713A (en) * 2015-10-15 2016-03-09 南京大学 Mobile data traffic package recommendation algorithm based on user historical data
CN106022505A (en) * 2016-04-28 2016-10-12 华为技术有限公司 Method and device of predicting user off-grid
CN106779079A (en) * 2016-11-23 2017-05-31 北京师范大学 A kind of forecasting system and method that state is grasped based on the knowledge point that multimodal data drives
CN106845731A (en) * 2017-02-20 2017-06-13 重庆邮电大学 A kind of potential renewal user based on multi-model fusion has found method
CN107480687A (en) * 2016-06-08 2017-12-15 富士通株式会社 Information processor and information processing method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389713A (en) * 2015-10-15 2016-03-09 南京大学 Mobile data traffic package recommendation algorithm based on user historical data
CN106022505A (en) * 2016-04-28 2016-10-12 华为技术有限公司 Method and device of predicting user off-grid
CN107480687A (en) * 2016-06-08 2017-12-15 富士通株式会社 Information processor and information processing method
CN106779079A (en) * 2016-11-23 2017-05-31 北京师范大学 A kind of forecasting system and method that state is grasped based on the knowledge point that multimodal data drives
CN106845731A (en) * 2017-02-20 2017-06-13 重庆邮电大学 A kind of potential renewal user based on multi-model fusion has found method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449569A (en) * 2020-11-02 2022-05-06 中国移动通信集团广东有限公司 User traffic usage processing method, network device and service processing system
CN114449569B (en) * 2020-11-02 2024-01-16 中国移动通信集团广东有限公司 User traffic usage processing method, network equipment and service processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190614