CN109146569A - A communication user churn prediction method based on decision tree - Google Patents

A communication user churn prediction method based on decision tree

Info

Publication number
CN109146569A
Authority
CN
China
Prior art keywords
attribute
value
gain
decision tree
information entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810998919.2A
Other languages
Chinese (zh)
Inventor
龙华
王瑞
邵玉斌
杜庆治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201810998919.2A
Publication of CN109146569A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Abstract

The present invention relates to a communication user churn prediction method based on a decision tree, and belongs to the field of artificial intelligence. The method first computes the information entropy of the class label attribute, the entropy of the subsets induced by each attribute, and the information gain of each attribute with respect to the class label; the attributes are then sorted by information gain to find the attribute with the largest gain. Next, the Bayesian formula is used to assign a weight to each attribute value in the training data set. Finally, a node is created for the attribute with the largest information gain and labeled with that attribute, a branch is created for each value of the attribute, and the attribute value with the largest weight is connected to the next attribute. The decision tree built in this way is used to establish a customer churn early-warning model.

Description

A communication user churn prediction method based on decision tree
Technical field
The present invention relates to a communication user churn prediction method based on a decision tree, and belongs to the field of artificial intelligence.
Background
At present, with the continuous development of the communications industry, competition among the major operators is becoming increasingly fierce, and the loss of mobile customers has long been a close concern of carriers. In the new mobile Internet landscape, operators face not only competition from one another but also external competition from Internet companies: the mobile instant messaging tools that have emerged in this new era are gradually weakening users' dependence on the services that operators provide.
Because the communication packages on the market are numerous, evaluation criteria are narrow, and historical user data are incomplete, the customer churn early-warning problem places high demands on the algorithm. Conventional methods mainly comprise fundamental analysis and technical analysis, which analyze customer churn through market factors such as supply and demand and through statistical analysis, respectively. With these methods prediction is difficult and the accuracy of the results is low.
The growth of data mining and big data provides extensive technical support for predicting operator customer churn. Working from user feature data, an operator can build a targeted churn early-warning mechanism, analyze the factors behind churn, formulate precise marketing strategies, and make corresponding adjustments to weak links, thereby improving its market competitiveness.
Summary of the invention
The technical problem to be solved by the present invention is to provide a communication user churn prediction method based on a decision tree that addresses the problems described above.
The technical solution of the present invention is a communication user churn prediction method based on a decision tree, with the following specific steps:
Step1, data acquisition: the basic information and consumption behavior of sample communication users are placed in the training data set S;
The basic user information includes: subscriber number, attribute A user age, attribute B gender, attribute C account opening time, attribute D customer grade, attribute E monthly consumption charge;
The consumption behavior includes: attribute F call duration, attribute G data usage, attribute H SMS usage, attribute J value-added service usage;
Step2, data processing: the data of each attribute in S are classified;
Step3, the class label feature values are divided into n classes, where the class label feature has t values in total and t_u is the number of samples contained in class u. For the given class label feature, the information entropy can be defined as shown in formula (1):
[Formula (1) and the definition that follows "where" are not reproduced in this text.]
Take any one of the attributes A, B, C, D, E, F, G, H, J from S; the subset it forms is denoted S_k (k = A, B, C, D, E, F, G, H, J). Within the subset S_k, the samples are divided by feature category into classes S_kj (j = 1, ..., v), where each class has S_kij (i = 1, ..., m) values;
The information entropy of each category can be obtained from the category values, as shown in formula (2):
[Formula (2) is not reproduced in this text.]
Step4, the entropy of the subsets induced by each attribute is calculated as shown in formula (3):
[Formula (3) is not reproduced in this text.]
Step5, information gain measures the expected reduction in entropy; the information gain obtained by selecting attribute k to partition S is shown in formula (4):
Gain(k) = I(T1, T2, ..., Tn) - Ent(k)    (4)
Gain(k) represents the expected reduction in entropy once attribute k is known;
Step6, the Bayesian formula [formula not reproduced in this text], where k = A, B, C, D, E, F, G, H, J, is used to assign a weight to each attribute value in the training data set;
Step7, the decision tree is built: the attributes are sorted by information gain to obtain the attribute with the largest gain; a node is created and labeled with that attribute, and a branch is created for each value of the attribute; the attribute value with the largest weight is connected to the next attribute;
Step8, the customer churn early-warning model is established from the constructed decision tree.
Further, in Step3, the more balanced the probability distribution of the samples, the larger the information entropy and the more mixed the sample set. Information entropy serves as a measure of the purity of the training set: the smaller the entropy, the higher the purity.
Further, in Step5, Gain(k) represents the expected reduction in entropy once attribute k is known. Smaller information entropy means a purer node. By the definition of information gain, a larger gain means a larger reduction in entropy and a purer node, so the larger Gain(k) is, the more information the test attribute k provides for classification.
The beneficial effects of the present invention are as follows. It addresses the difficulty that traditional data analysis tools have in processing data in depth. By combining the decision tree algorithm from data mining with the Bayesian formula, the method processes massive, complicated, and disordered data, extracts communication user data with potential application value, and uses it to give early warning of user churn, which improves prediction accuracy and strengthens the operator's market competitiveness.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the present invention.
Detailed description of embodiments
The invention is further described below with reference to the accompanying drawing and specific embodiments.
Embodiment 1: as shown in Fig. 1, a communication user churn prediction method based on a decision tree first computes the information entropy of the class label attribute, the entropy of the subsets induced by each attribute, and the information gain of each attribute with respect to the class label, and sorts the attributes by information gain to obtain the attribute with the largest gain. Next, the Bayesian formula is used to assign a weight to each attribute value in the training data set. Finally, a node is created for the attribute with the largest information gain and labeled with that attribute, a branch is created for each value of the attribute, and the attribute value with the largest weight is connected to the next attribute. The decision tree built in this way is used to establish the customer churn early-warning model.
The specific steps are as follows:
Step1, data acquisition: the basic information and consumption behavior of sample communication users are placed in the training data set S;
The basic user information includes: subscriber number, attribute A user age, attribute B gender, attribute C account opening time, attribute D customer grade, attribute E monthly consumption charge;
The consumption behavior includes: attribute F call duration (min/month), attribute G data usage (GB/month), attribute H SMS usage (messages/month), attribute J value-added service usage (yuan/month);
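The record layout described in Step1 can be sketched as a small data structure. This is only an illustration: the class name `SubscriberRecord`, the field names, the `churned` label field, and the two sample rows are assumptions made for the example and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriberRecord:
    """One row of the training data set S (field names are hypothetical)."""
    subscriber_number: str      # identifier; listed in Step1 without an attribute letter
    age_years: int              # attribute A
    gender: str                 # attribute B: "male" or "female"
    account_age_years: float    # attribute C: time since the account was opened
    star_level: int             # attribute D: 1 to 5 stars
    monthly_fee_yuan: float     # attribute E
    call_minutes: float         # attribute F, min/month
    data_gb: float              # attribute G, GB/month
    sms_count: int              # attribute H, messages/month
    value_added_yuan: float     # attribute J, yuan/month
    churned: bool               # class label (assumed name, not in the source)


# Two made-up rows, only to show how S might be populated.
training_set_S = [
    SubscriberRecord("u001", 34, "female", 4.5, 3, 128.0, 420.0, 12.5, 30, 15.0, False),
    SubscriberRecord("u002", 22, "male", 1.0, 1, 48.0, 90.0, 35.0, 5, 0.0, True),
]
```

Keeping the record immutable (`frozen=True`) is a design choice for the sketch, so that the binning in Step2 produces new values rather than overwriting raw data.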
Step2, data processing: the data of each attribute in S are classified;
Specifically, attribute A is divided by user age into the following five classes (years): ≤10, ≤18, ≤40, ≤60, >60
Attribute B is divided by gender into the following two classes: male, female
Attribute C is divided by account opening time into the following six classes (years): ≤3, ≤5, ≤10, ≤15, ≤20, >20
Attribute D, customer grade, is divided into the following five classes: one-star user, two-star user, three-star user, four-star user, five-star user
Attribute E is divided by monthly consumption charge into the following five classes (yuan/month): ≤50, ≤100, ≤150, ≤200, >200
Attribute F is divided by call duration into the following five classes (min/month): ≤300, ≤500, ≤1000, ≤1500, >2000
Attribute G, data usage, is divided into the following seven classes (GB/month): ≤5, ≤10, ≤20, ≤30, ≤40, ≤50, >50
Attribute H, SMS usage, is divided into the following five classes (messages/month): ≤100, ≤300, ≤500, ≤1000, >1000
The division can be adjusted to the actual situation, and the division rules are not limited to those above;
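The binning rule in Step2 can be sketched as a lookup against a sorted list of upper bounds. The bounds below are copied from the ranges listed for attributes A and E; the function name `bin_label` and the label strings are assumptions chosen for the example.

```python
from bisect import bisect_left

# Upper bounds taken from the ranges listed above; the last bin is open-ended.
AGE_BOUNDS = [10, 18, 40, 60]              # attribute A, years
MONTHLY_FEE_BOUNDS = [50, 100, 150, 200]   # attribute E, yuan/month


def bin_label(value, bounds):
    """Return the label of the first bin whose upper bound is >= value."""
    idx = bisect_left(bounds, value)       # index of the first bound >= value
    if idx < len(bounds):
        return f"<={bounds[idx]}"
    return f">{bounds[-1]}"                # value exceeds every bound


print(bin_label(35, AGE_BOUNDS))           # a 35-year-old falls in the "<=40" bin
print(bin_label(260.0, MONTHLY_FEE_BOUNDS))
```

Because the text notes that the division rules are not fixed, keeping the bounds in plain lists makes them easy to change without touching the function.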
Step3, the class label feature values are divided into n classes, where the class label feature has t values in total and t_u is the number of samples contained in class u. For the given class label feature, the information entropy can be defined as shown in formula (1):
[Formula (1) and the definition that follows "where" are not reproduced in this text.]
Take any one of the attributes A, B, C, D, E, F, G, H, J from S; the subset it forms is denoted S_k (k = A, B, C, D, E, F, G, H, J). Within the subset S_k, the samples are divided by feature category into classes S_kj (j = 1, ..., v), where each class has S_kij (i = 1, ..., m) values;
The information entropy of each category can be obtained from the category values, as shown in formula (2):
[Formula (2) is not reproduced in this text.]
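A minimal sketch of the entropy computation in Step3 follows. It assumes the standard base-2 Shannon entropy with class probabilities estimated as t_u / t; the patent's own formulas (1) and (2) are not available here, so this form is an assumption, as are the function name and the sample label lists.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Base-2 Shannon entropy of a list of class labels.

    Each class probability is estimated as (samples in the class) / (all samples).
    """
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


balanced = ["churn", "stay", "churn", "stay"]   # evenly mixed classes
skewed = ["stay", "stay", "stay", "churn"]      # mostly one class
pure = ["stay", "stay", "stay", "stay"]         # a single class

for name, sample in [("balanced", balanced), ("skewed", skewed), ("pure", pure)]:
    print(name, round(class_entropy(sample), 4))
```

The three samples illustrate the remark made about Step3: entropy is highest for the balanced set, lower for the skewed set, and zero for the pure set.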
Step4, the entropy of the subsets induced by each attribute is calculated as shown in formula (3):
[Formula (3) is not reproduced in this text.]
Step5, information gain measures the expected reduction in entropy; the information gain obtained by selecting attribute k to partition S is shown in formula (4):
Gain(k) = I(T1, T2, ..., Tn) - Ent(k)    (4)
Gain(k) represents the expected reduction in entropy once attribute k is known;
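Formula (4) can be sketched as follows. The code assumes that Ent(k) is the usual size-weighted average of the label entropy over the subsets induced by attribute k, which is the standard ID3 definition; since formula (3) is not available in this text, that reading is an assumption. Function names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Base-2 Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def split_entropy(values, labels):
    """Ent(k): label entropy of each subset, weighted by the subset's share of S."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(values, labels):
    """Gain(k) = I(T1, ..., Tn) - Ent(k), as in formula (4)."""
    return entropy(labels) - split_entropy(values, labels)


labels = ["churn", "churn", "stay", "stay"]
star_level = ["1-star", "1-star", "5-star", "5-star"]   # separates the classes perfectly
gender = ["male", "female", "male", "female"]           # carries no information here

print(information_gain(star_level, labels))
print(information_gain(gender, labels))
```

In this toy data the customer grade attribute removes all uncertainty about the label while gender removes none, so sorting by gain would place customer grade first.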
Step6, the Bayesian formula [formula not reproduced in this text], where k = A, B, C, D, E, F, G, H, J, is used to assign a weight to each attribute value in the training data set;
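One possible reading of the weighting in Step6 is sketched below. The exact formula used in the patent is not available in this text, so the sketch assumes that the weight of an attribute value is the posterior probability of churn given that value, computed with Bayes' rule as P(value | churn) * P(churn) / P(value). The function name, the `positive` parameter, and the sample data are all hypothetical.

```python
from collections import Counter


def bayes_weights(values, labels, positive="churn"):
    """Return {value: P(positive | value)} computed via Bayes' rule.

    Assumed interpretation of the "weight" in Step6, not the patent's formula.
    """
    n = len(labels)
    n_pos = sum(1 for y in labels if y == positive)
    prior = n_pos / n                                   # P(positive)
    value_counts = Counter(values)
    pos_value_counts = Counter(v for v, y in zip(values, labels) if y == positive)
    weights = {}
    for v, count in value_counts.items():
        likelihood = pos_value_counts[v] / n_pos if n_pos else 0.0   # P(value | positive)
        evidence = count / n                                         # P(value)
        weights[v] = likelihood * prior / evidence
    return weights


values = ["1-star", "1-star", "1-star", "5-star", "5-star", "5-star"]
labels = ["churn", "churn", "stay", "stay", "stay", "churn"]
weights = bayes_weights(values, labels)
heaviest_value = max(weights, key=weights.get)   # value with the largest weight
print(weights, heaviest_value)
```

Under this assumption, the value with the largest weight is the one most strongly associated with churn, which is the value that Step7 connects to the next attribute.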
Step7, the decision tree is built: the attributes are sorted by information gain to obtain the attribute with the largest gain; a node is created and labeled with that attribute, and a branch is created for each value of the attribute; the attribute value with the largest weight is connected to the next attribute;
Step8, the customer churn early-warning model is established from the constructed decision tree.
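Steps 7 and 8 can be sketched as below, under several assumptions that are not stated in the source: the weight of a value is taken to be the churn rate within that value's subset, only the branch with the largest weight is expanded with the next attribute, every other branch becomes a leaf labeled with its majority class, and unseen values fall back to a default label. All function names and the sample rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain(rows, attr, label_key):
    """Information gain of attr over rows (label entropy minus weighted subset entropy)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[label_key])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[label_key] for r in rows]) - remainder


def churn_rate(rows, label_key, positive):
    """Assumed weight of a branch: share of positive labels in its subset."""
    return sum(r[label_key] == positive for r in rows) / len(rows)


def build_tree(rows, attrs, label_key="label", positive="churn"):
    labels = [r[label_key] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs or len(set(labels)) == 1:
        return majority                                   # leaf node
    best = max(attrs, key=lambda a: gain(rows, a, label_key))
    partitions = defaultdict(list)
    for r in rows:
        partitions[r[best]].append(r)
    heaviest = max(partitions, key=lambda v: churn_rate(partitions[v], label_key, positive))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value, subset in partitions.items():
        if value == heaviest:                             # only this branch continues
            branches[value] = build_tree(subset, remaining, label_key, positive)
        else:                                             # other branches end in a leaf
            branches[value] = Counter(r[label_key] for r in subset).most_common(1)[0][0]
    return {"attribute": best, "branches": branches}


def predict(tree, row, default="stay"):
    """Walk the tree; unseen attribute values fall back to the default label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], default)
    return tree


rows = [
    {"star": "1-star", "fee": "<=50", "label": "churn"},
    {"star": "1-star", "fee": "<=50", "label": "churn"},
    {"star": "1-star", "fee": ">200", "label": "stay"},
    {"star": "5-star", "fee": "<=50", "label": "stay"},
    {"star": "5-star", "fee": "<=50", "label": "stay"},
    {"star": "5-star", "fee": ">200", "label": "stay"},
    {"star": "5-star", "fee": ">200", "label": "stay"},
]
tree = build_tree(rows, ["star", "fee"])
print(tree)
print(predict(tree, {"star": "1-star", "fee": "<=50"}))
```

On this toy data the customer grade attribute has the larger gain and becomes the root, and the one-star branch, which has the higher churn rate, is the only one extended with the monthly fee attribute. A production model would also need pruning and a held-out evaluation, which the sketch omits.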
Further, in Step3, the more balanced the probability distribution of the samples, the larger the information entropy and the more mixed the sample set. Information entropy serves as a measure of the purity of the training set: the smaller the entropy, the higher the purity.
Further, in Step5, Gain(k) represents the expected reduction in entropy once attribute k is known. Smaller information entropy means a purer node. By the definition of information gain, a larger gain means a larger reduction in entropy and a purer node, so the larger Gain(k) is, the more information the test attribute k provides for classification.
The embodiments of the present invention have been explained in detail above with reference to the accompanying drawing, but the present invention is not limited to these embodiments. Within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the spirit of the present invention.

Claims (3)

1. A communication user churn prediction method based on a decision tree, characterized by comprising:
Step1, data acquisition: the basic information and consumption behavior of sample communication users are placed in the training data set S;
The basic user information includes: subscriber number, attribute A user age, attribute B gender, attribute C account opening time, attribute D customer grade, attribute E monthly consumption charge;
The consumption behavior includes: attribute F call duration, attribute G data usage, attribute H SMS usage, attribute J value-added service usage;
Step2, data processing: the data of each attribute in S are classified;
Step3, the class label feature values are divided into n classes, where the class label feature has t values in total and t_u is the number of samples contained in class u; for the given class label feature, the information entropy can be defined as shown in formula (1):
[Formula (1) and the definition that follows "where" are not reproduced in this text.]
Take any one of the attributes A, B, C, D, E, F, G, H, J from S; the subset it forms is denoted S_k (k = A, B, C, D, E, F, G, H, J); within the subset S_k, the samples are divided by feature category into classes S_kj (j = 1, ..., v), where each class has S_kij (i = 1, ..., m) values;
The information entropy of each category can be obtained from the category values, as shown in formula (2):
[Formula (2) is not reproduced in this text.]
Step4, the entropy of the subsets induced by each attribute is calculated as shown in formula (3):
[Formula (3) is not reproduced in this text.]
Step5, information gain measures the expected reduction in entropy; the information gain obtained by selecting attribute k to partition S is shown in formula (4):
Gain(k) = I(T1, T2, ..., Tn) - Ent(k)    (4)
Gain(k) represents the expected reduction in entropy once attribute k is known;
Step6, the Bayesian formula [formula not reproduced in this text], where k = A, B, C, D, E, F, G, H, J, is used to assign a weight to each attribute value in the training data set;
Step7, the decision tree is built: the attributes are sorted by information gain to obtain the attribute with the largest gain; a node is created and labeled with that attribute, and a branch is created for each value of the attribute; the attribute value with the largest weight is connected to the next attribute;
Step8, the customer churn early-warning model is established from the constructed decision tree.
2. The communication user churn prediction method based on a decision tree according to claim 1, characterized in that: in Step3, the more balanced the probability distribution of the samples, the larger the information entropy and the more mixed the sample set; information entropy serves as a measure of the purity of the training set, and the smaller the entropy, the higher the purity.
3. The communication user churn prediction method based on a decision tree according to claim 1, characterized in that: in Step5, Gain(k) represents the expected reduction in entropy once attribute k is known; smaller information entropy means a purer node; by the definition of information gain, a larger gain means a larger reduction in entropy and a purer node, so the larger Gain(k) is, the more information the test attribute k provides for classification.
CN201810998919.2A 2018-08-30 2018-08-30 A communication user churn prediction method based on decision tree Pending CN109146569A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810998919.2A CN109146569A (en) 2018-08-30 2018-08-30 A communication user churn prediction method based on decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810998919.2A CN109146569A (en) 2018-08-30 2018-08-30 A communication user churn prediction method based on decision tree

Publications (1)

Publication Number Publication Date
CN109146569A (en) 2019-01-04

Family

ID=64829112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810998919.2A Pending CN109146569A (en) 2018-08-30 2018-08-30 A communication user churn prediction method based on decision tree

Country Status (1)

Country Link
CN (1) CN109146569A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256044A (en) * 2020-02-13 2021-08-13 中国移动通信集团广东有限公司 Strategy determination method and device and electronic equipment
CN115481825A (en) * 2022-11-03 2022-12-16 广州智算信息技术有限公司 Customer service personnel loss assessment early warning system based on big data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050170528A1 (en) * 2002-10-24 2005-08-04 Mike West Binary prediction tree modeling with many predictors and its uses in clinical and genomic applications
CN102567391A (en) * 2010-12-20 2012-07-11 中国移动通信集团广东有限公司 Method and device for building classification forecasting mixed model
CN104537010A (en) * 2014-12-17 2015-04-22 温州大学 Component classifying method based on net establishing software of decision tree

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
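尹婷 et al.: "Application of Bayesian decision trees in customer churn prediction", Computer Engineering and Applications *
张宇 et al.: "Research on a customer churn prediction model based on the C5.0 decision tree", Statistics & Information Forum *
杨孝成: "Research and implementation of a decision-tree-based early-warning model for mobile communication user churn", Master's thesis, Ocean University of China *
迟准: "Research on customer churn prediction and evaluation for telecom operators", China Doctoral Dissertations Full-text Database, Economics and Management Science series *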

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256044A (en) * 2020-02-13 2021-08-13 中国移动通信集团广东有限公司 Strategy determination method and device and electronic equipment
CN113256044B (en) * 2020-02-13 2023-08-15 中国移动通信集团广东有限公司 Policy determination method and device and electronic equipment
CN115481825A (en) * 2022-11-03 2022-12-16 广州智算信息技术有限公司 Customer service personnel loss assessment early warning system based on big data

Similar Documents

Publication Publication Date Title
US20210365963A1 (en) Target customer identification method and device, electronic device and medium
CN107766929B (en) Model analysis method and device
CN106951925A (en) Data processing method, device, server and system
CN110097066A (en) A kind of user classification method, device and electronic equipment
CN103761254B (en) Method for matching and recommending service themes in various fields
CN110377804A (en) Method for pushing, device, system and the storage medium of training course data
CN108921472A (en) A kind of two stages vehicle and goods matching method of multi-vehicle-type
CN109359137B (en) User growth portrait construction method based on feature screening and semi-supervised learning
CN104915397A (en) Method and device for predicting microblog propagation tendencies
CN106228389A (en) Network potential usage mining method and system based on random forests algorithm
CN105654196A (en) Adaptive load prediction selection method based on electric power big data
CN110766438B (en) Method for analyzing user behavior of power grid user through artificial intelligence
CN110516057B (en) Petition question answering method and device
CN109146569A (en) A kind of communication user logout prediction technique based on decision tree
CN108876076A (en) The personal credit methods of marking and device of data based on instruction
Globa et al. Ontology model of telecom operator big data
CN103647673A (en) Method and device for QoS (quality of service) prediction of Web service
CN107305640A (en) A kind of method of unbalanced data classification
CN109474755A (en) Abnormal phone active predicting method and system based on sequence study and integrated study
CN108710999A (en) The confidence level automatic evaluation method of shared resource under a kind of environment based on big data
CN106056137B (en) A kind of business recommended method of telecommunications group based on data mining multi-classification algorithm
CN107666403A (en) The acquisition methods and device of a kind of achievement data
CN110163525A (en) Terminal recommended method and terminal recommender system
CN115328870A (en) Data sharing method and system for cloud manufacturing
CN104573034A (en) CDR call ticket based user group division method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190104)