CN109146569A - A decision-tree-based churn prediction method for communication users - Google Patents
A decision-tree-based churn prediction method for communication users
- Publication number
- CN109146569A (application number CN201810998919.2A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- value
- gain
- decision tree
- comentropy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Abstract
The present invention relates to a decision-tree-based churn prediction method for communication users, belonging to the field of artificial intelligence. The method computes the information entropy of the class label attribute, the entropy of the subsets induced by each attribute partition, and the information gain of each attribute; the attributes are sorted by information gain, and the attribute with the maximum information gain is obtained. Next, Bayes' formula is used to perform a weight judgement on each attribute value in the training data set. Finally, a node is created for the attribute with the maximum information gain and labelled with that attribute; a branch is created for each value of the attribute, and the attribute value with the greatest weight is connected to the next attribute. By constructing the decision tree in this way, a customer churn early-warning model is established.
Description
Technical field
The present invention relates to a decision-tree-based churn prediction method for communication users, belonging to the field of artificial intelligence.
Background technique
With the continuous development of the communications industry, competition among the major operators has become increasingly fierce, and mobile customer churn has long been a close concern of carriers. Under the new mobile-Internet landscape, operators face not only competition among themselves but also external competition from the Internet: the instant-messaging tools born of the mobile-Internet era have gradually weakened users' dependence on the services that operators provide.
Because the current market is characterised by a wide variety of communication plans, a single evaluation criterion, and incomplete user history data, the customer churn early-warning problem places very high demands on the algorithm. Conventional methods mainly comprise fundamental analysis and technical analysis, which study churn through market factors such as supply and demand or through statistical analysis; prediction is difficult and the accuracy of the results is not high.
The flourishing of data mining and big data provides substantial technical support for operator customer churn prediction. Facing user characteristic data, one can establish a targeted churn early-warning mechanism, analyse the factors behind churn, formulate accurate marketing strategies, make corresponding adjustments to weak links, and improve the operator's market competitiveness.
Summary of the invention
The technical problem to be solved by the present invention is to provide a decision-tree-based churn prediction method for communication users that addresses the problems described above.
The technical scheme of the invention is a decision-tree-based churn prediction method for communication users, with the following specific steps:
Step 1, data acquisition: put sample communication users' basic information and consumption behaviour into a training data set S.
The basic information of a communication user comprises: subscriber number, attribute A user age, attribute B gender, attribute C account-opening time, attribute D customer grade, and attribute E monthly consumption charge.
The consumption behaviour comprises: attribute F call duration, attribute G data usage, attribute H SMS usage, and attribute J value-added-service usage.
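To make the Step 1 data layout concrete, a single training record can be sketched as follows. The field names and values here are illustrative assumptions, not prescribed by the patent.

```python
# A minimal sketch of one record in the training set S, combining the
# basic information (attributes A-E) and consumption behaviour (F-J)
# listed above. All field names and values are illustrative.
sample_record = {
    "subscriber_number": "139****0001",  # identifier only, not a split attribute
    "A_age": 34,                  # years
    "B_gender": "male",
    "C_account_age": 6,           # years since account opening
    "D_grade": "three-star",
    "E_monthly_fee": 120,         # yuan/month
    "F_call_minutes": 800,        # min/month
    "G_data_usage": 15,           # GB/month
    "H_sms_count": 40,            # messages/month
    "J_vas_fee": 10,              # value-added services, yuan/month
    "churn": "no",                # class label to be predicted
}
```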
Step 2, data processing: classify every category of attribute data in S.
Step 3: divide the class label characteristic values into n classes, where the class label characteristic has t values in total and t_u is the number of samples contained in class u. For a given class label characteristic, the information entropy may be defined as in formula (1):
I(T_1, T_2, \ldots, T_n) = -\sum_{u=1}^{n} p_u \log_2 p_u \quad (1)
where p_u = t_u / t.
Extract any attribute from {A, B, C, D, E, F, G, H, J} in S and denote the subset it induces by S_k (k = A, B, C, D, E, F, G, H, J). Within S_k, S is divided according to the attribute's values into classes S_kj (j = 1, ..., v), each class containing values S_kij (i = 1, ..., m).
The information entropy of each class is obtained from its values:
I(S_{k1j}, \ldots, S_{kmj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij}, \quad p_{ij} = S_{kij} / |S_{kj}| \quad (2)
Step 4: calculate the entropy of the partition induced by each attribute, as in formula (3):
Ent(k) = \sum_{j=1}^{v} \frac{|S_{kj}|}{|S|} \, I(S_{k1j}, \ldots, S_{kmj}) \quad (3)
Step 5: measure the expected reduction in entropy with the information gain; the information gain obtained by selecting attribute k to divide S is given by formula (4):
Gain(k) = I(T_1, T_2, \ldots, T_n) - Ent(k) \quad (4)
Gain(k) represents the expected compression of the entropy once attribute k is known.
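Formulas (1), (3) and (4) can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation; the function names are assumptions.

```python
from collections import Counter
from math import log2

def class_entropy(labels):
    """Formula (1): I(T1,...,Tn) = -sum_u p_u * log2(p_u), with p_u = t_u / t."""
    t = len(labels)
    return -sum((c / t) * log2(c / t) for c in Counter(labels).values())

def split_entropy(values, labels):
    """Formula (3): Ent(k) = sum_j (|S_kj| / |S|) * entropy of subset j,
    where subset j holds the labels of samples taking value j on attribute k."""
    t = len(labels)
    ent = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        ent += len(subset) / t * class_entropy(subset)
    return ent

def information_gain(values, labels):
    """Formula (4): Gain(k) = I(T1,...,Tn) - Ent(k)."""
    return class_entropy(labels) - split_entropy(values, labels)
```

For example, with labels ["yes", "yes", "no", "no"] the class entropy is 1 bit; an attribute that separates the classes perfectly has gain 1, and one that splits them evenly has gain 0.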
Step 6: use Bayes' formula, P(c \mid x_k) = P(x_k \mid c) P(c) / P(x_k) (k = A, B, C, D, E, F, G, H, J), to perform a weight judgement on each attribute value in the training data set.
Step 7, construct the decision tree: sort the attributes by information gain and obtain the attribute with the maximum information gain; create a node, label it with this attribute, and create a branch for each value of the attribute; connect the attribute value with the greatest weight to the next attribute.
Step 8: establish the customer churn early-warning model from the constructed decision tree.
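The weight judgement of Step 6 can be sketched as follows. The patent does not spell out how the Bayesian weight is defined, so weighting each attribute value by its posterior churn probability is an assumption of this sketch, as are the names.

```python
from collections import Counter

def value_weights(values, labels, positive="churn"):
    """Weight each value v of one attribute by Bayes' formula:
    P(churn | v) = P(v | churn) * P(churn) / P(v).
    Using the posterior churn probability as the weight is an
    illustrative assumption, not taken from the patent."""
    t = len(labels)
    n_pos = labels.count(positive)
    p_pos = n_pos / t                       # prior P(churn)
    count_v = Counter(values)               # for P(v)
    count_v_pos = Counter(v for v, lab in zip(values, labels) if lab == positive)
    return {
        v: (count_v_pos[v] / n_pos) * p_pos / (count_v[v] / t)
        for v in count_v
    }
```

In Step 7, the value with the largest weight would then be the branch connected to the next attribute.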
Further, in Step 3, the more balanced the probability distribution of the samples, the larger the information entropy and the higher the impurity of the sample set; information entropy thus serves as a measurement of the purity of the training set: the smaller the entropy, the higher the purity.
Further, in Step 5, Gain(k) represents the expected compression of the entropy once attribute k is known. A smaller information entropy means a purer node; by the definition of information gain, a larger information gain means a larger reduction in information entropy and a node that tends towards purity. Hence the larger Gain(k) is, the more information the selected test attribute k provides for classification.
The beneficial effects of the present invention are: it resolves the difficulty that traditional data-analysis tools have in processing data in depth. By combining the decision-tree algorithm of data-mining analysis with Bayes' formula, massive, cumbersome and disordered data are processed, the communication user data of potential application value are analysed, and communication user churn is thereby predicted in advance, improving prediction accuracy and increasing the operator's market competitiveness.
Detailed description of the invention
Fig. 1 is a flow chart of the steps of the present invention.
Specific embodiment
The invention will be further described below with reference to the accompanying drawing and a specific embodiment.
Embodiment 1: as shown in Fig. 1, a decision-tree-based churn prediction method for communication users first computes the information entropy of the class label attribute, the entropy of the subsets induced by each attribute partition, and the information gain of each attribute, sorts the attributes by information gain, and obtains the attribute with the maximum information gain. Next, Bayes' formula is used to perform a weight judgement on each attribute value in the training data set. Finally, a node is created for the attribute with the maximum information gain and labelled with that attribute; a branch is created for each value of the attribute, the attribute value with the greatest weight is connected to the next attribute, and a customer churn early-warning model is established by constructing the decision tree.
Specific steps are as follows:
Step 1, data acquisition: put sample communication users' basic information and consumption behaviour into a training data set S.
The basic information of a communication user comprises: subscriber number, attribute A user age, attribute B gender, attribute C account-opening time, attribute D customer grade, and attribute E monthly consumption charge.
The consumption behaviour comprises: attribute F call duration (min/month), attribute G data usage (GB/month), attribute H SMS usage (messages/month), and attribute J value-added-service usage (yuan/month).
Step 2, data processing: classify every category of attribute data in S.
Specifically, attribute A (user age, years) is divided into the classes: ≤10, ≤18, ≤40, ≤60, >60.
Attribute B (gender) is divided into two classes: male, female.
Attribute C (account-opening time, years) is divided into six classes: ≤3, ≤5, ≤10, ≤15, ≤20, >20.
Attribute D (customer grade) is divided into five classes: one-star, two-star, three-star, four-star and five-star users.
Attribute E (monthly consumption charge, yuan/month) is divided into five classes: ≤50, ≤100, ≤150, ≤200, >200.
Attribute F (call duration, min/month) is divided into five classes: ≤300, ≤500, ≤1000, ≤1500, >2000.
Attribute G (data usage, GB/month) is divided into seven classes: ≤5, ≤10, ≤20, ≤30, ≤40, ≤50, >50.
Attribute H (SMS usage, messages/month) is divided into the classes: ≤100, ≤300, ≤500, ≤1000, >1000.
The division may be adapted to the actual situation; the division rules are not limited to the above.
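The discretisation rules above can be sketched with the standard-library bisect module; the bound lists follow the classes given in this embodiment, and the function name is an illustrative assumption.

```python
import bisect

# Upper bounds of the "<=" classes; values above the last bound fall in ">".
AGE_BOUNDS = [10, 18, 40, 60]            # attribute A, years
ACCOUNT_BOUNDS = [3, 5, 10, 15, 20]      # attribute C, years
FEE_BOUNDS = [50, 100, 150, 200]         # attribute E, yuan/month
DATA_BOUNDS = [5, 10, 20, 30, 40, 50]    # attribute G, GB/month

def bin_value(x, upper_bounds):
    """Map a numeric attribute value to its class label, e.g. 25 -> '<=40'."""
    i = bisect.bisect_left(upper_bounds, x)   # first bound >= x
    if i == len(upper_bounds):
        return f">{upper_bounds[-1]}"
    return f"<={upper_bounds[i]}"
```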
Step 3: divide the class label characteristic values into n classes, where the class label characteristic has t values in total and t_u is the number of samples contained in class u. For a given class label characteristic, the information entropy may be defined as in formula (1):
I(T_1, T_2, \ldots, T_n) = -\sum_{u=1}^{n} p_u \log_2 p_u \quad (1)
where p_u = t_u / t.
Extract any attribute from {A, B, C, D, E, F, G, H, J} in S and denote the subset it induces by S_k (k = A, B, C, D, E, F, G, H, J). Within S_k, S is divided according to the attribute's values into classes S_kj (j = 1, ..., v), each class containing values S_kij (i = 1, ..., m).
The information entropy of each class is obtained from its values:
I(S_{k1j}, \ldots, S_{kmj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij}, \quad p_{ij} = S_{kij} / |S_{kj}| \quad (2)
Step 4: calculate the entropy of the partition induced by each attribute, as in formula (3):
Ent(k) = \sum_{j=1}^{v} \frac{|S_{kj}|}{|S|} \, I(S_{k1j}, \ldots, S_{kmj}) \quad (3)
Step 5: measure the expected reduction in entropy with the information gain; the information gain obtained by selecting attribute k to divide S is given by formula (4):
Gain(k) = I(T_1, T_2, \ldots, T_n) - Ent(k) \quad (4)
Gain(k) represents the expected compression of the entropy once attribute k is known.
Step 6: use Bayes' formula, P(c \mid x_k) = P(x_k \mid c) P(c) / P(x_k) (k = A, B, C, D, E, F, G, H, J), to perform a weight judgement on each attribute value in the training data set.
Step 7, construct the decision tree: sort the attributes by information gain and obtain the attribute with the maximum information gain; create a node, label it with this attribute, and create a branch for each value of the attribute; connect the attribute value with the greatest weight to the next attribute.
Step 8: establish the customer churn early-warning model from the constructed decision tree.
Further, in Step 3, the more balanced the probability distribution of the samples, the larger the information entropy and the higher the impurity of the sample set; information entropy thus serves as a measurement of the purity of the training set: the smaller the entropy, the higher the purity.
Further, in Step 5, Gain(k) represents the expected compression of the entropy once attribute k is known. A smaller information entropy means a purer node; by the definition of information gain, a larger information gain means a larger reduction in information entropy and a node that tends towards purity. Hence the larger Gain(k) is, the more information the selected test attribute k provides for classification.
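Steps 3 to 8 can be sketched as an ID3-style recursive construction that selects the maximum-gain attribute at each node. This sketch omits the Bayesian weighting of branch order described in Step 6, and all names are illustrative assumptions rather than the patent's implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information entropy of a list of class labels (formula (1))."""
    t = len(labels)
    return -sum((c / t) * log2(c / t) for c in Counter(labels).values())

def gain(rows, attr, label_key):
    """Information gain of splitting `rows` on `attr` (formulas (3) and (4))."""
    labels = [r[label_key] for r in rows]
    ent_k = 0.0
    for v in {r[attr] for r in rows}:
        sub = [r[label_key] for r in rows if r[attr] == v]
        ent_k += len(sub) / len(rows) * entropy(sub)
    return entropy(labels) - ent_k

def build_tree(rows, attrs, label_key="churn"):
    """Return a nested dict {attribute: {value: subtree-or-leaf-label}}."""
    labels = [r[label_key] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: gain(rows, a, label_key))  # max-gain attribute
    rest = [a for a in attrs if a != best]
    return {best: {
        v: build_tree([r for r in rows if r[best] == v], rest, label_key)
        for v in {r[best] for r in rows}
    }}
```

The early-warning model of Step 8 then consists of walking a new user's attribute values down this tree to a leaf label.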
The embodiments of the present invention have been described in detail above with reference to the accompanying drawing, but the present invention is not limited to the above embodiments; within the scope of knowledge possessed by a person of ordinary skill in the art, various changes may also be made without departing from the concept of the present invention.
Claims (3)
1. A decision-tree-based churn prediction method for communication users, characterised in that:
Step 1, data acquisition: put sample communication users' basic information and consumption behaviour into a training data set S;
wherein the basic information of a communication user comprises: subscriber number, attribute A user age, attribute B gender, attribute C account-opening time, attribute D customer grade, and attribute E monthly consumption charge;
and the consumption behaviour comprises: attribute F call duration, attribute G data usage, attribute H SMS usage, and attribute J value-added-service usage;
Step 2, data processing: classify every category of attribute data in S;
Step 3: divide the class label characteristic values into n classes, where the class label characteristic has t values in total and t_u is the number of samples contained in class u; for a given class label characteristic, the information entropy may be defined as in formula (1):
I(T_1, T_2, \ldots, T_n) = -\sum_{u=1}^{n} p_u \log_2 p_u \quad (1)
where p_u = t_u / t;
extract any attribute from {A, B, C, D, E, F, G, H, J} in S and denote the subset it induces by S_k (k = A, B, C, D, E, F, G, H, J); within S_k, S is divided according to the attribute's values into classes S_kj (j = 1, ..., v), each class containing values S_kij (i = 1, ..., m);
the information entropy of each class is obtained from its values:
I(S_{k1j}, \ldots, S_{kmj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij}, \quad p_{ij} = S_{kij} / |S_{kj}| \quad (2)
Step 4: calculate the entropy of the partition induced by each attribute, as in formula (3):
Ent(k) = \sum_{j=1}^{v} \frac{|S_{kj}|}{|S|} \, I(S_{k1j}, \ldots, S_{kmj}) \quad (3)
Step 5: measure the expected reduction in entropy with the information gain; the information gain obtained by selecting attribute k to divide S is given by formula (4):
Gain(k) = I(T_1, T_2, \ldots, T_n) - Ent(k) \quad (4)
where Gain(k) represents the expected compression of the entropy once attribute k is known;
Step 6: use Bayes' formula, P(c \mid x_k) = P(x_k \mid c) P(c) / P(x_k) (k = A, B, C, D, E, F, G, H, J), to perform a weight judgement on each attribute value in the training data set;
Step 7, construct the decision tree: sort the attributes by information gain and obtain the attribute with the maximum information gain; create a node, label it with this attribute, and create a branch for each value of the attribute; connect the attribute value with the greatest weight to the next attribute;
Step 8: establish the customer churn early-warning model from the constructed decision tree.
2. The decision-tree-based churn prediction method for communication users according to claim 1, characterised in that: in Step 3, the more balanced the probability distribution of the samples, the larger the information entropy and the higher the impurity of the sample set; information entropy serves as a measurement of the purity of the training set: the smaller the entropy, the higher the purity.
3. The decision-tree-based churn prediction method for communication users according to claim 1, characterised in that: in Step 5, Gain(k) represents the expected compression of the entropy once attribute k is known; a smaller information entropy means a purer node; by the definition of information gain, a larger information gain means a larger reduction in information entropy and a node that tends towards purity, so the larger Gain(k) is, the more information the selected test attribute k provides for classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810998919.2A CN109146569A (en) | 2018-08-30 | 2018-08-30 | A decision-tree-based churn prediction method for communication users |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109146569A (en) | 2019-01-04 |
Family
ID=64829112
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810998919.2A | A decision-tree-based churn prediction method for communication users (pending) | 2018-08-30 | 2018-08-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109146569A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050170528A1 (en) * | 2002-10-24 | 2005-08-04 | Mike West | Binary prediction tree modeling with many predictors and its uses in clinical and genomic applications |
CN102567391A (en) * | 2010-12-20 | 2012-07-11 | 中国移动通信集团广东有限公司 | Method and device for building classification forecasting mixed model |
CN104537010A (en) * | 2014-12-17 | 2015-04-22 | 温州大学 | Component classifying method based on net establishing software of decision tree |
- 2018-08-30: application CN201810998919.2A filed in China; published as CN109146569A (en); status Pending
Patent Citations (3): see the citations table above.
Non-Patent Citations (4)
Title |
---|
Yin Ting et al.: "Application of Bayesian decision trees in customer churn prediction", Computer Engineering and Applications *
Zhang Yu et al.: "Research on a customer churn prediction model based on the C5.0 decision tree", Statistics & Information Forum *
Yang Xiaocheng: "Research and implementation of a decision-tree-based churn early-warning model for mobile communication users", master's thesis, Ocean University of China *
Chi Zhun: "Research on customer churn prediction and evaluation for telecom operators", China Doctoral Dissertations Full-text Database, Economics and Management Science *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113256044A (en) * | 2020-02-13 | 2021-08-13 | 中国移动通信集团广东有限公司 | Strategy determination method and device and electronic equipment |
CN113256044B (en) * | 2020-02-13 | 2023-08-15 | 中国移动通信集团广东有限公司 | Policy determination method and device and electronic equipment |
CN115481825A (en) * | 2022-11-03 | 2022-12-16 | 广州智算信息技术有限公司 | Customer service personnel loss assessment early warning system based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210365963A1 (en) | Target customer identification method and device, electronic device and medium | |
CN107766929B (en) | Model analysis method and device | |
CN106951925A (en) | Data processing method, device, server and system | |
CN110097066A (en) | A kind of user classification method, device and electronic equipment | |
CN103761254B (en) | Method for matching and recommending service themes in various fields | |
CN110377804A (en) | Method for pushing, device, system and the storage medium of training course data | |
CN108921472A (en) | A kind of two stages vehicle and goods matching method of multi-vehicle-type | |
CN109359137B (en) | User growth portrait construction method based on feature screening and semi-supervised learning | |
CN104915397A (en) | Method and device for predicting microblog propagation tendencies | |
CN106228389A (en) | Network potential usage mining method and system based on random forests algorithm | |
CN105654196A (en) | Adaptive load prediction selection method based on electric power big data | |
CN110766438B (en) | Method for analyzing user behavior of power grid user through artificial intelligence | |
CN110516057B (en) | Petition question answering method and device | |
CN109146569A (en) | A decision-tree-based churn prediction method for communication users | |
CN108876076A (en) | The personal credit methods of marking and device of data based on instruction | |
Globa et al. | Ontology model of telecom operator big data | |
CN103647673A (en) | Method and device for QoS (quality of service) prediction of Web service | |
CN107305640A (en) | A kind of method of unbalanced data classification | |
CN109474755A (en) | Abnormal phone active predicting method and system based on sequence study and integrated study | |
CN108710999A (en) | The confidence level automatic evaluation method of shared resource under a kind of environment based on big data | |
CN106056137B (en) | A kind of business recommended method of telecommunications group based on data mining multi-classification algorithm | |
CN107666403A (en) | The acquisition methods and device of a kind of achievement data | |
CN110163525A (en) | Terminal recommended method and terminal recommender system | |
CN115328870A (en) | Data sharing method and system for cloud manufacturing | |
CN104573034A (en) | CDR call ticket based user group division method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190104 |