CN106204083A - Target user classification method, apparatus and system - Google Patents

Info

Publication number
CN106204083A
Authority
CN
China
Prior art keywords
class
subscriber
probability
feature attribute
target user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510219456.1A
Other languages
Chinese (zh)
Other versions
CN106204083B (en
Inventor
王晓磊
王新印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Shandong Co Ltd
Original Assignee
China Mobile Group Shandong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Shandong Co Ltd
Priority to CN201510219456.1A
Publication of CN106204083A
Application granted
Publication of CN106204083B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a target user classification method, apparatus and system. The method comprises: determining the probability of each user category in the training samples and the probability of each feature attribute group under each user category, where the probability of a feature attribute group under a user category is the ratio of the number of training samples under that user category in which every feature attribute in the group satisfies the preset condition corresponding to the group to the number of training samples of that user category, and the feature attribute groups are mutually independent; using the Bayes formula to determine, from the determined probability of each user category and the conditional probability estimate of each feature attribute group under each user category, the posterior probability of a target user to be classified in each category; and determining the category with the maximum posterior probability as the user category of the target user to be classified. The scheme of the embodiments of the present invention improves the accuracy of target user classification.

Description

Target user classification method, apparatus and system
Technical field
The present invention relates to the field of information technology, and in particular to a target user classification method, apparatus and system.
Background
Data mining technology has been widely applied in recent years, and classification is one of its main components. As the related algorithms have matured, classification algorithms have been applied in many fields. When service industries such as banks, operators and supermarkets promote new products or related activities, they can target their publicity at different users. Accurately identifying target users is the basis of precision marketing: only by identifying the target users of a given category within the consumer base can effective, targeted marketing be carried out. How to classify target users effectively has therefore become a focus of attention across industries.
Existing methods for classifying target users mainly use decision tree methods and Bayesian methods. Bayesian methods combine acyclic graphs with probability theory, have a solid foundation in probability theory, and are widely used. For all user data, feature attributes characterize information about the user. Taking mobile phone users as an example, the user's sex, age, time on network, monthly data traffic, data plan value, number of calls, call charges and so on are all feature attributes. When products or other services are pushed to users, the push can be made according to user category. For example, users older than 30 may be treated as a first target user category and users aged 30 or younger as a second target user category. To classify a target user, the probability with which each category occurs in the data sample is determined first, together with the conditional probability estimate of each feature attribute for each category, i.e. the prior probability. Then, by the Bayes algorithm and using the determined conditional probability estimates, the posterior probability of the target user to be classified in each category is calculated, and the category with the maximum posterior probability is taken as the category of the target user to be classified.
The Bayesian method described above requires the assumption that the feature attributes are mutually independent. In practice, however, the feature attributes of user data are correlated to some degree, so this independence assumption makes the classification of target users inaccurate.
Summary of the invention
Embodiments of the present invention provide a target user classification method, apparatus and system, to solve the problem in the prior art that target user classification accuracy is low.
An embodiment of the present invention provides a target user classification method, comprising:
determining the probability of each user category in the training samples, and a conditional probability estimate of each feature attribute group under each user category, where the probability of a user category is the ratio of the number of training samples under that user category to the total number of training samples, and the conditional probability estimate of a feature attribute group under a user category is the ratio of the number of training samples under that user category in which each feature attribute in the group satisfies the preset condition corresponding to the group to the number of training samples of that user category; each feature attribute group comprises correlated feature attributes extracted from all feature attributes of the training samples, the feature attribute groups are mutually independent, and the feature attributes characterize the training sample data;
using the Bayes formula to determine, from the determined probability of each user category and the conditional probability estimate of each feature attribute group under each user category, the posterior probability of the target user to be classified in each category;
determining the category with the maximum posterior probability as the user category of the target user to be classified.
In the above method provided by the embodiment of the present invention, correlated feature attributes are combined into feature attribute groups, and the feature attribute groups are mutually independent. This satisfies the assumption of the Bayesian method that the parameters are mutually independent, so the accuracy of target user classification is improved.
Further, the posterior probability of the target user to be classified in each category is determined using the following formula:
$$P(X \mid C_i) = P(C_i) \prod_{k=1,\, j=1}^{k=n,\, j=r} P(X_{kj} \mid C_i)$$
where $C_i$ is the $i$-th user category, $1 \le i \le m$, and $m$ is the total number of user categories; $P(X_{kj} \mid C_i)$ is the conditional probability estimate of the $k$-th feature attribute group under user category $C_i$ when the feature attributes of the $k$-th group satisfy preset condition $j$; $n$ is the number of feature attribute groups; $r$ is the number of preset conditions; $P(C_i)$ is the probability that user category $C_i$ occurs; and $P(X \mid C_i)$ is the posterior probability of the target user $X$ to be classified in user category $C_i$.
Further, the above method also comprises:
before the category with the maximum posterior probability is determined as the category of the target user to be classified, comparing the determined maximum posterior probability with a preset risk control coefficient, and determining that the maximum posterior probability is greater than the preset risk control coefficient.
Further, the above method also comprises:
when it is determined that the maximum posterior probability is not greater than the preset risk control coefficient, abandoning the category determination for the target user to be classified.
In this way, target users to be classified whose maximum posterior probability is not greater than the preset risk control coefficient are discarded, which reduces marketing risk and can improve the marketing success rate.
An embodiment of the present invention further provides a target user classification apparatus, comprising:
a first determining unit, configured to determine the probability of each user category in the training samples, and a conditional probability estimate of each feature attribute group under each user category, where the probability of a user category is the ratio of the number of training samples under that user category to the total number of training samples, and the conditional probability estimate of a feature attribute group under a user category is the ratio of the number of training samples under that user category in which each feature attribute in the group satisfies the preset condition corresponding to the group to the number of training samples of that user category; each feature attribute group comprises correlated feature attributes extracted from all feature attributes of the training samples, the feature attribute groups are mutually independent, and the feature attributes characterize the training sample data;
a second determining unit, configured to use the Bayes formula to determine, from the determined probability of each user category and the conditional probability estimate of each feature attribute group under each user category, the posterior probability of the target user to be classified in each category;
a third determining unit, configured to determine the category with the maximum posterior probability as the user category of the target user to be classified.
In the above apparatus provided by the embodiment of the present invention, correlated feature attributes are combined into feature attribute groups, and the feature attribute groups are mutually independent. This satisfies the assumption of the Bayesian method that the parameters are mutually independent, so the accuracy of target user classification is improved.
Further, the second determining unit is specifically configured to determine the posterior probability of the target user to be classified in each category using the following formula:
$$P(X \mid C_i) = P(C_i) \prod_{k=1,\, j=1}^{k=n,\, j=r} P(X_{kj} \mid C_i)$$
where $C_i$ is the $i$-th user category, $1 \le i \le m$, and $m$ is the total number of user categories; $P(X_{kj} \mid C_i)$ is the conditional probability estimate of the $k$-th feature attribute group under user category $C_i$ when the feature attributes of the $k$-th group satisfy preset condition $j$; $n$ is the number of feature attribute groups; $r$ is the number of preset conditions; $P(C_i)$ is the probability that user category $C_i$ occurs; and $P(X \mid C_i)$ is the posterior probability of the target user $X$ to be classified in user category $C_i$.
Further, the above apparatus also comprises:
a comparing unit, configured to compare, before the category with the maximum posterior probability is determined as the category of the target user to be classified, the determined maximum posterior probability with a preset risk control coefficient, and to determine that the maximum posterior probability is greater than the preset risk control coefficient.
Further, the above apparatus also comprises:
an abandoning unit, configured to abandon the category determination for the target user to be classified when it is determined that the maximum posterior probability is not greater than the preset risk control coefficient.
In this way, target users to be classified whose maximum posterior probability is not greater than the preset risk control coefficient are discarded, which reduces marketing risk and can improve the marketing success rate.
An embodiment of the present invention further provides a target user classification system, comprising:
the target user classification apparatus provided by the above embodiment.
Other features and advantages of the application will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the application. The objectives and other advantages of the application can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present invention and form part of the description. Together with the embodiments of the present invention they serve to explain the present invention and do not limit it. In the drawings:
Fig. 1 is a flow chart of the target user classification method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of the target user classification method provided by Embodiment 1 of the present invention;
Fig. 3 is a structural diagram of the target user classification apparatus provided by Embodiment 2 of the present invention.
Detailed description
To provide an implementation that improves target user classification accuracy, embodiments of the present invention provide a target user classification method, apparatus and system. Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and do not limit it. Where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another.
An embodiment of the present invention provides a target user classification method which, as shown in Fig. 1, comprises:
Step 101: determine the probability of each user category in the training samples, and the probability of each feature attribute group under each user category. The probability of a user category is the ratio of the number of training samples under that user category to the total number of training samples. The probability of a feature attribute group under a user category is the ratio of the number of training samples under that user category in which each feature attribute in the group satisfies the preset condition corresponding to the group to the number of training samples of that user category. Each feature attribute group comprises correlated feature attributes extracted from all feature attributes of the training samples, the feature attribute groups are mutually independent, and the feature attributes characterize the training sample data.
Step 102: use the Bayes formula to determine, from the determined probability of each user category and the probability of each feature attribute group under each user category, the posterior probability of the target user to be classified in each category.
Step 103: determine the category with the maximum posterior probability as the user category of the target user to be classified.
In embodiments of the present invention, the target user classification method can be applied in the precision marketing services of merchants or enterprises. For a given marketing service, the training samples may be basic data of users who previously used the service, collected by random sampling. Under the marketing service, one user's data is one instance. The feature attributes characterize the training samples. Taking a mobile service as an example, the training samples comprise the various data of users who previously used the mobile service, and the feature attributes may include the user's sex, age, time on network, monthly data traffic, data plan value, number of calls, call charges and so on.
From all the feature attributes in the training samples for a given marketing service, correlated feature attributes are extracted and formed into feature attribute groups, and the feature attribute groups are mutually independent. Specifically, the number of feature attribute groups can be set flexibly for different marketing services.
A user category is a preset type of user to whom a specific product is pushed.
The method, apparatus and corresponding system provided by the present invention are described in detail below with specific embodiments and with reference to the accompanying drawings.
Embodiment 1:
Fig. 2 is a flow chart of the target user classification method provided by Embodiment 1 of the present invention, which comprises the following processing steps:
Step 201: construct the feature attribute groups.
In this embodiment, for a given marketing service, the basic data of each user who previously used the service is taken as raw sample data, the basic data of each user being one raw sample, and a preset number of raw samples are drawn at random as training samples. The raw sample data contains various feature attributes. In light of the data characteristics of the marketing service, correlated feature attributes are selected from all the feature attributes to form feature attribute groups. Taking a mobile service as an example, the correlated feature attributes can be divided into several groups: a traffic group (traffic ARPU (Average Revenue Per User), monthly data traffic, over-plan data traffic, data plan value), a terminal group (terminal network standard, device age), a call group (number of calls, call charges) and a user spending group (monthly user spending). In this approach all correlated feature attributes are placed in a feature attribute group. Alternatively, only some of the correlated feature attributes may be selected for each group, for example: traffic group (monthly data traffic, traffic ARPU), terminal group (device age), call group (number of calls, call charges) and user spending group (monthly user spending).
Assume the $s$ feature attributes are $A_1, A_2, \ldots, A_s$, the number of user categories is $m$ and the categories are $C_1, C_2, \ldots, C_m$, and the feature attribute values in the training sample data are $(X_1, X_2, \ldots, X_s)$. The $n$ feature attribute groups constructed are $B_1 = (A_1, A_2, A_3)$, $B_2 = (A_4, A_6)$, $B_3 = (A_5)$, ..., $B_n$. As a concrete example, assume the training samples fall into 2 user categories, where $C_1$ is 4G plan users and $C_2$ is non-4G plan users. The training samples number 50,000 users, of whom 5,000 are 4G plan users and 45,000 are non-4G plan users.
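The grouping built in step 201 can be pictured as a mapping from group names to the attributes each group contains. The following is a minimal sketch, assuming a user record is a plain dictionary; the attribute names, the `FEATURE_GROUPS` table and the `project` helper are hypothetical and chosen only to mirror the reduced grouping given in the example above.

```python
# Hypothetical feature attribute groups for a mobile service, following the
# reduced grouping in the example (traffic, terminal, call, user spending).
FEATURE_GROUPS = {
    "traffic": ["monthly_traffic", "traffic_arpu"],
    "terminal": ["device_age"],
    "call": ["call_count", "call_charges"],
    "spending": ["monthly_spending"],
}


def project(record, groups=FEATURE_GROUPS):
    """Split one user record into per-group tuples of attribute values."""
    return {
        name: tuple(record[attr] for attr in attrs)
        for name, attrs in groups.items()
    }


# Illustrative record; the values are made up for the sketch.
user = {
    "monthly_traffic": 8.0,
    "traffic_arpu": 12.5,
    "device_age": 14,
    "call_count": 40,
    "call_charges": 23.0,
    "monthly_spending": 58.0,
}
grouped = project(user)
```

Each group is then treated as a single variable in the later steps, which is how correlated attributes are kept together while the groups themselves are assumed independent.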
Step 202: determine the probability of each user category in the training samples.
In this step, the probability that user category $C_1$ occurs is $P(C_1) = 5000/50000 = 0.1$, and the probability that user category $C_2$ occurs is $P(C_2) = 45000/50000 = 0.9$.
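The category probabilities of step 202 are simple relative frequencies. As a minimal sketch, assuming the training labels are available as a list (the label strings and the `class_priors` function name are hypothetical), the figures above can be reproduced as follows:

```python
from collections import Counter


def class_priors(labels):
    """Return P(C_i) as the share of training samples in each category."""
    counts = Counter(labels)
    total = len(labels)
    return {category: n / total for category, n in counts.items()}


# 5,000 4G plan users and 45,000 non-4G plan users, as in the example.
labels = ["4G"] * 5000 + ["non-4G"] * 45000
priors = class_priors(labels)
```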
Step 203: determine the conditional probability estimate of each feature attribute group under each user category.
In this step, the conditional probability estimate of a feature attribute group under a user category is the ratio of the number of training samples under that user category in which each feature attribute in the group satisfies the preset condition corresponding to the group to the number of training samples of that user category. A feature attribute group may have multiple corresponding preset conditions.
For example, the $k$-th feature attribute group contains 2 feature attributes, monthly data traffic $A_1$ and traffic ARPU $A_2$, and has 4 corresponding preset conditions: (1) $A_1 \le 10$, $A_2 \le 10$; (2) $A_1 \le 10$, $A_2 > 10$; (3) $A_1 > 10$, $A_2 \le 10$; (4) $A_1 > 10$, $A_2 > 10$. In the training sample data of 4G plan users, the numbers of 4G plan users satisfying these 4 preset conditions are 500, 2500, 1000 and 1000 respectively. Then, when the feature attributes of the $k$-th group satisfy the first preset condition, the conditional probability estimate of the $k$-th group under user category $C_1$ is $P(X_{k1} \mid C_1) = 500/5000 = 0.1$; for the second preset condition it is $P(X_{k2} \mid C_1) = 2500/5000 = 0.5$; for the third preset condition it is $P(X_{k3} \mid C_1) = 1000/5000 = 0.2$; and for the fourth preset condition it is $P(X_{k4} \mid C_1) = 1000/5000 = 0.2$. Similarly, the conditional probability estimates of the $k$-th group under user category $C_2$ can be determined for each of the 4 preset conditions.
In the same way, the conditional probability estimate of every other feature attribute group under each user category can be determined for each preset condition corresponding to that group. The conditional probability estimates so determined are the prior probabilities of the feature attribute groups for each user category. This is equivalent to training on the training sample data in the manner of steps 201 to 203 to generate a classifier.
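The worked figures of step 203 can be checked with a short sketch. It assumes the threshold of 10 and the ordering of the four preset conditions given in the example; the function names `condition_index` and `conditional_estimates` are hypothetical.

```python
def condition_index(a1, a2, threshold=10):
    """Map (A1, A2) to preset condition 1..4 as enumerated in the example."""
    if a1 <= threshold:
        return 1 if a2 <= threshold else 2
    return 3 if a2 <= threshold else 4


def conditional_estimates(condition_counts, class_size):
    """P(X_kj | C_i): share of the category's samples meeting condition j."""
    return {j: n / class_size for j, n in condition_counts.items()}


# Counts of 4G plan users (category C1, 5,000 samples) per preset condition.
counts_c1 = {1: 500, 2: 2500, 3: 1000, 4: 1000}
estimates_c1 = conditional_estimates(counts_c1, 5000)
```

Because the four conditions partition the value space of the group, the estimates for one group under one category sum to 1.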
Step 204: use the Bayes formula to determine, from the determined probability of each user category and the conditional probability estimates, the posterior probability of the target user to be classified in each category.
In this step, the posterior probability of the target user to be classified in each category is determined using the following formula:
$$P(X \mid C_i) = P(C_i) \prod_{k=1,\, j=1}^{k=n,\, j=r} P(X_{kj} \mid C_i)$$
where $C_i$ is the $i$-th user category, $1 \le i \le m$, and $m$ is the total number of user categories; $P(X_{kj} \mid C_i)$ is the conditional probability estimate of the $k$-th feature attribute group under user category $C_i$ when the feature attributes of the $k$-th group satisfy preset condition $j$; $n$ is the number of feature attribute groups; $r$ is the number of preset conditions; $P(C_i)$ is the probability that user category $C_i$ occurs; and $P(X \mid C_i)$ is the posterior probability of the target user $X$ to be classified in user category $C_i$.
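The formula of step 204 can be sketched as follows. The sketch assumes one reading of the product: for each feature attribute group, only the factor for the preset condition that user X actually satisfies is multiplied in. That reading, the `posterior_scores` function name and the conditional estimates listed for category C2 are assumptions made for illustration; only the C1 figures and the two category probabilities come from the example.

```python
import math


def posterior_scores(priors, cond_probs, matched):
    """Compute P(C_i) times the product of P(X_kj | C_i) over the groups.

    priors:     {category: P(C_i)}
    cond_probs: {category: {group: {condition j: P(X_kj | C_i)}}}
    matched:    {group: condition j satisfied by the user X}
    """
    scores = {}
    for category, prior in priors.items():
        factors = (cond_probs[category][g][j] for g, j in matched.items())
        scores[category] = prior * math.prod(factors)
    return scores


priors = {"C1": 0.1, "C2": 0.9}
cond_probs = {
    "C1": {"traffic": {1: 0.1, 2: 0.5, 3: 0.2, 4: 0.2}},    # from the example
    "C2": {"traffic": {1: 0.6, 2: 0.05, 3: 0.25, 4: 0.1}},  # hypothetical
}
matched = {"traffic": 2}  # user X satisfies preset condition 2

scores = posterior_scores(priors, cond_probs, matched)
best = max(scores, key=scores.get)
```

With these illustrative numbers the value is 0.1 × 0.5 = 0.05 for C1 and 0.9 × 0.05 = 0.045 for C2, so C1 has the larger value even though its category probability is smaller.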
Step 205: determine whether the maximum posterior probability is greater than the preset risk control coefficient. If so, proceed to step 206; if not, proceed to step 207.
The preset risk control coefficient can be set flexibly according to the actual situation.
Step 206: determine the category with the maximum posterior probability as the category of the target user to be classified.
Step 207: abandon the category determination for the target user to be classified.
In embodiments of the present invention, a marketing service pushes to target users of each category the service corresponding to that category. Even when the category with the maximum posterior probability has been determined, the service corresponding to that category may still be one the target user does not wish to receive. The preset risk control coefficient is therefore used to judge the degree of risk of the category. If the maximum posterior probability is not greater than the risk control coefficient, the classification of the target user is considered risky and inaccurate, the category determination for the target user is abandoned, and no service is subsequently pushed to that target user.
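Steps 205 to 207 amount to a thresholded arg-max. The following is a minimal sketch, assuming the values computed in step 204 are held in a dictionary keyed by category; the function name and the example coefficient values are hypothetical, and the text does not say whether the value is normalized before the comparison, so the sketch compares it as computed.

```python
def classify_with_risk_control(scores, risk_coefficient):
    """Return the category with the largest value, or None if rejected.

    The category is accepted only when its value is strictly greater than
    the preset risk control coefficient (step 206); otherwise the category
    determination is abandoned (step 207).
    """
    best = max(scores, key=scores.get)
    if scores[best] > risk_coefficient:
        return best
    return None


scores = {"C1": 0.05, "C2": 0.045}  # illustrative values
accepted = classify_with_risk_control(scores, risk_coefficient=0.04)
rejected = classify_with_risk_control(scores, risk_coefficient=0.05)
```

Returning `None` for a rejected user makes it straightforward for a caller to skip the push, which matches the statement that no service is subsequently pushed to such a user.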
In the method provided by Embodiment 1 of the present invention, correlated feature attributes are combined into feature attribute groups, and the feature attribute groups are mutually independent. This satisfies the assumption of the Bayesian method that the parameters are mutually independent, so the accuracy of target user classification is improved. In addition, target users to be classified whose maximum posterior probability is not greater than the preset risk control coefficient are discarded, which reduces marketing risk and can improve the marketing success rate.
Embodiment 2:
Based on the same inventive concept, and corresponding to the target user classification method provided by the above embodiment of the present invention, Embodiment 2 of the present invention further provides a target user classification apparatus. Its structure is shown in Fig. 3 and specifically comprises:
a first determining unit 301, configured to determine the probability of each user category in the training samples, and a conditional probability estimate of each feature attribute group under each user category, where the probability of a user category is the ratio of the number of training samples under that user category to the total number of training samples, and the conditional probability estimate of a feature attribute group under a user category is the ratio of the number of training samples under that user category in which each feature attribute in the group satisfies the preset condition corresponding to the group to the number of training samples of that user category; each feature attribute group comprises correlated feature attributes extracted from all feature attributes of the training samples, the feature attribute groups are mutually independent, and the feature attributes characterize the training sample data;
a second determining unit 302, configured to use the Bayes formula to determine, from the determined probability of each user category and the conditional probability estimate of each feature attribute group under each user category, the posterior probability of the target user to be classified in each category;
a third determining unit 303, configured to determine the category with the maximum posterior probability as the user category of the target user to be classified.
Further, the conditional probability estimate determined for each user category when the feature attributes in a feature attribute group satisfy the preset condition corresponding to the group is the ratio of the number of training samples in the training sample data under that user category in which each feature attribute in the group satisfies the preset condition corresponding to the group to the number of training samples under that user category.
Further, the second determining unit 302 is specifically configured to determine the posterior probability of the target user to be classified in each category using the following formula:
$$P(X \mid C_i) = P(C_i) \prod_{k=1,\, j=1}^{k=n,\, j=r} P(X_{kj} \mid C_i)$$
where $C_i$ is the $i$-th user category, $1 \le i \le m$, and $m$ is the total number of user categories; $P(X_{kj} \mid C_i)$ is the conditional probability estimate of the $k$-th feature attribute group under user category $C_i$ when the feature attributes of the $k$-th group satisfy preset condition $j$; $n$ is the number of feature attribute groups; $r$ is the number of preset conditions; $P(C_i)$ is the probability that user category $C_i$ occurs; and $P(X \mid C_i)$ is the posterior probability of the target user $X$ to be classified in user category $C_i$.
Further, said apparatus, also include:
Comparing unit 304, for being defined as described target to be sorted by classification corresponding for posterior probability maximum Before the classification of user, the maximum posterior probability determined is compared with the risk control coefficient preset, And determine that the posterior probability of described maximum is more than the risk control coefficient preset.
Further, said apparatus, also include:
Give up unit 305, for being not more than, when the posterior probability determining described maximum, the risk control system preset During number, give up the classification to described targeted customer to be sorted and judge.
The embodiment of the present invention 2 additionally provides a kind of targeted customer's categorizing system, including:
Above-mentioned targeted customer's sorter that the embodiment of the present invention 2 provides.
The function of above-mentioned each unit may correspond to the respective handling step in flow process shown in Fig. 1 or Fig. 2, at this Repeat no more.
In sum, the scheme that the embodiment of the present invention provides, comprise determining that each class of subscriber in training sample Probability, and the probability of each characteristic attribute group under each class of subscriber, the probability of this class of subscriber was for should The quantity of training sample and the ratio of training sample total quantity, feature under this each class of subscriber under class of subscriber The probability of set of properties is in the training sample under this class of subscriber, and in this feature set of properties, each characteristic attribute is full The number of training of foot pre-conditioned training sample quantity corresponding to this feature set of properties and this class of subscriber The ratio of amount, what this feature set of properties included extracting in all characteristic attributes of training sample has dependency Between characteristic attribute, and each characteristic attribute group separate, the spy of this feature attribute characterization training sample data Point;Use Bayesian formula, according to each characteristic attribute under the probability of each class of subscriber determined and each user The conditional probability of group is estimated, determines the targeted customer to be sorted posterior probability in each classification;By posterior probability Maximum corresponding classification is defined as the class of subscriber of described targeted customer to be sorted.Use the embodiment of the present invention Scheme, improves the accuracy of targeted customer's classification.
The target user classification apparatus provided by the embodiments of this application can be implemented by a computer program. Those skilled in the art should understand that the above module division is only one of many possible module divisions. If the apparatus is divided into other modules, or not divided into modules at all, it still falls within the protection scope of this application as long as the target user classification apparatus has the above functions.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (9)

1. A target user classification method, characterized by comprising:
determining the probability of each user category in the training samples, and the conditional probability estimate of each feature attribute group under each user category, wherein the probability of a user category is the ratio of the number of training samples under that user category to the total number of training samples, and the conditional probability estimate of a feature attribute group under a user category is the ratio of the number of training samples under that user category in which every feature attribute in the feature attribute group satisfies the preset condition corresponding to that group, to the number of training samples of that user category; each feature attribute group comprises mutually correlated feature attributes extracted from all feature attributes of the training samples, the feature attribute groups are mutually independent, and the feature attributes characterize the training sample data;
using the Bayesian formula, determining the posterior probability of the target user to be classified in each category according to the determined probability of each user category and the conditional probability estimate of each feature attribute group under each user category;
determining the category corresponding to the maximum posterior probability as the user category of the target user to be classified.
2. The method of claim 1, characterized in that the posterior probability of the target user to be classified in each category is determined using the following formula:
$$P(X \mid C_i) = P(C_i) \prod_{k=1,\, j=1}^{k=n,\, j=r} P(X_{kj} \mid C_i)$$
where $C_i$ is the $i$-th user category, $1 \le i \le m$, and $m$ is the total number of user categories; $P(X_{kj} \mid C_i)$ denotes the conditional probability estimate of the $k$-th feature attribute group under user category $C_i$ when each feature attribute of the $k$-th feature attribute group satisfies preset condition $j$; $n$ is the number of feature attribute groups; $r$ is the number of preset conditions; $P(C_i)$ denotes the probability that user category $C_i$ occurs; and $P(X \mid C_i)$ denotes the posterior probability of the target user $X$ to be classified in user category $C_i$.
3. The method of claim 1, characterized in that, before the category corresponding to the maximum posterior probability is determined as the category of the target user to be classified, the method further comprises:
comparing the determined maximum posterior probability with a preset risk control coefficient, and determining that the maximum posterior probability is greater than the preset risk control coefficient.
4. The method of claim 3, characterized by further comprising:
when it is determined that the maximum posterior probability is not greater than the preset risk control coefficient, abandoning the category determination for the target user to be classified.
5. A target user classification apparatus, characterized by comprising:
a first determining unit, configured to determine the probability of each user category in the training samples, and the conditional probability estimate of each feature attribute group under each user category, wherein the probability of a user category is the ratio of the number of training samples under that user category to the total number of training samples, and the conditional probability estimate of a feature attribute group under a user category is the ratio of the number of training samples under that user category in which every feature attribute in the feature attribute group satisfies the preset condition corresponding to that group, to the number of training samples of that user category; each feature attribute group comprises mutually correlated feature attributes extracted from all feature attributes of the training samples, the feature attribute groups are mutually independent, and the feature attributes characterize the training sample data;
a second determining unit, configured to use the Bayesian formula to determine the posterior probability of the target user to be classified in each category according to the determined probability of each user category and the conditional probability estimate of each feature attribute group under each user category;
a third determining unit, configured to determine the category corresponding to the maximum posterior probability as the user category of the target user to be classified.
6. The apparatus of claim 5, characterized in that the second determining unit is specifically configured to determine the posterior probability of the target user to be classified in each category using the following formula:
$$P(X \mid C_i) = P(C_i) \prod_{k=1,\, j=1}^{k=n,\, j=r} P(X_{kj} \mid C_i)$$
where $C_i$ is the $i$-th user category, $1 \le i \le m$, and $m$ is the total number of user categories; $P(X_{kj} \mid C_i)$ denotes the conditional probability estimate of the $k$-th feature attribute group under user category $C_i$ when each feature attribute of the $k$-th feature attribute group satisfies preset condition $j$; $n$ is the number of feature attribute groups; $r$ is the number of preset conditions; $P(C_i)$ denotes the probability that user category $C_i$ occurs; and $P(X \mid C_i)$ denotes the posterior probability of the target user $X$ to be classified in user category $C_i$.
7. The apparatus of claim 5, characterized by further comprising:
a comparing unit, configured to, before the category corresponding to the maximum posterior probability is determined as the category of the target user to be classified, compare the determined maximum posterior probability with a preset risk control coefficient, and determine that the maximum posterior probability is greater than the preset risk control coefficient.
8. The apparatus of claim 7, characterized by further comprising:
an abandoning unit, configured to abandon the category determination for the target user to be classified when it is determined that the maximum posterior probability is not greater than the preset risk control coefficient.
9. A target user classification system, characterized by comprising:
the apparatus of any one of claims 5-8.
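The risk control check recited in claims 3, 4, 7 and 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `decide`, the dictionary of per-category scores, and the example threshold values are hypothetical. A return value of `None` stands for abandoning the category determination.

```python
def decide(posteriors, risk_coefficient):
    """Accept the top category only if its score exceeds the risk control coefficient.

    posteriors: {category: posterior probability (or score) for the target user}.
    Returns the category with the maximum value, or None when that maximum
    is not greater than the preset risk control coefficient.
    """
    best = max(posteriors, key=posteriors.get)
    if posteriors[best] > risk_coefficient:
        return best
    return None  # determination abandoned: confidence too low


# Hypothetical scores for one target user and two hypothetical thresholds.
example = {"buyer": 0.1667, "other": 0.025}
accepted = decide(example, 0.1)   # maximum exceeds the coefficient
rejected = decide(example, 0.2)   # maximum does not exceed the coefficient
print(accepted, rejected)
```

The comparison is strict, matching the "greater than" and "not greater than" wording of the claims, so a maximum exactly equal to the coefficient is rejected.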
CN201510219456.1A 2015-04-30 2015-04-30 Target user classification method, device and system Active CN106204083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510219456.1A CN106204083B (en) 2015-04-30 2015-04-30 Target user classification method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510219456.1A CN106204083B (en) 2015-04-30 2015-04-30 Target user classification method, device and system

Publications (2)

Publication Number Publication Date
CN106204083A true CN106204083A (en) 2016-12-07
CN106204083B CN106204083B (en) 2020-02-18

Family

ID=57458538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510219456.1A Active CN106204083B (en) 2015-04-30 2015-04-30 Target user classification method, device and system

Country Status (1)

Country Link
CN (1) CN106204083B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101577866A (en) * 2008-05-07 2009-11-11 中国移动通信集团公司 User classification method, advertisement release method and device
CN101685458A (en) * 2008-09-27 2010-03-31 华为技术有限公司 Recommendation method and system based on collaborative filtering
CN102081655A (en) * 2011-01-11 2011-06-01 华北电力大学 Information retrieval method based on Bayesian classification algorithm
US20110178964A1 (en) * 2010-01-21 2011-07-21 National Cheng Kung University Recommendation System Using Rough-Set and Multiple Features Mining Integrally and Method Thereof
CN103778206A (en) * 2014-01-14 2014-05-07 河南科技大学 Method for providing network service resources
CN104281635A (en) * 2014-03-13 2015-01-14 电子科技大学 Method for predicting basic attributes of mobile user based on privacy feedback
CN104298719A (en) * 2014-09-23 2015-01-21 新浪网技术(中国)有限公司 Method and system for conducting user category classification and advertisement putting based on social behavior

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229989B (en) * 2016-12-14 2020-09-22 北京国双科技有限公司 Method and device for determining attribute category of user attribute
CN108229989A (en) * 2016-12-14 2018-06-29 北京国双科技有限公司 The Attribute class method for distinguishing and device of a kind of determining user property
WO2019114481A1 (en) * 2017-12-13 2019-06-20 腾讯科技(深圳)有限公司 Cluster type recognition method, apparatus, electronic apparatus, and storage medium
CN109962956A (en) * 2017-12-26 2019-07-02 中国电信股份有限公司 For recommending the method and system of communication service to user
CN109962956B (en) * 2017-12-26 2022-06-07 中国电信股份有限公司 Method and system for recommending communication services to a user
CN110580483A (en) * 2018-05-21 2019-12-17 上海大唐移动通信设备有限公司 indoor and outdoor user distinguishing method and device
CN110442722A (en) * 2019-08-13 2019-11-12 北京金山数字娱乐科技有限公司 Method and device for training classification model and method and device for data classification
CN110442722B (en) * 2019-08-13 2022-05-13 北京金山数字娱乐科技有限公司 Method and device for training classification model and method and device for data classification
CN111324641A (en) * 2020-02-19 2020-06-23 腾讯科技(深圳)有限公司 Personnel estimation method and device, computer-readable storage medium and terminal equipment
CN111324641B (en) * 2020-02-19 2022-09-09 腾讯科技(深圳)有限公司 Personnel estimation method and device, computer-readable storage medium and terminal equipment
CN111797942A (en) * 2020-07-23 2020-10-20 深圳壹账通智能科技有限公司 User information classification method and device, computer equipment and storage medium
CN113111284A (en) * 2021-04-12 2021-07-13 中国铁塔股份有限公司 Classification information display method and device, electronic equipment and readable storage medium
CN113591018A (en) * 2021-07-30 2021-11-02 中国联合网络通信集团有限公司 Communication client classification management method, system, electronic device and storage medium

Also Published As

Publication number Publication date
CN106204083B (en) 2020-02-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant