CN105701498A - User classification method and server - Google Patents

User classification method and server

Publication number
CN105701498A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201511033392.2A
Other languages
Chinese (zh)
Other versions
CN105701498B (en)
Inventor
王莉峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201511033392.2A
Publication of CN105701498A
Application granted
Publication of CN105701498B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses a user classification method and server. The method includes: obtaining at least one labeled user having a first attribute based on historical service data of social network users; obtaining, from at least one dimension, at least one feature parameter corresponding to the labeled user, and determining a classification model for the first attribute of users based on the feature parameters of the labeled user and the first attribute corresponding to the labeled user; and, based on the classification model for the first attribute of users, assigning to at least one target user in the social network the category of the first attribute corresponding to that target user.

Description

User classification method and server
Technical field
The present invention relates to user information processing technology in the communications field, and in particular to a user classification method and server.
Background
In current social networks and media information delivery systems, the attribute content that users fill in when registering on the social network, such as their relationship status, is used directly to deliver media information by category. However, user-supplied attribute content has two problems. First, coverage is incomplete: a user may never fill in the attribute. Second, the content is inaccurate: attributes expire and are not updated in time, so they lack timeliness. Classification based on user-supplied attributes in current social networks can therefore be inaccurate.
Summary of the invention
In view of this, an object of the present invention is to provide a user classification method and server that can at least solve the above problems in the prior art.
To achieve the above object, the technical solution of the present invention is realized as follows:
An embodiment of the present invention provides a user classification method, the method including:
Based on historical service data of social network users, obtaining at least one labeled user having a first attribute, where the first attribute characterizes the relationship status of the social network user;
Obtaining, from at least one dimension, at least one feature parameter corresponding to the labeled user, and determining a classification model for the first attribute of users based on the feature parameters of the labeled user and the first attribute corresponding to the labeled user;
Based on the classification model for the first attribute of users, assigning to at least one target user in the social network the category of the first attribute corresponding to that target user.
An embodiment of the present invention provides a server, including:
A user acquiring unit, configured to obtain, based on historical service data of social network users, at least one labeled user having a first attribute, where the first attribute characterizes the relationship status of the social network user;
A model establishing unit, configured to obtain, from at least one dimension, at least one feature parameter corresponding to the labeled user, and to determine a classification model for the first attribute of users based on the feature parameters of the labeled user and the first attribute corresponding to the labeled user;
A classification unit, configured to assign, based on the classification model for the first attribute of users, to at least one target user in the social network the category of the first attribute corresponding to that target user.
The embodiments of the present invention provide a user classification method and server. At least one labeled user having the first attribute is obtained based on historical service data; the classification model for the first attribute of users is determined based on at least one feature parameter from at least one dimension and the first attribute of the labeled user; and at least one target user is assigned a category according to the classification model. This avoids the problem that a target user cannot be accurately categorized because the user has not filled in the first attribute or the filled-in first attribute is out of date.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the user classification method according to an embodiment of the present invention;
Fig. 2 is schematic diagram one of a scenario of selecting labeled users according to an embodiment of the present invention;
Fig. 3 is schematic diagram one of a scenario of selecting labeled users according to an embodiment of the present invention;
Fig. 4 is schematic diagram one of a scenario of selecting labeled users according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a user feature extraction scenario according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of feature extraction content according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the logic of establishing the classification model according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the composition of the server according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the hardware composition of the server according to an embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
An embodiment of the present invention provides a user classification method. As shown in Fig. 1, the method includes:
Step 101: based on historical service data of social network users, obtain at least one labeled user having a first attribute, where the first attribute characterizes the relationship status of the social network user;
Step 102: obtain, from at least one dimension, at least one feature parameter corresponding to the labeled user, and determine a classification model for the first attribute of users based on the feature parameters of the labeled user and the first attribute corresponding to the labeled user;
Step 103: based on the classification model for the first attribute of users, assign to at least one target user in the social network the category of the first attribute corresponding to that target user.
Here, the scheme provided by this embodiment can be applied on the server side.
The classification model for the first attribute of users takes the feature parameters of a user as input and outputs the category of the first attribute corresponding to that user.
Before step 101 is performed to obtain at least one labeled user having the first attribute, the method further includes:
Based on the historical service data of social network users, selecting at least one first-type initial user whose first attribute is a first category, where the first attribute includes a first category and a second category that differ from each other. The first attribute may be the marital status of the user; accordingly, the first attribute may have two categories, where the first category may be married and the second category may be unmarried;
Based on the historical service data of the first-type initial users, determining the common features corresponding to the first-type initial users;
Based on the common features corresponding to the first-type initial users, selecting from the social network at least one second-type initial user whose difference from the common features of the first-type initial users exceeds a preset threshold, and setting the first attribute of the second-type initial user to the second category;
Based on the historical service data of the first-type initial users and the second-type initial users, establishing the classification model for the first attribute of users.
The method of selecting at least one first-type initial user whose first attribute is the first category may include: according to the historical service data of users, selecting the users who have set the first attribute to the first category as first-type initial users. The first category is married; accordingly, the first-type initial users are married users. The first-type initial users are selected first because the relationship status filled in at registration is assumed to be accurate and merely at risk of not having been updated for a long time. For the "married" state, however, once a user enters this state it basically does not change in reality, so the data in this state can be considered very accurate.
At least one second-type initial user is selected from all users other than the at least one first-type initial user. Referring to Fig. 2, the at least one first-type initial user is treated as positive examples (positive data); after the first-type initial users are excluded, a preset ratio of second-type initial users is randomly selected as negative examples (negative data) from all remaining users, that is, from the unlabeled data; and the classification model for the first attribute of users is established and trained using the first-type and second-type initial users as training data.
The preset ratio can be configured according to the actual situation. For example, 30% of the remaining users may be selected as second-type initial users, or 50% of them may be selected.
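The random selection of negative examples at a preset ratio can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `sample_negatives`, the user IDs, and the fixed random seed are all assumptions introduced here.

```python
import random

def sample_negatives(all_user_ids, positive_ids, ratio=0.3, seed=0):
    """Randomly pick a preset ratio of the unlabeled users as negative examples."""
    positives = set(positive_ids)
    # Unlabeled data: every user that is not a first-type (positive) initial user.
    unlabeled = sorted(uid for uid in all_user_ids if uid not in positives)
    k = int(len(unlabeled) * ratio)
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(unlabeled, k)

# Hypothetical data: 20 users, 5 of whom set their status to "married".
all_users = [f"u{i}" for i in range(20)]
married = ["u0", "u1", "u2", "u3", "u4"]
negatives = sample_negatives(all_users, married, ratio=0.3)
```

With 15 unlabeled users and a ratio of 30%, the sketch draws 4 negative examples, none of which overlaps the positive set.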
The classification model for the first attribute of users may be a binary classifier that judges whether a user is "married". It is trained with the Logistic Regression (LR) machine learning algorithm, and the resulting model is the LR model.
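As a rough illustration of such a binary classifier, the sketch below trains a logistic regression model by batch gradient descent in plain Python. The two toy features, the learning rate, and the epoch count are assumptions made for this example; the source does not specify the features or the training procedure.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_lr(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_married_prob(model, x):
    """Return the estimated probability that the user is married."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical features: [share of parenting content, share of dating content].
X = [[0.9, 0.1], [0.8, 0.0], [0.7, 0.2], [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]]
y = [1, 1, 1, 0, 0, 0]  # 1 = married (positive example), 0 = negative example
model = train_lr(X, y)
```

Calling `predict_married_prob(model, features)` on a target user then gives the probability used to assign the "married" or "unmarried" category.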
Further, obtaining at least one labeled user having the first attribute may include:
Based on the historical service data of social network users, selecting at least one user who has set the first attribute as a pending user;
Classifying the pending user based on the classification model for the first attribute of users to obtain a classification result for the pending user;
Determining the probability that the classification result matches the first attribute of the pending user, and selecting the pending users whose probability is higher than a preset probability threshold as labeled users.
The content set in the first attribute can be obtained from the user's label. Among the users who have set the first attribute, the content set may take many values, for example: married, unmarried, single, has children, newly married, in a relationship, engaged, broken up, divorced, and so on;
Accordingly, when determining the probability that the classification result matches the first attribute of the pending user, the corresponding category may first be chosen for the pending user according to the content set in the user's first attribute. For example, the settings corresponding to the married category may be: married, newly married, has children; the settings corresponding to the unmarried category may be: unmarried, single, in a relationship, engaged, broken up, divorced, and so on.
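The mapping from the many user-supplied status values to the two categories can be sketched as a simple lookup. The English label strings below are assumed renderings of the status values listed above, and `to_category` is a hypothetical helper name.

```python
# Assumed English renderings of the status values; 1 = married, 0 = unmarried.
MARRIED_LABELS = {"married", "newly married", "has children"}
UNMARRIED_LABELS = {
    "unmarried", "single", "in a relationship",
    "engaged", "broken up", "divorced",
}

def to_category(raw_status):
    """Return 1 for the married category, 0 for unmarried, None if unknown."""
    status = raw_status.strip().lower()
    if status in MARRIED_LABELS:
        return 1
    if status in UNMARRIED_LABELS:
        return 0
    return None
```

Returning `None` for unrecognized values keeps users with unexpected status text out of the candidate set instead of guessing a category for them.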
Building on Fig. 2, Fig. 3 illustrates the data acquisition process described above. Specifically, every user in the social network who has filled in a relationship status is classified by the model to estimate whether the user belongs to the "married" population, with probability p(c | instance). The data that satisfy the following conditions are retained as the multi-class candidate training data set:
p(c=0 | instance, label=0) > threshold1
p(c=1 | instance, label=1) > threshold2
Here, c is the category estimated by the classification model for the first attribute of users, that is, whether the user is judged to be married based on at least one second attribute of the user and the classification model; instance is a pending user; and label is the category annotated for the instance, that is, whether it is "married". The thresholds are cut-off values: threshold1 retains the users estimated as unmarried with high probability, and threshold2 retains the users estimated as married with high probability.
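The two retention conditions can be sketched as a filter over pending users. The sketch assumes a binary model, so that p(c=0 | instance) = 1 - p(c=1 | instance); the threshold values 0.8 and 0.9 and the tuple layout are hypothetical.

```python
def select_labeled_users(candidates, threshold1=0.8, threshold2=0.9):
    """Keep pending users whose own label the model confirms with high probability.

    candidates: list of (user_id, label, p_married), where label is 1 for
    married and 0 for unmarried, and p_married is the model's p(c=1 | instance).
    """
    kept = []
    for user_id, label, p_married in candidates:
        if label == 0 and (1.0 - p_married) > threshold1:
            kept.append((user_id, label))  # p(c=0 | instance, label=0) > threshold1
        elif label == 1 and p_married > threshold2:
            kept.append((user_id, label))  # p(c=1 | instance, label=1) > threshold2
    return kept

# Hypothetical pending users with model outputs.
pending = [("a", 1, 0.95), ("b", 1, 0.60), ("c", 0, 0.10), ("d", 0, 0.50)]
labeled = select_labeled_users(pending)
```

Here users "a" and "c" are retained because the model agrees with their self-reported label above the corresponding threshold, while "b" and "d" are discarded as uncertain.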
It can be seen that, with the above scheme, at least one labeled user having the first attribute is obtained based on historical service data, the classification model for the first attribute of users is determined based on at least one feature parameter from at least one dimension and the first attribute of the labeled user, and at least one target user is assigned a category according to the classification model. This avoids the problem that a target user cannot be accurately categorized because the user has not filled in the first attribute or the filled-in first attribute is out of date.
Embodiment two
An embodiment of the present invention provides a user classification method. As shown in Fig. 1, the method includes:
Step 101: based on historical service data of social network users, obtain at least one labeled user having a first attribute, where the first attribute characterizes the relationship status of the social network user;
Step 102: obtain, from at least one dimension, at least one feature parameter corresponding to the labeled user, and determine a classification model for the first attribute of users based on the feature parameters of the labeled user and the first attribute corresponding to the labeled user;
Step 103: based on the classification model for the first attribute of users, assign to at least one target user in the social network the category of the first attribute corresponding to that target user.
Here, the scheme provided by this embodiment can be applied on the server side.
The classification model for the first attribute of users takes the feature parameters of a user as input and outputs the category of the first attribute corresponding to that user.
Before step 101 is performed to obtain at least one labeled user having the first attribute, the method further includes:
Based on the historical service data of social network users, selecting at least one first-type initial user whose first attribute is a first category, where the first attribute includes a first category and a second category that differ from each other. The first attribute may be the marital status of the user; accordingly, the first attribute may have two categories, where the first category may be married and the second category may be unmarried;
Based on the historical service data of the first-type initial users, determining the common features corresponding to the first-type initial users;
Based on the common features corresponding to the first-type initial users, selecting from the social network at least one second-type initial user whose difference from the common features of the first-type initial users exceeds a preset threshold;
Based on the historical service data of the first-type initial users and the second-type initial users, establishing the classification model for the first attribute of users.
The method of selecting at least one first-type initial user whose first attribute is the first category may include: according to the historical service data of users, selecting the users who have set the first attribute to the first category as first-type initial users. The first category is married; accordingly, the first-type initial users are married users. The first-type initial users are selected first because the relationship status filled in at registration is assumed to be accurate and merely at risk of not having been updated for a long time. For the "married" state, however, once a user enters this state it basically does not change in reality, so the data in this state can be considered very accurate.
Based on the above operations, this embodiment further provides that selecting at least one second-type initial user from all users other than the at least one first-type initial user includes:
Based on the historical service data of the first-type initial users, determining the common features corresponding to the first-type initial users;
Based on the common features corresponding to the first-type initial users, selecting from the social network at least one second-type initial user whose difference from the common features of the first-type initial users exceeds a preset threshold.
For the selection of negative examples, a random strategy can leave data in the unlabeled data that should be positive but has not been annotated, because the proportion of married users is very high in reality. Therefore, more reliable negative examples can be randomly selected for training only from data that differs greatly from the known positive data. Here, the cosine similarity between sample features (such as interest preference distributions) can serve as the basis for this judgment.
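A minimal sketch of this similarity-based selection follows. It assumes that the common features of the positive users are summarized by the centroid of their feature vectors and that "difference exceeding a threshold" is expressed as cosine similarity falling below `max_similarity`; both choices, along with all names and values, are assumptions for illustration.

```python
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise mean, used here as the positives' common feature."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def reliable_negatives(positive_vecs, unlabeled, max_similarity=0.5, k=None, seed=0):
    """Pick negatives only among unlabeled users far from the positive centroid.

    unlabeled: dict mapping user_id -> feature vector.
    """
    center = centroid(positive_vecs)
    far = sorted(uid for uid, vec in unlabeled.items()
                 if cosine_similarity(vec, center) < max_similarity)
    if k is None or k >= len(far):
        return far
    return random.Random(seed).sample(far, k)  # random draw among the far users

# Hypothetical interest-preference vectors.
positives = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
unlabeled = {"u1": [1.0, 0.1, 0.0], "u2": [0.0, 1.0, 0.0], "u3": [0.0, 0.2, 1.0]}
negatives = reliable_negatives(positives, unlabeled)
```

User "u1" closely resembles the married users and is therefore excluded from the negative pool, which is the point of replacing the purely random strategy.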
The classification model for the first attribute of users may be a binary classifier that judges whether a user is "married". It is trained with the Logistic Regression (LR) machine learning algorithm, and the resulting model is the LR model.
Further, obtaining at least one labeled user having the first attribute may include:
Based on the historical service data of social network users, selecting at least one user who has set the first attribute as a pending user;
Classifying the pending user based on the classification model for the first attribute of users to obtain a classification result for the pending user;
Determining the probability that the classification result matches the first attribute of the pending user, and selecting the pending users whose probability is higher than a preset probability threshold as labeled users.
The content set in the first attribute can be obtained from the user's label. Among the users who have set the first attribute, the content set may take many values, for example: married, unmarried, single, has children, newly married, in a relationship, engaged, broken up, divorced, and so on;
Accordingly, when determining the probability that the classification result matches the first attribute of the pending user, the corresponding category may first be chosen for the pending user according to the content set in the user's first attribute. For example, the settings corresponding to the married category may be: married, newly married, has children; the settings corresponding to the unmarried category may be: unmarried, single, in a relationship, engaged, broken up, divorced, and so on.
Preferably, after selecting the labeled users, this embodiment may further calibrate them to better ensure the quality of the training data. Specifically, after the pending users whose probability is higher than the preset probability threshold are selected as labeled users, the method further includes:
Obtaining, from at least one dimension, the historical service data corresponding to the labeled users;
Screening the labeled users based on the historical service data of the at least one dimension to obtain the screened labeled users.
The at least one dimension may include at least one of the following: the frequency with which the user browses websites of a preset type; the type of user group the user has joined; the type of target data the user operates on; and the content corresponding to a preset type of attribute of the user. The preset type of website may be a dating and marriage website; a user group may be a singles group, a mother-and-baby group, and so on; the target data operated on may be the types of photos in an album.
For example, a user who often browses dating and matchmaking websites cannot be in a non-"unmarried" training set; a user who is often active in mother-and-baby groups cannot be in a non-"married & parenting" training set; and a user whose album contains wedding photos cannot appear in a non-"newly married & married" training set.
At least one second-type initial user is selected from all users other than the at least one first-type initial user. Referring to Fig. 2, the at least one first-type initial user is treated as positive examples (positive data); after the first-type initial users are excluded, a preset ratio of second-type initial users is randomly selected as negative examples (negative data) from all remaining users, that is, from the unlabeled data; and the classification model for the first attribute of users is established and trained using the first-type and second-type initial users as training data.
Building on Fig. 2, Fig. 3 illustrates the data acquisition process described above. Specifically, every user in the social network who has filled in a relationship status is classified by the model to estimate whether the user belongs to the "married" population, with probability p(c | instance). The data that satisfy the following conditions are retained as the multi-class candidate training data set:
p(c=0 | instance, label=0) > threshold1
p(c=1 | instance, label=1) > threshold2
Here, c is the category estimated by the classification model for the first attribute of users, that is, whether the user is judged to be married based on at least one second attribute of the user and the classification model; instance is a pending user; and label is the category annotated for the instance, that is, whether it is "married". The thresholds are cut-off values: threshold1 retains the users estimated as unmarried with high probability, and threshold2 retains the users estimated as married with high probability.
With further reference to Fig. 4, data calibration is performed: to further ensure the quality of the training data, manually defined rules are used to correct the candidate training data set, as follows. Users with high accuracy under each state are collected. For example, a user who often browses dating and matchmaking websites cannot be in a non-"unmarried" training set; a user who is often active in mother-and-baby groups cannot be in a non-"married & parenting" training set; a user whose album contains wedding photos cannot appear in a non-"newly married & married" training set; and so on. A user under 18 years old can only be "in a relationship" or "unmarried". In this way, a large labeled data set of users with relationship status can be obtained for training the model.
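The calibration rules can be sketched as a list of (condition, allowed labels) pairs applied to each candidate. The field names, the label strings, and the reading of "married & parenting" and "newly married & married" as sets of allowed labels are all assumptions made for this illustration.

```python
# Each rule: (predicate on the user record, set of labels that stay consistent).
RULES = [
    (lambda u: u.get("browses_dating_sites", False), {"unmarried"}),
    (lambda u: u.get("active_in_parenting_groups", False), {"married", "parenting"}),
    (lambda u: u.get("has_wedding_photos", False), {"newly married", "married"}),
    (lambda u: u.get("age", 99) < 18, {"in a relationship", "unmarried"}),
]

def calibrate(users):
    """Drop candidates whose label contradicts any rule that applies to them."""
    kept = []
    for user in users:
        consistent = all(user["label"] in allowed
                         for applies, allowed in RULES if applies(user))
        if consistent:
            kept.append(user)
    return kept

# Hypothetical candidate training records.
candidates = [
    {"id": "a", "label": "married", "browses_dating_sites": True},
    {"id": "b", "label": "married", "has_wedding_photos": True},
    {"id": "c", "label": "married", "age": 16},
    {"id": "d", "label": "unmarried", "age": 16, "browses_dating_sites": True},
]
calibrated = calibrate(candidates)
```

In this toy run, "a" and "c" are removed because their behavior or age contradicts the "married" label, leaving "b" and "d" in the training set.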
It can be seen that, with the above scheme, at least one labeled user having the first attribute is obtained based on historical service data, the classification model for the first attribute of users is determined based on at least one feature parameter from at least one dimension and the first attribute of the labeled user, and at least one target user is assigned a category according to the classification model. This avoids the problem that a target user cannot be accurately categorized because the user has not filled in the first attribute or the filled-in first attribute is out of date.
Embodiment three
An embodiment of the present invention provides a user classification method. As shown in Fig. 1, the method includes:
Step 101: based on historical service data of social network users, obtain at least one labeled user having a first attribute, where the first attribute characterizes the relationship status of the social network user;
Step 102: obtain, from at least one dimension, at least one feature parameter corresponding to the labeled user, and determine a classification model for the first attribute of users based on the feature parameters of the labeled user and the first attribute corresponding to the labeled user;
Step 103: based on the classification model for the first attribute of users, assign to at least one target user in the social network the category of the first attribute corresponding to that target user.
Here, the scheme provided by this embodiment can be applied on the server side.
The classification model for the first attribute of users takes the feature parameters of a user as input and outputs the category of the first attribute corresponding to that user.
Before step 101 is performed to obtain at least one labeled user having the first attribute, the method further includes:
Based on the historical service data of social network users, selecting at least one first-type initial user whose first attribute is a first category, where the first attribute includes a first category and a second category that differ from each other. The first attribute may be the marital status of the user; accordingly, the first attribute may have two categories, where the first category may be married and the second category may be unmarried;
Based on the historical service data of the first-type initial users, determining the common features corresponding to the first-type initial users;
Based on the common features corresponding to the first-type initial users, selecting from the social network at least one second-type initial user whose difference from the common features of the first-type initial users exceeds a preset threshold;
Based on the historical service data of the first-type initial users and the second-type initial users, establishing the classification model for the first attribute of users.
The method of selecting at least one first-type initial user whose first attribute is the first category may include: according to the historical service data of users, selecting the users who have set the first attribute to the first category as first-type initial users. The first category is married; accordingly, the first-type initial users are married users. The first-type initial users are selected first because the relationship status filled in at registration is assumed to be accurate and merely at risk of not having been updated for a long time. For the "married" state, however, once a user enters this state it basically does not change in reality, so the data in this state can be considered very accurate.
The preset ratio can be configured according to the actual situation. For example, 30% of the remaining users may be selected as second-type initial users, or 50% of them may be selected.
Based on aforesaid operations, the present embodiment additionally provides described chooses at least one Equations of The Second Kind initial user from the whole users removing at least one first kind initial user described, including:
History based on described first kind initial user services data, it is determined that the common characteristic that described first kind initial user is corresponding;
Based on the common characteristic that described first kind initial user is corresponding, from described social networks, choose the common characteristic difference value with described first kind initial user exceed at least one Equations of The Second Kind initial user of predetermined threshold value。
Selection for negative example, randomized policy may result in there are the data that should be Positive and be not marked out in Unlabeleddata, because married user's accounting is significantly high in reality, so, it is possible to only from the data big with known Positivedata comparison in difference, randomly select more structurally sound negative example for training。Here the cosine similarity between sample characteristics (as interest preference is distributed) can be passed through as judging basis。
The classification model for the first attribute of users can be a binary classifier used to judge whether a user is "married". It is trained with the Logistic Regression (LR) machine learning algorithm, and the resulting model is referred to as the LR Model.
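To make the binary "married" classifier concrete, the following is a small logistic regression trained with stochastic gradient descent in plain Python. It is a sketch under stated assumptions: the two-dimensional toy features, the learning rate and the epoch count are invented for illustration, and a production system would more likely rely on an established machine learning library.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_lr(samples, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the score
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict_proba(model, x):
    """Estimated probability that the user is married (category 1)."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Toy features: positives lean on the first feature, negatives on the second.
X = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
y = [1, 1, 0, 0]
lr_model = train_lr(X, y)
```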
Further, obtaining at least one marked user possessing the first attribute may include:
choosing, based on the historical service data of social network users, at least one user who has set the first attribute as a pending user;
classifying the pending user based on the classification model for the first attribute of users, to obtain a classification result for the pending user;
determining the probability that the first attribute of the pending user agrees with the corresponding classification result, and choosing the pending users whose probability is higher than a preset probability threshold as marked users.
The content set in the first attribute can be obtained from the user's label. Among the at least one user who has set the first attribute, the content that users set for the first attribute can take multiple values, which may include: married, unmarried, single, has children, newly married, in love, engaged, separated, divorced, and so on;
Accordingly, when determining the probability that the first attribute of the pending user agrees with the corresponding classification result, a category can first be chosen for the pending user according to the content set in the pending user's first attribute. For example, the first-attribute settings corresponding to the married category may be: married, newly married, has children; the first-attribute settings corresponding to the unmarried category may be: unmarried, single, in love, engaged, separated, divorced, and so on.
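The mapping from the content a user has set to one of the two categories can be sketched as a simple lookup table. The English label strings and the numeric category codes below are illustrative assumptions; the actual labels used by the social network are not specified here.

```python
MARRIED, UNMARRIED = 1, 0

# Hypothetical label strings grouped into the two categories described above.
LABEL_TO_CATEGORY = {
    "married": MARRIED,
    "newly married": MARRIED,
    "has children": MARRIED,
    "unmarried": UNMARRIED,
    "single": UNMARRIED,
    "in love": UNMARRIED,
    "engaged": UNMARRIED,
    "separated": UNMARRIED,
    "divorced": UNMARRIED,
}

def category_for(label):
    """Return the category for a self-reported label, or None if it is unknown."""
    return LABEL_TO_CATEGORY.get(label.strip().lower())
```

A label that is not in the table yields `None`, so such users can simply be left out of the pending set.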
Preferably, after choosing the marked users, this embodiment can further guarantee the quality of the training data by calibrating the marked users. Specifically, after choosing the pending users whose probability is higher than the preset probability threshold as marked users, the method further includes:
obtaining, from each of at least one dimension, the historical service data corresponding to the marked users;
screening the marked users based on the historical service data of the at least one dimension, to obtain the screened marked users.
The at least one dimension may include at least one of the following: the frequency with which the user browses websites of a preset type; the type of user groups the user has joined; the type of target data the user operates on; the content corresponding to an attribute of a preset type of the user. The preset type can be websites of the dating and marriage type; the user groups can be singles groups, mother-and-baby groups, and so on; the target data operated on can be the type of photos in a photo album.
Further, the key points of the user relationship status classifier are user feature extraction and algorithm design, of which extracting effective features is the most important. Referring to Fig. 5, the data source represents the data of the users on which feature extraction is performed; feature extraction can be carried out according to at least one dimension, the features are given a normalized representation, and mutually non-overlapping features are chosen from the extracted features.
This embodiment describes the building, training and tuning of the classification model for the first attribute of users. Obtaining, from at least one dimension, at least one feature parameter corresponding to the marked user includes at least one of the following:
obtaining basic attribute parameters of the marked user based on the historical service data of the marked user;
obtaining, based on the historical service data of the marked user, operation parameters of the marked user with respect to target data;
obtaining, based on the historical service data of the marked user, interaction feature parameters determined from the interaction data between the marked user and users other than the marked user.
As shown in Fig. 6, these mainly fall into the following classes:
Demographic attributes (Demographics): basic attribute information of the user, including age, gender, occupation, education level, consumption habits, place of origin, permanent residence, and so on;
Behavioral interests (Behavioral): the user's commercial interests and keyword tags, mined from sources including groups, ad clicks, mobile apps, web browsing, and so on;
Remarketing rules (Remarketing Rule): rule identification information generated from the ID packages submitted and uploaded by advertisers; the rule identification information can also be associated with advertising information.
Further, the at least one feature parameter is described as follows:
The basic attribute parameters of the marked user include at least one of the following: login location information, login time period, groups with preset names that the user has joined, and the interaction frequency in those groups;
The operation parameters of the marked user with respect to target data include at least: the operation frequency and the operation time period for target information of a preset type;
The interaction feature parameters determined from the interaction data between the marked user and users other than the marked user include at least one of the following: the gender attribute of the other users, the interaction frequency between the other users and the marked user, and the login address information shared with the other users.
Accordingly, the history based at least one dimension described services data, described mark user is screened, the mark user after being screened, it is possible to at least one of:
The condition of predeterminated frequency and preset time period is met for the operation frequency of target information of preset kind and operation time period;Such as, LBS behavior: be always active in the youngster in campus it is more likely that unmarried or in love;Line duration section: total late into the night, online user was it is more likely that unmarried user;Good friend's packet name: whether comprise the packet of specific appellation and interactive frequency;
The interaction feature parameter that interaction data between described mark user and other users except described mark user is determined meets pre-conditioned;
Such as, the gender attribute of other users described is different from the gender attribute of described mark user, that is, described mark user often chats with friends of the opposite sex, the more likely unmarried user of right and wrong, it is, of course, also possible to whether consider between described mark user and other users described is mutually be satisfied by described pre-conditioned simultaneously, being namely used for judging whether is unique interactive object of the other side;And can also judge whether other users are the good friend comprising specific appellation, and interactive frequency between the two;
Login behavior based on mark user with other users judges, such as, whether two men and women good friends log in frequently by same IP, especially distinguish evening, weekend, festivals or holidays;
Furthermore it is also possible to get the love and marriage state of other users described: more good friend's love and marriage state is more likely consistent with contacting。
Operation frequency and operation time period based on the target information for preset kind, it is judged that whether the operation frequency for the target information of preset kind meets frequency threshold, and whether operation time period meets preset period of time requirement;
Such as, photograph album classification: whether upload new marriage, child-bearing class photograph album in the recent period;
Or, UGC is dynamic: whether delivered the word of lovers, new marriage, child-bearing class in the recent period。
Referring to Fig. 7, on the basis of Fig. 5, one or more features can be chosen as user features, according to the feature configuration, from the multiple features extracted on the left side. The marked data formed from the marked users is then matched with the user features to obtain training data and test data. The training data and test data can be chosen according to the actual situation; for example, one out of every 4 records can be chosen as test data and the rest used as training data;
The classification model is trained based on the training data, where training means using multiple features of a user as input data and the known type of that user as the result to train the classification model;
The classification model is used for prediction based on the test data, where prediction means using multiple features of a user as input data, obtaining the corresponding output result from the classification model, and judging the probability that the output result matches the type of the user. When the probability is higher than a preset threshold, the classification model is determined to have been built successfully; otherwise, training continues.
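The split and the acceptance check described above can be sketched as follows. The helper names, the 0.9 acceptance threshold and the one-feature toy predictor are assumptions made for illustration; the "probability that the output matches the user's type" is approximated here by the match rate on the test data.

```python
def split_every_fourth(records):
    """Hold out every 4th record as test data; the rest is training data."""
    train, test = [], []
    for index, record in enumerate(records):
        (test if index % 4 == 3 else train).append(record)
    return train, test

def match_rate(predict, test_records):
    """Fraction of test records whose predicted category equals the known one."""
    hits = sum(1 for features, known in test_records if predict(features) == known)
    return hits / len(test_records)

def model_accepted(predict, test_records, min_rate=0.9):
    """Accept the model when the match rate exceeds the preset threshold."""
    return match_rate(predict, test_records) > min_rate

# Each record is (features, known category); values are invented toy data.
records = [
    ([0.9], 1), ([0.8], 1), ([0.2], 0), ([0.1], 0),
    ([0.7], 1), ([0.3], 0), ([0.6], 0), ([0.4], 0),
]
train_data, test_data = split_every_fourth(records)

def toy_predict(features):
    # Stand-in for a trained classification model.
    return 1 if features[0] > 0.5 else 0

accepted = model_accepted(toy_predict, test_data)
```

If `accepted` is false, training would continue with adjusted data or parameters.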
For building and training the classification model, we tried two strategies in parallel: a single Softmax Regression multi-class classifier and multiple One-vs-All Logistic Regression binary classifiers. By tuning the training data scale, the ratio of positive to negative examples, the optimization algorithm, the regularization factor and so on, the classifier strategy and parameters are chosen and the optimal model is learned. Finally, classification estimates are made for all users, and the relationship status label with the highest probability is chosen for each user. To ensure accuracy, a cut-off threshold can be applied to the highest probability, ultimately balancing accuracy against user coverage to achieve the best effect.
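The final step, choosing the highest-probability label and cutting it off with a threshold, can be illustrated as below. The softmax over raw scores, the label names and the 0.6 cut-off are assumptions for the sake of the example; the scores themselves would come from the trained classifier.

```python
import math

def softmax(scores):
    """Turn raw per-label scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

def pick_label(probabilities, min_probability=0.6):
    """Return the most probable label, or None when it is below the cut-off."""
    label, probability = max(probabilities.items(), key=lambda item: item[1])
    return label if probability >= min_probability else None

confident = pick_label(softmax({"single": 0.0, "in love": 0.0, "married": 3.0}))
uncertain = pick_label(softmax({"single": 1.0, "in love": 1.0, "married": 1.0}))
```

Raising `min_probability` improves accuracy at the cost of leaving more users without a label, which is the accuracy versus coverage balance mentioned above.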
It can be seen that, by adopting the above scheme, at least one marked user possessing the first attribute can be obtained based on historical service data, the classification model for the first attribute of users can then be determined based on at least one feature parameter of at least one dimension and the first attribute of the marked users, and a category can be assigned to at least one target user according to the classification model. In this way, the problem that a category cannot be accurately assigned to a target user because the user has not filled in the first attribute, or because the first attribute filled in is out of date, can be avoided.
Embodiment four
An embodiment of the present invention provides a server which, as shown in Fig. 8, includes:
a user acquisition unit 81, configured to obtain, based on the historical service data of social network users, at least one marked user possessing the first attribute, where the first attribute characterizes the relationship status of the social network user;
a model building unit 82, configured to obtain, from at least one dimension, at least one feature parameter corresponding to the marked user, and to determine the classification model for the first attribute of users based on the feature parameters of the marked user and the first attribute corresponding to the marked user;
a classification unit 83, configured to assign, based on the classification model for the first attribute of users, the corresponding category of the first attribute to at least one target user in the social network.
Here, the scheme provided by this embodiment can be applied on the server side.
The classification model for the first attribute of users takes the feature parameters of a user as input parameters and the category of the first attribute corresponding to the user as the output parameter.
The user acquisition unit 81 is configured to choose, based on the historical service data of social network users, at least one first-kind initial user whose first attribute is the first category. The first attribute includes a first category and a second category, and the first category differs from the second category. The first attribute can be the marital status of the user; accordingly, the first attribute can have two categories, where the first category can be married and the second category can be unmarried. The unit is further configured to determine, based on the historical service data of the first-kind initial users, the common features corresponding to the first-kind initial users; to choose from the social network, based on those common features, at least one second-kind initial user whose difference from the common features exceeds a preset threshold, and to set the first attribute of the second-kind initial user to the second category; and to build the classification model for the first attribute of users based on the historical service data of the first-kind and second-kind initial users.
Here, the method of choosing at least one first-kind initial user whose first attribute is the first category may include: according to the historical service data of users, choosing users whose first attribute has been set to the first category as first-kind initial users. The first category is married; accordingly, the first-kind initial users are married users. The first-kind initial users are chosen first because the relationship status that a social network user fills in at registration is assumed to be accurate, the only problem being that it may not have been updated for a long time. For the "married" status, however, once a user enters this status it rarely changes in reality, so the data under this status can be considered highly accurate.
For choosing at least one second-kind initial user from all users other than the at least one first-kind initial user, refer to Fig. 2. That is, the at least one first-kind initial user is regarded as positive examples (Positive data); a preset ratio of second-kind initial users is chosen at random as negative examples (Negative data) from all the users remaining after the first-kind initial users are excluded, i.e. from the unlabeled data (Unlabeled data); and the classification model for the first attribute of users is built and trained with the first-kind and second-kind initial users as training data.
The preset ratio can be configured according to the actual situation; for example, 30% of the remaining users can be chosen as second-kind initial users, or 50% of them can be chosen as second-kind initial users.
The classification model for the first attribute of users can be a binary classifier used to judge whether a user is "married". It is trained with the Logistic Regression (LR) machine learning algorithm, and the resulting model is referred to as the LR Model.
Further, the user acquisition unit 81 is configured to choose, based on the historical service data of social network users, at least one user who has set the first attribute as a pending user; to classify the pending user based on the classification model for the first attribute of users, obtaining a classification result for the pending user; and to determine the probability that the first attribute of the pending user agrees with the corresponding classification result, choosing the pending users whose probability is higher than a preset probability threshold as marked users.
The content set in the first attribute can be obtained from the user's label. Among the at least one user who has set the first attribute, the content that users set for the first attribute can take multiple values, which may include: married, unmarried, single, has children, newly married, in love, engaged, separated, divorced, and so on;
Accordingly, when determining the probability that the first attribute of the pending user agrees with the corresponding classification result, a category can first be chosen for the pending user according to the content set in the pending user's first attribute. For example, the first-attribute settings corresponding to the married category may be: married, newly married, has children; the first-attribute settings corresponding to the unmarried category may be: unmarried, single, in love, engaged, separated, divorced, and so on.
On the basis of Fig. 2, refer to Fig. 3, which describes the above data acquisition (Data Acquisition) process. Specifically, classification estimates are made for all users in the social network who have filled in a relationship status, to judge whether each belongs to the "married" population. The probability is p(c | instance), and the data satisfying the following conditions is retained as the candidate training data set for multi-class classification:
p(c=0 | instance, label=0) > threshold1
p(c=1 | instance, label=1) > threshold2
Here, c is the category estimated by the classification model for the first attribute of users, i.e. the judgment of whether the user is married, made from at least one second attribute of the user and the classification model; instance is the pending user; and label is the category with which instance is marked, i.e. whether it is "married". threshold denotes a cut-off threshold: threshold1 is used to retain the population estimated with high probability to be unmarried, and threshold2 is used to retain the population estimated with high probability to be married.
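The two retention conditions can be sketched as a filter over the pending users. The function below is a hypothetical illustration: `predict_married_proba` stands in for the trained binary model and is assumed to return p(c=1 | instance), so that p(c=0 | instance) is one minus that value; the threshold values are invented.

```python
def select_marked_users(pending, predict_married_proba, threshold1=0.8, threshold2=0.8):
    """Keep pending users whose self-reported label agrees with the model
    with high probability."""
    kept = []
    for user_id, features, label in pending:
        p_married = predict_married_proba(features)
        if label == 0 and (1.0 - p_married) > threshold1:
            kept.append((user_id, label))  # confidently unmarried
        elif label == 1 and p_married > threshold2:
            kept.append((user_id, label))  # confidently married
    return kept

# Toy stand-in model: the single feature is already the married probability.
toy_model = lambda features: features[0]

pending = [
    ("a", [0.95], 1),  # label married, model agrees strongly: kept
    ("b", [0.55], 1),  # label married, model unsure: dropped
    ("c", [0.05], 0),  # label unmarried, model agrees strongly: kept
    ("d", [0.90], 0),  # label unmarried, model disagrees: dropped
]
marked = select_marked_users(pending, toy_model)
```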
It can be seen that, by adopting the above scheme, at least one marked user possessing the first attribute can be obtained based on historical service data, the classification model for the first attribute of users can then be determined based on at least one feature parameter of at least one dimension and the first attribute of the marked users, and a category can be assigned to at least one target user according to the classification model. In this way, the problem that a category cannot be accurately assigned to a target user because the user has not filled in the first attribute, or because the first attribute filled in is out of date, can be avoided.
Embodiment five
An embodiment of the present invention provides a server which, as shown in Fig. 8, includes:
a user acquisition unit 81, configured to obtain, based on the historical service data of social network users, at least one marked user possessing the first attribute, where the first attribute characterizes the relationship status of the social network user;
a model building unit 82, configured to obtain, from at least one dimension, at least one feature parameter corresponding to the marked user, and to determine the classification model for the first attribute of users based on the feature parameters of the marked user and the first attribute corresponding to the marked user;
a classification unit 83, configured to assign, based on the classification model for the first attribute of users, the corresponding category of the first attribute to at least one target user in the social network.
The classification model for the first attribute of users takes the feature parameters of a user as input parameters and the category of the first attribute corresponding to the user as the output parameter.
The user acquisition unit 81 is configured to choose, based on the historical service data of social network users, at least one first-kind initial user whose first attribute is the first category. The first attribute includes a first category and a second category, and the first category differs from the second category. The first attribute can be the marital status of the user; accordingly, the first attribute can have two categories, where the first category can be married and the second category can be unmarried. The unit is further configured to determine, based on the historical service data of the first-kind initial users, the common features corresponding to the first-kind initial users; to choose from the social network, based on those common features, at least one second-kind initial user whose difference from the common features exceeds a preset threshold; and to build the classification model for the first attribute of users based on the historical service data of the first-kind and second-kind initial users.
Based on the above operations, for choosing at least one second-kind initial user from all users other than the at least one first-kind initial user, this embodiment further provides that the user acquisition unit 81 is configured to determine, based on the historical service data of the first-kind initial users, the common features corresponding to the first-kind initial users; and to choose from the social network, based on those common features, at least one second-kind initial user whose difference from the common features exceeds a preset threshold.
As for the selection of negative examples, a purely random strategy may pick data from the unlabeled data (Unlabeled data) that should be positive but has not been marked as such, because the proportion of married users is very high in reality. Therefore, more reliable negative examples for training can be chosen at random only from the data that differs greatly from the known positive data (Positive data). The cosine similarity between sample features (such as interest preference distributions) can serve as the basis for this judgment.
The classification model for the first attribute of users can be a binary classifier used to judge whether a user is "married". It is trained with the Logistic Regression (LR) machine learning algorithm, and the resulting model is referred to as the LR Model.
Further, the user acquisition unit 81 is configured to choose, based on the historical service data of social network users, at least one user who has set the first attribute as a pending user; to classify the pending user based on the classification model for the first attribute of users, obtaining a classification result for the pending user; and to determine the probability that the first attribute of the pending user agrees with the corresponding classification result, choosing the pending users whose probability is higher than a preset probability threshold as marked users.
The content set in the first attribute can be obtained from the user's label. Among the at least one user who has set the first attribute, the content that users set for the first attribute can take multiple values, which may include: married, unmarried, single, has children, newly married, in love, engaged, separated, divorced, and so on;
Accordingly, when determining the probability that the first attribute of the pending user agrees with the corresponding classification result, a category can first be chosen for the pending user according to the content set in the pending user's first attribute. For example, the first-attribute settings corresponding to the married category may be: married, newly married, has children; the first-attribute settings corresponding to the unmarried category may be: unmarried, single, in love, engaged, separated, divorced, and so on.
Preferably, after choosing the marked users, this embodiment can further guarantee the quality of the training data by calibrating the marked users. Specifically, after the pending users whose probability is higher than the preset probability threshold are chosen as marked users, the user acquisition unit 81 is configured to obtain, from each of at least one dimension, the historical service data corresponding to the marked users, and to screen the marked users based on the historical service data of the at least one dimension, obtaining the screened marked users.
The at least one dimension may include at least one of the following: the frequency with which the user browses websites of a preset type; the type of user groups the user has joined; the type of target data the user operates on; the content corresponding to an attribute of a preset type of the user. The preset type can be websites of the dating and marriage type; the user groups can be singles groups, mother-and-baby groups, and so on; the target data operated on can be the type of photos in a photo album.
For example, users who often browse dating websites cannot be in a non-"unmarried" training set; users who are often active in mother-and-baby groups cannot be in a non-"married & parenting" training set; and users whose photo albums contain wedding photography cannot appear in a non-"newly married & married" training set.
For choosing at least one second-kind initial user from all users other than the at least one first-kind initial user, refer to Fig. 2. That is, the at least one first-kind initial user is regarded as positive examples (Positive data); a preset ratio of second-kind initial users is chosen at random as negative examples (Negative data) from all the users remaining after the first-kind initial users are excluded, i.e. from the unlabeled data (Unlabeled data); and the classification model for the first attribute of users is built and trained with the first-kind and second-kind initial users as training data.
On the basis of Fig. 2, refer to Fig. 3, which describes the above data acquisition (Data Acquisition) process. Specifically, classification estimates are made for all users in the social network who have filled in a relationship status, to judge whether each belongs to the "married" population. The probability is p(c | instance), and the data satisfying the following conditions is retained as the candidate training data set for multi-class classification:
p(c=0 | instance, label=0) > threshold1
p(c=1 | instance, label=1) > threshold2
Here, c is the category estimated by the classification model for the first attribute of users, i.e. the judgment of whether the user is married, made from at least one second attribute of the user and the classification model; instance is the pending user; and label is the category with which instance is marked, i.e. whether it is "married". threshold denotes a cut-off threshold: threshold1 is used to retain the population estimated with high probability to be unmarried, and threshold2 is used to retain the population estimated with high probability to be married.
With further reference to Fig. 4, data calibration (Data Calibration): to further guarantee the quality of the training data, rules are defined manually to correct the candidate training data set, as follows. Users with high accuracy under each status are collected; for example, users who often browse dating websites cannot be in a non-"unmarried" training set; users who are often active in mother-and-baby groups cannot be in a non-"married & parenting" training set; users whose photo albums contain wedding photography cannot appear in a non-"newly married & married" training set; and so on. A user under 18 years old can only be "in love" or "unmarried". Accordingly, a large marked data set of users with relationship status can be obtained for training the model.
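The manually defined calibration rules can be sketched as a predicate applied to each candidate record. The dictionary keys, the label strings and the way "married & parenting" is split into two labels are all assumptions made for illustration.

```python
def passes_calibration(user):
    """Return False when a behavioral signal contradicts the user's label."""
    label = user["label"]
    # Frequent dating-site visitors may only stay in the "unmarried" set.
    if user.get("browses_dating_sites") and label != "unmarried":
        return False
    # Users active in mother-and-baby groups may only be married or parents.
    if user.get("active_in_mother_baby_groups") and label not in ("married", "has_children"):
        return False
    # Wedding photography in the album implies newly married or married.
    if user.get("has_wedding_photos") and label not in ("newly_married", "married"):
        return False
    # Users under 18 can only be in love or unmarried.
    age = user.get("age")
    if age is not None and age < 18 and label not in ("in_love", "unmarried"):
        return False
    return True

candidates = [
    {"id": "a", "label": "married", "browses_dating_sites": True},
    {"id": "b", "label": "unmarried", "browses_dating_sites": True},
    {"id": "c", "label": "married", "has_wedding_photos": True},
    {"id": "d", "label": "married", "age": 16},
]
calibrated = [user["id"] for user in candidates if passes_calibration(user)]
```

Records "a" and "d" are removed because their behavior or age contradicts the label, while "b" and "c" remain in the training set.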
It can be seen that, by adopting the above scheme, at least one marked user possessing the first attribute can be obtained based on historical service data, the classification model for the first attribute of users can then be determined based on at least one feature parameter of at least one dimension and the first attribute of the marked users, and a category can be assigned to at least one target user according to the classification model. In this way, the problem that a category cannot be accurately assigned to a target user because the user has not filled in the first attribute, or because the first attribute filled in is out of date, can be avoided.
Embodiment six
An embodiment of the present invention provides a server which, as shown in Fig. 8, includes:
a user acquisition unit 81, configured to obtain, based on the historical service data of social network users, at least one marked user possessing the first attribute, where the first attribute characterizes the relationship status of the social network user;
a model building unit 82, configured to obtain, from at least one dimension, at least one feature parameter corresponding to the marked user, and to determine the classification model for the first attribute of users based on the feature parameters of the marked user and the first attribute corresponding to the marked user;
a classification unit 83, configured to assign, based on the classification model for the first attribute of users, the corresponding category of the first attribute to at least one target user in the social network.
The classification model for the first attribute of users takes the feature parameters of a user as input parameters and the category of the first attribute corresponding to the user as the output parameter.
The user acquisition unit 81 is configured to choose, based on the historical service data of social network users, at least one first-kind initial user whose first attribute is the first category. The first attribute includes a first category and a second category, and the first category differs from the second category. The first attribute can be the marital status of the user; accordingly, the first attribute can have two categories, where the first category can be married and the second category can be unmarried. The unit is further configured to determine, based on the historical service data of the first-kind initial users, the common features corresponding to the first-kind initial users; to choose from the social network, based on those common features, at least one second-kind initial user whose difference from the common features exceeds a preset threshold; and to build the classification model for the first attribute of users based on the historical service data of the first-kind and second-kind initial users.
Here, the method of choosing at least one first-kind initial user whose first attribute is the first category may include: according to the historical service data of users, choosing users whose first attribute has been set to the first category as first-kind initial users. The first category is married; accordingly, the first-kind initial users are married users. The first-kind initial users are chosen first because the relationship status that a social network user fills in at registration is assumed to be accurate, the only problem being that it may not have been updated for a long time. For the "married" status, however, once a user enters this status it rarely changes in reality, so the data under this status can be considered highly accurate.
The preset ratio can be configured according to the actual situation; for example, 30% of the remaining users can be chosen as second-kind initial users, or 50% of them can be chosen as second-kind initial users.
User's acquiring unit 81, services data for the history based on described first kind initial user, it is determined that the common characteristic that described first kind initial user is corresponding;Based on the common characteristic that described first kind initial user is corresponding, from described social networks, choose the common characteristic difference value with described first kind initial user exceed at least one Equations of The Second Kind initial user of predetermined threshold value。
For the selection of negative examples, a purely random strategy can leave samples in the unlabeled data that should be Positive but have not been labeled as such, because married users account for a large share of users in reality. More reliable negative examples for training can therefore be drawn at random only from the data that differs greatly from the known Positive data. Here, the cosine similarity between sample features (such as interest preference distributions) can serve as the basis for judgment.
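The negative-example selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of the positive centroid as the "common feature", and the `max_similarity` cut-off (a low similarity standing in for a difference value above the preset threshold) are all assumptions.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Mean vector, used here as the assumed 'common feature' of the positives."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pick_negatives(positives, unlabeled, max_similarity, sample_size, seed=0):
    """Randomly sample unlabeled users that look unlike the known positives.

    positives / unlabeled: dict mapping user id -> feature vector
    (for example an interest preference distribution).
    """
    center = centroid(list(positives.values()))
    dissimilar = [
        user_id
        for user_id, vector in sorted(unlabeled.items())
        if cosine_similarity(vector, center) < max_similarity
    ]
    rng = random.Random(seed)
    return rng.sample(dissimilar, min(sample_size, len(dissimilar)))
```

Sampling only from the dissimilar pool trades some coverage for cleaner negatives; the cut-off would need tuning on real data.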
The classification model for the user's first attribute may be a binary classifier that judges whether a user is "married". It is trained with the Logistic Regression (LR) machine learning algorithm, and the resulting model is the LR Model.
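As a rough illustration of such a binary classifier, the sketch below trains a logistic regression model with plain batch gradient descent. The training procedure, hyperparameters, and function names are assumptions made for the example; the source only states that LR is used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(samples, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss.

    samples: list of feature vectors; labels: 1 for married, 0 otherwise.
    """
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_married_probability(model, x):
    """Probability that feature vector x belongs to the married class."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
```

In practice a library implementation with regularization would normally be preferred; the hand-written loop is only meant to make the model explicit.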
Further, the user acquisition unit 81 is configured to: select, based on the historical service data of social network users, at least one user who has set the first attribute as a pending user; classify the pending user with the classification model for the user's first attribute to obtain a classification result for the pending user; determine the probability that the pending user's first attribute matches the corresponding classification result; and select the pending users whose probability is higher than a preset probability threshold as labeled users.
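One way to read this step is sketched below: a pending user is kept as a labeled user only when the model agrees strongly with the category the user declared. The tuple layout and the name `select_labeled_users` are hypothetical.

```python
def select_labeled_users(candidates, probability_threshold):
    """Keep users whose declared category the binary model confirms.

    candidates: iterable of (user_id, declared_category, p_married), where
    declared_category is "married" or "unmarried" and p_married is the
    model's probability for the married class.
    """
    labeled = {}
    for user_id, declared, p_married in candidates:
        # Probability that the model's result matches the declared category.
        p_agree = p_married if declared == "married" else 1.0 - p_married
        if p_agree > probability_threshold:
            labeled[user_id] = declared
    return labeled
```

For example, with a threshold of 0.8, a user who declared "married" and scores 0.95 is kept, while one who declared "unmarried" and scores 0.7 is dropped.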
The content set in the first attribute can be obtained from the user's tags. Among the users who have set the first attribute, many different values may have been set, including: married, unmarried, single, has children, newlywed, in love, engaged, broken up, divorced, and so on.
Accordingly, when determining the probability that a pending user's first attribute matches the classification result, the category of the pending user can first be chosen according to the content set in that user's first attribute. For example, the first-attribute settings corresponding to the married category are: married, newlywed, has children; the settings corresponding to the unmarried category are: unmarried, single, in love, engaged, broken up, divorced, etc.
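The mapping from the content a user set to one of the two categories can be written as a simple lookup. The English value strings below are translations chosen for this sketch, not identifiers from the source.

```python
# Assumed English renderings of the first-attribute values listed above.
CATEGORY_BY_CONTENT = {
    "married": "married",
    "newlywed": "married",
    "has children": "married",
    "unmarried": "unmarried",
    "single": "unmarried",
    "in love": "unmarried",
    "engaged": "unmarried",
    "broken up": "unmarried",
    "divorced": "unmarried",
}


def category_for(content):
    """Return "married", "unmarried", or None for an unrecognized value."""
    return CATEGORY_BY_CONTENT.get(content.strip().lower())
```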
Preferably, after the labeled users are selected, this embodiment can further ensure the quality of the training data by calibrating the labeled users. Specifically, after the pending users whose probability is higher than the preset probability threshold are selected as labeled users, the user acquisition unit 81 is configured to obtain the historical service data of the labeled users from at least one dimension, and to screen the labeled users based on the historical service data of the at least one dimension, obtaining the screened labeled users.
The at least one dimension may include at least one of: the frequency with which the user browses websites of a preset type; the type of user groups the user has joined; the type of target data the user operates on; the content of the user's attributes of a preset type. The preset-type websites may be dating and marriage websites; the user groups may be singles groups, mother-and-baby groups, etc.; the target data operated on may be the types of photos in an album.
Further, the key points of the user relationship status classifier are user feature extraction and algorithm design, of which extracting effective features is the most important. Referring to Fig. 5, the data source represents the data of the users from which features are extracted; feature extraction may extract features along at least one dimension, represent them in normalized form, and select from the extracted features those that do not overlap with one another.
This embodiment describes the building, training, and adjustment of the classification model for the user's first attribute. Obtaining at least one feature parameter of the labeled users from at least one dimension includes at least one of:
obtaining basic attribute parameters of the labeled user based on the labeled user's historical service data;
obtaining, based on the labeled user's historical service data, the labeled user's operation parameters for target data;
obtaining, based on the labeled user's historical service data, interaction feature parameters determined from the interaction data between the labeled user and users other than the labeled user.
As shown in Fig. 6, the features mainly fall into the following classes:
Demographics: basic user attribute information, including age, gender, occupation, education level, consumption habits, locality, permanent residence, etc.;
Behavioral: the user's commercial interests and keyword tags, mined from sources including groups, ad clicks, mobile apps, web browsing, etc.;
Remarketing Rule: rule identification information generated from the ID package submitted and uploaded by the advertiser; the rule identification information can also be associated with advertising information.
Further, the at least one feature parameter above is described as follows:
The basic attribute parameters of the labeled user include at least one of: login location information, login time period, groups with preset names that the user has joined, and the interaction frequency in those groups;
The labeled user's operation parameters for target data include at least: the operation frequency and operation period for target information of a preset type;
The interaction feature parameters determined from the interaction data between the labeled user and users other than the labeled user include at least one of: the gender attribute of the other users, the interaction frequency between the other users and the labeled user, and the login address information shared with the other users.
Accordingly, the history based at least one dimension described services data, described mark user is screened, the mark user after being screened, it is possible to at least one of:
The condition of predeterminated frequency and preset time period is met for the operation frequency of target information of preset kind and operation time period;Such as, LBS behavior: be always active in the youngster in campus it is more likely that unmarried or in love;Line duration section: total late into the night, online user was it is more likely that unmarried user;Good friend's packet name: whether comprise the packet of specific appellation and interactive frequency;
The interaction feature parameter that interaction data between described mark user and other users except described mark user is determined meets pre-conditioned;
Such as, the gender attribute of other users described is different from the gender attribute of described mark user, that is, described mark user often chats with friends of the opposite sex, the more likely unmarried user of right and wrong, it is, of course, also possible to whether consider between described mark user and other users described is mutually be satisfied by described pre-conditioned simultaneously, being namely used for judging whether is unique interactive object of the other side;And can also judge whether other users are the good friend comprising specific appellation, and interactive frequency between the two;
Login behavior based on mark user with other users judges, such as, whether two men and women good friends log in frequently by same IP, especially distinguish evening, weekend, festivals or holidays;
Furthermore it is also possible to get the love and marriage state of other users described: more good friend's love and marriage state is more likely consistent with contacting。
Operation frequency and operation time period based on the target information for preset kind, it is judged that whether the operation frequency for the target information of preset kind meets frequency threshold, and whether operation time period meets preset period of time requirement;
Such as, photograph album classification: whether upload new marriage, child-bearing class photograph album in the recent period;
Or, UGC is dynamic: whether delivered the word of lovers, new marriage, child-bearing class in the recent period。
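The screening conditions above can be combined into a simple rule filter, as in the sketch below. The two rules, the field names, and the 0.6 limit are illustrative assumptions drawn loosely from the examples in the text (album topics, late-night online time); a real system would use many more signals.

```python
from dataclasses import dataclass, field


@dataclass
class UserHistory:
    user_id: str
    label: str  # "married" or "unmarried", as assigned in the labeling step
    recent_album_topics: set = field(default_factory=set)
    late_night_login_ratio: float = 0.0  # share of sessions late at night


def contradicts_label(history, late_night_limit=0.6):
    """Return True when the behavioral history conflicts with the label."""
    family_topics = {"newlywed", "parenting"}
    # Labeled unmarried, but recently uploaded newlywed or parenting albums.
    if history.label == "unmarried" and history.recent_album_topics & family_topics:
        return True
    # Labeled married, but almost always online late at night.
    if history.label == "married" and history.late_night_login_ratio > late_night_limit:
        return True
    return False


def screen(users):
    """Drop labeled users whose history contradicts their label."""
    return [history for history in users if not contradicts_label(history)]
```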
Referring to Fig. 7, on the basis of Fig. 5, one or more of the features extracted on the left can be selected as user features according to the feature configuration. The labeled data formed from the labeled users is then matched with the user features to obtain training data and test data. The training data and test data can be chosen according to the actual situation; for example, one out of every 4 records can be chosen as test data and the rest used as training data;
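The "one out of every 4 records" split mentioned above can be sketched as follows. Which of the four records goes to the test set is not specified in the text; taking the fourth is an assumption.

```python
def split_every_fourth(records):
    """Send every fourth record to the test set and the rest to training."""
    train, test = [], []
    for index, record in enumerate(records):
        if index % 4 == 3:
            test.append(record)
        else:
            train.append(record)
    return train, test
```

If the records are ordered in some meaningful way, shuffling them before the split would avoid a biased test set.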
The classification model is trained on the training data: multiple features of a user are used as input data and the user's known type as the result;
The classification model is then used to predict on the test data: multiple features of a user are used as input data, the corresponding output is obtained from the classification model, and the probability that the output matches the user's type is judged. When the probability is higher than a preset threshold, the classification model is determined to have been built successfully; otherwise, training continues.
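The acceptance check on the test data can be sketched as below, treating the "probability that the output matches the user's type" as the fraction of test users predicted correctly. That reading, and the names `accuracy` and `model_is_accepted`, are assumptions.

```python
def accuracy(predict, test_samples, test_labels):
    """Fraction of test samples whose predicted type equals the known type."""
    correct = sum(
        1 for x, y in zip(test_samples, test_labels) if predict(x) == y
    )
    return correct / len(test_labels)


def model_is_accepted(predict, test_samples, test_labels, min_accuracy):
    """True when the match rate on the test data exceeds the preset threshold."""
    return accuracy(predict, test_samples, test_labels) > min_accuracy
```

When `model_is_accepted` returns False, training would continue, for example with more data or different parameters.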
For building and training the classification model, we tried two strategies in parallel: a single Softmax Regression multiclass classifier, and multiple One-vs-All Logistic Regression binary classifiers. By tuning the training data scale, the ratio of positive to negative examples, the optimization algorithm, the regularization factor, and so on, the classifier strategy and parameters are chosen and the optimal model is learned. Finally, all users are classified, and the relationship status label with the maximum probability is chosen for each user. To ensure accuracy, a cut-off threshold can be applied to the maximum probability, so that accuracy and user coverage are balanced and the best overall effect is achieved.
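The final labeling step, choosing the maximum-probability label and cutting off low-confidence users, can be sketched as follows for the softmax strategy. The per-label scores are assumed to come from an already trained model; `assign_label` and `min_probability` are hypothetical names.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_label(scores_by_label, min_probability):
    """Return (label, probability) for the most likely label, or None.

    None means the best probability fell below the cut-off, so the user
    is left unlabeled to protect accuracy at the cost of coverage.
    """
    labels = list(scores_by_label)
    probabilities = softmax([scores_by_label[label] for label in labels])
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    if probabilities[best] < min_probability:
        return None
    return labels[best], probabilities[best]
```

Raising `min_probability` increases accuracy among labeled users but leaves more users without a label, which is the balance the paragraph above describes.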
It can be seen that, with the above scheme, at least one labeled user possessing the first attribute can be obtained from historical service data, a classification model for the user's first attribute can be determined from at least one feature parameter of at least one dimension and the first attribute of the labeled users, and at least one target user can be classified according to the classification model. This avoids the problem that target users cannot be classified accurately because the first attribute was not filled in or is out of date.
If the integrated modules described in the embodiments of the present invention are implemented as software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a base station, a network device, etc.) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
Based on the above device embodiment, this embodiment provides a specific hardware implementation. As shown in Fig. 9, the device includes a processor 92, a storage medium 94, and at least one external communication interface 91; the processor 92, the storage medium 94, and the external communication interface 91 are all connected through a bus 93. The processor 92 may be an electronic component with processing capability, such as a microprocessor, a central processing unit, a digital signal processor, or a programmable logic array. Computer-executable code is stored in the storage medium.
The hardware may be the server described above. When the processor executes the computer-executable code, it can at least implement the following functions: obtaining, based on the historical service data of social network users, at least one labeled user possessing the first attribute; obtaining, from at least one dimension, at least one feature parameter of the labeled user, and determining a classification model for the user's first attribute based on the feature parameter of the labeled user and the first attribute of the labeled user; and classifying, based on the classification model for the user's first attribute, at least one target user in the social network into the corresponding category of the first attribute.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (14)

1. A user classification method, characterized in that the method comprises:
obtaining, based on historical service data of social network users, at least one labeled user possessing a first attribute, wherein the first attribute characterizes the relationship status of the social network user;
obtaining, from at least one dimension, at least one feature parameter of the labeled user, and determining a classification model for the user's first attribute based on the feature parameter of the labeled user and the first attribute of the labeled user;
classifying, based on the classification model for the user's first attribute, at least one target user in the social network into the corresponding category of the first attribute.
2. The method according to claim 1, characterized in that, before obtaining the at least one labeled user possessing the first attribute, the method further comprises:
selecting, based on the historical service data of social network users, at least one first-type initial user whose first attribute is a first category, wherein the first attribute includes a first category and a second category, and the first category differs from the second category;
selecting at least one second-type initial user from all users other than the at least one first-type initial user;
building a binary classification model for the user's first attribute based on the first-type initial user and the second-type initial user.
3. The method according to claim 2, characterized in that selecting at least one second-type initial user from all users other than the at least one first-type initial user comprises:
determining, based on the historical service data of the first-type initial user, a common feature of the first-type initial user;
selecting from the social network, based on the common feature of the first-type initial user, at least one second-type initial user whose difference from the common feature of the first-type initial user exceeds a preset threshold.
4. The method according to claim 2, characterized in that obtaining the at least one labeled user possessing the first attribute comprises:
selecting, based on the historical service data of social network users, at least one user who has set the first attribute as a pending user;
classifying the pending user based on the binary classification model to obtain a classification result for the pending user;
selecting, according to the probability that the first attribute of the pending user matches the corresponding classification result, the pending user whose probability is higher than a preset probability threshold as a labeled user.
5. The method according to claim 4, characterized in that, after selecting the pending user whose probability is higher than the preset probability threshold as a labeled user, the method further comprises:
obtaining the historical service data of the labeled user from at least one dimension;
screening the labeled user based on the historical service data of the at least one dimension to obtain the screened labeled user.
6. The method according to claim 1, characterized in that obtaining, from at least one dimension, at least one feature parameter of the labeled user comprises at least one of:
obtaining basic attribute parameters of the labeled user based on the labeled user's historical service data;
obtaining, based on the labeled user's historical service data, the labeled user's operation parameters for target data;
obtaining, based on the labeled user's historical service data, interaction feature parameters determined from the interaction data between the labeled user and users other than the labeled user.
7. The method according to claim 6, characterized in that the basic attribute parameters of the labeled user include at least one of: login location information, login time period, groups with preset names that the user has joined, and the interaction frequency in those groups;
the labeled user's operation parameters for target data include at least: the operation frequency and operation period for target information of a preset type;
the interaction feature parameters determined from the interaction data between the labeled user and users other than the labeled user include at least one of: the gender attribute of the other users, the interaction frequency between the other users and the labeled user, and the login address information shared with the other users.
8. A server, characterized in that it comprises:
a user acquisition unit, configured to obtain, based on historical service data of social network users, at least one labeled user possessing a first attribute, wherein the first attribute characterizes the relationship status of the social network user;
a model building unit, configured to obtain, from at least one dimension, at least one feature parameter of the labeled user, and to determine a classification model for the user's first attribute based on the feature parameter of the labeled user and the first attribute of the labeled user;
a classification unit, configured to classify, based on the classification model for the user's first attribute, at least one target user in the social network into the corresponding category of the first attribute.
9. The server according to claim 8, characterized in that
the user acquisition unit is configured to select, based on the historical service data of social network users, at least one first-type initial user whose first attribute is a first category, wherein the first attribute includes a first category and a second category, and the first category differs from the second category; to select at least one second-type initial user from all users other than the at least one first-type initial user; and to build a binary classification model for the user's first attribute based on the first-type initial user and the second-type initial user.
10. The server according to claim 9, characterized in that
the user acquisition unit is configured to determine, based on the historical service data of the first-type initial user, a common feature of the first-type initial user; and to select from the social network, based on the common feature of the first-type initial user, at least one second-type initial user whose difference from the common feature of the first-type initial user exceeds a preset threshold.
11. The server according to claim 9, characterized in that
the user acquisition unit is configured to select, based on the historical service data of social network users, at least one user who has set the first attribute as a pending user; to classify the pending user based on the binary classification model to obtain a classification result for the pending user; and to select, according to the probability that the first attribute of the pending user matches the corresponding classification result, the pending user whose probability is higher than a preset probability threshold as a labeled user.
12. The server according to claim 11, characterized in that
the user acquisition unit is configured to obtain the historical service data of the labeled user from at least one dimension, and to screen the labeled user based on the historical service data of the at least one dimension to obtain the screened labeled user.
13. The server according to claim 8, characterized in that
the model building unit is configured to obtain, from at least one dimension, at least one feature parameter of the labeled user in at least one of the following ways:
obtaining basic attribute parameters of the labeled user based on the labeled user's historical service data;
obtaining, based on the labeled user's historical service data, the labeled user's operation parameters for target data;
obtaining, based on the labeled user's historical service data, interaction feature parameters determined from the interaction data between the labeled user and users other than the labeled user.
14. The server according to claim 13, characterized in that the basic attribute parameters of the labeled user include at least one of: login location information, login time period, groups with preset names that the user has joined, and the interaction frequency in those groups;
the labeled user's operation parameters for target data include at least: the operation frequency and operation period for target information of a preset type;
the interaction feature parameters determined from the interaction data between the labeled user and users other than the labeled user include at least one of: the gender attribute of the other users, the interaction frequency between the other users and the labeled user, and the login address information shared with the other users.
CN201511033392.2A 2015-12-31 2015-12-31 User classification method and server Active CN105701498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511033392.2A CN105701498B (en) 2015-12-31 2015-12-31 User classification method and server

Publications (2)

Publication Number Publication Date
CN105701498A true CN105701498A (en) 2016-06-22
CN105701498B CN105701498B (en) 2021-09-07

Family

ID=56226820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511033392.2A Active CN105701498B (en) 2015-12-31 2015-12-31 User classification method and server

Country Status (1)

Country Link
CN (1) CN105701498B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204060A (en) * 2016-06-28 2016-12-07 腾讯科技(深圳)有限公司 The method and device that user is divided to cluster realized by computer system
CN106709755A (en) * 2016-11-28 2017-05-24 加和(北京)信息科技有限公司 Method of predicting user frequency and apparatus thereof
CN106875183A (en) * 2016-06-28 2017-06-20 阿里巴巴集团控股有限公司 Determine Bank Account Number, identification card number, the method and apparatus of information state to be checked
CN107330459A (en) * 2017-06-28 2017-11-07 联想(北京)有限公司 A kind of data processing method, device and electronic equipment
CN107392259A (en) * 2017-08-16 2017-11-24 北京京东尚科信息技术有限公司 The method and apparatus for building unbalanced sample classification model
CN107563429A (en) * 2017-07-27 2018-01-09 国家计算机网络与信息安全管理中心 A kind of sorting technique and device of network user colony
CN108268495A (en) * 2016-12-30 2018-07-10 上海互联网软件集团有限公司 Network user's categorizing system based on big data
CN108268511A (en) * 2016-12-30 2018-07-10 上海互联网软件集团有限公司 Network user classification method based on big data
CN108399418A (en) * 2018-01-23 2018-08-14 北京奇艺世纪科技有限公司 A kind of user classification method and device
WO2018145596A1 (en) * 2017-02-13 2018-08-16 腾讯科技(深圳)有限公司 Method and device for extracting feature information, server cluster, and storage medium
WO2018205999A1 (en) * 2017-05-11 2018-11-15 腾讯科技(深圳)有限公司 Data processing method and apparatus
CN109063736A (en) * 2018-06-29 2018-12-21 考拉征信服务有限公司 Data classification method, device, electronic equipment and computer readable storage medium
CN109492658A (en) * 2018-09-21 2019-03-19 北京车和家信息技术有限公司 A kind of point cloud classifications method and terminal
CN109816134A (en) * 2017-11-22 2019-05-28 北京京东尚科信息技术有限公司 Shipping address prediction technique, device and storage medium
CN109818782A (en) * 2018-12-31 2019-05-28 南京红柑桔信息技术有限公司 The method that a kind of pair of server is classified
CN112468385A (en) * 2019-09-09 2021-03-09 腾讯科技(深圳)有限公司 Virtual grouping configuration method and device, storage medium and electronic device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101266619A (en) * 2008-05-12 2008-09-17 腾讯科技(深圳)有限公司 User information excavation method and system
CN102625940A (en) * 2009-06-12 2012-08-01 电子湾有限公司 Internet preference learning facility
CN103778555A (en) * 2014-01-21 2014-05-07 北京集奥聚合科技有限公司 User attribute mining method and system based on user tags
US20140207518A1 (en) * 2013-01-23 2014-07-24 24/7 Customer, Inc. Method and Apparatus for Building a User Profile, for Personalization Using Interaction Data, and for Generating, Identifying, and Capturing User Data Across Interactions Using Unique User Identification
US20140358630A1 (en) * 2013-05-31 2014-12-04 Thomson Licensing Apparatus and process for conducting social media analytics
CN104298741A (en) * 2014-10-09 2015-01-21 百度在线网络技术(北京)有限公司 Method and device for providing push information
CN104657369A (en) * 2013-11-19 2015-05-27 深圳市腾讯计算机系统有限公司 User attribute information generating method and system
CN104718547A (en) * 2013-10-11 2015-06-17 文化便利俱乐部株式会社 Customer data analysis system
CN104737565A (en) * 2012-10-19 2015-06-24 脸谱公司 Method relating to predicting the future state of a mobile device user
CN104933075A (en) * 2014-03-20 2015-09-23 百度在线网络技术(北京)有限公司 User attribute predicting platform and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李冠辰: "一个基于hadoop的并行社交网络挖掘系统", 《软件》 *
董彩玲: "几种典型数据挖掘方法及其应用研究", 《中国优秀硕士学位论文全文数据库_信息科技辑》 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875183A (en) * 2016-06-28 2017-06-20 阿里巴巴集团控股有限公司 Determine Bank Account Number, identification card number, the method and apparatus of information state to be checked
CN106875183B (en) * 2016-06-28 2020-07-28 阿里巴巴集团控股有限公司 Method and device for determining bank account number, identity card number and state of information to be checked
CN106204060A (en) * 2016-06-28 2016-12-07 腾讯科技(深圳)有限公司 The method and device that user is divided to cluster realized by computer system
CN106709755A (en) * 2016-11-28 2017-05-24 加和(北京)信息科技有限公司 Method of predicting user frequency and apparatus thereof
CN108268495A (en) * 2016-12-30 2018-07-10 上海互联网软件集团有限公司 Network user's categorizing system based on big data
CN108268511A (en) * 2016-12-30 2018-07-10 上海互联网软件集团有限公司 Network user classification method based on big data
WO2018145596A1 (en) * 2017-02-13 2018-08-16 腾讯科技(深圳)有限公司 Method and device for extracting feature information, server cluster, and storage medium
US11436430B2 (en) 2017-02-13 2022-09-06 Tencent Technology (Shenzhen) Company Limited Feature information extraction method, apparatus, server cluster, and storage medium
WO2018205999A1 (en) * 2017-05-11 2018-11-15 腾讯科技(深圳)有限公司 Data processing method and apparatus
CN107330459A (en) * 2017-06-28 2017-11-07 联想(北京)有限公司 A kind of data processing method, device and electronic equipment
CN107330459B (en) * 2017-06-28 2021-09-14 联想(北京)有限公司 Data processing method and device and electronic equipment
CN107563429A (en) * 2017-07-27 2018-01-09 国家计算机网络与信息安全管理中心 A kind of sorting technique and device of network user colony
CN107392259B (en) * 2017-08-16 2021-12-07 北京京东尚科信息技术有限公司 Method and device for constructing unbalanced sample classification model
CN107392259A (en) * 2017-08-16 2017-11-24 北京京东尚科信息技术有限公司 The method and apparatus for building unbalanced sample classification model
CN109816134B (en) * 2017-11-22 2021-07-20 北京京东尚科信息技术有限公司 Method and device for predicting delivery address and storage medium
CN109816134A (en) * 2017-11-22 2019-05-28 北京京东尚科信息技术有限公司 Shipping address prediction technique, device and storage medium
CN108399418B (en) * 2018-01-23 2021-09-03 北京奇艺世纪科技有限公司 User classification method and device
CN108399418A (en) * 2018-01-23 2018-08-14 北京奇艺世纪科技有限公司 A kind of user classification method and device
CN109063736A (en) * 2018-06-29 2018-12-21 考拉征信服务有限公司 Data classification method, device, electronic equipment and computer readable storage medium
CN109492658A (en) * 2018-09-21 2019-03-19 北京车和家信息技术有限公司 A kind of point cloud classifications method and terminal
CN109818782A (en) * 2018-12-31 2019-05-28 南京红柑桔信息技术有限公司 The method that a kind of pair of server is classified
CN112468385A (en) * 2019-09-09 2021-03-09 腾讯科技(深圳)有限公司 Virtual grouping configuration method and device, storage medium and electronic device
CN112468385B (en) * 2019-09-09 2022-07-01 腾讯科技(深圳)有限公司 Virtual grouping configuration method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN105701498B (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN105701498A (en) User classification method and server
CN107369075B (en) Commodity display method and device and electronic equipment
CN105447730B (en) Target user orientation method and device
US10719854B2 (en) Method and system for predicting future activities of user on social media platforms
US20180096431A1 (en) Geographical Location Recommendation System
CN109360057B (en) Information pushing method, device, computer equipment and storage medium
CN104008184A (en) Method and device for pushing information
WO2014193399A1 (en) Influence score of a brand
JP6547070B2 (en) Method, device and computer storage medium for push information coarse selection sorting
CN107545451B (en) Advertisement pushing method and device
CN104951544A (en) User data processing method and system and method and system for providing user data
CN104317959A (en) Data mining method and device based on social platform
CN106649316A (en) Video pushing method and device
US20180307733A1 (en) User characteristic extraction method and apparatus, and storage medium
CN108416616A (en) Method and device for ranking complaint and report categories
CN102365637A (en) Characterizing user information
CN101901252A (en) Method for integrating same user data on multiple websites and integration platform
CN109767267B (en) Target user recommendation method and device for advertisement delivery
CN105590240A (en) Discrete calculating method of brand advertisement effect optimization
CN105654198A (en) Brand advertisement effect optimization method capable of realizing optimal threshold value selection
CN109598171A (en) Data processing method, apparatus and system based on two-dimensional code
CN113127723B (en) User portrait processing method, device, server and storage medium
CN109408714A (en) Multi-model fusion recommender system and method
CN103544150A (en) Method and system for providing recommendation information for mobile terminal browser
CN110619090B (en) Regional attraction assessment method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant