Summary of the invention
The object of the present invention is to provide a targeted application method and system based on a user gender determination result. The invention estimates a user's gender tendency with reasonable accuracy, overcomes the problem that a user's gender cannot be known reliably when the user declines to state it or states it incorrectly, and thereby improves the efficiency of gender-dependent targeted applications such as personalized search, personalized recommendation, and targeted advertising.
To address the above problem, the invention provides a targeted application method based on a user gender determination result, comprising the following steps:
Step 1: collect and organize the behavioral data of sample users of a website whose true gender is known;
Step 2: obtain the relation between gender tendency and behavioral data from the behavioral data and known true gender of the sample users, and store it in the database of the website;
Step 3: collect and organize the behavioral data of all users of the website;
Step 4: obtain the gender tendency of all users from the relation between gender tendency and behavioral data and from the behavioral data of all users, and store it in the database of the website;
Step 5: query the database of the website and output the gender tendency of the queried user;
Step 6: provide the queried user with information based on the output gender tendency of the queried user.
Further, in step 1, the collected and organized behavioral data of the sample users comprises: the content of the website accessed by each sample user, and the access weight of each sample user for each content item.
Further, in step 1, the collected and organized behavioral data of the sample users also comprises: the time and/or the behavior vector of each sample user's accesses to the website.
Further, in step 3, the collected and organized behavioral data of all users of the website comprises: each content item of the website accessed by each user, and the access weight of each user for each content item.
Further, in step 3, the collected and organized behavioral data of all users of the website also comprises: the time and/or the behavior vector of each user's accesses to the website.
Further, in step 2, the relation between gender tendency and behavioral data comprises:
the relation between the overall gender tendency of the website and the behavioral data of all sample users; and
the relation between the gender tendency of each content item of the website and the behavioral data of the sample users who accessed that content item.
Further, the formula for the relation between the overall gender tendency of the website and the behavioral data of all sample users is:
[formula not reproduced in this text]
The formula for the relation between the gender tendency of each content item of the website and the behavioral data of all sample users is:
[formula not reproduced in this text]
where $P(g \mid w_j)$ denotes the probability that the j-th content item is associated with gender $g$; $u_i(g)$ is an indicator function (called a Dirac function in the original) stating whether the i-th sample user has gender $g$; and $r_{ij}$ denotes the access weight of the i-th sample user for the j-th content item.
Further, in step 4, the formula for obtaining the gender tendency of all users from the relation between gender tendency and behavioral data and from the behavioral data of all users is:
[formula not reproduced in this text]
where $g_0$ denotes female and $g_1$ denotes male; $P(g \mid w_j)$ is the probability that the j-th content item is associated with gender $g$; $r_{ij}$ denotes the access weight of the queried user $u_i$ for the j-th content item; $\mathrm{odds}(g, X_{u_i})$ is the gender likelihood ratio of the queried user $u_i$; $P(g_0)$ denotes the probability of female tendency for the website as a whole; and $P(g_1)$ denotes the probability of male tendency for the website as a whole.
Further, in step 2, obtaining the relation between gender tendency and behavioral data may also comprise: processing the behavioral data and known true gender of the sample users with a decision tree, logistic regression, a neural network, or a support vector machine, to obtain the relation between a user's individual behavior and the user's gender tendency.
Further, in step 4, the gender tendency of all users is obtained from the relation between a user's individual behavior and gender tendency and from the behavioral data of all users, and is stored in the database of the website.
Further, in step 5, when the database of the website is queried and the gender tendency of the queried user is output: if the database contains the gender tendency of the queried user, that gender tendency is output; if the database does not contain the gender tendency of the queried user, the content of the website that the queried user is currently accessing is captured, the gender tendency of that current content is looked up in the database, and the gender tendency of the current content is output as the gender tendency of the queried user; if the database does not contain the gender tendency of the current content either, the overall gender tendency of the website is output as the gender tendency of the queried user.
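The three-level fallback described for step 5 can be sketched as follows. This is a minimal illustration only: the function name, the dictionary-based tables, and the (female, male) tuple layout are assumptions made for the example and do not come from the original text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: (P(female), P(male)).
Tendency = Tuple[float, float]


def lookup_gender_tendency(
    user_id: str,
    current_content_id: Optional[str],
    user_table: Dict[str, Tendency],
    content_table: Dict[str, Tendency],
    site_tendency: Tendency,
) -> Tendency:
    """Return the stored user tendency, else the tendency of the content
    currently being accessed, else the site-wide tendency."""
    if user_id in user_table:
        return user_table[user_id]
    if current_content_id is not None and current_content_id in content_table:
        return content_table[current_content_id]
    return site_tendency
```

A new visitor with no stored tendency therefore still receives a usable value, taken first from the page being viewed and then from the site as a whole.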
Accordingly, the present invention also provides a targeted application system based on a user gender determination result, comprising:
a sample user data collection unit, configured to collect and organize the behavioral data of sample users of a website whose true gender is known;
a behavioral data and gender tendency relation unit, configured to obtain the relation between gender tendency and behavioral data from the behavioral data and known true gender of the sample users, and to store it in the database of the website;
an all-user data collection unit, configured to collect and organize the behavioral data of all users of the website;
a gender tendency calculation unit, configured to obtain the gender tendency of all users from the relation between gender tendency and behavioral data and from the behavioral data of all users, and to store it in the database of the website;
a gender tendency output unit, configured to query the database of the website and output the gender tendency of the queried user;
a targeted application unit, configured to provide the queried user with information based on the gender tendency output by the gender tendency output unit.
Compared with the prior art, the targeted application method and system based on a user gender determination result of the present invention calculate the relation between gender tendency and behavioral data from the behavioral data and gender of sample users, and then obtain the gender tendency of each user from that relation and from each user's collected behavioral data. Because the user's real, objective, and complete behavioral data is applied to the gender determination of Internet users, the result is accurate and reliable, and reasonably accurate gender information can be obtained even when a user's true gender information is missing or false. The invention thereby greatly improves the accuracy and efficiency of gender-dependent targeted applications such as personalized search, personalized recommendation, and targeted advertising, and improves the efficiency of personalized Internet applications. Further, the invention imposes no user filtering conditions, so it can cover essentially all users of the website and achieves high user coverage. It also realizes a progressive determination process: as a user's behavioral data on the website accumulates, the accuracy of the calculated gender tendency keeps improving.
Embodiment
The targeted application method and system based on a user gender determination result proposed by the present invention are described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
As shown in Figure 1, this embodiment provides a targeted application method based on a user gender determination result, comprising:
Step 1: collect and organize the behavioral data of sample users of a website whose true gender is known;
Step 2: obtain the relation between gender tendency and behavioral data from the behavioral data and known true gender of the sample users, and store it in the database of the website;
Step 3: collect and organize the behavioral data of all users of the website;
Step 4: obtain the gender tendency of all users from the relation between gender tendency and behavioral data and from the behavioral data of all users, and store it in the database of the website;
Step 5: query the database of the website and output the gender tendency of the queried user;
Step 6: send the queried user targeted information based on the output gender tendency of the queried user.
It should be noted that a sample user in the present invention is a user whose true gender is known; the gender of a sample user can be obtained, for example, from an identity card, from customer service communication, or from a questionnaire. Behavioral data in the present invention is the data generated when a user accesses any content of the website. The content may be a specific web page, video, book, and so on, and its granularity can be generalized or refined. For example, specific videos can be generalized into video categories, and video categories can be subdivided into finance, sports, comedy, and so on.
The behavioral data collected and organized for any user comprises: all content of the website accessed by that user, the access frequency, and the access weight between the user and each content item. All user behavioral data can be described by a parameter $S = (U, W, R)$, where $U = \{u_1, u_2, u_3, \ldots, u_i\}$ represents the users, $W = \{w_1, w_2, w_3, \ldots, w_j\}$ represents all distinct content of the website accessed by all users, and $R = \{r_{ij}\}$ is the access matrix, in which $r_{ij}$ is the relation weight between user $i$ and content $w_j$. In this embodiment $r_{ij}$ is the access weight of user $i$ for content $w_j$, and it can generally be estimated from the frequency $f_j$ with which user $i$ accesses content $w_j$, for example $r_{ij} = A_{w_j} \cdot f_j$, where $A_{w_j}$ is the weight coefficient of content $w_j$.
Therefore, both the behavioral data of the sample users in step 1 and the behavioral data of all users in step 3 are collected and organized in the form $S = (U, W, R)$. That is, in step 1 the behavioral data of the sample users comprises the content of the website accessed by each sample user, the access frequency, and the access weight between each sample user and each content item; in step 3 the behavioral data of all users of the website comprises each content item accessed by each user, the access frequency, and the access weight between each user and each content item. In other embodiments of the invention, the behavioral data of the sample users in step 1 may also comprise the time of each access to the website, the behavior vector, and any other information that describes the access; correspondingly, the behavioral data of all users in step 3 then also comprises the time of each access, the behavior vector, and any other information that describes the access.
In step 2 of this embodiment, the formula for the relation between the overall gender tendency of the website and the behavioral data of all sample users is:
[formula not reproduced in this text]
In step 2 of this embodiment, the formula for the relation between the gender tendency of each content item of the website and the behavioral data of all sample users is:
[formula not reproduced in this text]
where $P(g \mid w_j)$ denotes the probability that the j-th content item is associated with gender $g$; $u_i(g)$ is an indicator function (called a Dirac function in the original) stating whether the i-th sample user has gender $g$; and $r_{ij}$ denotes the access weight of the i-th sample user for the j-th content item. If $g_0$ denotes female and $g_1$ denotes male, then $P(g_0) + P(g_1) = P(g_0 \mid w_j) + P(g_1 \mid w_j) = 1$.
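The two step 2 formulas did not survive in this text. One reconstruction that is consistent with the symbol definitions above and with the normalization $P(g_0) + P(g_1) = P(g_0 \mid w_j) + P(g_1 \mid w_j) = 1$ is a weighted relative frequency. This exact form is an assumption, not a quotation of the original formulas:

```latex
% Assumed form: access-weighted share of sample users of gender g.
P(g) = \frac{\sum_{i}\sum_{j} u_i(g)\, r_{ij}}{\sum_{i}\sum_{j} r_{ij}},
\qquad
P(g \mid w_j) = \frac{\sum_{i} u_i(g)\, r_{ij}}{\sum_{i} r_{ij}}
```

Because each sample user has exactly one gender, $u_i(g_0) + u_i(g_1) = 1$, so both quantities sum to 1 over the two genders, as the text requires.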
In step 4 of this embodiment, a Bayesian method can be used to process the relation between gender tendency and behavioral data together with the behavioral data of all users, so as to obtain the gender tendency of all users. Bayes' theorem is:
$P(A \mid B) = \dfrac{P(B \mid A)\, P(A)}{P(B)}$
where $P(A)$ and $P(B)$ are the prior probabilities of events $A$ and $B$ respectively, $P(A \mid B)$ is the conditional probability that $A$ occurs given that $B$ has occurred, and $P(B \mid A)$ is the conditional probability that $B$ occurs given that $A$ has occurred.
In step 4 of this embodiment, let the behavior pattern be $X = \{x_1, x_2, x_3, \ldots, x_K\}$, that is, the behavior pattern $X$ consists of $K$ content accesses, and let $C_i$ be the class label. Assuming that the individual accesses $x_k$ are mutually independent and that the content accessed in the k-th access is $w_k$, we have:
[formula not reproduced in this text]
Let $\mathrm{odds}(g, X)$ denote the gender likelihood ratio of users who share the same behavior pattern $X$:
[formula not reproduced in this text]
Let $X_{u_i}$ denote the behavior pattern of user $u_i$:
[formula not reproduced in this text]
The formulas for estimating the gender probabilities of user $u_i$ from $\mathrm{odds}(g, X_{u_i})$ are, respectively:
[formula not reproduced in this text]
Using the above gender probability formulas together with the two formulas from step 2 (the relation between the overall gender tendency of the website and the behavioral data of all sample users, and the relation between the gender tendency of each content item and the behavioral data of all sample users), a gender probability estimate can be calculated for each user and stored in the database as that user's gender tendency.
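The step 4 formulas are likewise missing from this text. The derivation below is a sketch of how they can follow from Bayes' theorem and the independence assumption stated above. Taking the odds as male over female, and using $r_{ij}$ as an exponent (an effective access count), are assumptions made here, not statements of the original formulas.

```latex
% Independence of accesses, with the k-th access hitting content w_k:
P(X \mid g) = \prod_{k=1}^{K} P(w_k \mid g),
\qquad
P(w_k \mid g) = \frac{P(g \mid w_k)\, P(w_k)}{P(g)}

% Likelihood ratio for a behavior pattern X (P(X) and P(w_k) cancel):
\mathrm{odds}(g, X) = \frac{P(g_1 \mid X)}{P(g_0 \mid X)}
  = \frac{P(g_1)}{P(g_0)} \prod_{k=1}^{K}
    \frac{P(g_1 \mid w_k)\, P(g_0)}{P(g_0 \mid w_k)\, P(g_1)}

% For user u_i, grouping accesses by content j with weight r_{ij}:
\mathrm{odds}(g, X_{u_i}) = \frac{P(g_1)}{P(g_0)} \prod_{j}
  \left( \frac{P(g_1 \mid w_j)\, P(g_0)}{P(g_0 \mid w_j)\, P(g_1)} \right)^{r_{ij}}

% Gender probabilities recovered from the odds:
P(g_1 \mid u_i) = \frac{\mathrm{odds}(g, X_{u_i})}{1 + \mathrm{odds}(g, X_{u_i})},
\qquad
P(g_0 \mid u_i) = \frac{1}{1 + \mathrm{odds}(g, X_{u_i})}
```

The last two expressions sum to 1, matching the two-gender normalization used throughout the embodiment.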
In step 5 of this embodiment, when the database of the website is queried and the gender tendency of the queried user is output: if the database contains the gender tendency $P(g_0 \mid u)$ and $P(g_1 \mid u)$ of the queried user, that gender tendency is output. If the database does not contain the gender tendency of the queried user, the content of the website that the queried user is currently accessing is captured, the gender tendency $P(g_0 \mid w_j)$ and $P(g_1 \mid w_j)$ of that current content is looked up in the database, and it is output as the gender tendency $P(g_0 \mid u)$ and $P(g_1 \mid u)$ of the queried user. If the database does not contain the gender tendency of the current content either, the overall gender tendency $P(g_0)$ and $P(g_1)$ of the website is output as the gender tendency $P(g_0 \mid u)$ and $P(g_1 \mid u)$ of the queried user.
As can be seen from the above, this embodiment is a method for determining the gender tendency of Internet users based on a Bayesian classification algorithm. In step 2, the prior probability of the overall gender tendency of the website and the prior probability of the gender tendency of each content item are obtained from the collected behavioral data and known true gender of the sample users. In step 4, these are combined with the behavioral data of each user collected in step 3, and the Bayesian formula is used to calculate the posterior probability of each user's gender tendency. A gender tendency can be calculated for a user with any number of accesses, so this method can cover most users of the website. When a user accesses the website's content for the first time, the behavior is partly accidental and the confidence of the result is relatively low. As the number of accesses grows and a more stable behavior pattern forms, the result stabilizes. The method thus realizes a progressive determination process: as a user's behavioral data on the website accumulates, the accuracy of the calculated gender tendency keeps improving and the influence of accidental behavior decreases. No training process is needed, and the maintenance cost is low.
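The Bayesian embodiment can be sketched end to end as follows. The sketch assumes a weighted-frequency estimate for step 2 and a log-odds accumulation for step 4, since the original formulas are not reproduced in this text; all function names, the `eps` clamp, and the nested-dictionary data layout are hypothetical. It also assumes the sample contains users of both genders.

```python
import math
from typing import Dict, Tuple

FEMALE, MALE = 0, 1  # g0 and g1 in the text


def fit_tendencies(sample_R: Dict[str, Dict[str, float]],
                   sample_gender: Dict[str, int]):
    """Step 2 (assumed form): site-wide P(g) and per-content P(g|w_j)."""
    site = [0.0, 0.0]
    per_content: Dict[str, list] = {}
    for user, row in sample_R.items():
        g = sample_gender[user]
        for content, r in row.items():
            site[g] += r
            per_content.setdefault(content, [0.0, 0.0])[g] += r
    total = site[FEMALE] + site[MALE]
    prior = (site[FEMALE] / total, site[MALE] / total)
    content_prob = {c: (v[FEMALE] / (v[FEMALE] + v[MALE]),
                        v[MALE] / (v[FEMALE] + v[MALE]))
                    for c, v in per_content.items()}
    return prior, content_prob


def score_user(row: Dict[str, float], prior: Tuple[float, float],
               content_prob: Dict[str, Tuple[float, float]],
               eps: float = 1e-6) -> Tuple[float, float]:
    """Step 4 (assumed form): return (P(female|u), P(male|u))."""
    prior_log_odds = math.log(prior[MALE] / prior[FEMALE])
    log_odds = prior_log_odds
    for content, r in row.items():
        if content not in content_prob:
            continue  # content never seen among sample users
        # Clamp so single-gender content does not give infinite odds.
        p_female = min(max(content_prob[content][FEMALE], eps), 1.0 - eps)
        p_male = 1.0 - p_female
        log_odds += r * (math.log(p_male / p_female) - prior_log_odds)
    # Numerically stable conversion from log-odds to probability.
    if log_odds >= 0:
        male = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        male = e / (1.0 + e)
    return 1.0 - male, male
```

A user with no recorded accesses simply receives the site-wide prior, and each further access shifts the estimate, which mirrors the progressive behavior the embodiment describes.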
In other embodiments of the invention, step 2 may instead use a classification algorithm such as a decision tree, logistic regression, a neural network, or a support vector machine to process the behavioral data and known true gender of the sample users and obtain the relation between a user's individual behavior and gender tendency; step 4 then obtains the gender tendency of all users from that relation and from the behavioral data of all users. For example, step 2 uses such a classification algorithm to obtain a functional relation between a user's individual behavior and gender tendency, and step 4 substitutes the behavioral data of each user collected in step 3 into that function to obtain each user's gender tendency. User behavior patterns may change gradually over time. To maintain high accuracy, a method that determines the gender tendency of Internet users with a classification algorithm such as a decision tree, logistic regression, a neural network, or a support vector machine must retrain and update its model regularly or irregularly, so its maintenance cost is higher. In addition, such a method cannot realize a progressive process in which accuracy improves gradually as user behavior accumulates. Once a reasonably accurate user gender has been obtained, the targeted application method based on a user gender determination result of the present invention can provide the user with gender-based information, which greatly improves the accuracy and efficiency of gender-dependent targeted applications such as personalized search, personalized recommendation, and targeted advertising, and improves the efficiency of personalized Web 2.0 applications.
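As one illustration of the classifier-based alternative, the sketch below fits a small logistic regression by stochastic gradient ascent in plain Python. The feature layout (one access weight per content category), the learning rate, and the epoch count are assumptions chosen for the example; a production system would more likely use an established machine learning library.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows: Sequence[Sequence[float]], labels: Sequence[int],
                   lr: float = 0.1, epochs: int = 500
                   ) -> Tuple[List[float], float]:
    """Fit weights and bias on sample users (label 1 = male, 0 = female)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            bias += lr * err
            weights = [w + lr * err * v for w, v in zip(weights, x)]
    return weights, bias


def predict_male_probability(x: Sequence[float], weights: Sequence[float],
                             bias: float) -> float:
    """Substitute one user's behavior vector into the fitted function."""
    return _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
```

Unlike the Bayesian embodiment, the fitted weights are fixed until the model is retrained, which is the maintenance cost the text points out.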
Embodiment two
As shown in Figure 2, this embodiment provides a targeted application system based on a user gender determination result, comprising:
a sample user data collection unit 21, configured to collect and organize the behavioral data of sample users of a website whose true gender is known;
a behavioral data and gender tendency relation unit 22, configured to obtain the relation between gender tendency and behavioral data from the behavioral data and known true gender of the sample users provided by the sample user data collection unit 21, and to store it in the database of the website;
an all-user data collection unit 23, configured to collect and organize the behavioral data of all users of the website;
a gender tendency calculation unit 24, configured to obtain the gender tendency of all users from the relation between gender tendency and behavioral data provided by the relation unit 22 and from the behavioral data of all users provided by the all-user data collection unit 23, and to store it in the database of the website;
a gender tendency output unit 25, configured to query the database of the website and output the gender tendency of the queried user;
a targeted application unit 26, configured to provide the queried user with information based on the gender tendency output by the gender tendency output unit 25.
In this embodiment, the behavioral data of the sample users collected and organized by the sample user data collection unit 21 comprises: each content item of the website accessed by each sample user, the access frequency, and the access weight between each sample user and each content item. The behavioral data of all users of the website collected and organized by the all-user data collection unit 23 comprises: each content item of the website accessed by each user, the access frequency, and the access weight between each user and each content item.
In this embodiment, the relation between gender tendency and behavioral data obtained by the relation unit 22 comprises: the relation between the overall gender tendency of the website and the behavioral data of all sample users, and the relation between the gender tendency of each content item of the website and the behavioral data of all sample users. The gender tendency calculation unit 24 uses the relation obtained by the relation unit 22 together with the behavioral data of all users to obtain the gender tendency of all users.
In this embodiment, when the gender tendency output unit 25 queries the database of the website and outputs the gender tendency of the queried user: if the database contains the gender tendency of the queried user, that gender tendency is output; if the database does not contain the gender tendency of the queried user, the content of the website that the queried user is currently accessing is captured, the gender tendency of that current content is looked up in the database, and it is output as the gender tendency of the queried user; if the database does not contain the gender tendency of the current content either, the overall gender tendency of the website is output as the gender tendency of the queried user.
It should be noted that, in this embodiment, the relation unit 22 obtains the prior probability of the overall gender tendency of the website and the prior probability of the gender tendency of each content item from the collected behavioral data and known true gender of the sample users. The gender tendency calculation unit 24 then combines these with the behavioral data of each user collected by the all-user data collection unit 23 and uses the Bayesian formula to calculate the posterior probability of each user's gender tendency, thereby realizing a system for determining the gender tendency of Internet users based on a Bayesian classification algorithm. When a user accesses the website's content for the first time, the behavior is partly accidental and the confidence of the result is relatively low. As the number of accesses grows and a more stable behavior pattern forms, the result stabilizes. The system thus realizes a progressive determination process: as a user's behavioral data on the website accumulates, the accuracy of the calculated gender tendency keeps improving and the influence of accidental behavior decreases. No training process is needed, and the maintenance cost is low.
In other embodiments of the invention, the relation unit 22 may instead use a classification algorithm such as a decision tree, logistic regression, a neural network, or a support vector machine to process the behavioral data and known true gender of the sample users and obtain the relation between a user's individual behavior and gender tendency; the gender tendency calculation unit 24 then obtains the gender tendency of all users from that relation and from the behavioral data of all users collected by the all-user data collection unit 23. For example, the relation unit 22 uses such a classification algorithm to obtain a functional relation between a user's individual behavior and gender tendency, and the gender tendency calculation unit 24 substitutes the behavioral data of each user from the all-user data collection unit 23 into that function to obtain each user's gender tendency. User behavior patterns may change gradually over time. To maintain high accuracy, a gender determination system based on a classification algorithm such as a decision tree, logistic regression, a neural network, or a support vector machine must retrain and update its model regularly or irregularly, so its maintenance cost is higher, and it cannot realize a progressive process in which accuracy improves gradually as user behavior accumulates.
The targeted application system based on a user gender determination result of the present invention may be an application system such as a gender-based personalized search system, personalized recommendation system, or targeted advertising system.
In summary, the targeted application method and system based on a user gender determination result of the present invention calculate the relation between gender tendency and behavioral data from the behavioral data and gender of sample users, and then obtain the gender tendency of each user from that relation and from each user's collected behavioral data. Because the user's real, objective, and complete behavioral data is applied to the gender determination of Internet users, the result is accurate and reliable, and reasonably accurate gender information can be obtained even when a user's true gender information is missing or false. The invention thereby greatly improves the accuracy and efficiency of gender-dependent targeted applications such as personalized search, personalized recommendation, and targeted advertising, and improves the efficiency of personalized Internet applications. Further, the invention imposes no user filtering conditions, so it can cover essentially all users of the website and achieves high user coverage. It also realizes a progressive determination process: as a user's behavioral data on the website accumulates, the accuracy of the calculated gender tendency keeps improving.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Accordingly, if such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.