CN104205100A - Method and device for estimating recessive character distribution of users - Google Patents

Method and device for estimating recessive character distribution of users

Info

Publication number
CN104205100A
Authority
CN
China
Prior art keywords
user
character
cap
website
recessive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201480000467.4A
Other languages
Chinese (zh)
Other versions
CN104205100B (en)
Inventor
陈宽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Infervision Medical Technology Co Ltd
Original Assignee
Shenzhen Tuixiang Big Data Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tuixiang Big Data Information Technology Co Ltd filed Critical Shenzhen Tuixiang Big Data Information Technology Co Ltd
Publication of CN104205100A publication Critical patent/CN104205100A/en
Application granted granted Critical
Publication of CN104205100B publication Critical patent/CN104205100B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0277Online advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a method and a device for estimating recessive character distribution of users. The method comprises obtaining users using a website and dominant characters of the users; obtaining character information of the whole population from a population database, the character information comprising dominant characters and recessive characters; and according to the character information of the whole population, the users using the website and dominant characters of the users, recessive character distribution of the users is calculated through the bayesian algorithm. Through the abovementioned method, the estimating results are more accurate when recessive characters of users are estimated.

Description

A method and device for estimating the recessive character distribution of users
Technical field
The present invention relates to the field of network technology, and in particular to a method and device for estimating the recessive character distribution of users.
Background art
Normally, a person must register before using a website, and registration requires filling in information such as a user name and an identity card number.
If a website administrator wants to run precisely targeted advertising, pushing different advertisements to different users, the registration information alone is not enough and more user information is needed. Such information can be estimated from the registration information: for example, knowing a user's name, one may want to estimate the user's age, race, sex and so on.
In the prior art, a recessive character is estimated from a known dominant character using Bayes' equation, as follows:
Let x be the recessive character of the user that we want to estimate, and let t be the dominant character of the user that we can observe. To estimate x, Bayes' equation is:
$$P(x \mid t) = \frac{P(t \mid x)}{P(t)}\,P(x)$$
Here the sample space of Bayes' equation is the national demographic data. For example, if t is the user's name and x is the user's sex, the national demographic data gives the probability P(t|x) that name t occurs within each sex x, the probability P(x) of each sex x, and the probability P(t) that name t occurs, from which P(x|t) can be calculated.
Note that the sample space of the above Bayes' equation is the national demographic data, whereas the composition of a website's users often differs greatly from that of the national population. For example, most users of Sina Weibo are young university students, and most users of Renren are students. If Bayes' equation is applied regardless, the estimated recessive character will carry a large error, as the following example shows:
Suppose we observe that a user of website F has the user name Jo (the dominant character t) and we wish to estimate Jo's age band. Let age band A be 0 to 50 years and age band B be 50 to 100 years, each accounting for half the population, so P(A)=P(B)=0.5. Suppose nobody in the 50 to 100 band uses website F, so P(F|B)=0. The demographic database shows that, among the people named Jo, 1 is in the 0 to 50 band and 99 are in the 50 to 100 band, so
$$\frac{P(A \mid t)}{P(B \mid t)} = \frac{\frac{P(t \mid A)}{P(t)}P(A)}{\frac{P(t \mid B)}{P(t)}P(B)} = \frac{P(t \mid A)\,P(A)}{P(t \mid B)\,P(B)} = \frac{P(t \mid A) \times 0.5}{P(t \mid B) \times 0.5} = \frac{P(t \mid A)}{P(t \mid B)} = \frac{1}{99}$$
According to Bayes' equation, the probability that Jo is aged 0 to 50 is 1% and the probability that Jo is aged 50 to 100 is 99%. In reality, the probability that Jo is aged 0 to 50 is 100% and the probability that Jo is aged 50 to 100 is 0%. The composition of website F's users differs from that of the national population, yet the calculation uses national demographic data; this mismatch of sample spaces causes a serious deviation in the result. Each website has its own characteristics and attracts its own kind of crowd, whose composition generally differs from that of the national population, so estimating users' recessive characters over the sample space of national demographic data inevitably introduces error.
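The arithmetic of the Jo example can be checked with a short sketch. It is illustrative only: the variable names are hypothetical, and the value P(F|A) = 1 is an assumption made here (the text states only P(F|B) = 0; any positive value gives the same site-aware result). The sketch contrasts the population-wide Bayes estimate with one that also weights each age band by its share of website users.

```python
from fractions import Fraction

# Figures from the Jo example; all names are hypothetical.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}   # P(A), P(B)
jo_count = {"A": 1, "B": 99}                          # people named Jo in each band
# P(F|B) = 0 is stated in the text; P(F|A) = 1 is an assumption for illustration.
uses_site = {"A": Fraction(1), "B": Fraction(0)}


def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {band: w / total for band, w in weights.items()}


# Both bands are equally large, so P(t|band) is proportional to jo_count[band].
naive = normalise({b: jo_count[b] * prior[b] for b in prior})
site_aware = normalise({b: jo_count[b] * uses_site[b] * prior[b] for b in prior})

print(naive)       # population-wide estimate
print(site_aware)  # estimate restricted to users of website F
```

The population-wide estimate puts Jo in band B with probability 99/100, while the site-aware estimate puts Jo in band A with probability 1, matching the figures in the example.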
Summary of the invention
To at least partly solve the above problems, the present invention proposes a method and device for estimating the recessive character distribution of users, so that the estimate of a user's recessive character is more accurate.
To solve the above technical problem, one technical solution adopted by the present invention is a method for estimating the recessive character distribution of users, comprising: obtaining the users who use a website and the dominant characters of those users; obtaining characteristic information of the whole population from a demographic database, wherein the characteristic information comprises dominant characters and recessive characters; and calculating the recessive character distribution of the users with the Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users.
The step of calculating the recessive character distribution of the users with the Bayesian algorithm is specifically: if, given any recessive character of a user, the user's use of the website and the user's possession of the dominant character are probabilistically independent (the probability independence condition), the recessive character distribution of the users is calculated according to the following formula:
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
where L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
The method further comprises judging whether the probability independence condition holds given any recessive character of a user. The judging steps comprise: calculating, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users, the value $P_1$ for every user, where $P_1$ is calculated as follows:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
calculating, according to the characteristic information of the whole population, the value $P_2$ for every user, where $P_2$ is calculated as follows:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$
If $P_1$ equals $P_2$ for every user, the probability independence condition holds.
The method further comprises: analyzing the behavioral habits of the users according to their dominant characters and recessive characters.
To solve the above technical problem, another technical solution adopted by the present invention is a device for estimating the recessive character distribution of users, comprising: a first acquisition module for obtaining the users who use a website and the dominant characters of those users; a second acquisition module for obtaining characteristic information of the whole population from a national demographic database, wherein the characteristic information comprises dominant characters and recessive characters; and a computing module for calculating the recessive character distribution of the users with the Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users.
If, given any recessive character of a user, the probability independence condition between the user's use of the website and the user's possession of the dominant character holds, the recessive character distribution of the users is calculated according to the following formula:
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
where L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
The device further comprises a judging module. The judging module is configured to calculate, according to the characteristic information of all users, the users who use the website and the dominant characters of the users, the value $P_1$ for every user, where $P_1$ is calculated as follows:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
to calculate, according to the characteristic information of all users, the users who use the website and the dominant characters of the users, the value $P_2$ for every user, where $P_2$ is calculated as follows:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$
and to judge whether $P_1$ equals $P_2$ for every user; if so, the probability independence condition holds.
The device further comprises an analysis module for analyzing the behavioral habits of the users according to their dominant characters and recessive characters.
To solve the above technical problem, a further technical solution adopted by the present invention is a device for estimating the recessive character distribution of users, the device comprising a processor. The processor is configured to obtain the users who use a website and the dominant characters of those users; to obtain characteristic information of the whole population from a demographic database, wherein the characteristic information comprises dominant characters and recessive characters; and to calculate the recessive character distribution of the users with the Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users.
The step in which the processor calculates the recessive character distribution of the users with the Bayesian algorithm is specifically: if, given any recessive character of a user, the probability independence condition between the user's use of the website and the user's possession of the dominant character holds, the processor calculates the recessive character distribution of the users according to the following formula:
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
where L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
The processor is further configured to judge whether the probability independence condition between the user's use of the website and the user's possession of the dominant character holds given any recessive character of a user. The judging steps comprise:
calculating, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users, the value $P_1$ for every user, where $P_1$ is calculated as follows:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
calculating, according to the characteristic information of the whole population, the value $P_2$ for every user, where $P_2$ is calculated as follows:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$
If $P_1$ equals $P_2$ for every user, the probability independence condition holds.
The processor is further configured to analyze the behavioral habits of the users according to their dominant characters and recessive characters.
The beneficial effect of the invention is as follows. Unlike the prior art, the present invention adds the data of the users of the website when calculating a user's recessive character, so that the probability of a recessive character among the website users who have a given dominant character is calculated with the website's user group, rather than the national demographic data, as the sample space. The mismatch of sample spaces no longer arises, the error it caused is removed, and the calculation result becomes more accurate.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the method of the present invention for estimating the recessive character distribution of users;
Fig. 2 is a schematic diagram of the distribution of dominant and recessive characters in the sample space in the method embodiment;
Fig. 3 is a schematic diagram of the distribution, in the sample space, of users who have a recessive character and use the website in the method embodiment;
Fig. 4 is a schematic diagram of the corrected distribution of dominant and recessive characters in the sample space in the method embodiment;
Fig. 5 is a structural diagram of a first embodiment of the device of the present invention for estimating the recessive character distribution of users;
Fig. 6 is a structural diagram of a second embodiment of the device of the present invention for estimating the recessive character distribution of users.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and embodiments.
Referring to Fig. 1, the method comprises:
Step S201: obtain the users who use the website and the dominant characters of those users;
A website records information about its users, such as registration information and visit information. This information is generally kept in the website's back-end statistics, from which one can find out who uses the website. For example, if the statistics record that Zhang San and Li Si are registered users of the website, then Zhang San and Li Si are known to have used it. The user information is, of course, required to be genuine, for example real name and real age.
A dominant character of a user is a feature that can be obtained directly. For example, the statistics record a registered user's real name, so the name is a dominant character of the user.
A recessive character of a user is a feature that cannot be obtained directly. For example, the statistics do not record a registered user's race, so race cannot be read from the statistics and is a recessive character of the user.
Step S202: obtain characteristic information of the whole population from the demographic database, wherein the characteristic information comprises dominant characters and recessive characters;
The demographic database records the characteristic information of the whole population in detail, for example each person's name, sex and age. Note that the characteristic information in the demographic database comprises dominant characters and recessive characters that correspond to the users' dominant and recessive characters. For example, if the user's name is a dominant character, the name in the demographic database is a dominant character; if the user's race is a recessive character, the race in the demographic database is a recessive character.
In this embodiment, the demographic database may be one published by a national authority and obtainable through public channels.
Step S203: calculate the recessive character distribution of the users with the Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users;
Before the recessive character distribution is calculated, it must also be verified whether the probability independence condition holds, that is, whether, given any recessive character of a user, the user's use of the website and the user's possession of the dominant character are independent. Step S203 may therefore be specifically: if the probability independence condition holds, the recessive character distribution of the users is calculated according to the following formula:
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$ (formula 1)
where L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
The origin of formula 1 is explained below for L=1. As the background section shows, the composition of a website's users differs from that of the national population, so applying Bayes' equation mechanically produces error. To avoid this error, the sample space must be corrected by adding the event that the user uses the website to Bayes' equation. The corrected Bayes' equation is:
$$P(x_1 \mid t \cap f) = \frac{P(t \cap f \mid x_1)\,P(x_1)}{P(t \cap f)}$$ (formula 2)
If the probability independence condition holds, $P(t \cap f \mid x_1) = P(t \mid x_1)\,P(f \mid x_1)$,
and therefore $$P(x_1 \mid t \cap f) = \frac{P(t \mid x_1)\,P(f \mid x_1)\,P(x_1)}{P(t \cap f)}$$ (formula 3)
Formula 3 reduces a probability that involves three conditions jointly to probabilities that involve them two at a time, which lowers the requirements on the data.
Further, comparing formula 3 with formula 2 shows that the simplified Bayes' equation requires the probability independence condition. The reason is illustrated by the following example:
As shown in Fig. 2, suppose the recessive character x can take only two values, A and B, shown in the figure as regions A and B with areas a and b respectively. Suppose the observable dominant character t is represented by the small rectangle in the middle; its intersections with the two regions of the recessive character are TA and TB, with areas ta and tb respectively. The problem is to find the ratio between the areas of TA and TB; normalizing the two areas so that they sum to 1 then gives the two probabilities.
If A and B together cover the whole demographic sample space, the simple Bayes' equation is:
$$\frac{P(A \mid t)}{P(B \mid t)} = \frac{\frac{P(t \mid A)}{P(t)}P(A)}{\frac{P(t \mid B)}{P(t)}P(B)} = \frac{P(t \mid A)\,P(A)}{P(t \mid B)\,P(B)}$$
In terms of area ratios: if the sample space on both sides of the equation is the same, namely A+B, the equation necessarily holds.
If the sample spaces on the two sides are inconsistent, the area-ratio equation breaks down. As shown in Fig. 3, suppose that only some of the people in B use website F; denote them B', with area b', and let the intersection of the dominant character t with B' be TB', with area tb'. The quantity we are actually interested in then becomes
$$\frac{P(A \mid t)}{P(B' \mid t)}$$
The sample space on the left side of the equation is now A+B'. If we continue to apply Bayes' equation mechanically, the right side remains:
$$\frac{P(t \mid A)\,P(A)}{P(t \mid B)\,P(B)}$$
whose sample space is still A+B.
Expressed in areas, the left side of Bayes' equation is $\frac{ta}{tb'}$ and the right side is $\frac{(ta/a)\,(a/(a+b))}{(tb/b)\,(b/(a+b))} = \frac{ta}{tb}$. Since tb' differs from tb whenever only part of B uses the website, the two sides are not equal, and applying Bayes' equation mechanically produces error.
The error clearly arises because the sample spaces on the two sides of the equation are inconsistent. The sample space must therefore be corrected so that both sides of the equation use the same one.
As shown in Fig. 4, when the composition of TA is the same as that of A and the composition of TB is the same as that of B, and the people in B who use website F are B',
$$\frac{tb'}{tb} = \frac{b'}{b}, \qquad \frac{ta'}{ta} = \frac{a'}{a}.$$
After the sample space has been corrected so that the composition of TA matches that of A and the composition of TB matches that of B, Bayes' equation is:
$$P(B' \mid t \cap f) = \frac{P(t \mid B')\,P(f \mid B')\,P(B')}{P(t \cap f)},$$
Expressed in areas: $$\frac{\frac{tb'}{a+b}}{\frac{ta+tb'}{a+b}} = \frac{\frac{tb'}{b'}\cdot\frac{b'}{a+b}}{\frac{ta+tb'}{a+b}} = \frac{\frac{tb}{b}\cdot\frac{b'}{a+b}}{\frac{ta+tb'}{a+b}} = \frac{\frac{tb}{b}\cdot\frac{b'}{a+b}\cdot\frac{b}{b}}{\frac{ta+tb'}{a+b}} = \frac{\frac{tb}{b}\cdot\frac{b'}{b}\cdot\frac{b}{a+b}}{\frac{ta+tb'}{a+b}}$$
Here $\frac{tb}{b}$, $\frac{b'}{b}$ and $\frac{b}{a+b}$ correspond to P(t|B), P(f|B) and P(B) respectively, so the corrected Bayes' equation can take the form of formula 3.
At this point, the population distribution data can be obtained from the demographic database, for example how many people with each recessive character value $\{x_1, \dots, x_L\}$ also have the observed dominant character value t, and what share of the total population they make up. A sufficiently detailed database (such as the United States Census data) lets us identify each person together with that person's dominant and recessive characters. Suppose the database contains data on w people in total, the data of the v-th person is $(t_v, x_v)$, and $\Pi\{\cdot\}$ is the event indicator function. The probabilities in the corrected Bayes' equation can then be calculated as follows:
$$P(t \mid x) = \frac{\sum_{v=1}^{w} \Pi\{(t_v, x_v) = (t, x)\}}{\sum_{v=1}^{w} \Pi\{x_v = x\}}$$
$$P(x) = \frac{\sum_{v=1}^{w} \Pi\{x_v = x\}}{w}$$
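The two counting formulas above can be sketched in a few lines. This is a minimal illustration, assuming the database is available as a list of per-person `(t_v, x_v)` pairs; the function name `estimate_from_records` and the toy data are hypothetical and not part of the original disclosure.

```python
from collections import Counter
from fractions import Fraction


def estimate_from_records(records):
    """Return (P(t|x), P(x)) estimated by counting (t_v, x_v) pairs."""
    records = list(records)
    w = len(records)                          # total number of people
    joint = Counter(records)                  # how often (t_v, x_v) = (t, x)
    per_x = Counter(x for _, x in records)    # how often x_v = x
    p_x = {x: Fraction(n, w) for x, n in per_x.items()}
    p_t_given_x = {(t, x): Fraction(n, per_x[x]) for (t, x), n in joint.items()}
    return p_t_given_x, p_x


# Toy data, invented for illustration only.
toy = [("Jo", "A"), ("Al", "A"), ("Jo", "B"), ("Jo", "B")]
p_t_given_x, p_x = estimate_from_records(toy)
print(p_t_given_x[("Jo", "A")], p_x["A"])
```

Each `Counter` plays the role of one indicator sum: `joint` counts the people for whom the indicator in the numerator of P(t|x) is 1, and `per_x` counts those for whom the indicator of x_v = x is 1.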
We also need P(f|x), that is, how many of the people whose recessive character is x use website F (for example, how many people aged 12-19 use the website). The website's back-end statistics normally record the relevant user data, from which the required figures can be obtained.
Further, when the recessive character has n possible values $x_j$ (j = 1, ..., n) in total, the denominator $P(t \cap f)$ is obtained by summing over all of them.
Therefore, $$P(x \mid t \cap f) = \frac{P(t \mid x)\,P(f \mid x)\,P(x)}{\sum_{j=1}^{n} P(t \mid x_j)\,P(f \mid x_j)\,P(x_j)}$$
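The normalized formula can be illustrated with a small sketch. It assumes the three probability tables are already available as dictionaries; the function name `posterior` and the numbers in the example are hypothetical. A recessive value `x` may be any hashable key, so a tuple such as `("18-29", "female")` could stand for several recessive characters at once.

```python
from fractions import Fraction


def posterior(t, p_t_given_x, p_f_given_x, p_x):
    """Return P(x | t and f) for every x, assuming t and f are independent given x."""
    weights = {
        x: p_t_given_x.get((t, x), 0) * p_f_given_x.get(x, 0) * p_x[x]
        for x in p_x
    }
    total = sum(weights.values())   # the sum over j, i.e. P(t ∩ f)
    if total == 0:
        raise ValueError("no recessive value is compatible with t and site use")
    return {x: w / total for x, w in weights.items()}


# Invented numbers, for illustration only.
p_x = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
p_t_given_x = {("Jo", "A"): Fraction(1, 100), ("Jo", "B"): Fraction(99, 100)}
p_f_given_x = {"A": Fraction(4, 5), "B": Fraction(1, 5)}
result = posterior("Jo", p_t_given_x, p_f_given_x, p_x)
print(result)
```

Because the denominator is computed as the sum of the numerators, the returned values always sum to 1, and P(t ∩ f) never has to be measured separately.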
The above takes a single recessive character as an example. Extending in the same way to multiple recessive characters, the corrected Bayes' equation is:
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
where L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
Note that correcting the sample space so that the composition of TA matches that of A and the composition of TB matches that of B requires the probability independence condition; conversely, when the probability independence condition is met, the composition of TA is the same as that of A and the composition of TB is the same as that of B. Therefore, before the corrected Bayes' equation is used, it can first be verified whether the probability independence condition is met. The method further comprises:
judging whether the probability independence condition between the user's use of the website and the user's possession of the dominant character holds given any recessive character of a user, the judging steps comprising:
calculating, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users, the value $P_1$ for every user, where $P_1$ is calculated as follows:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
calculating, according to the characteristic information of the whole population, the value $P_2$ for every user, where $P_2$ is calculated as follows:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$
If $P_1$ equals $P_2$ for every user, the probability independence condition holds.
Here L is an integer greater than or equal to 1 (L = 1 corresponds to a single recessive character), x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
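The check of $P_1$ against $P_2$ can be sketched as follows. This is only an illustration under stated assumptions: it supposes that per-person records of the form `(t, uses_site, x)` are available, and it compares the two values within a small tolerance `tol`, which is an addition made here because counted frequencies are rarely exactly equal (the text itself requires equality). The function name `independence_holds` is hypothetical.

```python
import math
from collections import Counter


def independence_holds(records, tol=1e-9):
    """Check P(t and f | x) == P(t | x) * P(f | x) for every observed (t, x)."""
    records = list(records)
    n_x = Counter(x for _, _, x in records)               # people with value x
    n_tx = Counter((t, x) for t, _, x in records)         # ... who also have t
    n_fx = Counter(x for _, f, x in records if f)         # ... who use the site
    n_tfx = Counter((t, x) for t, f, x in records if f)   # ... who have t and use it
    for (t, x), count_tx in n_tx.items():
        p1 = n_tfx[(t, x)] / n_x[x]
        p2 = (count_tx / n_x[x]) * (n_fx[x] / n_x[x])
        if not math.isclose(p1, p2, abs_tol=tol):
            return False
    return True


# Toy records, invented for illustration: (dominant t, uses site?, recessive x).
independent = [("Jo", True, "A"), ("Jo", False, "A"),
               ("Al", True, "A"), ("Al", False, "A")]
dependent = [("Jo", True, "A"), ("Jo", True, "A"),
             ("Al", False, "A"), ("Al", False, "A")]
print(independence_holds(independent), independence_holds(dependent))
```

In the first toy set half of each name group uses the site, so the condition holds; in the second, site use is tied to the name, so it fails.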
Further, once the users' dominant and recessive characters have been obtained, the users' behavioral habits on the website can be analyzed from them, so that an advertising strategy can be formulated according to those habits, or suitable value-added services can be pushed to users. Knowing both the dominant and the recessive characters allows behavioral habits to be determined more accurately, which makes the advertising strategy or the pushed value-added services more appropriate and improves the success rate.
The present invention corrects the sample-space bias that arises in recessive character estimation, bringing the estimate closer to the correct theoretical value; the stronger the sample-space bias, the greater the need for the correction. The user bases of many currently popular websites are strongly biased. For example, 2012 data for the foreign social media site Facebook show that 83% of people aged 18-29 use it, whereas only 40% of people aged 65 and over do. Without correction, the usual maximum-likelihood and Bayesian algorithms overstate the probability that a user is aged 65 or over, relative to the probability that the user is aged 18-29, by more than 2 times the correct value. This materially affects every subsequent calculation and analysis based on it and may seriously bias the final result.
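The "more than 2 times" figure follows from the two usage rates quoted above. In the corrected equation, the odds of the 65-and-over band relative to the 18-29 band carry the extra factor P(f|65+)/P(f|18-29); omitting it overstates those odds by the reciprocal. A small sketch of the arithmetic, with hypothetical variable and function names:

```python
# Usage rates quoted in the text (2012 figures for Facebook).
p_use_18_29 = 0.83     # P(f | aged 18-29)
p_use_65_plus = 0.40   # P(f | aged 65 and over)


def overstatement_factor(p_f_reference, p_f_target):
    """Factor by which uncorrected odds (target : reference) exceed corrected odds."""
    return p_f_reference / p_f_target


factor = overstatement_factor(p_use_18_29, p_use_65_plus)
print(f"uncorrected odds are about {factor:.2f} times the corrected odds")
```

The factor is 0.83 / 0.40, a little over 2, which is consistent with the statement in the text.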
In this embodiment, the data of the users of the website are added when a user's recessive character is calculated, so that the probability of a recessive character among the website users who have a given dominant character is calculated with the website's user group, rather than the national demographic data, as the sample space. The mismatch of sample spaces no longer arises, and the error it caused in the calculation result is removed.
Device the first embodiment that the present invention also provides estimation user's recessive character to distribute, as shown in Figure 5, device comprises the first acquisition module 301, the second acquisition module 302 and computing module 304.
The first acquisition module 301 obtains and uses the user of website and user's dominant character.The second acquisition module 302 obtains the characteristic information of all populations from national demographic database, wherein, characteristic information comprises dominant character and recessive character.
The computing module 304 calculates the recessive character distribution of the users by means of a Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of those users. Specifically, if the probability independence condition holds, that is, if for any recessive character of a user the probability that the user uses the website is independent of the probability that the user has the dominant character, the computing module 304 may use the Bayesian algorithm to calculate the recessive character distribution of the users. More specifically, under that condition it may calculate the recessive character of the users according to the following formula:
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)\, P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
where L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website. The derivation of this formula is given in the method embodiment for estimating the recessive character distribution of users and is not repeated here.
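The formula above can be evaluated numerically as in the following minimal sketch. The function name `recessive_posterior`, the dictionary layout and all probability values are hypothetical and chosen only for illustration. The sketch also assumes that the denominator P(t ∩ f) is obtained by summing the numerator over all values of the recessive character, which follows from the law of total probability when the probability independence condition holds.

```python
from typing import Dict, Hashable


def recessive_posterior(
    prior: Dict[Hashable, float],
    p_t_given_x: Dict[Hashable, float],
    p_f_given_x: Dict[Hashable, float],
) -> Dict[Hashable, float]:
    """Return P(x | t ∩ f) for every recessive-character value x.

    prior[x]        -- P(x), e.g. taken from demographic data
    p_t_given_x[x]  -- P(t | x), probability of the dominant character given x
    p_f_given_x[x]  -- P(f | x), probability of using the website given x
    A key x may be a tuple (x_1, ..., x_L) when several recessive
    characters are combined.
    """
    # Numerator of the formula: P(t | x) * P(f | x) * P(x)
    joint = {x: p_t_given_x[x] * p_f_given_x[x] * prior[x] for x in prior}
    # Denominator P(t ∩ f): sum of the numerator over all x (assumption,
    # valid under the probability independence condition)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("P(t ∩ f) is zero; the posterior is undefined")
    return {x: value / evidence for x, value in joint.items()}


# Hypothetical numbers, for illustration only
prior = {"low_income": 0.6, "high_income": 0.4}
p_t_given_x = {"low_income": 0.5, "high_income": 0.5}
p_f_given_x = {"low_income": 0.1, "high_income": 0.3}
posterior = recessive_posterior(prior, p_t_given_x, p_f_given_x)
```

With these illustrative inputs the website's users skew towards the second value of the recessive character even though it is the minority in the prior, which is the kind of correction the sample-space argument describes.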
The device may further comprise a judging module 303 and an analysis module 305. The judging module 303 is configured to calculate the value $P_1$ for any user according to the characteristic information of all users, the users who use the website and the dominant characters of those users, where $P_1$ is given by:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
and,
to calculate the value $P_2$ for any user according to the characteristic information of all users, the users who use the website and the dominant characters of those users, where $P_2$ is given by:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)$$
and,
to judge whether $P_1$ and $P_2$ are equal for that user; if they are equal, the probability independence condition holds.
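The comparison of P 1 and P 2 can be sketched as follows. This is only an illustration under stated assumptions: it supposes that individual records of the form (recessive character, has dominant character, uses website) are available, estimates the probabilities by relative frequency, and compares them within a tolerance `tol`. The function name `independence_holds`, the record layout and the tolerance are all hypothetical; the text itself only requires that P 1 and P 2 be equal.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def independence_holds(
    records: Iterable[Tuple[Hashable, bool, bool]],
    tol: float = 1e-9,
) -> bool:
    """Check P(t ∩ f | x) == P(t | x) * P(f | x) for every observed x.

    Each record is (x, has_t, uses_site): the recessive character value,
    whether the person has the dominant character t, and whether the
    person uses the website (event f).
    """
    n_x, n_t, n_f, n_tf = Counter(), Counter(), Counter(), Counter()
    for x, has_t, uses_site in records:
        n_x[x] += 1
        n_t[x] += has_t                   # True counts as 1
        n_f[x] += uses_site
        n_tf[x] += has_t and uses_site
    for x, n in n_x.items():
        p1 = n_tf[x] / n                    # P1 = P(t ∩ f | x)
        p2 = (n_t[x] / n) * (n_f[x] / n)    # P2 = P(t | x) * P(f | x)
        if not math.isclose(p1, p2, abs_tol=tol):
            return False
    return True
```

With real data the two estimates will rarely be exactly equal, so a looser tolerance or a statistical test may be more appropriate; that choice is outside what the text specifies.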
The analysis module 305 analyzes the user's behavioral habits according to the user's dominant character and recessive character, so that an advertising strategy can be formulated according to those habits, or suitable value-added services can be pushed to the user. Knowing both the dominant character and the recessive character of a user allows the user's behavioral habits to be determined more accurately, which in turn makes the advertising strategy or the pushed value-added services more reasonable and improves the success rate.
In the embodiments of the present invention, the computing module 304 incorporates data on the users of the website when calculating a user's recessive character. As a result, the probability that a person in the website's user group who has the dominant character also has the recessive character is calculated with the website's user group as the sample space, rather than the national demographic data. The sample-space mismatch therefore does not arise, the error it would introduce into the result is avoided, and the calculation result is corrected.
The present invention also provides a second embodiment of a device for estimating the recessive character distribution of users. As shown in Figure 6, the device comprises a processor 401, a memory 402 and a bus 403. The processor 401 and the memory 402 are both connected to the bus 403.
The processor 401 is configured to obtain the users who use the website and the dominant characters of those users; to obtain the characteristic information of the whole population from a demographic database, where the characteristic information comprises dominant characters and recessive characters; and to calculate the recessive character distribution of the users by means of a Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of those users.
Further, the step in which the processor 401 calculates the recessive character distribution of the users by means of a Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of those users, is specifically as follows: if the probability independence condition holds, that is, if for any recessive character of a user the probability that the user uses the website is independent of the probability that the user has the dominant character, the recessive character of the users is calculated according to the following formula:
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)\, P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
where L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website. The processor 401 also judges whether the probability independence condition holds, that is, whether for any recessive character of a user the user's use of the website and the user's having the dominant character are independent. The judgment comprises the following steps:
According to the characteristic information of the whole population, the users who use the website and the dominant characters of those users, the value $P_1$ is calculated for any user, where $P_1$ is given by:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
According to the characteristic information of the whole population, the value $P_2$ is calculated for any user, where $P_2$ is given by:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)$$
If $P_1$ and $P_2$ are equal for every user, the probability independence condition holds.
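The role of this check can be made explicit with a short derivation. The following is a sketch in which X is shorthand, introduced here only for brevity, for the combination x 1 ∩ ... ∩ x L. Bayes' theorem gives the first line; the equality of P 1 and P 2 licenses the substitution in the second line; and the law of total probability, summing over all combinations of recessive character values, gives the denominator.

```latex
\begin{aligned}
P(X \mid t \cap f)
  &= \frac{P(t \cap f \mid X)\, P(X)}{P(t \cap f)}
  && \text{(Bayes' theorem; the numerator contains } P_1\text{)} \\
  &= \frac{P(t \mid X)\, P(f \mid X)\, P(X)}{P(t \cap f)}
  && \text{(valid when } P_1 = P_2\text{)} \\
P(t \cap f)
  &= \sum_{X} P(t \mid X)\, P(f \mid X)\, P(X)
  && \text{(law of total probability, with } P_1 = P_2\text{)}
\end{aligned}
```

If P 1 and P 2 differ, the second line no longer follows from the first, and the factored formula cannot be used as stated.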
The processor 401 is also configured to analyze the behavioral habits of the users according to their dominant characters and recessive characters.
It should be noted that the users who use the website and their dominant characters may be obtained from the website's back-end statistics and stored in the memory 402, from which the processor 401 retrieves them. The content of the demographic database may likewise be obtained in advance from public channels by the website back end and stored in the memory 402, to be retrieved from the memory 402 when the demographic database is needed; alternatively, it may be obtained from public channels at the time it is needed.
In the embodiments of the present invention, the processor 401 incorporates data on the users of the website when calculating a user's recessive character. As a result, the probability that a person in the website's user group who has the dominant character also has the recessive character is calculated with the website's user group as the sample space, rather than the national demographic data. The sample-space mismatch therefore does not arise, the error it would introduce into the result is avoided, and the calculation result is corrected.
The above are merely embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (12)

1. A method for estimating the recessive character distribution of users, characterized in that the method comprises:
obtaining the users who use a website and the dominant characters of the users;
obtaining characteristic information of the whole population from a demographic database, wherein the characteristic information comprises dominant characters and recessive characters;
calculating the recessive character distribution of the users by means of a Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users.
2. The method according to claim 1, characterized in that
the step of calculating the recessive character distribution of the users by means of a Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users, is specifically:
if, for any recessive character of a user, the probability independence condition between the user using the website and the user having the dominant character holds, calculating the recessive character of the users according to the following formula,
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)\, P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
wherein L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
3. The method according to claim 2, characterized by further comprising judging whether, for any recessive character of a user, the probability independence condition between the user using the website and the user having the dominant character holds, the judging comprising the following steps:
according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users, calculating the value $P_1$ for any user, wherein $P_1$ is given by:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
according to the characteristic information of the whole population, calculating the value $P_2$ for any user, wherein $P_2$ is given by:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)$$
if $P_1$ and $P_2$ are equal for every user, the probability independence condition holds.
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
analyzing the behavioral habits of the users according to the dominant characters and recessive characters of the users.
5. A device for estimating the recessive character distribution of users, characterized by comprising:
a first acquisition module, configured to obtain the users who use a website and the dominant characters of the users;
a second acquisition module, configured to obtain characteristic information of the whole population from a national demographic database, wherein the characteristic information comprises dominant characters and recessive characters;
a computing module, configured to calculate the recessive character distribution of the users by means of a Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users.
6. The device according to claim 5, characterized in that the computing module is specifically configured to:
if, for any recessive character of a user, the probability independence condition between the user using the website and the user having the dominant character holds, calculate the recessive character of the users according to the following formula,
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)\, P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
wherein L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
7. The device according to claim 6, characterized in that the device further comprises a judging module;
the judging module is configured to calculate the value $P_1$ for any user according to the characteristic information of all users, the users who use the website and the dominant characters of the users, wherein $P_1$ is given by:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
and,
to calculate the value $P_2$ for any user according to the characteristic information of all users, the users who use the website and the dominant characters of the users, wherein $P_2$ is given by:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)$$
and,
to judge whether $P_1$ and $P_2$ are equal for that user; if they are equal, the probability independence condition holds.
8. The device according to any one of claims 5 to 7, characterized in that the device further comprises an analysis module;
the analysis module is configured to analyze the behavioral habits of the users according to the dominant characters and recessive characters of the users.
9. A device for estimating the recessive character distribution of users, characterized in that the device comprises a processor;
the processor is configured to obtain the users who use a website and the dominant characters of the users; to obtain characteristic information of the whole population from a demographic database, wherein the characteristic information comprises dominant characters and recessive characters; and to calculate the recessive character distribution of the users by means of a Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users.
10. The device according to claim 9, characterized in that
the step in which the processor calculates the recessive character distribution of the users by means of a Bayesian algorithm, according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users, is specifically:
if, for any recessive character of a user, the probability independence condition between the user using the website and the user having the dominant character holds, the processor calculates the recessive character of the users according to the following formula,
$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)\, P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$
wherein L is an integer greater than or equal to 1, x is a recessive character of the user, t is the dominant character of the user, and f denotes that the user uses the website.
11. The device according to claim 10, characterized in that
the processor is further configured to judge whether, for any recessive character of a user, the probability independence condition between the user using the website and the user having the dominant character holds, the judging comprising the following steps:
according to the characteristic information of the whole population, the users who use the website and the dominant characters of the users, calculating the value $P_1$ for any user, wherein $P_1$ is given by:
$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$
according to the characteristic information of the whole population, calculating the value $P_2$ for any user, wherein $P_2$ is given by:
$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\, P(f \mid x_1 \cap \dots \cap x_L)$$
if $P_1$ and $P_2$ are equal for every user, the probability independence condition holds.
12. The device according to any one of claims 9 to 11, characterized in that
the processor is further configured to analyze the behavioral habits of the users according to the dominant characters and recessive characters of the users.
CN201480000467.4A 2014-06-05 2014-06-05 A kind of method and device for the recessive character distribution for estimating user Active CN104205100B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/079258 WO2015184619A1 (en) 2014-06-05 2014-06-05 Method and apparatus for estimating recessive character distribution of users

Publications (2)

Publication Number Publication Date
CN104205100A true CN104205100A (en) 2014-12-10
CN104205100B CN104205100B (en) 2018-02-02

Family

ID=52088179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480000467.4A Active CN104205100B (en) 2014-06-05 2014-06-05 A kind of method and device for the recessive character distribution for estimating user

Country Status (2)

Country Link
CN (1) CN104205100B (en)
WO (1) WO2015184619A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765362A (en) * 2017-04-20 2018-11-06 优信数享(北京)信息技术有限公司 A kind of vehicle checking method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060010029A1 (en) * 2004-04-29 2006-01-12 Gross John N System & method for online advertising
CN103164470A (en) * 2011-12-15 2013-06-19 盛大计算机(上海)有限公司 Directional application method based on user gender distinguished results and system thereof
CN103744917A (en) * 2013-12-27 2014-04-23 东软集团股份有限公司 Mixed recommendation method and system
CN103778555A (en) * 2014-01-21 2014-05-07 北京集奥聚合科技有限公司 User attribute mining method and system based on user tags

Also Published As

Publication number Publication date
CN104205100B (en) 2018-02-02
WO2015184619A1 (en) 2015-12-10

Similar Documents

Publication Publication Date Title
Crawford et al. Hidden population size estimation from respondent-driven sampling: a network approach
US11625755B1 (en) Determining targeting information based on a predictive targeting model
Leitao et al. Is this scaling nonlinear?
Vij et al. When is big data big enough? Implications of using GPS-based surveys for travel demand analysis
Fienen et al. Social. Water—A crowdsourcing tool for environmental data acquisition
Akgül et al. Inferences on stress–strength reliability based on ranked set sampling data in case of Lindley distribution
Behrens et al. Shocking habits: Methodological issues in analyzing changing personal travel behavior over time
Jin et al. Item response theory models for performance decline during testing
CN105447145A (en) Item-based transfer learning recommendation method and recommendation apparatus thereof
CN103678431A (en) Recommendation method based on standard labels and item grades
Rose et al. Attribute exclusion strategies in airline choice: accounting for exogenous information on decision maker processing strategies in models of discrete choice
CN102880992A (en) Intelligent push system and method for study of students
CN104731958A (en) User-demand-oriented cloud manufacturing service recommendation method
CN106056413A (en) Interest point recommendation method based on space-time preference
Dey et al. Generalized inverted exponential distribution: Different methods of estimation
Yokoo et al. Measures of speeding from a GPS-based travel behavior survey
CN106126519A (en) The methods of exhibiting of media information and server
Warren et al. Verbal feedback in therapeutic communities: Pull-ups and reciprocated pull-ups as predictors of graduation
Adams et al. Strategies for collecting social network data
Yu et al. Estimation of sensitive proportion by randomized response data in successive sampling
CN104834990A (en) Passenger informatization coding method and device
US20170213241A1 (en) Reach and frequency for online advertising based on data aggregation and computing
CN104205100A (en) Method and device for estimating recessive character distribution of users
Wong Comparing the fit of the gravity model for different cross-border flows
Guilkey et al. Cost-effectiveness analysis for health communication programs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20171128

Address after: 100089, room 6, room 602-27, No. 52, North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing imagine Technology Co., Ltd.

Address before: 518000 Guangdong Shenzhen Keyuan Road Changyuan new material port 1

Applicant before: SHENZHEN TUIXIANG BIG DATA INFORMATION TECHNOLOGY CO., LTD.

GR01 Patent grant
CP03 Change of name, title or address

Address after: Room B401, floor 4, building 1, No. 12, Shangdi Information Road, Haidian District, Beijing 100085

Patentee after: Tuxiang Medical Technology Co., Ltd

Address before: 100089, room 6, room 602-27, No. 52, North Fourth Ring Road, Haidian District, Beijing

Patentee before: Beijing Tuoxiang Technology Co.,Ltd.