CN105354244A - Time-space LDA model for social network community mining - Google Patents

Info

Publication number
CN105354244A
Authority
CN
China
Prior art keywords
microblogging
community
theme
space
distribution
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510670779.2A
Other languages
Chinese (zh)
Inventor
段炼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Teachers College
Original Assignee
Guangxi Teachers College

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of public opinion monitoring and relates to user recommendation and service recommendation in social networks, in particular to a time-space LDA model for social network community mining. The process comprises four main steps: establishing an expression for the elements of a microblog; mathematically modeling the constraints on microblog vocabulary; building the microblog time-space topic model; and estimating the parameters of that model. The model discovers social network communities by jointly using time, space, microblog topics and user interaction, which improves the validity and accuracy of community mining. It also controls the relationship between a microblog and a latent geographic region through the topic distributions of regions and communities, the geographic position of the microblog, and the spatial extent of the latent geographic region. This reduces the number of latent geographic regions that overlap or contain one another in space, limits unbounded growth of their spatial extent, and improves the accuracy of microblog topic identification.

Description

A time-space LDA model for social network community mining
Technical field
The invention belongs to the field of public opinion monitoring and relates to social network user recommendation and social network service recommendation, in particular to a time-space LDA model for social network community mining.
Background art
In recent years, microblogs have become a "sensor" of public opinion, commercial marketing and urban functions, and microblog research and applications have spread widely into fields such as social development and public opinion monitoring. A community is a group that users form by clustering according to the small-world property. Studying the structural characteristics of microblog communities helps in understanding information dissemination models, user interaction patterns and the evolution of groups, and has considerable scientific and practical value.
At present, most techniques find communities from the density of users' social relationships (such as followed accounts and friends) and of message exchanges (reposts, replies and comments), or obtain communities by partitioning the network with models such as clustering. They ignore the latent topic features of a community, such as "sports" or "science and technology", which to some extent reflect the interests of its users.
Among techniques that introduce latent topics for community discovery, some use LDA to analyze user homogeneity on Twitter and mine active groups of microblog users; others discover communities from the type of interaction between users and the similarity of microblog topics, and compute the probability that a user belongs to a given community. These topic-based methods find communities only from topic strength within a particular period, so the community structure they find is static and cannot reflect how communities evolve. To express that user interests change over time, some work uses a Bernoulli distribution to express whether a user's topics are influenced by the user-community distribution of the previous timestamp; however, the parameter of the Bernoulli distribution is set manually and cannot adapt to reflect how topics evolve over time.
Besides latent topics, the specific economic and cultural environment of a geographic region strongly influences microblog topics. Moreover, because of the daily rhythm of social life, users pay attention to different things in different time periods, so microblog topics follow specific distribution trends at different times. In this respect, some work treats a user as a "document" and the number of the user's check-ins at each location as the "words" of that document, and then computes user similarity and recommends locations with a topic model; other work computes user similarity from the overlap of users' spatio-temporal trajectories. However, these methods do not consider the user interest preferences reflected in microblog content and measure user similarity only from spatio-temporal position.
In research on geography-related topic models, researchers use latent geographic regions to express areas where microblogs with similar topics accumulate. There are four main ways to partition space into regions: regular grids, administrative districts (such as provincial or district boundaries), irregular geographic grids, and adaptive partitioning based on probability distributions. The first three fix the boundaries of the geographic units in advance, so they struggle to describe accurately a cluster of topically similar microblogs that spans several geographic units, or several such clusters within one unit. Existing adaptive partitioning methods, in turn, ignore a distance limit on the boundary of a latent geographic region in their distance measure, which easily lets some latent geographic regions cover too large an area. For example, when a two-dimensional Gaussian model is used to express latent geographic regions of similar microblog topics, some latent geographic regions span more than half of the United States and some overlap one another. As a result, the topic distribution inside a latent geographic region tends toward the background topic distribution and cannot highlight the region's characteristic topics, which undermines the purpose of introducing the spatial factor into the topic model. At the same time, the topics of microblogs located where several latent geographic regions overlap are easily confused, which hinders correct topic identification. In addition, existing methods require the number of latent geographic regions to be preset; they cannot adapt the extent and number of latent geographic regions to the data, nor do they take into account users' preferences in choosing geographic regions.
Summary of the invention
The object of the invention is to provide a time-space LDA model for social network community mining that addresses problems in the prior art such as low discrimination between social network communities and failure to take into account users' preferences in choosing geographic regions.
To achieve this object, the invention adopts the following technical solution:
A time-space LDA model for social network community mining comprises the following steps:
(1) Establishing the expression of microblog elements, which provides a conceptual model of a microblog. The expression is $d_i = (W, t, l, r, u, c)$, where $W$ is the bag of words of the microblog, drawn from the vocabulary $V = \{w_1, w_2, \ldots, w_{|V|}\}$ in which $w_1, w_2, \ldots, w_{|V|}$ are the distinct words; $t$ is the posting time; $l$ is the geographic position from which the microblog was posted; $r$ is the latent geographic region containing the microblog; $u$ is the microblog user; and $c$ is the community to which the user belongs;
(2) Mathematical modeling of the constraints on microblog vocabulary, which describes how strongly different spaces and communities influence microblog vocabulary. The model comprises a background topic-word distribution, a topic-word distribution for each latent geographic region, and a topic-word distribution for each community;
(3) Modeling the microblog time-space topic model, which describes how time, region and community generate a microblog. The microblog topic is expressed as:
$$P(z \mid c, r) = P(z \mid \theta_0, \theta_r, \theta_c) = \mathrm{Multi}(z \mid \theta_0, \theta_r, \theta_c)$$
where $z$ denotes a topic, $\theta_0$ the background topic distribution, $\theta_r$ the topic distribution mean of region $r$, and $\theta_c$ the topic distribution of community $c$;
(4) Estimating the parameters of the microblog time-space topic model, which yields the model parameters and hence the probability that a user belongs to each community. The parameters are estimated with the EM algorithm and Gibbs sampling.
Further, the EM algorithm comprises an E step, which samples the probability that a microblog belongs to each latent factor, and an M step, which obtains the gradient of each latent factor in the model by gradient descent.
Further, in the E step, $r_d$, $c_d$ and $z_d$ denote the probabilities that microblog $d$ belongs to latent geographic region $r$, community $c$ and topic $z$ respectively, where $r_d$, $c_d$ and $z_d$ are expressed as:
$$r_d \sim c_d \times z_d \times P(l_d \mid \mu_r, \Sigma_r)\, P(r \mid \eta_0, \eta_u)$$
$$c_d \sim r_d \times z_d \times P(c \mid \gamma_u)$$
where $\eta_0$ is the background preference parameter of users for region selection; $\eta_u$ is the region-selection preference of user $u$; $l_d$ is the geographic coordinate of the microblog; $\mu_r$ and $\Sigma_r$ are the mean and variance of the two-dimensional Gaussian distribution of region $r$; $\gamma_u$ is the probability that user $u$ belongs to each community; $P(l_d \mid \mu_r, \Sigma_r)$ is the probability that the microblog lies in region $r$ at position $l_d$; $P(r \mid \eta_0, \eta_u)$ is the probability of region $r$ under the user and background region-selection parameters $\eta_u$ and $\eta_0$; $P(c \mid \gamma_u)$ is the probability that $u$ belongs to community $c$; $\xi_z$ is the multinomial time distribution parameter of topic $z$; $P(z \mid \theta_0, \theta_r, \theta_c)$ is the probability of topic $z$ given the community topic distribution $\theta_c$, the region topic distribution $\theta_r$ and the background topic distribution $\theta_0$; and the generation probability of word $w$ is evaluated given topic $z$.
Further, in the M step, $\partial\theta_{r,z}$, $\partial\theta_{c,z}$ and $\partial\theta_{0,z}$ denote the gradients of the latent geographic region, community and background topic distribution parameters respectively, expressed as:
$$\partial\theta_{r,z} = \sum_{u \in U} d_{u,r,z} - \sum_{u \in U} d_{u,r} \times P(z \mid \theta_0, \theta_r, \theta_c)$$
$$\partial\theta_{c,z} = \sum_{u \in U} d_{u,c,z} - \sum_{u \in U} d_{u,c} \times P(z \mid \theta_0, \theta_r, \theta_c)$$
$$\partial\theta_{0,z} = \sum_{u \in U} d_{u,z} - \sum_{u \in U} \sum_{r \in R} d_{u,r} \times P(z \mid \theta_0, \theta_r, \theta_c)$$
where $d_{u,r,z}$ is the number of microblogs with topic $z$ that user $u$ posts in latent geographic region $r$; $d_{u,c,z}$ is the number of microblogs with topic $z$ that user $u$ posts in community $c$; $d_{u,z}$ is the number of microblogs with topic $z$ that user $u$ posts; $d_{u,r}$ is the number of microblogs that user $u$ posts in latent geographic region $r$; and $d_{u,c}$ is the number of microblogs that user $u$ posts while belonging to community $c$.
In summary, this scheme introduces elements such as space, time, community and user into the LDA model, describes the distribution of microblog topics and vocabulary across different times, spaces and communities, and models regions and communities on the basis of user preferences, so as to find the probability that a user belongs to each community. Compared with the prior art, the invention has the following beneficial effects:
1. Social network communities are discovered by jointly using time, space, microblog topics and interaction between users, which improves the validity and accuracy of community mining.
2. Introducing time, space, microblog topics and interaction between users into the topic model is an innovation in the concept, theory and technique of topic models.
3. The parameters of the time-space topic model are estimated with the EM algorithm and Gibbs sampling. This step controls the relationship between a microblog and a latent geographic region from several aspects: the region topic distribution, the community topic distribution, the geographic position of the microblog and the spatial extent of the latent geographic region. Compared with previous methods, this reduces the number of latent geographic regions that overlap or contain one another in space, limits unbounded growth of their spatial extent, increases the differences between the topic distributions of the latent geographic regions, and improves the accuracy of microblog topic identification.
Description of the drawings
Fig. 1 is a diagram of the community time-space topic model.
Detailed description
The technical solution provided by the invention is described below with reference to the drawing and a specific embodiment.
Embodiment 1
Step 1: Establishing the expression of microblog elements
Each microblog is formally expressed as six elements: $d_i = (W, t, l, r, u, c)$, where $W$ is the "bag of words" of the microblog, drawn from the vocabulary $V = \{w_1, w_2, \ldots, w_{|V|}\}$; $t$ is the posting time; $l$ is the geographic position from which the microblog was posted; $r$ is the latent geographic region containing the microblog; $u$ is the microblog user; and $c$ is the community to which the user belongs.
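The six-element expression can be pictured as a small record type. The following is only an illustrative sketch: the class name `Microblog`, the field names and the sample values are assumptions made here, not part of the patent. The latent region `r` and community `c` start out unknown because the model infers them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Microblog:
    """One microblog d_i = (W, t, l, r, u, c); names are hypothetical."""
    words: List[str]                 # W: bag of words drawn from vocabulary V
    time: datetime                   # t: posting time
    location: Tuple[float, float]    # l: (longitude, latitude) of the post
    user: str                        # u: author of the post
    region: Optional[int] = None     # r: latent geographic region, inferred later
    community: Optional[int] = None  # c: community of the user, inferred later


# Made-up sample record, used only to show the shape of the data.
post = Microblog(
    words=["football", "final", "tonight"],
    time=datetime(2015, 10, 13, 20, 30),
    location=(108.37, 22.82),
    user="user_001",
)
```

Keeping `region` and `community` optional separates the observed elements (words, time, position, user) from the latent ones that parameter estimation fills in.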
Step 2: Mathematical modeling of the constraints on microblog vocabulary
Microblog vocabulary is influenced by the background environment, the latent geographic region and the community. That is, there is a background topic-word multinomial distribution, a topic-word multinomial distribution for each latent geographic region and a topic-word multinomial distribution for each community, and these distributions jointly affect the probability of generating a word under a given topic. Accordingly, the generation probability of word $w$ given topic $z$ is obtained from these three distributions on the basis of the sparse additive generative model.
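The formula for the word probability did not survive in this text, so the following is only a sketch of how a sparse additive model usually combines several components: the per-word weights are added and then normalized with a softmax. The additive softmax form, the function name `word_probs` and the symbols `phi_0`, `phi_r`, `phi_c` are assumptions for illustration, not the patent's stated expression.

```python
import math
from typing import List


def word_probs(background: List[float], region: List[float],
               community: List[float]) -> List[float]:
    """Softmax over the element-wise sum of three per-word weight vectors."""
    scores = [b + r + c for b, r, c in zip(background, region, community)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights for one topic over a three-word vocabulary.
vocab = ["match", "phone", "rain"]
phi_0 = [0.0, 0.0, 0.0]  # background component
phi_r = [1.0, 0.0, 0.0]  # region component boosts "match"
phi_c = [0.0, 0.5, 0.0]  # community component boosts "phone"
probs = word_probs(phi_0, phi_r, phi_c)
```

Under this assumed form, a region or community only needs non-zero weights for the words on which it differs from the background, which is what makes the representation sparse.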
Step 3: Modeling the microblog time-space topic model
(1) Mathematical modeling of the constraints of space and user on microblog topics
Let $\theta_0$ denote the background topic multinomial distribution parameter. The generation probability of latent geographic regions with different topics is regarded as the result of sampling a multinomial distribution under a Dirichlet prior, so a Dirichlet Process Mixture Model (DPMM) is used to describe the user's preference in choosing latent geographic regions and the generation of microblog geographic positions. First, a base distribution $G_0$ is obtained from a uniform distribution and a concentration parameter $\alpha_r$ is set. Then the region multinomial distribution $G$ is obtained from the Dirichlet process $DP(\alpha_r, G_0)$. Next, region $r$ is drawn from $G$ on the basis of user $u$'s region-selection preference $\eta_u$ and the topic distribution mean $\theta_r$ of region $r$. Finally, the region of the microblog and its position $l_i$ are generated from a two-dimensional Gaussian distribution:
$$l_i \mid r \sim N(\mu_r, \Sigma_r)$$
$$r_i \sim G(\theta_{r,u}) = G(\theta_r, \eta_u)$$
$$G \sim DP(\alpha_r, G_0)$$
$$G_0 = \mathrm{Uniform}()$$
where $\mu_r$ and $\Sigma_r$ are the mean and variance of the two-dimensional Gaussian distribution of region $r$. In this DPMM each position is drawn from a single region. In addition, each time $G$ is sampled from $DP(\alpha_r, G_0)$ its number of parameters may differ, so $G$ expresses a dynamically changing number of latent geographic regions. The value of $G_0$ should be neither too large nor too small, to prevent the number of latent geographic regions from becoming too large or too small; it is typically set within [0.003, 0.008].
(2) Mathematical modeling of the constraints of community and time on microblog topics
A user can belong to different communities according to identity and interest preferences. The multinomial distribution $\gamma_u$ expresses the probability that user $u$ belongs to each community. As with latent geographic regions, each community also produces its own topic distribution $\theta_c$, which influences the topics of the microblogs posted by users in that community, that is:
$$P(c \mid u) = \mathrm{Multi}(c \mid \gamma_u)$$
Once the community and the latent geographic region are determined, the topic of the microblog is obtained from their multinomial distributions on the basis of the sparse additive generative model, that is:
$$P(z \mid c, r) = P(z \mid \theta_0, \theta_r, \theta_c) = \mathrm{Multi}(z \mid \theta_0, \theta_r, \theta_c)$$
A day is divided evenly into 24 time slots, and each topic $z$ has a multinomial distribution $\xi_z$ that expresses the probability of each time slot under that topic, that is:
$$P(t \mid z) = \mathrm{Multi}(t \mid \xi_z)$$
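As a concrete picture of the 24 time slots, the sketch below maps a posting time to its hourly slot and builds a smoothed relative-frequency estimate of the slot probabilities for one topic. The smoothed counting is a simplification chosen here for illustration; the function names and the `smoothing` parameter are assumptions, and the patent itself fits this parameter by gradient descent in the M step.

```python
from collections import Counter
from datetime import datetime
from typing import List

N_SLOTS = 24  # one slot per hour of the day


def time_slot(t: datetime) -> int:
    """Map a posting time to its hourly slot in 0..23."""
    return t.hour


def estimate_xi(times: List[datetime], smoothing: float = 1.0) -> List[float]:
    """Smoothed relative frequency of each slot among one topic's posts."""
    counts = Counter(time_slot(t) for t in times)
    total = len(times) + smoothing * N_SLOTS
    return [(counts[s] + smoothing) / total for s in range(N_SLOTS)]


# Made-up posting times for a single topic: three evening posts, one morning post.
topic_times = [
    datetime(2015, 10, 13, 20, 5),
    datetime(2015, 10, 13, 20, 40),
    datetime(2015, 10, 14, 20, 15),
    datetime(2015, 10, 14, 8, 30),
]
xi = estimate_xi(topic_times)
```

The additive smoothing keeps every slot at a non-zero probability, so a topic seen only in the evening can still explain an occasional morning post.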
Microblogging document structure tree formalization process is:
(1) from Laplace distribution L (0, O u), L (0, ρ u) and Dirichlet (α 0), Dirichlet (α rg 0), Dirichlet (α c), sampling obtains user-potential regional distribution parameter η respectively in Dirichlet (ψ) u, user-community distribution parameter γ u, background theme distribution parameter θ 0, potential geographic area-theme distribution parameter θ r, community-theme distribution parameter θ c, theme-Annual distribution parameter ξ z;
(2) for each potential geographic area r=1 ..., R:
A) based on potential geographic area-theme distribution parameter θ rwith user-potential regional distribution parameter η ustructure multinomial distribution, and therefrom extract potential geographic area r;
B) from potential geographic area r, sampling obtains microblogging geographic position l d;
(3) for each community c=1 ..., C: based on user-community's distribution parameter structure multinomial distribution, and therefrom extract community c;
(4) for each theme z=1 .., K:
A) by θ 0, θ r, θ cstructure theme multinomial distribution, and therefrom extract theme z;
B) from theme-time polynomial distribution, sampling obtains time t;
C) from Dirichlet (β 0) in obtain background vocabulary distribution parameter
D) from Dirichlet (β r) in obtain vocabulary distribution parameter in potential geographic area r
E) from Dirichlet (β c) in obtain vocabulary distribution parameter in community c
(5) for each vocabulary w=1 in microblogging d ..., Nd: from for extracting vocabulary w in the theme-vocabulary multinomial distribution of parameter.
Step 4: Estimating the parameters of the microblog time-space topic model
The parameters of the time-space topic model are estimated with the EM method and Gibbs sampling.
1) E step: sample the probability that a microblog belongs to each latent factor. These are the probabilities that microblog $d$ belongs to latent geographic region $r$, community $c$ and topic $z$, denoted $r_d$, $c_d$ and $z_d$ respectively:
$$r_d \sim c_d \times z_d \times P(l_d \mid \mu_r, \Sigma_r)\, P(r \mid \eta_0, \eta_u) \qquad (1)$$
$$c_d \sim r_d \times z_d \times P(c \mid \gamma_u) \qquad (2)$$
where $\eta_0$ is the background preference parameter of users for region selection; $\eta_u$ is the region-selection preference of user $u$; $l_d$ is the geographic coordinate of the microblog; $\mu_r$ and $\Sigma_r$ are the mean and variance of the two-dimensional Gaussian distribution of region $r$; $\gamma_u$ is the probability that user $u$ belongs to each community; $P(l_d \mid \mu_r, \Sigma_r)$ is the probability that the microblog lies in region $r$ at position $l_d$; $P(r \mid \eta_0, \eta_u)$ is the probability of region $r$ under the user and background region-selection parameters $\eta_u$ and $\eta_0$; $P(c \mid \gamma_u)$ is the probability that $u$ belongs to community $c$; $\xi_z$ is the multinomial time distribution parameter of topic $z$; $P(z \mid \theta_0, \theta_r, \theta_c)$ is the probability of topic $z$ given the community topic distribution $\theta_c$, the region topic distribution $\theta_r$ and the background topic distribution $\theta_0$; and the generation probability of word $w$ is evaluated given topic $z$.
The DPMM expresses the probability that microblog $d$ lies in an existing region $r_j$ or in a new region $r'$. Formula (1) is therefore modified to:
$$r_d = \begin{cases} j, & \sim \dfrac{r_d^{\,j}}{\sum_{i \in R} r_d^{\,i} + G_0} \\[2ex] r', & \sim \dfrac{G_0}{\sum_{i \in R} r_d^{\,i} + G_0} \end{cases} \qquad (4)$$
$$r_d^{\,i} \sim c_d \times z_d \times P(l_d \mid \mu_i, \Sigma_i)\, P(r \mid \eta_0, \eta_u) \times \frac{1}{\ln(\Sigma_i) + 1} \qquad (5)$$
where $G_0$ can be regarded as the probability of generating a new latent geographic region $r'$, and $\Sigma_i$ is the variance of the two-dimensional Gaussian distribution of the $i$-th latent geographic region. When the probabilities that a microblog belongs to all current latent geographic regions are very low, the model tends to generate a new latent geographic region $r'$ whose Gaussian mean is the geographic position of that microblog and whose variance is a default initial value $\Sigma_0$. The size of $\Sigma_0$ does not affect the final spatial extent of any latent geographic region; it only increases or decreases the number of EM iterations. With the area of the whole study region denoted $R_0$, the factor $\frac{1}{\ln(\Sigma_i) + 1}$ in formula (5) limits the spatial extent of a latent geographic region. If a latent geographic region $r_i$ covers too large an area, that is, the variance $\Sigma_i$ of its two-dimensional Gaussian distribution is very large, then microblogs geographically close to it join it with high probability, which enlarges the coverage of $r_i$ further and dilutes its topics. The factor penalizes latent geographic regions with large geographic coverage, so that microblogs tend to gather in latent geographic regions of moderate extent, or the model automatically generates a new latent geographic region to fit the spatial distribution of the microblog's topic.
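The choice between existing regions and a new region in formulas (4) and (5) can be sketched as follows. This is a simplified illustration under stated assumptions: each region's unpenalized weight (the product of the community, topic, position and preference terms) is passed in as one number, and the variance $\Sigma_i$ is reduced to a single positive scalar `spread`. The function names and the sample values are hypothetical.

```python
import math
import random
from typing import List


def region_probs(weights: List[float], spreads: List[float], g0: float) -> List[float]:
    """Normalized probabilities for each existing region, with the new region last."""
    penalised = []
    for w, s in zip(weights, spreads):
        denom = math.log(s) + 1.0  # size penalty from formula (5)
        if denom <= 0.0:
            raise ValueError("spread must exceed 1/e for the penalty to stay positive")
        penalised.append(w / denom)
    norm = sum(penalised) + g0  # shared denominator of formula (4)
    return [p / norm for p in penalised] + [g0 / norm]


def sample_region(weights: List[float], spreads: List[float], g0: float,
                  rng: random.Random) -> int:
    """Draw a region index; index len(weights) means 'open a new region'."""
    probs = region_probs(weights, spreads, g0)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Two regions with equal unpenalized weight; the second covers a far larger area.
probs = region_probs([0.02, 0.02], [1.0, 100.0], g0=0.005)
choice = sample_region([0.02, 0.02], [1.0, 100.0], 0.005, random.Random(0))
```

With equal unpenalized weights, the wider region receives the smaller probability, and when every existing weight is tiny the last entry (a new region) dominates, which matches the behavior the paragraph above describes.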
2) M step: with the other model parameters fixed, maximize the posterior likelihood of the model and compute each latent factor.
Based on $r_d$ from the E step, update the two-dimensional Gaussian distribution parameters of latent geographic region $r$:
$$\mu_r = \frac{\sum_{d \in r} l_d}{\#(r, d)} \qquad (6)$$
$$\Sigma_r = \frac{\sum_{d \in r} (l_d - \mu_r)^{T} (l_d - \mu_r)}{\#(r, d) - 1} \qquad (7)$$
where $\#(r, d)$ is the total number of microblogs in region $r$ and $l_d$ is the geographic coordinate of a microblog. Apart from the distribution parameters of the latent geographic regions, the other parameter values cannot be obtained from closed-form formulas, so gradient descent is used here to infer them iteratively.
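Formulas (6) and (7) are the sample mean and sample covariance of the positions assigned to a region. A minimal sketch, assuming positions are plain `(x, y)` tuples and using the hypothetical function name `update_region`:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def update_region(points: List[Point]) -> Tuple[Point, List[List[float]]]:
    """Return the mean (formula 6) and 2x2 sample covariance (formula 7)."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two microblogs are needed for the covariance")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sums of products of deviations from the region mean, divided by n - 1.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), [[sxx, sxy], [sxy, syy]]


# Four made-up positions at the corners of a square.
mean, cov = update_region([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
```

The guard on `n` reflects the `#(r, d) - 1` denominator: a region holding a single microblog has no sample covariance, which is why a newly created region starts from a default variance instead.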
Compute the gradient $\partial\eta_{u,r}$ of user $u$'s selection-preference distribution parameter for latent geographic region $r$, and at the same time the gradient $\partial\eta_{0,r}$ of the selection-preference distribution parameter of all users for $r$, where $d_{u,r}$ is the number of microblogs that user $u$ posts in latent geographic region $r$ and $d_u$ is the total number of microblogs posted by user $u$:
$$\partial\eta_{u,r} = d_{u,r} - d_u \times P(r \mid \eta_0, \eta_u) \qquad (8)$$
$$\partial\eta_{0,r} = \sum_{u \in U} d_{u,r} - \sum_{u \in U} d_u \times P(r \mid \eta_0, \eta_u) \qquad (9)$$
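Formulas (8) and (9) both have the form "observed count minus expected count". The sketch below makes that explicit; it takes the model probability $P(r \mid \eta_0, \eta_u)$ as an input rather than computing it, since its functional form is not spelled out here. The function names and sample numbers are assumptions.

```python
from typing import List


def grad_eta_user(d_ur: int, d_u: int, p_r: float) -> float:
    """Formula (8): posts by user u in region r minus the expected number."""
    return d_ur - d_u * p_r


def grad_eta_background(d_ur_all: List[int], d_u_all: List[int],
                        p_r_all: List[float]) -> float:
    """Formula (9): the same difference summed over all users."""
    return sum(grad_eta_user(a, b, p)
               for a, b, p in zip(d_ur_all, d_u_all, p_r_all))


# A user with 10 posts, 6 of them in region r, while the model predicts 40 percent.
g_user = grad_eta_user(6, 10, 0.4)
# Two users: the one above, plus one with 1 of 5 posts in region r.
g_background = grad_eta_background([6, 1], [10, 5], [0.4, 0.4])
```

A positive value means the user posts in the region more often than the current parameters predict, and a value of zero means the model already matches the observed counts.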
Compute the gradient $\partial\gamma_{u,c}$ of the user-community distribution parameter, where $d_{u,c}$ is the number of microblogs that user $u$ posts while belonging to community $c$:
$$\partial\gamma_{u,c} = d_{u,c} - d_u \times P(c \mid \gamma_u) \qquad (10)$$
Compute the gradients $\partial\theta_{r,z}$, $\partial\theta_{c,z}$ and $\partial\theta_{0,z}$ of the latent geographic region, community and background topic distribution parameters, where $d_{u,r,z}$ is the number of microblogs with topic $z$ that user $u$ posts in latent geographic region $r$; $d_{u,c,z}$ is the number of microblogs with topic $z$ that user $u$ posts in community $c$; $d_{u,z}$ is the number of microblogs with topic $z$ that user $u$ posts; $d_{u,r}$ is the number of microblogs that user $u$ posts in latent geographic region $r$; and $d_{u,c}$ is the number of microblogs that user $u$ posts while belonging to community $c$:
$$\partial\theta_{r,z} = \sum_{u \in U} d_{u,r,z} - \sum_{u \in U} d_{u,r} \times P(z \mid \theta_0, \theta_r, \theta_c) \qquad (11)$$
$$\partial\theta_{c,z} = \sum_{u \in U} d_{u,c,z} - \sum_{u \in U} d_{u,c} \times P(z \mid \theta_0, \theta_r, \theta_c) \qquad (12)$$
$$\partial\theta_{0,z} = \sum_{u \in U} d_{u,z} - \sum_{u \in U} \sum_{r \in R} d_{u,r} \times P(z \mid \theta_0, \theta_r, \theta_c) \qquad (13)$$
Compute the gradients of the word distribution parameters under the given latent geographic region, community and background topic, where $n_{u,r,z,v}$ is the number of times word $v$ of user $u$ in region $r$ belongs to topic $z$; $n_{u,r,z}$ is the number of words of user $u$ in region $r$ that belong to topic $z$; $n_{u,c,z,v}$ is the number of times word $v$ of user $u$ in community $c$ belongs to topic $z$; $n_{u,c,z}$ is the number of words of user $u$ in community $c$ that belong to topic $z$; $n_v$ is the number of occurrences of word $v$; and $n_z$ is the number of words that belong to topic $z$.
Compute the gradient $\partial\xi_{z,t}$ of the time distribution parameter, where $d_{z,t}$ is the number of microblogs with topic $z$ posted at time $t$:
$$\partial\xi_{z,t} = d_{z,t} - \sum_{t \in T} d_{z,t} \times P(t \mid \xi_z) \qquad (17)$$
In the M step of round $(m-1)$, the parameter values to be used in the E step of round $m$ are obtained from the difference between each latent factor and its gradient. Taking $\theta_r$ of round $m$ as an example:
$$\theta_r^{(m)} = \theta_r^{(m-1)} - \lambda \frac{\partial}{\partial \theta_r} J(\Psi) = \theta_r^{(m-1)} - \partial\theta_r \qquad (18)$$
where $\lambda$ is the learning rate, $J(\Psi)$ is the loss function, which abstractly represents the error between the model and the true values, and $\Psi$ collectively denotes the parameters of the model.
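The update in formula (18) is an ordinary gradient descent step. A minimal sketch, assuming the parameter and the gradient of the loss are plain lists of floats of equal length; `gradient_step` and the sample numbers are illustrative names and values, not taken from the patent.

```python
from typing import List


def gradient_step(theta: List[float], grad: List[float], lr: float) -> List[float]:
    """One step of formula (18): subtract the scaled loss gradient from the parameter."""
    if len(theta) != len(grad):
        raise ValueError("parameter and gradient must have the same length")
    return [t - lr * g for t, g in zip(theta, grad)]


# Round m-1 values for a two-topic region, with a hypothetical loss gradient.
theta_prev = [0.5, -0.2]
theta_next = gradient_step(theta_prev, grad=[1.0, -2.0], lr=0.1)
```

A smaller learning rate makes each round move the parameters less, which trades slower convergence for a more stable alternation between the E and M steps.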
In summary, the E step combines the DPMM with the parameters of the topic model to compute the probability that a microblog belongs to each topic, latent geographic region and community (the latent factors); the M step obtains the updated value of each latent factor in the model by gradient descent. The two steps are repeated until all model parameters converge. Finally, users are assigned to communities according to the values of $\gamma_u$, the probabilities that user $u$ belongs to each community. Communities may overlap, that is, one user may belong to several communities. Therefore the communities corresponding to the top-$k$ (here $k = 10$) largest probabilities $\gamma_{u,c}$ of $\gamma_u$ form a candidate set of $k$ communities. With the mean membership degree $\bar{\gamma}_u = \frac{1}{k} \sum_{i \in k} \gamma_{u,i}$, the communities $c$ in $\{\gamma_{u,c} \mid c = 1, 2, \ldots, k,\ \gamma_{u,c} > \bar{\gamma}_u\}$ are taken as the communities to which user $u$ belongs. If the $\gamma_{u,c}$ of the top-$k$ communities are close to the mean membership degree, the user belongs to more communities; otherwise the user belongs to only a few.
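The final assignment rule (keep the top-$k$ communities, then retain those above their mean membership) can be sketched as below. The dictionary layout, the function name `assign_communities` and the sample probabilities are assumptions for illustration.

```python
from typing import Dict, List


def assign_communities(gamma_u: Dict[int, float], k: int = 10) -> List[int]:
    """Return the communities whose membership exceeds the mean of the top-k."""
    # Candidate set: the k communities with the largest membership probability.
    top = sorted(gamma_u.items(), key=lambda kv: kv[1], reverse=True)[:k]
    if not top:
        return []
    mean = sum(p for _, p in top) / len(top)
    return [c for c, p in top if p > mean]


# One dominant community versus two equally strong ones (made-up values).
single = assign_communities({0: 0.6, 1: 0.25, 2: 0.1, 3: 0.05}, k=3)
double = assign_communities({0: 0.4, 1: 0.4, 2: 0.1, 3: 0.1}, k=4)
```

Because the threshold is the mean of the candidates themselves, a user with one dominant membership is assigned to a single community, while a user whose leading memberships are similar is assigned to several. With the strict inequality used here, a user whose candidates all have exactly equal probability would receive no community, so a practical implementation may want a tie rule.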

Claims (4)

1. A time-space LDA model for social network community mining, characterized in that it comprises the following steps:
(1) Establishing the expression of microblog elements, which provides a conceptual model of a microblog. The expression is $d_i = (W, t, l, r, u, c)$, where $W$ is the bag of words of the microblog, drawn from the vocabulary $V = \{w_1, w_2, \ldots, w_{|V|}\}$ in which $w_1, w_2, \ldots, w_{|V|}$ are the distinct words; $t$ is the posting time; $l$ is the geographic position from which the microblog was posted; $r$ is the latent geographic region containing the microblog; $u$ is the microblog user; and $c$ is the community to which the user belongs;
(2) Mathematical modeling of the constraints on microblog vocabulary, which describes how strongly different spaces and communities influence microblog vocabulary. The model comprises a background topic-word distribution, a topic-word distribution for each latent geographic region, and a topic-word distribution for each community;
(3) Modeling the microblog time-space topic model, which describes how time, region and community generate a microblog. The microblog topic is expressed as:
$$P(z \mid c, r) = P(z \mid \theta_0, \theta_r, \theta_c) = \mathrm{Multi}(z \mid \theta_0, \theta_r, \theta_c)$$
where $z$ denotes a topic, $\theta_0$ the background topic distribution, $\theta_r$ the topic distribution mean of region $r$, and $\theta_c$ the topic distribution of community $c$;
(4) Estimating the parameters of the microblog time-space topic model, which yields the model parameters and hence the probability that a user belongs to each community. The parameters are estimated with the EM algorithm and Gibbs sampling.
2. a kind of space-time LDA model excavated for myspace according to claim 1, is characterized in that: EM algorithm method comprises the E step that the probability that belongs to each latent factor to microblogging is sampled and the M step being obtained each latent factor Grad in model by gradient descent method in described step (4).
3. The time-space LDA model for social network community mining according to claim 2, characterized in that: in the E step, $r_d$, $c_d$, and $z_d$ denote the probabilities of the latent geographic region $r$, the community $c$, and the topic $z$ among the latent factors, respectively, where $r_d$, $c_d$, and $z_d$ are expressed as:
$r_d \sim c_d \times z_d \times P(l_d \mid \mu_r, \Sigma_r)\, P(r \mid \eta_0, \eta_u)$
$c_d \sim r_d \times z_d \times P(c \mid \gamma_u)$
wherein $\eta_0$ is the background preference parameter of users for region selection; $\eta_u$ is the region-selection preference of user $u$; $l_d$ is the geographic coordinate of the microblog; $\mu_r$ is the mean of the two-dimensional Gaussian distribution of region $r$; $\gamma_u$ is the probability that user $u$ belongs to each community; $\Sigma_r$ is the covariance of the two-dimensional Gaussian distribution of region $r$; $P(l_d \mid \mu_r, \Sigma_r)$ is the probability that the microblog lies in region $r$ at position $l_d$; $P(r \mid \eta_0, \eta_u)$ is the probability that region $r$ occurs given the user and background region-selection parameters $\eta_u$ and $\eta_0$; $P(c \mid \gamma_u)$ is the probability that $u$ belongs to community $c$; $\xi_z$ is the multinomial time-distribution parameter corresponding to topic $z$; $P(z \mid \theta_0, \theta_r, \theta_c)$ is the probability that topic $z$ occurs given the community topic distribution $\theta_c$, the regional topic distribution $\theta_r$, and the background topic distribution $\theta_0$; and the remaining term is the probability of generating word $w$ given the distribution of the known topic $z$.
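The region-specific factors of $r_d$ can be illustrated with a short Python sketch. It covers only $P(l_d \mid \mu_r, \Sigma_r)\, P(r \mid \eta_0, \eta_u)$ and leaves out the $c_d$ and $z_d$ factors. The function names, the sample coordinates, and the choice of a softmax over $\eta_0 + \eta_u$ for the region preference are assumptions for this example, not details taken from the claim.

```python
import math
import random


def gaussian_2d_pdf(point, mean, cov):
    """Density of a two-dimensional Gaussian; cov is ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via the closed-form 2x2 inverse.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def region_probabilities(l_d, means, covs, eta_0, eta_u):
    """Normalized weights P(l_d | mu_r, Sigma_r) * P(r | eta_0, eta_u) over regions.

    ASSUMPTION: P(r | eta_0, eta_u) is a softmax over eta_0[r] + eta_u[r].
    """
    scores = [a + b for a, b in zip(eta_0, eta_u)]
    peak = max(scores)
    prefs = [math.exp(s - peak) for s in scores]
    pref_total = sum(prefs)
    weights = [gaussian_2d_pdf(l_d, mu, cov) * (p / pref_total)
               for mu, cov, p in zip(means, covs, prefs)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_region(probabilities, rng):
    """Draw one region index according to the normalized weights."""
    return rng.choices(range(len(probabilities)), weights=probabilities)[0]


# Two hypothetical regions with unit covariance; the post lies next to region 0.
identity = ((1.0, 0.0), (0.0, 1.0))
probs = region_probabilities((0.5, 0.2), [(0.0, 0.0), (10.0, 10.0)],
                             [identity, identity], [0.0, 0.0], [0.0, 0.0])
region = sample_region(probs, random.Random(0))
```

With equal region preferences, the Gaussian term dominates, so a post located near the mean of region 0 is assigned to that region with probability close to one.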
4. The time-space LDA model for social network community mining according to claim 2, characterized in that: in the M step, the gradient values of the distribution parameters of the latent geographic region $r$, the community $c$, and the topic $z$ among the latent factors are computed, these gradient values being expressed as:
wherein $d_{u,r,z}$ is the number of microblogs with topic $z$ posted by user $u$ in latent geographic region $r$; $d_{u,c,z}$ is the number of microblogs with topic $z$ posted by user $u$ in community $c$; $d_{u,z}$ is the number of microblogs with topic $z$ posted by user $u$; $d_{u,r}$ is the number of microblogs posted by user $u$ in latent geographic region $r$; and $d_{u,c}$ is the number of microblogs posted by user $u$ while belonging to community $c$.
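The count statistics named in this claim can be tallied directly from sampled assignments. The sketch below assumes a hypothetical input format, a list of `(user, region, community, topic)` tuples with one entry per microblog; the function name `tally_assignments` and the dictionary keys are likewise chosen for this example only.

```python
from collections import Counter


def tally_assignments(assignments):
    """Count posts per (user, region, topic), (user, community, topic), and so on.

    `assignments` is a hypothetical list of (user, region, community, topic)
    tuples, one per post, as produced by a sampling step.
    """
    counts = {key: Counter() for key in ("u_r_z", "u_c_z", "u_z", "u_r", "u_c")}
    for user, region, community, topic in assignments:
        counts["u_r_z"][(user, region, topic)] += 1      # d_{u,r,z}
        counts["u_c_z"][(user, community, topic)] += 1   # d_{u,c,z}
        counts["u_z"][(user, topic)] += 1                # d_{u,z}
        counts["u_r"][(user, region)] += 1               # d_{u,r}
        counts["u_c"][(user, community)] += 1            # d_{u,c}
    return counts


sampled = [("u1", 0, 1, 2), ("u1", 0, 1, 2), ("u1", 1, 1, 0), ("u2", 0, 0, 2)]
counts = tally_assignments(sampled)
```

Using `Counter` means that combinations never observed simply read as zero, which is convenient when the counts feed into a gradient computation.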
CN201510670779.2A 2015-10-13 2015-10-13 Time-space LDA model for social network community mining Pending CN105354244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510670779.2A CN105354244A (en) 2015-10-13 2015-10-13 Time-space LDA model for social network community mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510670779.2A CN105354244A (en) 2015-10-13 2015-10-13 Time-space LDA model for social network community mining

Publications (1)

Publication Number Publication Date
CN105354244A true CN105354244A (en) 2016-02-24

Family

ID=55330217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510670779.2A Pending CN105354244A (en) 2015-10-13 2015-10-13 Time-space LDA model for social network community mining

Country Status (1)

Country Link
CN (1) CN105354244A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095841A (en) * 2016-06-05 2016-11-09 西华大学 Method is recommended in a kind of mobile Internet advertisement based on collaborative filtering
CN106844416A (en) * 2016-11-17 2017-06-13 中国科学院计算技术研究所 A kind of sub-topic method for digging
CN107341261A (en) * 2017-07-13 2017-11-10 南京邮电大学 A kind of point of interest of facing position social networks recommends method
CN107463624A (en) * 2017-07-06 2017-12-12 深圳市城市规划设计研究院有限公司 A kind of method and system that city interest domain identification is carried out based on social media data
CN107908766A (en) * 2017-11-28 2018-04-13 深圳市城市规划设计研究院有限公司 A kind of city focus incident dynamic monitoring method and system
CN108717421A (en) * 2018-04-23 2018-10-30 深圳市城市规划设计研究院有限公司 A kind of social media text subject extracting method and system based on change in time and space
CN108833211A (en) * 2018-06-28 2018-11-16 浙江理工大学 The unbiased delay sampling method of social networks
CN110019639A (en) * 2017-07-18 2019-07-16 腾讯科技(北京)有限公司 Data processing method, device and storage medium
CN114528393A (en) * 2022-01-10 2022-05-24 华南理工大学 Method, system and medium for mining and evolution analysis of interest tag research by scholars

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100145771A1 (en) * 2007-03-15 2010-06-10 Ariel Fligler System and method for providing service or adding benefit to social networks
CN101916256A (en) * 2010-07-13 2010-12-15 北京大学 Community discovery method for synthesizing actor interests and network topology
CN103399932A (en) * 2013-08-06 2013-11-20 武汉大学 Situation identification method based on semantic social network entity analysis technique
CN104462592A (en) * 2014-12-29 2015-03-25 东北大学 Social network user behavior relation deduction system and method based on indefinite semantics
CN104572797A (en) * 2014-05-12 2015-04-29 深圳市智搜信息技术有限公司 Individual service recommendation system and method based on topic model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
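Duan Lian (段炼) et al.: "Microblog community discovery method based on a community spatio-temporal topic model" (基于社区时空主题模型的微博社区发现方法), Journal of University of Electronic Science and Technology of China (《电子科技大学学报》) *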

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095841B (en) * 2016-06-05 2019-05-03 西华大学 A kind of mobile Internet advertisement recommended method based on collaborative filtering
CN106095841A (en) * 2016-06-05 2016-11-09 西华大学 Method is recommended in a kind of mobile Internet advertisement based on collaborative filtering
CN106844416A (en) * 2016-11-17 2017-06-13 中国科学院计算技术研究所 A kind of sub-topic method for digging
CN107463624A (en) * 2017-07-06 2017-12-12 深圳市城市规划设计研究院有限公司 A kind of method and system that city interest domain identification is carried out based on social media data
CN107341261B (en) * 2017-07-13 2020-10-27 南京邮电大学 Interest point recommendation method oriented to location social network
CN107341261A (en) * 2017-07-13 2017-11-10 南京邮电大学 A kind of point of interest of facing position social networks recommends method
CN110019639A (en) * 2017-07-18 2019-07-16 腾讯科技(北京)有限公司 Data processing method, device and storage medium
CN110019639B (en) * 2017-07-18 2023-04-18 腾讯科技(北京)有限公司 Data processing method, device and storage medium
CN107908766A (en) * 2017-11-28 2018-04-13 深圳市城市规划设计研究院有限公司 A kind of city focus incident dynamic monitoring method and system
CN108717421A (en) * 2018-04-23 2018-10-30 深圳市城市规划设计研究院有限公司 A kind of social media text subject extracting method and system based on change in time and space
CN108833211A (en) * 2018-06-28 2018-11-16 浙江理工大学 The unbiased delay sampling method of social networks
CN114528393A (en) * 2022-01-10 2022-05-24 华南理工大学 Method, system and medium for mining and evolution analysis of interest tag research by scholars
CN114528393B (en) * 2022-01-10 2023-02-14 华南理工大学 Method, system and medium for mining and evolution analysis of interest tag research by scholars

Similar Documents

Publication Publication Date Title
CN105354244A (en) Time-space LDA model for social network community mining
Ali et al. Review of urban building energy modeling (UBEM) approaches, methods and tools using qualitative and quantitative analysis
Ali et al. A data-driven approach for multi-scale GIS-based building energy modeling for analysis, planning and support decision making
Bedi et al. Empirical mode decomposition based deep learning for electricity demand forecasting
Shrivastava et al. A multiobjective framework for wind speed prediction interval forecasts
Aydin Production modeling in the oil and natural gas industry: an application of trend analysis
Xu et al. Scenario tree reduction in stochastic programming with recourse for hydropower operations
Zhang et al. Dynamic streamflow simulation via online gradient-boosted regression tree
He et al. Simultaneously simulate vertical and horizontal expansions of a future urban landscape: A case study in Wuhan, Central China
Chen et al. Groundwater level prediction using SOM-RBFN multisite model
CN106022614A (en) Data mining method of neural network based on nearest neighbor clustering
Liao et al. POI recommendation of location-based social networks using tensor factorization
Abunima et al. A new solar radiation model for a power system reliability study
Sun et al. Spatial modelling the location choice of large-scale solar photovoltaic power plants: Application of interpretable machine learning techniques and the national inventory
Giupponi et al. Spatial assessment of water use efficiency (SDG indicator 6.4. 1) for regional policy support
Sušnik Economic metrics to estimate current and future resource use, with a focus on water withdrawals
Puppala et al. Identification and analysis of barriers for harnessing geothermal energy in India
Viola et al. Modelling the mutual interactions between hydrology, society and water supply systems
Hung et al. Assessing adaptive irrigation impacts on water scarcity in nonstationary environments—a multi‐agent reinforcement learning approach
Jia et al. A distributed probabilistic modeling algorithm for the aggregated power forecast error of multiple newly built wind farms
Lin et al. Data-driven prediction of building energy consumption using an adaptive multi-model fusion approach
Mason et al. Identifying and modeling dynamic preference evolution in multipurpose water resources systems
Li et al. Evolution of FDI flows in the global network: 2003–2012
Abuamra et al. Medium-term forecasts for salinity rates and groundwater levels
Efstratiadis et al. Generalized storage-reliability-yield framework for hydroelectric reservoirs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160224