CN103020221A

CN103020221A - Social search method based on multimodal adaptive social relationship strength mining

Info

Publication number: CN103020221A
Application number: CN 201210535907
Authority: CN
Inventors: 徐常胜; 桑基韬
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2012-12-12
Filing date: 2012-12-12
Publication date: 2013-04-03

Abstract

The invention discloses an adaptive social relationship strength mining method based on a multimodal generative model. The method comprises the following steps: collecting the images uploaded by each user together with the users who have social relationships with that user, so that each user corresponds to a triple consisting of an uploaded image set, an image tag set, and a social network; inferring, from the input triples and by means of the multimodal generative model, the generative process of the image content and tags, so as to obtain a topic space that describes user interest distributions and the topic distribution of each user; and computing, from the obtained topic space and user topic distributions, the topic-sensitive relationship strength between users. The method can be applied to adaptive multimedia retrieval and similar tasks.

Description

A social search method based on multimodal adaptive social relationship strength mining

Technical field

The present invention relates to the field of multimedia search, and in particular to a social search method based on multimodal adaptive social relationship strength mining.

Background technology

Social media has greatly changed how users share and obtain information. In social media services, users inevitably interact with other users and form communities, i.e., social networks. A social network contains bidirectional social relationships, such as "Connect" on LinkedIn and "Add Friend" on Facebook, and unidirectional social relationships, such as "Follow" on Twitter and "Subscribe" on YouTube. These social relationships are believed to influence user behavior and the development of the social network. For example, colleagues on LinkedIn can influence a person's career choices, while friends on Facebook can influence behavior and needs in a person's daily life. Analyzing and mining these social relationships enables many important applications, such as viral marketing, collaborative recommendation, and collaborative information search. Take collaborative multimedia search based on unidirectional social relationships as an example. Its basic assumption and starting point is that, by analyzing the behavior of other users who have an influential relationship with the searching user, the searching user's real needs can be predicted and the search results adjusted accordingly.

Current social relationship mining methods mainly focus on predicting whether a social relationship exists and how strong it is. In many problems, a binary or a single continuous social relationship value cannot meet the needs of the application. In multimedia search, for example, the social relationships between users differ across query words. Suppose a user searches for photos of "Hawaii" for a honeymoon trip: friends with travel expertise can help most, so the social relationship with them should be stronger. When the same user searches for photos of a "fashion show", friends who are knowledgeable about fashion should influence the search results more, i.e., the social relationship with them should be stronger. We call such problem-dependent social relationships adaptive social relationship strength, and the present invention introduces an adaptive social relationship strength mining method based on a multimodal generative model.

Summary of the invention

(1) Technical problem to be solved

The purpose of the present invention is to perform image search using the mined adaptive social relationship strength, so that a user with differing needs automatically obtains assistance from different users. This helps to understand and predict the user's real needs and, in turn, to accurately locate search results that match those needs.

(2) Technical solution

To solve the above technical problem, the present invention provides a social search method based on multimodal adaptive social relationship strength mining, the method comprising the following steps:

Step 1: collect the image information uploaded by each user and the users who have a unidirectional social relationship with that user, so that each user corresponds to a triple consisting of an uploaded image set, an image tag set, and a set of relation users connected to that user by unidirectional social relationships;

Step 2: from the input triples, build a multimodal probabilistic generative model and infer the generative process of the image content in the image set and of the tag information in the image tag set;

Step 3: from the inference results, compute the topic space and the user topic distributions, and compute the topic-sensitive social relationship strength between users;

Step 4: rank the search results according to the obtained topic space, user topic distributions, and social relationship strength.

(3) Beneficial effects

The present invention adopts a multimodal generative model and performs inverse inference from the observed user social network, the images uploaded by users, and the tags they provide, yielding a topic-sensitive social relationship strength mining method. The invention solves the problem of adaptively adjusting social relationship strength across different problems. Because it considers both text tag data and visual image features, it can better analyze social relationship strength in multimedia applications. In addition, the method simultaneously yields the topic space, the topic distribution of each user, and the strength of the relationships between users on different topics.

Description of drawings

Fig. 1 is a flow chart of the adaptive social relationship strength mining method based on a multimodal generative model according to the present invention;

Fig. 2 is a schematic diagram of the multimodal generative topic model according to the present invention;

Fig. 3 is a schematic diagram of the results of the method provided by the present invention on the Flickr data set;

Fig. 4 is another schematic diagram of the results of the method provided by the present invention on the Flickr data set.

Embodiment

To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

The present invention provides a method for topic-sensitive social relationship strength analysis in a social multimedia environment, which adaptively adjusts social relationship strength for different applications. Compared with existing social relationship strength analysis methods, it obtains, on the one hand, a topic-sensitive social relationship strength that adapts to the problem at hand; on the other hand, by considering both textual and visual information, it serves multimedia applications better.

Fig. 1 is a flow chart of the adaptive social relationship strength mining method based on a multimodal generative model according to the present invention. As shown in Fig. 1, the method provided by the present invention comprises the following steps:

Step 1: input preprocessing step. Collect the image information uploaded by each user and the users who have a unidirectional social relationship with that user. A unidirectional social relationship here means a directed social action on a social media sharing website, such as a contact on Flickr or a follow on Twitter. Each user corresponds to a triple consisting of the set of relation users who have a unidirectional social relationship with that user, the set of images the user has uploaded, and the image tag set, where image tags are the original tag information the user provided to describe the images;

Step 2: multimodal topic-sensitive social relationship mining step. From the input triples, infer by means of a multimodal generative model the inverse of the generative process of the image content in the image set and of the tag information in the image tag set, obtaining a topic space that describes user interest distributions together with the user topic distributions;

Step 3: output parameter calculation step. Compute, from the obtained topic space and user topic distributions, the topic-sensitive relationship strength between users.

Each step is described in detail below. The following table lists the key symbols used in the present invention and their descriptions.

Step 1: input preprocessing step.

The topic-sensitive social relationship strength mining problem is first described in mathematical terms:

Definition 1. Given a set of users $U$ in a social media service (such as Flickr), each user $u \in U$ corresponds to a triple $\{C_u, D_u, T_u\}$, where $C_u$, $D_u$, and $T_u$ denote, respectively, the set of relation users who have a unidirectional social relationship with user $u$, the set of images uploaded by user $u$, and the set of image tags added by user $u$.
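The per-user triple of Definition 1 can be pictured as a small record type. The following is a minimal sketch, not the patent's implementation: the class name `UserTriple`, the helper `directed_pairs`, the integer-id encoding of tags and visual descriptors, and the toy users are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserTriple:
    """Hypothetical container for the triple {C_u, D_u, T_u} of one user."""
    contacts: List[str] = field(default_factory=list)      # C_u: relation users
    images: List[List[int]] = field(default_factory=list)  # D_u: one list of visual-descriptor ids per image
    tags: List[int] = field(default_factory=list)          # T_u: tag-word ids


def directed_pairs(corpus: Dict[str, UserTriple]) -> List[Tuple[str, str]]:
    """List (u, c) pairs where c belongs to C_u; the relation is directional."""
    return [(u, c) for u, triple in corpus.items() for c in triple.contacts]


# Toy data (invented): alice follows bob, bob follows nobody.
corpus = {
    "alice": UserTriple(contacts=["bob"], images=[[1, 5, 1]], tags=[0, 2, 0]),
    "bob": UserTriple(contacts=[], images=[[3]], tags=[2]),
}
print(directed_pairs(corpus))
```

Storing the relation set per user, rather than as an undirected edge list, keeps the direction of each relationship explicit, which matters because the mined strength is not symmetric.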

The goal of topic-sensitive social relationship strength mining is to learn:

(1) the topic space $\Phi^w \in \mathbb{R}^{K \times |W|}$ and $\Phi^v \in \mathbb{R}^{K \times |V|}$,

where $\Phi^w$ and $\Phi^v$ are the tag-word distributions and visual-descriptor distributions of the topics, $w$ and $v$ are a user's tag-word vector and visual-descriptor vector, $\mathbb{R}$ is the field of real numbers, $K$ is the total number of topics, and $k$ indexes the $k$-th topic. The tag words of all users form the tag dictionary and all visual descriptors form the visual-descriptor dictionary; $|W|$ is the size of the tag dictionary and $|V|$ is the size of the visual-descriptor dictionary. A tag-word vector is the response, over the tag dictionary, of the tags a user has used, e.g. $w$ = {landscape, travel, landscape, ...}; a visual-descriptor vector is the response, over the visual-descriptor dictionary, of the images a user has uploaded, e.g. $v$ = {descriptor 1, descriptor 5, descriptor 1, ...};

(2) the topic distribution of each user $u$, $\Omega_u \in \mathbb{R}^{K}$, i.e. the probability with which the user uses each topic, which can be regarded as the user's interest distribution;

(3) the topic-sensitive social relationship strength $\Psi(k)$, $k \in \{1, \ldots, K\}$, which records the strength of the social relationship from user $u_2$ to user $u_1$ on the $k$-th topic.

Note: social relationship strength here is directional, i.e. $\Psi_{u_1, u_2}(k)$ and $\Psi_{u_2, u_1}(k)$ are different.

Preprocessing consists of collecting and representing the three elements of the input triple.

Step 11: collecting and preprocessing the relation user set $C_u$.

For each user $u$, collect from the user's social network the users who have a unidirectional social relationship with $u$, forming the set $C_u$.

Step 12: collecting and preprocessing the uploaded image set $D_u$.

For each user $u$, collect the images the user has uploaded, represent each image by its response vector over the visual-descriptor dictionary, and form the image set $D_u$. Any bag-of-words feature can serve as the visual descriptor. In its test experiments the present invention used the Maximally Stable Extremal Region (MSER) feature to describe the visual content of images. Compared with keypoint-based features, MSER features describe locally homogeneous regions of the image and are more consistent, which makes them better suited to the setting of the present invention.

Step 13: collecting and preprocessing the user image tag set $T_u$.

On media sharing websites, users can add tags to the images they upload for management and description; these are the tag information. For each user $u$, collect the tag information the user has added. Each user's tag information is likewise represented by its response vector over the tag dictionary, forming the set $T_u$.
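As an illustration of the "response vector over a dictionary" used in steps 12 and 13, the sketch below counts how often each dictionary entry occurs in a user's tags. The function name `response_vector` and the toy dictionary are assumptions; the same counting would apply to quantized visual descriptors, where each image contributes a bag of descriptor ids.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def response_vector(items: Iterable[str], dictionary: Sequence[str]) -> List[int]:
    """Bag-of-words response: one count per dictionary entry, in dictionary order.

    Items that are not in the dictionary are ignored.
    """
    counts = Counter(items)
    return [counts[term] for term in dictionary]


# Toy example (invented values).
tag_dictionary = ["landscape", "travel", "fashion"]
user_tags = ["landscape", "travel", "landscape"]
print(response_vector(user_tags, tag_dictionary))
```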

Step 2: multimodal topic-sensitive social relationship mining step.

A user's online behavior is believed to be strongly influenced by the users with whom the user has social relationships. On an image sharing website, uploading and tagging images can be regarded as observations of the user's online behavior. On this basis, the present invention proposes a multimodal probabilistic generative model that infers the underlying social relationship structure by simulating the generative process of images and tags. The idea is as follows: for each user, assume that the uploaded images and tags are generated in one of two ways, either from the user's own interests or under the influence of other users. Under this assumption, the present invention proposes a multimodal topic-sensitive social relationship model, which comprises the following steps:

Step 21: building the topic-sensitive multimodal probabilistic generative model.

Fig. 2 shows the structure of the generative model proposed under this assumption. In the generative model, arrows denote conditional dependence assumptions and correspond to sampling from a distribution. Circles denote variables. Solid circles denote observed variables, chiefly the relation user set $C_u$, the uploaded image set $D_u$, and the added image tag set $T_u$. Hollow circles denote latent variables, chiefly the switch latent variable $s$, the topic-record latent variable $z$, the sampled-user latent variable $c$, the topic space variable $\Phi$, and the user topic distribution variable $\Omega$. The switch latent variable $s$ controls the generative process of the observed variables, as detailed below; the topic-record latent variable $z$ records the topic sampled from a user topic distribution; the sampled-user latent variable $c$ records the sampled unidirectional relation user.

In the present invention, a visual-descriptor dictionary is first built for the visual content of images, and each image is then represented by the visual-descriptor vector $v$ formed by its response over that dictionary. The model introduces binary switch latent variables $s^w$ and $s^v$ to control and record whether a given tag word or visual descriptor was generated spontaneously by user $u$ or under the influence of another user. When $s^w = 1$, the tag word is generated from the user's own topic distribution $\Omega_u$; when $s^w = 0$, the tag word is influenced by some user $c^w$ in the relation user list $C_u$ and is generated from that influencing user's topic distribution $\Omega_{c^w}$.

Therefore, the generative process of a tag word of user $u$ is as follows:

Sample the switch variable from a Bernoulli distribution:

$$s^w \sim \mathrm{Bernoulli}(\lambda),$$

where $\lambda$ controls the shape of the Bernoulli distribution;

If $s^w = 0$, sample an influencing user from the relation user list of user $u$ according to a multinomial distribution:

$$c^w \sim \mathrm{Multinomial}(\gamma),$$

where $\gamma$ controls the shape of the multinomial distribution;

Sample a topic from the topic distribution $\Omega_{c^w}$ of the influencing user $c^w$ and record it as the variable $z^w$;

If $s^w = 1$, sample a topic from user $u$'s own topic distribution $\Omega_u$ and record it as the variable $z^w$;

Sample the tag word $w_{u,i}$ from the tag-word distribution $\Phi^w_{z^w}$ of that topic.

The generative process of visual descriptors is similar: the visual descriptor $v_{u,i}$ is sampled from the visual-descriptor distribution $\Phi^v_{z^v}$ of the topic. With the generative model of Fig. 2 and the generative process assumed above in place, the Gibbs sampling of step 22 can be carried out to obtain sampled values of each latent variable and thereby solve the generative model.
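The generative process for a single tag word can be simulated in a few lines. This is a sketch under stated assumptions rather than the patent's code: the function name `sample_tag_word`, the dictionary-based parameter layout, the convention that $s = 1$ is drawn with probability $\lambda$, and the fallback to the user's own interests when the relation list is empty are all illustrative choices.

```python
import random


def sample_tag_word(u, contacts, omega, phi_w, lam, gamma, rng):
    """Draw (s, c, z, w) for one tag word of user u.

    contacts[u]: relation users C_u; gamma[u]: one weight per relation user;
    omega[x]: topic distribution of user x; phi_w[k]: tag-word distribution of topic k.
    """
    # Switch variable: assumed convention, s = 1 (own interest) with probability lam.
    s = 1 if rng.random() < lam else 0
    if s == 0 and contacts[u]:
        # Influenced case: pick the influencing relation user c.
        c = rng.choices(contacts[u], weights=gamma[u])[0]
        source = c
    else:
        # Own-interest case; also used as a fallback when C_u is empty (assumption).
        s, c, source = 1, None, u
    # Topic from the chosen user's topic distribution, then a word from that topic.
    z = rng.choices(range(len(omega[source])), weights=omega[source])[0]
    w = rng.choices(range(len(phi_w[z])), weights=phi_w[z])[0]
    return s, c, z, w


# Toy parameters (invented): two topics, three tag words.
contacts = {"alice": ["bob"], "bob": []}
gamma = {"alice": [1.0], "bob": []}
omega = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
phi_w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
rng = random.Random(0)
print(sample_tag_word("alice", contacts, omega, phi_w, lam=1.0, gamma=gamma, rng=rng))
```

With degenerate toy distributions such as these the draw is deterministic, which makes it easy to see that the influenced branch borrows the relation user's topic distribution while the own-interest branch uses the user's own.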

Step 22: solving the topic-sensitive multimodal probabilistic generative model.

As described in step 1, the input of the model is a set of users, each corresponding to a triple $\{C_u, D_u, T_u\}$. As described in step 21, a generative process is assumed for this input and a series of latent variables is introduced; solving the model means sampling and inferring the values of these latent variables from the observed input and the assumed generative process. Finally, the output parameters are computed from the sampled latent-variable values, as described in step 3. The proposed generative model contains three classes of latent variables: the switch latent variables $s^w$, $s^v$; the sampled-user latent variables $c^w$, $c^v$; and the topic-record latent variables $z^w$, $z^v$, where $s^v$, $c^v$, and $z^v$ denote, respectively, the switch, sampled-user, and topic-record latent variables sampled when generating a visual descriptor. The present invention uses Gibbs sampling to infer the latent variables of the model and to solve for the model parameters, namely the topic space, the user topic distributions, and the topic-sensitive relationship strength.

When Gibbs sampling is used to solve the generative model, each iteration samples one value for every latent variable, and the values sampled in one iteration are used to update the sampling in the next. Each latent variable is updated iteratively with the other variables held fixed. The update rules for the latent variables associated with tag words are as follows:

$$p(s_i^w = 0 \mid s_{-i}^w, u_i^w, c_i^w, z_i^w, \cdot) \propto \frac{N_{U,S}^w(u_i^w, 0) + \alpha_\lambda - 1}{N_U^w(u_i^w) + 2\alpha_\lambda - 1} \cdot \frac{N_{U,Z}^w(c_i^w, z_i^w) + \alpha_\Omega - 1}{N_U^w(c_i^w) + K\alpha_\Omega - 1}$$

$$p(s_i^w = 1 \mid s_{-i}^w, u_i^w, c_i^w, z_i^w, \cdot) \propto \frac{N_{U,S}^w(u_i^w, 1) + \alpha_\lambda - 1}{N_U^w(u_i^w) + 2\alpha_\lambda - 1} \cdot \frac{N_{U,S,Z}^w(c_i^w, 1, z_i^w) + \alpha_\Omega - 1}{N_{U,S}^w(c_i^w, 1) + K\alpha_\Omega - 1}$$

$$p(c_i^w \mid c_{-i}^w, s_i^w = 0, u_i^w, z_i^w, C_{u_i^w}, \cdot) \propto \frac{N_{U,C,S,Z}^w(u_i^w, c_i^w, 0, z_i^w) + \alpha_\gamma}{N_{U,S,Z}^w(u_i^w, 0, z_i^w) + |C_{u_i^w}|} \cdot \frac{N_{U,Z}^w(c_i^w, z_i^w) + \alpha_\Omega - 1}{N_U^w(c_i^w) + K\alpha_\Omega - 1}$$

$$p(z_i^w \mid z_{-i}^w, s_i^w = 0, w_i, \cdot) \propto \frac{N_{U,Z}^w(c_i^w, z_i^w) + \alpha_\Omega - 1}{N_U^w(c_i^w) + K\alpha_\Omega - 1} \cdot \frac{N_{Z,W}^w(z_i^w, w_i) + \alpha_{\Phi^w}}{N_Z^w(z_i^w) + |W|\alpha_{\Phi^w}}$$

$$p(z_i^w \mid z_{-i}^w, s_i^w = 1, w_i, \cdot) \propto \frac{N_{U,S,Z}^w(c_i^w, 1, z_i^w) + \alpha_\Omega - 1}{N_{U,S}^w(c_i^w, 1) + K\alpha_\Omega - 1} \cdot \frac{N_{Z,W}^w(z_i^w, w_i) + \alpha_{\Phi^w}}{N_Z^w(z_i^w) + |W|\alpha_{\Phi^w}} \tag{1}$$

where $u_i^w$ denotes the user to whom the $i$-th tag word belongs and $z_i^w$ denotes the topic of the $i$-th tag word. To keep the model general and reduce the complexity of model learning, all priors are assumed during learning to follow symmetric Dirichlet distributions. The parameters that determine these prior distributions, i.e. the hyperparameters, are written as $\alpha$ with the dependent variable as subscript: $\alpha_\Omega$, $\alpha_\lambda$, and $\alpha_\gamma$ are the symmetric hyperparameters controlling the corresponding Dirichlet priors, and are specified and tuned manually at implementation time. $N(\cdot)$ denotes a counter that gives the number of samples meeting a given condition during iterative sampling. For example, $N_{U,C,S,Z}^w(u_i^w, c_i^w, 0, z_i^w)$ is the number of samples among the tag words of user $u_i^w$ that were generated from topic $z_i^w$ under the influence of relation user $c_i^w$; $N_{U,S,Z}^w(u_i^w, 0, z_i^w)$ is the number of samples among the tag words of user $u_i^w$ that were generated from topic $z_i^w$ under the influence of relation users ($S = 0$); $N_{U,S}^w(u_i^w, 0)$ is the number of samples among the tag words of user $u_i^w$ that were influenced by relation users ($S = 0$); and $N_U^w(u_i^w)$ is the number of tag words of user $u_i^w$. Every counter is maintained by the Gibbs sampling process: whenever a sampled latent variable meets a counter's condition, that counter is incremented by one. The update rules for the variables associated with visual descriptors are analogous. With the conditional probability distributions of the switch latent variables $s^w$, $s^v$, the sampled-user latent variables $c^w$, $c^v$, and the topic-record latent variables $z^w$, $z^v$ given by the formulas above, a sample of each latent variable is drawn in every iteration and the variables are updated iteratively.
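The counter bookkeeping described above can be sketched as follows. This shows only the count maintenance that a collapsed Gibbs sampler typically performs (remove the current assignment of one tag word, resample it, add the new assignment back); the class name `TagCounters`, the tuple keys, and the `delta` argument are illustrative assumptions, and the resampling step itself is omitted.

```python
from collections import Counter


class TagCounters:
    """Hypothetical holder for the tag-word counters N(.) named in the text."""

    def __init__(self):
        self.n_u = Counter()     # N_U(u): tag words of user u
        self.n_us = Counter()    # N_{U,S}(u, s)
        self.n_usz = Counter()   # N_{U,S,Z}(u, s, z)
        self.n_ucsz = Counter()  # N_{U,C,S,Z}(u, c, 0, z)
        self.n_zw = Counter()    # N_{Z,W}(z, w)
        self.n_z = Counter()     # N_Z(z)

    def update(self, u, s, c, z, w, delta):
        """Add (delta=+1) or remove (delta=-1) one assignment (s, c, z) of word w."""
        self.n_u[u] += delta
        self.n_us[(u, s)] += delta
        self.n_usz[(u, s, z)] += delta
        if s == 0:
            # Only influenced samples carry an influencing user c.
            self.n_ucsz[(u, c, 0, z)] += delta
        self.n_zw[(z, w)] += delta
        self.n_z[z] += delta


counters = TagCounters()
counters.update("alice", 0, "bob", 1, 2, +1)  # word 2 of alice: influenced by bob, topic 1
counters.update("alice", 1, None, 0, 0, +1)   # word 0 of alice: own interest, topic 0
print(counters.n_ucsz[("alice", "bob", 0, 1)], counters.n_u["alice"])
```

Calling `update(..., -1)` before resampling a word and `update(..., +1)` afterwards keeps every counter consistent with the current latent assignments.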

Step 3: output parameter calculation step.

The input of this step is the sampled values of the latent variables obtained during the Gibbs sampling of step 2. The output is the three kinds of parameters sought in the topic-sensitive social relationship strength mining problem defined in step 1: the topic spaces $\Phi^w$ and $\Phi^v$, the topic distribution $\Omega_u$ of each user $u$, and the topic-sensitive social relationship strength $\Psi(k)$.

Step 31: computing the topic space $\Phi$ and the user topic distributions $\Omega$.

Gibbs sampling yields sampled values of the latent variables $s^w$, $s^v$, $c^w$, $c^v$, $z^w$, $z^v$. The iterative updating converges when the joint distribution of the latent variables and the observed tags and uploaded images becomes stable, that is, when the update rules computed from the sampled values best fit the observed data. This process resembles maximum likelihood estimation. By collecting the sampled values of the latent variables after convergence and updating the counters, the topic space $\Phi$ and the user topic distributions $\Omega$ can be computed directly. The tag-word distributions and visual-descriptor distributions of the topics, $\Phi^w$ and $\Phi^v$, represent the learned subspaces and can be computed from the sampled topic assignments $z^w$ and $z^v$. Since $\Phi_{k,j}^w$ actually describes the probability of generating the $j$-th tag word in the $k$-th topic, it can be obtained by normalizing the counter $N_{Z,W}(\cdot)$; $\Phi^v$ is computed similarly, that is:

$$\Phi_{k,j}^w = \frac{N_{Z,W}^w(Z_k, w_j) + \alpha_{\Phi^w}}{N_Z^w(Z_k) + |W|\alpha_{\Phi^w}} \tag{2}$$

$$\Phi_{k,j}^v = \frac{N_{Z,W}^v(Z_k, v_j) + \alpha_{\Phi^v}}{N_Z^v(Z_k) + |V|\alpha_{\Phi^v}} \tag{3}$$

where $Z_k$ denotes the $k$-th topic. The topic distribution of the $m$-th user $U_m$ can be computed as follows:

$$\Omega_{m,k} = \frac{N_{U,S,Z}^w(U_m, 1, Z_k) + N_{U,S,Z}^v(U_m, 1, Z_k) + \alpha_\Omega}{N_{U,S}^w(U_m, 1) + N_{U,S}^v(U_m, 1) + K\alpha_\Omega} \tag{4}$$
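Formulas (2) and (4) are smoothed normalizations of counters, which a short sketch can make concrete. The function names, the dictionary-of-tuples counter layout, and the toy counts below are assumptions for illustration; the totals $N_Z$ and $N_{U,S}$ are obtained here by summing the finer-grained counters, which is equivalent by definition.

```python
def smoothed_ratio(count, total, size, alpha):
    """(count + alpha) / (total + size * alpha): the common form of formulas (2)-(4)."""
    return (count + alpha) / (total + size * alpha)


def topic_word_dist(n_zw, num_topics, vocab_size, alpha_phi):
    """Formula (2): Phi[k][j] from the counter N_{Z,W}, keyed by (k, j)."""
    phi = []
    for k in range(num_topics):
        row = [n_zw.get((k, j), 0) for j in range(vocab_size)]
        n_z = sum(row)  # N_Z(Z_k)
        phi.append([smoothed_ratio(c, n_z, vocab_size, alpha_phi) for c in row])
    return phi


def user_topic_dist(n_usz_w, n_usz_v, user, num_topics, alpha_omega):
    """Formula (4): Omega[user][k] from the own-interest (S = 1) tag and visual counts."""
    own = [n_usz_w.get((user, 1, k), 0) + n_usz_v.get((user, 1, k), 0)
           for k in range(num_topics)]
    total = sum(own)  # N_{U,S}^w(U_m, 1) + N_{U,S}^v(U_m, 1)
    return [smoothed_ratio(c, total, num_topics, alpha_omega) for c in own]


# Toy counts (invented): 2 topics, 2 tag words.
n_zw = {(0, 0): 3, (0, 1): 1, (1, 1): 2}
phi = topic_word_dist(n_zw, num_topics=2, vocab_size=2, alpha_phi=0.5)
omega_alice = user_topic_dist({("alice", 1, 0): 2},
                              {("alice", 1, 0): 1, ("alice", 1, 1): 1},
                              "alice", num_topics=2, alpha_omega=1.0)
print(phi[0], omega_alice)
```

Because the hyperparameter is added to every cell, each row remains a proper probability distribution even for topics or users with no observed counts.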

Step 32: computing the topic-sensitive relationship strength $\Psi$ between users.

The social relationship strength $\Psi_{m_1,m_2}(k)$ from user $U_{m_1}$ to user $U_{m_2}$ on the $k$-th topic can be computed from the number of tag words and visual descriptors of user $U_{m_2}$ that were influenced by user $U_{m_1}$ on the $k$-th topic, i.e. $N_{U,C,S,Z}(U_{m_2}, U_{m_1}, 0, Z_k)$:

$$\Psi_{m_1,m_2}(k) = \frac{N_{U,C,S,Z}^w(U_{m_2}, U_{m_1}, 0, Z_k) + N_{U,C,S,Z}^v(U_{m_2}, U_{m_1}, 0, Z_k) + \alpha_\gamma}{N_{U,S,Z}^w(U_{m_2}, 0, Z_k) + N_{U,S,Z}^v(U_{m_2}, 0, Z_k) + |C_{U_{m_2}}|\alpha_\gamma} \tag{5}$$

where $|C_{U_{m_2}}|$ is the size of the set of users who have a unidirectional social relationship with user $U_{m_2}$. The practical meaning of the formula is that if many of the tag words or images of user $U_{m_2}$ that come from topic $Z_k$ were influenced by relation user $U_{m_1}$, then the social relationship from $U_{m_1}$ to $U_{m_2}$ on the $k$-th topic is stronger. Here $\alpha_\Omega$, $\alpha_\lambda$, and $\alpha_\gamma$ are the symmetric hyperparameters controlling the corresponding Dirichlet priors; they also provide smoothing when the counters $N(\cdot)$ in the denominator of a formula are zero. Their values must be specified and tuned manually at implementation time.
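Formula (5) can be sketched in the same style. The function name `relation_strength`, the counter layout keyed by `(influenced, influencer, 0, topic)`, and the toy counts are assumptions; the denominator counter $N_{U,S,Z}(U_{m_2}, 0, Z_k)$ is computed here by summing the per-influencer counters over the relation set, and returning 0.0 for a user with no relation users is an added guard, not something the source specifies.

```python
def relation_strength(n_ucsz_w, n_ucsz_v, influenced, influencer, topic,
                      contacts, alpha_gamma):
    """Formula (5): strength from `influencer` to `influenced` on `topic`.

    n_ucsz_w / n_ucsz_v: tag-word and visual-descriptor counters keyed by
    (influenced, influencer, 0, topic); contacts: relation users of `influenced`.
    """
    if not contacts:
        return 0.0  # assumed guard: no relation users means no social influence

    def influenced_count(c):
        key = (influenced, c, 0, topic)
        return n_ucsz_w.get(key, 0) + n_ucsz_v.get(key, 0)

    numerator = influenced_count(influencer) + alpha_gamma
    denominator = sum(influenced_count(c) for c in contacts) + len(contacts) * alpha_gamma
    return numerator / denominator


# Toy counts (invented): on topic 0, bob influenced 3 of alice's tag words
# and carol influenced 1 of her visual descriptors.
w_counts = {("alice", "bob", 0, 0): 3}
v_counts = {("alice", "carol", 0, 0): 1}
alice_contacts = ["bob", "carol"]
psi_bob = relation_strength(w_counts, v_counts, "alice", "bob", 0, alice_contacts, 0.5)
psi_carol = relation_strength(w_counts, v_counts, "alice", "carol", 0, alice_contacts, 0.5)
print(psi_bob, psi_carol)
```

On a fixed topic, the strengths of all relation users of one user sum to one, so the values can be read as the share of that user's influenced content attributable to each relation user.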

Step 4: ranking the search results according to the obtained social relationship strength.

Taking image search as an example, the user's query can be analyzed; the topic distribution of a query $q$ can be computed by:

$$p(Z_k \mid q) = \prod_{w_i \in q} p(w_i \mid Z_k)$$

where $\prod$ denotes the product and $p(w_i \mid Z_k)$ is the probability of tag word $w_i$ in the $k$-th topic, obtained from the topic space. Suppose the current searching user is $u$ and $c$ is one of its unidirectional relation users; the adaptive social relationship strength $\Psi_{u,c}(k)\, p(Z_k \mid q)$ is obtained from the topic distribution of the query. Using the obtained social relationship strength as a weight, a relevance score is computed for each retrieved image and used for the final ranking. The relevance score of an image $d$ can be computed as:

$$\hat{R}(q, u, d) = R(q, u, d) + \sum_{c \in C_u} \sum_{k} \Psi_{u,c}(k)\, p(Z_k \mid q)\, R(q, c, d)$$

where $q$ is the query, $R(q, u, d)$ is the relevance of image $d$ to query $q$ for the searching user $u$, and $R(q, c, d)$ is the relevance of image $d$ to query $q$ for a user $c$ who has a unidirectional social relationship with the searching user $u$. This relevance can be computed by any image indexing method or distance metric. Here the relevance $R(q, u, d)$ is computed as:

$$R(q, u, d) = \sum_{k} p(Z_k \mid u)\, p(Z_k \mid q)\, p(Z_k \mid d),$$

where $p(Z_k \mid u)$ is the topic distribution of user $u$, i.e. the probability that user $u$ uses the $k$-th topic, which is $\Omega_{u,k}$ computed in step 31, and $p(Z_k \mid d)$ is the topic distribution of the image, computed as:

$$p(Z_k \mid d) = \prod_{v_i \in d} p(v_i \mid Z_k),$$

where $p(v_i \mid Z_k)$ is the probability of visual descriptor $v_i$ in the $k$-th topic.
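The ranking formulas of step 4 combine into a short scoring routine. The sketch below is illustrative only: the function names, the choice to key `psi` by the pair `(searching user, relation user)` as in $\Psi_{u,c}(k)$, and all numeric values are assumptions. It uses `math.prod`, available from Python 3.8.

```python
from math import prod


def topic_scores(item_ids, phi):
    """p(Z_k | q) or p(Z_k | d): per-topic product of item probabilities."""
    return [prod(row[i] for i in item_ids) for row in phi]


def relevance(omega_user, p_q, p_d):
    """R(q, u, d) = sum_k p(Z_k | u) p(Z_k | q) p(Z_k | d)."""
    return sum(a * b * c for a, b, c in zip(omega_user, p_q, p_d))


def social_relevance(u, contacts, omega, psi, p_q, p_d):
    """R_hat(q, u, d): own relevance plus strength-weighted relevance of relation users."""
    score = relevance(omega[u], p_q, p_d)
    for c in contacts[u]:
        weight = sum(psi[(u, c)][k] * p_q[k] for k in range(len(p_q)))
        score += weight * relevance(omega[c], p_q, p_d)
    return score


# Toy values (invented): two topics, two tag words, two visual descriptors.
phi_w = [[0.5, 0.5], [0.1, 0.9]]
phi_v = [[0.2, 0.8], [0.5, 0.5]]
p_q = topic_scores([0], phi_w)  # query made of tag word 0
p_d = topic_scores([1], phi_v)  # image made of visual descriptor 1
omega = {"alice": [0.5, 0.5], "bob": [1.0, 0.0]}
contacts = {"alice": ["bob"], "bob": []}
psi = {("alice", "bob"): [1.0, 0.0]}
score = social_relevance("alice", contacts, omega, psi, p_q, p_d)
print(score)
```

Images would then be sorted by this score in descending order; since the social term depends on $p(Z_k \mid q)$, a different query shifts which relation users contribute most.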

The results of the method provided by the present invention are described below.

To evaluate the present invention, the images, tags, and relation-user network information of 3,372 users were crawled from the image sharing website Flickr, yielding 124,099 images and 30,108 tag words in total.

Fig. 3 shows two of the 20 topics in the topic space obtained by the method provided by the present invention. For each topic it shows the five highest-ranked tag words and the five most relevant images. It can be seen that, by considering text tags and visual image content together, the topics extracted by the method are largely consistent in both semantic concept and visual theme, which benefits the subsequent topic-sensitive social relationship analysis.

Fig. 4 shows two test users and, for topics #2 and #13, the profiles of the users who influence them most. The length of a gray block corresponds to the user's topic distribution strength, reflecting the user's interest in the corresponding topic. The user's preferences can be inferred from the favorite images shown below. For each relation user, Fig. 4 gives the number of followers together with example uploaded images and a tag cloud. The number of followers reflects social influence, while the uploaded images and tag cloud reflect topic-specific expertise.

It can be seen that the method provided by the present invention analyzes topic-sensitive social relationship strength well. The strongly related users found by the method have more followers and show stronger expertise in the corresponding topics. For example, user "95386698@N00" has a large weight on topic #2, and the user's uploaded images and tag cloud show many activities related to topic #2; meanwhile, from the large number of followers of user "26324110@N00" and the professional quality of the uploaded images, it can be roughly inferred that this user is an expert in popular fashion, i.e. topic #13.

The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and do not limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A social search method based on multimodal adaptive social relationship strength mining, the method comprising the following steps:

2. The method of claim 1, characterized in that step 1 comprises:

Step 11: for each user $u$, collect from the user's social network the users who have a unidirectional social relationship with $u$, forming the set $C_u$;

Step 12: for each user $u$, collect the images the user has uploaded, forming the set $D_u$;

Step 13: for each user $u$, collect the tags the user has added to the uploaded images, forming the set $T_u$.

3. The method of claim 1, characterized in that step 2 comprises:

Step 21: build a topic-sensitive multimodal probabilistic generative model that simulates the generative process of images and tags, wherein the process of building the model is described by introducing latent variables; the latent variables comprise a switch latent variable $s$, a topic-record latent variable $z$, and a sampled-user latent variable $c$; the switch latent variable $s$ indicates whether a tag word or image was generated spontaneously by the user or under the influence of a relation user; the topic-record latent variable $z$ denotes the sampled topic; and the sampled-user latent variable $c$ denotes the sampled relation user;

Step 22: solve the multimodal probabilistic generative model, wherein the values of the latent variables are inferred by Gibbs sampling.

4. method as claimed in claim 3 is characterized in that, described step 3 comprises:

Step 31: the value according to the described hidden variable that obtains is calculated theme space Φ and user's theme distribution Ω;

Step 32: according to the social relationships intensity Ψ of the theme sensitivity between described theme space Φ and user's theme distribution Ω calculating user and the user.

5. the method for claim 1 is characterized in that, described step 4 comprises:

Calculate the relevance scores of the picture that searches according to the social relationships intensity that obtains, described relevance scores is used for the ordering of net result, and wherein the computing formula of relevance scores is as follows:

\hat{R}(q, u, d) = R(q, u, d) + \sum_{c \in C_{u}} \sum_{k} \Psi_{u,c}(k) \, p(Z_{k} \mid q) \, R(q, c, d)

wherein q denotes the query word; R(q, u, d) denotes the relevance of picture d to query word q for search user u; R(q, c, d) denotes the relevance of picture d to query word q for a related user c having a unidirectional social relationship with search user u; k denotes the k-th theme; C_u denotes the set of related users having a unidirectional social relationship with user u; Ψ_{u,c}(k) denotes the social relationship strength of related user c toward user u on the k-th theme; and p(Z_k | q) denotes the theme distribution of query word q, computed as follows:

p(Z_{k} \mid q) = \prod_{w_{i} \in q} p(w_{i} \mid Z_{k})

wherein p(w_i | Z_k) denotes the probability of annotation word w_i in the k-th theme, obtained from the theme space.
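The score of claim 5 can be sketched as below. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function and parameter names are hypothetical, the theme space is assumed to be available as one word-to-probability mapping per theme, and a query word absent from a theme is assumed to have probability 0 (no smoothing).

```python
import math


def query_theme_scores(query_words, word_given_theme):
    """p(Z_k | q) = prod over w_i in q of p(w_i | Z_k), one value per theme k.

    word_given_theme[k] maps an annotation word to p(w | Z_k).
    Words missing from a theme are assumed to have probability 0.
    """
    return [math.prod(theme.get(w, 0.0) for w in query_words)
            for theme in word_given_theme]


def reranked_score(base_u, base_related, strength, p_theme_q):
    """R_hat(q,u,d) = R(q,u,d) + sum_c sum_k Psi_{u,c}(k) p(Z_k|q) R(q,c,d).

    base_u:       R(q, u, d) for the search user
    base_related: dict c -> R(q, c, d) for each related user c in C_u
    strength:     dict c -> list of Psi_{u,c}(k), indexed by theme k
    p_theme_q:    list of p(Z_k | q), indexed by theme k
    """
    social = 0.0
    for c, r_c in base_related.items():
        # Theme-weighted influence of related user c for this query.
        weight = sum(psi * p for psi, p in zip(strength[c], p_theme_q))
        social += weight * r_c
    return base_u + social
```

A related user therefore contributes to the final score only on themes where both the query weight p(Z_k | q) and the relationship strength Ψ_{u,c}(k) are non-zero.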

6. The method of claim 5, characterized in that the social relationship strength of related user U_{m1} toward user U_{m2} is computed as follows:

\Psi_{m_1, m_2}(k) = \frac{N_{U,C,S,Z}^{w}(U_{m_2}, U_{m_1}, 0, Z_{k}) + N_{U,C,S,Z}^{v}(U_{m_2}, U_{m_1}, 0, Z_{k}) + \alpha_{\gamma}}{N_{U,S,Z}^{w}(U_{m_2}, 0, Z_{k}) + N_{U,S,Z}^{v}(U_{m_2}, 0, Z_{k}) + |C_{U_{m_2}}| \, \alpha_{\gamma}}

wherein |C_{U_{m_2}}| is the size of the set of related users having a unidirectional social relationship with user U_{m2};

N_{U,C,S,Z}^{w}(U_{m_2}, U_{m_1}, 0, Z_{k})

denotes the number of samples among the annotation words of user U_{m2} that were generated from theme Z_k under the influence of related user U_{m1};

N_{U,C,S,Z}^{v}(U_{m_2}, U_{m_1}, 0, Z_{k})

denotes the number of samples among the visual descriptors of the images uploaded by user U_{m2} that were generated from theme Z_k under the influence of related user U_{m1};

N_{U,S,Z}^{w}(U_{m_2}, 0, Z_{k})

denotes the number of samples among the annotation words of user U_{m2} that were generated from theme Z_k under the influence of any related user;

N_{U,S,Z}^{v}(U_{m_2}, 0, Z_{k})

denotes the number of samples among the visual descriptors of the images uploaded by user U_{m2} that were generated from theme Z_k under the influence of any related user; and α_γ is the symmetric hyperparameter controlling the corresponding Dirichlet prior distribution. The social relationship strength expresses that, if many of the annotation words or images of user U_{m2} that come from theme Z_k were influenced by related user U_{m1}, then the social relationship of related user U_{m1} toward user U_{m2} on the k-th theme is stronger.
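As an illustration of the formula of claim 6, the strength for one theme can be computed from the four sample counts as follows. The function and parameter names are hypothetical, and the counts are assumed to have been accumulated already from the inferred hidden variables.

```python
def relation_strength(n_w_pair, n_v_pair, n_w_all, n_v_all,
                      num_related, alpha_gamma):
    """Psi_{m1,m2}(k) for one theme Z_k.

    n_w_pair:    annotation-word samples of U_m2 from Z_k influenced by U_m1
    n_v_pair:    visual-descriptor samples of U_m2 from Z_k influenced by U_m1
    n_w_all:     annotation-word samples of U_m2 from Z_k influenced by any
                 related user
    n_v_all:     visual-descriptor samples of U_m2 from Z_k influenced by any
                 related user
    num_related: |C_{U_m2}|, the number of related users of U_m2
    alpha_gamma: symmetric Dirichlet hyperparameter (must be positive)
    """
    numerator = n_w_pair + n_v_pair + alpha_gamma
    denominator = n_w_all + n_v_all + num_related * alpha_gamma
    return numerator / denominator
```

Because the counts over all related users are the sums of the per-user counts, the strengths of all related users of U_{m2} on a given theme add up to 1, and the hyperparameter α_γ keeps the value non-zero for a related user with no influenced samples.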

7. The method of claim 5, characterized in that the relevance R(q, u, d) of picture d to query word q for search user u is computed as follows:

R(q, u, d) = \sum_{k} p(Z_{k} \mid u) \, p(Z_{k} \mid q) \, p(Z_{k} \mid d),

wherein p(Z_k | u) is the theme distribution of user u and denotes the probability that user u produces the k-th theme, and p(Z_k | d) is the theme distribution of the image, computed as follows:

p(Z_{k} \mid d) = \prod_{v_{i} \in d} p(v_{i} \mid Z_{k}),

wherein p(v_i | Z_k) denotes the probability of visual descriptor v_i in the k-th theme.
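The two formulas of claim 7 can be sketched as follows. The names are hypothetical, each theme is assumed to be given as a descriptor-to-probability mapping, and a descriptor absent from a theme is assumed to have probability 0.

```python
import math


def picture_theme_scores(descriptors, descriptor_given_theme):
    """p(Z_k | d) = prod over v_i in d of p(v_i | Z_k), one value per theme k.

    descriptor_given_theme[k] maps a visual descriptor to p(v | Z_k).
    Descriptors missing from a theme are assumed to have probability 0.
    """
    return [math.prod(theme.get(v, 0.0) for v in descriptors)
            for theme in descriptor_given_theme]


def relevance(p_theme_user, p_theme_query, p_theme_picture):
    """R(q,u,d) = sum_k p(Z_k|u) p(Z_k|q) p(Z_k|d); lists are indexed by k."""
    return sum(pu * pq * pd
               for pu, pq, pd in zip(p_theme_user, p_theme_query, p_theme_picture))
```

For pictures with many visual descriptors the plain product can underflow in floating point; accumulating log-probabilities is a common workaround, although the claim does not prescribe one.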

8. The method of claim 3, characterized in that the theme-sensitive multi-modal probabilistic generative model comprises a generation process for annotation words and a generation process for visual descriptors, wherein the generation process for annotation words is as follows:

first, a switch variable is sampled from a Bernoulli distribution: s^w ~ Bernoulli(λ);

if s^w = 0, a related user is sampled from the related-user set of user u;

a theme is sampled from the theme distribution of the sampled related user and recorded as the theme variable;

the annotation word w_{u,i} is sampled from the annotation-word distribution of that theme;

the generation process for visual descriptors is carried out in the same way, wherein the visual descriptor v_{u,i} is sampled from the visual-descriptor distribution of the theme.
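The generation process of claim 8 can be sketched for a single annotation word or visual descriptor as below. Several details are assumptions made only for illustration and are not stated in the claim: the related user is drawn uniformly from the related-user set, the spontaneous branch (s = 1) draws the theme from the user's own theme distribution, and all function and parameter names are hypothetical.

```python
import random


def generate_token(u, lam, related, theme_dist, token_dist, rng=random):
    """Sample one annotation word (or visual descriptor) for user u.

    lam:        Bernoulli parameter lambda of the switch variable
    related:    dict user -> list of related users
    theme_dist: dict user -> list of theme probabilities
    token_dist: list indexed by theme; token_dist[k] maps a token to its
                probability in theme k
    Returns (s, c, z, token).
    """
    s = 1 if rng.random() < lam else 0            # s ~ Bernoulli(lam)
    if s == 0:
        c = rng.choice(related[u])                # assumption: uniform choice
        source = c                                # theme comes from related user
    else:
        c = None
        source = u                                # assumption: user's own themes
    themes = theme_dist[source]
    z = rng.choices(range(len(themes)), weights=themes)[0]
    vocab = list(token_dist[z])
    token = rng.choices(vocab, weights=[token_dist[z][t] for t in vocab])[0]
    return s, c, z, token
```

The same function serves both modalities: passing the annotation-word distributions yields w_{u,i}, and passing the visual-descriptor distributions yields v_{u,i}.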

9. The method of claim 5, characterized in that the relevance R(q, c, d) of picture d to query word q for a related user c having a unidirectional social relationship with search user u is computed by a picture indexing method or a distance metric method.