CN101685464B - Method for automatically labeling images based on community potential subject excavation - Google Patents
- Publication number: CN101685464B
- Authority
- CN
- China
- Prior art keywords
- image
- community
- label
- candidate
- marked
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for automatically labeling images based on mining the latent topics of communities, comprising the following steps: 1) using a latent Dirichlet allocation (LDA) model to mine the latent topics of a single community; 2) after obtaining the probability distribution of image tags over the latent topics from this analysis, deleting the community image tags whose probability under the latent topics is smaller than a set threshold k, thereby "de-noising" the community tags; 3) generating candidate labels for the image to be labeled by propagating the labels of visually similar images; 4) optimizing the candidate labels according to the correlation between each candidate label and the latent topics of the image; and 5) obtaining the final labeling result by fusing the information of multiple communities. The method makes full use of the information of the different communities an image belongs to in a social sharing network and of the latent topics of those communities; compared with conventional methods, it produces more accurate labeling results.
Description
Technical field
The present invention relates to the field of automatic image annotation, and in particular to a method for automatically annotating images in social sharing networks.
Background technology
With the rapid development of network and multimedia technology, the number of images on the Internet has grown explosively. According to statistics, in 2008 the scale of web pages indexed by Google reached one trillion, tens of billions of which were image data. In recent years, sharing networks have attracted particular attention from Internet users. Flickr, a public website for sharing digital images, indexes more than 3 billion images and grows by millions every month.
The image tag information that Internet users manually add to Flickr images greatly facilitates efficient image management and retrieval. However, an in-depth analysis of the manual annotations on Flickr shows that 64% of the images carry three or fewer tags. How to automatically add tags to the large number of images with no or insufficient tags, or to improve their existing tags, is a hot topic of current research.
Unlike ordinary images, images shared on the Internet have the following characteristics:
Uneven quality: shared network images are taken by different users with different cameras, at different times, from different angles, and with different photographic skills;
Rich content: the tag entries of Flickr images exceed 130 million, covering more than 60 million concepts and including all kinds of content, events, and objects, such as landscapes, buildings, personal portraits, and activity clips;
Complex semantics: a single image often contains several different topics at once; for example, one image may contain topics such as "Sky" and "Clouds" while also containing topics such as "Water" and "River".
Because shared network images have these characteristics, traditional algorithms have difficulty annotating them effectively. An in-depth analysis of the images shared on Flickr reveals a notable feature: after a user uploads images to an album organized by time, place, or event, the images can further be recommended to the corresponding communities according to their subject. A community on Flickr is a collection of images sharing a particular topic; when a user uploads an image that does not fit the community topic, the administrator can delete it, which guarantees the topical consistency of the community's images. The topic information of the communities an image belongs to can therefore be used to annotate the image. Moreover, since a community topic can often be subdivided into several sub-topics, the latent topics of a community can be mined and then combined with visual similarity between images to obtain finer annotation results.
Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art by providing a method of automatic image annotation based on mining the latent topics of communities.
The method of automatic image annotation based on community latent topic mining comprises the following steps:
1) use a latent Dirichlet allocation (LDA) model to mine the latent topics of a single community;
2) after obtaining the probability distribution of image tags over the latent topics from this analysis, delete the community image tags whose probability under the latent topics is smaller than a set threshold k, thereby "de-noising" the community tags;
3) generate candidate labels for the image to be labeled by propagating the labels of visually similar images;
4) optimize the candidate labels according to the correlation between each candidate label and the latent topics of the image;
5) obtain the final annotation result of the image by fusing the information of multiple communities.
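As an illustration of the "de-noising" in step 2, the filtering can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function name, the data layout of ф, and the threshold value are all assumptions.

```python
def denoise_tags(tags, phi, k=0.05):
    """'De-noise' a community's tags (sketch of step 2).

    tags : list of tag strings for one community
    phi  : dict mapping tag -> list of P(tag | topic) over the latent topics
    k    : the set threshold; 0.05 is illustrative, not a value from the patent

    A tag is kept only if its probability under at least one latent topic
    reaches the threshold k; otherwise it is deleted as noise.
    """
    kept = []
    for w in tags:
        topic_probs = phi.get(w, [])
        if topic_probs and max(topic_probs) >= k:
            kept.append(w)
    return kept
```

In this sketch a tag unknown to the topic model is also dropped, which matches the intent of removing tags uncorrelated with the community's topics.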
The step of generating candidate labels for the image to be labeled by propagating the labels of similar images: for an image I_u to be labeled in the community, the probability between I_u and an image tag w is computed as

P(w | I_u) = Σ_J P(I_u | J) · P(w | J)

where P(w | J) is the frequency of occurrence of tag w in training image J, and P(I_u | J) is the visual similarity between I_u and training image J. The tags corresponding to the 10 training images J most visually similar to I_u are chosen as candidate labels of I_u, i.e. the 10 tags w with the largest values of P(w | I_u) become the candidate labels of I_u.
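The propagation step above can be sketched as below, assuming the visual similarities P(I_u | J) have already been computed; all names are illustrative, and P(w | J) is taken as the relative frequency of w among J's tags.

```python
from collections import Counter

def propagate_labels(sim_to_training, training_tags, top_imgs=10, top_tags=10):
    """Candidate labels for an unlabeled image I_u by label propagation (sketch).

    sim_to_training : dict training-image id -> visual similarity P(I_u | J)
    training_tags   : dict training-image id -> list of tags of image J

    Scores follow P(w | I_u) = sum_J P(I_u | J) * P(w | J), restricted to the
    top_imgs most similar training images; the top_tags highest-scoring tags
    become the candidate labels.
    """
    nearest = sorted(sim_to_training, key=sim_to_training.get, reverse=True)[:top_imgs]
    scores = Counter()
    for j in nearest:
        tags = training_tags[j]
        for w in set(tags):
            # P(w | J): relative frequency of w among J's tags
            scores[w] += sim_to_training[j] * tags.count(w) / len(tags)
    return [w for w, _ in scores.most_common(top_tags)]
```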
The step of optimizing the candidate labels according to the correlation between each candidate label and the latent topics of the image:
1) obtain the latent-topic similarity between two candidate labels w_k and w_l by summing, over all latent topics, the product of their probabilities:

P(w_k | w_l) = Σ_{t=1..T} ф(w_k, t) · ф(w_l, t)

where ф is the probability distribution of image tags over the latent topics;
2) obtain the correlation between a candidate label w_i and the latent topics of the image I_u to be labeled by summing the latent-topic similarities between w_i and the other candidate labels:

R(w_i, I_u) = Σ_{j≠i} P(w_j | w_i)

where P(w_j | w_i) is the latent-topic similarity between candidate labels w_j and w_i;
3) recompute the probability between candidate label w_i and the image I_u to be labeled as

P'(w_i | I_u) = P(w_i | I_u) × R(w_i, I_u)

where P(w_i | I_u) is the probability between I_u and tag w_i, and R(w_i, I_u) is the correlation between candidate label w_i and the latent topics of I_u.
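The three optimization formulas above can be sketched together as follows; the function names are hypothetical, and ф is again assumed to be a mapping from each tag to its probabilities over the T latent topics.

```python
def rerank_candidates(p_init, phi, T):
    """Re-rank candidate labels by latent-topic coherence (sketch of step 4).

    p_init : dict tag -> initial propagation score P(w | I_u)
    phi    : dict tag -> list of length T with phi(tag, t) for each topic t

    Implements P(w_k | w_l) = sum_t phi(w_k, t) * phi(w_l, t),
    R(w_i, I_u)  = sum over the other candidates w_j of P(w_j | w_i), and
    P'(w_i | I_u) = P(w_i | I_u) * R(w_i, I_u).
    """
    def topic_sim(wk, wl):
        return sum(phi[wk][t] * phi[wl][t] for t in range(T))

    reranked = {}
    for wi in p_init:
        r = sum(topic_sim(wj, wi) for wj in p_init if wj != wi)
        reranked[wi] = p_init[wi] * r
    return reranked
```

A candidate that shares latent topics with many other candidates is boosted, while an isolated, off-topic candidate is suppressed.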
The step of producing the final annotation of the image by fusing the information of multiple communities:
1) choose the most frequent image tag appearing in each community's title to represent the community's topic, then locate the node representing the community in the WordNet "entity" semantic tree through this tag, thus constructing the hierarchy (HD) over the communities;
2) following this hierarchy, merge the communities from top to bottom: for communities sharing a common ancestor node, average the annotation information of the child communities to obtain a new parent node and delete the child nodes, thereby achieving the fusion;
3) take the 5 candidate labels with the highest values as the final annotation result of the image to be labeled.
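The fusion step can be sketched in a simplified, flat form. The patent merges bottom nodes along a WordNet-derived hierarchy; this sketch omits the hierarchy and simply averages each tag's optimized score over the communities that propose it, then keeps the top 5. All names are illustrative.

```python
def fuse_communities(per_community_scores, top_k=5):
    """Fuse annotation scores from several communities (flat sketch of step 5).

    per_community_scores : list of dicts, one per community,
                           each mapping tag -> optimized score P'(w | I_u)

    A tag's fused score is the average of its scores over the communities
    that contain it; the top_k tags form the final annotation.
    """
    totals, counts = {}, {}
    for scores in per_community_scores:
        for w, s in scores.items():
            totals[w] = totals.get(w, 0.0) + s
            counts[w] = counts.get(w, 0) + 1
    avg = {w: totals[w] / counts[w] for w in totals}
    return sorted(avg, key=avg.get, reverse=True)[:top_k]
```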
The present invention makes full use of the information of the different communities an image belongs to, and uses the latent topic information of those communities to "de-noise" and optimize the annotation labels; the results are therefore more accurate than those of traditional annotation methods, and the annotation information is also more extensive.
Description of drawings
Fig. 1 is the flow chart of the method of automatic image annotation based on community latent topic mining.
Fig. 2 shows an automatic image annotation result of the present invention.
Fig. 3 shows the latent Dirichlet allocation model.
Embodiment
The method of automatic image annotation based on community latent topic mining comprises the following steps:
1) use a latent Dirichlet allocation (LDA) model to mine the latent topics of a single community;
2) after obtaining the probability distribution of image tags over the latent topics from this analysis, delete the community image tags whose probability under the latent topics is smaller than a set threshold k, thereby "de-noising" the community tags;
3) generate candidate labels for the image to be labeled by propagating the labels of visually similar images;
4) optimize the candidate labels according to the correlation between each candidate label and the latent topics of the image;
5) obtain the final annotation result of the image by fusing the information of multiple communities.
The step of mining the latent topics of a single community with the latent Dirichlet allocation model is as follows:
1) The latent Dirichlet allocation (LDA) model is commonly used for topic analysis of text. In the LDA model (Fig. 3), the relations among an image (document) d, a latent topic z, and an image tag w are mainly determined by the hidden variables θ and φ, where θ represents the topic distribution of image d and φ represents the tag distribution of topic z; α and β are the prior parameters of θ and φ and obey Dirichlet distributions. T is the total number of community topics, D is the total number of community images, and N_d is the total number of tags annotating each image.
2) Because directly computing the probabilities among the latent topics z, the images d, and the image tags w of an image set is complicated, Gibbs sampling is usually adopted to simplify the LDA computation. For the i-th image tag token, the tag index of the token is w_i and the image index of the token is d_i. Gibbs sampling considers each image tag token in turn and estimates which topic the current token should be assigned to by counting how many times the other tokens are assigned to each topic. In this process the topics are sampled cyclically, with the topic conditional probability

P(z_i = j | z_-i, w_i, d_i, ·) ∝ (C_WT(w_i, j) + β) / (Σ_w C_WT(w, j) + Wβ) · (C_DT(d_i, j) + α) / (Σ_t C_DT(d_i, t) + Tα)

where z_i = j means topic j is assigned to token i, z_-i denotes the topic assignments of all other image tag tokens, and "·" denotes all other known information, such as all other tag indices w_-i, image indices d_-i, and the priors α and β. C_WT and C_DT are count matrices of dimensions W×T and D×T respectively: C_WT(w, j) is the number of times tag w is assigned to topic j, and C_DT(d, j) is the number of times the tags inside image d are assigned to topic j (both excluding the current token i).
3) In each Gibbs sweep, every image tag in the image set is assigned to some topic. After the Gibbs sampling has iterated enough times, the topic probabilities approach the prior Dirichlet distribution. When the sampling finishes, the tag-topic distribution ф and the topic-image distribution θ being solved for are obtained as

ф(w, j) = (C_WT(w, j) + β) / (Σ_k C_WT(k, j) + Wβ),  θ(d, j) = (C_DT(d, j) + α) / (Σ_k C_DT(d, k) + Tα)

where C_WT(w, j) is the number of times tag w is assigned to topic j, C_DT(d, j) is the number of times the tags inside image d are assigned to topic j, W is the number of image tags, T is the number of topics, and α, β are the priors.
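The Gibbs procedure above can be sketched as a compact collapsed sampler. This is a generic LDA implementation written in the notation of the text (C_WT, C_DT, α, β), not the patent's own code; hyperparameter values are illustrative.

```python
import random

def lda_gibbs(docs, T, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tag lists (one list per image).

    docs : list of images, each a list of tag indices in 0..W-1
    Returns (C_WT, C_DT): the W x T tag-topic and D x T image-topic count
    matrices, from which phi and theta are estimated as in step 3.
    """
    rng = random.Random(seed)
    W = max(w for d in docs for w in d) + 1
    C_WT = [[0] * T for _ in range(W)]
    C_DT = [[0] * T for _ in range(len(docs))]
    C_T = [0] * T                              # column sums of C_WT
    z = [[rng.randrange(T) for _ in d] for d in docs]
    for d, doc in enumerate(docs):             # initialize counts
        for i, w in enumerate(doc):
            j = z[d][i]
            C_WT[w][j] += 1; C_DT[d][j] += 1; C_T[j] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j = z[d][i]                    # remove the current assignment
                C_WT[w][j] -= 1; C_DT[d][j] -= 1; C_T[j] -= 1
                # P(z_i = j | z_-i, ...) up to a constant; the document-side
                # denominator is constant in t for fixed d, so it is dropped
                p = [(C_WT[w][t] + beta) / (C_T[t] + W * beta)
                     * (C_DT[d][t] + alpha) for t in range(T)]
                r, acc, jn = rng.uniform(0, sum(p)), 0.0, T - 1
                for t, pt in enumerate(p):
                    acc += pt
                    if r <= acc:
                        jn = t
                        break
                z[d][i] = jn                   # record the new assignment
                C_WT[w][jn] += 1; C_DT[d][jn] += 1; C_T[jn] += 1
    return C_WT, C_DT
```

ф and θ then follow by normalizing the returned counts with the smoothing formulas of step 3.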
The step of generating candidate labels for the image to be labeled by propagating the labels of similar images: for an image I_u to be labeled in the community, the probability between I_u and an image tag w is computed as

P(w | I_u) = Σ_J P(I_u | J) · P(w | J)

where P(w | J) is the frequency of occurrence of tag w in training image J, and P(I_u | J) is the visual similarity between I_u and training image J. The tags corresponding to the 10 training images J most visually similar to I_u are chosen as candidate labels of I_u, i.e. the 10 tags w with the largest values of P(w | I_u) become the candidate labels of I_u.
The step of optimizing the candidate labels according to the correlation between each candidate label and the latent topics of the image:
1) obtain the latent-topic similarity between two candidate labels w_k and w_l by summing, over all latent topics, the product of their probabilities:

P(w_k | w_l) = Σ_{t=1..T} ф(w_k, t) · ф(w_l, t)

where ф is the probability distribution of image tags over the latent topics;
2) obtain the correlation between a candidate label w_i and the latent topics of the image I_u to be labeled by summing the latent-topic similarities between w_i and the other candidate labels:

R(w_i, I_u) = Σ_{j≠i} P(w_j | w_i)

where P(w_j | w_i) is the latent-topic similarity between candidate labels w_j and w_i;
3) recompute the probability between candidate label w_i and the image I_u to be labeled as

P'(w_i | I_u) = P(w_i | I_u) × R(w_i, I_u)

where P(w_i | I_u) is the probability between I_u and tag w_i, and R(w_i, I_u) is the correlation between candidate label w_i and the latent topics of I_u.
The step of producing the final annotation of the image by fusing the information of multiple communities:
1) choose the most frequent image tag appearing in each community's title to represent the community's topic, then locate the node representing the community in the WordNet "entity" semantic tree through this tag, thus constructing the hierarchy (HD) over the communities;
2) following this hierarchy, merge the communities from top to bottom: for communities sharing a common ancestor node, average the annotation information of the child communities to obtain a new parent node and delete the child nodes, thereby achieving the fusion;
3) take the 5 candidate labels with the highest values as the final annotation result of the image to be labeled.
The present invention makes full use of the information of the different communities an image belongs to in the social sharing network, and uses the latent topic information of those communities to "de-noise" and optimize the annotation labels; the results are therefore more accurate than those produced by traditional annotation methods, and the annotation information is also more extensive.
As shown in Figure 1, the method of automatic image annotation based on community latent topic mining proceeds as follows:
1) for an image to be labeled, find the N different communities the image belongs to;
2) mine the latent topics of each community with the latent Dirichlet allocation model;
3) "de-noise" the community tags according to the correlation between the community tags and the latent topics of the community;
4) generate candidate labels for the image to be labeled by propagating the labels of similar images;
5) optimize the candidate labels according to the correlation between each candidate label and the latent topics of the image;
6) annotate the image by fusing the information of the multiple communities;
7) obtain the final annotation result of the image to be labeled.
Embodiment 1
Fig. 2 gives a concrete example of automatic image annotation based on community latent topic mining.
1) Choose an image to be labeled and find the 3 different communities it belongs to: community 1 "Water, Oceans, Lakes, Rivers, Creeks", community 2 "Sky & Clouds", community 3 "Beautiful Scenery";
2) mine the latent topics of the 3 communities with the latent Dirichlet allocation model;
3) "de-noise" the tags of the 3 communities according to the correlation between the community tags and the latent topics of each community;
4) generate the candidate labels of the image to be labeled by propagating the labels of similar images: "river san water antonio bexar county courthouse blue clouds sea";
5) optimize the candidate labels according to the correlation between each candidate label and the latent topics of the image, obtaining "river san water bexar blue courthouse antonio county clouds sea";
6) fuse the information of 2 communities to obtain the candidate labels "clouds river san sky water bexar blue courthouse antonio county"; fuse the information of the 3 communities to obtain the candidate labels "sky blue clouds river water san landscape bexar courthouse mountains";
7) take the 5 candidate labels with the highest values to obtain the final annotation result of the image to be labeled: "sky blue clouds river water".
As can be seen from the example above, unlike traditional image annotation methods, the present invention makes full use of the information of the different communities the image belongs to in the social sharing network, and uses the latent topic information of those communities to "de-noise" and optimize the annotation labels; the annotation results are therefore more accurate than those produced by traditional annotation methods, and the annotation information is also more extensive.
Claims (1)
1. A method of automatic image annotation based on community latent topic mining, characterized by comprising the steps of:
1) using a latent Dirichlet allocation model to mine the latent topics of a single community;
2) after obtaining the probability distribution of image tags over the latent topics from this analysis, deleting the community image tags whose probability under the latent topics is smaller than a set threshold k, thereby "de-noising" the community tags;
3) generating candidate labels for the image to be labeled by propagating the labels of similar images;
4) optimizing the candidate labels according to the correlation between each candidate label and the latent topics of the image;
5) obtaining the final annotation result of the image by fusing the information of multiple communities;
the step of generating candidate labels for the image to be labeled by propagating the labels of similar images: for an image I_u to be labeled in the community, the probability between I_u and an image tag w is computed as

P(w | I_u) = Σ_J P(I_u | J) · P(w | J)

where P(w | J) is the frequency of occurrence of tag w in training image J, P(I_u | J) is the visual similarity between I_u and training image J, and T is the total number of community topics; the tags corresponding to the 10 training images J most visually similar to I_u are chosen as candidate labels of I_u, i.e. the 10 tags w with the largest values of P(w | I_u) become the candidate labels of I_u;
the step of optimizing the candidate labels according to the correlation between each candidate label and the latent topics of the image:
1) obtaining the latent-topic similarity between two candidate labels w_k and w_l by summing, over all latent topics, the product of their probabilities:

P(w_k | w_l) = Σ_{t=1..T} ф(w_k, t) · ф(w_l, t)

where ф is the probability distribution of image tags over the latent topics and T is the total number of community topics;
2) obtaining the correlation between a candidate label w_i and the latent topics of the image I_u to be labeled by summing the latent-topic similarities between w_i and the other candidate labels:

R(w_i, I_u) = Σ_{j≠i} P(w_j | w_i)

where P(w_j | w_i) is the latent-topic similarity between candidate labels w_j and w_i;
3) recomputing the probability between candidate label w_i and the image I_u to be labeled as

P'(w_i | I_u) = P(w_i | I_u) × R(w_i, I_u)

where P(w_i | I_u) is the probability between I_u and tag w_i, and R(w_i, I_u) is the correlation between candidate label w_i and the latent topics of I_u;
the step of producing the final annotation of the image by fusing the information of multiple communities:
1) choosing the most frequent image tag appearing in each community's title to represent the community's topic, then locating the node representing the community in the WordNet "entity" semantic tree through this tag, thus constructing the hierarchy (HD) over the communities;
2) following this hierarchy, merging the communities from top to bottom: for communities sharing a common ancestor node, averaging the annotation information of the child communities to obtain a new parent node and deleting the child nodes, thereby achieving the fusion;
3) taking the 5 candidate labels with the highest values as the final annotation result of the image to be labeled.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009100999166A CN101685464B (en) | 2009-06-18 | 2009-06-18 | Method for automatically labeling images based on community potential subject excavation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101685464A CN101685464A (en) | 2010-03-31 |
CN101685464B true CN101685464B (en) | 2011-08-24 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1920820A (en) * | 2006-09-14 | 2007-02-28 | 浙江大学 | Image meaning automatic marking method based on marking significance sequence |
CN101082912A (en) * | 2006-06-01 | 2007-12-05 | 上海杰图软件技术有限公司 | Method for annotating electronic map through photograph collection having position information |
Non-Patent Citations (1)
Title |
---|
Chen Ye et al., "Automatic image annotation based on community latent topic mining and multi-community information fusion," Journal of Image and Graphics, 2010, vol. 15, no. 6, pp. 944-950. * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20110824 Termination date: 20140618 |
|
EXPY | Termination of patent right or utility model |