CN103714178A - Automatic image marking method based on word correlation - Google Patents


Info

Publication number
CN103714178A
CN103714178A (application CN201410008553.1A)
Authority
CN
China
Prior art keywords: word, mark, image, training set, training
Prior art date
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN201410008553.1A
Other languages
Chinese (zh)
Other versions
CN103714178B (en)
Inventor
安震 (An Zhen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201410008553.1A priority Critical patent/CN103714178B/en
Publication of CN103714178A publication Critical patent/CN103714178A/en
Application granted granted Critical
Publication of CN103714178B publication Critical patent/CN103714178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155: Generating training patterns characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling

Abstract

The invention discloses an automatic image annotation method based on word correlation. A training set T comprises l images; each image of the training set T is annotated with n annotation words; the training set T has corresponding visual lemmas; and the image to be annotated is I. The method includes the steps of: computing the semantic vector of each annotation word w, the annotation word w being represented in vector form as w = <v_1, v_2, ..., v_m>, where each c_i is a context-associated word and there are m context-associated words in total; computing the semantic similarity between annotation words using the vector norm; computing p(A), where A is an annotation word group {w_1, w_2, ..., w_n} of n annotation words; computing the conditional probability p(I/w_i); and computing the annotation word group A of the image I to be annotated by the formula A = argmax_A p(I/A) p(A).

Description

Automatic image annotation method based on inter-word correlation
Technical field
The present invention relates to the field of image processing, and in particular to an automatic image annotation method based on inter-word correlation.
Background art
With the rapid development of multimedia and Internet technology, daily life and work depend increasingly on multimedia information such as images. Semantic-based image retrieval can accurately express a user's retrieval intention and is convenient to use, so this retrieval mode has not only become an important form of image retrieval but also a hot topic pursued by researchers.
Automatic image annotation is an important and challenging task in semantic image retrieval. Its purpose is to automatically acquire the semantic information contained in the visual content of an image; it attempts to build a bridge between low-level visual features and high-level semantics and thereby support retrieval at the semantic level. Research on automatic annotation algorithms based on image semantics has therefore become a very active research branch and a key technology in the field of image retrieval, with good application prospects and research value.
Automatic image annotation lets a computer automatically add, to an unannotated image, semantic keywords that reflect its content. It uses already-annotated image collections or other available information to automatically learn a relational model between the semantic concept space and the visual feature space, and then uses this model to annotate images of unknown semantics. By establishing a mapping between the high-level semantic information and the low-level features of images, it alleviates the semantic gap problem to some extent.
The cross-media relevance model is currently the most widely used generative-model-based image annotation algorithm and has been broadly studied. The basic idea of this annotation model is to use probabilistic statistics to establish the probabilistic correlation between the image visual feature space and the semantic concept space, learn the joint probability distribution between the two, find the group of semantic annotation words that maximizes the joint probability with the image content, and take that group of annotation words as the final annotation of the test image.
However, the cross-media relevance model is a kind of probabilistic model, and such models are biased toward annotation words with high occurrence frequency. Moreover, in cross-media relevance model automatic annotation, different candidate annotation words are assumed to be mutually independent during the annotation process, and the correlation between annotation words is not fully exploited. In fact, within one image, different annotation words exhibit multiple associations such as co-occurrence, hierarchical, or spatial relations.
For example, in an image containing semantic objects such as "sun, sky, cloud, mountain, tree", it can be seen from the visual content that the "sun" and "sky" objects have a certain spatial correlation: "sun" cannot exist independently of the semantic object "sky". Likewise, for the two semantic objects "mountain" and "tree" in the picture content, the "tree" object exists with the "mountain" semantic object as its visual background; the two are inseparably linked in the visual content, and it cannot be assumed absolutely that these two annotation words are independent. The assumption of the cross-media relevance model annotation algorithm that different candidate annotation words are mutually independent during annotation is therefore flawed, and ignoring inter-word correlation may produce semantically inconsistent annotation words in the annotation results.
Summary of the invention
In view of this, the present invention provides an automatic image annotation method based on inter-word correlation, to overcome the defect of the cross-media relevance model automatic annotation algorithm's assumption that different candidate annotation words are mutually independent during annotation, and to solve the problem of semantically inconsistent annotation words in the annotation results caused by ignoring inter-word correlation. The technical scheme proposed by the present invention is:
An automatic image annotation method based on inter-word correlation: the training set T comprises l images, the l images forming an image collection P = [p_1, p_2, ..., p_l]; each image of the training set T is annotated with n annotation words, and all annotation words in the training set T form an annotation word set W = [w_1, w_2, ..., w_s]; each image of the training set T has corresponding visual lemmas, and all visual lemmas in the training set T form a visual lemma set B = [b_1, b_2, ..., b_y]; the image to be annotated is I. The method comprises:

A. according to the formula

v_i = p(c_i/w) / p(c_i)

compute the semantic vector of each annotation word w in the training set T, the annotation word w being represented in vector form as w = <v_1, v_2, ..., v_m>, where c_i is a context-associated word, there are m context-associated words in total, p(c_i) is the overall distribution probability of the context-associated word c_i, and p(c_i/w) denotes the ratio of the number of co-occurrences of the context-associated word c_i with the annotation word w in the training set T to the total number of occurrences of the annotation word w in the training set T:

p(c_i/w) = count(c_i, w) / count(w);

B. according to the formula

sim(w_i, w_j) = (w_i · w_j) / (||w_i|| · ||w_j||)

compute the semantic similarity between annotation words, where ||·|| denotes the vector norm;

C. according to the formula

p(A) ∝ (1/(n-1)) Σ_{w_i∈A} Σ_{w_j∈A, j≠i} sim(w_i, w_j)

compute p(A), where A is an annotation word group {w_1, w_2, ..., w_n} and n is the number of annotation words in the group;

D. according to the formula

p(I/w_i) = P(w_i, b_1, ..., b_n) / P(w_i)

compute the conditional probability p(I/w_i);

E. according to

p(I/A) ≈ ∏_{i=1}^{n} p(I/w_i)

compute p(I/A);

F. by the formula A = argmax_A p(I/A) p(A), compute the annotation word group A of the image I to be annotated.
In the above scheme, step D further comprises:
P(w_i) is the ratio of the number of occurrences of annotation word w_i in the training set T to the total number of occurrences of all annotation words in the training set T:

p(w_i) = |w_i| / Σ_{w_k∈T} |w_k|

P(w_i, b_1, ..., b_n) is computed as:

P(w_i, b_1, b_2, ..., b_n) = Σ_{J∈T} P(J) P(w_i|J) ∏_{k=1}^{n} P(b_k|J),

where P(J) denotes the probability of randomly drawing one training image J from the image collection P; p(w_i/J) denotes the posterior probability that word w_i occurs in training image J; and p(b_k/J) denotes the posterior probability that visual lemma b_k occurs in training image J; and

P(w_i|J) = (1 - α_J) · #(w_i, J)/|J| + α_J · #(w_i, T)/|T|    (1)

P(b_k|J) = (1 - β_J) · #(b_k, J)/|J| + β_J · #(b_k, T)/|T|    (2)

where α_J and β_J are smoothing parameters set empirically;
#(w_i, J) indicates whether annotation word w_i occurs in training image J: #(w_i, J) = 1 if so, otherwise #(w_i, J) = 0;
#(w_i, T) indicates whether annotation word w_i occurs in the training set T: #(w_i, T) = 1 if so, otherwise #(w_i, T) = 0;
#(b_k, J) indicates whether visual lemma b_k occurs in training image J: #(b_k, J) = 1 if so, otherwise #(b_k, J) = 0;
|J| denotes the total number of annotation words and visual lemmas in training image J; |T| denotes the total number of annotation words and visual lemmas in the training set T.
In the above scheme, the context-associated words are the annotation words in the training set T.
In summary, the technical scheme proposed by the present invention converts the joint probability computation of annotation words and image in the cross-media relevance model into two parts: the probability of the image appearing given the annotation words, and the prior probability of the annotation word group. This greatly reduces the influence of high-frequency candidate annotation words on the probabilistic statistical model, allows non-high-frequency candidate annotation words to play a larger role, and improves the recall and precision of non-high-frequency candidate annotation words. At the same time, a semantic-similarity language model is incorporated into the cross-media relevance model and used to estimate the prior probability of a group of annotation words, which is more likely to produce a group of annotation words with stronger semantic correlation, thereby improving the overall annotation quality of the image.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the present invention.
Detailed description
To express the object, technical solutions, and advantages of the present invention more clearly, the present invention is described in further detail below with reference to the drawings and specific embodiments.
The technical scheme of the present invention is as follows:
A. according to the formula

v_i = p(c_i/w) / p(c_i)

compute the semantic vector of each annotation word w in the training set T, the annotation word w being represented in vector form as w = <v_1, v_2, ..., v_m>, where c_i is a context-associated word, there are m context-associated words in total, p(c_i) is the overall distribution probability of the context-associated word c_i, and p(c_i/w) denotes the ratio of the number of co-occurrences of the context-associated word c_i with the annotation word w in the training set T to the total number of occurrences of the annotation word w in the training set T:

p(c_i/w) = count(c_i, w) / count(w);

B. according to the formula

sim(w_i, w_j) = (w_i · w_j) / (||w_i|| · ||w_j||)

compute the semantic similarity between annotation words, where ||·|| denotes the vector norm;

C. according to the formula

p(A) ∝ (1/(n-1)) Σ_{w_i∈A} Σ_{w_j∈A, j≠i} sim(w_i, w_j)

compute p(A), where A is an annotation word group {w_1, w_2, ..., w_n} and n is the number of annotation words in the group;

D. according to the formula

p(I/w_i) = P(w_i, b_1, ..., b_n) / P(w_i)

compute the conditional probability p(I/w_i);

E. according to

p(I/A) ≈ ∏_{i=1}^{n} p(I/w_i)

compute p(I/A);

F. by the formula A = argmax_A p(I/A) p(A), compute the annotation word group A of the image I to be annotated.
The image annotation problem can currently be defined as follows: given a training set T comprising an image collection P and an annotation word set W, where every image p_i has already been annotated and the annotation words of all images form the set W, how should one choose from the set W a group of annotation words A with which to annotate a new image I?
The image annotation method of the present invention adopts a probabilistic model; its goal is to find the annotation word group A that maximizes the conditional probability p(A/I), that is:

A = argmax_A p(A/I)    (3)

where A is an annotation word group {w_1, w_2, ..., w_n}, and the image I is represented by a group of visual features {b_1, b_2, ..., b_m} obtained by preprocessing the image (operations such as segmentation, feature extraction, and feature normalization) and classifying its segmented regions. p(A/I) can be rewritten in the following form:

p(A/I) = p(A, I) / p(I)    (4)

Because the prior probability of an image is conventionally assumed to be uniformly distributed, p(I) can be regarded as a constant, and

p(A, I) = p(I/A) p(A)    (5)

Simplifying formula (3) with formulas (4) and (5) gives:

A = argmax_A p(I/A) p(A)    (6)

Combining the two probabilities p(I/A) and p(A) and maximizing their product finds the best annotation word group A. p(I/A) can be obtained from the original image annotation model, and p(A) from the language model. Giving the two probabilities different weights expresses the respective influence of the original image model and the language model on the final annotation result:

A = argmax_A p(I/A)^{λ_1} · p(A)^{λ_2}    (7)

which can be transformed into the following form:

A = argmax_A (λ_1 log p(I/A) + λ_2 log p(A))    (8)

Once p(A) and p(I/A) are computed, the annotation word group A can be obtained. λ_1 and λ_2 are determined during machine learning on the training image set and model building, and remain two constants during automatic annotation of a test image.
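A quick numerical check of the equivalence of forms (7) and (8): because the logarithm is monotonic, maximizing the weighted product and maximizing the weighted log sum select the same A. The candidate groups, probabilities, and weights below are made up for illustration only.

```python
import math

# Illustrative candidates: name -> (p(I/A), p(A)); values are invented.
cands = {"A1": (0.02, 0.4), "A2": (0.05, 0.1), "A3": (0.01, 0.9)}
lam1, lam2 = 1.0, 1.0  # assumed weights

# Formula (7): weighted product of the two probabilities.
best_prod = max(cands, key=lambda a: cands[a][0] ** lam1 * cands[a][1] ** lam2)
# Formula (8): weighted sum of logs.
best_log = max(cands, key=lambda a: lam1 * math.log(cands[a][0])
                                    + lam2 * math.log(cands[a][1]))
assert best_prod == best_log  # same argmax under either form
```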
The technical solution of the present invention is described below taking as an example a training set T comprising l images, with the image to be annotated being I. The l images of the training set T form an image collection P = [p_1, p_2, ..., p_l]; each image of the training set T is annotated with n annotation words, and all annotation words in the training set T form an annotation word set W = [w_1, w_2, ..., w_s]; each image of the training set T has corresponding visual lemmas, and all visual lemmas in the training set T form a visual lemma set B = [b_1, b_2, ..., b_y].
Fig. 1 is a flowchart of this embodiment; as shown in Fig. 1, it comprises the following steps:
Step 101: perform image preprocessing and segmented-region classification on the image I to be annotated.
In this step, the image I to be annotated is preprocessed (segmentation, feature extraction, feature normalization, etc.), and then region classification is performed: a clustering algorithm assigns each segmented region to a class, and the visual content of the image is jointly represented with visual lemmas: I = {i_1, i_2, ..., i_f}. Obtaining visual lemmas is prior art and is not described in detail here.
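The region-classification step can be sketched as follows: each segmented region's feature vector is mapped to the index of its nearest cluster centre, and that index serves as the region's visual lemma. This is a minimal illustration, not the patent's exact procedure; the centres and feature values are invented, and in practice the centres would come from clustering (e.g. k-means) over region features of the training set.

```python
import math

# Assumed cluster centres learned from training-set region features
# (illustrative values; in practice they come from a clustering algorithm).
centres = [(0.1, 0.2), (0.8, 0.7), (0.4, 0.9)]

def visual_lemma(feature):
    """Map a region feature vector to the index of its nearest centre,
    i.e. to a visual lemma b_k in B = [b_1 ... b_y]."""
    dists = [math.dist(feature, c) for c in centres]
    return dists.index(min(dists))

# Visual content of the image: one lemma per segmented region.
regions = [(0.12, 0.18), (0.75, 0.72), (0.45, 0.85)]
I = [visual_lemma(f) for f in regions]  # nearest-centre index per region
```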
Step 102: compute p(A) with the semantic-similarity language model.
To introduce inter-word correlation information through the similarity between annotation words, the present invention represents each annotation word w with a semantic vector model. The context-associated word set is C = [c_1, c_2, ..., c_m], each element c_i representing one context-associated word, m in total; all annotation words of the set W in the training set T can be chosen as context-associated words, i.e. C = W. Each annotation word w is represented by the vector of its associated context words, i.e. w = <v_1, v_2, ..., v_m>, where each semantic component v_i is defined as the ratio of the conditional probability of context-associated word c_i given the annotation word w to the probability of c_i:

v_i = p(c_i/w) / p(c_i)    (9)

where p(c_i) denotes the overall distribution probability of context-associated word c_i, taken to be uniformly distributed. The conditional probability p(c_i/w) denotes the ratio of the number of times c_i co-occurs with w across the annotations of the image collection P to the total number of times w occurs across those annotations:

p(c_i/w) = count(c_i, w) / count(w)    (10)

p(c_i/w) expresses the strength of co-occurrence between the word w and a context-associated word; dividing by the overall probability of each context-associated word prevents the semantic vector w = <v_1, v_2, ..., v_m> from being dominated by high-frequency context-associated words, because high-frequency associated words often also have very large conditional probabilities. In Table 1, "sky", "sun", "clouds", "town" represent a group of context-associated words and "tree", "building", "river" a group of annotation words; the semantic vectors of the annotation words are as shown in Table 1.
Table 1

              sky     sun     clouds   town
    tree      2.56    0.91    0.74     0.63
    building  5.01    0.57    2.41     21.19
    river     2.57    2.57    1.12     5.72
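A compact sketch of the semantic-vector computation of formulas (9) and (10). The toy annotations are invented for illustration; the context words are taken to be the annotation vocabulary itself (C = W, as the text allows), and here p(c_i) is estimated from annotation frequencies, whereas the patent also mentions a uniform assumption.

```python
from collections import Counter
from itertools import permutations

# Toy training annotations, one word list per image (illustrative data).
annotations = [
    ["sky", "sun", "tree"],
    ["sky", "clouds", "building", "town"],
    ["river", "tree", "sky"],
    ["building", "town", "river"],
]

context_words = sorted({w for img in annotations for w in img})  # C = W

word_count = Counter()   # count(w): occurrences of w over all annotations
pair_count = Counter()   # count(c, w): images in which c and w co-occur
for img in annotations:
    uniq = set(img)
    word_count.update(uniq)
    for c, w in permutations(uniq, 2):
        pair_count[(c, w)] += 1

total = sum(word_count.values())

def semantic_vector(w):
    """w = <v_1 ... v_m> with v_i = p(c_i/w) / p(c_i), formulas (9)-(10)."""
    vec = []
    for c in context_words:
        p_c = word_count[c] / total                 # p(c_i), empirical here
        p_c_w = pair_count[(c, w)] / word_count[w]  # count(c,w)/count(w)
        vec.append(p_c_w / p_c)
    return vec

v_tree = semantic_vector("tree")  # one component per context word
```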
Next the semantic similarity between annotation words is computed, as shown in formula (11):

sim(w_i, w_j) = (w_i · w_j) / (||w_i|| · ||w_j||)    (11)

where ||·|| denotes the vector norm. The dot product w_i · w_j is computed as in formula (12):

w_i · w_j = Σ_{k=1}^{m} v_{w_i,k} · v_{w_j,k} = Σ_{k=1}^{m} [p(c_k/w_i)/p(c_k)] · [p(c_k/w_j)/p(c_k)]    (12)

where c_k denotes a context-associated word. The semantic similarities between annotation words are shown in Table 2. Similarity ranges from 0 to 1; a higher value indicates a higher similarity between two annotation words and a larger probability that they appear in the same image.
Table 2

            tree     road     sky      wood
    tree    1        0.1723   0.4311   0.2140
    road    0.1723   1        0.1742   0.0021
    sky     0.4311   0.1742   1        0.0383
    wood    0.2140   0.0021   0.0383   1
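Formula (11) is the standard cosine similarity; a minimal implementation follows. The first example vector reuses the "tree" row of Table 1; the second pair is purely illustrative.

```python
import math

def cosine_sim(u, v):
    """sim(w_i, w_j) = (w_i · w_j) / (||w_i|| · ||w_j||), formulas (11)-(12)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# A vector compared with itself scores 1; orthogonal vectors score 0.
same = cosine_sim([2.56, 0.91, 0.74], [2.56, 0.91, 0.74])   # ≈ 1.0
orth = cosine_sim([1.0, 0.0], [0.0, 1.0])                   # 0.0
```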
Assuming that, within one annotation, the annotation words are semantically related to the context-associated words, the probability p(A) of a group of annotation words A = {w_1, w_2, ..., w_n} can be obtained by computing the similarity of each annotation word with the other annotation words:

p(A) ∝ (1/(n-1)) Σ_{w_i∈A} Σ_{w_j∈A, j≠i} sim(w_i, w_j)    (13)

Substituting formulas (10), (11), and (12) into formula (13) gives the probability p(A) of the annotation word group:

p(A) ∝ (1/(n-1)) Σ_{w_i∈A} Σ_{w_j∈A, j≠i} [Σ_{k=1}^{m} (count(c_k, w_i)/(count(w_i) · p(c_k))) · (count(c_k, w_j)/(count(w_j) · p(c_k)))] / (||w_i|| · ||w_j||)    (14)
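Formula (13) as code: the prior of a candidate group is the averaged pairwise similarity of its words. The similarity values loosely echo Table 2 but are used only for illustration; `sim` is assumed to be a lookup of pairwise similarities such as the cosine of formula (11).

```python
def group_prior(group, sim):
    """p(A) ∝ (1/(n-1)) Σ_{w_i∈A} Σ_{w_j∈A, j≠i} sim(w_i, w_j), formula (13).
    `sim` maps an unordered word pair to its semantic similarity."""
    n = len(group)
    if n < 2:
        return 0.0
    s = sum(sim[frozenset((wi, wj))]
            for wi in group for wj in group if wi != wj)
    return s / (n - 1)

# Pairwise similarities loosely echoing Table 2 (illustrative values).
sim = {
    frozenset(("tree", "sky")): 0.4311,
    frozenset(("tree", "road")): 0.1723,
    frozenset(("sky", "road")): 0.1742,
}

p_tree_sky = group_prior(["tree", "sky"], sim)    # (0.4311 + 0.4311) / 1
p_tree_road = group_prior(["tree", "road"], sim)
# The semantically closer group receives the larger prior.
```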
Step 103: compute p(I/A) with the cross-media relevance model.
In this step, first compute the conditional probability p(I/w_i) according to the formula p(I/w_i) = P(w_i, b_1, ..., b_n) / P(w_i). Therein,
P(w_i) is computed as follows: the prior probability p(w_i) of word w_i is the ratio of the number of occurrences of annotation word w_i to the total number of occurrences of all annotation words in the training set T:

p(w_i) = |w_i| / Σ_{w_k∈T} |w_k|    (15)

P(w_i, b_1, ..., b_n) is computed as:

P(w_i, b_1, b_2, ..., b_n) = Σ_{J∈T} P(J) P(w_i|J) ∏_{k=1}^{n} P(b_k|J)    (16)

P(J) denotes the probability of randomly drawing one training image J from the image collection P and is generally assumed to be uniformly distributed; p(w_i/J) denotes the posterior probability that word w_i occurs in training image J; and p(b_k/J) denotes the posterior probability that visual lemma b_k occurs in training image J. Each probability is estimated as follows:

P(w_i|J) = (1 - α_J) · #(w_i, J)/|J| + α_J · #(w_i, T)/|T|    (17)

P(b_k|J) = (1 - β_J) · #(b_k, J)/|J| + β_J · #(b_k, T)/|T|    (18)

where α_J and β_J are smoothing parameters set empirically; #(w_i, J) indicates whether annotation word w_i occurs in training image J: #(w_i, J) = 1 if so, otherwise #(w_i, J) = 0; #(w_i, T) indicates whether annotation word w_i occurs in the training set T: #(w_i, T) = 1 if so, otherwise #(w_i, T) = 0; #(b_k, J) indicates whether visual lemma b_k occurs in training image J: #(b_k, J) = 1 if so, otherwise #(b_k, J) = 0; |J| denotes the total number of annotation words and visual lemmas in training image J; and |T| denotes the total number of annotation words and visual lemmas in the training set T.
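The estimates (16)-(18) can be sketched as follows. Training images are represented as (annotation-word set, visual-lemma set) pairs; P(J) is uniform; the α and β values are assumed, not from the patent; and #(w, T) is taken as 1 because the words and lemmas considered come from the training vocabulary.

```python
def smoothed(in_J, size_J, size_T, lam):
    """Formulas (17)-(18): (1-λ)·#(x,J)/|J| + λ·#(x,T)/|T|,
    with the indicator #(x,J) passed as the boolean in_J and #(x,T) = 1."""
    return (1 - lam) * (int(in_J) / size_J) + lam * (1 / size_T)

def joint_prob(w, blobs, training, size_T, alpha=0.2, beta=0.2):
    """P(w, b_1..b_n) = Σ_{J∈T} P(J) P(w|J) ∏_k P(b_k|J), formula (16)."""
    p_J = 1.0 / len(training)  # P(J): uniform draw from the collection P
    total = 0.0
    for words, lemmas in training:
        size_J = len(words) + len(lemmas)  # |J|: words plus visual lemmas
        p = p_J * smoothed(w in words, size_J, size_T, alpha)
        for b in blobs:
            p *= smoothed(b in lemmas, size_J, size_T, beta)
        total += p
    return total

# Tiny invented training set: two images with word sets and lemma sets.
training = [({"sky", "tree"}, {0, 1}), ({"river"}, {1, 2})]
size_T = sum(len(ws) + len(bs) for ws, bs in training)  # |T| = 7
p = joint_prob("sky", [0], training, size_T)
```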
p(I/A) can then be approximately estimated as:

p(I/A) ≈ ∏_{i=1}^{n} p(I/w_i)
Step 104: compute the annotation word group of the image I to be annotated.
Above, p(A) and p(I/A) have been solved separately; according to A = argmax_A (λ_1 log p(I/A) + λ_2 log p(A)), the annotation word group A can be computed for an image I.
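Putting step 104 together: each candidate annotation group is scored by λ_1·log p(I/A) + λ_2·log p(A) (formula (8)) and the best-scoring group is kept. The candidate groups and their probability values below are invented for illustration; in the method they would come from formula (14) and from the product of the p(I/w_i).

```python
import math

lam1, lam2 = 1.0, 1.0  # assumed weights, fixed after training

def score(p_I_given_A, p_A):
    """Formula (8): λ1·log p(I/A) + λ2·log p(A)."""
    return lam1 * math.log(p_I_given_A) + lam2 * math.log(p_A)

# Illustrative candidate groups with (p(I/A), p(A)) values.
candidates = {
    ("sky", "sun"): (0.020, 0.43),
    ("sky", "road"): (0.015, 0.17),
    ("road", "wood"): (0.022, 0.002),
}

A = max(candidates, key=lambda g: score(*candidates[g]))
print(A)  # ('sky', 'sun'): good visual fit AND a coherent word group wins
```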
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (3)

1. An automatic image annotation method based on inter-word correlation, characterized in that a training set T comprises l images, the l images forming an image collection P = [p_1, p_2, ..., p_l]; each image of the training set T is annotated with n annotation words, all annotation words in the training set T forming an annotation word set W = [w_1, w_2, ..., w_s]; each image of the training set T has corresponding visual lemmas, all visual lemmas in the training set T forming a visual lemma set B = [b_1, b_2, ..., b_y]; and the image to be annotated is I; the method comprising:

A. according to the formula

v_i = p(c_i/w) / p(c_i)

computing the semantic vector of each annotation word w in the training set T, the annotation word w being represented in vector form as w = <v_1, v_2, ..., v_m>, wherein c_i is a context-associated word, there are m context-associated words in total, p(c_i) is the overall distribution probability of the context-associated word c_i, and p(c_i/w) denotes the ratio of the number of co-occurrences of the context-associated word c_i with the annotation word w in the training set T to the total number of occurrences of the annotation word w in the training set T:

p(c_i/w) = count(c_i, w) / count(w);

B. according to the formula

sim(w_i, w_j) = (w_i · w_j) / (||w_i|| · ||w_j||)

computing the semantic similarity between annotation words, wherein ||·|| denotes the vector norm;

C. according to the formula

p(A) ∝ (1/(n-1)) Σ_{w_i∈A} Σ_{w_j∈A, j≠i} sim(w_i, w_j)

computing p(A), wherein A is an annotation word group {w_1, w_2, ..., w_n} and n is the number of annotation words in the group;

D. according to the formula

p(I/w_i) = P(w_i, b_1, ..., b_n) / P(w_i)

computing the conditional probability p(I/w_i);

E. according to

p(I/A) ≈ ∏_{i=1}^{n} p(I/w_i)

computing p(I/A);

F. computing, by the formula A = argmax_A p(I/A) p(A), the annotation word group A of the image I to be annotated.
2. The method according to claim 1, characterized in that step D further comprises:
P(w_i) is the ratio of the number of occurrences of annotation word w_i in the training set T to the total number of occurrences of all annotation words in the training set T:

p(w_i) = |w_i| / Σ_{w_k∈T} |w_k|

and P(w_i, b_1, ..., b_n) is computed as:

P(w_i, b_1, b_2, ..., b_n) = Σ_{J∈T} P(J) P(w_i|J) ∏_{k=1}^{n} P(b_k|J),

wherein P(J) denotes the probability of randomly drawing one training image J from the image collection P; p(w_i/J) denotes the posterior probability that word w_i occurs in training image J; p(b_k/J) denotes the posterior probability that visual lemma b_k occurs in training image J; and

P(w_i|J) = (1 - α_J) · #(w_i, J)/|J| + α_J · #(w_i, T)/|T|    (1)

P(b_k|J) = (1 - β_J) · #(b_k, J)/|J| + β_J · #(b_k, T)/|T|    (2)

wherein α_J and β_J are smoothing parameters set empirically;
#(w_i, J) indicates whether annotation word w_i occurs in training image J: #(w_i, J) = 1 if so, otherwise #(w_i, J) = 0;
#(w_i, T) indicates whether annotation word w_i occurs in the training set T: #(w_i, T) = 1 if so, otherwise #(w_i, T) = 0;
#(b_k, J) indicates whether visual lemma b_k occurs in training image J: #(b_k, J) = 1 if so, otherwise #(b_k, J) = 0;
|J| denotes the total number of annotation words and visual lemmas in training image J; and |T| denotes the total number of annotation words and visual lemmas in the training set T.
3. The method according to claim 1, characterized in that the context-associated words are the annotation words in the training set T.
CN201410008553.1A 2014-01-08 2014-01-08 Automatic image marking method based on word correlation Active CN103714178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410008553.1A CN103714178B (en) 2014-01-08 2014-01-08 Automatic image marking method based on word correlation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410008553.1A CN103714178B (en) 2014-01-08 2014-01-08 Automatic image marking method based on word correlation

Publications (2)

Publication Number Publication Date
CN103714178A true CN103714178A (en) 2014-04-09
CN103714178B CN103714178B (en) 2017-01-25

Family

ID=50407153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410008553.1A Active CN103714178B (en) 2014-01-08 2014-01-08 Automatic image marking method based on word correlation

Country Status (1)

Country Link
CN (1) CN103714178B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794183A (en) * 2015-04-10 2015-07-22 浙江大学 Picture labeling method based on multiple views and multiple labels
CN108268875A (en) * 2016-12-30 2018-07-10 广东精点数据科技股份有限公司 A kind of image meaning automatic marking method and device based on data smoothing
CN108604902A (en) * 2016-02-08 2018-09-28 皇家飞利浦有限公司 Determine the device and method of cluster
WO2020073952A1 (en) * 2018-10-10 2020-04-16 腾讯科技(深圳)有限公司 Method and apparatus for establishing image set for image recognition, network device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1920820A (en) * 2006-09-14 2007-02-28 浙江大学 Image meaning automatic marking method based on marking significance sequence
US20090289942A1 (en) * 2008-05-20 2009-11-26 Timothee Bailloeul Image learning, automatic annotation, retrieval method, and device
CN101620615A (en) * 2009-08-04 2010-01-06 西南交通大学 Automatic image annotation and translation method based on decision tree learning
CN101685464A (en) * 2009-06-18 2010-03-31 浙江大学 Method for automatically labeling images based on community potential subject excavation
CN102298606A (en) * 2011-06-01 2011-12-28 清华大学 Random walking image automatic annotation method and device based on label graph model
CN102542067A (en) * 2012-01-06 2012-07-04 上海交通大学 Automatic image semantic annotation method based on scale learning and correlated label dissemination

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1920820A (en) * 2006-09-14 2007-02-28 浙江大学 Image meaning automatic marking method based on marking significance sequence
US20090289942A1 (en) * 2008-05-20 2009-11-26 Timothee Bailloeul Image learning, automatic annotation, retrieval method, and device
CN101685464A (en) * 2009-06-18 2010-03-31 浙江大学 Method for automatically labeling images based on community potential subject excavation
CN101620615A (en) * 2009-08-04 2010-01-06 西南交通大学 Automatic image annotation and translation method based on decision tree learning
CN102298606A (en) * 2011-06-01 2011-12-28 清华大学 Random walking image automatic annotation method and device based on label graph model
CN102542067A (en) * 2012-01-06 2012-07-04 上海交通大学 Automatic image semantic annotation method based on scale learning and correlated label dissemination

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BING-KUN BAO ET AL.: ""Hidden-Concept Driven Multilabel Image Annotation and Label Ranking"", 《IEEE TRANSACTIONS ON MULTIMEDIA》 *
MAHDIA BAKALEM ET AL.: ""Latent Semantic Analysis-based Image Auto Annotation"", 《IEEE CONF. ON MACHINE AND WEB INTELLIGENCE》 *
刘咏梅 (Liu Yongmei) et al.: "词间相关性的CMRM图像标注方法" ["CMRM image annotation method with inter-word correlation"], 《智能系统学报》 [CAAI Transactions on Intelligent Systems] *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794183A (en) * 2015-04-10 2015-07-22 浙江大学 Picture labeling method based on multiple views and multiple labels
CN108604902A (en) * 2016-02-08 2018-09-28 皇家飞利浦有限公司 Determine the device and method of cluster
CN108268875A (en) * 2016-12-30 2018-07-10 广东精点数据科技股份有限公司 A kind of image meaning automatic marking method and device based on data smoothing
CN108268875B (en) * 2016-12-30 2020-12-08 广东精点数据科技股份有限公司 Image semantic automatic labeling method and device based on data smoothing
WO2020073952A1 (en) * 2018-10-10 2020-04-16 腾讯科技(深圳)有限公司 Method and apparatus for establishing image set for image recognition, network device, and storage medium
US11853352B2 (en) 2018-10-10 2023-12-26 Tencent Technology (Shenzhen) Company Limited Method and apparatus for establishing image set for image recognition, network device, and storage medium

Also Published As

Publication number Publication date
CN103714178B (en) 2017-01-25

Similar Documents

Publication Publication Date Title
Gao et al. Database saliency for fast image retrieval
CN103617157B (en) Based on semantic Text similarity computing method
CN102902821B (en) The image high-level semantics mark of much-talked-about topic Network Based, search method and device
CN104391942B (en) Short essay eigen extended method based on semantic collection of illustrative plates
CN103678670B (en) Micro-blog hot word and hot topic mining system and method
CN104199857B (en) A kind of tax document hierarchy classification method based on multi-tag classification
CN103544697B (en) A kind of image partition method based on hypergraph analysis of spectrum
CN104881458B (en) A kind of mask method and device of Web page subject
CN105844424A (en) Product quality problem discovery and risk assessment method based on network comments
CN101950284A (en) Chinese word segmentation method and system
CN100573557C (en) A kind of SAR image partition method of short annealing based on MRF
CN103425757A (en) Cross-medial personage news searching method and system capable of fusing multi-mode information
Tran et al. Cluster-based similarity aggregation for ontology matching
CN104484380A (en) Personalized search method and personalized search device
CN103714178A (en) Automatic image marking method based on word correlation
CN102521368A (en) Similarity matrix iteration based cross-media semantic digesting and optimizing method
WO2013118435A1 (en) Semantic similarity level computation method, system and program
CN105653640A (en) Collaborative filtering recommendation method based on trust mechanism
CN102637199B (en) Image marking method based on semi-supervised subject modeling
CN110390022A (en) A kind of professional knowledge map construction method of automation
Zhao et al. Retrieving Social Flooding Images Based on Multimodal Information.
CN103927730A (en) Image noise reduction method based on Primal Sketch correction and matrix filling
CN107301426A (en) A kind of multi-tag clustering method of shoe sole print image
CN103064907A (en) System and method for topic meta search based on unsupervised entity relation extraction
Ma et al. PSVM: a preference-enhanced SVM model using preference data for classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant