CN102542067A - Automatic image semantic annotation method based on scale learning and correlated label dissemination - Google Patents

Automatic image semantic annotation method based on scale learning and correlated label dissemination

Info

Publication number
CN102542067A
CN102542067A · CN2012100023165A · CN201210002316A
Authority
CN
China
Prior art keywords
image
keyword
scale
learning
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012100023165A
Other languages
Chinese (zh)
Inventor
王斌
肖建力
刘允才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN2012100023165A priority Critical patent/CN102542067A/en
Publication of CN102542067A publication Critical patent/CN102542067A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an automatic image semantic annotation method based on scale learning and correlated label propagation, which comprises the following steps: first, global and local feature descriptors are extracted for each image after the image library is read; the feature descriptors are fed into a model based on a structured support vector machine to learn a distance metric between images, in fact a Mahalanobis distance; a model of the internal relations between keywords is built; the learned Mahalanobis distance is embedded into the label propagation model to obtain, for each keyword, a confidence score of belonging to the image to be annotated; a threshold is then set on the confidence scores, and keywords whose scores exceed the threshold are assigned to the image to be annotated, completing the annotation. The learning model based on the structured support vector machine effectively solves the problem of measuring similarity between images, the embedded correlated label propagation model fully mines the internal relations between keywords, and the accuracy of image annotation and image retrieval is effectively improved.

Description

Automatic image semantic annotation method based on scale learning and correlated label propagation
Technical field
The present invention relates to the technical field of image retrieval and automatic image annotation, and specifically to an automatic image semantic annotation method based on scale learning and correlated label propagation.
Background art
In recent years, with the rapid development of Internet technology, multimedia imaging, and storage devices, the number of images users encounter has grown explosively. How to quickly and effectively find the information a user needs in such massive data is an important research topic, and it is the core concern of the image retrieval field.
Early image retrieval was text-based (text-based image retrieval, TBIR): the content of each image was first described manually with a series of keywords, and the images were then indexed by those keywords, turning image retrieval into a text-matching problem. Such manual annotation is both time-consuming and labor-intensive, and cannot guarantee the objectivity of the keywords assigned to an image. In the 1980s, content-based image retrieval (CBIR) appeared. This technique first extracts low-level features from every image and then indexes the images in the database by these features; the user only needs to provide an example image, and the system retrieves images with similar features. However, the semantic gap greatly limits the application of CBIR in image retrieval. Automatic image annotation aims to establish the connection between image content and keywords, assigning keywords to unannotated images automatically. Because of the limitations of manual annotation, more and more research has turned to automatic image annotation to accomplish image retrieval tasks.
The purpose of image annotation is to let the system automatically assign to an image the keywords that describe its semantic content. Existing annotation techniques generally use machine learning to learn, from an annotated image set, a model relating the feature space to the keyword space, and use this model to guide the assignment of keywords to images to be annotated. Current annotation techniques fall into three classes: classification-based methods, probabilistic-model-based methods, and Internet-search-based methods. Classification-based methods treat annotation as a classification problem: a classifier is learned for each keyword from the annotated image library, and the classifiers are applied separately to the image to be annotated; a positive output indicates that the corresponding keyword can be used to annotate the image. These methods handle keywords one by one and do not consider the internal relations between keywords. Probabilistic-model-based methods are devoted to building a joint probability distribution between image features and keywords; given only the features of the image to be annotated, the probability of each keyword's occurrence can be derived. Their limitation is poor extensibility: whenever a new keyword is added, all models must be retrained. Internet-search-based methods accomplish annotation by mining the large-scale resources on the Internet, for example by using the text surrounding web images to improve the annotation results. These methods depend on the surrounding environment and contextual information of web images, and the accuracy of the results is determined by the reliability of that contextual information, which limits the stability of the annotation results to some extent.
Summary of the invention
The objective of the invention is to address the deficiencies of existing methods by proposing an automatic image semantic annotation method based on scale learning and correlated label propagation. It overcomes the defect of the prior art of measuring semantic similarity between images only with a predefined distance such as the Euclidean distance, and at the same time it fully mines the correlations between keywords, making the annotation results more accurate and effective.
To achieve this objective, the invention first extracts local and global features from every image to describe its content fully; a model based on a structured support vector machine (Mahalanobis distance learning) then learns the distance metric between images; the internal relations between the image keywords are modeled directly and label propagation is performed; the learned Mahalanobis distance is embedded into the established label propagation model to obtain the probability that each keyword belongs to the image to be annotated; finally, a threshold is set to obtain the keywords of the image.
The automatic image semantic annotation method of the invention based on scale learning and label propagation specifically comprises the following steps:
1. Extract the color-moment and wavelet-texture features of each image in the image library as its global feature descriptor, and extract scale-invariant feature transform (SIFT) features as its local feature descriptor.
2. Normalize the local and global features separately, and fuse them into one long feature vector as the image's feature descriptor.
3. Feed the feature vectors and the label information of the image library into a scale-learning algorithm based on a structured support vector machine, obtaining a Mahalanobis metric of the similarity between images.
4. Model the relations between the image keywords directly, fully mining the correlations between concepts.
5. Embed the learned metric function into the keyword relation model, finally obtaining the confidence score of each keyword for the image to be annotated.
6. Set a threshold, and assign the keywords whose confidence scores exceed the threshold to the image to be annotated.
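Step 6 amounts to a simple filter over the confidence score vector. The sketch below illustrates it with made-up scores and an illustrative threshold; none of the values or names come from the patent.

```python
def assign_keywords(scores, theta):
    """Keep exactly the keywords whose confidence score exceeds the
    threshold, as in step 6; returned in sorted order for readability."""
    return sorted(k for k, s in scores.items() if s > theta)

# Illustrative confidence scores for one image to be annotated.
scores = {"sky": 0.8, "grass": 0.4, "tiger": 0.11, "car": 0.10}
print(assign_keywords(scores, theta=0.15))  # ['grass', 'sky']
```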
Specifically, according to one aspect of the invention, an automatic image semantic annotation method based on scale learning and correlated label propagation is provided, comprising the following concrete steps:
Step 1: read the image library and extract the feature descriptor of every image;
Step 2: obtain the similarity metric between images through scale learning;
Step 3: model the internal relations between the keywords of the images;
Step 4: embed the similarity metric learned in said step 2 into the keyword modeling process;
Step 5: obtain the confidence scores of the keywords for the image to be annotated, set a threshold, and assign the keywords whose confidence scores exceed the threshold to the image to be annotated, completing label propagation.
Preferably, the feature descriptor in said step 1 fuses global and local features, wherein said global features comprise color moments and 62-dimensional Gabor-filter texture features, and said local features comprise SIFT features; these features are normalized separately and then concatenated into one long vector as the feature descriptor of the image.
Preferably, the similarity metric in said step 2 is realized through scale learning based on a structured support vector machine, which comprises the following steps:
Step 201: for every image x_i in the image library, generate a ranking r_i according to its keyword information, and build the set of image–ranking pairs {(x_i, r_i)}, i = 1, 2, …, N, as the training set; model the problem with the structured support vector machine;
Step 202: state the constraints of the optimization problem according to the following principle: for x_i, the discriminant score ⟨M, Ψ(x_i, r_i)⟩ of its correct ranking and the score ⟨M, Ψ(x_i, r)⟩ of any other incorrect ranking r satisfy the inequality
⟨M, Ψ(x_i, r_i)⟩ ≥ ⟨M, Ψ(x_i, r)⟩ + Δ(r_i, r) − σ,
where Δ(r_i, r) is the loss function, Ψ is the mapping from the input space to the output space, and r is any other ranking of the image library with respect to x_i; under the above constraints, solve for the M and σ such that min tr(M^T M) + μ·σ holds, where σ is a slack variable and μ is an adjustable parameter used to control the degree of slackness;
Step 203: solve the optimization model proposed in said step 202 iteratively with the cutting-plane method; the method maintains a working set Ω, a subset of the set of all constraints, whose initial value is the empty set; at each iteration the algorithm solves for M and σ under the current working set Ω, then goes to step 204;
Step 204: find the most violated constraint
r̂_i = argmax_{r ∈ Y_output} Δ(r_i, r) + ⟨M, Ψ(x_i, r)⟩, i = 1, 2, …, N,
and add it to the working set Ω;
Step 205: set a threshold; if the violation of r̂_i is less than the configured threshold, stop the algorithm; otherwise return to said step 203 and begin the next round of the loop.
Preferably, at the beginning of scale learning the keyword information of the image set is fully taken into account: an image-relevant ranking is generated for every image, and the set of image–ranking pairs is used as the training sample set.
Preferably, the keyword modeling and label propagation in said step 3 are accomplished by embedding the Mahalanobis metric obtained in said step 2, the detailed process being as follows:
Step 301: the label propagation process that takes the mutual relations between keywords into account is modeled as
Σ_{k=1}^{l} p_k^u δ(k ∈ Q) ≤ Σ_{i=1}^{N̄} d_M(x_u, x_i) δ(Q ∩ Q_i ≠ ∅),
where p_k^u is the confidence score that the k-th keyword belongs to the image to be annotated; δ(E) is an indicator function whose value is 1 if and only if the event E is true, and 0 otherwise;
Step 302: for a keyword set Q, give its 0–1 vector representation v_Q, whose i-th component is 1 if and only if keyword i is in the set Q, and 0 otherwise;
Step 303: generalize the indicator δ to a concave kernel function H, obtaining a series of submodular functions T, described as follows: T(Q) = Σ_{i=1}^{N̄} d_M(x_u, x_i) H(⟨v_Q, v_{Q_i}⟩), where H(x) = 1 − 2^{−αx} is a kernel function;
Step 304: for an image x_u to be annotated, the confidence scores of the keywords are obtained by the following greedy algorithm: p_k^u = T(C_k) − T(C_{k−1}), k = 1, 2, …, l, where C_k is a keyword set;
Step 305: obtain the confidence score vector p^u = (p_1^u, …, p_l^u) of the keywords of x_u;
Step 306: set a threshold θ_0; when p_k^u > θ_0, propagate the k-th keyword to x_u.
Preferably, after scale learning finishes, only the top N̄ images of the resulting image ranking are fed into the keyword modeling and label propagation process.
Compared with existing methods, the advantages of the invention are as follows:
(1) Both local and global features are extracted, portraying the visual content of images more completely and helping to improve annotation accuracy.
(2) The similarity metric between images is learned from the image library and is therefore sample-dependent, overcoming the limitation of classical methods that measure similarity between images with a predefined distance (such as the Euclidean distance).
(3) The keyword information of the image library is incorporated into the similarity-metric learning process, so the learned metric can better measure the similarity between images at the semantic level.
(4) The internal relations between image keywords are modeled directly, and the learned metric function is embedded into this model, solving the two problems of similarity-metric learning and keyword-relation mining simultaneously within the same framework.
Description of drawings
Fig. 1 is the flow chart of automatic image semantic annotation based on scale learning and correlated label propagation.
Fig. 2 shows example images from the image library used in the embodiment.
Fig. 3 shows annotation results of the invention on the image library used in the embodiment.
Fig. 4 compares the annotation results of the invention with classical models on some of the keywords.
Fig. 5 shows retrieval results of the invention on the image library used in the embodiment.
Embodiment
The technical scheme of the invention is elaborated below in conjunction with a specific embodiment; the flow chart of the procedure is shown in Fig. 1.
The embodiment takes image annotation on the Corel5K database as an example. Example images from this database are shown in Fig. 2; it contains 374 keywords in total, and every image has been assigned 1–5 keywords.
The implementation of the invention is described in more detail below in conjunction with the accompanying drawings, as follows:
Step 1: read the image library and extract the feature descriptor of every image.
Three kinds of features are extracted from the images in the library: (1) color-moment features; (2) 62-dimensional Gabor-filter texture features; (3) SIFT features. The first two kinds portray the global characteristics of an image, and the last kind its local characteristics. The three kinds of features are normalized separately and fused into one long feature vector.
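The normalize-then-concatenate fusion above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the feature extractors are stood in for by random vectors, and the L2 normalization and the 9- and 128-dimensional parts are assumptions (the text fixes only the 62-dimensional Gabor texture component).

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a feature vector to unit L2 norm so no modality dominates."""
    n = np.linalg.norm(v)
    return v / (n + eps)

def fuse_descriptors(color_moments, gabor_texture, sift_hist):
    """Normalize each feature group separately, then concatenate them
    into one long descriptor, following step 1 of the method."""
    parts = [l2_normalize(np.asarray(p, dtype=float))
             for p in (color_moments, gabor_texture, sift_hist)]
    return np.concatenate(parts)

# Stand-in features: 9-dim color moments, 62-dim Gabor texture,
# and a 128-bin SIFT histogram (9 and 128 are illustrative choices).
rng = np.random.default_rng(0)
desc = fuse_descriptors(rng.random(9), rng.random(62), rng.random(128))
print(desc.shape)  # (199,)
```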
Step 2: scale learning based on the structured support vector machine.
Scale learning is regarded as an information retrieval problem with category information and is modeled and solved with a structured support vector machine. Under the metric learned by this algorithm, the resulting image ranking is optimal. Let X_input denote the image library and Y_output the set of rankings of all images in the library. For any two images x_i and x_j in the library, d_M(x_i, x_j) is the Mahalanobis distance between them (i.e., their degree of similarity), where M is the metric function. For any image, r is its optimal image ranking, r̄ ∈ Y_output is any other ranking, and Δ(r, r̄) is the loss function between different image rankings.
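A distance of the kind d_M can be written in the standard Mahalanobis form. The sketch below is an assumption-laden illustration: the patent states only that M is the metric to be learned, so the common parametrization d_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j) with a positive semidefinite M is used here, and the example matrices are hand-picked.

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance (xi - xj)^T M (xi - xj).
    With M = I this reduces to the squared Euclidean distance,
    the predefined metric the method aims to improve on."""
    d = np.asarray(xi, float) - np.asarray(xj, float)
    return float(d @ M @ d)

xi = np.array([1.0, 0.0])
xj = np.array([0.0, 1.0])

# Identity metric: plain squared Euclidean distance.
print(mahalanobis_sq(xi, xj, np.eye(2)))  # 2.0

# A hand-picked positive semidefinite metric reweights the axes,
# standing in for a learned M.
M = np.array([[2.0, 0.0], [0.0, 0.5]])
print(mahalanobis_sq(xi, xj, M))          # 2.5
```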
The steps of the scale-learning algorithm based on the structured support vector machine are as follows:
Step 201: for every image x_i in the image library, generate its optimal image ranking r_i according to its keyword information, and build the set of image–ranking pairs {(x_i, r_i)}, i = 1, 2, …, N, as the training set, where N is the total number of images; model the problem with the structured support vector machine;
Step 202: state the constraints of the optimization problem according to the following principle: for x_i, the discriminant score ⟨M, Ψ(x_i, r_i)⟩ of its correct ranking and the score ⟨M, Ψ(x_i, r)⟩ of any other incorrect ranking r satisfy the inequality
⟨M, Ψ(x_i, r_i)⟩ ≥ ⟨M, Ψ(x_i, r)⟩ + Δ(r_i, r) − σ,
where Δ(r_i, r) is the loss function, Ψ is the mapping from the input space to the output space, and r is any other ranking of the image library with respect to x_i. Under the above constraints, solve for the M and σ such that min tr(M^T M) + μ·σ holds. Here σ is a slack variable, and μ is an adjustable parameter used to control the degree of slackness;
Step 203: solve the optimization model proposed in said step 202 iteratively with the cutting-plane method; the method maintains a working set Ω (the set of retained constraints, initialized to the empty set); at each iteration, solve for M and σ under the current working set Ω with the structured support vector machine algorithm, then go to step 204;
Step 204: find the most violated constraint and add it to the working set Ω:
r̂_i = argmax_{r ∈ Y_output} Δ(r_i, r) + ⟨M, Ψ(x_i, r)⟩, i = 1, 2, …, N;
Step 205: set a threshold; if the violation of r̂_i is less than the configured threshold, stop the algorithm; otherwise return to said step 203.
The metric function learned by the above algorithm makes the distance-based image ranking optimal. For an image to be annotated, only the top N̄ images of the ranking participate in the subsequent modeling, which removes redundant information in the image library to some extent and saves computing cost.
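The cutting-plane loop of steps 203–205 can be sketched on a toy problem. Everything below is an illustrative stand-in, not the patent's solver: three candidate rankings are represented only by hand-picked joint feature vectors `psi` and losses, the parameter is a vector `w` rather than the matrix M, and the inner optimization of step 203 is replaced by plain subgradient descent on the hinge losses of the constraints in the working set.

```python
import numpy as np

# Toy structured problem: one training image, three candidate rankings,
# each reduced to its joint feature vector and its loss vs. the truth.
psi = {"correct": np.array([1.0, 0.0]),
       "wrong_a": np.array([0.0, 1.0]),
       "wrong_b": np.array([0.3, 0.3])}
loss = {"correct": 0.0, "wrong_a": 1.0, "wrong_b": 0.5}

def solve_working_set(work, lam=0.01, steps=500, lr=0.05):
    """Stand-in for the inner solve of step 203: subgradient descent on
    the hinge losses of the constraints currently in the working set."""
    w = np.zeros(2)
    for _ in range(steps):
        g = lam * w
        for r in work:
            if loss[r] - w @ (psi["correct"] - psi[r]) > 0:
                g -= psi["correct"] - psi[r]
        w -= lr * g
    return w

work, w, eps = [], np.zeros(2), 1e-3
for _ in range(20):
    # Step 204: most violated constraint under the current model.
    r_hat = max(psi, key=lambda r: loss[r] + w @ psi[r])
    violation = loss[r_hat] + w @ psi[r_hat] - w @ psi["correct"]
    if violation < eps:            # step 205: stop when nearly feasible
        break
    work.append(r_hat)             # grow the working set...
    w = solve_working_set(work)    # ...and re-solve under it (step 203)

scores = {r: w @ psi[r] for r in psi}
print(max(scores, key=scores.get))  # prints "correct"
```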
Step 3: model the internal relations between the keywords.
The modeling follows this principle: the confidence score obtained by passing the elements of a keyword set to an image one by one must not exceed the confidence score obtained by passing any subset of the set to the image as a whole. Under the guidance of this principle, the concrete modeling is as follows.
Under the metric M, passing the keywords to the image one by one is modeled as Σ_{k=1}^{l} p_k^u δ(k ∈ Q), where p_k^u is the confidence score of this process, l is the number of keywords, and Q_i is the keyword set of image x_i; δ(E) is an indicator function whose value is 1 if and only if the event E is true. Passing a subset Q of the keyword set to the image as a whole can be modeled as Σ_{i=1}^{N̄} d_M(x_u, x_i) δ(Q ∩ Q_i ≠ ∅), whose confidence score is denoted q_u(Q). Specifically:
Step 301: according to the modeling principle, the label propagation process that takes the mutual relations between keywords into account is modeled as
Σ_{k=1}^{l} p_k^u δ(k ∈ Q) ≤ Σ_{i=1}^{N̄} d_M(x_u, x_i) δ(Q ∩ Q_i ≠ ∅).
The concrete algorithm for solving this model is as follows:
Step 302: for a keyword set Q, give its 0–1 vector representation v_Q, whose i-th component is 1 if and only if keyword i is in the set Q, and 0 otherwise;
Step 303: generalize the indicator δ to a concave kernel function H, obtaining a series of submodular functions T, described as follows: T(Q) = Σ_{i=1}^{N̄} d_M(x_u, x_i) H(⟨v_Q, v_{Q_i}⟩), where H(x) = 1 − 2^{−αx} is a kernel function;
Step 304: for an image x_u to be annotated, the confidence scores of the keywords are obtained by the following greedy algorithm: p_k^u = T(C_k) − T(C_{k−1}), k = 1, 2, …, l, where C_k is a keyword set;
Step 305: obtain the confidence score vector p^u = (p_1^u, …, p_l^u) of the keywords of x_u;
Step 306: set a threshold θ_0; when p_k^u > θ_0, propagate the k-th keyword to x_u.
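Steps 301–306 can be sketched as follows, under loudly stated assumptions: the neighbor weights, keyword sets, chain order of the C_k (taken here as a fixed vocabulary order, which the text does not pin down), and the threshold are all illustrative, and T is implemented as the weighted coverage Σ_i w_i · H(|Q ∩ Q_i|), a form consistent with step 303 but reconstructed, since the original formula is an equation-image placeholder. Only the kernel H(x) = 1 − 2^(−αx) and the marginal-gain rule p_k = T(C_k) − T(C_{k−1}) come directly from the text.

```python
def H(x, alpha=1.0):
    """Concave kernel H(x) = 1 - 2^(-alpha * x) from step 303."""
    return 1.0 - 2.0 ** (-alpha * x)

def T(Q, neighbors):
    """Submodular coverage score of keyword set Q against the weighted
    keyword sets of the retained neighbor images (reconstructed form)."""
    return sum(w * H(len(Q & Qi)) for w, Qi in neighbors)

def confidence_scores(keywords, neighbors):
    """Step 304: p_k = T(C_k) - T(C_{k-1}) along the chain
    C_k = {keywords[0..k-1]}, with C_0 the empty set."""
    scores, C, prev = {}, set(), 0.0
    for k in keywords:
        C.add(k)
        cur = T(C, neighbors)
        scores[k] = cur - prev
        prev = cur
    return scores

# Illustrative data: a tiny vocabulary, and neighbors of x_u given as
# (weight derived from the learned metric, keyword set) pairs.
vocab = ["sky", "grass", "tiger", "car"]
neighbors = [(0.9, {"sky", "tiger", "grass"}),
             (0.7, {"sky", "grass"}),
             (0.2, {"car"})]
p = confidence_scores(vocab, neighbors)
theta0 = 0.15  # step 306 threshold (illustrative)
print(sorted(k for k, s in p.items() if s > theta0))  # ['grass', 'sky']
```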
Through the above steps, the keyword set with the highest confidence scores for the test image is obtained. Fig. 3 gives the results of automatic image annotation by the invention on the embodiment image library. It can be seen from the figure that the method of the invention accomplishes the annotation task effectively. Although some keywords given by the invention in the embodiment (shown in italics) are not included in the manual annotation results, these keywords are meaningful and appropriate descriptions of the semantic content of the images. The invention can mine a richer keyword set because it considers the mutual relations between keywords and models them directly.
To verify the effectiveness of the invention, precision and recall were computed on the embodiment and compared with several classical image annotation algorithms: the machine translation model (MT), the cross-media relevance model (CMRM), and the co-occurrence model (COM). Fig. 4 compares the invention with the other three models on some of the keywords. It can be seen from the figure that the annotation precision and recall of the invention on most keywords are better than those of the other three models. In addition, the annotation abilities of the different models were analyzed quantitatively; Table 1 compares the annotation results of the invention with the above models on the embodiment Corel5K database.
Table 1
Here, N+ is the number of keywords with non-zero recall. Relative to the classical models, the invention achieves improvements of various degrees in both precision and recall. Fig. 5 gives an example of applying the invention to image retrieval. From top to bottom, the query terms are: building, bear, flower, railway. For each term, the five images with the highest scores are returned.

Claims (6)

1. An automatic image semantic annotation method based on scale learning and correlated label propagation, characterized by comprising the following concrete steps:
Step 1: read the image library and extract the feature descriptor of every image;
Step 2: obtain the similarity metric between images through scale learning;
Step 3: model the internal relations between the keywords of the images;
Step 4: embed the similarity metric learned in said step 2 into the keyword modeling process;
Step 5: obtain the confidence scores of the keywords for the image to be annotated, set a threshold, and assign the keywords whose confidence scores exceed the threshold to the image to be annotated, completing label propagation.
2. The automatic image semantic annotation method based on scale learning and correlated label propagation according to claim 1, characterized in that the feature descriptor in said step 1 fuses global and local features, wherein said global features comprise color moments and 62-dimensional Gabor-filter texture features, and said local features comprise SIFT features; these features are normalized separately and then concatenated into one long vector as the feature descriptor of the image.
3. The automatic image semantic annotation method based on scale learning and correlated label propagation according to claim 1 or 2, characterized in that the similarity metric in said step 2 is realized through scale learning based on a structured support vector machine, which comprises the following steps:
Step 201: for every image x_i in the image library, generate a ranking r_i according to its keyword information, and build the set of image–ranking pairs {(x_i, r_i)}, i = 1, 2, …, N, as the training set; model the problem with the structured support vector machine;
Step 202: state the constraints of the optimization problem according to the following principle: for x_i, the discriminant score ⟨M, Ψ(x_i, r_i)⟩ of its correct sample ranking and the score ⟨M, Ψ(x_i, r)⟩ of any other incorrect ranking satisfy the inequality
⟨M, Ψ(x_i, r_i)⟩ ≥ ⟨M, Ψ(x_i, r)⟩ + Δ(r_i, r) − σ,
where Δ(r_i, r) is the loss function, Ψ is the mapping from the input space to the output space, and r is any other ranking of the image library with respect to x_i; under the above constraints, solve for the M and σ such that min tr(M^T M) + μ·σ holds, where σ is a slack variable and μ is an adjustable parameter used to control the degree of slackness;
Step 203: solve the optimization model proposed in said step 202 iteratively with the cutting-plane method; the method maintains a working set Ω, where Ω is a subset of the set of all constraints and its initial value is the empty set; at each iteration the algorithm solves for M and σ under the current working set Ω, then goes to step 204;
Step 204: find the most violated constraint and add it to the working set Ω:
r̂_i = argmax_{r ∈ Y_output} Δ(r_i, r) + ⟨M, Ψ(x_i, r)⟩, i = 1, 2, …, N;
Step 205: set a threshold; if the violation of r̂_i is less than the configured threshold, stop the algorithm; otherwise return to said step 203 and begin the next round of the loop.
4. The automatic image semantic annotation method based on scale learning and correlated label propagation according to any one of claims 1 to 3, characterized in that at the beginning of scale learning the keyword information of the image set is fully taken into account: an image-relevant ranking is generated for every image, and the set of image–ranking pairs is used as the training sample set.
5. The automatic image semantic annotation method based on scale learning and correlated label propagation according to any one of claims 1 to 4, characterized in that the keyword modeling and label propagation in said step 3 are accomplished by embedding the Mahalanobis metric obtained in said step 2, the detailed process being as follows:
Step 301: the label propagation process that takes the mutual relations between keywords into account is modeled as
Σ_{k=1}^{l} p_k^u δ(k ∈ Q) ≤ Σ_{i=1}^{N̄} d_M(x_u, x_i) δ(Q ∩ Q_i ≠ ∅),
where p_k^u is the confidence score that the k-th keyword belongs to the image to be annotated; δ(E) is an indicator function whose value is 1 if and only if the event E is true, and 0 otherwise;
Step 302: for a keyword set Q, give its 0–1 vector representation v_Q, whose i-th component is 1 if and only if keyword i is in the set Q, and 0 otherwise;
Step 303: generalize the indicator δ to a concave kernel function H, obtaining a series of submodular functions T, described as follows: T(Q) = Σ_{i=1}^{N̄} d_M(x_u, x_i) H(⟨v_Q, v_{Q_i}⟩), where H(x) = 1 − 2^{−αx} is a kernel function;
Step 304: for an image x_u to be annotated, the confidence scores of the keywords are obtained by the following greedy algorithm: p_k^u = T(C_k) − T(C_{k−1}), k = 1, 2, …, l, where C_k is a keyword set;
Step 305: obtain the confidence score vector p^u = (p_1^u, …, p_l^u) of the keywords of x_u;
Step 306: set a threshold θ_0; when p_k^u > θ_0, propagate the k-th keyword to x_u.
6. The automatic image semantic annotation method based on scale learning and correlated label propagation according to any one of claims 1 to 5, characterized in that after scale learning finishes, only the top N̄ images of the resulting image ranking are fed into the keyword modeling and label propagation process.
CN2012100023165A 2012-01-06 2012-01-06 Automatic image semantic annotation method based on scale learning and correlated label dissemination Pending CN102542067A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100023165A CN102542067A (en) 2012-01-06 2012-01-06 Automatic image semantic annotation method based on scale learning and correlated label dissemination

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012100023165A CN102542067A (en) 2012-01-06 2012-01-06 Automatic image semantic annotation method based on scale learning and correlated label dissemination

Publications (1)

Publication Number Publication Date
CN102542067A true CN102542067A (en) 2012-07-04

Family

ID=46348946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012100023165A Pending CN102542067A (en) 2012-01-06 2012-01-06 Automatic image semantic annotation method based on scale learning and correlated label dissemination

Country Status (1)

Country Link
CN (1) CN102542067A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345481A (en) * 2013-06-19 2013-10-09 新疆大学 Method for labeling Uyghur image files
CN103440352A (en) * 2013-09-24 2013-12-11 中国科学院自动化研究所 Method and device for analyzing correlation among objects based on deep learning
CN103473275A (en) * 2013-08-23 2013-12-25 中山大学 Automatic image labeling method and automatic image labeling system by means of multi-feature fusion
CN103631889A (en) * 2013-11-15 2014-03-12 北京奇虎科技有限公司 Image recognizing method and device
CN103714178A (en) * 2014-01-08 2014-04-09 北京京东尚科信息技术有限公司 Automatic image marking method based on word correlation
CN104036021A (en) * 2014-06-26 2014-09-10 广西师范大学 Method for semantically annotating images on basis of hybrid generative and discriminative learning models
WO2015070678A1 (en) * 2013-11-15 2015-05-21 北京奇虎科技有限公司 Image recognition method, and method and device for mining main body information about image
CN105808752A (en) * 2016-03-10 2016-07-27 大连理工大学 CCA and 2PKNN based automatic image annotation method
CN106682060A (en) * 2015-11-11 2017-05-17 奥多比公司 Structured Knowledge Modeling, Extraction and Localization from Images
CN107316042A (en) * 2017-07-18 2017-11-03 盛世贞观(北京)科技有限公司 A kind of pictorial image search method and device
CN107423697A (en) * 2017-07-13 2017-12-01 西安电子科技大学 Activity recognition method based on non-linear fusion depth 3D convolution description
CN107423318A (en) * 2017-03-27 2017-12-01 北京珠穆朗玛移动通信有限公司 A kind of method and mobile terminal of picture mark
CN108182443A (en) * 2016-12-08 2018-06-19 广东精点数据科技股份有限公司 A kind of image automatic annotation method and device based on decision tree
CN108304847A (en) * 2017-11-30 2018-07-20 腾讯科技(深圳)有限公司 Image classification method and device, personalized recommendation method and device
CN108665055A (en) * 2017-03-28 2018-10-16 上海荆虹电子科技有限公司 A kind of figure says generation method and device
CN109784359A (en) * 2018-11-27 2019-05-21 北京邮电大学 Image generating method, device, equipment and readable storage medium storing program for executing
WO2020191706A1 (en) * 2019-03-28 2020-10-01 香港纺织及成衣研发中心有限公司 Active learning automatic image annotation system and method
CN111753861A (en) * 2019-03-28 2020-10-09 香港纺织及成衣研发中心有限公司 Automatic image annotation system and method for active learning
CN113139378A (en) * 2021-03-18 2021-07-20 杭州电子科技大学 Image description method based on visual embedding and condition normalization
CN113408633A (en) * 2021-06-29 2021-09-17 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BIN WANG et al., "Integrating Distance Metric Learning into Label Propagation Model for Multi-Label Image Annotation", 2011 18th IEEE International Conference on Image Processing *
ZHU Songhao et al., "Research on Automatic Image Annotation Based on Semantic Similarity", Harmonious Human-Machine Environment *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345481B (en) * 2013-06-19 2016-08-24 新疆大学 A kind of mask method of Uighur image file
CN103345481A (en) * 2013-06-19 2013-10-09 新疆大学 Method for labeling Uyghur image files
CN103473275A (en) * 2013-08-23 2013-12-25 中山大学 Automatic image labeling method and automatic image labeling system by means of multi-feature fusion
CN103440352A (en) * 2013-09-24 2013-12-11 中国科学院自动化研究所 Method and device for analyzing correlation among objects based on deep learning
CN103440352B (en) * 2013-09-24 2017-04-19 中国科学院自动化研究所 Method and device for analyzing correlation among objects based on deep learning
WO2015070678A1 (en) * 2013-11-15 2015-05-21 北京奇虎科技有限公司 Image recognition method, and method and device for mining main body information about image
CN103631889B (en) * 2013-11-15 2017-04-12 北京奇虎科技有限公司 Image recognizing method and device
CN103631889A (en) * 2013-11-15 2014-03-12 北京奇虎科技有限公司 Image recognizing method and device
CN103714178B (en) * 2014-01-08 2017-01-25 北京京东尚科信息技术有限公司 Automatic image marking method based on word correlation
CN103714178A (en) * 2014-01-08 2014-04-09 北京京东尚科信息技术有限公司 Automatic image marking method based on word correlation
CN104036021A (en) * 2014-06-26 2014-09-10 广西师范大学 Method for semantically annotating images on basis of hybrid generative and discriminative learning models
CN106682060A (en) * 2015-11-11 2017-05-17 奥多比公司 Structured Knowledge Modeling, Extraction and Localization from Images
CN105808752B (en) * 2016-03-10 2018-04-10 大连理工大学 A kind of automatic image marking method based on CCA and 2PKNN
CN105808752A (en) * 2016-03-10 2016-07-27 大连理工大学 CCA and 2PKNN based automatic image annotation method
CN108182443B (en) * 2016-12-08 2020-08-07 广东精点数据科技股份有限公司 Automatic image labeling method and device based on decision tree
CN108182443A (en) * 2016-12-08 2018-06-19 广东精点数据科技股份有限公司 A kind of image automatic annotation method and device based on decision tree
CN107423318A (en) * 2017-03-27 2017-12-01 北京珠穆朗玛移动通信有限公司 A kind of method and mobile terminal of picture mark
CN108665055A (en) * 2017-03-28 2018-10-16 上海荆虹电子科技有限公司 A kind of figure says generation method and device
CN108665055B (en) * 2017-03-28 2020-10-23 深圳荆虹科技有限公司 Method and device for generating graphic description
CN107423697A (en) * 2017-07-13 2017-12-01 西安电子科技大学 Activity recognition method based on non-linear fusion depth 3D convolution description
CN107423697B (en) * 2017-07-13 2020-09-08 西安电子科技大学 Behavior identification method based on nonlinear fusion depth 3D convolution descriptor
CN107316042A (en) * 2017-07-18 2017-11-03 盛世贞观(北京)科技有限公司 A kind of pictorial image search method and device
CN108304847B (en) * 2017-11-30 2021-09-28 腾讯科技(深圳)有限公司 Image classification method and device and personalized recommendation method and device
CN108304847A (en) * 2017-11-30 2018-07-20 腾讯科技(深圳)有限公司 Image classification method and device, personalized recommendation method and device
US11238315B2 (en) 2017-11-30 2022-02-01 Tencent Technology (Shenzhen) Company Limited Image classification method, personalized recommendation method, computer device and storage medium
CN109784359A (en) * 2018-11-27 2019-05-21 北京邮电大学 Image generating method, device, equipment and readable storage medium storing program for executing
WO2020191706A1 (en) * 2019-03-28 2020-10-01 香港纺织及成衣研发中心有限公司 Active learning automatic image annotation system and method
CN111753861A (en) * 2019-03-28 2020-10-09 香港纺织及成衣研发中心有限公司 Automatic image annotation system and method for active learning
CN111753861B (en) * 2019-03-28 2024-04-30 香港纺织及成衣研发中心有限公司 Automatic image labeling system and method for active learning
CN113139378A (en) * 2021-03-18 2021-07-20 杭州电子科技大学 Image description method based on visual embedding and condition normalization
CN113408633A (en) * 2021-06-29 2021-09-17 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information

Similar Documents

Publication Publication Date Title
CN102542067A (en) Automatic image semantic annotation method based on scale learning and correlated label dissemination
US7788265B2 (en) Taxonomy-based object classification
CN110502621A (en) Answering method, question and answer system, computer equipment and storage medium
CN111291210B (en) Image material library generation method, image material recommendation method and related devices
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
Unar et al. Detected text‐based image retrieval approach for textual images
US20090144277A1 (en) Electronic table of contents entry classification and labeling scheme
CN108959559B (en) Question and answer pair generation method and device
CN104376406A (en) Enterprise innovation resource management and analysis system and method based on big data
CN104156433B (en) Image retrieval method based on semantic mapping space construction
CN103425687A (en) Retrieval method and system based on queries
US11861925B2 (en) Methods and systems of field detection in a document
CN103488724A (en) Book-oriented reading field knowledge map construction method
CN104517112A (en) Table recognition method and system
CN1936892A (en) Image content semanteme marking method
CN109272440B (en) Thumbnail generation method and system combining text and image content
CN116450834A (en) Archive knowledge graph construction method based on multi-mode semantic features
CN111737477A (en) Intellectual property big data-based intelligence investigation method, system and storage medium
Lu et al. Web multimedia object classification using cross-domain correlation knowledge
Pengcheng et al. Fast Chinese calligraphic character recognition with large-scale data
CN115661846A (en) Data processing method and device, electronic equipment and storage medium
Chang et al. Enhancing POI search on maps via online address extraction and associated information segmentation
US20230138491A1 (en) Continuous learning for document processing and analysis
US20230134218A1 (en) Continuous learning for document processing and analysis
CN115186240A (en) Social network user alignment method, device and medium based on relevance information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120704