CN103853792B

CN103853792B - Method and system for automatic semantic annotation of pictures

Info

Publication number: CN103853792B
Application number: CN201210521573.XA
Authority: CN
Inventors: 陆平; 董振江; 罗圣美; 刘丽霞; 陈清财; 刘胜宇; 户保田
Original assignee: ZTE Corp; Shenzhen Graduate School Harbin Institute of Technology
Current assignee: ZTE Corp; Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2012-12-07
Filing date: 2012-12-07
Publication date: 2018-06-15
Anticipated expiration: 2032-12-07
Also published as: CN103853792A

Abstract

The invention discloses a method and system for automatic semantic annotation of pictures, relating to automatic image semantic annotation technology. The disclosed system comprises: component one, which builds an index based on image n-grams from a picture data set that already carries picture annotations; component two, which preprocesses the picture to be annotated and extracts its image n-grams; component three, which retrieves, from the constructed index based on image n-grams, all semantic labels corresponding to the extracted image n-grams and calculates the probability value of each retrieved semantic label; component four, which updates the probability values of all semantic labels; and component five, which sorts all semantic labels by their updated probability values and outputs the one or more semantic labels whose probability values reach a set value. The invention also discloses a method for automatic semantic annotation of pictures. Applied to automatic semantic annotation of images, the technical scheme can mine rich image semantic annotations quickly and efficiently.

Description

Method and system for automatic semantic annotation of pictures

Technical field

The present invention relates to automatic image semantic annotation technology, and in particular to a method and system for automatic semantic annotation of pictures based on an n-gram picture index structure, mainly applicable to automatic image semantic annotation and image retrieval.

Background technology

Automatic image annotation (AIA) means having a computer automatically add to an image text labels that reflect the image content or the user's intent. It uses an image set that already carries text information reflecting picture semantics, or other resources helpful for mining deep semantic information, to learn the functional relationship between the deep semantic concept space of images and their low-level feature space. The learned model is then used to automatically annotate other images whose semantic content is unknown.

On the whole, current methods for automatic image semantic annotation rely mainly on machine learning. Although machine-learning-based picture annotation has been studied for many years and has made considerable progress, with many new picture representation models proposed and many multi-class annotation classifiers tried, the effectiveness and efficiency of picture semantic annotation remain unsatisfactory. There has been no breakthrough in narrowing the semantic gap, and the results are still far from practical use. In particular, when the training data is of poor quality, or when the data set and the category set are very large, the performance of most algorithms drops sharply. The main reason is that these models first require an annotated data set and then use complex machine learning algorithms to optimize the parameters of a large number of classifiers; the resulting per-category classifiers are finally used to mine the semantic labels of unknown images. This places high demands on the training set, and the ambiguity among different people's annotations of the same picture is also considerable. When the training set has many labels and the selected features are complex, the number of classifier parameters to be optimized becomes very large, so such methods cannot keep up with the explosive growth in the number of images in the Internet era.

Moreover, because of time complexity, most machine learning algorithms ignore the spatial information of objects in the image. Instead they extract as many low-level image features as possible, fuse the different features, and then train the corresponding classifiers. As a result, whenever the training set changes, the whole training process has to be repeated. Current machine learning algorithms are therefore mostly applied to problems where the training data set is small and the pictures to be annotated belong to a specific domain.

Summary of the invention

The technical problem to be solved by the invention is to provide a method and system for automatic semantic annotation of pictures that improve the efficiency and effectiveness of such annotation.

To solve the above technical problem, the invention discloses a system for automatic semantic annotation of pictures, comprising:

Component one, which builds an index based on image n-grams from a picture data set that already carries picture annotations;

Component two, which preprocesses the picture to be annotated and extracts its image n-grams;

Component three, which retrieves, from the constructed index based on image n-grams, all semantic labels corresponding to the extracted image n-grams and calculates the probability value of each retrieved semantic label;

Component four, which updates the probability values of all semantic labels;

Component five, which sorts all semantic labels by their updated probability values and outputs the one or more semantic labels whose probability values reach a set value.

Preferably, in the above system, the index based on image n-grams built by component one uses image n-grams as the index and uses image annotations and detailed image information as the indexed objects.

Preferably, in the above system, component three calculates the probability value of the semantic label corresponding to a retrieved image n-gram according to the following formula:

In the formula: p(sun | img, (1,1)): the probability that the label sun appears in the picture img to be annotated, given that (1,1) occurs, where sun is a semantic label corresponding to the image n-gram;

Lweight_sun: the probability weight of the label sun under (1,1) in the index;

N_(1,1): the number of times (1,1) occurs in the picture img to be annotated.

Preferably, in the above system, component four updating the probability values of all semantic labels means:

The probability value of each semantic label of the picture to be annotated is initialized to 0, and the probability values of the semantic labels are updated until all n-grams in the picture have been retrieved.

Preferably, in the above system, component four updates the probability value of a semantic label according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) (1 - p(sun | img, (1,1)))

In the formula: p(sun | img): the probability weight with which the picture img to be annotated is labeled sun, where sun is a semantic label corresponding to the image n-gram;

p(sun | img, (1,1)): the probability that the label sun appears in the picture img to be annotated, given that (1,1) occurs.

The invention also discloses a method for automatic semantic annotation of pictures, comprising:

Building an index based on image n-grams from a picture data set that already carries picture annotations;

Preprocessing the picture to be annotated, extracting its image n-grams, retrieving from the constructed index based on image n-grams all semantic labels corresponding to the extracted image n-grams, and calculating the probability value of each retrieved semantic label;

Updating the occurrence probability values of all semantic labels, sorting all semantic labels by their updated occurrence probability values, and outputting the one or more semantic labels whose occurrence probability values reach a set value.

Preferably, in the above method, the constructed index based on image n-grams uses image n-grams as the index and uses image annotations and detailed image information as the indexed objects.

Preferably, in the above method, the probability value of the semantic label corresponding to a retrieved image n-gram is calculated according to the following formula:

N_(1,1): the number of times (1,1) occurs in the picture img to be annotated.

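Preferably, in the above method, updating the probability values of all semantic labels means: the probability value of each semantic label of the picture to be annotated is initialized to 0, and the probability values of the semantic labels are updated until all n-grams in the picture have been retrieved.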

Preferably, in the above method, the probability value of a semantic label is updated according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) (1 - p(sun | img, (1,1)))

Applied to automatic semantic annotation of images, the technical scheme of the present application can mine rich image semantic annotations quickly and efficiently.

Description of the drawings

Fig. 1 is a flow chart of extracting "image lemmas" in the present embodiment;

Fig. 2 is an example of image segmentation and n-gram extraction in the present embodiment;

Fig. 3 is an example of the index structure built by the picture indexing method based on the n-gram model, with n-grams as the index and semantic labels and images as the indexed content;

Fig. 4 is a flow diagram of automatic semantic annotation of pictures based on the n-gram picture index in the present embodiment.

Specific embodiment

To make the objectives, technical solutions and advantages of the present invention clearer, the technical solution of the invention is described in further detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another arbitrarily.

Embodiment 1

Existing automatic image annotation techniques generally require extensive parameter optimization and a complex parameter learning process. Their annotation is inefficient and their results are not good enough, so they cannot adapt to the ever-growing scale of picture collections. To improve the efficiency and effectiveness of automatic semantic annotation of pictures, the applicant makes use of a picture index structure based on the n-gram model and provides a system for automatic semantic annotation of pictures based on an n-gram picture index structure.

The system for automatic semantic annotation of pictures based on an n-gram picture index structure includes at least the following basic components:

Component one, which builds an index based on image n-grams from a picture data set that already carries picture annotations;

Specifically, component one applies text-like segmentation to randomly selected pictures, learns and builds an "image dictionary" by k-means clustering, and then builds the n-gram-based picture index from the picture data set that already carries picture annotations.

The constructed index based on image n-grams uses image n-grams as the index and uses image annotations and detailed image information as the indexed objects. The reason is that, by applying a Bayesian probabilistic method to compute the weights in the sub-index nodes of such an index structure, the probabilistic relationship between semantic labels and image n-grams can be obtained.

The "image dictionary" is learned and built by applying text-like segmentation to randomly selected pictures and clustering with k-means;

Component two, which preprocesses the picture to be annotated and extracts its image n-grams;

Component two extracts the image n-grams according to the "image dictionary" learned by component one.

Component three, which retrieves, from the n-gram-based picture index built by component one, all semantic labels corresponding to the image n-grams extracted by component two, and calculates the probability values of all semantic labels corresponding to those image n-grams;

In the present embodiment, component three can calculate the probability value of the semantic label corresponding to a retrieved image n-gram according to the following formula:

In the formula: p(sun | img, (1,1)): the probability that the label sun appears in the picture img to be annotated, given that (1,1) occurs;

N_(1,1): the number of times (1,1) occurs in the picture img to be annotated.

Component four, which updates the probability values of all semantic labels calculated by component three;

Specifically, component four initializes the probability value of each semantic label of the picture to be annotated to 0 and updates the probability values of the semantic labels until all n-grams in the picture have been retrieved.

In addition, component four can update the probability value of a semantic label according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) (1 - p(sun | img, (1,1)))

In the formula: p(sun | img): the probability weight with which the picture img to be annotated is labeled sun;

Component five, which sorts all semantic labels by the probability values updated by component four and outputs the one or more semantic labels whose probability values reach a set value.

The process by which the above system automatically annotates picture semantics is described in detail below, taking bigrams as an example. The process is shown in Fig. 4.

When component one of the system builds the "image dictionary", it first learns "image lemmas" from a randomly selected picture data set and then builds the "image dictionary" from the learned "image lemmas". The steps for learning "image lemmas" are shown in Fig. 1 and are as follows:

Step 1: apply text-like segmentation to the selected pictures. The segmentation scheme can be designed to suit different application requirements. One example provided in the embodiment of the present invention divides the picture evenly into image blocks of size m*n (see Fig. 2). Each block is treated as a "word", by analogy with text processing, and each image is treated as the corresponding "article". The method of text-like segmentation is not limited to this example.

Step 2: extract the low-level image features of the equal-sized image blocks, including but not limited to image color features and image texture features, and fuse the multiple low-level features to obtain one feature vector that reflects the various low-level features of the block.

Step 3: cluster the feature vectors obtained for the image blocks using a clustering method, and choose a typical data point representing each cluster as an "image lemma". Each "image lemma" obtained is assigned a corresponding number (see Fig. 2). In one embodiment used by the present invention, k-means clustering is applied to the feature vectors of all image blocks with a predetermined number of clusters, and the centroids of the k-means result are taken as the "image lemmas".

Once the "image lemmas" have been learned, the "image dictionary" is constructed. To further represent the spatial features of images, n-grams are added to the "image dictionary": any "image lemma", together with the n-1 "image lemmas" adjacent to it, forms an "image lemma" sequence, and every such sequence is added to the "image dictionary" as one item, along with all "image lemma" sequences of length less than n. For example, if the extracted "image lemmas" are 1, 2 and 3 and n is chosen as 2, the resulting "image dictionary" contains the items (1), (2), (3), (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3). In an embodiment with K extracted "image lemmas" and n = 2, the "image dictionary" contains K*K+K grams.
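The enumeration in this example can be reproduced with a short Python sketch. The function name `build_image_dictionary` is hypothetical, and items are represented as tuples of lemma numbers.

```python
from itertools import product

def build_image_dictionary(lemma_ids, n):
    """Return every lemma sequence of length 1..n as a tuple."""
    return [seq
            for length in range(1, n + 1)
            for seq in product(lemma_ids, repeat=length)]

dictionary = build_image_dictionary([1, 2, 3], 2)
# Unigrams come first, then all ordered pairs: K + K*K items for n = 2.
```

For K = 3 and n = 2 this yields 3 unigrams and 9 bigrams, matching the K*K+K count stated in the text.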

The index structure built in this embodiment is an image index structure based on n-grams: image n-grams are the index, and image annotations and detailed image information are the indexed objects. As shown in Fig. 3, Mnode is a master index node. Each master index node holds one item of the "image dictionary", which may be a unigram or a bigram; (1,1) is an image bigram. The content indexed by a master index node has two parts:

1. The detailed information of all pictures that contain the "image dictionary" item held in the master index node. For Mnode, the pictures indexed under it are all pictures that contain the "image dictionary" item (1,1);

2. Sub-index nodes (e.g. Cnode1), each holding a text annotation label (sun) and its corresponding weight (Lweight_sun). For Cnode1, the sub-index node holds the text label sun, which occurs in the picture data, and the calculated weight Lweight_sun. Lweight_sun reflects the relationship between the "image dictionary" item in the master index node and the text label in the sub-index node. This embodiment calculates it as follows:

Where:

In the formula: N((1,1) | sun): the number of occurrences of (1,1) in all pictures carrying the text annotation label (i.e. the sun label);

N(n-gram | sun): the number of all n-grams in the indexed pictures carrying the text annotation label (i.e. the sun label);

Nimg(sun): the number of pictures carrying the text annotation label (the sun label);

Nimg(All): the number of all pictures in the data set;

N((1,1)): the number of all occurrences of (1,1) in the picture data set;

N(n-gram): the number of all n-grams in the picture data set.
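The weight formula itself is not shown in the text above, so the following Python sketch is only one plausible reading, stated here as an assumption: since the description refers to a Bayesian method, the six counts are combined as a Bayes posterior, P(sun | (1,1)) = P((1,1) | sun) · P(sun) / P((1,1)). The function name `label_weight` and its parameter names are hypothetical.

```python
def label_weight(n_item_given_label, n_ngrams_given_label,
                 n_images_with_label, n_images_total,
                 n_item_total, n_ngrams_total):
    """Assumed Bayes-style weight of a label under a dictionary item.

    Counts map to the symbols in the text:
      N((1,1)|sun), N(n-gram|sun), Nimg(sun), Nimg(All), N((1,1)), N(n-gram).
    """
    p_item_given_label = n_item_given_label / n_ngrams_given_label  # P((1,1) | sun)
    p_label = n_images_with_label / n_images_total                  # P(sun)
    p_item = n_item_total / n_ngrams_total                          # P((1,1))
    return p_item_given_label * p_label / p_item

weight = label_weight(10, 100, 20, 100, 50, 1000)  # 0.1 * 0.2 / 0.05
```

Because the prior is estimated from picture counts and the other two terms from n-gram counts, the result under this reading is not guaranteed to stay within [0, 1]; an implementation following this sketch might need to normalize or clamp it.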

Indexed under a sub-index node is the detailed information of all pictures that both contain the "image dictionary" item in the master index node (Mnode) and carry the text label in the sub-index node. For Cnode1, the pictures indexed under it contain the "image dictionary" item (1,1) and also carry the sun label.
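The two-level structure described above can be sketched as a pair of Python data classes. This is an illustrative layout only: the class names `MasterNode` and `ChildNode`, the function `build_index`, and the input format (picture id, set of dictionary items, set of labels) are assumptions, and the weight field is left at 0.0 to be filled in by whatever weight calculation is used.

```python
from dataclasses import dataclass, field

@dataclass
class ChildNode:
    """Sub-index node (Cnode): one text label under a dictionary item."""
    label: str
    weight: float = 0.0                                  # Lweight for (item, label)
    pictures: list = field(default_factory=list)        # pictures with item AND label

@dataclass
class MasterNode:
    """Master index node (Mnode): one "image dictionary" item."""
    item: tuple
    pictures: list = field(default_factory=list)        # all pictures containing item
    children: dict = field(default_factory=dict)        # label -> ChildNode

def build_index(annotated_pictures):
    """annotated_pictures: iterable of (picture_id, items, labels)."""
    index = {}
    for picture_id, items, labels in annotated_pictures:
        for item in items:
            node = index.setdefault(item, MasterNode(item))
            node.pictures.append(picture_id)
            for label in labels:
                child = node.children.setdefault(label, ChildNode(label))
                child.pictures.append(picture_id)
    return index

index = build_index([
    ("a.jpg", {(1, 1), (1, 2)}, {"sun"}),
    ("b.jpg", {(1, 1)}, {"sun", "sea"}),
])
```

Looking up `index[(1, 1)].children["sun"]` then gives the node corresponding to Cnode1 in the description.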

Component two relies on the "image dictionary" learned by component one. It first preprocesses the picture to be annotated; preprocessing includes, but is not limited to, normalizing the picture size and converting the picture format. It then applies text-like segmentation to the image and, for each resulting image block, calculates the distance to every "image lemma" and assigns the block to the nearest one. Finally, it extracts n-grams from the image according to the "image dictionary". This embodiment extracts n-grams in 8 directions; as shown in Fig. 3, the bigrams that can be extracted are: (1,2), (2,2), (2,2), (2,1), (2,4), (2,1), (2,3), (2,5).
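The nearest-lemma assignment and the 8-direction bigram extraction can be sketched as follows. This is a hedged illustration: the grid-of-lemma-numbers representation, the squared Euclidean distance, and the names `nearest_lemma`, `DIRECTIONS` and `extract_bigrams` are assumptions, and the sketch does not attempt to reproduce the specific bigram list of the figure.

```python
def nearest_lemma(feature, lemmas):
    """lemmas: dict mapping lemma number -> feature vector.
    Returns the number of the lemma closest to `feature`."""
    return min(lemmas,
               key=lambda i: sum((x - y) ** 2 for x, y in zip(feature, lemmas[i])))

# Offsets of the 8 neighbours of a grid cell (row, column).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def extract_bigrams(grid):
    """grid: 2D list of lemma numbers, one per image block.
    Pairs every block with each neighbour that exists in the 8 directions."""
    rows, cols = len(grid), len(grid[0])
    bigrams = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    bigrams.append((grid[r][c], grid[rr][cc]))
    return bigrams
```

An interior block contributes eight bigrams, while blocks on the border contribute fewer because some neighbours fall outside the picture.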

Component three operates on the n-gram-based image index structure that component one builds offline. For the n-grams extracted from the picture to be annotated, it retrieves from the constructed n-gram picture index all semantic labels corresponding to those n-grams and calculates the probability value of each of them. The calculation uses the following formula:

N_(1,1): the number of times (1,1) occurs in the picture img to be annotated.

Component four initializes the probability value of every label of the picture img to be annotated to 0 and, following the rules of probability, updates the probability value of each image semantic label by the following rule:

p(sun | img) = 1 - (1 - p(sun | img)) (1 - p(sun | img, (1,1)))

In the formula: p(sun | img): the probability weight with which the picture img to be annotated is labeled with the text annotation label (i.e. the sun label);

p(sun | img, (1,1)): the probability that the text annotation label (i.e. the sun label) appears in the picture img to be annotated, given that (1,1) occurs.

Component four keeps updating the probability values of the semantic labels by the above rule until all grams in the picture have been retrieved.
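The update rule can be sketched in a few lines of Python. The per-n-gram probabilities p(label | img, n-gram) are taken as given inputs here, and the function name `update_label_probabilities` is hypothetical.

```python
def update_label_probabilities(evidence):
    """evidence: iterable of (label, p) pairs, where p = p(label | img, n-gram)
    for one retrieved n-gram. Returns the accumulated p(label | img)."""
    probabilities = {}
    for label, p in evidence:
        current = probabilities.get(label, 0.0)   # every label starts at 0
        probabilities[label] = 1.0 - (1.0 - current) * (1.0 - p)
    return probabilities

result = update_label_probabilities([("sun", 0.5), ("sun", 0.5), ("sea", 0.2)])
```

Each update can only raise a label's probability, and the accumulated value stays below 1 as long as every individual p is below 1, so a label supported by several n-grams ends up ranked above one supported by a single n-gram of the same strength.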

The main function of component five is to determine the semantic labels of the picture to be annotated from the calculated probability values of the different labels.

One implementation example provided in this embodiment is as follows: all semantic labels are first sorted by probability value; the one or more semantic labels whose values exceed a certain weight (i.e. the set value) are then selected as the candidate semantic labels of the picture; and the candidate labels are output in order of probability value, giving the final semantic labels of the picture to be annotated.
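The sort-then-threshold selection described in this example can be sketched as follows. The function name `select_labels` is hypothetical, and treating "reaching the set value" as greater than or equal to the threshold is an assumption.

```python
def select_labels(probabilities, threshold):
    """Sort labels by probability (highest first) and keep those that
    reach the threshold."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, p) for label, p in ranked if p >= threshold]

selected = select_labels({"sun": 0.75, "sea": 0.2, "sky": 0.5}, 0.5)
```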

Embodiment 2

This embodiment describes a method for automatic semantic annotation of pictures, comprising the following steps:

Step 1: build an index based on image n-grams from a picture data set that already carries picture annotations;

It should be noted that, in this embodiment, the constructed index based on image n-grams uses image n-grams as the index and uses image annotations and detailed image information as the indexed objects.

Step 2: preprocess the picture to be annotated and extract its image n-grams;

Step 3: retrieve, from the constructed index based on image n-grams, all semantic labels corresponding to the extracted image n-grams, and calculate the probability value of each retrieved semantic label;

In this embodiment, the probability value of the semantic label corresponding to a retrieved image n-gram is calculated according to the following formula:

N_(1,1): the number of times (1,1) occurs in the picture img to be annotated.

Step 4: update the occurrence probability values of all semantic labels;

Specifically, the probability value of each semantic label of the picture to be annotated is initialized to 0, and the probability values of the semantic labels are updated until all n-grams in the picture have been retrieved.

In this embodiment, the probability value of a semantic label is updated according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) (1 - p(sun | img, (1,1)))

Step 5: sort all semantic labels by their updated occurrence probability values and output the one or more semantic labels whose occurrence probability values reach a set value.
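Steps 3 to 5 can be tied together in one short Python sketch. Several things here are assumptions: the index is simplified to a dictionary mapping each n-gram to its label weights, the function name `annotate` is hypothetical, and the formula for the per-n-gram probability is not shown in the text above, so it is passed in as a caller-supplied function `ngram_probability(weight, count)`. The stand-in used in the usage line treats each occurrence as independent evidence and is a placeholder, not the patent's formula.

```python
from collections import Counter

def annotate(ngrams, index, ngram_probability, threshold):
    """ngrams: n-grams extracted from the picture to be annotated.
    index: dict mapping n-gram -> {label: weight}.
    ngram_probability: caller-supplied function (weight, count) -> probability.
    Returns the labels reaching the threshold, highest probability first."""
    counts = Counter(ngrams)                      # occurrences of each n-gram
    probabilities = {}
    for ngram, count in counts.items():
        for label, weight in index.get(ngram, {}).items():   # step 3: retrieve
            p = ngram_probability(weight, count)
            current = probabilities.get(label, 0.0)          # step 4: update
            probabilities[label] = 1.0 - (1.0 - current) * (1.0 - p)
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, p in ranked if p >= threshold]  # step 5: select

toy_index = {(1, 1): {"sun": 0.5}, (1, 2): {"sun": 0.5, "sea": 0.1}}
placeholder = lambda weight, count: 1.0 - (1.0 - weight) ** count  # stand-in only
labels = annotate([(1, 1), (1, 2), (9, 9)], toy_index, placeholder, 0.3)
```

N-grams that are absent from the index, such as (9, 9) in the usage line, simply contribute no evidence.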

As the above embodiments show, the technical scheme of the present application, applied to automatic semantic annotation of images, can mine rich image semantic annotations quickly and efficiently.

A person of ordinary skill in the art will appreciate that all or some of the steps of the above method can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or some of the steps of the above embodiments can also be implemented using one or more integrated circuits. Accordingly, each module or unit in the above embodiments can be implemented in the form of hardware or in the form of a software function module. The present application is not limited to any particular combination of hardware and software.

The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A system for automatic semantic annotation of pictures, characterized in that the system comprises:

Component two, which preprocesses the picture to be annotated and extracts its image n-grams;

Component three, which retrieves, from the constructed index based on image n-grams, all semantic labels corresponding to the extracted image n-grams and calculates the probability value of each retrieved semantic label;

Component four, which updates the probability values of all semantic labels;

Component five, which sorts all semantic labels by their updated probability values and outputs the one or more semantic labels whose probability values reach a set value;

Specifically, component one applies text-like segmentation to randomly selected pictures, learns and builds an "image dictionary" by k-means clustering, and then builds the index based on image n-grams from the picture data set that already carries picture annotations;

When the "image dictionary" is built, "image lemmas" are first learned from a randomly selected picture data set, and the "image dictionary" is then built from the learned "image lemmas";

N-gram items are added to the "image dictionary": any "image lemma", together with the n-1 "image lemmas" adjacent to it, forms an "image lemma" sequence; every such sequence is added to the "image dictionary" as one item, along with all "image lemma" sequences of length less than n, thereby forming the "image dictionary".

2. The system as claimed in claim 1, characterized in that

the index based on image n-grams built by component one uses image n-grams as the index and uses image annotations and detailed image information as the indexed objects.

3. The system as claimed in claim 1 or 2, characterized in that component three calculates the probability value of the semantic label corresponding to a retrieved image n-gram according to the following formula:

In the formula: p(sun | img, (1,1)): the probability that the label sun appears in the picture img to be annotated, given that (1,1) occurs, where sun is a semantic label corresponding to the image n-gram;

N_(1,1): the number of times (1,1) occurs in the picture img to be annotated.

4. The system as claimed in claim 3, characterized in that component four updating the probability values of all semantic labels means:

The probability value of each semantic label of the picture to be annotated is initialized to 0, and the probability values of the semantic labels are updated until all n-grams in the picture have been retrieved.

5. The system as claimed in claim 4, characterized in that component four updates the probability value of a semantic label according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) (1 - p(sun | img, (1,1)))

In the formula: p(sun | img): the probability weight with which the picture img to be annotated is labeled sun, where sun is a semantic label corresponding to the image n-gram;

p(sun | img, (1,1)): the probability that the label sun appears in the picture img to be annotated, given that (1,1) occurs.

6. A method for automatic semantic annotation of pictures, characterized in that the method comprises:

Preprocessing the picture to be annotated, extracting its image n-grams, retrieving from the constructed index based on image n-grams all semantic labels corresponding to the extracted image n-grams, and calculating the probability value of each retrieved semantic label;

Updating the occurrence probability values of all semantic labels, sorting all semantic labels by their updated occurrence probability values, and outputting the one or more semantic labels whose occurrence probability values reach a set value;

Wherein building the index based on image n-grams from the picture data set that already carries picture annotations specifically comprises: applying text-like segmentation to randomly selected pictures, learning and building an "image dictionary" by k-means clustering, and then building the index based on image n-grams from the picture data set that already carries picture annotations;

7. The method as claimed in claim 6, characterized in that the constructed index based on image n-grams uses image n-grams as the index and uses image annotations and detailed image information as the indexed objects.

8. The method as claimed in claim 6 or 7, characterized in that the probability value of the semantic label corresponding to a retrieved image n-gram is calculated according to the following formula:

N_(1,1): the number of times (1,1) occurs in the picture img to be annotated.

9. The method as claimed in claim 8, characterized in that updating the probability values of all semantic labels means:

10. The method as claimed in claim 9, characterized in that the probability value of a semantic label is updated according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) (1 - p(sun | img, (1,1)))