CN103853792A

CN103853792A - Automatic image semantic annotation method and system

Info

Publication number: CN103853792A
Application number: CN201210521573.XA
Authority: CN
Inventors: 陆平; 董振江; 罗圣美; 刘丽霞; 陈清财; 刘胜宇; 户保田
Original assignee: ZTE Corp; Shenzhen Graduate School Harbin Institute of Technology
Current assignee: ZTE Corp; Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2012-12-07
Filing date: 2012-12-07
Publication date: 2014-06-11
Anticipated expiration: 2032-12-07
Also published as: CN103853792B

Abstract

The invention discloses an automatic image semantic annotation system and relates to the field of automatic image semantic annotation. The system disclosed by the invention comprises a part I, a part II, a part III, a part IV and a part V, wherein the part I is used for establishing an index based on n-element images aiming at an image dataset with image tags; the part II is used for preprocessing an image to be annotated and extracting n elements of the image; the part III is used for retrieving all semantic tags corresponding to the extracted n elements of the image from the established index based on the n-element images and calculating the probability values of the semantic tags corresponding to the retrieved n elements of the image; the part IV is used for updating the probability values of all semantic tags; the part V is used for ranking all semantic tags according to the updated probability values and outputting one semantic tag or a plurality of semantic tags with the probability values which reach a set value in the rank. The invention additionally discloses an automatic image semantic annotation method. The technical scheme provided by the invention is applied to automatic image semantic annotation and rich image semantic tags can be quickly and efficiently mined.

Description

Automatic image semantic annotation method and system

Technical field

The present invention relates to automatic image semantic annotation technology, and in particular to an automatic image semantic annotation method and system based on an n-gram image index structure. It is mainly applicable to automatic image semantic annotation and image retrieval.

Background technology

Automatic image annotation (AIA) means having a computer automatically add to an image text labels that reflect the image content or the user's intent. Using an image set that carries text reflecting the semantics of its images, or other resources that help mine deep semantic information from images, a model learns the functional relationship between the deep semantic concept space of images and the space of low-level primitive image features. This model is then used to automatically annotate other images whose semantic content is unknown.

On the whole, current methods for automatic image semantic annotation concentrate on machine learning. Machine-learning-based image semantic annotation has been studied for many years and has made considerable progress: many new image representation models have been proposed and many multi-class annotation classifiers have been tried. Nevertheless, the effectiveness and efficiency of image semantic annotation remain unsatisfactory, no breakthrough has been made in narrowing the semantic gap, and a large distance remains from practical application. In particular, when the training data is of poor quality, or when the data set and the class set are very large, the performance of most algorithms drops sharply. The main reason is that these models first require an annotated data set and then use complex machine learning algorithms to optimize a large number of classifier parameters; finally, the classifiers obtained for each class are used to mine the semantic labels of unknown images. This places high demands on the training set, and different people's annotations of the same image are highly ambiguous. When the training set has many labels and the chosen features are complex, the number of classifier parameters to be optimized becomes very large, so such methods cannot cope with the explosive growth in the number of images in the Internet era.

Moreover, because of time complexity, most machine learning algorithms ignore the spatial information of objects in the image. Instead they extract as many low-level image features as possible, fuse the different low-level features, and then train the corresponding classifiers. Whenever the training set changes, the whole training process has to be repeated. Current machine learning algorithms are therefore mostly applied to problems where the training data set is small and the images to be annotated belong to a specific domain.

Summary of the invention

The technical problem to be solved by the present invention is to provide an automatic image semantic annotation method and system that improve the efficiency and effectiveness of automatic image semantic annotation.

To solve the above technical problem, the present invention discloses an automatic image semantic annotation system, comprising:

Part one, which builds an index based on image n-grams for an image data set that carries image annotations;

Part two, which preprocesses the image to be annotated and extracts its image n-grams;

Part three, which retrieves, from the built index based on image n-grams, all semantic labels corresponding to the extracted image n-grams, and calculates the probability values of the semantic labels corresponding to the retrieved image n-grams;

Part four, which updates the probability values of all semantic labels;

Part five, which ranks all semantic labels by their updated probability values and outputs the one or more semantic labels whose probability values reach a set value in the ranking.

Preferably, in the above system, the index based on image n-grams built by part one uses image n-grams as index keys and image annotations and image details as the indexed objects.

Preferably, in the above system, part three calculates the probability value of a semantic label corresponding to a retrieved image n-gram according to the following formula:

p(sun | img, (1,1)) = 1 - (1 - Lweight_{sun})^{N_{(1,1)}}

In the formula:

p(sun | img, (1,1)): the probability that the sun label occurs given that (1,1) occurs in the image img to be annotated, where the sun label is a semantic label corresponding to the image n-gram;

Lweight_{sun}: the probability weight of the sun label corresponding to (1,1) in the index;

N_{(1,1)}: the number of times (1,1) occurs in the image img to be annotated.

Preferably, in the above system, part four updates the probability values of all semantic labels as follows:

The probability value of each semantic label of the image to be annotated is initialized to 0, and the probability values of the semantic labels are then updated until every n-gram in the image has been retrieved.

Preferably, in the above system, part four updates the probability value of a semantic label according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) \cdot (1 - p(sun | img, (1,1)))

In the formula:

p(sun | img): the probability weight with which the image img to be annotated is labeled sun, where the sun label is a semantic label corresponding to the image n-gram; the occurrence on the right-hand side is the value held before the update;

p(sun | img, (1,1)): the probability that the sun label occurs given that (1,1) occurs in the image img to be annotated.
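Because the update is applied once per retrieved n-gram starting from 0, it can be unrolled into a closed form. The following is a sketch of that algebra rather than text from the original; g_1, ..., g_m (the distinct n-grams retrieved from img), Lweight_{sun}(g_i) and N_{g_i} are notation introduced here for illustration only.

```latex
\begin{align*}
p_0 &= 0, \\
p_i &= 1 - (1 - p_{i-1})\,\bigl(1 - p(sun \mid img, g_i)\bigr), \qquad i = 1, \dots, m, \\
1 - p_m &= \prod_{i=1}^{m} \bigl(1 - p(sun \mid img, g_i)\bigr)
         = \prod_{i=1}^{m} \bigl(1 - Lweight_{sun}(g_i)\bigr)^{N_{g_i}}.
\end{align*}
```

Read this way, the final value does not depend on the order in which the n-grams are retrieved. As an assumed numerical example, one n-gram with weight 0.5 occurring twice and another with weight 0.5 occurring once give p(sun | img) = 1 - 0.25 \cdot 0.5 = 0.875.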

The present invention also discloses an automatic image semantic annotation method, comprising:

building an index based on image n-grams for an image data set that carries image annotations;

preprocessing the image to be annotated and extracting its image n-grams, retrieving from the built index based on image n-grams all semantic labels corresponding to the extracted image n-grams, and calculating the probability values of the semantic labels corresponding to the retrieved image n-grams;

updating the probability values of all semantic labels, ranking all semantic labels by their updated probability values, and outputting the one or more semantic labels whose probability values reach a set value in the ranking.

Preferably, in the above method, the built index based on image n-grams uses image n-grams as index keys and image annotations and image details as the indexed objects.

Preferably, in the above method, the probability value of a semantic label corresponding to a retrieved image n-gram is calculated according to the following formula:

p(sun | img, (1,1)) = 1 - (1 - Lweight_{sun})^{N_{(1,1)}}

In the formula, N_{(1,1)} is the number of times (1,1) occurs in the image img to be annotated.

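Preferably, in the above method, updating the probability values of all semantic labels means initializing the probability value of each semantic label of the image to be annotated to 0 and then updating the probability values of the semantic labels until every n-gram in the image has been retrieved.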

Preferably, in the above method, the probability value of a semantic label is updated according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) \cdot (1 - p(sun | img, (1,1)))

When applied to automatic semantic annotation of images, the technical solution of the present application can mine rich image semantic labels quickly and efficiently.

Brief description of the drawings

Fig. 1 is a flowchart of extracting "image lemmas" in the present embodiment;

Fig. 2 is an example of cutting an image and extracting n-grams in the present embodiment;

Fig. 3 is an example of the index structure built by the n-gram-based image indexing method, with n-grams as index keys and semantic labels and images as the indexed content;

Fig. 4 is a flowchart of automatic image semantic annotation based on the n-gram image index in the present embodiment.

Detailed description

To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions are described in further detail below with reference to the accompanying drawings. It should be noted that, provided they do not conflict, the embodiments of the present application and the features in the embodiments may be combined with one another arbitrarily.

Embodiment 1

Existing automatic image annotation techniques often require extensive parameter optimization and a complex parameter learning process. Their image semantic annotation is inefficient, its results are not good enough, and they cannot cope with the continually growing scale of image collections. To improve the efficiency and effectiveness of automatic image semantic annotation, the applicant provides, in combination with an image index structure based on the n-gram model, an automatic image semantic annotation system based on an n-gram image index structure.

This automatic image semantic annotation system based on an n-gram image index structure comprises at least the following basic parts:

Part one, which builds an index based on image n-grams for an image data set that carries image annotations;

Specifically, part one performs text-like segmentation on randomly selected images, learns and builds an "image dictionary" by k-means clustering, and then builds the n-gram-based image index from the image data set that carries image annotations.

The built index uses image n-grams as index keys and image annotations and image details as the indexed objects. The reason is that, in such an index structure, the weights in the sub-index nodes can be calculated with the Bayesian probability method, which yields the probabilistic relationship between semantic labels and image n-grams.

Part two, which preprocesses the image to be annotated and extracts its image n-grams;

Part two extracts the image n-grams according to the "image dictionary" learned by part one.

Part three, which retrieves, from the n-gram-based image index built by part one, all semantic labels corresponding to the image n-grams extracted by part two, and calculates the probability values of all semantic labels corresponding to those image n-grams;

In the present embodiment, part three may calculate the probability value of a semantic label corresponding to a retrieved image n-gram according to the following formula:

p(sun | img, (1,1)) = 1 - (1 - Lweight_{sun})^{N_{(1,1)}}

In the formula:

p(sun | img, (1,1)): the probability that the sun label occurs given that (1,1) occurs in the image img to be annotated;

N_{(1,1)}: the number of times (1,1) occurs in the image img to be annotated.

Part four, which updates the probability values of all semantic labels calculated by part three;

Specifically, part four initializes the probability value of each semantic label of the image to be annotated to 0 and then updates the probability values of the semantic labels until every n-gram in the image has been retrieved.

In addition, part four may update the probability value of a semantic label according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) \cdot (1 - p(sun | img, (1,1)))

In the formula, p(sun | img) is the probability weight with which the image img to be annotated is labeled sun;

Part five, which ranks all semantic labels by the probability values updated by part four and outputs the one or more semantic labels whose probability values reach a set value in the ranking.

The process by which the above system automatically annotates image semantics is described in detail below, taking bigrams as an example; the process is shown in Fig. 4.

When part one of the automatic image semantic annotation system based on the n-gram image index structure builds the "image dictionary", it first learns "image lemmas" from a randomly selected image data set and then constructs the "image dictionary" from the learned "image lemmas". The steps for learning "image lemmas" are shown in Fig. 1 and are as follows:

Step 1: perform text-like segmentation on the selected images. The segmentation scheme can be designed according to the needs of the application. In the example given in this embodiment, the image is divided evenly into small image blocks of size m*n (see Fig. 2); each block can be regarded as a "word" in text processing, and each image correspondingly as an "article". The way of segmenting images is not limited to this.
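The even block cutting described in this step can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the nested-list grayscale image, the function name `cut_into_blocks` and the choice to drop any ragged border are all assumptions made here.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of grayscale pixel values


def cut_into_blocks(image: Image, block_h: int, block_w: int) -> List[List[Image]]:
    """Split an image into a grid of equal-sized blocks.

    Pixels that do not fill a whole block at the right or bottom edge are
    dropped (an assumption; the source does not say how borders are handled).
    """
    rows = len(image) // block_h
    cols = len(image[0]) // block_w
    grid = []
    for r in range(rows):
        grid_row = []
        for c in range(cols):
            block = [row[c * block_w:(c + 1) * block_w]
                     for row in image[r * block_h:(r + 1) * block_h]]
            grid_row.append(block)
        grid.append(grid_row)
    return grid


# Toy 4x6 image whose pixel value encodes its position (x + 10 * y).
image = [[x + 10 * y for x in range(6)] for y in range(4)]
blocks = cut_into_blocks(image, block_h=2, block_w=3)  # 2x2 grid of 2x3 blocks
```

Each entry of `blocks` then plays the role of one "word", and the whole grid the role of one "article".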

Step 2: extract the low-level image features of the equal-sized image blocks, including but not limited to image color features and image texture features, and fuse these features to obtain one feature vector that reflects the multiple low-level features of each image block.

Step 3: cluster the feature vectors of the image blocks with a clustering method, and select a representative data point of each cluster as an "image lemma". The resulting "image lemmas" are numbered (see Fig. 2). In one implementation adopted by the present invention, k-means clustering is applied to the feature vectors of all image blocks with a predetermined number of clusters, and the centroids of the k-means result are taken as the "image lemmas".
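To make the clustering step concrete, here is a small pure-Python k-means sketch whose centroids stand in for the "image lemmas". The 2-D feature vectors, the deterministic "first k points" initialization and the fixed iteration count are simplifying assumptions for illustration; the source only states that k-means is used with a predetermined number of clusters.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd iterations; returns the k centroids."""
    centroids = [list(p) for p in points[:k]]  # simplistic init (assumption)
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


# Hypothetical fused feature vectors of six image blocks.
features = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
            (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
lemmas = kmeans(features, k=2)  # centroid i could be numbered as lemma i + 1
```

The same `nearest` helper can later map a block of an unseen image to its closest lemma.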

After the "image lemmas" have been learned, the "image dictionary" is constructed. To further represent the spatial features of the image, n-gram items are added to the "image dictionary": each "image lemma" forms an "image lemma" sequence with n-1 adjacent "image lemmas", and every such sequence is added to the "image dictionary" as an item, together with all "image lemma" sequences of length less than n. For example, if the extracted "image lemmas" are 1, 2 and 3 and n is 2, the resulting "image dictionary" contains the items (1), (2), (3), (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3). In an embodiment with K extracted "image lemmas" and n = 2, the "image dictionary" contains K*K+K grams.
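The dictionary construction can be checked with a few lines of Python. This sketch simply enumerates every lemma sequence of length 1 to n; the function name `build_image_dictionary` is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def build_image_dictionary(lemma_ids: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """All lemma sequences of length 1..n, shortest first."""
    items: List[Tuple[int, ...]] = []
    for length in range(1, n + 1):
        items.extend(product(lemma_ids, repeat=length))
    return items


dictionary = build_image_dictionary([1, 2, 3], n=2)
# Three unigrams plus nine bigrams: K*K + K items for K = 3.
```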

The index structure built in this embodiment is an n-gram-based image index structure: image n-grams are the index keys, and image annotations and image details are the indexed objects, as shown in Fig. 3. In the figure, Mnode is a master index node. A master index node holds an item of the "image dictionary", which may be a unigram or a bigram; (1,1) is an image bigram. The content indexed by a master index node has two parts:

1. The details of all images that contain the "image dictionary" item held in the master index node. Taking Mnode as an example, the images indexed under it are all images that contain the "image dictionary" item (1,1);

2. Sub-index nodes (such as Cnode1), each holding a text annotation label (sun) and its corresponding weight (Lweight_sun). Taking Cnode1 as an example, the sub-index node holds the text label sun that occurs in the image data and the calculated weight Lweight_sun. Lweight_sun reflects the relationship between the "image dictionary" item in the master index node and the text label in the sub-index node. The present embodiment calculates it as follows:

Lweight_{sun} = p(sun | (1,1)) = \frac{p(sun, (1,1))}{p((1,1))} = \frac{p((1,1) | sun) \cdot p(sun)}{p((1,1))}

Wherein:

p((1,1) | sun) = \frac{p(sun, (1,1))}{p(sun)} \approx \frac{N((1,1) | sun)}{N(\text{n-gram} | sun)}

p(sun) = \frac{Nimg(sun)}{Nimg(All)}

p((1,1)) = \frac{N((1,1))}{N(\text{n-gram})}

In the formulas:

N((1,1) | sun): the number of occurrences of (1,1) in all images that carry the text annotation label (the sun label);

N(n-gram | sun): the number of all n-grams in the indexed images that carry the text annotation label (the sun label);

Nimg(sun): the number of images that carry the text annotation label (the sun label);

Nimg(All): the number of all images in the data set;

N((1,1)): the number of occurrences of (1,1) in the image data set;

N(n-gram): the number of all n-grams in the image data set.
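The weight calculation can be sketched from raw counts as below. The data layout (a list of `(grams, labels)` pairs, one per annotated image), the function name `label_weight` and the toy data are assumptions for illustration; only the three ratios come from the formulas above.

```python
from typing import List, Sequence, Set, Tuple

Gram = Tuple[int, ...]
AnnotatedImage = Tuple[Sequence[Gram], Set[str]]  # (n-grams of the image, its labels)


def label_weight(gram: Gram, label: str, dataset: List[AnnotatedImage]) -> float:
    """Estimate Lweight = p(gram | label) * p(label) / p(gram) from counts."""
    n_gram_given_label = 0  # N(gram | label)
    n_all_given_label = 0   # N(n-gram | label)
    n_gram = 0              # N(gram)
    n_all = 0               # N(n-gram)
    n_img_label = 0         # Nimg(label)
    for grams, labels in dataset:
        count = list(grams).count(gram)
        n_gram += count
        n_all += len(grams)
        if label in labels:
            n_img_label += 1
            n_gram_given_label += count
            n_all_given_label += len(grams)
    if n_gram == 0 or n_all_given_label == 0:
        return 0.0  # gram or label never seen: no evidence (assumed convention)
    p_gram_given_label = n_gram_given_label / n_all_given_label
    p_label = n_img_label / len(dataset)  # Nimg(label) / Nimg(All)
    p_gram = n_gram / n_all
    return p_gram_given_label * p_label / p_gram


# Toy data set: two annotated images with four bigrams each.
dataset = [
    ([(1, 1), (1, 1), (1, 2), (2, 2)], {"sun"}),
    ([(1, 1), (2, 2), (2, 2), (2, 2)], {"sea"}),
]
w = label_weight((1, 1), "sun", dataset)  # (2/4) * (1/2) / (3/8) = 2/3
```

One caveat worth noting: when images contain very different numbers of n-grams, this count-based estimate is not guaranteed to stay within [0, 1], and the source does not say how such cases are handled.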

The images indexed under a sub-index node are those that both contain the "image dictionary" item held in the master index node (Mnode) and carry the text label held in the sub-index node; their details are stored there. Taking Cnode1 as an example, the images indexed under it contain the "image dictionary" item (1,1) and also carry the sun label.
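One possible in-memory shape for the master and sub-index nodes is sketched below. The class names `MasterNode` and `ChildNode`, the use of image IDs as stand-ins for "image details", and the `add_image` helper are all hypothetical; the source describes the structure only through Fig. 3.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Gram = Tuple[int, ...]  # an "image dictionary" item such as (1,) or (1, 1)


@dataclass
class ChildNode:
    """Sub-index node: one text label under one n-gram."""
    label: str
    weight: float = 0.0  # Lweight, to be filled in by a separate computation
    image_ids: List[str] = field(default_factory=list)


@dataclass
class MasterNode:
    """Master index node: one "image dictionary" item."""
    gram: Gram
    image_ids: List[str] = field(default_factory=list)
    children: Dict[str, ChildNode] = field(default_factory=dict)


def add_image(index: Dict[Gram, MasterNode], image_id: str,
              grams: Iterable[Gram], labels: Iterable[str]) -> None:
    """Register an annotated image under every distinct n-gram it contains."""
    labels = list(labels)
    for gram in set(grams):
        node = index.setdefault(gram, MasterNode(gram))
        node.image_ids.append(image_id)
        for label in labels:
            child = node.children.setdefault(label, ChildNode(label))
            child.image_ids.append(image_id)


index: Dict[Gram, MasterNode] = {}
add_image(index, "img1", [(1, 1), (1, 2), (1, 1)], ["sun"])
add_image(index, "img2", [(1, 1)], ["sun", "sea"])
```

Here `index[(1, 1)]` corresponds to Mnode and `index[(1, 1)].children["sun"]` to Cnode1.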

Part two relies on the "image dictionary" learned by part one. Part two first preprocesses the image to be annotated, including but not limited to normalizing the image size and converting the image format. The image is then segmented into text-like blocks; for each resulting image block, the distance to every "image lemma" is calculated and the block is assigned to the nearest "image lemma". Finally, n-grams are extracted from the image according to the "image dictionary". In the present embodiment n-grams are extracted in 8 directions; as shown in Fig. 3, the bigrams that can be extracted are: (1,2), (2,2), (2,2), (2,1), (2,4), (2,1), (2,3), (2,5).
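A minimal sketch of 8-direction bigram extraction follows, taking as input a grid in which each block has already been replaced by the number of its nearest "image lemma". The ordering of each pair as (block, neighbour), the order of the directions, and the skipping of neighbours outside the image are assumptions; the source does not spell these conventions out.

```python
from typing import List, Tuple

# (row, column) offsets of the 8 neighbours around a block.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def extract_bigrams(lemma_grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Pair every block's lemma with the lemma of each in-bounds neighbour."""
    rows, cols = len(lemma_grid), len(lemma_grid[0])
    bigrams = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    bigrams.append((lemma_grid[r][c], lemma_grid[nr][nc]))
    return bigrams


grid = [[1, 2],
        [3, 4]]
bigrams = extract_bigrams(grid)  # each of the 4 blocks has 3 neighbours
```

Repeated bigrams are kept on purpose, since the number of occurrences of each n-gram in the image is needed when the label probabilities are calculated.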

Part three relies on the n-gram-based image index structure built offline by part one. For each n-gram extracted from the image to be annotated, all corresponding semantic labels are retrieved from the built n-gram-based image index, and the probability values of those semantic labels are calculated with the following formula:

p(sun | img, (1,1)) = 1 - (1 - Lweight_{sun})^{N_{(1,1)}}

In the formula, N_{(1,1)} is the number of times (1,1) occurs in the image img to be annotated.

Part four initializes the probability value of every label of the image img to be annotated to 0 and, following the rules of probability, updates the probability values of the image's semantic labels according to the following rule:

p(sun | img) = 1 - (1 - p(sun | img)) \cdot (1 - p(sun | img, (1,1)))

In the formula:

p(sun | img): the probability weight with which the image img to be annotated is labeled with the text annotation label (the sun label);

p(sun | img, (1,1)): the probability that the text annotation label (the sun label) occurs given that (1,1) occurs in the image img to be annotated.

Part four keeps updating the probability values of the semantic labels according to this rule until every gram in the image has been retrieved.
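The retrieval and update loop of parts three and four can be sketched together as follows. The index is simplified here to a plain dictionary mapping each n-gram to its `{label: Lweight}` entries, and the function name and toy weights are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Gram = Tuple[int, ...]


def annotate_probabilities(grams: Iterable[Gram],
                           index: Dict[Gram, Dict[str, float]]) -> Dict[str, float]:
    """Accumulate p(label | img) over every distinct n-gram of the image."""
    label_prob: Dict[str, float] = {}
    for gram, count in Counter(grams).items():
        # N-grams missing from the index contribute nothing.
        for label, lweight in index.get(gram, {}).items():
            # Part three: p(label | img, gram) = 1 - (1 - Lweight)^N
            p_gram = 1.0 - (1.0 - lweight) ** count
            # Part four: update starting from an initial value of 0
            p_old = label_prob.get(label, 0.0)
            label_prob[label] = 1.0 - (1.0 - p_old) * (1.0 - p_gram)
    return label_prob


# Hypothetical weights; (3, 3) is deliberately absent from the index.
toy_index = {(1, 1): {"sun": 0.5, "sea": 0.25}, (1, 2): {"sun": 0.5}}
probs = annotate_probabilities([(1, 1), (1, 1), (1, 2), (3, 3)], toy_index)
# sun: 1 - (0.5 ** 2) * 0.5 = 0.875; sea: 1 - 0.75 ** 2 = 0.4375
```

Because the update multiplies the complements together, the result is the same whatever order the n-grams are visited in.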

The main function of part five is to determine the semantic labels of the image to be annotated from the calculated probability values of the different labels.

One implementation example provided by the present embodiment is as follows: all semantic labels are first ranked by probability value; the one or more semantic labels whose values are greater than a certain weight (the set value) are then selected as candidate semantic labels of the image; and the candidate labels are output in order of probability value, giving the final semantic labels of the image to be annotated.
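The ranking and thresholding step is short enough to show in full. The function name `select_labels` and the example threshold 0.4 are illustrative assumptions; the strict "greater than" comparison follows the wording of the implementation example above.

```python
from typing import Dict, List, Tuple


def select_labels(label_prob: Dict[str, float],
                  threshold: float) -> List[Tuple[str, float]]:
    """Rank labels by probability (highest first) and keep those above the set value."""
    ranked = sorted(label_prob.items(), key=lambda item: item[1], reverse=True)
    return [(label, p) for label, p in ranked if p > threshold]


selected = select_labels({"sea": 0.4375, "tree": 0.1, "sun": 0.875}, threshold=0.4)
```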

Embodiment 2

The present embodiment describes an automatic image semantic annotation method, which comprises the following steps:

Step 1, for an image data set that carries image annotations, build an index based on image n-grams;

It should be noted that, in the present embodiment, the built index based on image n-grams uses image n-grams as index keys and image annotations and image details as the indexed objects.

Step 2, preprocess the image to be annotated and extract its image n-grams;

Step 3, from the built index based on image n-grams, retrieve all semantic labels corresponding to the extracted image n-grams, and calculate the probability values of the semantic labels corresponding to the retrieved image n-grams;

In the present embodiment, the probability value of a semantic label corresponding to a retrieved image n-gram is calculated according to the following formula:

p(sun | img, (1,1)) = 1 - (1 - Lweight_{sun})^{N_{(1,1)}}

In the formula, N_{(1,1)} is the number of times (1,1) occurs in the image img to be annotated.

Step 4, update the probability values of all semantic labels;

Specifically, the probability value of each semantic label of the image to be annotated is initialized to 0, and the probability values of the semantic labels are then updated until every n-gram in the image has been retrieved.

The present embodiment updates the probability value of a semantic label according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) \cdot (1 - p(sun | img, (1,1)))

Step 5, rank all semantic labels by their updated probability values, and output the one or more semantic labels whose probability values reach a set value in the ranking.

As can be seen from the above embodiments, when applied to automatic semantic annotation of images, the technical solution of the present application can mine rich image semantic labels quickly and efficiently.

One of ordinary skill in the art will appreciate that all or some of the steps of the above method may be carried out by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Alternatively, all or some of the steps of the above embodiments may be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present application is not limited to any particular combination of hardware and software.

The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An automatic image semantic annotation system, characterized in that the system comprises:

Part four, which updates the probability values of all semantic labels;

2. The system as claimed in claim 1, characterized in that

the index based on image n-grams built by part one uses image n-grams as index keys and image annotations and image details as the indexed objects.

3. The system as claimed in claim 1 or 2, characterized in that part three calculates the probability value of a semantic label corresponding to a retrieved image n-gram according to the following formula:

p(sun | img, (1,1)) = 1 - (1 - Lweight_{sun})^{N_{(1,1)}}

In the formula, N_{(1,1)} is the number of times (1,1) occurs in the image img to be annotated.

4. The system as claimed in claim 3, characterized in that part four updates the probability values of all semantic labels as follows:

5. The system as claimed in claim 4, characterized in that part four updates the probability value of a semantic label according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) \cdot (1 - p(sun | img, (1,1)))

6. An automatic image semantic annotation method, characterized in that the method comprises:

7. The method as claimed in claim 6, characterized in that the built index based on image n-grams uses image n-grams as index keys and image annotations and image details as the indexed objects.

8. The method as claimed in claim 6 or 7, characterized in that the probability value of a semantic label corresponding to a retrieved image n-gram is calculated according to the following formula:

p(sun | img, (1,1)) = 1 - (1 - Lweight_{sun})^{N_{(1,1)}}

In the formula, N_{(1,1)} is the number of times (1,1) occurs in the image img to be annotated.

9. The method as claimed in claim 8, characterized in that updating the probability values of all semantic labels means:

10. The method as claimed in claim 9, characterized in that the probability value of a semantic label is updated according to the following formula:

p(sun | img) = 1 - (1 - p(sun | img)) \cdot (1 - p(sun | img, (1,1)))