CN102880612B - Image annotation method and device thereof - Google Patents

Image annotation method and device thereof

Info

Publication number
CN102880612B
CN102880612B (application CN201110197235.0A)
Authority
CN
China
Prior art keywords
image
input image
similarity
label
comparison
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN201110197235.0A
Other languages
Chinese (zh)
Other versions
CN102880612A (en)
Inventor
曹琼
刘汝杰
于浩
Current Assignee (may be inaccurate)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201110197235.0A priority Critical patent/CN102880612B/en
Publication of CN102880612A publication Critical patent/CN102880612A/en
Application granted granted Critical
Publication of CN102880612B publication Critical patent/CN102880612B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An embodiment of the invention provides an image annotation method and a device thereof. The image annotation method comprises the steps of: obtaining, for an input image, an initial label set comprising a plurality of labels; calculating the label-set-based similarity between the label set of the input image and the label set of a comparison image stored in a database; merging the label-set-based similarity with the vision-based similarity to obtain the merged similarity between the input image and the comparison image; and updating the label set of the input image based on the merged similarity. According to the embodiment of the invention, the low-level features and high-level semantics of the image can be considered at the same time, so the precision of image annotation is improved; moreover, automatic labeling is realized and annotation efficiency is improved.

Description

Image labeling method and device
Technical Field
The invention relates to the field of image classification and retrieval, in particular to an image labeling method and an image labeling device.
Background
With the development of computer networks and multimedia technology, the amount of multimedia information available on the Internet has grown rapidly. This proliferation provides users with rich resources, but at the same time, quickly and effectively finding resources of interest in such massive amounts of information poses a huge challenge. Image classification and retrieval techniques are therefore gaining increasing attention.
Content-Based Image Retrieval (CBIR) has been widely studied since its introduction in the 1990s. By indexing the visual content features of an image itself (low-level features such as color, texture, shape, and spatial layout), other images with similar visual characteristics can be retrieved, so that images can be directly compared and retrieved based on a visual similarity computed from their low-level features.
However, because an image is described by its low-level visual features, and these features have no uniform, rule-based correlation with a person's subjective judgment of the image's high-level semantics, completely different types of images may well have similar low-level features. Direct comparison based on visual similarity therefore often fails to produce accurate retrieval results.
On the other hand, methods have appeared that label images using Text-Based Image Retrieval (TBIR) techniques. Images similar to the image to be labeled are found through low-level features, and the labels of those similar images are assigned to the image to be labeled, so that image vision and related text information can be combined for retrieval.
However, in the process of implementing the invention, the inventors found that the prior art has the following defects: because of the gap between the low-level features and the high-level semantics of images, the accuracy of image annotation is low; and if images are labeled only through human-computer interaction or manually, the efficiency is low and the burden on the user is heavy.
Disclosure of Invention
The embodiment of the invention provides an image annotation method and a device thereof, aiming at simultaneously considering the low-level characteristics and high-level semantics of an image and improving the accuracy of image annotation; and moreover, automatic labeling of the label is realized, and the labeling efficiency is improved.
According to an aspect of an embodiment of the present invention, there is provided an image annotation method, including:
obtaining an initial tag set comprising a plurality of tags for an input image, wherein the accuracy with which the semantics of the input image are represented is determined from the plurality of tags;
calculating a tagset-based similarity between the tagset of the input image and the tagsets of the comparison images stored in the database;
performing a merging calculation on the similarity based on the label set and the similarity based on vision to obtain a merged similarity of the input image and the comparison image;
updating the label set of the input image based on the merged similarity.
According to another aspect of the embodiments of the present invention, there is provided an image annotation apparatus including:
an initializer for obtaining an initial tag set for an input image, the tag set comprising a plurality of tags, from which the accuracy with which the semantics of the input image are represented is determined;
a relation calculator that calculates a similarity based on a tag set between the tag set of the input image and a tag set of a comparison image stored in a database;
a merging calculator which performs merging calculation on the similarity based on the label set and the similarity based on vision to obtain a merged similarity of the input image and the comparison image;
a tag set updater that updates a tag set of the input image based on the merged similarity.
The embodiment of the invention has the advantages that the low-level characteristics and high-level semantics of the image can be considered simultaneously by combining the similarity based on the label set and the similarity based on the vision, so that the accuracy of image annotation is improved; and moreover, automatic labeling of the label is realized, and the labeling efficiency is improved.
Features that are described and/or illustrated with respect to one embodiment may be used in one or more other embodiments, in combination with or instead of the features of the other embodiments, by the same or similar methods.
It should be emphasized that the terms "comprises" and "comprising," when used in this specification, are taken to specify the presence of stated features, integers, steps or components but do not preclude the presence or addition of one or more other features, integers, steps, components or groups thereof.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flowchart of an image annotation method according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of an annotated image in an embodiment of the invention;
FIG. 3 is a schematic diagram of obtaining an initial set of tags according to an embodiment of the present invention;
FIG. 4 is a flowchart of an image annotation method according to an embodiment of the invention;
FIG. 5 is a diagram illustrating an iterative process of an image annotation method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an image annotation apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another configuration of the image annotation device in the embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
An embodiment of the present invention provides an image annotation method, and fig. 1 is a flowchart of the image annotation method according to the embodiment of the present invention. As shown in fig. 1, the image annotation method includes:
step 101, obtaining an initial label set comprising a plurality of labels for an input image, wherein the accuracy with which the semantics of the input image are represented is determined from the plurality of labels;
step 102, calculating the label-set-based similarity between the label set of the input image and the label set of a comparison image stored in the database;
step 103, merging the label-set-based similarity and the vision-based similarity to obtain the merged similarity of the input image and the comparison image;
step 104, updating the label set of the input image based on the merged similarity.
In this embodiment, each image may be labeled with a label set, the label set may include a plurality of labels, and the accuracy of representing the semantics of the input image may be determined according to the arrangement order of the labels, for example, the accuracy of the label arranged in the front is higher than that of the label arranged in the back.
FIG. 2 is a schematic diagram of an annotated image in an embodiment of the invention. As shown in fig. 2, the image corresponds to a label set comprising four labels: { golden, gate, bridge, sights }. As can be seen from the order of the labels in fig. 2, the accuracy of golden is greater than that of gate, the accuracy of gate is greater than that of bridge, and the accuracy of bridge is greater than that of sights.
In a specific implementation, a weight may also be given to each label, for example, 60 for golden, 52 for gate, 48 for bridge, and 30 for sights, with the weight reflecting the accuracy with which the label expresses the semantics. It should be noted that the above is merely an illustrative example of a label set; the specific implementation can be determined according to the actual situation.
In one embodiment, the obtaining an initial tag set for the input image may specifically include: a set of tags is randomly assigned.
In another embodiment, the initial tag set of the input image may be obtained using a visual-similarity-based approach: a vision-based similarity between the input image and the comparison images stored in the database is calculated, and an initial tag set of the input image is obtained based on the vision-based similarity.
A large number of comparison images that have already been annotated can be stored in the database. After the vision-based similarity is calculated, the first few comparison images closest to the input image in vision-based similarity may be selected, and an initial label set of the input image may be obtained based on the labels of these comparison images. For example, the collected tags of those comparison images may be employed as the initial tag set; alternatively, the initial tag set may be obtained by voting, as described below.
FIG. 3 is a schematic diagram of obtaining an initial set of tags according to an embodiment of the present invention. As shown in fig. 3, based on the visual similarity, a plurality of comparison images that are relatively close in visual similarity may be found in the database for the input image; then Voting (Voting) or statistics is carried out according to the searched label set of the comparison image, and an initial label set of the input image can be obtained.
The above is merely an illustrative description of how to obtain the initial tag set, but is not limited thereto. As for the specific implementation of how to calculate the similarity based on the vision, how to search and compare images according to the similarity based on the vision, how to perform voting, and the like, the prior art can be adopted, and details are not repeated here.
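The voting-based initialization described above can be sketched as follows. This is a minimal illustration, assuming the visual similarities to the database images have already been computed; the `(similarity, tags)` input format and the cutoffs `k` and `n_tags` are illustrative assumptions, not prescribed by the patent.

```python
from collections import Counter

def initial_tag_set(neighbors, k=3, n_tags=4):
    """Vote an initial tag set from (visual_similarity, tag_list) pairs.

    `neighbors` are database images scored by some low-level visual
    similarity; only the k visually closest ones get to vote.
    """
    ranked = sorted(neighbors, key=lambda n: n[0], reverse=True)
    votes = Counter()
    for _, tags in ranked[:k]:
        votes.update(tags)  # each close neighbor votes for all of its tags
    # most-voted tags first: the order encodes annotation accuracy
    return [tag for tag, _ in votes.most_common(n_tags)]
```

For instance, three visually close neighbors tagged with variations of golden/gate/bridge/sights would yield an initial set ordered by vote count.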
In this embodiment, after the input image is initialized with an initial tag set, the relationship between the tag set of the input image and the tag set of the comparison image may be calculated, so as to obtain the tag-set-based similarity.
In one embodiment, calculating the relationship between the tag set of the input image and the tag set of the comparison image stored in the database may specifically include: calculating the ratio of the intersection of the tag set of the input image and the tag set of the comparison image to the union of the two tag sets; and determining the tag-set-based similarity between the tag set of the input image and the tag set of the comparison image stored in the database according to the obtained ratio.
For example, the initial label set of the input image shown in FIG. 2 is { golden, gate, bridge, sights }. Suppose the label set of a comparison image is { golden, gate, bridge, 2006 }: the intersection contains 3 tags and the union contains 5, so the tag-set-based similarity is 3/5. Suppose instead the label set of the comparison image is { facade, sanfrancisco, bridge, gold }: since tags are matched as exact strings (gold and golden are different tags), only bridge is shared, the union contains 7 tags, and the similarity is 1/7.
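A minimal sketch of this intersection-over-union computation (exact string matching, which is why gold and golden count as different tags in the example above):

```python
def tagset_similarity(tags_a, tags_b):
    """Tag-set similarity: |intersection| / |union| of the two tag sets."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b)
```

This reproduces the 3/5 and 1/7 values of the example.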
In another embodiment, calculating the relationship between the tag set of the input image and the tag set of the comparison image stored in the database may specifically include: calculating semantic distances between the label sets of the input image and the comparison image; determining a similarity based on the tag set between the tag set of the input image and the tag set of the comparison image stored in the database according to the calculated semantic distance.
For example, the initial label set of the input image shown in FIG. 2 is { golden, gate, bridge, sights }, and suppose the label set of the comparison image is { golden, gate, bridge, 2006 }. First, the semantic closeness between each input-image label and each comparison-image label can be computed (1 for identical labels, lower values for semantically distant ones), giving the matrix shown in the following table:
TABLE 1
         golden  gate  bridge  sights
golden   1       0     0.1     0
gate     0       1     0.1     0.3
bridge   0.1     0.1   1       0.6
2006     0       0     0       0
Then, based on this matrix, the optimal one-to-one correspondence is found, which can be realized by a greedy matching method or by the Munkres optimal matching (Hungarian) algorithm. For example, using the Munkres method, the obtained correspondence among the labels is golden-golden, gate-gate, bridge-bridge, and 2006-sights, and the tag-set similarity is (1+1+1+0)/4 = 3/4.
The above is only an illustrative description of how to calculate the similarity based on the tag set, but is not limited to this, for example, a weight may be added, or a specific implementation may be determined according to actual situations by using an existing method for calculating the similarity.
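The matching-based similarity can be sketched with the simpler greedy strategy mentioned above (the Munkres algorithm would give the optimal matching). Here `pair_sim` stands in for whatever semantic tag-to-tag similarity is used, with 1.0 for identical tags; it is an assumed interface, not the patent's API.

```python
def matching_similarity(tags_a, tags_b, pair_sim):
    """Greedily match tags one-to-one, highest pairwise similarity first,
    then average the matched scores over the larger tag set."""
    pairs = sorted(((pair_sim(a, b), a, b) for a in tags_a for b in tags_b),
                   reverse=True)
    used_a, used_b = set(), set()
    total = 0.0
    for s, a, b in pairs:
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            total += s
    return total / max(len(tags_a), len(tags_b))
```

With the Table 1 values this yields (1+1+1+0)/4 = 3/4, the same result as the optimal matching in this example.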
In this embodiment, after the tag-set-based similarity is obtained, it may be merged with the vision-based similarity to obtain the merged similarity of the input image and the comparison image.
For example, the tag-set-based similarity and the vision-based similarity may each be weighted and then added (or multiplied). Assuming the tag-set-based similarity is 1/7, the vision-based similarity is 1/3, and the weights are 3 and 2 respectively, the merged similarity may be: (1/7) × 3 + (1/3) × 2 = 23/21.
The above description is only for illustrative purposes of how to calculate the merging similarity, but is not limited thereto, and for example, an existing calculation method may be adopted, and a specific implementation may be determined according to actual situations.
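The weighted-sum variant of the merging calculation can be sketched as follows; the default weights are the ones from the example and are illustrative, not prescribed.

```python
def merged_similarity(tag_sim, visual_sim, w_tag=3.0, w_vis=2.0):
    """Merge tag-set-based and vision-based similarity as a weighted sum.

    The patent leaves the exact combination open (weighted sum or
    product); this sketch implements the weighted sum of the example.
    """
    return tag_sim * w_tag + visual_sim * w_vis
```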
In this embodiment, after the merged similarity is obtained from the tag-set-based similarity and the vision-based similarity, updating the tag set of the input image based on the merged similarity may specifically include: if the merged similarity is greater than a preset value, counting the number of times each tag appears in the tag set of the comparison image; and adjusting the tags in the tag set of the input image according to the statistics, so as to update the tag set of the input image.
In the embodiment, the similarity based on the label set and the similarity based on the vision can be combined, so that the low-level features and the high-level semantics of the image can be considered simultaneously, and the accuracy of image annotation is improved. To improve accuracy, multiple iterations may be performed.
FIG. 4 is a flowchart of an image annotation method according to an embodiment of the invention. As shown in fig. 4, the image annotation method includes:
step 401, obtaining an initial label set comprising a plurality of labels for an input image, wherein the accuracy of representing the semantics of the input image is determined according to the plurality of labels;
step 402, calculating the similarity based on the label set between the label set of the input image and the label set of the comparison image stored in the database;
step 403, merging and calculating the similarity based on the label set and the similarity based on the vision to obtain the merged similarity of the input image and the comparison image;
step 404, judging whether the merged similarity is greater than a preset value; if so, executing step 405, otherwise executing step 406;
step 405, counting the number of times each label appears in the label set of the comparison image, and adjusting the labels in the label set of the input image according to the statistics so as to update the label set;
step 406, judging whether a preset condition has been reached;
if the preset condition is not reached, selecting another comparison image from the database and returning to step 402; if the preset condition is reached, ending the image annotation process.
In this embodiment, when selecting the comparison image, only one comparison image may be selected. Multiple comparison images can also be selected; in step 405, statistics may be performed based on the label sets of the plurality of comparison images, and thus the results may be accumulated, further improving accuracy.
In this embodiment, the step 406 of determining whether the preset condition is reached may specifically include: judging whether the label set of the input image is the same as or similar to the label set of the previous iteration; if the two are the same or similar, the label set of the input image reaches a stable state, and the preset condition is determined to be reached. The preset condition may also be a preset number of iterations or an iteration time, and the specific iteration condition may be determined according to an actual situation.
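The stopping rule of step 406 can be written as a small predicate; the iteration budget `max_iterations` is one of the alternative preset conditions mentioned above, and the function name is an illustrative choice.

```python
def reached_preset_condition(tag_set, prev_tag_set, iteration, max_iterations):
    """True when the tag set is stable across iterations (same tags as the
    previous round) or when the iteration budget is exhausted."""
    return set(tag_set) == set(prev_tag_set) or iteration >= max_iterations
```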
FIG. 5 is a schematic diagram of an iterative process of an image annotation method according to an embodiment of the present invention. As shown in fig. 5, the input image is labeled with a label set having a plurality of labels, and the weight of the label may correspond to a statistical histogram, which represents the accuracy of the semantics of the input image. A plurality of comparison images may be selected based on the similarity based on the tagset and the similarity based on the vision, votes may be cast based on the tagsets of the comparison images, and the tagset of the input image may be updated. And the updating process can be iterated for multiple times, so that the accuracy of the tag set is further improved.
How the tag set is updated is further explained by example below. Suppose the tag set of the input image is { gate, golden, bridge, sights }. After the merged similarity is calculated, 5 comparison images with merged similarity greater than the preset value are obtained, and their label sets are, respectively: { bridge, 2006, favorite }, { sanfrancisco, bridge, golden }, { bridge, traffic, sanfrancisco, gate }, { bridge, golden, sanfrancisco, favorite }, { gate, bridge, favorite }.
The number of occurrences of each label can then be counted over the label sets of these 5 images. The statistics are shown in the following table:
TABLE 2
Label          Count
bridge         5
favorite       3
sanfrancisco   3
gate           2
golden         2
2006           1
traffic        1
The 4 tags with the most votes can be selected as the new tags to update the tag set of the input image, i.e. the tag set of the input image is updated to { bridge, favorite, sanfrancisco, gate }. The above explains only one iteration; the process may be iterated multiple times.
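The vote-counting update of this example can be sketched as follows. Note that gate and golden are tied at 2 votes, so the fourth selected tag depends on the tie-breaking rule; this sketch breaks ties by first occurrence.

```python
from collections import Counter

def update_tag_set(comparison_tagsets, n_tags=4):
    """Count how often each tag occurs in the tag sets of the comparison
    images whose merged similarity exceeded the preset value, then keep
    the n_tags most-voted tags (ties broken by first occurrence)."""
    votes = Counter()
    for tags in comparison_tagsets:
        votes.update(tags)
    return [tag for tag, _ in votes.most_common(n_tags)]
```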
In addition, in this embodiment, besides being used to calculate the tag similarity, the tag set obtained in the previous round of the iteration may also participate in the process of obtaining the new tags. Moreover, the vote counts can be normalized, further improving the accuracy of the update.
For example, as shown in fig. 3, the 4 tags with the most votes are the initial tags { gate, golden, bridge, sights }, with 4, 3, 3, and 2 votes respectively. The vote counts may be normalized, for example so that their sum is 1; the normalized result is: {4, 3, 3, 2}/12 = {0.33, 0.25, 0.25, 0.17}.
The merged similarity may then be calculated based on the tag set { gate, golden, bridge, sights } and the vision-based similarity. According to the merged similarity, 5 comparison images with merged similarity greater than the preset value are obtained, and their label sets are, respectively: { bridge, 2006, favorite, ca }, { sanfrancisco, bridge, golden }, { bridge, traffic, gate }, { bridge, golden, sanfrancisco, favorite }, { gate, bridge, favorite }.
Then in the tag set of these 5 images, the number of occurrences of each tag can be counted as shown in the following table:
TABLE 3
Label          Count
bridge         5
favorite       3
sanfrancisco   2
gate           2
golden         2
2006           1
traffic        1
ca             1
Similarly, the vote counts can be normalized so that they sum to 1: {5, 3, 2, 2, 2, 1, 1, 1}/17 = {0.294, 0.177, 0.118, 0.118, 0.118, 0.059, 0.059, 0.059}. The current vote counts and the previous vote counts can then be added with different weights. Assuming the weight of the previous vote counts is a (0 < a < 1) and the weight of this round's vote counts is 1 - a, the results shown in the following table can be obtained:
TABLE 4
Label          Previous votes   This-round votes   Sum (a = 0.5)
gate           0.33             0.118              0.224
golden         0.25             0.118              0.184
bridge         0.25             0.294              0.272
sights         0.17             0                  0.085
favorite       0                0.177              0.0885
sanfrancisco   0                0.118              0.0590
2006           0                0.059              0.0295
traffic        0                0.059              0.0295
ca             0                0.059              0.0295
Thus, the four tags with the largest results can be selected, { gate, golden, bridge, favorite }, to update the tag set of the input image. Meanwhile, the weights of these 4 tags can be normalized: from {0.224, 0.184, 0.272, 0.0885}, the normalized weights {0.29, 0.24, 0.35, 0.12} are obtained for use in the next round of the iteration.
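The cross-round accumulation above can be sketched as follows; the dictionary bookkeeping and the function name are illustrative assumptions.

```python
def accumulate_votes(prev_weights, round_votes, a=0.5):
    """Blend the previous round's normalized tag weights with this round's
    votes: score = a * previous + (1 - a) * current, with the current
    votes first normalized to sum to 1 (tags absent in a round count 0)."""
    total = sum(round_votes.values())
    current = {tag: count / total for tag, count in round_votes.items()}
    tags = set(prev_weights) | set(current)
    return {tag: a * prev_weights.get(tag, 0.0)
                 + (1 - a) * current.get(tag, 0.0)
            for tag in tags}
```

With the Table 4 data and a = 0.5, the four highest scores belong to gate, golden, bridge, and favorite, matching the updated tag set above.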
According to the embodiment, the similarity based on the label set and the similarity based on the vision are combined, so that the low-level features and the high-level semantics of the image can be considered at the same time, and the accuracy of image annotation is improved; and moreover, automatic labeling of the label is realized, and the labeling efficiency is improved.
An embodiment of the present invention further provides an image annotation apparatus, and fig. 6 is a schematic diagram of a structure of the image annotation apparatus in the embodiment of the present invention. As shown in fig. 6, the image labeling apparatus includes: an initializer 601, a relation calculator 602, a merging calculator 603 and a tag set updater 604; wherein,
the initializer 601 acquires an initial label set including a plurality of labels for the input image, wherein the accuracy of representing the semantics of the input image is determined according to the plurality of labels;
the relation calculator 602 calculates a similarity based on the tag set between the tag set of the input image and the tag set of the comparison image stored in the database;
the merging calculator 603 performs merging calculation on the similarity based on the tag set and the similarity based on the vision to obtain a merged similarity of the input image and the comparison image;
the label set updater 604 updates the label set of the input image based on the merged similarity.
In particular, the initializer 601 may be specifically configured to: randomly assigning an initial tag set; or calculating a vision-based similarity of the input image and a comparison image stored in a database; an initial set of labels for the input image is obtained based on the vision-based similarity.
In one embodiment, the relationship calculator 602 may specifically include: a set calculator and a relationship determiner. The set calculator calculates the ratio of the intersection of the tag set of the input image and the tag set of the comparison image to the union of the two tag sets; the relationship determiner determines the tag-set-based similarity between the tag set of the input image and the tag set of the comparison image stored in the database based on the calculated ratio.
In another embodiment, the relationship calculator 602 may specifically include: a distance calculator and a relationship determiner. Wherein the distance calculator calculates a semantic distance between the set of labels of the input image and the set of labels of the comparison image; a relationship determiner determines a tag set-based similarity between the tag set of the input image and the tag set of the comparison image stored in the database according to the calculated semantic distance.
In a specific implementation, the tag set updater 604 may specifically include: a counter and a tag adjuster. The counter counts the number of times each tag appears in the tag set of the comparison image when the merged similarity obtained by the merging calculator 603 is greater than a preset value; and the tag adjuster adjusts the tags in the tag set of the input image according to the statistics, so as to update the tag set of the input image.
FIG. 7 is a schematic diagram of another configuration of the image annotation device in the embodiment of the invention. As shown in fig. 7, the image labeling apparatus includes: an initializer 701, a relation calculator 702, a merging calculator 703 and a tag set updater 704; as mentioned above, no further description is provided herein.
As shown in fig. 7, the image annotation apparatus may further include: a condition determiner 705 and an image selector 706; the condition judger 705 is configured to judge whether a preset condition is reached; when the condition determiner 705 determines that the preset condition is not reached, the image selector 706 selects another comparative image from the database. Also, the relationship calculator 702 is further configured to calculate a similarity based on the tag set between the tag set of the input image and the tag sets of the other comparison images.
In one embodiment, the condition decider 705 may be specifically configured to: judging whether the label set of the input image is the same as or similar to the label set of the previous iteration; if the two are the same or similar, the label set of the input image reaches a stable state and is determined to reach a preset condition.
According to the embodiment, the similarity based on the label set and the similarity based on the vision are combined, so that the low-level features and the high-level semantics of the image can be considered at the same time, and the accuracy of image annotation is improved; and moreover, automatic labeling of the label is realized, and the labeling efficiency is improved.
It will be further appreciated by those of ordinary skill in the art that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in functional terms in the foregoing description for the purpose of clearly illustrating the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
With regard to the embodiments including the above embodiments, the following remarks are also disclosed.
(supplementary note 1) an image annotation method comprising:
obtaining an initial tag set comprising a plurality of tags for an input image, wherein the accuracy with which the semantics of the input image are represented is determined from the plurality of tags;
calculating a tagset-based similarity between the tagset of the input image and the tagsets of the comparison images stored in the database;
performing a merging calculation on the similarity based on the label set and the similarity based on vision to obtain a merged similarity of the input image and the comparison image;
updating the label set of the input image based on the merged similarity.
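The merging calculation above is not tied to a particular formula in these notes; one straightforward realization (an assumption here, not the patent's stated method) is a convex combination of the two similarities, with a purely illustrative weighting parameter `alpha`:

```python
def merged_similarity(label_sim: float, visual_sim: float, alpha: float = 0.5) -> float:
    """Merge a label-set-based similarity with a vision-based similarity.

    `alpha` is a hypothetical weight (not specified in the patent);
    alpha = 1.0 would use only the label-set-based similarity.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * label_sim + (1.0 - alpha) * visual_sim
```

With equal weights, an input/comparison pair with label-set similarity 0.8 and visual similarity 0.4 would receive a merged similarity of 0.6.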
(supplementary note 2) The image annotation method according to supplementary note 1, further comprising:
judging whether a preset condition is reached;
if the preset condition is not reached, selecting another comparison image from the database, and calculating the label-set-based similarity between the label set of the input image and the label set of the other comparison image.
(supplementary note 3) The image annotation method according to supplementary note 2, wherein the step of judging whether the preset condition is reached specifically comprises:
judging whether the label set of the input image is the same as or similar to the label set from the previous iteration; and if the two are the same or similar, determining that the preset condition is reached.
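A minimal sketch of the convergence test of supplementary note 3, assuming "similar" means that the overlap between the label sets of consecutive iterations exceeds a hypothetical threshold (the notes leave the criterion unspecified):

```python
def has_converged(current: set[str], previous: set[str], threshold: float = 0.9) -> bool:
    """Return True when the label set has stabilized across iterations.

    `threshold` is a hypothetical cutoff on the Jaccard overlap of the
    two sets; identical sets always count as converged.
    """
    if current == previous:
        return True
    union = current | previous
    if not union:  # both empty: treat as converged
        return True
    overlap = len(current & previous) / len(union)
    return overlap >= threshold
```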
(supplementary note 4) The image annotation method according to supplementary note 1 or 2, wherein calculating the label-set-based similarity between the label set of the input image and the label set of a comparison image stored in the database specifically comprises:
calculating the ratio of the intersection of the label set of the input image and the label set of the comparison image to the union of the two label sets;
determining the label-set-based similarity between the label set of the input image and the label set of the comparison image stored in the database according to the obtained ratio.
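The intersection-to-union ratio of supplementary note 4 is the Jaccard index; a minimal sketch (the empty-set convention is an assumption, since the notes do not address that edge case):

```python
def jaccard_similarity(labels_a: set[str], labels_b: set[str]) -> float:
    """Ratio of the intersection to the union of two label sets.

    Returns 0.0 when both sets are empty (a conventional choice not
    stated in the patent).
    """
    union = labels_a | labels_b
    if not union:
        return 0.0
    return len(labels_a & labels_b) / len(union)
```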
(supplementary note 5) The image annotation method according to supplementary note 1 or 2, wherein calculating the label-set-based similarity between the label set of the input image and the label set of a comparison image stored in the database specifically comprises:
calculating the semantic distance between the label set of the input image and the label set of the comparison image;
and determining the label-set-based similarity between the label set of the input image and the label set of the comparison image stored in the database according to the obtained semantic distance.
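Supplementary note 5 leaves the semantic distance unspecified; a common realization (an assumption here) averages pairwise word distances, for example WordNet path distances or embedding distances, and maps the result to a similarity. The distance table below is a purely illustrative stand-in for such a resource:

```python
from itertools import product

# Hypothetical pairwise word distances (stand-ins for WordNet path
# distances or embedding distances); symmetric by construction.
WORD_DISTANCE = {
    frozenset({"sea", "ocean"}): 0.1,
    frozenset({"sea", "car"}): 0.9,
    frozenset({"ocean", "car"}): 0.9,
}

def word_distance(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return WORD_DISTANCE.get(frozenset({a, b}), 1.0)  # unknown pairs: maximal

def semantic_distance(labels_a: set[str], labels_b: set[str]) -> float:
    """Average pairwise word distance between the two label sets."""
    if not labels_a or not labels_b:
        return 1.0  # maximal distance for an empty set (a convention)
    pairs = list(product(labels_a, labels_b))
    return sum(word_distance(a, b) for a, b in pairs) / len(pairs)

def semantic_similarity(labels_a: set[str], labels_b: set[str]) -> float:
    """Map the semantic distance into a [0, 1] similarity."""
    return 1.0 - semantic_distance(labels_a, labels_b)
```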
(supplementary note 6) The image annotation method according to supplementary note 1 or 2, wherein acquiring the initial label set for the input image specifically comprises: randomly assigning the initial label set; or
calculating a vision-based similarity between the input image and a comparison image stored in the database, and obtaining the initial label set according to the vision-based similarity.
(supplementary note 7) The image annotation method according to supplementary note 1 or 2, wherein updating the label set of the input image based on the merged similarity specifically comprises:
if the merged similarity is larger than a preset value, counting the number of times each label appears in the label set of the comparison image;
and adjusting the labels in the label set of the input image according to the counting result, so as to update the label set of the input image.
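A sketch of the update step of supplementary note 7, assuming the updated label set keeps the most frequent labels drawn from comparison images whose merged similarity exceeds the preset value; `threshold` and `top_k` are hypothetical parameters, not values given in the patent:

```python
from collections import Counter

def update_label_set(
    comparisons: list[tuple[float, set[str]]],  # (merged similarity, label set)
    threshold: float = 0.6,
    top_k: int = 5,
) -> set[str]:
    """Count labels over sufficiently similar comparison images and keep
    the `top_k` most frequent ones as the updated label set."""
    counts: Counter[str] = Counter()
    for similarity, labels in comparisons:
        if similarity > threshold:
            counts.update(labels)
    return {label for label, _ in counts.most_common(top_k)}
```

For example, with comparison images labeled {sky, sea}, {sea}, and {car} at merged similarities 0.9, 0.8, and 0.2, only the first two pass the threshold and "sea" is the most frequent surviving label.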
(supplementary note 8) An image annotation apparatus, comprising:
an initializer for obtaining an initial label set for an input image, the label set comprising a plurality of labels, wherein the accuracy with which the semantics of the input image are represented is determined by the plurality of labels;
a relation calculator that calculates a label-set-based similarity between the label set of the input image and the label set of a comparison image stored in a database;
a merging calculator that merges the label-set-based similarity and a vision-based similarity to obtain a merged similarity between the input image and the comparison image; and
a label set updater that updates the label set of the input image based on the merged similarity.
(supplementary note 9) The image annotation apparatus according to supplementary note 8, further comprising:
a condition judger for judging whether a preset condition is reached;
an image selector for selecting another comparison image from the database if the preset condition is not reached;
and the relation calculator is further configured to calculate the label-set-based similarity between the label set of the input image and the label set of the other comparison image.
(supplementary note 10) The image annotation apparatus according to supplementary note 9, wherein the condition judger is configured to: judge whether the label set of the input image is the same as or similar to the label set from the previous iteration; and, if the two are the same or similar, determine that the preset condition is reached.
(supplementary note 11) The image annotation apparatus according to supplementary note 8 or 9, wherein the relation calculator specifically comprises:
a set calculator that calculates the ratio of the intersection of the label set of the input image and the label set of the comparison image to the union of the two label sets;
and a relation determiner that determines the label-set-based similarity between the label set of the input image and the label set of the comparison image stored in the database according to the obtained ratio.
(supplementary note 12) The image annotation apparatus according to supplementary note 8 or 9, wherein the relation calculator specifically comprises:
a distance calculator that calculates the semantic distance between the label set of the input image and the label set of the comparison image;
and a relation determiner that determines the label-set-based similarity between the label set of the input image and the label set of the comparison image stored in the database according to the obtained semantic distance.
(supplementary note 13) The image annotation apparatus according to supplementary note 8 or 9, wherein the initializer is specifically configured to: randomly assign the initial label set; or
calculate a vision-based similarity between the input image and a comparison image stored in the database, and obtain the initial label set according to the vision-based similarity.
(supplementary note 14) The image annotation apparatus according to supplementary note 8 or 9, wherein the label set updater specifically comprises:
a frequency counter for counting the number of times each label appears in the label set of the comparison image if the merged similarity is larger than a preset value;
and a label adjuster for adjusting the labels in the label set of the input image according to the counting result, so as to update the label set of the input image.

Claims (10)

1. An image annotation method, comprising:
obtaining, for an input image, an initial label set comprising a plurality of labels, wherein the accuracy with which the semantics of the input image are represented is determined by the plurality of labels;
calculating a label-set-based similarity between the label set of the input image and the label set of a comparison image stored in a database;
merging the label-set-based similarity and a vision-based similarity to obtain a merged similarity between the input image and the comparison image; and
updating the label set of the input image based on the merged similarity.
2. The image annotation method according to claim 1, further comprising, after updating the label set of the input image based on the merged similarity:
judging whether a preset condition is reached;
if the preset condition is not reached, selecting another comparison image from the database, and calculating the label-set-based similarity between the label set of the input image and the label set of the other comparison image.
3. The image annotation method according to claim 2, wherein the step of judging whether the preset condition is reached specifically comprises: judging whether the label set of the input image is the same as or similar to the label set from the previous iteration; and if the two are the same or similar, determining that the preset condition is reached.
4. The image annotation method according to claim 1 or 2, wherein calculating the label-set-based similarity between the label set of the input image and the label set of the comparison image stored in the database specifically comprises:
calculating the ratio of the intersection of the label set of the input image and the label set of the comparison image to the union of the two label sets;
determining the label-set-based similarity between the label set of the input image and the label set of the comparison image stored in the database according to the obtained ratio.
5. The image annotation method according to claim 1 or 2, wherein calculating the label-set-based similarity between the label set of the input image and the label set of the comparison image stored in the database specifically comprises:
calculating the semantic distance between the label set of the input image and the label set of the comparison image;
and determining the label-set-based similarity between the label set of the input image and the label set of the comparison image stored in the database according to the obtained semantic distance.
6. The image annotation method according to claim 1 or 2, wherein obtaining the initial label set for the input image specifically comprises: randomly assigning the initial label set; or
calculating a vision-based similarity between the input image and a comparison image stored in the database, and obtaining the initial label set according to the vision-based similarity.
7. The image annotation method according to claim 1 or 2, wherein updating the label set of the input image based on the merged similarity specifically comprises:
if the merged similarity is larger than a preset value, counting the number of times each label appears in the label set of the comparison image;
and adjusting the labels in the label set of the input image according to the counting result, so as to update the label set of the input image.
8. An image annotation apparatus, comprising:
an initializer for obtaining an initial label set for an input image, the label set comprising a plurality of labels, wherein the accuracy with which the semantics of the input image are represented is determined by the plurality of labels;
a relation calculator that calculates a label-set-based similarity between the label set of the input image and the label set of a comparison image stored in a database;
a merging calculator that merges the label-set-based similarity and a vision-based similarity to obtain a merged similarity between the input image and the comparison image; and
a label set updater that updates the label set of the input image based on the merged similarity.
9. The image annotation apparatus according to claim 8, further comprising:
a condition judger for judging whether a preset condition is reached;
an image selector for selecting another comparison image from the database if the preset condition is not reached;
and the relation calculator is further configured to calculate the label-set-based similarity between the label set of the input image and the label set of the other comparison image.
10. The image annotation apparatus according to claim 8 or 9, wherein the label set updater specifically comprises:
a frequency counter for counting the number of times each label appears in the label set of the comparison image if the merged similarity is larger than a preset value;
and a label adjuster for adjusting the labels in the label set of the input image according to the counting result, so as to update the label set of the input image.
CN201110197235.0A 2011-07-14 2011-07-14 Image annotation method and device thereof Active CN102880612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110197235.0A CN102880612B (en) 2011-07-14 2011-07-14 Image annotation method and device thereof

Publications (2)

Publication Number Publication Date
CN102880612A (en) 2013-01-16
CN102880612B (en) 2015-05-06

Family

ID=47481940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110197235.0A Active CN102880612B (en) 2011-07-14 2011-07-14 Image annotation method and device thereof

Country Status (1)

Country Link
CN (1) CN102880612B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035916B (en) * 2013-03-07 2017-05-24 富士通株式会社 Method and device for standardizing annotation tool
CN103218460B (en) * 2013-05-14 2016-08-10 清华大学 Image tag complementing method based on the sparse reconstruct of optimum linearity
CN103810274B (en) * 2014-02-12 2017-03-29 北京联合大学 Multi-characteristic image tag sorting method based on WordNet semantic similarities
FR3030846B1 (en) * 2014-12-23 2017-12-29 Commissariat Energie Atomique SEMANTIC REPRESENTATION OF THE CONTENT OF AN IMAGE
JP6402653B2 (en) * 2015-03-05 2018-10-10 オムロン株式会社 Object recognition device, object recognition method, and program
US10002136B2 (en) * 2015-07-27 2018-06-19 Qualcomm Incorporated Media label propagation in an ad hoc network
CN106250396B (en) * 2016-07-19 2021-09-03 厦门雅迅网络股份有限公司 Automatic image label generation system and method
CN107766853B (en) * 2016-08-16 2021-08-06 阿里巴巴集团控股有限公司 Image text information generation and display method and electronic equipment
CN107818160A (en) * 2017-10-31 2018-03-20 上海掌门科技有限公司 Expression label updates and realized method, equipment and the system that expression obtains
CN110033018B (en) * 2019-03-06 2023-10-31 平安科技(深圳)有限公司 Graph similarity judging method and device and computer readable storage medium
CN111797653B (en) * 2019-04-09 2024-04-26 华为技术有限公司 Image labeling method and device based on high-dimensional image
CN110069647B (en) * 2019-05-07 2023-05-09 广东工业大学 Image tag denoising method, device, equipment and computer readable storage medium
CN113408633B (en) * 2021-06-29 2023-04-18 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1150215A2 (en) * 2000-04-28 2001-10-31 Canon Kabushiki Kaisha A method of annotating an image
CN1936892A (en) * 2006-10-17 2007-03-28 浙江大学 Image content semanteme marking method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jing Liu et al. "An Adaptive Graph Model for Automatic Image Annotation." Proceedings of MIR '06: the 8th ACM International Workshop on Multimedia Information Retrieval, 2006. *
Liu Shuoyan et al. "An Algorithm for Generating Visual Words of Image Patches Based on Contextual Semantic Information." Acta Electronica Sinica, 2010-05-25, full text. *


Similar Documents

Publication Publication Date Title
CN102880612B (en) Image annotation method and device thereof
WO2022126971A1 (en) Density-based text clustering method and apparatus, device, and storage medium
CN110209808B (en) Event generation method based on text information and related device
CN105022754B (en) Object classification method and device based on social network
Yasmin et al. Content based image retrieval by shape, color and relevance feedback
CN109508374B (en) Text data semi-supervised clustering method based on genetic algorithm
CN103902597A (en) Method and device for determining search relevant categories corresponding to target keywords
CN110737788B (en) Rapid three-dimensional model index establishing and retrieving method
CN109145143A (en) Sequence constraints hash algorithm in image retrieval
CN106997379A (en) A kind of merging method of the close text based on picture text click volume
CN111708942B (en) Multimedia resource pushing method, device, server and storage medium
CN111797267A (en) Medical image retrieval method and system, electronic device and storage medium
CN107391594A (en) A kind of image search method based on the sequence of iteration vision
CN102760127B (en) Method, device and the equipment of resource type are determined based on expanded text information
CN109657695A (en) A kind of fuzzy division clustering method and device based on definitive operation
JP6017277B2 (en) Program, apparatus and method for calculating similarity between contents represented by set of feature vectors
CN112417091A (en) Text retrieval method and device
CN111930885A (en) Method and device for extracting text topics and computer equipment
CN104123382B (en) A kind of image set abstraction generating method under Social Media
CN114943285B (en) Intelligent auditing system for internet news content data
CN114168733B (en) Rule retrieval method and system based on complex network
CN107609006B (en) Search optimization method based on local log research
CN112487782B (en) Article popularity calculation method based on similar quantity of articles
CN112395408B (en) Stop word list generation method and device, electronic equipment and storage medium
WO2021017638A2 (en) Method for determining similarity of any two technology systems

Legal Events

Code Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant