CN113255722A - Image annotation method and device - Google Patents

Image annotation method and device

Info

Publication number
CN113255722A
CN113255722A
Authority
CN
China
Prior art keywords
image
training
training image
fine
marked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110396659.3A
Other languages
Chinese (zh)
Inventor
鲍一平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202110396659.3A
Publication of CN113255722A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Abstract

The invention provides an image annotation method and device, comprising the following steps: acquiring training images, wherein the training images are labeled with coarse-grained categories; performing similarity calculation between an unlabeled training image among the training images and a target labeled training image to obtain a similarity value, wherein the target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image; and determining the fine-grained category of the unlabeled training image according to the similarity value. In the invention, only a small number of training images need to be labeled with fine-grained categories in the early stage, and the fine-grained categories of the other unlabeled training images can be obtained quickly and accurately in an unsupervised manner, so that labeling accuracy and efficiency are effectively improved at lower cost.

Description

Image annotation method and device
Technical Field
The invention belongs to the technical field of image annotation, and particularly relates to an image annotation method and device.
Background
Fine-grained image classification uses a classification model to perform more detailed sub-category classification on top of the basic categories of image content, such as distinguishing the species of birds, the models of vehicles, or the breeds of dogs in images. Image classification currently has wide business demand and application scenarios in industry and in daily life.
At present, before a classification model is trained, sample images must be labeled with fine sub-categories. Manual labeling is usually adopted: an annotator subjectively analyzes each sample image and labels its sub-category.
However, in this labeling mode an annotator needs to be able to distinguish hundreds of similar object types, which places high demands on the annotator, and mislabeling or missed labels easily occur. As a result, labeling cost is high while labeling efficiency and accuracy are low.
Disclosure of Invention
The invention provides an image labeling method and device, which are used for solving the problems of high labeling cost, low labeling efficiency and low labeling accuracy in the prior art.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides an image annotation method, where the image annotation method includes:
acquiring a training image, wherein the training image is labeled with a coarse-grained category;
performing similarity calculation on an unlabeled training image among the training images and a target labeled training image to obtain a similarity value, wherein the target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image;
and determining the fine-grained category of the unlabeled training image according to the similarity value.
In a second aspect, an embodiment of the present invention provides an image annotation apparatus, including:
a first determining module, used for acquiring a training image, wherein the training image is labeled with a coarse-grained category;
a similarity module, used for performing similarity calculation on an unlabeled training image among the training images and a target labeled training image to obtain a similarity value, wherein the target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image;
and a labeling module, used for determining the fine-grained category of the unlabeled training image according to the similarity value.
In a third aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when being executed by a processor, implements the steps of the image annotation method described above.
In a fourth aspect of the embodiments of the present invention, an electronic device is provided, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, and when the computer program is executed by the processor, the steps of the image annotation method described above are implemented.
In an embodiment of the present invention, the method comprises: acquiring training images labeled with coarse-grained categories; performing similarity calculation between an unlabeled training image among the training images and a target labeled training image to obtain a similarity value, wherein the target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image; and determining the fine-grained category of the unlabeled training image according to the similarity value. In the invention, based on the division into coarse-grained and fine-grained categories, only a small number of training images need to be labeled with fine-grained categories in the early stage, and the fine-grained categories of the other unlabeled training images can be obtained quickly and accurately in an unsupervised manner, so that labeling accuracy and efficiency are effectively improved at lower cost.
Drawings
FIG. 1 is a schematic diagram illustrating steps of an image annotation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating specific steps of an image annotation method according to an embodiment of the present invention;
FIG. 3 is a block diagram of an image annotation apparatus according to an embodiment of the present invention;
Fig. 4 is a block diagram of an apparatus provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart illustrating steps of an image annotation method according to an embodiment of the present invention, as shown in fig. 1, the method may include:
101, acquiring a training image, wherein the training image is marked with a coarse-grained type.
Image classification means inputting an image into a deep learning model so that the model outputs the accurate category of the image. For the deep learning model to have this capability, it must be trained with training data.
Training data consist of a large number of pre-collected training images together with the ground-truth category label assigned to each. The labeled training images are input into the deep learning model; a preset loss function measures the degree of difference between the model's output and the ground-truth labels of the training images, this difference drives the optimization of the model's parameters, and the model is trained over multiple rounds of iteration until the difference falls below a small threshold, at which point it has the function of image classification. Accurate and efficient labeling of massive numbers of training images during the training process is therefore an important subject of current research.
In the embodiment of the present invention, the classes of the training images may be divided into coarse-grained and fine-grained categories. A coarse-grained category has a higher level and wider coverage, such as "sports" or "animals"; a fine-grained category belongs to a corresponding coarse-grained category and has a lower level and narrower coverage. For example, the coarse-grained category "sports" may include fine-grained categories such as "football", "basketball" and "running", and the coarse-grained category "animals" may include fine-grained categories such as "cat", "dog" and "duck". Further, a fine-grained category can itself serve as a coarse-grained category containing still finer categories: taking the fine-grained category "cat" as a coarse-grained category, it may include finer categories such as "bos cat", "blue cat", "egypt cat", and so on. This yields a tree structure over the coarse-grained and fine-grained classes.
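The tree structure described above can be sketched in a few lines of Python. The category names follow the examples in this paragraph; the dictionary representation and the helper name are assumptions for illustration, not part of the invention.

```python
# Sketch of the coarse-grained -> fine-grained category tree described above.
# Category names follow the examples in the text; the structure is illustrative.
category_tree = {
    "sports": ["football", "basketball", "running"],
    "animals": ["cat", "dog", "duck"],
    # A fine-grained category can itself act as a coarse-grained category
    # with still finer categories beneath it:
    "cat": ["bos cat", "blue cat", "egypt cat"],
}

def fine_categories(coarse):
    """Return the fine-grained categories directly under a coarse-grained one."""
    return category_tree.get(coarse, [])
```

Each key is a parent node; deeper levels are expressed by reusing a fine-grained category as a key, matching the recursive tree relationship in the text.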
Because labeling coarse-grained categories is comparatively easy, all training images can first be assigned to their corresponding coarse-grained categories; that is, one coarse-grained category can be labeled for each training image, either manually or by machine.
And 102, carrying out similarity calculation on the unmarked training images in the training images and the target marked training images to obtain a similarity value.
The target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image.
In the embodiment of the present invention, after the coarse-grained category of each training image is labeled, a corresponding template may be selected for each fine-grained category included in each coarse-grained category, so that the template accurately reflects the characteristics of that fine-grained category. That is, for each fine-grained category, at least one training image is chosen from the training images as the template of the fine-grained category, and the selected training image together with its fine-grained category forms a labeled training image. The labeled training images may occupy only a small portion of all the training images, such as 5% or 10%, so generating them consumes few computing resources and little cost.
In addition, it should basically be ensured that each fine-grained category corresponds to at least one training image; where budget and computing resources are sufficient, each fine-grained category can correspond to several training images.
For example, for the fine-grained categories "cat", "dog" and "duck" included in the coarse-grained category "animals": at least one training image whose content is a cat can be set as the template for the fine-grained category "cat"; at least one training image whose content is a dog can be set as the template for the fine-grained category "dog"; and at least one training image whose content is a duck can be set as the template for the fine-grained category "duck".
Specifically, after the labeled training images corresponding to each fine-grained category are determined, a large number of unlabeled training images remain. At this point, similarity calculation can be performed between an unlabeled training image and all target labeled training images under the same coarse-grained category, where the target labeled training images are the training images serving as templates of the fine-grained categories. The larger the similarity value, the higher the probability that the unlabeled training image matches the fine-grained category of the target labeled training image; the smaller the similarity value, the lower that probability. The similarity value between the unlabeled training image and a target labeled training image can be calculated as the similarity value between their image features.
For example, if the coarse-grained category of an unlabeled image is "animals", similarity calculation can be performed between the unlabeled image and the labeled training images corresponding to the fine-grained categories "cat", "dog" and "duck" included in "animals", obtaining a similarity value for each. Assuming the unlabeled image depicts a cat, its similarity to the labeled training image corresponding to the fine-grained category "cat" will be the largest.
And 103, determining the fine-grained category of the unmarked training image according to the similarity value.
In the embodiment of the present invention, the target labeled training image with the largest similarity value to the unlabeled training image can be considered the image closest in content to the unlabeled training image. Therefore, the fine-grained category of that target labeled training image, that is, the fine-grained category best matching the unlabeled training image, can preferably be used as the labeled category of the unlabeled training image. This is repeated until all the training images are labeled.
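The assignment rule in this step, taking the fine-grained category of the most similar template, might be sketched as follows. The function names, the use of cosine similarity over feature vectors, and the template dictionary are illustrative assumptions; the patent does not fix a specific implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (larger = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def annotate(unlabeled_feat, templates):
    """Assign the fine-grained category of the most similar template.

    `templates` maps fine-grained category -> list of template feature
    vectors, all sharing the unlabeled image's coarse-grained category.
    """
    best_class, best_sim = None, float("-inf")
    for fine_class, feats in templates.items():
        for f in feats:
            s = cosine_similarity(unlabeled_feat, f)
            if s > best_sim:
                best_class, best_sim = fine_class, s
    return best_class

# Example: one hypothetical template feature per fine-grained category.
templates = {"cat": [[1.0, 0.0]], "dog": [[0.0, 1.0]]}
```

Only templates under the unlabeled image's own coarse-grained category are passed in, which is what keeps the search small compared with matching against every fine-grained category.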
To sum up, the image annotation method provided by the embodiment of the present invention includes: acquiring training images labeled with coarse-grained categories; performing similarity calculation between an unlabeled training image among the training images and a target labeled training image to obtain a similarity value, wherein the target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image; and determining the fine-grained category of the unlabeled training image according to the similarity value. Based on the division into coarse-grained and fine-grained categories, only a small number of training images need to be labeled with fine-grained categories in the early stage, and the fine-grained categories of the other unlabeled training images can be obtained quickly and accurately in an unsupervised manner, so that labeling accuracy and efficiency are effectively improved at lower cost.
Fig. 2 is a flowchart illustrating specific steps of an image annotation method according to an embodiment of the present invention, as shown in fig. 2, the method may include:
step 201, obtaining a training image, wherein the training image is marked with a coarse-grained type.
This step may specifically refer to step 101, which is not described herein again.
Step 202, extracting a first image feature of the unmarked training image through a preset target network model.
And 203, extracting second image characteristics of the marked training image through the target network model.
In the embodiment of the invention, images are high-dimensional data, and performing similarity calculation directly on image data is computationally difficult and resource-intensive. Therefore, the embodiment of the invention extracts image features through a target network model and performs similarity calculation on the lower-dimensional feature data: the first image features of the unlabeled training image and the second image features of the labeled training image are both extracted through a preset target network model.
Specifically, a feature is a property, or set of properties, that distinguishes one class of objects from other classes, and is data that can be extracted through measurement or processing. The main purpose of feature extraction is dimensionality reduction: projecting an original image sample into a low-dimensional feature space to obtain low-dimensional features that reflect the essence of the image sample or distinguish it from other samples.
Each training image has its own characteristics distinguishing it from other training images. Some are natural characteristics that can be perceived directly, such as brightness, edges, texture, and color; others are obtained by transformation or processing, such as moments, histograms, and principal components. In the embodiment of the present application, an image feature may be expressed as a feature vector, for example f = {x1, x2, ..., xn}. Common image feature extraction methods include: (1) geometric methods, which analyze texture features based on the theory of image texture elements; (2) model methods, which build a structural model of the image and use the model's parameters as texture features, for example a convolutional neural network model; (3) signal processing methods, which extract and match texture features using, for example, the gray-level co-occurrence matrix, autoregressive texture models, or wavelet transforms.
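As one concrete illustration of the classical, hand-crafted features mentioned above (the histogram in particular), a normalized gray-level histogram can be computed as follows. This is a sketch of one such feature, not the feature extractor the invention prescribes, which is a network model.

```python
def gray_histogram(pixels, bins=8):
    """Normalized gray-level histogram over pixel values 0..255.

    A classical signal-processing style feature: a low-dimensional
    vector summarizing the intensity distribution of an image.
    """
    hist = [0] * bins
    for p in pixels:
        # Map a 0..255 gray value to one of `bins` equal-width buckets.
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]  # normalize so the entries sum to 1
```

The resulting vector plays the role of f = {x1, x2, ..., xn} above and could be fed directly into a vector similarity measure.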
And 204, performing similarity calculation on the first image features of the unmarked training images and the second image features of the target marked training images to obtain the similarity value.
The target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image.
In this step, similarity calculation may be performed between the first image features of the unlabeled training image and the second image features of the target labeled training image, and the resulting similarity value may be used as the similarity value between the two images.
Specifically, since the first and second image features are vectors, their similarity can be reflected by the vector distance between them: the larger the vector distance, the smaller the similarity value; the smaller the vector distance, the larger the similarity value.
Optionally, the similarity value is any one of a cosine distance value and a Euclidean distance value between the first image feature and the second image feature; in a specific implementation, either may be adopted.
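The two distance options named here can be written directly over feature vectors. This is a minimal sketch assuming plain Python lists as feature vectors; with either distance, smaller values mean more similar images, consistent with the inverse relationship described above.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def euclidean_distance(a, b):
    """L2 distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine distance ignores vector magnitude and compares only direction, which is often preferred for network-extracted features; Euclidean distance is sensitive to magnitude as well.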
Optionally, the tree belonging relationship between the fine-grained category and the coarse-grained category may be obtained from a target knowledge graph.
In the embodiment of the invention, the classes of the training images can be divided into coarse-grained and fine-grained categories, where a coarse-grained category has a higher level and wider coverage, a fine-grained category belongs to a corresponding coarse-grained category and has a lower level and narrower coverage, and a tree structure is formed between the coarse-grained and fine-grained categories.
Specifically, a Knowledge Graph reflects the process and structure of knowledge development and covers the association relationships among various entities. The tree belonging relationship between fine-grained and coarse-grained categories required by the embodiment of the invention can be obtained by extracting the tree-structured relationships in the knowledge graph. Alternatively, the tree relationship between fine-grained and coarse-grained categories can be established independently according to the characteristics of the actual application scenario.
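Extracting the tree belonging relationship from a knowledge graph might look like the following sketch. The (child, relation, parent) triple form and the "is_a" relation name are assumptions for illustration; real knowledge graphs use their own schemas, and the entity names here echo the earlier examples.

```python
# Hypothetical knowledge-graph triples encoding category membership.
triples = [
    ("cat", "is_a", "animals"),
    ("dog", "is_a", "animals"),
    ("blue cat", "is_a", "cat"),
]

def children_of(parent):
    """Fine-grained categories listed under a coarse-grained parent node."""
    return [c for c, rel, p in triples if rel == "is_a" and p == parent]
```

Applying `children_of` recursively from a root category recovers the same tree structure that could otherwise be built by hand for a specific application scenario.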
Step 205, using the fine-grained class of the target labeled training image with the maximum similarity value with the unlabeled training image as the labeled class of the unlabeled training image.
In the embodiment of the present invention, the target labeled training image with the largest similarity value to the unlabeled training image can be considered the image closest in content to the unlabeled training image. Therefore, the fine-grained category of that target labeled training image, that is, the fine-grained category best matching the unlabeled training image, can preferably be used as the labeled category of the unlabeled training image. This is repeated until all the training images are labeled.
Optionally, step 205 may be followed by:
and step 206, taking the corresponding relation between the training image and the fine-grained category as training data, and training a deep learning model to obtain a target deep learning model.
In the embodiment of the present invention, classifying an image means inputting it into the target deep learning model so that the model outputs the accurate category of the image. For the target deep learning model to have this capability, the deep learning model must be trained with training data.
Training data consist of a large number of pre-collected training images together with the ground-truth category label assigned to each. The labeled training images are input into the deep learning model; a preset loss function measures the degree of difference between the model's output and the ground-truth labels, this difference drives the optimization of the model's parameters, and the model is trained over multiple rounds of iteration until the difference falls below a small threshold, yielding the target deep learning model with the function of image classification.
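The iterate-until-the-difference-is-small loop described here can be sketched generically. The gradient-descent update and the toy quadratic loss below are stand-ins, since the patent does not fix a model, loss function, or optimizer; only the convergence criterion mirrors the text.

```python
def train_until_converged(loss, grad, w, lr=0.1, threshold=1e-8, max_rounds=10000):
    """Update a parameter over multiple rounds until the change in loss
    falls below a small threshold, mirroring the training loop above."""
    prev = loss(w)
    for _ in range(max_rounds):
        w -= lr * grad(w)          # one round of parameter optimization
        cur = loss(w)
        if abs(prev - cur) < threshold:
            break                  # difference is below the small threshold
        prev = cur
    return w

# Toy stand-in for the classification loss, with its minimum at w = 3.
w_final = train_until_converged(lambda w: (w - 3) ** 2, lambda w: 2 * (w - 3), w=0.0)
```

In a real deep learning setting, `w` would be the network's parameter tensors, `loss` the preset loss over a batch of labeled training images, and `grad` computed by backpropagation.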
To sum up, the image annotation method provided by the embodiment of the present invention includes: acquiring training images labeled with coarse-grained categories; performing similarity calculation between an unlabeled training image among the training images and a target labeled training image to obtain a similarity value, wherein the target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image; and determining the fine-grained category of the unlabeled training image according to the similarity value. Based on the division into coarse-grained and fine-grained categories, only a small number of training images need to be labeled with fine-grained categories in the early stage, and the fine-grained categories of the other unlabeled training images can be obtained quickly and accurately in an unsupervised manner, so that labeling accuracy and efficiency are effectively improved at lower cost.
Fig. 3 is a block diagram of an image annotation apparatus according to an embodiment of the present invention, and as shown in fig. 3, the apparatus may include:
a first determining module 301, configured to obtain a training image, where the training image is labeled with a coarse-grained category.
A similarity module 302, configured to perform similarity calculation on an unlabeled training image among the training images and a target labeled training image to obtain a similarity value; the target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image;
optionally, the similarity module 302 includes:
and the similarity submodule is used for carrying out similarity calculation on the first image characteristics of the unmarked training image and the second image characteristics of the target marked training image to obtain the similarity value.
Optionally, the similarity module 302 may further include:
the first extraction submodule is used for extracting first image features of the unmarked training image through a preset target network model;
and the second extraction submodule is used for extracting second image characteristics of the marked training image through the target network model.
Optionally, the similarity value includes any one of a cosine distance value and a euclidean distance value between the first image feature and the second image feature.
And the labeling module 303 is configured to determine a fine-grained category of the unlabeled training image according to the similarity value.
Optionally, the labeling module 303 includes:
and the marking sub-module is used for taking the fine-grained category of the target marked training image with the maximum similarity value with the unmarked training image as the fine-grained category of the unmarked training image.
Optionally, the apparatus may further include:
and the training module is used for training the deep learning model by taking the corresponding relation between the training image and the fine-grained category as training data to obtain a target deep learning model.
Optionally, the apparatus may further include:
and the extraction module is used for acquiring the tree belonging relation between the fine granularity category and the coarse granularity category from the target knowledge graph.
To sum up, the image annotation apparatus provided by the embodiment of the present invention operates as follows: acquiring training images labeled with coarse-grained categories; performing similarity calculation between an unlabeled training image among the training images and a target labeled training image to obtain a similarity value, wherein the target labeled training image is a training image labeled with a fine-grained category, and the coarse-grained category of the unlabeled training image is the same as that of the target labeled training image; and determining the fine-grained category of the unlabeled training image according to the similarity value. Based on the division into coarse-grained and fine-grained categories, only a small number of training images need to be labeled with fine-grained categories in the early stage, and the fine-grained categories of the other unlabeled training images can be obtained quickly and accurately in an unsupervised manner, so that labeling accuracy and efficiency are effectively improved at lower cost.
In addition, an apparatus is further provided in an embodiment of the present invention, specifically referring to fig. 4, the apparatus 600 includes a processor 610, a memory 620, and a computer program stored in the memory 620 and capable of running on the processor 610, and when the computer program is executed by the processor 610, the computer program implements each process of the image annotation method embodiment in the foregoing embodiment, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the image annotation method embodiments described above and achieves the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the present invention also provides a computer program, which may be stored on a cloud or local storage medium. When executed by a computer or processor, the computer program performs the corresponding steps of the image annotation method and implements the corresponding modules of the image annotation apparatus according to the embodiments of the present invention.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described therein may still be modified, or some or all of their technical features equivalently replaced, without such modifications or substitutions departing from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An image annotation method, characterized in that the image annotation method comprises:
acquiring training images, wherein the training images are labeled with coarse-grained categories;
performing similarity calculation on an unlabeled training image among the training images and a target labeled training image to obtain a similarity value; wherein the target labeled training image is a training image among the training images that is labeled with a fine-grained category, and the unlabeled training image has the same coarse-grained category as the target labeled training image; and
determining a fine-grained category of the unlabeled training image according to the similarity value.
2. The method according to claim 1, wherein performing similarity calculation on the unlabeled training image among the training images and the target labeled training image to obtain a similarity value comprises:
performing similarity calculation on a first image feature of the unlabeled training image and a second image feature of the target labeled training image to obtain the similarity value.
3. The method of claim 2, further comprising:
performing feature extraction on the unlabeled training image through a target network model to obtain the first image feature; and
performing feature extraction on the target labeled training image through the target network model to obtain the second image feature.
4. The method according to claim 2 or 3, wherein the similarity value comprises any one of a cosine distance value and a Euclidean distance value between the first image feature and the second image feature.
5. The method according to any one of claims 1-4, wherein determining the fine-grained category of the unlabeled training image according to the similarity value comprises:
taking the fine-grained category of the target labeled training image having the largest similarity value with the unlabeled training image as the fine-grained category of the unlabeled training image.
6. The method according to any one of claims 1-5, wherein after determining the fine-grained category of the unlabeled training image according to the similarity value, the method further comprises:
training a deep learning model, using the correspondence between the training images and the fine-grained categories as training data, to obtain a target deep learning model.
7. The method according to any one of claims 1 to 6, further comprising:
acquiring the tree-structured affiliation between the fine-grained categories and the coarse-grained categories from a target knowledge graph.
8. An image annotation apparatus, characterized by comprising:
a first determining module, configured to acquire training images labeled with coarse-grained categories;
a similarity module, configured to perform similarity calculation on an unlabeled training image among the training images and a target labeled training image to obtain a similarity value; wherein the target labeled training image is a training image among the training images that is labeled with a fine-grained category, and the unlabeled training image has the same coarse-grained category as the target labeled training image; and
an annotation module, configured to determine a fine-grained category of the unlabeled training image according to the similarity value.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image annotation method according to any one of claims 1 to 7.
10. An electronic device, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the image annotation method according to any one of claims 1 to 7.
CN202110396659.3A 2021-04-13 2021-04-13 Image annotation method and device Pending CN113255722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110396659.3A CN113255722A (en) 2021-04-13 2021-04-13 Image annotation method and device


Publications (1)

Publication Number Publication Date
CN113255722A 2021-08-13

Family

ID=77220710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110396659.3A Pending CN113255722A (en) 2021-04-13 2021-04-13 Image annotation method and device

Country Status (1)

Country Link
CN (1) CN113255722A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549876A (en) * 2022-01-10 2022-05-27 上海明胜品智人工智能科技有限公司 Image processing method, device and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination