CN117788958A - Image labeling method, device, equipment and storage medium - Google Patents

Image labeling method, device, equipment and storage medium

Info

Publication number
CN117788958A
Authority
CN
China
Prior art keywords
image
category
text
vector
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410205526.7A
Other languages
Chinese (zh)
Inventor
周士博
黄俊嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruichi Laser Shenzhen Co ltd
Original Assignee
Ruichi Laser Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruichi Laser Shenzhen Co ltd filed Critical Ruichi Laser Shenzhen Co ltd
Priority to CN202410205526.7A
Publication of CN117788958A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of automatic labeling, and discloses an image labeling method, device, equipment, and storage medium. The image labeling method includes: performing category segmentation on an image to be annotated through a preset image segmentation model to obtain a plurality of target segmented images; obtaining the image feature vector corresponding to each target segmented image and the text feature vectors corresponding to the category texts in a preset category list; determining the vector cosine similarity between the image feature vectors and the text feature vectors; and obtaining an image labeling result corresponding to the image to be annotated based on the vector cosine similarity. By determining the vector cosine similarity between the image feature vector of each target segmented image and the text feature vectors of the category texts in the preset category list, and obtaining the image labeling result from that similarity, the invention solves the technical problems of high labeling cost and low labeling efficiency and accuracy for object categories in images in the prior art.

Description

Image labeling method, device, equipment and storage medium
Technical Field
The present invention relates to the field of automatic labeling technologies, and in particular, to an image labeling method, apparatus, device, and storage medium.
Background
Image recognition commonly uses a segmentation model to recognize and segment objects in an image, for example to distinguish the people, vehicles, and animals in it, and currently has broad business demand and wide application scenarios in industry and daily life.
At present, before a segmentation model is trained, the outer contour of each required category must be annotated as a polygon by placing points in a labeling program. Manual labeling is usually adopted, i.e., a labeling person subjectively analyzes each sample image and labels its category. However, this labeling mode requires labeling staff able to distinguish hundreds of similar object categories, so the requirements on the staff are high; mislabeling and missed labels easily occur; labeling cost is high; and labeling efficiency and accuracy are low.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide an image labeling method, an image labeling device, image labeling equipment, and a storage medium, aiming to solve the technical problems of high labeling cost and low labeling efficiency and accuracy for object categories in images in the prior art.
In order to achieve the above object, the present invention provides an image labeling method, which includes:
performing category segmentation on the image to be annotated through a preset image segmentation model to obtain a plurality of target segmentation images;
obtaining image feature vectors corresponding to each target segmentation image and text feature vectors corresponding to category texts in a preset category list;
determining a vector cosine similarity between the image feature vector and the text feature vector;
and obtaining an image labeling result corresponding to the image to be labeled based on the vector cosine similarity.
Optionally, before the step of performing category segmentation on the image to be annotated through the preset image segmentation model to obtain the plurality of target segmented images, the method further includes:
determining a target image category according to user requirements;
performing format conversion on the target image category to obtain a converted image category;
and establishing a preset category list based on the converted image categories.
Optionally, the step of obtaining the image feature vector corresponding to each target segmentation image and the text feature vector corresponding to the category text in the preset category list includes:
extracting features of each target segmentation image to obtain image feature vectors corresponding to each target segmentation image;
and extracting the characteristics of the category texts in the preset category list to obtain text characteristic vectors corresponding to the category texts.
Optionally, the step of extracting features of the category text in the preset category list to obtain a text feature vector corresponding to the category text includes:
performing word segmentation processing on category texts in a preset category list to obtain text units corresponding to the category texts;
embedding the text unit to obtain a text unit vector corresponding to the text unit;
and extracting the characteristics of the text unit vectors to obtain text characteristic vectors corresponding to the category texts.
Optionally, the step of determining a vector cosine similarity between the image feature vector and the text feature vector includes:
carrying out standardization processing on the image feature vector and the text feature vector to obtain a standard image feature vector and a standard text feature vector;
determining a vector dot product of the standard image feature vector and the standard text feature vector;
and determining the vector cosine similarity between the standard image feature vector and the standard text feature vector based on the vector dot product.
Optionally, the step of obtaining the image labeling result corresponding to the image to be labeled based on the vector cosine similarity includes:
judging whether matching categories corresponding to the target segmentation images exist or not according to the vector cosine similarity;
and determining the labeling result corresponding to the image to be labeled according to the judging result.
Optionally, the step of determining the labeling result corresponding to the image to be labeled according to the judging result includes:
if so, acquiring the image outline of each target segmentation image;
and obtaining a labeling result corresponding to the image to be labeled based on the contour point set corresponding to the outer contour of the image and the matching category.
In addition, in order to achieve the above object, the present invention also provides an image labeling device, which includes:
the image segmentation module is used for carrying out category segmentation on the image to be annotated through a preset image segmentation model to obtain a plurality of target segmentation images;
the vector acquisition module is used for acquiring image feature vectors corresponding to each target segmentation image and text feature vectors corresponding to category texts in a preset category list;
the similarity determining module is used for determining vector cosine similarity between the image feature vector and the text feature vector;
and the marking result acquisition module is used for acquiring an image marking result corresponding to the image to be marked based on the vector cosine similarity.
In addition, to achieve the above object, the present invention also proposes an image labeling apparatus, the apparatus comprising: a memory, a processor, and an image annotation program stored on the memory and executable on the processor, the image annotation program configured to implement the steps of the image annotation method as described above.
In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon an image labeling program which, when executed by a processor, implements the steps of the image labeling method as described above.
The invention discloses: performing category segmentation on an image to be annotated through a preset image segmentation model to obtain a plurality of target segmented images; obtaining the image feature vector corresponding to each target segmented image and the text feature vectors corresponding to the category texts in a preset category list; determining the vector cosine similarity between the image feature vectors and the text feature vectors; and obtaining an image labeling result corresponding to the image to be annotated based on the vector cosine similarity. Compared with the prior art, in which manually labeling the categories of sample images is costly, the invention solves the technical problems of high labeling cost and low labeling efficiency and accuracy for object categories in images by determining the vector cosine similarity between the image feature vector of each target segmented image and the text feature vectors of the category texts in the preset category list, and obtaining the image labeling result from that similarity.
Drawings
FIG. 1 is a schematic structural diagram of an image labeling device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of an image labeling method according to the present invention;
FIG. 3 is a flowchart of a second embodiment of the image labeling method of the present invention;
FIG. 4 is a flowchart of a third embodiment of the image labeling method of the present invention;
fig. 5 is a block diagram of a first embodiment of an image labeling apparatus according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an image labeling device of a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the image labeling device may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) or a stable nonvolatile memory (Non-Volatile Memory, NVM), such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not limit the image labeling device, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and an image labeling program.
In the image labeling device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The image labeling device invokes the image labeling program stored in the memory 1005 through the processor 1001 and executes the image labeling method provided by the embodiments of the present invention.
An embodiment of the invention provides an image labeling method, referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the image labeling method of the invention.
In this embodiment, the image labeling method includes the following steps:
step S10: and carrying out category segmentation on the image to be annotated through a preset image segmentation model to obtain a plurality of target segmentation images.
It should be noted that the execution body of the method of this embodiment may be an image labeling device that labels object categories in any image, or another image labeling system that includes such a device and can implement the same or similar functions. The image labeling method of this embodiment and of the embodiments described below is explained specifically using an image labeling system (hereinafter referred to as the system).
It should be noted that the above-mentioned preset image segmentation model may be any model that identifies and segments objects in an image, such as the SAM (Segment Anything Model); this embodiment is not limited thereto. The SAM model enables a computer vision system to identify and segment arbitrary objects in an image, even objects that never appeared in the training data. Its core idea is to understand and parse image content using deep learning techniques, in particular convolutional neural networks (Convolutional Neural Networks, CNN). Such models are trained on large and diverse data sets, enabling them to learn and identify many different objects and scenes. The model typically incorporates multi-task learning: it learns not only object recognition but also related tasks such as classification and precise localization of objects, which strengthens its overall understanding of the image and improves segmentation accuracy. In addition, the SAM model may employ zero-shot or few-shot learning techniques, which allow it to identify and segment objects that were rarely or never seen during training, giving the model strong generalization ability for novel and unknown image segmentation tasks.
It is to be understood that the image to be annotated can be any image in which objects need to be identified. This embodiment realizes object identification in the image to be annotated by labeling the object categories in it.
It should be noted that the target segmented images may be the plurality of images obtained by segmenting all object categories in the image through the preset image segmentation model, where category segmentation may be the process of segmenting out every category present in the image.
In this embodiment, if the categories of objects in an image need to be identified, all object categories in the image to be annotated can be segmented by the SAM model to obtain a plurality of target segmented images, and each segmented image can then be stored separately.
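To make the step concrete, the following is a minimal sketch of automatic whole-image segmentation using the open-source segment-anything package; the model type, checkpoint file name, and image path are assumptions for illustration, not details fixed by this embodiment.

```python
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Model variant and checkpoint file are illustrative assumptions.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("image_to_annotate.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # one mask dict per detected object

# Black out everything outside each mask to obtain the target segmented
# images, which can then be stored separately.
segments = []
for m in masks:
    seg = image.copy()
    seg[~m["segmentation"]] = 0  # "segmentation" is a boolean H x W array
    segments.append(seg)
```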
Step S20: and obtaining image feature vectors corresponding to the target segmentation images and text feature vectors corresponding to the category texts in a preset category list.
The image feature vector may be a feature vector obtained by extracting features from a target segmented image. In an embodiment, a pre-trained convolutional neural network (e.g., ResNet) can be used as the image encoder of a CLIP (Contrastive Language-Image Pre-training) model. When a target segmented image is input, the image encoder processes it through a series of convolution, pooling, and fully connected layers to extract its features; the final output of the image encoder is a high-dimensional feature vector (i.e., the image feature vector described above) that represents the visual content of the input image as a point in a high-dimensional space.
It should be noted that the above-mentioned preset category list may be a list storing all object categories required by the user. Correspondingly, a category text may be a category stored in text form in the preset category list.
Specifically, before the step S10, the method further includes: determining a target image category according to user requirements; performing format conversion on the target image category to obtain a converted image category; and establishing a preset category list based on the converted image categories.
It will be appreciated that the target image category may be an object category in an image required by a user, such as: grasslands, dogs, cats, floors, trees, people, etc., as this embodiment is not limited thereto.
In practical application, the system can define the object categories required by the user according to the user requirements, such as: grasslands, dogs, cats, floors, trees, people, and so on, obtaining the target image categories, and then perform format conversion to convert them into string format. A preset category list can be established based on the image categories converted into string format, i.e., the preset category list can be provided in the form of a string list, such as: classes = ["grass", "dog", "cat", "ground", "tree", "person"].
It should be noted that the text feature vector may be a feature vector obtained by extracting features from a category text.
In practical application, after a plurality of target segmented images are obtained by segmenting the image to be annotated, the target image categories required by the user can be defined according to the user requirements, and format conversion is performed on them to obtain a preset category list provided in the form of a string list. Feature extraction can then be performed on each target segmented image to obtain the image feature vector corresponding to each target segmented image, and on the category texts in the preset category list to obtain the corresponding text feature vectors.
Further, the step S20 includes:
step S201: and extracting the characteristics of each target segmentation image to obtain the image characteristic vector corresponding to each target segmentation image.
In this embodiment, the system may input each target segmented image into the pre-trained convolutional neural network, which processes it through a series of convolutional layers, pooling layers, and fully connected layers, extracts its features, and outputs the high-dimensional image feature vector corresponding to that target segmented image.
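As one possible realization of this step, the sketch below uses OpenAI's clip package, whose image encoder is exactly such a pre-trained network; the model variant and file name are illustrative assumptions.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # assumed variant

# "segment_0.png" stands in for one stored target segmented image.
image = preprocess(Image.open("segment_0.png")).unsqueeze(0).to(device)
with torch.no_grad():
    # High-dimensional image feature vector; shape (1, 512) for ViT-B/32.
    image_features = model.encode_image(image)
```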
Step S202: and extracting the characteristics of the category texts in the preset category list to obtain text characteristic vectors corresponding to the category texts.
In this embodiment, the system may perform feature extraction on the category texts in the preset category list through a Transformer-based encoder (i.e., a deep neural network model based on the self-attention mechanism) to obtain the text feature vector corresponding to each category text.
Specifically, the step S202 includes: performing word segmentation processing on category texts in a preset category list to obtain text units corresponding to the category texts; embedding the text unit to obtain a text unit vector corresponding to the text unit; and extracting the characteristics of the text unit vectors to obtain text characteristic vectors corresponding to the category texts.
It should be noted that the text units may be units smaller than the full text, obtained by dividing the category text, for example: words, subwords, or characters; this embodiment is not limited thereto. In practical applications, processing text requires word segmentation, i.e., dividing a continuous string into independent, meaningful units. For example, the text "I like programming" may be split into three words: "I", "like", "programming". Some language models, especially models for English, use subword segmentation, which breaks words down into even smaller units and can further improve image annotation accuracy.
It should be noted that a text unit vector may be the mapping of a text unit into a high-dimensional space. After the text is divided into small units, a number of text units are obtained, which can then be converted into numerical form: each word or subword is mapped to a vector in a high-dimensional space, i.e., the text unit vector corresponding to that text unit. These vectors are learned during model training and can capture semantic and grammatical relations among words.
In this embodiment, after the text is converted into vector form, i.e., after the text unit vectors corresponding to the text units are obtained, the text unit vectors may be fed into a Transformer-based encoder, which uses a self-attention mechanism (self-attention) to model the relationships between different words in the text; this enables the model to capture long-distance dependencies. After processing by the Transformer encoder, the text is converted into a high-dimensional feature vector, a numerical representation that contains the rich semantic information of the input text. In addition, this scheme can exploit the zero-shot capability of the CLIP model to realize automatic labeling of images. The core algorithm of CLIP extracts features by establishing a connection between images and texts: for images, a ResNet or ViT series model can be used as the backbone network for feature extraction, while for text feature extraction a BERT-style Transformer model is commonly used. In a CLIP-style model, the text feature vector is compared with the image feature vector output by the image encoder to achieve matching between text and image.
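A minimal sketch of the text side, again using the clip package, which wraps word segmentation, embedding lookup, and Transformer encoding inside clip.tokenize and model.encode_text; the prompt template and category names are illustrative assumptions.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Category texts from the string-list example above.
classes = ["grass", "dog", "cat", "ground", "tree", "person"]

# clip.tokenize performs the word/subword segmentation and maps each unit
# to an id; model.encode_text embeds the units and runs the Transformer
# encoder to produce one text feature vector per category. The
# "a photo of a ..." prompt template is common CLIP practice, not a
# requirement of the patent.
tokens = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)  # shape (6, 512) for ViT-B/32
```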
Step S30: and determining the vector cosine similarity between the image feature vector and the text feature vector.
It should be noted that the vector cosine similarity may be the similarity in direction between the image feature vector and the text feature vector. In this embodiment, the vector cosine similarity may be obtained from the dot product of the image feature vector and the text feature vector together with their respective norms (lengths), so that the similarity between a target segmented image in the image to be annotated and a category text in the preset category list can be determined from the vector cosine similarity between the vectors.
The vector cosine similarity ranges from -1 to 1: if the image feature vector and the text feature vector point in exactly the same direction, their vector cosine similarity is 1; if they point in exactly opposite directions, it is -1; and if it is near 0, the two vectors are barely correlated.
Step S40: and obtaining an image labeling result corresponding to the image to be labeled based on the vector cosine similarity.
It should be noted that, the image labeling result may be a result of labeling a desired object category in an image.
In this embodiment, the system may determine, through the vector cosine similarity, the similarity between a target segmented image in the image to be annotated and the object categories required by the user (i.e., the category texts in the preset category list). If the similarity between a target segmented image and a required object category is high, the outer contour of that target segmented image can be marked by the labeling program in a dotting manner, thereby obtaining the image labeling result corresponding to the image to be annotated.
This embodiment discloses: performing category segmentation on the image to be annotated through a preset image segmentation model to obtain a plurality of target segmented images; obtaining the image feature vector corresponding to each target segmented image and the text feature vectors corresponding to the category texts in a preset category list; determining the vector cosine similarity between the image feature vectors and the text feature vectors; and obtaining the image labeling result corresponding to the image to be annotated based on the vector cosine similarity. Compared with the prior art, in which manually labeling the categories of sample images is costly, this embodiment solves the technical problems of high labeling cost and low labeling efficiency and accuracy for object categories in images by determining the vector cosine similarity between the image feature vector of each target segmented image and the text feature vectors of the category texts in the preset category list, and obtaining the image labeling result from that similarity.
Referring to fig. 3, fig. 3 is a flowchart of a second embodiment of the image labeling method according to the present invention.
Based on the above first embodiment, in order to accurately determine the similarity between the target segmented image and the category text, and further improve the accuracy of image labeling, in this embodiment, the step S30 includes:
step S301: and carrying out standardization processing on the image feature vector and the text feature vector to obtain a standard image feature vector and a standard text feature vector.
It should be understood that the normalization process may be the process of dividing a vector by its magnitude (modulus) to obtain a unit vector, i.e., a vector of length 1. Correspondingly, the standard image feature vector may be the feature vector obtained after the image feature vector is normalized, i.e., the unit vector corresponding to the image feature vector; the standard text feature vector may be the vector obtained by normalizing the text feature vector, i.e., the unit vector corresponding to the text feature vector.
In this embodiment, the normalization may be performed by dividing the image feature vector and the text feature vector each by its own Euclidean norm, yielding the normalized standard image feature vector and standard text feature vector.
Step S302: a vector dot product of the standard image feature vector and the standard text feature vector is determined.
It will be appreciated that the vector dot product described above may be the dot product between a standard image feature vector and a standard text feature vector.
Step S303: and determining the vector cosine similarity between the standard image feature vector and the standard text feature vector based on the vector dot product.
It should be noted that, in this embodiment, the vector cosine similarity may be obtained from the dot product of the standard image feature vector and the standard text feature vector together with their respective norms (lengths); since both vectors have already been normalized, the dot product of the standard image feature vector and the standard text feature vector directly equals the vector cosine similarity between them.
In practical application, for an image feature vector A and a text feature vector B, the vector cosine similarity is calculated as

C = (A · B) / (‖A‖ · ‖B‖)

where C is the vector cosine similarity, A · B is the vector dot product, and ‖A‖ and ‖B‖ are the Euclidean norms (i.e., lengths) of the image feature vector A and the text feature vector B, respectively.
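As a worked illustration of this formula, the following numpy sketch L2-normalizes both feature vectors and takes their dot product, which for unit vectors equals the cosine similarity; the function name is illustrative.

```python
import numpy as np

def vector_cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Normalize both feature vectors, then take their dot product;
    for unit vectors the dot product equals the cosine similarity C."""
    a_std = a / np.linalg.norm(a)  # standard image feature vector
    b_std = b / np.linalg.norm(b)  # standard text feature vector
    return float(np.dot(a_std, b_std))
```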
In this embodiment, the image feature vector and the text feature vector are normalized to obtain a standard image feature vector and a standard text feature vector, the vector dot product of the standard image feature vector and the standard text feature vector is determined, and the vector cosine similarity between the image feature vector and the text feature vector is then determined based on the vector dot product, so that the similarity between the target segmented image and the category text can be determined accurately, improving the accuracy of image labeling.
Referring to fig. 4, fig. 4 is a flowchart of a third embodiment of the image labeling method according to the present invention.
Based on the above embodiments, in order to improve the efficiency and the accuracy of image labeling, in this embodiment, the step S40 includes:
step S401: and judging whether matching categories corresponding to the target segmentation images exist or not according to the vector cosine similarity.
The matching category may be the category text that has a high degree of match with the target segmented image.
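One possible form of this judgment is sketched below: for each target segmented image, take the category with the highest vector cosine similarity and accept it only above a threshold. The threshold value and the function name are assumptions; the embodiment does not fix them.

```python
import numpy as np
from typing import List, Optional

THRESHOLD = 0.25  # assumed cut-off; the embodiment does not fix a value

def match_category(similarities: np.ndarray, classes: List[str]) -> Optional[str]:
    """Return the best-matching category text for one target segmented image,
    or None when no vector cosine similarity reaches the threshold."""
    best = int(np.argmax(similarities))
    if similarities[best] >= THRESHOLD:
        return classes[best]  # a matching category exists
    return None  # no matching category for this segment
```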
Step S402: and determining the labeling result corresponding to the image to be labeled according to the judging result.
Specifically, the step S402 includes: if so, acquiring the image outline of each target segmentation image; and obtaining a labeling result corresponding to the image to be labeled based on the contour point set corresponding to the outer contour of the image and the matching category.
It should be appreciated that the image outer contour described above is the outer contour of the target segmented image; accordingly, the set of contour points may be a set of points that constitute the outer contour of the image.
In practical application, if it is determined from the vector cosine similarity between the image feature vectors and the text feature vectors that the preset category list contains a category text matching a target segmented image, the matching category can be determined, the contour point set corresponding to the image outer contour of that target segmented image can be extracted, and the contour point set can be combined with the matching category corresponding to the target segmented image to obtain the labeling result corresponding to the image to be annotated. Thereafter, the labeling result may be stored in a label text file for the user to view; the file format may include, but is not limited to, JSON and TXT.
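A sketch of this final step using OpenCV's contour extraction, assuming the boolean masks from the segmentation step and the matched categories from the judgment above; the JSON layout and function name are assumptions, since only the JSON/TXT formats are mentioned.

```python
import json
import cv2
import numpy as np

def save_labels(masks, matched_categories, path="labels.json"):
    """Extract the contour point set of each matched segment and store the
    labeling result in a JSON label text file."""
    records = []
    for mask, category in zip(masks, matched_categories):
        if category is None:  # no matching category for this segment
            continue
        contours, _ = cv2.findContours(mask.astype(np.uint8),
                                       cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours:
            points = contour.squeeze(1).tolist()  # contour point set [[x, y], ...]
            records.append({"label": category, "points": points})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)
```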
This embodiment judges, according to the vector cosine similarity between the standard image feature vectors and the standard text feature vectors, whether a matching category exists for each target segmented image; if so, the image outer contour of each target segmented image is obtained, and the labeling result corresponding to the image to be annotated is obtained based on the contour point set corresponding to the image outer contour and the matching category, thereby improving the efficiency and accuracy of image labeling.
In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium stores an image labeling program, and the image labeling program realizes the steps of the image labeling method when being executed by a processor.
Referring to fig. 5, fig. 5 is a block diagram illustrating a first embodiment of an image marking apparatus according to the present invention.
As shown in fig. 5, an image labeling apparatus according to an embodiment of the present invention includes:
the image segmentation module 501 is configured to perform category segmentation on an image to be annotated through a preset image segmentation model to obtain a plurality of target segmented images;
the vector obtaining module 502 is configured to obtain an image feature vector corresponding to each target segmentation image and a text feature vector corresponding to a category text in a preset category list;
a similarity determining module 503, configured to determine a vector cosine similarity between the image feature vector and the text feature vector;
the labeling result obtaining module 504 is configured to obtain an image labeling result corresponding to the image to be labeled based on the vector cosine similarity.
Further, the image segmentation module 501 is further configured to determine a target image class according to a user requirement; performing format conversion on the target image category to obtain a converted image category; and establishing a preset category list based on the converted image categories.
Further, the vector obtaining module 502 is further configured to perform feature extraction on each target segmented image to obtain an image feature vector corresponding to each target segmented image; and extracting the characteristics of the category texts in the preset category list to obtain text characteristic vectors corresponding to the category texts.
Further, the vector obtaining module 502 is further configured to perform word segmentation on a category text in a preset category list, so as to obtain a text unit corresponding to the category text; embedding the text unit to obtain a text unit vector corresponding to the text unit; and extracting the characteristics of the text unit vectors to obtain text characteristic vectors corresponding to the category texts.
The image labeling device of this embodiment discloses: performing category segmentation on the image to be annotated through a preset image segmentation model to obtain a plurality of target segmented images; obtaining the image feature vector corresponding to each target segmented image and the text feature vectors corresponding to the category texts in a preset category list; determining the vector cosine similarity between the image feature vectors and the text feature vectors; and obtaining the image labeling result corresponding to the image to be annotated based on the vector cosine similarity. Compared with the prior art, in which manually labeling the categories of sample images is costly, this device solves the technical problems of high labeling cost and low labeling efficiency and accuracy for object categories in images by determining the vector cosine similarity between the image feature vector of each target segmented image and the text feature vectors of the category texts in the preset category list, and obtaining the image labeling result from that similarity.
Based on the first embodiment of the image labeling device of the present invention, a second embodiment of the image labeling device of the present invention is provided.
In this embodiment, the similarity determining module 503 is further configured to perform normalization processing on the image feature vector and the text feature vector to obtain a standard image feature vector and a standard text feature vector; determining a vector dot product of the standard image feature vector and the standard text feature vector; and determining the vector cosine similarity between the standard image feature vector and the standard text feature vector based on the vector dot product.
In this embodiment, the image feature vector and the text feature vector are normalized to obtain a standard image feature vector and a standard text feature vector, the vector dot product of the standard image feature vector and the standard text feature vector is determined, and the vector cosine similarity between the image feature vector and the text feature vector is then determined based on the vector dot product, so that the similarity between the target segmented image and the category text can be determined accurately, improving the accuracy of image labeling.
Based on the above-described respective device embodiments, a third embodiment of the image labeling device of the present invention is proposed.
In this embodiment, the labeling result obtaining module 504 is further configured to determine whether there is a matching category corresponding to each target segmented image according to the vector cosine similarity; and determining the labeling result corresponding to the image to be labeled according to the judging result.
The labeling result obtaining module 504 is further configured to obtain an image outline of each of the target segmented images if the labeling result exists; and obtaining a labeling result corresponding to the image to be labeled based on the contour point set corresponding to the outer contour of the image and the matching category.
This embodiment judges, according to the vector cosine similarity between the standard image feature vectors and the standard text feature vectors, whether a matching category exists for each target segmented image; if so, the image outer contour of each target segmented image is obtained, and the labeling result corresponding to the image to be annotated is obtained based on the contour point set corresponding to the image outer contour and the matching category, thereby improving the efficiency and accuracy of image labeling.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. read-only memory/random-access memory, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. An image labeling method, characterized in that the image labeling method comprises the following steps:
performing category segmentation on the image to be annotated through a preset image segmentation model to obtain a plurality of target segmentation images;
obtaining image feature vectors corresponding to each target segmentation image and text feature vectors corresponding to category texts in a preset category list;
determining a vector cosine similarity between the image feature vector and the text feature vector;
and obtaining an image labeling result corresponding to the image to be labeled based on the vector cosine similarity.
2. The image labeling method according to claim 1, wherein before the step of obtaining a plurality of target segmented images by performing class segmentation on the image to be labeled through a preset image segmentation model, the method further comprises:
determining a target image category according to user requirements;
performing format conversion on the target image category to obtain a converted image category;
and establishing a preset category list based on the converted image categories.
3. The image labeling method according to claim 1, wherein the step of obtaining the image feature vector corresponding to each target segmented image and the text feature vector corresponding to the category text in the preset category list comprises:
extracting features of each target segmentation image to obtain image feature vectors corresponding to each target segmentation image;
and extracting the characteristics of the category texts in the preset category list to obtain text characteristic vectors corresponding to the category texts.
4. The method for labeling images as set forth in claim 3, wherein the step of extracting features of the category text in the preset category list to obtain text feature vectors corresponding to the category text comprises:
performing word segmentation processing on category texts in a preset category list to obtain text units corresponding to the category texts;
embedding the text unit to obtain a text unit vector corresponding to the text unit;
and extracting the characteristics of the text unit vectors to obtain text characteristic vectors corresponding to the category texts.
5. The image labeling method of claim 1, wherein the step of determining a vector cosine similarity between the image feature vector and the text feature vector comprises:
carrying out standardization processing on the image feature vector and the text feature vector to obtain a standard image feature vector and a standard text feature vector;
determining a vector dot product of the standard image feature vector and the standard text feature vector;
and determining the vector cosine similarity between the standard image feature vector and the standard text feature vector based on the vector dot product.
6. The image labeling method according to claim 1, wherein the step of obtaining the image labeling result corresponding to the image to be labeled based on the vector cosine similarity comprises:
judging whether matching categories corresponding to the target segmentation images exist or not according to the vector cosine similarity;
and determining the labeling result corresponding to the image to be labeled according to the judging result.
7. The method for labeling images according to claim 6, wherein the step of determining the labeling result corresponding to the image to be labeled according to the judgment result comprises:
if so, acquiring the image outline of each target segmentation image;
and obtaining a labeling result corresponding to the image to be labeled based on the contour point set corresponding to the outer contour of the image and the matching category.
8. An image annotation device, the device comprising:
the image segmentation module is used for carrying out category segmentation on the image to be annotated through a preset image segmentation model to obtain a plurality of target segmentation images;
the vector acquisition module is used for acquiring image feature vectors corresponding to each target segmentation image and text feature vectors corresponding to category texts in a preset category list;
the similarity determining module is used for determining vector cosine similarity between the image feature vector and the text feature vector;
and the marking result acquisition module is used for acquiring an image marking result corresponding to the image to be marked based on the vector cosine similarity.
9. An image annotation device, the device comprising: a memory, a processor and an image annotation program stored on the memory and executable on the processor, the image annotation program being configured to implement the steps of the image annotation method according to any of claims 1 to 7.
10. A storage medium having stored thereon an image marking program which, when executed by a processor, implements the steps of the image marking method according to any one of claims 1 to 7.
CN202410205526.7A 2024-02-26 2024-02-26 Image labeling method, device, equipment and storage medium Pending CN117788958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410205526.7A CN117788958A (en) 2024-02-26 2024-02-26 Image labeling method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117788958A true CN117788958A (en) 2024-03-29

Family

ID=90392874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410205526.7A Pending CN117788958A (en) 2024-02-26 2024-02-26 Image labeling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117788958A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110800A (en) * 2019-05-14 2019-08-09 长沙理工大学 Automatic image marking method, device, equipment and computer readable storage medium
CN115424044A (en) * 2022-08-26 2022-12-02 北京达佳互联信息技术有限公司 Multi-mode-based image annotation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination