CN117671681A - Picture labeling method, device, terminal and storage medium - Google Patents


Info

Publication number: CN117671681A
Application number: CN202311533897.XA
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: picture, content, feature, labeling, matched
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Inventors: 黄咏驰, 刘宗泽, 陈高华, 肖嵘
Current assignee: Shenzhen Intellifusion Technologies Co Ltd (the listed assignees may be inaccurate)
Original assignee: Shenzhen Intellifusion Technologies Co Ltd
Application filed by Shenzhen Intellifusion Technologies Co Ltd; priority to CN202311533897.XA; publication of CN117671681A


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to the field of data annotation and provides a picture labeling method, device, terminal, and storage medium. The method includes: acquiring a picture to be labeled and the labeling task corresponding to it; determining, based on the labeling task, a target model and the feature tag indicated by the task; extracting the picture content of the picture to be labeled with the target model to obtain a first picture content set, the set comprising at least one piece of picture content to be matched; matching each piece of picture content to be matched in the first picture content set against second feature content carrying the feature tag; and, if first feature content matching the second feature content exists in the first picture content set, labeling the picture to be labeled with the feature tag of the second feature content. The scheme improves labeling efficiency.

Description

Picture labeling method, device, terminal and storage medium
Technical Field
The application belongs to the field of data annotation, and in particular relates to a picture labeling method, device, terminal, and storage medium.
Background
Data annotation refers to the process of processing unprocessed data such as pictures, voice, and text to convert it into machine-recognizable information. This data annotation step provides the data support for the development of artificial intelligence. At present, data is labeled manually to obtain labeled data. Relying entirely on manual labeling, however, makes labeling efficiency extremely low.
Disclosure of Invention
The embodiments of the application provide a picture labeling method, device, terminal, and storage medium to solve the problem of low manual labeling efficiency in the prior art.
A first aspect of an embodiment of the present application provides a method for labeling a picture, including:
acquiring a picture to be labeled and a labeling task corresponding to the picture to be labeled;
determining a target model and a feature tag indicated by the labeling task based on the labeling task;
extracting the picture content of the picture to be labeled according to the target model to obtain a first picture content set; the first picture content set comprising at least one piece of picture content to be matched;
matching each piece of picture content to be matched in the first picture content set with second feature content carrying the feature tag;
and if first feature content matching the second feature content exists in the first picture content set, labeling the picture to be labeled with the feature tag of the second feature content.
A second aspect of the embodiments of the present application provides a picture labeling apparatus, including:
an acquisition module, configured to acquire a picture to be labeled and a labeling task corresponding to the picture to be labeled;
a determining module, configured to determine, based on the labeling task, a target model and a feature tag indicated by the labeling task;
an extraction module, configured to extract the picture content of the picture to be labeled according to the target model to obtain a first picture content set, the first picture content set comprising at least one piece of picture content to be matched;
a matching module, configured to match each piece of picture content to be matched in the first picture content set with second feature content carrying the feature tag;
and a labeling module, configured to label the picture to be labeled with the feature tag of the second feature content if first feature content matching the second feature content exists in the first picture content set.
A third aspect of the embodiments of the present application provides a terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to the first aspect.
A fifth aspect of the present application provides a computer program product for causing a terminal to carry out the steps of the method of the first aspect described above when the computer program product is run on the terminal.
From the above, after the picture to be labeled and the labeling task are acquired, the target model and the feature tag required for labeling can be determined based on the labeling task. Extracting the picture content of the picture to be labeled with this task-specific target model processes the picture more efficiently: useless processing steps and useless feature extraction are avoided as far as possible, shortening processing and extraction time and improving labeling efficiency. Determining from the feature tag the second feature content to be matched against the picture content reduces the number of candidate matches; that is, the extracted features need not be matched against all stored feature content, only against the second feature content associated with the feature tag, which also makes matching more accurate. In this way, when first feature content matching the second feature content exists in the first picture content set, the feature tag of the second feature content can be applied directly to the picture to be labeled. Matching time in the labeling process is reduced, and labeling efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present application, and that a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a picture labeling method provided in an embodiment of the present application;
fig. 2 is a second flowchart of a picture labeling method provided in the embodiment of the present application;
fig. 3 is a block diagram of a picture marking device according to an embodiment of the present application;
fig. 4 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted in context as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted in context to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In particular implementations, the terminals described in embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad). It should also be appreciated that in some embodiments the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad).
In the following discussion, a terminal including a display and a touch sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal supports various applications, such as one or more of the following: drawing applications, presentation applications, word processing applications, website creation applications, disk burning applications, spreadsheet applications, gaming applications, telephony applications, video conferencing applications, email applications, instant messaging applications, workout support applications, photo management applications, digital camera applications, digital video camera applications, web browsing applications, digital music player applications, and/or digital video player applications.
Various applications that may be executed on the terminal may use at least one common physical user interface device such as a touch sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal may be adjusted and/or changed between applications and/or within the corresponding applications. In this way, the common physical architecture (e.g., touch-sensitive surface) of the terminal may support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that the sequence number of each step in this embodiment does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.
In order to illustrate the technical solutions described in the present application, the following description is made by specific examples.
Referring to fig. 1, fig. 1 is a flowchart of a picture labeling method provided in an embodiment of the present application. As shown in fig. 1, the picture labeling method comprises the following steps:
Step 101, acquiring a picture to be labeled and a labeling task corresponding to the picture to be labeled.
Specifically, the picture to be labeled may be an animal picture, a plant picture, or a person picture, or a picture containing animals, plants, and persons at the same time; both the style of pictures to be labeled and the content they contain vary widely.
The picture to be labeled in this application may refer to a single picture, a single frame taken from a video, or one or more batches comprising multiple pictures or multiple frames.
Specifically, the labeling task is acquired together with the picture to be labeled. The labeling task indicates the processing specification and labeling requirements to be followed when labeling the picture: how the picture is to be processed, under what circumstances it is to be labeled, and which label is to be added to it.
For example, the labeling task may be to attach the label "brown animal" to pictures to be labeled that contain brown animals; or to frame out, with a detection box, the persons not wearing school uniform in pictures containing such persons, and to place the label "not wearing uniform as required by school rules" to the right of the detection box; or to attach the label "contains the character 'success'" to pictures in which the Chinese character for "success" appears; and so on.
Different labeling tasks require different processing operations on the picture to be labeled, such as extracting text information or shape information, or adding a detection box for indication.
Similarly, the labeling requirements corresponding to different labeling tasks differ, for example whether the label is attached directly to the picture as a whole or also placed at a designated position in the picture.
From the above, the labeling task is indispensable when labeling a picture. Therefore, when the picture to be labeled is acquired, its corresponding labeling task must be acquired together with it, making the specific labeling requirements clear so that the labeling is well targeted.
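The patent does not prescribe a concrete data structure for the labeling task. As a minimal illustration only, a task could be carried as a small record holding the task description, the feature tag it indicates, and its processing directives; every field name below is hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LabelingTask:
    """One labeling task attached to a batch of pictures to be labeled."""
    description: str                    # e.g. "label pictures containing brown animals"
    feature_tag: Optional[str] = None   # tag to attach on a match, e.g. "brown animal"
    needs_detection_box: bool = False   # whether matched objects must be framed out
    label_position: str = "whole"       # attach to the "whole" picture or a local area

@dataclass
class LabelingJob:
    """A picture batch together with its labeling task, as acquired in step 101."""
    pictures: List[str]                 # paths or video-frame references
    task: LabelingTask
```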
Step 102, determining a target model and a feature tag indicated by the labeling task based on the labeling task.
Specifically, the large model is introduced first: a large model is a deep-learning or machine-learning model with a very large number of parameters that are adjusted automatically during training to capture complex relationships in the input data. Large models are widely applied in natural language processing, computer vision, and speech recognition; picture labeling in this application is a branch of computer vision.
Specifically, this application uses large models to realize picture labeling, relying on their strong computing power and transfer capability to achieve multi-category labeling rather than being limited to one particular field or labeling style.
The target model is such a large model, or a model combination built from large models, and is the tool selected for executing the labeling task.
Specifically, the large models include the Grounding DINO model (Grounding End-to-End Detection Transformer with Improved DeNoising Anchor Boxes), the Contrastive Language-Image Pre-training (CLIP) model, the Segment Anything Model (SAM), the Self-Distillation with No Labels v2 (DINOv2) model, and other large models applied in the field of computer vision.
Specifically, the Grounding DINO model mentioned above is a grounding-based DINO model that can frame out objects in a picture with detection boxes. The CLIP model is used to extract text information from a picture. The SAM model captures objects in a picture and segments out the captured objects. The DINOv2 model can perform functions such as image classification and image segmentation, extracting either the global features of an image or the features of selected local parts. Only some functions of some large models used in processing pictures are shown here; functions not shown and large models not introduced also belong to practicable implementations of this application and fall within its scope of protection.
Specifically, determining the target model based on the labeling task includes: parsing the labeling task from the functional layer to obtain a first function and a second function required to realize the labeling task; acquiring a first base model having the first function and a second base model having the second function; and combining the first base model and the second base model to obtain the target model.
Here the first base model and the second base model are large models as described above. Parsing the labeling task from the functional layer means analyzing the task to determine the functions the target model must have in order to carry it out.
Specifically, this application considers the functional requirements corresponding to the labeling task and finds the model best suited to it, i.e. the target model, for processing the picture to be labeled; this includes, but is not limited to, preprocessing the picture, detecting objects in it, adding detection boxes to it, and extracting its picture content, all in service of completing the labeling task.
Specifically, if the functional requirements of the labeling task are complex, several large models must cooperate to satisfy them, and those large models are combined to obtain the target model. The target model in this application therefore includes at least one large model.
Specifically, the labeling task is parsed at the functional layer through natural language processing and similar techniques; the parsing result contains the functions the target model must provide, namely the first function and the second function.
Then, according to the parsed first and second functions, a first base model having the first function and a second base model having the second function are selected from among the available large models and combined to obtain the target model.
In one example, if parsing yields a first function of framing out specific objects in the picture to be labeled with detection boxes, and a second function of extracting the picture content of the object in each detection box, then the Grounding DINO model having the first function is selected as the first base model, the DINOv2 model having the second function is selected as the second base model, and the Grounding DINO and DINOv2 models are combined to obtain the target model.
In another example, if parsing further yields a third function corresponding to the labeling task, namely extracting text information from the picture to be labeled, the CLIP model may be selected as a third base model, and the Grounding DINO, DINOv2, and CLIP models are combined to obtain the target model.
In one example, if a single large model has both the first function and the second function, that model is used directly as the target model and no model combination is needed. Preferentially selecting a large model whose functions cover the labeling task more completely shortens model combination time and speeds up the labeling process.
The above describes constructing the target model in real time based on the labeling task. Alternatively, a pre-stored, already constructed target model can be invoked directly.
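A minimal sketch of the function-driven model selection described above, assuming a fixed function-to-model registry; the registry contents and the keyword-based `parse_functions` stand-in are assumptions, not part of the patent:

```python
# Hypothetical registry mapping required functions to the base (large) models above.
MODEL_REGISTRY = {
    "frame_objects": "Grounding DINO",  # detection boxes around named objects
    "extract_features": "DINOv2",       # global or local image features
    "extract_text": "CLIP",             # text information in the picture
    "segment_objects": "SAM",           # capture and segment objects
}

def parse_functions(task_description: str) -> list[str]:
    """Keyword stand-in for the NLP parsing of the labeling task's functional layer."""
    functions = ["extract_features"]    # features are always needed for matching
    if "detection box" in task_description:
        functions.append("frame_objects")
    if "text" in task_description or "character" in task_description:
        functions.append("extract_text")
    return functions

def build_target_model(task_description: str) -> list[str]:
    """Select one base model per required function; the combination is the target model."""
    return sorted({MODEL_REGISTRY[f] for f in parse_functions(task_description)})

# build_target_model("frame brown animals with a detection box")
# -> ['DINOv2', 'Grounding DINO']
```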
Specifically, after the target model and the feature tag indicated by the labeling task are determined based on the labeling task, the method further includes: performing a tag query in a feature library based on the feature tag; and if the feature tag is found to be contained in the feature library, acquiring the second feature content associated with the feature tag.
Specifically, after the labeling task is acquired it is parsed through natural language processing and similar techniques, and the feature tag corresponding to the labeling task is obtained from the parsing as well.
The feature tag is a tag used to label a picture based on its picture content. Feature tags are stored in a feature library, together with the feature content associated with and corresponding to each tag, i.e. the second feature content.
Optionally, the second feature content is stored in association with the original picture from which it was extracted and with its position information within that picture, and the original picture and the position information are kept in the feature library as well. This lets the target model quickly locate the feature content, acquire it, and perform self-learning.
After the feature tag is obtained, the existing tags in the feature library are queried to determine whether the feature tag is among them; if so, the second feature content associated with the feature tag is acquired.
If the feature tag does not exist in the feature library, the picture to be labeled is labeled manually. Once manual labeling based on the labeling task yields a picture that can be labeled with the feature tag, its picture content is taken as feature content, and the feature tag and the corresponding feature content are updated into the feature library, providing a matching reference for subsequent labeling.
Specifically, the picture to be labeled is handled flexibly according to the query result: manual effort is used but not fully relied upon, which improves labeling efficiency and reduces personnel cost.
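One plausible shape for the feature library is a mapping from feature tag to stored feature content records, each keeping the vector together with its original picture and position as described above; the schema and the 512-dimension placeholder are assumptions:

```python
import numpy as np

# feature_library[tag] -> records of second feature content with provenance.
feature_library: dict[str, list[dict]] = {
    "qualified": [{
        "vector": np.zeros(512),            # second feature content (placeholder values)
        "source_picture": "mask_blue.jpg",  # original picture it was extracted from
        "position": (40, 60, 180, 220),     # its location in the original picture
    }],
}

def query_second_feature_content(feature_tag: str) -> list[dict] | None:
    """Tag query: return stored feature content, or None to fall back to manual labeling."""
    return feature_library.get(feature_tag)
```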
Step 103, extracting the picture content of the picture to be marked according to the target model to obtain a first picture content set; the first picture content set comprises at least one picture content to be matched.
Specifically, after the target model is determined, it is used to extract the picture content of the picture to be labeled, and the extracted picture content serves as the picture content to be matched in the matching operation.
Specifically, since the styles of pictures to be labeled and the content they contain vary widely, and the requirements defined in labeling tasks differ, the amount of extracted picture content also varies; that is, the first picture content set contains at least one piece of picture content to be matched. The first picture content set is therefore introduced to store the picture content extracted from the picture to be labeled in an orderly way.
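As a sketch of the extraction step, each picture to be labeled can be reduced to an embedding vector that serves as its picture content to be matched. The DINOv2 torch-hub entrypoint used here is real, but treating one whole-image embedding as the picture content is a simplification; the patent also allows region-level extraction:

```python
import torch
from PIL import Image
from torchvision import transforms

# DINOv2 backbone standing in for the target model's feature-extraction function.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),   # 224 is divisible by the ViT-S/14 patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_first_picture_content_set(picture_paths: list[str]) -> list[torch.Tensor]:
    """Return one embedding per picture: the first picture content set."""
    contents = []
    for path in picture_paths:
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        contents.append(model(image).squeeze(0))  # picture content to be matched
    return contents
```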
Step 104, matching each picture content to be matched in the first picture content set with a second feature content carrying the feature tag.
Specifically, each piece of extracted picture content to be matched is matched against the second feature content associated with the feature tag.
Specifically, after matching, step 105 is performed if the first picture content set contains an item matching the second feature content carrying the feature tag, i.e. if the first feature content of step 105 exists. If not, the following operations are performed.
Specifically, after matching each piece of picture content to be matched in the first picture content set with the second feature content carrying the feature tag, the method further includes: if none of the picture content to be matched matches the second feature content, sending a labeling request to an operation end, the labeling request comprising the picture to be labeled and its corresponding labeling task and instructing the operation end to label the picture based on the task; receiving labeling feedback information sent by the operation end in response to the labeling request; if the labeling feedback information is detected to contain the feature tag and a tag indication area, extracting third feature content from the tag indication area, the feature tag being assigned based on the tag indication area; establishing an association between the feature tag and the third feature content; and updating the third feature content and the association into the feature library.
Specifically, the operation end is a manual operation end. The labeling request is sent to it when no picture content to be matched in the first picture content set matches the second feature content. The request carries the picture to be labeled and its corresponding labeling task; on receiving it, an operator at the manual operation end labels the picture based on the task and sends labeling feedback information once done.
Specifically, to guard against manual labeling errors, the operation end may assign several operators to process the labeling request, and the labeling feedback information is sent only when the labeling information given by all operators is consistent.
Specifically, the labeling information falls into two types. When no picture content matching the second feature content exists in the picture to be labeled, the labeling information is "nothing to label". When such picture content does exist, the labeling information is the feature tag and the tag indication area it points to. The tag indication area is either the whole picture to be labeled or a local area within it; that is, the operator attaches the feature tag to the picture through the tag indication area, so the feature tag is assigned on the basis of that area, and the picture content the area indicates is the content that satisfies the labeling requirement set by the labeling task.
Accordingly, the labeling feedback information also falls into two cases: either the labeling information is "nothing to label", or it is the feature tag together with the tag indication area it points to.
Specifically, if the labeling feedback information is detected to contain the feature tag and the tag indication area, the picture content of the tag indication area, i.e. the third feature content, is extracted with the target model.
Further, an association relationship between the feature tag and the third feature content is established, and the third feature content and the association relationship are updated into the feature library.
In one application scenario, the picture to be labeled is a single face picture, and the labeling task is to label pictures of faces wearing a mask as qualified; that is, the feature tag is "qualified". The picture to be labeled is a single face picture of a person wearing a white mask.
At this point the feature tag "qualified" already exists in the feature library, but the second feature content associated with it was extracted from a face picture of a person wearing a blue mask.
Specifically, in this scenario the picture content extracted from the white-mask face picture cannot be matched with the second feature content. After the operation end processes the request and labels the white-mask face picture "qualified", picture content is extracted from that picture, an association between this picture content and the tag "qualified" is established, and the picture content and the association are stored in the feature library together, thereby updating the feature library.
In some application scenarios, interference from factors such as lighting prevents the correct picture content from being extracted from the picture to be labeled, so labeling fails; after correction by an operator, the picture content and the association likewise need to be updated into the feature library.
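A sketch of the feature-library update performed after the operation end returns a tag indication area; the function signature and record fields are assumptions consistent with the library sketched above:

```python
import numpy as np

def update_feature_library(feature_library: dict[str, list[dict]],
                           feature_tag: str,
                           third_feature_content: np.ndarray,
                           source_picture: str,
                           position: tuple[int, int, int, int]) -> None:
    """Associate the third feature content with the feature tag and store both."""
    feature_library.setdefault(feature_tag, []).append({
        "vector": third_feature_content,   # extracted from the tag indication area
        "source_picture": source_picture,  # original picture, kept for self-learning
        "position": position,              # location of the area in the original picture
    })

# E.g., after the white-mask picture is manually labeled "qualified":
# update_feature_library(feature_library, "qualified", vec, "mask_white.jpg", box)
```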
In one example, matching each piece of picture content to be matched in the first picture content set with the second feature content carrying the feature tag includes: calculating the content similarity between each piece of picture content to be matched and the second feature content; and comparing the content similarity with a preset similarity threshold to obtain a matching result.
Specifically, the content similarity between each piece of picture content to be matched and the second feature content is calculated and compared with the preset similarity threshold to obtain the matching result: if the content similarity is below the threshold, the corresponding picture content to be matched and the second feature content are considered not to match; otherwise they are considered to match.
Specifically, as long as one content similarity is not below the preset similarity threshold, the first picture content set is considered to contain an item matching the second feature content, i.e. the first feature content exists, and step 105 can be performed. If no such item exists, the labeling request is sent to the operation end for assistance.
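The patent does not fix a similarity measure or threshold value; the cosine similarity and the 0.8 threshold below are illustrative assumptions:

```python
import numpy as np

PRESET_SIMILARITY_THRESHOLD = 0.8  # illustrative value only

def content_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity as a stand-in for the content-similarity measure."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_first_feature_content(first_set: list[np.ndarray],
                               second_contents: list[np.ndarray]) -> np.ndarray | None:
    """Return the first to-be-matched content clearing the threshold, else None."""
    for content in first_set:
        for second in second_contents:
            if content_similarity(content, second) >= PRESET_SIMILARITY_THRESHOLD:
                return content  # first feature content exists: proceed to step 105
    return None  # no match: send a labeling request to the operation end
```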
Specifically, after matching each piece of picture content to be matched in the first picture content set with the second feature content carrying the feature tag, the method further includes: if none of the picture content to be matched matches the second feature content, sending a labeling request to an operation end, the labeling request comprising the picture to be labeled and its corresponding labeling task and instructing the operation end to label the picture based on the task; receiving labeling feedback information sent by the operation end in response to the labeling request; if the labeling feedback information is detected to contain the feature tag and a tag indication area, extracting third feature content from the tag indication area, the feature tag being assigned based on the tag indication area; calculating the feature similarity between the third feature content and the second feature content; and if the feature similarity is lower than the preset similarity threshold, adjusting the preset similarity threshold.
Specifically, the labeling request is sent to the operation end when no picture content to be matched in the first picture content set matches the second feature content, and the labeling feedback information the operation end sends in response is then received. The operation end, the labeling request, and the labeling feedback information are as described above and are not repeated here.
Specifically, if the labeling feedback information is detected to contain the feature tag and the tag indication area, the third feature content of the tag indication area is extracted with the target model, and the feature similarity between the extracted third feature content and the second feature content is calculated. The feature similarity is compared with the preset similarity threshold, and if it is below the threshold, the threshold is adjusted; the adjustment mainly consists of lowering the preset similarity threshold.
It should be noted that if the frequency with which the feature similarity falls below the preset similarity threshold does not reach the adjustment standard, the preset similarity threshold is not adjusted.
Specifically, the labeling flow is continuously optimized through feature library updates and parameter adjustments such as tuning the preset similarity threshold, improving both labeling efficiency and labeling accuracy.
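A sketch of the threshold adjustment. The patent states only that the threshold is mainly lowered and only once an adjustment standard is reached; the miss-rate rule and all numeric values below are assumptions:

```python
class ThresholdAdjuster:
    """Lower the preset similarity threshold when manually confirmed matches keep scoring below it."""

    def __init__(self, threshold: float = 0.8, adjustment_standard: float = 0.2,
                 step: float = 0.02, window: int = 50):
        self.threshold = threshold
        self.adjustment_standard = adjustment_standard  # tolerated miss rate
        self.step = step
        self.window = window
        self.misses: list[bool] = []  # True = feature similarity fell below threshold

    def observe(self, feature_similarity: float) -> float:
        """Record one third-vs-second feature similarity; adjust the threshold if warranted."""
        self.misses.append(feature_similarity < self.threshold)
        self.misses = self.misses[-self.window:]
        if (len(self.misses) == self.window
                and sum(self.misses) / self.window > self.adjustment_standard):
            self.threshold -= self.step  # adjustment mainly lowers the threshold
            self.misses.clear()
        return self.threshold
```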
Step 105, if the first feature content matching the second feature content exists in the first picture content set, labeling the picture to be labeled with the feature tag of the second feature content.
The first feature content is picture content in the first picture content set whose content similarity is not lower than the preset similarity threshold.
Specifically, once the picture content to be matched that matches the second feature content, i.e. the first feature content, is identified in the first picture content set, the feature tag is attached to the picture to be labeled.
Specifically, labeling the feature tag to the picture to be labeled includes labeling the feature tag to the whole picture to be labeled and/or labeling the feature tag to a local area in the picture to be labeled.
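Attaching the feature tag can amount to writing an annotation record that distinguishes whole-picture from local-area labeling; the record shape is hypothetical:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Annotation:
    picture: str       # the picture that was labeled
    feature_tag: str   # tag taken from the matched second feature content
    region: Optional[Tuple[int, int, int, int]] = None  # None means the whole picture

# Whole-picture label:   Annotation("dog.jpg", "brown animal")
# Local-area label:      Annotation("yard.jpg", "brown animal", (12, 30, 96, 150))
```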
In the embodiment of the application, after the picture to be labeled and the labeling task are acquired, the target model and the feature tag required for labeling can be determined based on the labeling task. Extracting the picture content of the picture to be labeled with this task-specific target model processes the picture more efficiently: useless processing steps and useless feature extraction are avoided as far as possible, shortening processing and extraction time and improving labeling efficiency. Determining from the feature tag the second feature content to be matched against the picture content reduces the number of candidate matches; that is, the extracted features need not be matched against all stored feature content, only against the second feature content associated with the feature tag, which also makes matching more accurate. In this way, when first feature content matching the second feature content exists in the first picture content set, the feature tag of the second feature content can be applied directly to the picture to be labeled. Matching time in the labeling process is reduced, and labeling efficiency is improved.
Referring to fig. 2, fig. 2 is a second flowchart of a picture labeling method provided in an embodiment of the present application. As shown in fig. 2, the picture labeling method comprises the following steps:
step 201, obtaining a picture to be marked and a marking task corresponding to the picture to be marked.
The implementation process of this step is the same as that of step 101 in the foregoing embodiment, and will not be described here again.
Step 202, determining a target model and a feature tag indicated by the labeling task based on the labeling task.
The implementation process of this step is the same as that of step 102 in the foregoing embodiment, and will not be described here again.
Step 203, extracting the picture content of the picture to be marked according to the target model to obtain a first picture content set; the first picture content set comprises at least one picture content to be matched.
The implementation process of this step is the same as that of step 103 in the foregoing embodiment, and will not be described here again.
Step 204, matching each of the to-be-matched picture content in the first picture content set with a second feature content carrying the feature tag.
The implementation process of this step is the same as that of step 104 in the foregoing embodiment, and will not be described here again.
And 205, if the first feature content matched with the second feature content exists in the first picture content set, marking the feature tag of the second feature content to the picture to be marked.
The implementation procedure of this step is the same as that of step 105 in the foregoing embodiment, and will not be described here again.
Step 206, performing labeling quality inspection on the labeled pictures.
Specifically, after the feature tag is attached to the picture to be labeled, a labeled picture is obtained.
Specifically, after the pictures to be labeled pass through the labeling flow, they fall into two types: pictures successfully labeled with the feature tag, and pictures not labeled with the feature tag. The former are referred to as labeled pictures; that is, a labeled picture is one that has been through labeling and carries the feature tag.
After labeling is finished, a target number of labeled pictures is extracted from the labeled pictures according to the quality-inspection extraction proportion and taken as pictures to be inspected, on which labeling quality inspection is then carried out.
Specifically, labeling quality inspection means sending a quality inspection request to the operation end, or to a dedicated quality inspection end, and receiving the quality inspection information it feeds back.
The quality inspection request comprises the picture to be inspected, the labeling task corresponding to the picture to be inspected, the feature tag corresponding to the picture to be inspected, and the picture content associated with the feature tag.
Further, the operation end or the quality inspection end performs quality inspection on the picture to be inspected according to the quality inspection request.
If the feature tag corresponding to the picture to be inspected and the picture content associated with the feature tag are both found to be correct, the quality inspection information fed back is that the quality inspection is passed.
If either the feature tag of the picture to be inspected or the picture content associated with the feature tag is found to be in error, the quality inspection information fed back is that the quality inspection is abnormal, with the abnormal content attached; the abnormal content is the feature tag and/or the picture content associated with it.
Specifically, the labeling quality inspection is realized based on the updated feature library.
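A sketch of drawing pictures to be inspected according to the quality-inspection extraction proportion; the proportion value and the record keys are assumptions:

```python
import random

def sample_for_quality_inspection(labeled_pictures: list[dict],
                                  extraction_proportion: float = 0.1) -> list[dict]:
    """Draw the target number of labeled pictures to serve as pictures to be inspected.

    Each record is expected to carry the picture, its labeling task, its feature
    tag, and the picture content associated with that tag (hypothetical keys).
    """
    target_number = max(1, int(len(labeled_pictures) * extraction_proportion))
    return random.sample(labeled_pictures, k=min(target_number, len(labeled_pictures)))
```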
In the embodiment of the application, after labeling is completed, labeling quality inspection is performed on the labeled pictures carrying the feature tag, and the labeling flow can then be optimized according to the quality inspection results, further improving labeling accuracy.
Referring to fig. 3, fig. 3 is a block diagram of a picture marking apparatus provided in an embodiment of the present application, and for convenience of explanation, only a portion related to the embodiment of the present application is shown.
The picture marking apparatus 300 includes: the system comprises an acquisition module 301, a determination module 302, an extraction module 303, a matching module 304 and a labeling module 305.
The obtaining module 301 is configured to obtain a picture to be annotated and an annotation task corresponding to the picture to be annotated.
The determining module 302 is configured to determine, based on the labeling task, a target model and a feature tag indicated by the labeling task.
The extracting module 303 is configured to extract, according to the target model, the picture content of the picture to be annotated, to obtain a first picture content set; the first picture content set comprises at least one picture content to be matched.
And a matching module 304, configured to match each of the to-be-matched picture content in the first picture content set with a second feature content carrying the feature tag.
And the labeling module 305 is configured to label the feature tag of the second feature content to the picture to be labeled if there is a first feature content matching the second feature content in the first picture content set.
Specifically, the determining module 302 is further configured to:
analyzing the labeling task from a functional layer to obtain a first function and a second function required by realizing the labeling task;
acquiring a first basic model with the first function and a second basic model with the second function;
and carrying out model combination on the first basic model and the second basic model to obtain the target model.
Specifically, the device further comprises a query acquisition module, configured to:
based on the feature tag, performing tag inquiry in a feature library;
and if the feature labels are inquired to be contained in the feature library, acquiring the second feature content associated with the feature labels.
Specifically, the device further comprises an auxiliary labeling module for:
if none of the picture content to be matched matches the second feature content, send a labeling request to an operation end; the labeling request comprises the picture to be labeled and the labeling task corresponding to the picture to be labeled, and is used to instruct the operation end to label the picture to be labeled based on the labeling task;
receiving annotation feedback information sent by the operation end based on the annotation request;
if the labeling feedback information is detected to contain the feature tag and a tag indication area, extract third feature content from the tag indication area; the feature tag is assigned based on the tag indication area;
establishing an association relationship between the feature tag and the third feature content;
and updating the third feature content and the association relation into the feature library.
Specifically, the matching module 304 is further configured to:
calculating the content similarity between each piece of picture content to be matched and the second feature content;
and comparing the content similarity with a preset similarity threshold value to obtain a matching result.
Specifically, the device further comprises an adjustment module for:
if none of the picture content to be matched matches the second feature content, send a labeling request to an operation end; the labeling request comprises the picture to be labeled and the labeling task corresponding to the picture to be labeled, and is used to instruct the operation end to label the picture to be labeled based on the labeling task;
receiving annotation feedback information sent by the operation end based on the annotation request;
if the labeling feedback information is detected to contain the feature tag and a tag indication area, extract third feature content from the tag indication area; the feature tag is assigned based on the tag indication area;
calculating feature similarity between the third feature content and the second feature content;
and if the feature similarity is lower than the preset similarity threshold, adjusting the preset similarity threshold.
Specifically, the device also comprises a quality inspection module for:
and performing labeling quality inspection on the labeled pictures.
The picture labeling apparatus provided by the embodiment of the application can realize each process of the picture labeling method embodiments and achieve the same technical effects; to avoid repetition, the details are not described here again.
Fig. 4 is a block diagram of a terminal according to an embodiment of the present application. As shown in the figure, the terminal 4 of this embodiment includes: at least one processor 40 (only one is shown in fig. 4), a memory 41 and a computer program 42 stored in the memory 41 and executable on the at least one processor 40, the processor 40 implementing the steps in any of the various method embodiments described above when executing the computer program 42.
The terminal 4 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The terminal 4 may include, but is not limited to, a processor 40, a memory 41. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the terminal 4 and is not limiting of the terminal 4, and may include more or fewer components than shown, or may combine some components, or different components, e.g., the terminal may further include input and output devices, network access devices, buses, etc.
The processor 40 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the terminal 4, such as a hard disk or a memory of the terminal 4. The memory 41 may also be an external storage device of the terminal 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal 4. The memory 41 is used for storing the computer program as well as other programs and data required by the terminal. The memory 41 may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not detailed or illustrated in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other manners. For example, the apparatus/terminal embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of each method embodiment described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted as required by legislation and patent practice in each jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The present application may implement all or part of the procedures in the methods of the above embodiments, and may also be implemented by a computer program product, which when run on a terminal causes the terminal to implement steps in the embodiments of the methods described above.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for labeling pictures, comprising:
acquiring a picture to be labeled and a labeling task corresponding to the picture to be labeled;
determining a target model and a feature tag indicated by the labeling task based on the labeling task;
extracting the picture content of the picture to be labeled according to the target model to obtain a first picture content set; the first picture content set comprising at least one piece of picture content to be matched;
matching each piece of picture content to be matched in the first picture content set with second feature content carrying the feature tag;
and if first feature content matching the second feature content exists in the first picture content set, labeling the picture to be labeled with the feature tag of the second feature content.
2. The method of claim 1, wherein determining the target model based on the labeling task comprises:
analyzing the labeling task at the functional level to obtain a first function and a second function required to realize the labeling task;
acquiring a first basic model having the first function and a second basic model having the second function; and
combining the first basic model and the second basic model to obtain the target model.
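One conceivable reading of claim 2's model combination, offered purely as an assumption, is to chain two single-function base models so that the output of the first feeds the second:

```python
# Hypothetical combination of two base models into a target model (claim 2).
class TargetModel:
    def __init__(self, first_base, second_base):
        self.first_base = first_base    # provides the first function, e.g. region detection
        self.second_base = second_base  # provides the second function, e.g. feature encoding

    def __call__(self, picture):
        regions = self.first_base(picture)             # apply the first function
        return [self.second_base(r) for r in regions]  # apply the second to each output

# Stand-in callables to show the composition:
detector = lambda pic: ["region_a", "region_b"]
encoder = lambda region: f"feature({region})"
target = TargetModel(detector, encoder)
print(target("picture"))  # ['feature(region_a)', 'feature(region_b)']
```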
3. The method of claim 1, wherein after determining the target model and the feature tag indicated by the labeling task based on the labeling task, the method further comprises:
performing a tag query in a feature library based on the feature tag; and
if the feature tag is found in the feature library, acquiring the second feature content associated with the feature tag.
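A minimal sketch of claim 3's tag query, modeling the feature library as a dictionary (the storage format and the example entries are assumptions):

```python
# Sketch of the feature-library query in claim 3; library contents are
# hypothetical stand-ins.
feature_library = {
    "helmet": [0.12, 0.80, 0.33],  # feature tag -> second feature content
    "vest":   [0.45, 0.10, 0.77],
}

def query_second_feature(feature_tag):
    """Return the second feature content associated with the tag, or None."""
    if feature_tag in feature_library:      # tag found in the feature library
        return feature_library[feature_tag]
    return None                             # tag absent: no stored content to match
```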
4. The method of claim 3, wherein after matching each picture content to be matched in the first picture content set with the second feature content carrying the feature tag, the method further comprises:
if none of the picture contents to be matched matches the second feature content, sending a labeling request to an operation end, wherein the labeling request comprises the picture to be labeled and the labeling task corresponding to the picture to be labeled, and the labeling request is used for instructing the operation end to label the picture to be labeled based on the labeling task;
receiving labeling feedback information sent by the operation end based on the labeling request;
if the labeling feedback information is detected to contain the feature tag and a tag indication area, extracting third feature content from the tag indication area, wherein the feature tag is assigned based on the tag indication area;
establishing an association relationship between the feature tag and the third feature content; and
updating the third feature content and the association relationship into the feature library.
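Claim 4's feedback path could be sketched as below; the feedback fields and the region-extraction helper are assumptions, since the claim does not fix a data format:

```python
# Sketch of claim 4: fold manual labeling feedback back into the feature library.
def incorporate_feedback(feedback, extract_region_content, feature_library):
    """`feedback` is assumed to be a dict that may carry a feature tag and a
    tag indication area drawn by the human operator."""
    tag = feedback.get("feature_tag")
    area = feedback.get("tag_indication_area")
    if tag is None or area is None:
        return                                    # feedback lacks what claim 4 checks for
    third_content = extract_region_content(area)  # third feature content of the area
    feature_library[tag] = third_content          # establish and persist the association
```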
5. The method of claim 1, wherein matching each picture content to be matched in the first picture content set with the second feature content carrying the feature tag comprises:
calculating a content similarity between each picture content to be matched and the second feature content; and
comparing the content similarity with a preset similarity threshold to obtain a matching result.
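Claim 5 leaves the similarity measure open; assuming cosine similarity over feature vectors, the comparison against the preset threshold might look like this:

```python
# Sketch of claim 5's matching; cosine similarity is an assumed choice of metric.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def first_match(contents_to_match, second_feature, preset_threshold=0.8):
    """Return the first content whose similarity clears the preset threshold."""
    for content in contents_to_match:
        if cosine_similarity(content, second_feature) >= preset_threshold:
            return content  # matching result: this content matches
    return None             # matching result: nothing cleared the threshold
```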
6. The method of claim 5, wherein after matching each picture content to be matched in the first picture content set with the second feature content carrying the feature tag, the method further comprises:
if none of the picture contents to be matched matches the second feature content, sending a labeling request to an operation end, wherein the labeling request comprises the picture to be labeled and the labeling task corresponding to the picture to be labeled, and the labeling request is used for instructing the operation end to label the picture to be labeled based on the labeling task;
receiving labeling feedback information sent by the operation end based on the labeling request;
if the labeling feedback information is detected to contain the feature tag and a tag indication area, extracting third feature content from the tag indication area, wherein the feature tag is assigned based on the tag indication area;
calculating a feature similarity between the third feature content and the second feature content; and
if the feature similarity is lower than the preset similarity threshold, adjusting the preset similarity threshold.
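The adjustment rule in claim 6 is unspecified; one plausible sketch lowers the threshold stepwise toward the similarity of the human-confirmed content, with a sanity floor (the step size and floor values are assumptions):

```python
# Sketch of claim 6's threshold adjustment; step size and floor are assumed.
def adjust_threshold(preset_threshold, feature_similarity, step=0.05, floor=0.5):
    if feature_similarity < preset_threshold:
        # Lower the threshold by at most `step`, but never below the observed
        # similarity or the floor.
        return max(feature_similarity, preset_threshold - step, floor)
    return preset_threshold

print(adjust_threshold(0.80, 0.60))  # 0.75: one step down
print(adjust_threshold(0.62, 0.60))  # 0.60: lands on the observed similarity
```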
7. The method of claim 1, wherein a labeled picture is obtained after the feature tag of the second feature content is labeled to the picture to be labeled, and the method further comprises:
performing labeling quality inspection on the labeled picture.
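Claim 7 does not say how the quality inspection is performed; a sampling-based re-check is one assumed possibility:

```python
# Assumed sketch of labeling quality inspection (claim 7): re-verify a random
# sample of labeled pictures and report the pass rate.
import random

def quality_inspect(labeled_pictures, verify, sample_rate=0.1):
    if not labeled_pictures:
        return 1.0                      # nothing to inspect
    size = max(1, int(len(labeled_pictures) * sample_rate))
    sample = random.sample(labeled_pictures, size)
    return sum(1 for p in sample if verify(p)) / size  # fraction passing inspection
```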
8. A picture labeling apparatus, comprising:
an acquisition module, used for acquiring a picture to be labeled and a labeling task corresponding to the picture to be labeled;
a determining module, used for determining a target model and a feature tag indicated by the labeling task based on the labeling task;
an extraction module, used for extracting the picture content of the picture to be labeled according to the target model to obtain a first picture content set, wherein the first picture content set comprises at least one picture content to be matched;
a matching module, used for matching each picture content to be matched in the first picture content set with second feature content carrying the feature tag; and
a labeling module, used for labeling the feature tag of the second feature content to the picture to be labeled if first feature content matching the second feature content exists in the first picture content set.
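Mirroring claim 8's module structure (all names hypothetical), the apparatus can be viewed as five cooperating callables:

```python
# Sketch of the apparatus in claim 8; each module is modeled as a callable.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PictureLabelingDevice:
    acquire: Callable    # acquisition module: picture + labeling task
    determine: Callable  # determining module: target model + feature tag
    extract: Callable    # extraction module: first picture content set
    match: Callable      # matching module: compare contents with second feature content
    label: Callable      # labeling module: attach the feature tag on a match

    def run(self):
        picture, task = self.acquire()
        model, tag = self.determine(task)
        contents = self.extract(picture, model)
        if self.match(contents, tag):
            self.label(picture, tag)
        return picture
```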
9. A terminal comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202311533897.XA, filed 2023-11-16 (priority 2023-11-16): Picture labeling method, device, terminal and storage medium. Status: Pending. Published as CN117671681A (en).

Priority Applications (1)

Application Number: CN202311533897.XA (published as CN117671681A)
Priority Date: 2023-11-16
Filing Date: 2023-11-16
Title: Picture labeling method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number: CN202311533897.XA (published as CN117671681A)
Priority Date: 2023-11-16
Filing Date: 2023-11-16
Title: Picture labeling method, device, terminal and storage medium

Publications (1)

Publication Number: CN117671681A (en)
Publication Date: 2024-03-08

Family

ID=90078045

Family Applications (1)

Application Number: CN202311533897.XA
Title: Picture labeling method, device, terminal and storage medium
Priority Date: 2023-11-16
Filing Date: 2023-11-16
Status: Pending
Publication: CN117671681A (en)

Country Status (1)

Country: CN
Publication: CN117671681A (en)

Similar Documents

US11321583B2 (en) Image annotating method and electronic device
CN108073910B (en) Method and device for generating human face features
CN109034069B (en) Method and apparatus for generating information
KR102002024B1 (en) Method for processing labeling of object and object management server
CN112232352A (en) Automatic pricing system and method for intelligently identifying PCB drawings
CN113221918A (en) Target detection method, and training method and device of target detection model
CN115758451A (en) Data labeling method, device, equipment and storage medium based on artificial intelligence
EP3564833B1 (en) Method and device for identifying main picture in web page
CN111881900B (en) Corpus generation method, corpus translation model training method, corpus translation model translation method, corpus translation device, corpus translation equipment and corpus translation medium
CN113255501A (en) Method, apparatus, medium, and program product for generating form recognition model
CN110413869B (en) Method and device for pushing information
CN114708582B (en) AI and RPA-based electric power data intelligent inspection method and device
CN117671681A (en) Picture labeling method, device, terminal and storage medium
CN115063784A (en) Bill image information extraction method and device, storage medium and electronic equipment
CN114818627A (en) Form information extraction method, device, equipment and medium
CN113592981A (en) Picture labeling method and device, electronic equipment and storage medium
CN113780116A (en) Invoice classification method and device, computer equipment and storage medium
CN113704650A (en) Information display method, device, system, equipment and storage medium
CN110851349A (en) Page abnormal display detection method, terminal equipment and storage medium
CN114792295B (en) Method, device, equipment and medium for correcting blocked object based on intelligent photo frame
CN115034877A (en) Loan mortgage information processing method and device and computer equipment
CN114998906A (en) Text detection method, model training method, device, electronic equipment and medium
CN113780267A (en) Method, device and equipment for character recognition and computer readable medium
CN115034876A (en) Loan information auditing method and device based on OCR (optical character recognition) technology and computer equipment
CN117370817A (en) Data processing method, apparatus, device, medium, and program product

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination