CN113095444B - Image labeling method, device and storage medium - Google Patents


Info

Publication number
CN113095444B
Authority
CN
China
Prior art keywords
image sample
target
model
corrected
labeled
Prior art date
Legal status
Active
Application number
CN202110629239.5A
Other languages
Chinese (zh)
Other versions
CN113095444A
Inventor
李晶
郑哲
刘瑞
崔文朋
聂玉虎
池颖英
徐鲲鹏
李腾浩
杨玎
黄桂林
胡戈飚
习雨同
曹波
Current Assignee
State Grid Jiangxi Electric Power Co ltd
State Grid Jiangxi Electric Power Co ltd Construction Branch
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
Beijing Smartchip Microelectronics Technology Co Ltd
Original Assignee
State Grid Jiangxi Electric Power Co ltd
State Grid Jiangxi Electric Power Co ltd Construction Branch
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
Beijing Smartchip Microelectronics Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Jiangxi Electric Power Co ltd, State Grid Jiangxi Electric Power Co ltd Construction Branch, State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, Beijing Smartchip Microelectronics Technology Co Ltd filed Critical State Grid Jiangxi Electric Power Co ltd
Priority to CN202110629239.5A
Publication of CN113095444A
Application granted
Publication of CN113095444B
Legal status: Active

Classifications

    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/25: Pattern recognition; fusion techniques
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

The embodiments of the invention provide an image labeling method, an image labeling device and a storage medium, belonging to the technical field of image processing. The method comprises the following steps: acquiring a labeling operation performed by a user on an original image sample; training a target detection model with the labeled image sample, and detecting the original image sample again with the trained target detection model to obtain a model detection result; and comparing the model detection result with the labeled image sample and displaying the comparison result. The embodiments of the invention are applicable to image recognition processes.

Description

Image labeling method, device and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to an image annotation method, an image annotation device and a storage medium.
Background
Image processing algorithms based on deep learning often need to identify the class and position of objects precisely, which requires a large amount of manually labeled data for training. However, training sets typically contain tens of thousands or even millions of images, so purely manual labeling involves a very large workload.
Disclosure of Invention
The embodiments of the invention aim to provide an image labeling method, an image labeling device and a storage medium that solve the problems of heavy workload and low efficiency of purely manual labeling. A machine partly replaces the manual quality-inspection work, and, owing to the generalization ability of the machine algorithm, the accuracy of sample labeling can be improved and quality-inspection cost can be saved.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
in a first aspect, the present invention provides an image annotation method, including:
acquiring the marking operation of a user on an original image sample;
training a target detection model by using the labeled image sample, and detecting the original image sample again by using the trained target detection model to obtain a model detection result;
and comparing the model detection result with the labeled image sample, and displaying a comparison result.
In a first possible implementation manner of the first aspect, the target detection model is obtained by connecting a set number of parallel detectors through a fusion network, where the set number is the number of target categories in the original image sample plus one, and the number of layers of the fusion network equals the number of target categories in the original image sample.
In a second possible implementation manner of the first aspect, the training of the target detection model by using the labeled image samples includes:
respectively training detectors in the target detection model by using the labeled image samples;
when the values of the loss functions of all the detectors in the target detection model reach a function threshold, determining that all the detectors are successfully trained;
and taking the output of all the detectors when the training is successful as the input of the fusion network in the target detection model to obtain the trained target detection model.
In a third possible implementation manner of the first aspect, the comparing the model detection result with the labeled image sample, and displaying the comparison result includes:
calculating the intersection ratio of the model detection result and an annotated target in the annotated image sample;
determining the marked image samples which need to be corrected and do not need to be corrected according to the comparison result of the intersection ratio and a set value;
and distinguishing and displaying the marked image samples needing to be corrected and the marked image samples not needing to be corrected by using different identifiers.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner, the determining, according to the comparison result between the intersection ratio and the set value, the annotated image sample that needs to be corrected and the annotated image sample that does not need to be corrected includes:
calculating the confidence degree of the model detection result with the intersection ratio lower than a set value, and obtaining a first type of the labeled image sample needing to be corrected according to the comparison result of the confidence degree and a first threshold value;
and calculating the fitting degree between the labeled image sample and the model detection results whose intersection ratio is equal to or greater than the set value, and obtaining, according to the comparison result of the fitting degree and a second threshold value, a second type of labeled image sample that needs to be corrected and the labeled image samples that do not need to be corrected.
With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner, the obtaining, according to the comparison result between the confidence degree and the first threshold, the first type of the annotated image sample that needs to be modified includes:
when the confidence coefficient is larger than or equal to the first threshold value, determining that a missed detection target exists in the corresponding labeled image sample;
when the confidence coefficient is smaller than the first threshold value, determining that a false detection target exists in the corresponding labeled image sample;
and determining the missed detection target and the false detection target as a first type of the labeled image sample needing to be corrected.
With reference to the fourth possible implementation manner of the first aspect, in a sixth possible implementation manner, the obtaining, according to the comparison result between the fitting degree and the second threshold, the second type of annotated image sample that needs to be modified and the annotated image sample that does not need to be modified includes:
when the fitting degree is smaller than the second threshold value, determining the corresponding labeled image sample as a second type of labeled image sample needing to be corrected;
and when the fitting degree is greater than or equal to the second threshold value, determining the corresponding labeled image sample as the labeled image sample which does not need to be corrected.
With reference to the third possible implementation manner of the first aspect, in a seventh possible implementation manner, after the labeled image samples that need to be corrected and do not need to be corrected are displayed in a differentiated manner by using different identifiers, the method further includes:
and receiving the correction operation of the user aiming at the marked image sample needing to be corrected.
In an eighth possible implementation manner of the first aspect, the method further includes:
comparing whether the model detection result is consistent with the target category in the labeled image sample;
and when the model detection result is inconsistent with the target type in the labeled image sample, labeling the target type of the model detection result in the labeled image sample.
In a second aspect, the present invention provides an image annotation apparatus, comprising:
the annotation module is used for acquiring annotation operation of a user on an original image sample;
the model processing module is used for training a target detection model by using the labeled image sample, and detecting the original image sample again by using the trained target detection model to obtain a model detection result;
and the comparison module is used for comparing the model detection result with the labeled image sample and displaying a comparison result.
In a first possible implementation manner of the second aspect, the target detection model is obtained by connecting a set number of parallel detectors through a fusion network, where the set number is the number of target categories in the original image sample plus one, and the number of layers of the fusion network equals the number of target categories in the original image sample.
In a second possible implementation manner of the second aspect, the model processing module is further configured to:
respectively training detectors in the target detection model by using the labeled image samples;
when the values of the loss functions of all the detectors in the target detection model reach a function threshold, determining that all the detectors are successfully trained;
and taking the output of all the detectors when the training is successful as the input of the fusion network in the target detection model to obtain the trained target detection model.
In a third possible implementation manner of the second aspect, the comparing module further includes:
the calculation submodule is used for calculating the intersection ratio of the model detection result and an annotated target in the annotated image sample;
the comparison processing submodule is used for determining, according to the comparison result of the intersection ratio and a set value, the labeled image samples that need to be corrected and those that do not need to be corrected;
and the display submodule is used for distinguishing and displaying the marked image samples which need to be corrected and do not need to be corrected by utilizing different identifiers.
With reference to the third possible implementation manner of the second aspect, in a fourth possible implementation manner, the comparison processing sub-module is specifically configured to:
calculating the confidence degree of the model detection result with the intersection ratio lower than a set value, and obtaining a first type of the labeled image sample needing to be corrected according to the comparison result of the confidence degree and a first threshold value;
and calculating the fitting degree between the labeled image sample and the model detection results whose intersection ratio is equal to or greater than the set value, and obtaining, according to the comparison result of the fitting degree and a second threshold value, a second type of labeled image sample that needs to be corrected and the labeled image samples that do not need to be corrected.
With reference to the fourth possible implementation manner of the second aspect, in a fifth possible implementation manner, the obtaining, according to the comparison result between the confidence degree and the first threshold, the first type of the annotated image sample that needs to be modified includes:
when the confidence coefficient is larger than or equal to the first threshold value, determining that a missed detection target exists in the corresponding labeled image sample;
when the confidence coefficient is smaller than the first threshold value, determining that a false detection target exists in the corresponding labeled image sample;
and determining the missed detection target and the false detection target as a first type of the labeled image sample needing to be corrected.
With reference to the fourth possible implementation manner of the second aspect, in a sixth possible implementation manner, the obtaining, according to the comparison result between the degree of fitting and a second threshold, a second type of the annotated image sample that needs to be modified and the annotated image sample that does not need to be modified includes:
when the fitting degree is smaller than the second threshold value, determining the corresponding labeled image sample as a second type of labeled image sample needing to be corrected;
and when the fitting degree is greater than or equal to the second threshold value, determining the corresponding labeled image sample as the labeled image sample which does not need to be corrected.
With reference to the third possible implementation manner of the second aspect, in a seventh possible implementation manner, the apparatus further includes:
and the receiving and modifying module is used for receiving the correcting operation of the user aiming at the marked image sample needing to be corrected.
In an eighth possible implementation manner of the second aspect, the apparatus further includes:
the category comparison module is used for comparing whether the model detection result is consistent with the target category in the labeled image sample;
and the class correction module is used for labeling the target class of the model detection result on the labeled image sample when the model detection result is inconsistent with the target class in the labeled image sample.
In a third aspect, the present invention provides a machine-readable storage medium having stored thereon instructions for causing a machine to perform the image annotation method as described above.
According to the image labeling method, device and storage medium provided by the embodiments of the invention, after the labeling operation performed by the user on the original image sample is acquired, a target detection model is trained with the labeled image sample, and the original image sample is detected again with the trained target detection model to obtain a model detection result. In other words, machine learning replaces manual fine-tuning of the labels, which saves the labor cost of manually refining the samples, improves the accuracy of sample labeling and saves quality-inspection cost. The model detection result is then compared with the labeled image sample and the comparison result is displayed, so that the user can conveniently check the difference between the model detection result obtained by machine learning and the manual labeling operation, and can correct the defects of the manual labels based on the model detection result. In the prior art, the size of the training data set has to be taken into account when training a model: if the data set is small, the cost of training cannot offset the labor cost it saves. The embodiments of the invention train the target detection model with all of the labeled image samples and are therefore better suited to labeling work on small data sets.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
fig. 1 is a schematic flowchart of an image annotation method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another image annotation method provided in the embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an image annotation apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another image annotation device according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of another image annotation apparatus according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
Fig. 1 is a schematic flowchart of an image annotation method according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
step 101, obtaining the labeling operation performed by the user for the original image sample.
Unlike the related art, in the embodiment of the invention annotation personnel quickly perform a rough manual labeling operation on the targets in all of the original image samples. The labeling requirement is low: only the target subject needs to be labeled, so the labor cost is low and the efficiency is high.
And 102, training a target detection model by using the labeled image sample, and detecting the original image sample again by using the trained target detection model to obtain a model detection result.
The target detection model in the embodiment of the present invention may adopt a boosting detection model, i.e., a multi-detector fusion algorithm implemented based on the boosting principle. The model is obtained by connecting a set number of parallel detectors through a fusion network, where the set number is the number of target categories in the original image sample plus one. For example, when the target categories to be detected are safety helmets and cigarettes, i.e., the number of target categories is 2, the set number of parallel detectors is 3. Because of the parallel structure the detectors do not influence one another: each detector is trained independently, the number of training iterations is controlled by a loss function, and the initial model parameters of all detectors are identical. For example, the detectors in the target detection model are trained separately with the labeled image samples, and when the values of the loss functions of all detectors in the target detection model reach a function threshold, it is determined that all detectors have been trained successfully. The successfully trained detectors are then connected in parallel, and the outputs of all successfully trained detectors are used as the input of the fusion network in the target detection model to obtain the trained target detection model. The number of layers of the fusion network equals the number of target categories in the original image sample; for example, when the number of target categories is 2, the fusion network also has 2 layers and may adopt two fully connected layers.
For example, when the target categories to be detected are safety helmets and cigarettes, each detector may adopt, but is not limited to, a model such as yolov4-tiny. The three detectors are trained simultaneously with the labeled image samples until the values of their loss functions reach the function threshold, at which point the three detectors are determined to have been trained successfully. The three one-dimensional arrays output by the detectors at the moment of successful training are combined into a two-dimensional array and used as the training data of the fusion network, thereby obtaining the trained target detection model. A sketch of this fusion step is given below.
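The following is a minimal sketch, in Python/PyTorch, of how the fusion step described above could look. It assumes that each trained detector yields a fixed-length one-dimensional score vector per image; the class name FusionNetwork, the hidden width and the vector length are illustrative assumptions and are not specified by the patent.

```python
import torch
import torch.nn as nn

class FusionNetwork(nn.Module):
    """Fuses the stacked 1-D outputs of the parallel detectors.

    The number of fully connected layers equals the number of target
    categories (2 in the helmet/cigarette example), as described above.
    """
    def __init__(self, num_detectors: int, det_out_dim: int, num_classes: int, hidden: int = 64):
        super().__init__()
        layers, in_dim = [], num_detectors * det_out_dim
        for _ in range(num_classes - 1):                   # intermediate fully connected layers
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, num_classes))      # final fully connected layer
        self.net = nn.Sequential(*layers)

    def forward(self, detector_outputs: torch.Tensor) -> torch.Tensor:
        # detector_outputs: (batch, num_detectors, det_out_dim), the 2-D array per image
        return self.net(detector_outputs.flatten(start_dim=1))

# Two target categories (helmet, cigarette) -> three parallel detectors.
fusion = FusionNetwork(num_detectors=3, det_out_dim=16, num_classes=2)
stacked = torch.randn(8, 3, 16)   # batch of 8 images, three 1-D detector outputs each
scores = fusion(stacked)          # (8, 2) fused class scores
```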
In the embodiment of the present invention, the target detection model is trained with the image samples labeled in step 101. During training the model learns to extract the target features in the labeled image samples, and the trained target detection model is finally obtained. The trained model is then used to detect the original image samples again to obtain a model detection result. Owing to the generalization ability of the target detection model, the labeled image samples can be corrected; for example, a target missed by the human annotator can be marked.
And 103, comparing the model detection result with the labeled image sample, and displaying a comparison result.
Both the labeled image sample, obtained from the user's rough manual labeling of the targets in the original image sample, and the model detection result, obtained by re-detecting the original image sample with the target detection model, carry target category labels.
In the embodiment of the present invention, it may first be compared whether the target category in the model detection result is consistent with that in the labeled image sample. When they are inconsistent, the target category of the model detection result is taken as the standard and labeled on the labeled image sample. When they are consistent, no category correction is needed.
Optionally, after the labeled image samples that need to be corrected and those that do not need to be corrected are obtained as described below, the target category in the model detection result may be labeled directly on the corresponding labeled image samples.
In addition, the intersection ratio (intersection-over-union, IoU) between the model detection result and the labeled target in the labeled image sample is calculated. In both the model detection result and the labeled image sample, targets are generally labeled with rectangular boxes. A large intersection ratio therefore means that the overlapping area of the two rectangular boxes is large, and a small intersection ratio means that the overlapping area is small. To facilitate judging the intersection ratio between the model detection result and the labeled target in the labeled image sample, a set value for comparison is defined in advance, for example 0.5.
Then, the labeled image samples that need to be corrected and those that do not are determined according to the comparison result of the intersection ratio and the set value. Specifically, a model detection result and labeled image sample whose intersection ratio is lower than the set value are determined to be an unsuccessfully matched pair, and those whose intersection ratio is equal to or greater than the set value are determined to be a successfully matched pair. For the unsuccessfully matched pairs, the confidence of the model detection result is calculated. When the confidence is greater than or equal to a first threshold, the probability that the target exists in the model detection result is high, and it is determined that a missed detection target exists in the corresponding labeled image sample. When the confidence is smaller than the first threshold, the probability that the target exists in the model detection result is low, and it is determined that a false detection target exists in the corresponding labeled image sample. The missed detection target and the false detection target are then determined as the first type of labeled image sample that needs to be corrected. The first threshold is set in the range of 0-1 and may be determined according to specific needs, for example 0.7.
In addition, for the successfully matched pairs, the fitting degree between the corresponding model detection result and the labeled image sample is calculated. The fitting degree measures how similar the two labelings of the same target are. For example, the fitting degree may be the reciprocal of the product of the distance between the center points of the two labeling rectangles and the average distance between their four corresponding sides: the distance between the two center points is calculated first as a first distance, then the average of the distances between the corresponding sides of the two rectangles is calculated, and the reciprocal of the product of the first distance and this average is taken as the fitting degree. The larger the fitting degree, the closer the two rectangles fit. Then, according to the comparison result of the fitting degree and a second threshold, the second type of labeled image sample that needs to be corrected and the labeled image samples that do not need to be corrected are obtained. When the fitting degree is smaller than the second threshold, the labeling rectangles of the two deviate considerably, and the corresponding labeled image sample is determined as the second type of labeled image sample that needs to be corrected. When the fitting degree is greater than or equal to the second threshold, the labeling rectangles of the two are close, and the corresponding labeled image sample is determined as a roughly labeled image sample that does not need to be corrected. The second threshold can be set as required.
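As a concrete illustration, the intersection ratio and the fitting degree described above could be computed as follows for two boxes given as (x1, y1, x2, y2). This is a sketch of the stated formulas; the small epsilon terms are added here only to avoid division by zero and are not part of the patent's description.

```python
import math

def intersection_ratio(box_a, box_b):
    """Intersection ratio (intersection-over-union) of two rectangles (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fitting_degree(box_a, box_b):
    """Reciprocal of (center-point distance x mean distance between the four sides)."""
    ca = ((box_a[0] + box_a[2]) / 2.0, (box_a[1] + box_a[3]) / 2.0)
    cb = ((box_b[0] + box_b[2]) / 2.0, (box_b[1] + box_b[3]) / 2.0)
    center_dist = math.hypot(ca[0] - cb[0], ca[1] - cb[1])            # first distance
    mean_side_dist = sum(abs(a - b) for a, b in zip(box_a, box_b)) / 4.0
    return 1.0 / (center_dist * mean_side_dist + 1e-9)
```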
Finally, the marked image samples needing to be corrected and the marked image samples needing no correction can be distinguished and displayed by different marks.
For example, whether in the rough manual labeling operation or in the detection task of the target detection model, a rectangular box is drawn around each detected object and its target category is labeled. During visual display, different colors can be used to distinguish the labeled image samples that need to be corrected from those that do not. For example, green represents a labeled box that does not need to be corrected, i.e., a box confirmed to be accurate after re-detection by the target detection model. Blue represents the second type of labeled image sample that needs to be corrected, i.e., the size of the rectangular box in the labeled image sample deviates from the rectangular box in the model detection result, and the box is displayed according to the model detection result output by the target detection model. Red represents a box in the labeled image sample that is completely wrong and needs to be deleted, i.e., a false detection target in the first type of labeled image sample that needs to be corrected. Yellow represents a box missing from the labeled image sample, i.e., a missed detection target in the first type of labeled image sample that needs to be corrected.
After the roughly labeled image samples that need to be corrected are displayed distinctly in the different colors, the user's correction operations on them can be received. The colors serve only as references for the annotator: the annotator should preferably check the red and yellow boxes carefully first, then the blue boxes; the green boxes can be accepted without careful checking, and after a box has been adjusted it must be double-clicked to turn it green. Only when the whole image sample contains nothing but green boxes does the annotator move on to the next image sample. For example, the annotator can complete the labeling of a missed detection target in the labeled image sample by clicking the yellow box, which turns the rectangle green. For a red box, the annotator can right-click to delete the false detection target from the labeled image sample. For a blue box, the annotator can simply double-click it to turn the rectangle green.
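The color scheme above can be condensed into a small helper. The first two threshold values below are the example set value (0.5) and first threshold (0.7) from the description; the second threshold is application-specific, so the value used here is purely an assumption.

```python
SET_VALUE = 0.5         # intersection-ratio set value (example from the description)
FIRST_THRESHOLD = 0.7   # confidence threshold (example from the description)
SECOND_THRESHOLD = 1.0  # fitting-degree threshold (assumed; set according to requirements)

def review_color(intersection_ratio_value, confidence=None, fitting=None):
    """Map one detection/label comparison to a review color."""
    if intersection_ratio_value < SET_VALUE:       # unsuccessfully matched pair
        # high confidence -> missed target (yellow); low confidence -> false detection (red)
        return "yellow" if confidence >= FIRST_THRESHOLD else "red"
    # successfully matched pair: decide by how tightly the two boxes fit
    return "blue" if fitting < SECOND_THRESHOLD else "green"
```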
The original labeling approaches either label the data, perform manual quality inspection, repair the unqualified samples and inspect again, or label part of the data, train an algorithm model with that part, predict the remaining data with the trained model, and manually screen the prediction results to finish labeling all the data. With these existing methods, on the one hand the labor cost is high; on the other hand, when model training is involved, the size of the training data set is constrained: when the training data set is small, the cost of training cannot offset the labor cost it saves. In the embodiment of the invention the target detection model is trained with all of the labeled original image samples, and the training data set is exactly the data that needs to be labeled, so the embodiment of the invention is better suited to labeling work on small data sets. Furthermore, machine learning is applied again after the manual labeling, so a machine partly replaces manual quality inspection; owing to the generalization ability of the machine algorithm, the accuracy of data-set labeling is improved and manual quality-inspection cost is saved. In addition, the embodiment of the invention uses visualization software to compare and analyze the model detection result and the labeled image sample, so that the user can conveniently check the difference between the model detection result obtained by machine learning and the manual labeling operation, and can correct the defects of the manual labels based on the model detection result, which together saves cost.
FIG. 2 is a flowchart illustrating an image annotation method according to an exemplary embodiment of the present application. The method may be performed by a computer device, the method comprising:
step 201, obtaining the labeling operation performed by a user for an original image sample;
in the embodiment of the invention, the category of the target to be detected in the original image sample is taken as the safety helmet and the cigarette as an example, and all the original image samples are subjected to manual rough marking operation by marking personnel by utilizing the rectangular frame on the target in the original image sample, so that the marked image sample is obtained. However, because the manual rough labeling only needs to be performed on the target subject, the conditions of label missing, label error and inaccurate labeling may exist. Therefore, the technical means of combining manual operation and machine algorithm is adopted in the embodiment of the invention, so that the manual labeling cost can be saved, and the accuracy of image sample labeling can be improved.
Step 202, training a target detection model by using the labeled image sample, and detecting the original image sample again by using the trained target detection model to obtain a model detection result.
In the embodiment of the invention, yolov4-tiny is adopted as the detector. Since the target categories to be detected are safety helmets and cigarettes, i.e., two categories, three detectors are arranged, and the three parallel detectors are trained simultaneously with the labeled image samples. The initial model parameters of the three detectors are identical; each detector is trained independently, and when the values of the loss functions of all detectors reach the function threshold, all detectors are determined to have been trained successfully. The successfully trained detectors are then connected in parallel, and the three one-dimensional arrays they output at the moment of successful training are combined into a two-dimensional array as the training data of the fusion network, thereby obtaining the trained target detection model. The number of layers of the fusion network is the same as the number of target categories in the original image sample, namely 2, and two fully connected layers can be adopted. A sketch of this training procedure follows.
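The sketch below illustrates the training control just described. It assumes a hypothetical detector wrapper exposing train_one_epoch() and predict_vector() helpers; these names, the loss threshold and the epoch cap are illustrative assumptions, since the patent does not prescribe a detector API.

```python
import numpy as np

def train_detectors(detectors, train_loader, loss_threshold=0.5, max_epochs=300):
    """Train each detector independently until its loss function reaches the threshold."""
    for det in detectors:
        for _ in range(max_epochs):
            epoch_loss = det.train_one_epoch(train_loader)  # assumed helper: mean epoch loss
            if epoch_loss <= loss_threshold:                # loss function reached threshold
                break
    return detectors

def build_fusion_training_data(detectors, train_loader):
    """Stack the three 1-D detector outputs into a 2-D array per image for the fusion net."""
    fusion_inputs = []
    for image, _target in train_loader:
        rows = [det.predict_vector(image) for det in detectors]  # assumed 1-D outputs
        fusion_inputs.append(np.stack(rows, axis=0))             # shape (n_detectors, dim)
    return fusion_inputs
```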
Step 203, calculating the intersection ratio of the model detection result and the labeled target in the labeled image sample;
step 204, judging whether the intersection ratio is lower than a set value, if so, executing step 205, otherwise, executing step 210;
step 205, calculating the confidence of the model detection result with the intersection ratio lower than a set value;
step 206, judging whether the confidence is smaller than a first threshold, if so, executing step 207, otherwise, executing step 208;
step 207, determining that a false detection target exists in the corresponding labeled image sample;
step 208, determining that a missed detection target exists in the corresponding labeled image sample;
step 209, determining the missed detection target and the false detection target as a first type of the labeled image sample needing to be corrected;
step 210, calculating the fitting degree between the labeled image sample and the model detection result whose intersection ratio is equal to or greater than the set value;
step 211, judging whether the fitting degree is smaller than a second threshold, if so, executing step 212, otherwise, executing step 213;
step 212, determining the corresponding labeled image sample as a second type of labeled image sample needing to be corrected;
step 213, determining the corresponding labeled image sample as the labeled image sample which does not need to be corrected;
step 214, correspondingly labeling the target type of the model detection result on the labeled image samples needing to be corrected and not needing to be corrected;
step 215, displaying the marked image samples needing to be corrected and not needing to be corrected in a distinguishing mode by using different identifiers;
and step 216, receiving a correction operation performed by a user on the labeled image sample that needs to be corrected (a combined sketch of steps 203 to 213 is given below).
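Putting steps 203 to 213 together, a per-image triage could look like the sketch below. It reuses intersection_ratio, fitting_degree and the threshold constants from the earlier sketches; the outcome labels and the simple best-match loop are illustrative choices, not requirements of the method.

```python
def triage(model_detections, label_boxes):
    """For each model detection (box, confidence), compare it against the closest labeled
    box and decide which review color (correction category) it falls into."""
    outcomes = []
    for mbox, confidence in model_detections:
        best_iou, best_lbox = 0.0, None
        for lbox in label_boxes:                          # step 203: intersection ratio
            value = intersection_ratio(mbox, lbox)
            if value > best_iou:
                best_iou, best_lbox = value, lbox
        if best_iou < SET_VALUE:                          # steps 204-209: unmatched pair
            color = "yellow" if confidence >= FIRST_THRESHOLD else "red"
        else:                                             # steps 210-213: matched pair
            color = "blue" if fitting_degree(mbox, best_lbox) < SECOND_THRESHOLD else "green"
        outcomes.append((mbox, color))
    return outcomes
```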
In the embodiment of the invention, a machine partly replaces the manual quality-inspection work, and the generalization ability of the machine algorithm can improve the accuracy of data-set labeling. Machine algorithm analysis also saves the high labor cost: in practice, purely manual refinement is the least efficient link in labeling, and using machine algorithm analysis can greatly improve labeling efficiency. Compared with a single detector, the method achieves higher accuracy and stronger robustness, and the training process does not depend on the training result of any single model. In addition, the final visual presentation covers analysis results such as missed detection, false detection, inaccurate labeling and accurate labeling, so that annotation personnel can conveniently refer to them to complete the accurate correction of the labeled targets and categories in the image samples.
Correspondingly, fig. 3 is a schematic structural diagram of an image annotation device according to an embodiment of the present invention. As shown in fig. 3, the apparatus 30 includes: the labeling module 31 is configured to obtain a labeling operation performed by a user on an original image sample; the model processing module 32 is configured to train a target detection model using the labeled image sample, and detect the original image sample again using the trained target detection model to obtain a model detection result; and the comparison module 33 is configured to compare the model detection result with the labeled image sample, and display a comparison result.
Further, the target detection model is obtained by connecting a set number of parallel detectors through a fusion network, where the set number is the number of target categories in the original image sample plus one, and the number of layers of the fusion network equals the number of target categories in the original image sample.
Further, the model processing module is further configured to: respectively training detectors in the target detection model by using the labeled image samples; when the values of the loss functions of all the detectors in the target detection model reach a function threshold, determining that all the detectors are successfully trained; and taking the output of all the detectors when the training is successful as the input of the fusion network in the target detection model to obtain the trained target detection model.
Further, as shown in fig. 4, the comparison module 33 further includes:
a calculating submodule 41, configured to calculate an intersection ratio between the model detection result and an annotated target in the annotated image sample;
the comparison processing submodule 42 is configured to determine, according to the comparison result between the intersection ratio and the set value, the labeled image samples that need to be corrected and those that do not need to be corrected;
and the display sub-module 43 is configured to display the labeled image samples requiring modification and the labeled image samples not requiring modification separately by using different identifiers.
Further, the comparison processing sub-module is specifically configured to: calculate the confidence of the model detection results whose intersection ratio is lower than a set value, and obtain a first type of labeled image sample that needs to be corrected according to the comparison result of the confidence and a first threshold value; and calculate the fitting degree between the labeled image sample and the model detection results whose intersection ratio is equal to or greater than the set value, and obtain, according to the comparison result of the fitting degree and a second threshold value, a second type of labeled image sample that needs to be corrected and the labeled image samples that do not need to be corrected.
Further, the obtaining a first type of the labeled image sample needing to be corrected according to the comparison result between the confidence degree and the first threshold value includes:
when the confidence coefficient is larger than or equal to the first threshold value, determining that a missed detection target exists in the corresponding labeled image sample;
when the confidence coefficient is smaller than the first threshold value, determining that a false detection target exists in the corresponding labeled image sample;
and determining the missed detection target and the false detection target as a first type of the labeled image sample needing to be corrected.
Further, the obtaining, according to the comparison result between the fitting degree and a second threshold, a second type of the annotated image sample that needs to be corrected and the annotated image sample that does not need to be corrected includes:
when the fitting degree is smaller than the second threshold value, determining the corresponding labeled image sample as a second type of labeled image sample needing to be corrected;
and when the fitting degree is greater than or equal to the second threshold value, determining the corresponding labeled image sample as the labeled image sample which does not need to be corrected.
Further, as shown in fig. 5, the apparatus further includes:
a category comparison module 34, configured to compare whether the model detection result is consistent with a target category in the labeled image sample;
a category modification module 35, configured to label, when the model detection result is inconsistent with the target category in the labeled image sample, the target category of the model detection result in the labeled image sample.
Further, as shown in fig. 5, the apparatus further includes:
and a receiving modification module 36, configured to receive a modification operation performed by a user on the annotated image sample that needs to be modified.
For specific implementation processes and beneficial effects of each module in the image annotation apparatus 30, reference may be made to the description of the processing process of the image annotation method.
Accordingly, the embodiment of the present invention also provides a machine-readable storage medium, which stores instructions for causing a machine to execute the image annotation method as described above.
In addition, the image annotation apparatus shown in fig. 3 to 5 can be implemented based on the hardware structure of a computer or a mobile terminal. The hardware structure of the image annotation apparatus comprises a processor and a memory; the annotation module, the model processing module, the comparison module and so on are stored in the memory as program modules, and the processor executes the program modules stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program modules from the memory. One or more kernels can be provided. By adjusting the kernel parameters, a machine partly replaces the manual quality-inspection work; owing to the generalization ability of the machine algorithm, the accuracy of data-set labeling can be improved and quality-inspection cost can be saved. The visualization software and the comparative analysis are complementary parts of the embodiment of the invention, and together they promote cost saving.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
The embodiment of the invention provides a processor, which is used for running a program, wherein the image annotation method is executed when the program runs.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (17)

1. An image annotation method, comprising:
acquiring the marking operation of a user on an original image sample;
training a target detection model by using the labeled image sample, and detecting the original image sample again by using the trained target detection model to obtain a model detection result, wherein the target detection model is obtained by connecting a set number of detectors in parallel through a fusion network;
comparing the model detection result with the labeled image sample, and displaying the comparison result,
the training of the target detection model by using the labeled image sample comprises the following steps:
respectively training detectors in the target detection model by using the labeled image samples;
when the values of the loss functions of all the detectors in the target detection model reach a function threshold, determining that all the detectors are successfully trained;
and taking the output of all the detectors when the training is successful as the input of a fusion network in the target detection model to obtain the trained target detection model, wherein the fusion network is a full-connection layer network.
2. The image annotation method according to claim 1, wherein the set number is the number of target classes in the original image sample plus one, and the number of layers of the fusion network is the number of target classes in the original image sample.
3. The image annotation method of claim 1, wherein the comparing the model detection result with the annotated image sample and displaying the comparison result comprises:
calculating the intersection ratio of the model detection result and an annotated target in the annotated image sample;
determining the marked image samples which need to be corrected and do not need to be corrected according to the comparison result of the intersection ratio and a set value;
and distinguishing and displaying the marked image samples needing to be corrected and the marked image samples not needing to be corrected by using different identifiers.
4. The image annotation method of claim 3, wherein said determining the annotated image sample that needs to be modified and that does not need to be modified according to the comparison result of the intersection ratio and the set value comprises:
calculating the confidence degree of the model detection result with the intersection ratio lower than a set value, and obtaining a first type of the labeled image sample needing to be corrected according to the comparison result of the confidence degree and a first threshold value;
and calculating the fitting degree of the model detection result with the intersection ratio equal to or larger than the set value and the labeled image sample, and obtaining a second type of labeled image sample needing to be corrected and the labeled image sample needing not to be corrected according to the comparison result of the fitting degree and a second threshold value.
5. The image annotation method of claim 4, wherein obtaining the first type of annotated image sample that needs to be modified according to the comparison result between the confidence level and the first threshold comprises:
when the confidence coefficient is larger than or equal to the first threshold value, determining that a missed detection target exists in the corresponding labeled image sample;
when the confidence coefficient is smaller than the first threshold value, determining that a false detection target exists in the corresponding labeled image sample;
and determining the missed detection target and the false detection target as a first type of the labeled image sample needing to be corrected.
6. The image annotation method of claim 4, wherein the obtaining of the second type of annotated image sample requiring modification and the annotated image sample not requiring modification according to the comparison result between the fitting degree and the second threshold comprises:
when the fitting degree is smaller than the second threshold value, determining the corresponding labeled image sample as a second type of labeled image sample needing to be corrected;
and when the fitting degree is greater than or equal to the second threshold value, determining the corresponding labeled image sample as the labeled image sample which does not need to be corrected.
7. The image annotation method of claim 3, wherein after said differentially displaying said annotated image sample requiring modification and said annotated image sample not requiring modification with different labels, said method further comprises:
and receiving the correction operation of the user aiming at the marked image sample needing to be corrected.
8. The image annotation method of claim 1, further comprising:
and labeling the target type of the model detection result on the labeled image sample.
9. An image annotation apparatus, comprising:
the annotation module is used for acquiring annotation operation of a user on an original image sample;
the model processing module is used for training a target detection model by using the labeled image samples and detecting the original image samples again by using the trained target detection model to obtain a model detection result, wherein the target detection model is obtained by connecting a set number of parallel detectors through a fusion network;
a comparison module for comparing the model detection result with the labeled image sample and displaying the comparison result,
wherein the model processing module is further configured to:
respectively training detectors in the target detection model by using the labeled image samples;
when the values of the loss functions of all the detectors in the target detection model reach a function threshold, determining that all the detectors are successfully trained;
and taking the output of all the detectors when the training is successful as the input of a fusion network in the target detection model to obtain the trained target detection model, wherein the fusion network is a full-connection layer network.
10. The image annotation device of claim 9, wherein the set number is the number of target classes in the original image sample plus one, and the number of layers of the fusion network is the number of target classes in the original image sample.
11. The image annotation device of claim 9, wherein the comparison module further comprises:
the calculation submodule is used for calculating the intersection ratio between the model detection result and a labeled target in the labeled image sample;
the comparison processing submodule is used for determining the labeled image samples which need to be corrected and those which do not need to be corrected according to the comparison result of the intersection ratio and a set value;
and the display submodule is used for distinguishing and displaying, with different identifiers, the labeled image samples which need to be corrected and those which do not need to be corrected.
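The intersection ratio used by the calculation submodule in claim 11 is commonly computed as intersection over union (IoU) between the detected box and the labeled box. A small self-contained sketch under that assumption; the (x1, y1, x2, y2) coordinate format and the example values are illustrative:

```python
# Illustrative IoU ("intersection ratio") between a model detection box and a
# labeled target box, both given as (x1, y1, x2, y2) corner coordinates.
def intersection_ratio(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; zero area when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


SET_VALUE = 0.5                      # assumed set value for the comparison
detected = (10, 10, 60, 60)          # model detection result
labeled = (20, 20, 70, 70)           # labeled target
iou = intersection_ratio(detected, labeled)
print(iou, "low overlap, check confidence" if iou < SET_VALUE else "boxes agree")
```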
12. The image annotation device of claim 11, wherein the comparison processing sub-module is specifically configured to:
calculating the confidence of the model detection result whose intersection ratio is lower than the set value, and obtaining the first type of labeled image sample needing to be corrected according to the comparison result of the confidence and a first threshold;
and calculating the fitting degree between the labeled image sample and the model detection result whose intersection ratio is greater than or equal to the set value, and obtaining the second type of labeled image sample needing to be corrected and the labeled image sample not needing to be corrected according to the comparison result of the fitting degree and a second threshold.
13. The image annotation device of claim 12, wherein obtaining the first type of labeled image sample needing to be corrected according to the comparison result of the confidence and the first threshold comprises:
when the confidence is greater than or equal to the first threshold, determining that a missed detection target exists in the corresponding labeled image sample;
when the confidence is smaller than the first threshold, determining that a false detection target exists in the corresponding labeled image sample;
and determining the missed detection target and the false detection target as the first type of labeled image sample needing to be corrected.
14. The image annotation device of claim 12, wherein obtaining the second type of labeled image sample needing to be corrected and the labeled image sample not needing to be corrected according to the comparison result of the fitting degree and the second threshold comprises:
when the fitting degree is smaller than the second threshold, determining the corresponding labeled image sample as the second type of labeled image sample needing to be corrected;
and when the fitting degree is greater than or equal to the second threshold, determining the corresponding labeled image sample as a labeled image sample not needing to be corrected.
15. The image annotation apparatus of claim 11, further comprising:
and the modification receiving module is used for receiving a correction operation of the user for the labeled image sample needing to be corrected.
16. The image annotation device of claim 9, further comprising:
the category comparison module is used for comparing whether the target category of the model detection result is consistent with the target category in the labeled image sample;
and the category correction module is used for labeling the target category of the model detection result on the labeled image sample when the model detection result is inconsistent with the target category in the labeled image sample.
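A short sketch of the category check described in claim 16 (claim 8 covers the same step on the method side). The dictionary layout and field names below are hypothetical, chosen only to show the reconciliation:

```python
# Illustrative only: when the category predicted by the model disagrees with
# the category in the annotation, write the model's category onto the labeled
# sample and flag it so the user can review the change.
def reconcile_category(labeled_sample, model_detection):
    if model_detection["category"] != labeled_sample["category"]:
        labeled_sample["category"] = model_detection["category"]
        labeled_sample["category_corrected_by_model"] = True
    return labeled_sample


sample = {"image": "sample_001.jpg", "category": "insulator"}
detection = {"category": "bushing", "confidence": 0.93}
print(reconcile_category(sample, detection))
```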
17. A machine-readable storage medium having stored thereon instructions for causing a machine to execute the image annotation method of any one of claims 1 to 8.
CN202110629239.5A 2021-06-07 2021-06-07 Image labeling method, device and storage medium Active CN113095444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110629239.5A CN113095444B (en) 2021-06-07 2021-06-07 Image labeling method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110629239.5A CN113095444B (en) 2021-06-07 2021-06-07 Image labeling method, device and storage medium

Publications (2)

Publication Number Publication Date
CN113095444A (en) 2021-07-09
CN113095444B (en) 2021-09-17

Family

ID=76665141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110629239.5A Active CN113095444B (en) 2021-06-07 2021-06-07 Image labeling method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113095444B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023097639A1 (en) * 2021-12-03 2023-06-08 宁德时代新能源科技股份有限公司 Data annotation method and system for image segmentation, and image segmentation device
CN114332452B (en) * 2021-12-13 2023-05-02 南京行者易智能交通科技有限公司 Automatic detection method for image annotation result of target detection or target segmentation
CN114612699A (en) * 2022-03-10 2022-06-10 京东科技信息技术有限公司 Image data processing method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688808A (en) * 2017-08-07 2018-02-13 电子科技大学 A fast natural scene text detection method
CN107729908A (en) * 2016-08-10 2018-02-23 阿里巴巴集团控股有限公司 Method, apparatus and system for building a machine learning classification model
CN109035187A (en) * 2018-07-10 2018-12-18 杭州依图医疗技术有限公司 Medical image annotation method and device
CN111199523A (en) * 2019-12-24 2020-05-26 深圳供电局有限公司 Power equipment identification method and device, computer equipment and storage medium
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and device for detecting text
CN111353555A (en) * 2020-05-25 2020-06-30 腾讯科技(深圳)有限公司 Label detection method and device and computer readable storage medium
CN111950605A (en) * 2020-07-28 2020-11-17 北京恒通智控机器人科技有限公司 Meter recognition model learning method, device and equipment and meter recognition method
CN112861959A (en) * 2021-02-02 2021-05-28 南京天创电子技术有限公司 Automatic labeling method for target detection images

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11163836B2 (en) * 2018-02-12 2021-11-02 International Business Machines Corporation Extraction of information and smart annotation of relevant information within complex documents
CN109544533B (en) * 2018-11-23 2021-04-02 聚时科技(上海)有限公司 Metal plate defect detection and measurement method based on deep learning

Also Published As

Publication number Publication date
CN113095444A (en) 2021-07-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant