WO2023029348A1

WO2023029348A1 - Image instance labeling method based on artificial intelligence, and related device

Info

Publication number: WO2023029348A1
Application number: PCT/CN2022/071328
Authority: WO
Inventors: 王俊; 高鹏
Original assignee: 平安科技（深圳）有限公司
Priority date: 2021-08-30
Filing date: 2022-01-11
Publication date: 2023-03-09
Also published as: CN113705687A; CN113705687B

Abstract

The present application relates to the technical field of artificial intelligence. Provided are an image instance labeling method based on artificial intelligence, and a related device. An information amount of each instance in a target image is identified by means of calling a preset instance segmentation model, so as to acquire, from the target image, a first information amount that is higher than a preset information amount threshold value, a first instance that corresponds to the first information amount, and a second instance other than the first instance; then, a first label of the first instance is manually labeled, such that the accuracy of a manually labeled instance label is high; since an information amount of the second instance is low, and the second instance is easily identified and labeled by a model, pseudo-labeling is performed on a second label of the second instance on the basis of a semi-supervised learning mode, such that the labeling efficiency is high; and an instance label of the target image is obtained on the basis of the first label and the second label. The present application can be applied to the field of digital medical treatment for labeling an instance in a medical image.

Description

Image instance labeling method and related equipment based on artificial intelligence

This application claims the priority of the Chinese patent application with the application number 202111005698.2 filed on August 30, 2021, and the title of the invention is "artificial intelligence-based image instance labeling method and related equipment", the entire content of which is incorporated by reference in this application.

technical field

The present application relates to the technical field of artificial intelligence, in particular to an artificial intelligence-based image instance labeling method, device, electronic equipment, and storage medium.

Background technique

With the continuous development of deep learning, computer vision has achieved more and more success, thanks to the support of large training data sets. The training data set is a data set with rich label information, and collecting and labeling such a training data set usually requires a huge human cost.

The inventor realized that, compared with image classification technology, image instance segmentation has a higher degree of difficulty, and a large amount of labeled training data is required to truly realize the instance segmentation function. However, the number of available labeled samples is often insufficient relative to the training scale, or the cost of obtaining samples is too high. In many cases, annotators with relevant professional knowledge (such as doctors) are scarce or difficult to spare time, or the annotation cost of annotators is too high, or the image annotation or judgment cycle is too long, these problems may cause the instance segmentation model to fail. Unable to train effectively.

Therefore, how to obtain a large number of samples for image instance segmentation model training has become a research hotspot for those skilled in the art.

Contents of the invention

In view of the above, it is necessary to propose an artificial intelligence-based image instance annotation method, device, electronic equipment, and storage medium, which can reduce the number of instances in manually annotated images and improve the accuracy of instances in images annotated by models.

The first aspect of the present application provides an artificial intelligence-based image instance labeling method, the method comprising:

Call the preset instance segmentation model to identify the amount of information of each instance in the target image;

Obtaining a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, where instances in the target image other than the first instance are second instances ;

manually annotating the first label of the first instance in the target image;

Pseudo-labeling the second label of the second instance in the target image based on a semi-supervised learning manner;

An instance label of the target image is obtained based on the first label and the second label.

The second aspect of the present application provides an artificial intelligence-based image instance labeling device, the device comprising:

An instance recognition module, used to call a preset instance segmentation model to identify the amount of information of each instance in the target image;

An instance acquiring module, configured to acquire a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, except for the first instance in the target image The instance of is the second instance;

a first labeling module, configured to manually label the first label of the first instance in the target image;

A second labeling module, configured to pseudo-label the second label of the second instance in the target image based on a semi-supervised learning method;

A label determining module, configured to obtain an instance label of the target image based on the first label and the second label.

A third aspect of the present application provides an electronic device, the electronic device includes a processor, and the processor is configured to implement the artificial intelligence-based image instance labeling method when executing a computer program stored in a memory.

manually annotating the first label of the first instance in the target image;

The fourth aspect of the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the artificial intelligence-based image instance labeling method is implemented:

manually annotating the first label of the first instance in the target image;

To sum up, the artificial intelligence-based image instance labeling method, device, electronic equipment, and storage medium described in this application identify the amount of information of each instance in the target image by calling the preset instance segmentation model, so as to extract the information from the target image Obtain the first information amount higher than the preset information amount threshold, the first instance corresponding to the first information amount, and the second instance other than the first instance, and then manually mark the first label of the first instance, and the manually marked instance The accuracy of the label is high. Since the second instance has low information content, it is easy to be recognized and labeled by the model. The second label of the second instance is pseudo-labeled based on the semi-supervised learning method, and the labeling efficiency is high. Based on the first label and all The second label obtains the instance label of the target image. The application can be applied in the field of digital medical treatment, and the examples in medical images are marked. This application only needs to manually label a small number of instances in a target image, instead of labeling all instances in the entire target image, so as to obtain instance labels with high accuracy while reducing the workload of instance labeling.

Description of drawings

FIG. 1 is a flow chart of an artificial intelligence-based image instance labeling method provided in Embodiment 1 of the present application.

FIG. 2 is a schematic diagram of a target image and two corresponding perturbed images provided by an embodiment of the present application.

Fig. 3 is a schematic diagram of a first instance of a target image provided by an embodiment of the present application.

Fig. 4 is a schematic diagram of a second example of the target image provided by the embodiment of the present application.

FIG. 5 is a structural diagram of an artificial intelligence-based image instance tagging device provided in Embodiment 2 of the present application.

FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present application.

Detailed ways

In order to more clearly understand the above objects, features and advantages of the present application, the present application will be described in detail below in conjunction with the accompanying drawings and specific embodiments. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments can be combined with each other.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terminology used herein in the description of the application is only for the purpose of describing an example in an optional embodiment, and is not intended to limit the application.

The artificial intelligence-based image instance tagging method provided in the embodiment of the present application is executed by electronic equipment, and accordingly, the artificial intelligence-based image instance tagging apparatus runs in the electronic device.

In this embodiment of the present application, instances in an image may be marked based on artificial intelligence technology. Among them, artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. .

Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

Embodiment one

FIG. 1 is a flow chart of an artificial intelligence-based image instance labeling method provided in Embodiment 1 of the present application. The artificial intelligence-based image instance labeling method specifically includes the following steps. According to different requirements, the order of the steps in the flow chart can be changed, and some can be omitted.

S11, call the preset instance segmentation model to identify the amount of information of each instance in the target image.

The preset instance segmentation model may be a pre-trained machine learning model, which is used to identify instances in the image, so as to obtain the amount of information of the instances. Instances refer to target objects in target images, e.g., pedestrians, cars, bicycles, buildings, etc.

The target image refers to the image that needs to be labeled with instances. A target image can contain multiple instances, for example, it can contain dozens of instances. Different types of instances have different recognition difficulties. Therefore, it is necessary to consider whether it is necessary to All instances in the target image are manually annotated. Although all instances in the target image can be manually labeled, the accuracy of manual labeling is higher, but the cost of manual labeling is relatively high, and the efficiency is low. In fact, the number of images completed by manual labeling is limited. of. In this embodiment, by identifying the amount of information of the instances in the target image, it is determined which instances in the target image are to be marked manually and which entities are to be marked by the instance tagging model according to the amount of information.

In some embodiments, the artificial intelligence-based image instance labeling method can be applied to a medical scene. When applied to a medical scene, the target image is a medical image, and the instances in the target image are multiple organs. Medical images refer to internal tissues obtained non-invasively for medical treatment or medical research, such as images of the stomach, abdomen, heart, knees, brain, such as computed tomography (CT), magnetic resonance Imaging (Magnetic Resonance Imaging, MRI), ultrasound (ultrasonic, US), X-ray images, EEG, and optical photography images generated by medical instruments.

In an optional implementation manner, the calling the preset instance segmentation model to identify the amount of information of each instance in the target image includes:

performing a first perturbation on the target image to obtain a first perturbation image, and performing a second perturbation on the target image to obtain a second perturbation image;

Invoking the preset instance segmentation model to identify a first category label, a first instance detection frame, and a first instance contour mask for each instance in the target image;

invoking the preset instance segmentation model to identify a second category label, a second instance detection frame, and a second instance contour mask for each instance in the first perturbed image;

invoking the preset instance segmentation model to identify a third category label, a third instance detection frame, and a third instance contour mask for each instance in the second perturbed image;

calculating a class label score for each instance based on the first class label, the second class label, and the third class label;

calculating a detection frame score for each instance based on the first instance detection frame, the second instance detection frame, and the third instance detection frame;

calculating a contour mask score for each instance based on the first instance contour mask, the second instance contour mask, and the third instance contour mask;

The information amount of the corresponding instance is calculated according to the class label score and the corresponding detection frame score and the contour mask score.

Generally speaking, image enhancement can be achieved through image transformation. The purpose of image enhancement is to increase the amount of image data and enrich the diversity of images, thereby improving the generalization ability of the model. In this optional implementation manner, image disturbance may be introduced in a transformation manner, for example, adding noise (Noise) to the image to introduce image disturbance. Image noise is the signal that is disturbed by random signals during image acquisition or transmission, which hinders people's understanding and analysis of images. The introduction of noise increases the difficulty of model identification.

As shown in Figure 2, the top image in Figure 2 is the target image, the middle image is Gaussian noise added to the target image, and the bottom image is the target image Added salt and pepper noise (pepper noise) on the basis of . The image obtained by adding Gaussian noise on the target image is used as the first perturbed image, and the image obtained by adding salt and pepper noise on the target image is used as the second perturbed image.

Input the image (for example, the target image, the first disturbance image, the second disturbance image) into the instance segmentation model, and the instance segmentation model can output the category label of the instance belonging to a certain category in the image, and use the detection frame to frame the instance in the image The position of , and the instance contour mask of the instance. The instance segmentation model in this embodiment can be trained on the basis of the Faster R-CNN model, and the specific training process will not be described in detail. Investigate the consistency of the model's prediction of the target in the sample. If the prediction result changes little before and after the transformation, it means that the target is easier to predict and has less information. If there is a large difference in the prediction result of the target before and after the transformation, then This local target is the target that the model is more likely to confuse, and it should be actively selected for priority labeling.

After invoking the preset instance segmentation model to identify the class label, instance detection frame and instance contour mask of each instance in the target image, the first perturbed image and the second perturbed image, the class label, instance detection frame and The instance contour mask is calculated to obtain the class label score, the detection frame score and the contour mask score of the instance, so that the information amount of the corresponding instance is calculated according to the class label score and the corresponding detection frame score and the contour mask score, Further, the first instance and the second instance in the target image are determined according to the amount of information of the instances.

In an optional implementation manner, the calculating the category label score of each instance based on the first category label, the second category label and the third category label includes:

Obtaining a first predicted probability corresponding to the first category label, a second predicted probability of the second category label, and a third predicted probability of the third category label;

calculating the probability mean of the first predicted probability and the corresponding second predicted probability and the third predicted probability;

The mean value is used as the category label score of the corresponding instance.

The category label score is used to evaluate whether the prediction of the instance segmentation model on the perturbed first and second perturbed images is consistent with the prediction on the target image.

For an instance, in the target image, the class probability predicted by the instance segmentation model is 0.9, in the first perturbed image, the class probability predicted by the instance segmentation model is 0.9, and in the second perturbed image, the class predicted by the instance segmentation model is A probability of 0.89 indicates that the instance segmentation model has a high prediction consistency for this instance.

For another example, in the target image, the class probability predicted by the instance segmentation model is 0.9, in the first perturbed image, the class probability predicted by the instance segmentation model is 0.4, and in the second perturbed image, the class predicted by the instance segmentation model is A probability of 0.7 indicates that the prediction consistency of the instance segmentation model for this instance is low.

From the dimension of category label score, the smaller the predicted probability of an instance is, the smaller the probability mean is, and the lower the category label score of an instance is, the higher the information content of the instance is. For the model, the more confusing local instances are more Difficult and better examples to learn. For examples with high information content that are difficult to identify, and examples with low prediction probability that are easily confused by the model in the prediction of disturbed images, after manual labeling and adding model training, the model will have better judgment ability for such examples in the future , thereby improving the accuracy and generalization of the model. In an optional implementation manner, the calculating the detection frame score of each instance based on the detection frame of the first instance, the detection frame of the second instance, and the detection frame of the third instance includes:

calculating a first intersection-over-union ratio between the first instance detection frame and the corresponding second instance detection frame;

calculating a second intersection-over-union ratio between the first instance detection frame and the corresponding third instance detection frame;

The corresponding detection frame score of the instance is calculated based on the first intersection and union ratio and the second intersection and union ratio according to a preset first calculation model.

The detection box score is used to evaluate whether the prediction of the instance segmentation model on the perturbed first and second perturbed images is consistent with the prediction on the target image.

The intersection-over-union ratio (IOU) represents the degree of overlap between two instance detection frames. The larger the intersection-over-union ratio (IOU), the more overlapping regions and the greater the degree of overlap between the two instance detection frames. The smaller the intersection and union ratio IOU, the smaller the overlapping area and the smaller the overlapping degree between the two instance detection frames. In this optional embodiment, the greater the IOU, the more similar the prediction of the instance segmentation model is to the target image and the perturbed image corresponding to the IOU, that is, the higher the prediction consistency. The smaller the IOU, the less similar the prediction of the instance segmentation model is to the target image and the perturbed image corresponding to the IOU, that is, the lower the prediction consistency.

The calculation process of the intersection-over-union ratio IOU is a prior art, and this application does not elaborate on it.

The preset second calculation model may be: T2=(1-IOU1)*(1-IOU2), where T2 represents the contour mask score of the instance, IOU1 represents the first intersection-over-union ratio, and IOU2 represents the second intersection-over-union ratio.

For example, assuming instance L1, the first intersection ratio between the target image and the first disturbance image is 0.9, and the second intersection ratio between the target image and the second disturbance image is 0.9, then the detection frame score of instance L1 = (1-0.9 )*(1-0.9)=0.01. It can be seen that instance L1 is an instance with low information content.

As another example, assuming instance L2, the first intersection ratio between the target image and the first disturbance image is 0.4, and the second intersection ratio between the target image and the second disturbance image is 0.3, then the detection frame score of instance L1 = (1- 0.4)*(1-0.3)=0.42. It can be seen that instance L2 is an instance with high information content.

From the dimension of IOU, the larger the IOU of the instance, the lower the detection frame score of the instance, the higher the information content of the instance, and it is more difficult for the model and should be learned. For examples that are difficult to identify with high information content, there are overlapping detection frames with low IOU that are easily confused by the model in the prediction of the perturbed image. Or easily identifiable instances have lower predictive consistency, so highly informative instances are labeled more valuable than easily identifiable instances. After manual labeling of high-information instances and adding them to model training, the model will have a better ability to judge such instances in the future, thereby improving the accuracy and generalization of the model.

In an optional implementation manner, the calculating the contour mask score of each instance based on the contour mask of the first instance, the contour mask of the second instance and the contour mask of the third instance includes:

calculating the first Jaccard distance between the first instance contour mask and the corresponding second instance contour mask;

calculating a second Jaccard distance between the first instance contour mask and the corresponding third instance contour mask;

A contour mask score corresponding to the instance is calculated based on the first Jaccard distance and the second Jaccard distance according to a preset second calculation model.

The instance contour mask is similar to the instance detection box, and is also used to evaluate whether the prediction of the instance segmentation model for the perturbed first and second perturbed images is consistent with the prediction of the target image.

Jaccard distance is used to describe the dissimilarity between two contour masks. The larger the Jaccard distance, the less overlapping area and lower similarity between two contour masks. The smaller the Jaccard distance, the more overlapping areas and higher similarity between two contour masks. In this optional embodiment, the larger the Jaccard distance, the less similar the prediction of the instance segmentation model is to the target image and the disturbance image corresponding to the Jaccard distance, that is, the lower the prediction consistency. The smaller the Jaccard distance, the more similar the prediction of the instance segmentation model is to the target image and the perturbed image corresponding to the Jaccard distance, that is, the higher the prediction consistency.

The calculation process of the Jaccard distance is a prior art and will not be described in detail.

The preset second calculation model may be: T3=D1*D2, where T3 represents the contour mask score of the instance, D1 represents the first Jaccard distance, and D2 represents the second Jaccard distance.

In an optional implementation manner, the calculation of the information amount of the corresponding instance according to the category label score and the corresponding detection frame score and the contour mask score includes:

Calculate the product of the category label score and the corresponding detection frame score and the contour mask score to obtain the final score of the corresponding instance;

The final score is determined as the informativeness of the instance.

After obtaining the category label score, detection frame score, and contour mask score of each instance in the target image, multiply the category label score, detection frame score, and contour mask score to obtain the final score of the corresponding instance, The amount of information as an example.

In other embodiments, the average of the class label score, detection box score, and contour mask of each instance can also be calculated as the final score of the instance. Alternatively, compute the sum of the class label score, bounding box score, and contour mask for each instance as the final score for the instance. This application does not impose any restrictions.

The final score is used to indicate whether the prediction of the instance segmentation model on the first disturbed image and the second disturbed image is consistent with the prediction on the target image. The lower the final score, it indicates that the prediction of the instance segmentation model on the first perturbed image and the second perturbed image is inconsistent with the prediction on the target image. The performance of the first disturbed image and the second disturbed image is more unstable. The higher the final score, it indicates that the prediction of the instance segmentation model for the first perturbed image and the second perturbed image is consistent with the prediction of the target image, which means that even after the target image is perturbed, the target image and the perturbed image obtained by the instance segmentation model are The performance of the first perturbed image and the second perturbed image is still very stable.

S12. Obtain a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, where the instance in the target image other than the first instance is the first instance Two examples.

Wherein, the preset information volume threshold is a preset critical value used to indicate the level of information volume.

The lower the final score, the lower the consistency, and the higher the information content of the corresponding instance; the higher the final score, the higher the consistency, and the lower the information content of the corresponding instance. When the information volume of a certain instance is higher than the preset information volume threshold, the instance is regarded as the first instance, and when the information volume of a certain instance is lower than the preset information volume threshold, the instance is regarded as the second instance.

It should be understood that the first instance refers to a set of multiple instances with an information amount higher than a preset information amount threshold, such as the area enclosed by an ellipse in the target image as shown in FIG. 3 . The second instance refers to a collection of multiple instances whose information volume is lower than the preset information volume threshold, such as the area framed by an irregular figure in the target image as shown in FIG. 4 . The first instance and the second instance completely constitute the set of instances in the image. That is, a certain instance in the target image is either the first instance or the second instance.

S13. Manually label the first label of the first instance in the target image.

Since the first instance is an instance higher than the preset information threshold in the target image, the prediction consistency of the instance segmentation model for the target image and the first disturbance image and the second disturbance image is low, so it should be done manually Annotation of instances, such as multiple pedestrians with obvious occlusion.

The first instance in the target image can be identified, and the first instance can be manually annotated by an expert with rich annotation experience, thereby improving the accuracy rate of the annotation of the first instance.

Especially when the target image is a medical image, the instance segmentation of the medical image needs to identify multiple instance individuals in the image, and accurately outline multiple lesion areas for intelligent auxiliary diagnosis. Therefore, the instance labeling difficulty index of the medical image Higher, by manually labeling these high-information (difficult to label) instances, the accuracy will be greatly improved.

S14. Pseudo-label the second label of the second instance in the target image based on a semi-supervised learning manner.

Since the second instance is an instance in the target image that is lower than the preset information threshold, the prediction consistency of the instance segmentation model for the target image and the first disturbed image and the second disturbed image is relatively high, so the difficulty of labeling is small. Pseudo-annotation of instances by supervised learning can improve the efficiency of labeling target images.

The semi-supervised learning method refers to the instance labeling model obtained through the joint training of the labeled sample set and the unlabeled sample set, and instance labeling of the new unlabeled image through the instance labeling model. The instance label output by the instance labeling model is compared with In terms of instance labels that are artificially labeled, they are called pseudo-labels.

S15. Obtain an instance label of the target image based on the first label and the second label.

The instances in the target image are divided into the first instance and the second instance. The instance label of the first instance is the first label, and the instance label of the second instance is the second label. Therefore, after obtaining the first label and the second label, the target Labels for all instances in the image have been obtained.

In an optional embodiment, the method also includes:

Using the labeled images marked with instance labels and multiple target images as a training set;

training the instance labeling model based on the training set;

Evaluate the accuracy of the instance tagging model based on the test set, and end the training of the instance tagging model when the accuracy meets a preset accuracy threshold.

Wherein, the annotated image marked with the instance label may refer to an image used to train the instance label model.

After using the artificial intelligence-based image instance labeling method described in the embodiment of the present application to perform instance labeling on multiple target images, after obtaining the instance labels, the target images marked with the instance labels can be added to the real instance labels. Annotated images are used as a training set, so that the instance labeling model is updated based on the training set.

The test set includes test images and the real instance labels of each test image, and the test images in the test set are input into the updated instance labeling model, and the test instance labels of the test images are predicted by the updated instance labeling model. When the test instance label is the same as the corresponding real instance label, it indicates that the updated instance annotation model is successfully tested on the test image. When the test instance label is not the same as the corresponding real instance label, it indicates that the updated instance labeling model fails to test the test image. Calculate the ratio of the number of successful tests to the number of test images in the test set, and use the ratio as the test accuracy of the instance tagging model, and end the training of the instance tagging model when the test accuracy meets a preset accuracy threshold.

In this optional embodiment, the number of images with instance labels is greatly increased by using the target images with instance labels, and the instance labeling model can be updated and trained, thereby improving the performance of the instance labeling model.

This application marks a small number of instances in the image instead of all instances. For example, instances with occlusion in some areas in the target image are instances with high information content. Through manual active labeling, the accuracy of the artificially marked instance labels is high, and the target image has high accuracy. The other instances of other instances have low information content, no manual labeling is required, and the cost of manual labeling is saved. Since the instances with low information content are easier to identify, semi-supervised learning and labeling can improve the accuracy of instance labeling. The labeling efficiency of the instance is improved. This application only needs to manually label a small number of instances in a target image, instead of labeling all instances in the entire target image, so as to obtain instance labels with high accuracy while reducing the workload of instance labeling.

This application is suitable for images with complex layouts and mutual occlusion in different areas. Applying this application to the field of intelligent auxiliary recognition of medical images can simultaneously perform region delineation and quantitative evaluation of different target locations and key organ instances, especially for image regions that may be occluded from each other, this application can perform instance segmentation more effectively.

Embodiment two

In some embodiments, the artificial intelligence-based image instance labeling device 50 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the image instance tagging device 50 based on artificial intelligence can be stored in the memory of the electronic device, and executed by at least one processor to execute (see Figure 1 for details) based on artificial intelligence. A feature for image instance annotation.

In this embodiment, the artificial intelligence-based image instance tagging device 50 can be divided into multiple functional modules according to the functions it performs. The functional modules may include: an instance identification module 501 , an instance acquisition module 502 , a first labeling module 503 , a second labeling module 504 , a label determination module 505 and a model training module 506 . The module referred to in this application refers to a series of computer program segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.

The instance identification module 501 is configured to call a preset instance segmentation model to identify the amount of information of each instance in the target image.

In an optional implementation manner, the instance identification module 501 calls a preset instance segmentation model to identify the amount of information of each instance in the target image, including:

The final score is determined as the informativeness of the instance.

In other embodiments, the average of the class label score, detection box score, and contour mask of each instance can also be calculated as the final score of the instance. Alternatively, compute the sum of the class label score, bounding box score, and contour mask for each instance as the final score for the instance. This application does not impose any limitations.

The instance obtaining module 502 is configured to obtain a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, except for the first instance in the target image An instance other than one instance is a second instance.

The first labeling module 503 is configured to manually label the first label of the first instance in the target image.

The second labeling module 504 is configured to pseudo-label the second label of the second instance in the target image based on a semi-supervised learning method.

The label determining module 505 is configured to obtain an instance label of the target image based on the first label and the second label.

In an optional implementation manner, the model training module 506 is configured to:

training the instance labeling model based on the training set;

Embodiment three

This embodiment provides a computer-readable storage medium, on which a computer program is stored. When the computer program is executed by a processor, the steps in the above-mentioned embodiment of the method for tagging an image instance based on artificial intelligence are implemented, for example, 1 shows S11-S15:

S11, calling the preset instance segmentation model to identify the amount of information of each instance in the target image;

S12. Obtain a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, where the instance in the target image other than the first instance is the first instance Two instances;

S13. Manually annotating the first label of the first instance in the target image;

S14. Pseudo-labeling the second label of the second instance in the target image based on a semi-supervised learning manner;

Alternatively, when the computer program is executed by the processor, the functions of the modules/units in the above-mentioned device embodiments are realized, such as modules 501-505 in FIG. 5:

The instance recognition module 501 is used to call a preset instance segmentation model to identify the amount of information of each instance in the target image;

The instance obtaining module 502 is configured to obtain a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, except for the first instance in the target image an instance other than an instance is a second instance;

The first labeling module 503 is configured to manually label the first label of the first instance in the target image;

The second labeling module 504 is configured to pseudo-label the second label of the second instance in the target image based on a semi-supervised learning method;

Embodiment Four

Referring to FIG. 6 , it is a schematic structural diagram of an electronic device provided by Embodiment 3 of the present application. In a preferred embodiment of the present application, the electronic device 6 includes a memory 61 , at least one processor 62 , at least one communication bus 63 and a transceiver 64 .

Those skilled in the art should understand that the structure of the electronic device shown in Figure 6 does not constitute a limitation of the embodiment of the present application, it can be a bus structure or a star structure, and the electronic device 6 can also include a ratio diagram more or less other hardware or software, or a different arrangement of components.

In some embodiments, the electronic device 6 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits, Programmable gate arrays, digital processors and embedded devices, etc. The electronic device 6 may also include a client device, which includes but is not limited to any electronic product that can interact with the client through a keyboard, mouse, remote control, touch pad or voice control device, for example, Personal computers, tablets, smartphones, digital cameras, etc.

It should be noted that the electronic device 6 is only an example, and other existing or future electronic products that can be adapted to this application should also be included in the scope of protection of this application, and are included here by reference .

In some embodiments, a computer program is stored in the memory 61, and when the computer program is executed by the at least one processor 62, all or part of the steps in the above-mentioned method for tagging image instances based on artificial intelligence are implemented. Described memory 61 comprises volatile and nonvolatile memory, such as Random Access Memory (Random Access Memory, RAM), Read-Only Memory (Read-Only Memory, ROM), Programmable Read-Only Memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), one-time programmable read-only memory (One-time Programmable Read-Only Memory, OTPROM), electronic erasable rewritable only Read-only memory (Electrically-Erasable Programmable Read-Only Memory, EEPROM), read-only CD-ROM (Compact Disc Read-Only Memory, CD-ROM) or other optical disk storage, disk storage, tape storage, or can be used to carry or store data computer readable storage medium. The computer-readable storage medium may be non-volatile or volatile. Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, etc.; The data created using the node, etc.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. Blockchain (Blockchain), essentially a decentralized database, is a series of data blocks associated with each other using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify its Validity of information (anti-counterfeiting) and generation of the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

In some embodiments, the at least one processor 62 is the control core (Control Unit) of the electronic device 6, and uses various interfaces and lines to connect the various components of the entire electronic device 6, by running or executing the The programs or modules in the memory 61 and call the data stored in the memory 61 to execute various functions of the electronic device 6 and process data. For example, when the at least one processor 62 executes the computer program stored in the memory, it realizes all or part of the steps of the artificial intelligence-based image instance labeling method described in the embodiment of the present application; or realizes the artificial intelligence-based image instance labeling method. Label all or part of the functionality of the device. The at least one processor 62 may be composed of an integrated circuit, for example, may be composed of a single packaged integrated circuit, or may be composed of multiple integrated circuits with the same function or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessor, digital processing chip, graphics processor and a combination of various control chips, etc.

In some embodiments, the at least one communication bus 63 is configured to implement communication between the memory 61 and the at least one processor 62 and the like.

Although not shown, the electronic device 6 can also include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply can be logically connected to the at least one processor 62 through a power management device, thereby realizing Manage functions such as charging, discharging, and power management. The power supply may also include one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators and other arbitrary components. The electronic device 6 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.

The above-mentioned integrated units implemented in the form of software function modules can be stored in a computer-readable storage medium. The above-mentioned software function modules are stored in a storage medium, and include several instructions to make a computer device (which may be a personal computer, electronic device, or network device, etc.) or a processor (processor) execute the methods described in various embodiments of the present application part.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the modules is only a logical function division, and there may be other division methods in actual implementation.

The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, and may be located in one place or distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software function modules.

It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments described above, but that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application. Therefore, the embodiments should be regarded as exemplary and not restrictive in all points of view, and the scope of the application is defined by the appended claims rather than the foregoing description, and it is intended that the scope of the present application be defined by the appended claims rather than by the foregoing description. All changes within the meaning and range of equivalents of the elements are embraced in this application. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other elements or the singular does not exclude the plural. A plurality of units or devices stated in the specification may also be realized by one unit or device through software or hardware. The words first, second, etc. are used to denote names and do not imply any particular order.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application without limitation. Although the present application has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solutions of the present application can be Make modifications or equivalent replacements without departing from the spirit and scope of the technical solutions of the present application.

Claims

A method for labeling image instances based on artificial intelligence, wherein the method includes:

Call the preset instance segmentation model to identify the amount of information of each instance in the target image;

Obtaining a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, where instances in the target image other than the first instance are second instances ;

manually annotating the first label of the first instance in the target image;

Pseudo-labeling the second label of the second instance in the target image based on a semi-supervised learning manner;

An instance label of the target image is obtained based on the first label and the second label.
The method for labeling image instances based on artificial intelligence according to claim 1, wherein said calling a preset instance segmentation model to identify the amount of information of each instance in the target image comprises:

performing a first perturbation on the target image to obtain a first perturbation image, and performing a second perturbation on the target image to obtain a second perturbation image;

Invoking the preset instance segmentation model to identify a first category label, a first instance detection frame, and a first instance contour mask for each instance in the target image;

invoking the preset instance segmentation model to identify a second category label, a second instance detection frame, and a second instance contour mask for each instance in the first perturbed image;

invoking the preset instance segmentation model to identify a third category label, a third instance detection frame, and a third instance contour mask for each instance in the second perturbed image;

calculating a class label score for each instance based on the first class label, the second class label, and the third class label;

calculating a detection frame score for each instance based on the first instance detection frame, the second instance detection frame, and the third instance detection frame;

calculating a contour mask score for each instance based on the first instance contour mask, the second instance contour mask, and the third instance contour mask;

The information amount of the corresponding instance is calculated according to the class label score and the corresponding detection frame score and the contour mask score.
The method for labeling image instances based on artificial intelligence according to claim 2, wherein said calculating the category label score of each instance based on said first category label, said second category label and said third category label comprises :

Obtaining a first predicted probability corresponding to the first category label, a second predicted probability of the second category label, and a third predicted probability of the third category label;

calculating the probability mean of the first predicted probability and the corresponding second predicted probability and the third predicted probability;

The mean value is used as the category label score of the corresponding instance.
The method for labeling image instances based on artificial intelligence according to claim 2, wherein the calculation of the detection of each instance based on the first instance detection frame, the second instance detection frame and the third instance detection frame Box scores include:

calculating a first intersection-over-union ratio between the first instance detection frame and the corresponding second instance detection frame;

calculating a second intersection-over-union ratio between the first instance detection frame and the corresponding third instance detection frame;

The corresponding detection frame score of the instance is calculated based on the first intersection and union ratio and the second intersection and union ratio according to a preset first calculation model.
The method for labeling image instances based on artificial intelligence according to claim 2, wherein said calculation of each Contour mask scores for instances include:

calculating the first Jaccard distance between the first instance contour mask and the corresponding second instance contour mask;

calculating a second Jaccard distance between the first instance contour mask and the corresponding third instance contour mask;

A contour mask score corresponding to the instance is calculated based on the first Jaccard distance and the second Jaccard distance according to a preset second calculation model.
The artificial intelligence-based image instance labeling method according to any one of claims 2 to 5, wherein the calculation of the corresponding The amount of information for an instance includes:

Calculate the product of the category label score and the corresponding detection frame score and the contour mask score to obtain the final score of the corresponding instance;

The final score is determined as the informativeness of the instance.
The method for labeling image instances based on artificial intelligence as claimed in claim 6, wherein said method further comprises:

Using the labeled images marked with instance labels and multiple target images as a training set;

training the instance labeling model based on the training set;

Evaluate the accuracy of the instance tagging model based on the test set, and end the training of the instance tagging model when the accuracy meets a preset accuracy threshold.
A device for labeling image instances based on artificial intelligence, wherein the device includes:

An instance recognition module, used to call a preset instance segmentation model to identify the amount of information of each instance in the target image;

An instance acquiring module, configured to acquire a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, except for the first instance in the target image The instance of is the second instance;

a first labeling module, configured to manually label the first label of the first instance in the target image;

A second labeling module, configured to pseudo-label the second label of the second instance in the target image based on a semi-supervised learning method;

A label determining module, configured to obtain an instance label of the target image based on the first label and the second label.
An electronic device, wherein the electronic device includes a processor and a memory, and the processor is configured to execute computer-readable instructions stored in the memory to implement the following steps:

Call the preset instance segmentation model to identify the amount of information of each instance in the target image;

Obtaining a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, where instances in the target image other than the first instance are second instances ;

manually annotating the first label of the first instance in the target image;

Pseudo-labeling the second label of the second instance in the target image based on a semi-supervised learning manner;

An instance label of the target image is obtained based on the first label and the second label.
The electronic device according to claim 9, wherein, when the processor executes the computer-readable instructions to realize calling a preset instance segmentation model to identify the amount of information of each instance in the target image, it specifically includes:

performing a first perturbation on the target image to obtain a first perturbation image, and performing a second perturbation on the target image to obtain a second perturbation image;

Invoking the preset instance segmentation model to identify a first category label, a first instance detection frame, and a first instance contour mask for each instance in the target image;

invoking the preset instance segmentation model to identify a second category label, a second instance detection frame, and a second instance contour mask for each instance in the first perturbed image;

invoking the preset instance segmentation model to identify a third category label, a third instance detection frame, and a third instance contour mask for each instance in the second perturbed image;

calculating a class label score for each instance based on the first class label, the second class label, and the third class label;

calculating a detection frame score for each instance based on the first instance detection frame, the second instance detection frame, and the third instance detection frame;

calculating a contour mask score for each instance based on the first instance contour mask, the second instance contour mask, and the third instance contour mask;

The information amount of the corresponding instance is calculated according to the class label score and the corresponding detection frame score and the contour mask score.
The electronic device of claim 10, wherein the processor executes the computer-readable instructions to effectuate computing each instance based on the first class label, the second class label, and the third class label When the category label score of the , specifically includes:

Obtaining a first predicted probability corresponding to the first category label, a second predicted probability of the second category label, and a third predicted probability of the third category label;

calculating the probability mean of the first predicted probability and the corresponding second predicted probability and the third predicted probability;

The mean value is used as the category label score of the corresponding instance.
The electronic device according to claim 10, wherein the processor executes the computer-readable instructions to implement calculation based on the first instance detection frame, the second instance detection frame, and the third instance detection frame When scoring the detection frame of each instance, it specifically includes:

calculating a first intersection-over-union ratio between the first instance detection frame and the corresponding second instance detection frame;

calculating a second intersection-over-union ratio between the first instance detection frame and the corresponding third instance detection frame;

The corresponding detection frame score of the instance is calculated based on the first intersection and union ratio and the second intersection and union ratio according to a preset first calculation model.
The electronic device of claim 10, wherein the processor executes the computer-readable instructions to implement When the mask computes the contour mask score for each instance, it includes:

calculating the first Jaccard distance between the first instance contour mask and the corresponding second instance contour mask;

calculating a second Jaccard distance between the first instance contour mask and the corresponding third instance contour mask;

A contour mask score corresponding to the instance is calculated based on the first Jaccard distance and the second Jaccard distance according to a preset second calculation model.
The electronic device according to any one of claims 10 to 13, wherein the processor executes the computer-readable instructions to realize the detection according to the class label score and the corresponding detection frame score, the contour mask When the code score calculates the information volume of the corresponding instance, it specifically includes:

Calculate the product of the category label score and the corresponding detection frame score and the contour mask score to obtain the final score of the corresponding instance;

The final score is determined as the informativeness of the instance.
The electronic device of claim 14, wherein the processor executes the computer readable instructions to further implement the following steps:

Using the labeled images marked with instance labels and multiple target images as a training set;

training the instance labeling model based on the training set;

Evaluate the accuracy of the instance tagging model based on the test set, and end the training of the instance tagging model when the accuracy meets a preset accuracy threshold.
A computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, wherein the computer program implements the following steps when executed by a processor:

Call the preset instance segmentation model to identify the amount of information of each instance in the target image;

Obtaining a first information amount higher than a preset information amount threshold and a first instance corresponding to the first information amount from the target image, where instances in the target image other than the first instance are second instances ;

manually annotating the first label of the first instance in the target image;

Pseudo-labeling the second label of the second instance in the target image based on a semi-supervised learning manner;

An instance label of the target image is obtained based on the first label and the second label.
The computer-readable storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor to implement calling a preset instance segmentation model to identify the amount of information of each instance in the target image, it specifically includes:

performing a first perturbation on the target image to obtain a first perturbation image, and performing a second perturbation on the target image to obtain a second perturbation image;

Invoking the preset instance segmentation model to identify a first category label, a first instance detection frame, and a first instance contour mask for each instance in the target image;

Invoking the preset instance segmentation model to identify a second category label, a second instance detection frame, and a second instance contour mask for each instance in the first perturbed image;

invoking the preset instance segmentation model to identify a third category label, a third instance detection frame, and a third instance contour mask for each instance in the second perturbed image;

calculating a class label score for each instance based on the first class label, the second class label, and the third class label;

calculating a detection frame score for each instance based on the first instance detection frame, the second instance detection frame, and the third instance detection frame;

calculating a contour mask score for each instance based on the first instance contour mask, the second instance contour mask, and the third instance contour mask;

The information amount of the corresponding instance is calculated according to the class label score and the corresponding detection frame score and the contour mask score.
The computer-readable storage medium of claim 17, wherein the computer-readable instructions are executed by the processor to implement When computing the class label score for each instance, specifically:

Obtaining a first predicted probability corresponding to the first category label, a second predicted probability of the second category label, and a third predicted probability of the third category label;

calculating the probability mean of the first predicted probability and the corresponding second predicted probability and the third predicted probability;

The mean value is used as the category label score of the corresponding instance.
The computer-readable storage medium of claim 17, wherein the computer-readable instructions are executed by the processor to implement a detection frame based on the first instance, the second instance detection frame, and the third instance detection frame. When the instance detection box calculates the detection box score of each instance, it specifically includes:

calculating a first intersection-over-union ratio between the first instance detection frame and the corresponding second instance detection frame;

calculating a second intersection-over-union ratio between the first instance detection frame and the corresponding third instance detection frame;

The corresponding detection frame score of the instance is calculated based on the first intersection and union ratio and the second intersection and union ratio according to a preset first calculation model.
The computer-readable storage medium of claim 17 , wherein the computer-readable instructions are executed by the processor to implement a method based on the first instance contour mask, the second instance contour mask, and the When the third instance contour mask calculates the contour mask score of each instance, it specifically includes:

calculating the first Jaccard distance between the first instance contour mask and the corresponding second instance contour mask;

calculating a second Jaccard distance between the first instance contour mask and the corresponding third instance contour mask;

A contour mask score corresponding to the instance is calculated based on the first Jaccard distance and the second Jaccard distance according to a preset second calculation model.
The computer-readable storage medium as claimed in any one of claims 17 to 20, wherein the computer-readable instructions are executed by the processor to realize according to the category label score and the corresponding detection frame score, When calculating the information content of the corresponding instance, the contour mask score specifically includes:

Calculate the product of the category label score and the corresponding detection frame score and the contour mask score to obtain the final score of the corresponding instance;

The final score is determined as the informativeness of the instance.
The computer-readable storage medium of claim 21, wherein the computer-readable instructions are executed by the processor to further implement the steps of:

Using the labeled images marked with instance labels and multiple target images as a training set;

training the instance labeling model based on the training set;

Evaluate the accuracy of the instance tagging model based on the test set, and end the training of the instance tagging model when the accuracy meets a preset accuracy threshold.