WO2022077917A1 - Instance segmentation model sample screening method and apparatus, computer device and medium - Google Patents


Info

Publication number
WO2022077917A1
Authority
WO
WIPO (PCT)
Prior art keywords
instance
score
sample
labeled
segmentation model
Prior art date
Application number
PCT/CN2021/096675
Other languages
French (fr)
Chinese (zh)
Inventor
王俊 (Wang Jun)
高鹏 (Gao Peng)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022077917A1 publication Critical patent/WO2022077917A1/en

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
                • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
              • G06F18/24 Classification techniques
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T7/00 Image analysis
            • G06T7/10 Segmentation; Edge detection
              • G06T7/11 Region-based segmentation
          • G06T2207/00 Indexing scheme for image analysis or image enhancement
            • G06T2207/10 Image acquisition modality
              • G06T2207/10072 Tomographic images
                • G06T2207/10081 Computed x-ray tomography [CT]
            • G06T2207/20 Special algorithmic details
              • G06T2207/20081 Training; Learning
              • G06T2207/20084 Artificial neural networks [ANN]
            • G06T2207/30 Subject of image; Context of image processing
              • G06T2207/30004 Biomedical image processing
                • G06T2207/30016 Brain
                • G06T2207/30041 Eye; Retina; Ophthalmic
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
          • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
            • Y02P90/30 Computing systems specially adapted for manufacturing

Definitions

  • the present application relates to the technical field of artificial intelligence, and can be applied to the field of image instance segmentation.
  • the present application specifically provides an instance segmentation model sample screening method, apparatus, computer equipment and medium.
  • Training datasets are datasets with rich labeling information. Collecting and labeling such datasets usually requires huge labor costs.
  • the present application can provide an instance segmentation model sample screening method, device, computer equipment and medium, which can reduce the amount of manual annotation of samples while still obtaining a large number of samples.
  • the present application discloses a sample screening method for an instance segmentation model, which includes, but is not limited to, the following steps.
  • a plurality of first to-be-labeled samples with more information than the remaining samples are selected from the unlabeled set, and the first label set is obtained by manually labeling the plurality of first to-be-labeled samples. The first to-be-labeled samples and all remaining samples together form the unlabeled set.
  • a second to-be-labeled sample with a confidence higher than a set value is selected from all the remaining samples, and a second label set is obtained by pseudo-labeling the second to-be-labeled sample.
  • the first label set, the second label set and the already labelled set are taken together as the training set of the current instance segmentation model.
  • an instance segmentation model sample screening device which includes but is not limited to a data reading module, a first screening module, a second screening module and a data expansion module.
  • the data reading module reads the original data set, the original data set includes the unlabeled set and the labeled set.
  • the first screening module is configured to select from the unlabeled set, based on an active learning method, a plurality of first to-be-labeled samples with an amount of information greater than the remaining samples; the plurality of first to-be-labeled samples are manually labeled as the first label set. The first to-be-labeled samples and all remaining samples form the unlabeled set.
  • the second screening module is used to select a second to-be-labeled sample with a confidence higher than a set value from all remaining samples based on a semi-supervised learning method, and the second to-be-labeled sample is pseudo-labeled as a second label set.
  • the data expansion module uses the first label set, the second label set and the already labelled set as the training set of the current instance segmentation model.
  • the present application also provides a computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor is made to execute the steps of the sample screening method in any embodiment.
  • the present application also provides a storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the sample screening method in any embodiment.
  • the beneficial effects of the present application are: based on the semi-supervised active learning strategy, the present application can select the samples carrying the largest amount of information about the current model for the labelers to annotate, and effectively expand the training set through semi-supervised pseudo-labeling. The present application can therefore obtain a large number of samples for image instance segmentation model training while reducing the amount of manual annotation, so as to achieve a more ideal instance segmentation accuracy.
  • the present application can obtain a large number of model training samples faster while greatly reducing manual labeling, so that the instance segmentation model of the present application trains faster; the present application therefore has good practical significance and application promotion value.
  • FIG. 1 shows a schematic flowchart of a sample screening method for an instance segmentation model in some embodiments of the present application.
  • FIG. 2 shows a schematic diagram of the working principle of the sample screening apparatus for instance segmentation model in some embodiments of the present application.
  • FIG. 3 shows a schematic diagram of the working principle of the instance segmentation model in some embodiments of the present application.
  • FIG. 4 shows the scores of instance objects in some embodiments of the present application in three dimensions: category, detection frame, and segmentation contour.
  • FIG. 5 shows the scores of instance objects in other embodiments of the present application in three dimensions: category, detection frame, and segmentation contour.
  • FIG. 6 is a schematic diagram showing the comparison of instance segmentation effects that can be achieved by using the present application and the existing method on different numbers of labeled images (taking cerebral hemorrhage area segmentation and fundus edema area segmentation as examples).
  • Figure 7 is a schematic diagram showing the comparison of the model accuracy (applied to the segmentation of intracerebral hemorrhage regions) achieved using the present application and the existing method on different numbers of annotated images.
  • FIG. 8 is a schematic diagram showing the comparison of the model accuracy (applied to fundus edema region segmentation) achieved by using the present application and the existing method on different numbers of labeled images.
  • FIG. 9 shows a block diagram of the internal structure of a computer device in some embodiments of the present application.
  • this application can effectively combine active learning and semi-supervised learning. Active learning is used for its ability to obtain the best possible generalization model from as few labeled samples as possible, while semi-supervised learning is used for its ability to mine the relationship between labeled and unlabeled samples and thereby generalize better. By combining the advantages of these two schemes, the present application can provide a semi-supervised active learning strategy that achieves rapid acquisition and screening of a large number of instance segmentation model samples.
  • some embodiments of the present application may provide an instance segmentation model sample screening method, which is suitable for medical image analysis with complex layouts, for example, is preferably suitable for images in which different areas are occluded from each other.
  • the method may include, but is not limited to, the following steps.
  • Step S1: read the original data set
  • the original data set in some embodiments of the present application may include, but is not limited to, an unlabeled set, a labeled set and a test set. It should be understood that the original dataset contains relatively few labeled samples and many unlabeled samples.
  • a dataset here refers to a medical image dataset: the unlabeled set is an unlabeled medical image dataset, the labeled set is a labeled medical image dataset, and the test set is a medical image dataset that can be used for model evaluation.
  • Step S2: based on the active learning method, the present application first selects from the unlabeled set a plurality of first to-be-labeled samples with more information than the remaining samples, and obtains the first label set by manually labeling the plurality of first to-be-labeled samples.
  • a label set is a partial training set formed by manual labeling. The first to-be-labeled samples and all remaining samples form the unlabeled set of the original data set; that is, the medical image samples to be labeled and the remaining unlabeled medical image samples together constitute all the unlabeled medical image samples.
  • although manual annotation can provide a new training set for the current instance segmentation model, the number of labeled samples that can practically be completed through manual annotation is limited.
  • the dataset includes labeled data {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i)} and unlabeled data {x_{i+1}, ..., x_n}; the first i samples in the dataset form the labeled set, and the remaining n-i samples form the unlabeled set of the original dataset.
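  • As a non-authoritative illustration of this notation (the variable names and toy values below are ours, not the patent's), the labeled/unlabeled partition can be sketched as:

```python
# Illustrative sketch of the data split described above; names and values
# are our own and only mirror the {(x_k, y_k)} / {x_k} notation.
labeled = [("x1", "y1"), ("x2", "y2"), ("x3", "y3")]  # {(x_k, y_k)} for k <= i
unlabeled = ["x4", "x5", "x6", "x7"]                  # {x_k} for i < k <= n

i = len(labeled)             # number of labeled samples
n = i + len(unlabeled)       # total number of samples
assert n - i == len(unlabeled)  # the remaining n-i samples are unlabeled
```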
  • samples with the largest amount of information may be selected from the unlabeled set, and these samples are given to the labeling personnel for annotation.
  • the instance segmentation model formed based on the present application can work as follows: scan images (that is, images in the original data set, including unlabeled images and labeled images) through the instance segmentation model.
  • in FIG. 3, solid lines can represent annotated data streams and dashed lines can represent unannotated data streams; proposals can be generated after scanning the image, bounding box information and mask information can be generated by classifying the proposals, and the subsequent network can then determine, from the bounding box information and mask information, the instance detection box score (bbox_score), instance output category score (class_score) and instance contour mask score (mask_score); the samples with the largest amount of information are then selected according to these three scores.
  • bbox_score: instance detection box score
  • class_score: instance output category score
  • mask_score: instance contour mask score
  • the instance segmentation model of this embodiment can be extended on the basis of the Faster R-CNN model, wherein the FPN network (a feature extraction network) scans the image based on its pyramid structure to obtain proposal information (the scanning process can be a feature map mapping), and the RPN network (a region proposal network) generates bounding box information and mask information by processing the proposal information.
  • the processing methods can include binary classification (foreground vs. background) and bounding box (BB) regression; from the bounding box information and mask information, the coordinates of the detection frame, whether the detection frame contains a target, and the class label of the detection frame can be determined; the bounding box information and mask information can then be aligned.
  • the subsequent network in this embodiment may include a detection head (RCNN Head) and a segmentation head (Mask Head) of the instance segmentation model; the detection head outputs the instance detection frame score and the instance output category score, the segmentation head outputs the instance contour mask score, and the output dimension can be 1.
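  • The three per-instance outputs described above can be sketched as a simple record type; the class and field names below are our own, chosen only to mirror bbox_score, class_score and mask_score, and do not come from the patent:

```python
from dataclasses import dataclass

# Hedged sketch of the three-branch per-instance output; names are ours.
@dataclass
class InstancePrediction:
    bbox_score: float   # instance detection box score (box quality)
    class_score: float  # instance output category score (classification confidence)
    mask_score: float   # instance contour mask score (mask quality)

    def scores(self) -> tuple:
        """Return the three branch scores as one tuple."""
        return (self.bbox_score, self.class_score, self.mask_score)

p = InstancePrediction(bbox_score=0.82, class_score=0.95, mask_score=0.77)
```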
  • the step of selecting a plurality of first to-be-labeled samples with an amount of information greater than the remaining samples from the unlabeled set based on the active learning method in this embodiment specifically includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, and then using these three scores jointly to determine the final score of each sample.
  • the instance detection box score is the intersection over union (IoU) of the instance's predicted bounding box and the ground-truth bounding box
  • the instance output category score is the instance's classification value
  • the instance contour mask score is the IoU between the instance's predicted mask and the ground-truth mask.
  • the process of determining the final score of each sample using the instance detection frame score, the instance output category score and the instance contour mask score includes: using the mean and standard deviation of these three scores to calculate the score of each instance in the current sample, and then using the mean and standard deviation of the per-instance scores to calculate the final score of the current sample. According to the negative or positive correlation between the final score and the amount of information, a plurality of first samples to be labeled are selected from the unlabeled set.
  • This embodiment can select the first samples to be labeled from the unlabeled set during the training process of the instance segmentation model, with the active learning algorithm deciding which data to label manually. This application can therefore screen all unlabeled samples based on the above three-branch information metrics (instance detection frame score, instance output category score, instance contour mask score); when k samples are to be labeled, the top k or fewer samples are selected for manual interpretation and labeling; that is, some embodiments of the present application may perform manual interpretation and labeling on the k selected unlabeled medical image samples.
  • Some embodiments of the present application may, for example, select the plurality of first samples to be labeled from the unlabeled set according to the negative correlation between the final score and the amount of information. As shown in FIG. 4 and FIG. 5, each instance has scores in three dimensions: category, detection frame and segmentation contour; the lower the combined value of these three scores, the more the corresponding sample needs to be labeled. The top k or fewer samples are selected for the labeling personnel to label.
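  • The selection described in the preceding bullets can be sketched as follows. The patent states only that the mean and standard deviation of the scores are used; combining them as mean minus standard deviation, and treating a lower final score as more informative, are our assumptions, and all names are ours:

```python
from statistics import mean, pstdev

def instance_score(bbox_s, class_s, mask_s):
    """Combine the three branch scores of one instance.

    Mean minus standard deviation is one plausible reading of the
    mean-and-standard-deviation combination described in the text."""
    s = [bbox_s, class_s, mask_s]
    return mean(s) - pstdev(s)

def sample_score(instances):
    """Final score of a sample from its per-instance scores (same assumption)."""
    s = [instance_score(*inst) for inst in instances]
    return mean(s) - pstdev(s)

def select_to_label(unlabeled, k):
    """Pick the k samples with the lowest final score (score is negatively
    correlated with information content, per the text)."""
    ranked = sorted(unlabeled, key=lambda item: sample_score(item[1]))
    return [name for name, _ in ranked[:k]]

# Each sample maps to a list of (bbox_score, class_score, mask_score) triples.
pool = [
    ("img_a", [(0.9, 0.95, 0.9)]),                  # confident -> high score
    ("img_b", [(0.3, 0.5, 0.2), (0.4, 0.6, 0.3)]),  # uncertain -> low score
]
```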
  • relevant labeling personnel such as experts in the medical field
  • the labeled samples can be placed in the training data set directory.
  • some embodiments of the present application may further include the step of calculating a loss function.
  • the loss function of some embodiments of the present application may include five parts, namely the output category loss L_class, the detection frame loss L_bbox, the contour mask loss L_mask, the detection frame score loss L_bboxIOU, and the contour mask score loss L_MaskIOU; these five loss functions can be used together for iterative training and learning of the instance segmentation model.
  • the loss function L semi of the semi-supervised part of the instance segmentation model is calculated as follows:
  • L_semi = L_class + L_bbox + L_mask + L_bboxIOU + L_MaskIOU
  • the overall loss function L of the instance segmentation model is calculated as follows: L = L_sup + λ·L_semi
  • L_sup represents the active learning (supervised) part of the loss function
  • λ represents the loss balance coefficient
  • the loss balance coefficient is used to suppress the potential noise caused by pseudo annotations, and its default value is 0.01.
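  • The loss combination above can be sketched as plain functions; the additive overall form and the balance coefficient default of 0.01 follow our reading of the text, and the function names are ours:

```python
# Hedged sketch of the loss combination described in the text.
def semi_loss(l_class, l_bbox, l_mask, l_bbox_iou, l_mask_iou):
    """L_semi: sum of the five loss terms on the semi-supervised part."""
    return l_class + l_bbox + l_mask + l_bbox_iou + l_mask_iou

def total_loss(l_sup, l_semi, balance=0.01):
    """Overall loss L = L_sup + balance * L_semi, where the balance
    coefficient (default 0.01) suppresses pseudo-annotation noise."""
    return l_sup + balance * l_semi
```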
  • Step S3: a second to-be-labeled sample with a confidence level higher than a set value is selected from all remaining samples based on a semi-supervised learning method, and a second label set is obtained by pseudo-labeling the second to-be-labeled sample.
  • for high-confidence samples, annotation results are automatically generated through a semi-supervised pseudo-annotation strategy.
  • the step of selecting the second to-be-labeled sample with a confidence higher than the set value from all the remaining samples based on the semi-supervised learning method includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than a first threshold, the instance output category score is greater than a second threshold, and the instance contour mask score is greater than a third threshold, the confidence of the current sample is judged to be higher than the set value, and the current sample is selected as the second sample to be labeled.
  • during the instance segmentation model training process, the present application can select, from all the remaining samples, second to-be-labeled samples whose three metric scores are all greater than 0.9, and pseudo-label them to obtain approximate reference annotation results, thereby further expanding the training set, which is conducive to better model performance.
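  • Step S3's confidence gate can be sketched as below; the 0.9 threshold figure comes from the text, while the function and variable names are our own:

```python
# Hedged sketch of the three-threshold confidence gate for pseudo-labeling.
T_BBOX, T_CLASS, T_MASK = 0.9, 0.9, 0.9  # first, second, third thresholds

def is_high_confidence(bbox_score, class_score, mask_score):
    """A sample qualifies for pseudo-labeling only when all three
    metric scores exceed their thresholds."""
    return bbox_score > T_BBOX and class_score > T_CLASS and mask_score > T_MASK

def select_for_pseudo_label(remaining):
    """Return the names of remaining samples whose confidence is above the set value."""
    return [name for name, scores in remaining if is_high_confidence(*scores)]

remaining = [("img_c", (0.95, 0.97, 0.93)), ("img_d", (0.95, 0.97, 0.85))]
```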
  • Step S4: the first label set, the second label set and the already labeled set are taken together as the training set of the current instance segmentation model.
  • the present application can fully utilize the potential of instance segmentation: the obtained first and second label sets can be added to the training set, and the model can be trained and updated, so that the information increment of the newly obtained samples greatly increases the number of annotated medical image samples and the update training improves the existing target instance segmentation model.
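  • Step S4's merging of the three sets can be sketched as follows; representing each set as a name-to-annotation mapping is our own simplification, not the patent's data format:

```python
# Hedged sketch of step S4: merge the manually labeled set (active learning),
# the pseudo-labeled set (semi-supervised) and the existing labeled set.
def expand_training_set(labeled_set, first_label_set, second_label_set):
    """Merge the three sets into one training set; on a name collision the
    later update wins, so each sample appears only once."""
    merged = dict(labeled_set)
    merged.update(first_label_set)   # manual annotations
    merged.update(second_label_set)  # pseudo annotations
    return merged

labeled_set = {"img_1": "y1"}
first_label_set = {"img_2": "y2_manual"}
second_label_set = {"img_3": "y3_pseudo"}
train_set = expand_training_set(labeled_set, first_label_set, second_label_set)
```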
  • by applying the present application to the field of intelligent aided recognition of medical images, area delineation or quantitative evaluation of different target positions and key organ instances can be performed simultaneously; especially for image areas that may occlude one another, the present application can segment key target instances more effectively. This application can thus overcome over-reliance on limited and scarce doctors and experts for labeling, and provide a large number of useful samples for the image instance segmentation model. In addition, it should be understood that the above steps of the present application may be performed repeatedly.
  • some embodiments of the present disclosure are evaluated by comparison on a medical image instance segmentation task.
  • training after adding 500 labeled samples step by step shows that, by labeling 1000-1500 intelligently selected samples, this application can reach the instance segmentation model accuracy that the existing method achieves only by training on 2000-3000 images, reducing the labeling cost by about 50%.
  • the present embodiment provides a graph of the segmentation results of the cerebral hemorrhage area and the fundus edema area in the actual model work. It can be seen that the results obtained in the experiments of the present application are basically consistent with the theoretical conclusions. After intelligently selecting a small number of samples, the instance segmentation effect that can only be achieved by conventional methods with more samples can be achieved. Experiments on the two tasks of CT intracerebral hemorrhage area segmentation and fundus edema area segmentation show that the application can achieve almost the same performance with only about 50% of the sample size of the conventional complete data set.
  • this application essentially provides an efficient human-in-the-loop method that combines sample labeling and model training, making full use of expert knowledge and the high-confidence predictions of artificial intelligence; it provides a new way for deep learning to reduce dataset requirements, and has high practical application significance and promotion value.
  • an instance segmentation model sample screening apparatus which includes but is not limited to a data reading module, a first screening module, a second screening module, and a data expansion module.
  • the data reading module reads the original data set, the original data set includes the unlabeled set and the labeled set.
  • the first screening module is used to select from the unlabeled set, based on an active learning method, a plurality of first to-be-labeled samples with more information than the remaining samples, and the plurality of first to-be-labeled samples are manually labeled as the first label set; the first to-be-labeled samples and all remaining samples form the unlabeled set.
  • the second screening module is used to select a second to-be-labeled sample with a confidence higher than a set value from all remaining samples based on a semi-supervised learning method, and the second to-be-labeled sample is pseudo-labeled as a second label set.
  • the data expansion module uses the first label set, the second label set and the already labelled set as the training set of the current instance segmentation model.
  • the above-mentioned data such as the original data set and the training set may also be stored in a node of a blockchain.
  • this application selects some high-value samples from a large number of unlabeled original medical images and labels them for labelers (such as doctors), and does not need to label all samples.
  • labelers such as doctors
  • This application can select the samples with the largest amount of information to speed up the training of the instance segmentation model, and the amount of manually labeled data is significantly reduced, which provides a new implementation method for deep learning to reduce dataset requirements, realizes efficient utilization of data and computing resources, and saves computing power.
  • the present application also provides a computer device, including a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor executes the steps of the sample screening method in any embodiment.
  • the computer equipment can be a PC, a portable electronic device such as a PAD, a tablet computer or a laptop computer, or an intelligent mobile terminal such as a mobile phone, and is not limited to these; the computer equipment can also be implemented by a server, and the server can be a cluster system in which, to realize the functions of each unit, the units are either merged into one computer device or each unit's functions are deployed separately.
  • Step S1: read the original data set; the original data set in this application may include an unlabeled set and a labeled set.
  • Step S2: based on the active learning method, select from the unlabeled set a plurality of first to-be-labeled samples with more information than the remaining samples, and obtain a first label set by manually labeling the plurality of first to-be-labeled samples; the first to-be-labeled samples and all remaining samples form the unlabeled set.
  • the step of selecting a plurality of first to-be-labeled samples with more information than the remaining samples from the unlabeled set based on the active learning method includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, and using these three scores to determine the final score of each sample. Specifically, in some embodiments of the present application, the instance detection frame score is the intersection over union (IoU) of the instance detection frame and the ground-truth frame, the instance output category score is the classification value of the instance, and the instance contour mask score is the IoU between the instance's predicted mask and the ground-truth mask.
  • the process of using the instance detection frame score, instance output category score and instance contour mask score to determine the final score of each sample includes: using the mean and standard deviation of these three scores to calculate the score of each instance in the current sample, and then using the mean and standard deviation of the per-instance scores to calculate the final score of the current sample. According to the negative or positive correlation between the final score and the amount of information, a plurality of first samples to be labeled are selected from the unlabeled set; the first to-be-labeled samples can be selected from the unlabeled set during instance segmentation model training.
  • Step S3: a second to-be-labeled sample with a confidence level higher than a set value is selected from all remaining samples based on a semi-supervised learning method, and a second label set is obtained by pseudo-labeling the second to-be-labeled sample.
  • the step of selecting the second to-be-labeled sample with a confidence higher than the set value from all the remaining samples based on the semi-supervised learning method includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than a first threshold, the instance output category score is greater than a second threshold, and the instance contour mask score is greater than a third threshold, the confidence of the current sample is judged to be higher than the set value, and the current sample is selected as the second sample to be labeled.
  • the present application can select the second to-be-labeled sample from all remaining samples during the instance segmentation model training process. Step S4: the first label set, the second label set and the already labeled set are taken together as the training set of the current instance segmentation model.
  • a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the sample screening method in any embodiment of the present application.
  • Step S1: read the original data set; the original data set in this application may include an unlabeled set and a labeled set.
  • Step S2: based on the active learning method, select from the unlabeled set a plurality of first to-be-labeled samples with more information than the remaining samples, and obtain a first label set by manually labeling the plurality of first to-be-labeled samples; the first to-be-labeled samples and all remaining samples form the unlabeled set.
  • the step of selecting a plurality of first to-be-labeled samples with more information than the remaining samples from the unlabeled set based on the active learning method includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, and using these three scores to determine the final score of each sample. Specifically, in some embodiments of the present application, the instance detection frame score is the intersection over union (IoU) of the instance detection frame and the ground-truth frame, the instance output category score is the classification value of the instance, and the instance contour mask score is the IoU between the instance's predicted mask and the ground-truth mask.
• The process of determining each sample's final score from the instance detection-box score, instance output-category score and instance contour-mask score includes: computing the score of each instance in the current sample from the mean and standard deviation of its three scores, and then computing the final score of the current sample from the mean and standard deviation of its instance scores. According to the negative (or positive) correlation between the final score and the amount of information, a plurality of first to-be-labeled samples are selected from the unlabeled set. The first to-be-labeled samples can thus be selected from the unlabeled set during instance segmentation model training.
• In step S3, second to-be-labeled samples whose confidence is higher than a set value are selected from the remaining samples based on a semi-supervised learning approach, and a second labeled set is obtained by pseudo-labeling the second to-be-labeled samples.
• The step of selecting, based on the semi-supervised learning approach, the second to-be-labeled samples whose confidence is higher than the set value from the remaining samples includes: obtaining the instance detection-box score, instance output-category score and instance contour-mask score of each remaining sample; when a sample's instance detection-box score exceeds a first threshold, its instance output-category score exceeds a second threshold and its instance contour-mask score exceeds a third threshold, its confidence is judged to be higher than the set value, and the sample is selected as a second to-be-labeled sample.
• In this way, the present application can select second to-be-labeled samples from the remaining samples during instance segmentation model training. In step S4, the first labeled set, the second labeled set and the existing labeled set together form the training set of the current instance segmentation model.
• The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms.
• A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
• A blockchain system can include an underlying blockchain platform, a platform product service layer and an application service layer.
• The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing the logical functions, and may be embodied in any computer-readable storage medium for use by, or in conjunction with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or any other system that can fetch and execute instructions from an instruction execution system, apparatus or device).
• Herein, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus or device.
  • the computer-readable storage medium may be non-volatile or volatile.
• Computer-readable storage media include the following: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical device, and a portable compact disc read-only memory (CD-ROM).
• The computer-readable storage medium may even be paper or another suitable medium on which the program is printed, since the paper or other medium can be, for example, optically scanned and then edited, interpreted or otherwise processed in a suitable manner if necessary to obtain the program electronically, which is then stored in computer memory.
• The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or the number of technical features indicated. Thus, a feature delimited by "first" or "second" may expressly or implicitly include at least one such feature.
• "Plurality" means at least two, for example two or three, unless expressly and specifically defined otherwise.

Abstract

An instance segmentation model sample screening method relating to artificial intelligence, usable in medical image analysis assistance scenarios. The method comprises: reading an original data set; selecting from an unlabeled set, on the basis of active learning, first to-be-labeled samples whose information content exceeds that of the remaining samples, and obtaining a first labeled set by manually labeling them; selecting from the remaining samples, on the basis of semi-supervised learning, second to-be-labeled samples whose confidence exceeds a set value, and obtaining a second labeled set by pseudo-labeling them; and taking the first labeled set, the second labeled set and the existing labeled set together as a training set. The method reduces the amount of manual sample labeling while yielding a large number of samples for training an image instance segmentation model, so that a more ideal instance segmentation accuracy can be achieved. The method further relates to blockchain technology: both the original data set and the training set can be stored in a blockchain.

Description

Instance segmentation model sample screening method, apparatus, computer device and medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 14, 2020, with application number 202011099366.0 and entitled "Instance Segmentation Model Sample Screening Method, Apparatus, Computer Device and Medium", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence and can be applied to the field of image instance segmentation. It specifically provides an instance segmentation model sample screening method, apparatus, computer device and medium.
Background
With the continuous development of deep learning, computer vision has achieved ever greater success, thanks to the support of large training data sets. A training data set (training set for short) is a data set with rich annotation information; collecting and annotating such a data set usually requires enormous labor costs.
Compared with image classification, image instance segmentation is considerably more difficult, and a large amount of annotated training data is required to truly realize the instance segmentation function. However, the inventors realized that the number of available annotated samples is often insufficient relative to the scale of the problem, or the cost of obtaining them is too high. In many cases, annotators with the relevant professional knowledge (such as doctors) are scarce or find it difficult to spare the time, the cost of annotation is too high, or the annotation or judgment cycle for images is too long; any of these problems can prevent an instance segmentation model from being trained effectively.
Therefore, how to obtain a large number of samples (training data sets) for training image instance segmentation models has become a research hotspot for those skilled in the art.
Summary of the Invention
To solve the problem in the prior art that it is difficult to obtain a large number of samples for training image instance segmentation models, the present application provides an instance segmentation model sample screening method, apparatus, computer device and medium, which can obtain a large number of samples while reducing the amount of manual sample annotation.
To achieve the above technical purpose, the present application discloses an instance segmentation model sample screening method, which includes, but is not limited to, the following steps.
Read an original data set, which includes an unlabeled set and a labeled set.
Based on an active-learning approach, select from the unlabeled set a plurality of first to-be-labeled samples carrying more information than the remaining samples, and obtain a first labeled set by manually labeling the first to-be-labeled samples. The first to-be-labeled samples and the remaining samples together make up the unlabeled set.
Based on a semi-supervised learning approach, select from the remaining samples second to-be-labeled samples whose confidence is higher than a set value, and obtain a second labeled set by pseudo-labeling the second to-be-labeled samples.
Take the first labeled set, the second labeled set and the existing labeled set together as the training set of the current instance segmentation model.
To achieve the above technical purpose, the present application also discloses an instance segmentation model sample screening apparatus, which includes, but is not limited to, a data reading module, a first screening module, a second screening module and a data expansion module.
The data reading module reads an original data set, which includes an unlabeled set and a labeled set.
The first screening module is configured to select, based on an active-learning approach, a plurality of first to-be-labeled samples carrying more information than the remaining samples from the unlabeled set; the first to-be-labeled samples are manually labeled to form a first labeled set. The first to-be-labeled samples and the remaining samples together make up the unlabeled set.
The second screening module is configured to select, based on a semi-supervised learning approach, second to-be-labeled samples whose confidence is higher than a set value from the remaining samples; the second to-be-labeled samples are pseudo-labeled to form a second labeled set.
The data expansion module takes the first labeled set, the second labeled set and the existing labeled set together as the training set of the current instance segmentation model.
To achieve the above technical purpose, the present application also provides a computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the sample screening method of any embodiment of the present application.
To achieve the above technical purpose, the present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the sample screening method of any embodiment of the present application.
The beneficial effects of the present application are as follows: based on a semi-supervised active-learning strategy, the present application can select the samples that carry the most information for the current model and give them to annotators for labeling, and can effectively expand the training set through semi-supervised pseudo-labeling. The present application can therefore obtain a large number of samples for training image instance segmentation models while reducing the amount of manual annotation, achieving a more ideal instance segmentation accuracy.
The present application greatly reduces manual annotation while obtaining a large number of model-training samples more quickly, so that an instance segmentation model applying the present application trains faster; the present application therefore has good practical significance and promotion value.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the instance segmentation model sample screening method in some embodiments of the present application.
FIG. 2 is a schematic diagram of the working principle of the instance segmentation model sample screening apparatus in some embodiments of the present application.
FIG. 3 is a schematic diagram of the working principle of the instance segmentation model in some embodiments of the present application.
FIG. 4 shows the scores of instance targets in some embodiments of the present application in three dimensions: category, detection box and segmentation contour.
FIG. 5 shows the scores of instance targets in other embodiments of the present application in the same three dimensions: category, detection box and segmentation contour.
FIG. 6 compares the instance segmentation results achievable with the present application and with existing methods for different numbers of annotated images (taking cerebral hemorrhage region segmentation and fundus edema region segmentation as examples).
FIG. 7 compares the model accuracy achieved with the present application and with existing methods for different numbers of annotated images (applied to cerebral hemorrhage region segmentation).
FIG. 8 compares the model accuracy achieved with the present application and with existing methods for different numbers of annotated images (applied to fundus edema region segmentation).
FIG. 9 is a block diagram of the internal structure of a computer device in some embodiments of the present application.
Detailed Description
The instance segmentation model sample screening method, apparatus, computer device and medium provided by the present application are explained and described in detail below with reference to the accompanying drawings.
In medical image intelligent analysis assistance scenarios, to solve the difficulty under conventional techniques of obtaining the large number of training samples that instance segmentation models require, the present application effectively combines active learning and semi-supervised learning. Active learning offers the advantage of obtaining as good a generalization model as possible while sampling and labeling as few samples as possible; semi-supervised learning offers the advantage of mining the relationship between labeled and unlabeled samples to obtain a better generalization model. The present application combines the advantages of both and provides a semi-supervised active-learning strategy to achieve rapid acquisition and screening of a large number of instance segmentation model samples.
As shown in FIG. 1, some embodiments of the present application provide an instance segmentation model sample screening method suitable for the analysis of medical images with complex layouts, for example images in which different regions occlude one another. The method may include, but is not limited to, the following steps.
In step S1, an original data set is read. In some embodiments of the present application, the original data set may include, but is not limited to, an unlabeled set, a labeled set and a test set. It should be understood that the original data set contains relatively few labeled samples and very many unlabeled ones. In some embodiments, the data sets are medical image data sets: the unlabeled set is an unlabeled medical image data set, the labeled set is a labeled medical image data set, and the test set is a medical image data set usable for model evaluation.
In step S2, based on an active-learning approach, the present application first selects from the unlabeled set a plurality of first to-be-labeled samples carrying more information than the remaining samples, and obtains a first labeled set by manually labeling them. The first labeled set is the portion of the training set formed by manual annotation; the first to-be-labeled samples and the remaining samples together make up the unlabeled set of the original data set, i.e. the medical image samples to be labeled and the remaining unlabeled medical image samples together constitute all the unlabeled medical image samples. As shown in FIG. 2, although a new training set can be provided for the current instance segmentation model through manual annotation, the number of samples that can in fact be annotated manually is limited.
In a specific implementation, let D = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), x_{i+1}, ..., x_n} denote the entire data set, where x denotes a sample and y its annotation result. The data set comprises labeled data {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i)} and unlabeled data {x_{i+1}, ..., x_n}: the first i samples form the existing labeled set, and the remaining n-i samples form the unlabeled set of the original data set. This embodiment selects from the unlabeled set the samples carrying the most information (for example the top k samples by information content); these samples are then given to annotators for labeling. The specific value of k can be chosen reasonably according to the actual situation, for example k = 500.
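The top-k selection described above can be sketched as follows. This is a minimal illustration: the sample identifiers and scores are hypothetical, and it assumes (as in the embodiments below) that a lower final score means a more informative sample.

```python
def select_top_k(unlabeled_scores, k):
    """Pick the (at most) k samples whose final score is lowest, i.e.
    which are assumed to carry the most information for the current model.

    unlabeled_scores: dict mapping sample id -> final score S_i.
    """
    ranked = sorted(unlabeled_scores, key=unlabeled_scores.get)
    return ranked[:k]

# Hypothetical pool of unlabeled samples with their final scores.
pool = {"img_a": 0.92, "img_b": 0.31, "img_c": 0.55, "img_d": 0.78}
to_label = select_top_k(pool, k=2)  # these would be sent for manual labeling
```

With the pool above, `to_label` contains the two lowest-scoring (most informative) samples, "img_b" and "img_c".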
As shown in FIG. 3, an instance segmentation model formed on the basis of the present application can work as follows. The model scans images (i.e. images in the original data set, both unlabeled and labeled; in FIG. 3, dashed lines denote unlabeled data flows and solid lines denote labeled data flows). Scanning an image generates proposals; bounding-box information and mask information are generated by classifying the proposals, and the subsequent network then determines, from the bounding-box and mask information, the instance detection-box score (bbox_score), the instance output-category score (class_score) and the instance contour-mask score (mask_score), from which the samples carrying the most information are selected.
The instance segmentation model of this embodiment can be extended from the Faster R-CNN model. The FPN network (a feature extraction network) scans the image on the basis of its pyramid structure to obtain the proposals; the scanning process can be a feature-map mapping. The RPN network (a region proposal network) generates the bounding-box information and mask information by processing the proposals; the processing can include binary classification (foreground/background) and bounding-box (BB) regression, and from the bounding-box and mask information the detection-box coordinates, whether a target exists in the detection box, and the class label of the detection box can be determined. The bounding-box and mask information then undergo region-of-interest alignment (ROI Align) before being fed to the subsequent network; ROI Align is used to bring the pixels of the original image and of the feature map into correspondence. The subsequent network in this embodiment can include the detection head (RCNN Head) and the segmentation head (Mask Head) of the instance segmentation model: the detection head outputs the instance detection-box score and instance output-category score above, the segmentation head outputs the instance contour-mask score above, and each output can have dimension 1.
More specifically, under the overall architecture design of the instance segmentation model in FIG. 3, the step of selecting from the unlabeled set, based on the active-learning approach, a plurality of first to-be-labeled samples carrying more information than the remaining samples specifically includes: computing the instance detection-box score, instance output-category score and instance contour-mask score of each sample in the unlabeled set, and using the three scores jointly to determine each sample's final score. In some embodiments of the present application, the instance detection-box score is the intersection-over-union (IoU) between the instance's predicted bounding box and the ground-truth bounding box, the instance output-category score is the instance's classification value, and the instance contour-mask score is the IoU between the instance's predicted mask and the ground-truth mask.
In this embodiment, the process of determining each sample's final score from the three scores includes: computing the score of each instance in the current sample from the mean and standard deviation of its instance detection-box, output-category and contour-mask scores, and then computing the final score of the current sample from the mean and standard deviation of its instance scores. According to the negative (or positive) correlation between the final score and the amount of information, a plurality of first to-be-labeled samples are selected from the unlabeled set.
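The intersection-over-union used for the detection-box score can be sketched for axis-aligned boxes as below. The (x1, y1, x2, y2) box representation is an assumption for illustration; the mask IoU is computed analogously over pixel sets.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    # Intersection rectangle (clamped to zero area when boxes are disjoint).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted box against its ground-truth box: overlap 1, union 7.
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```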
The score s_i^j of the j-th instance in the i-th sample is computed from the mean and standard deviation of the instance detection-box score, instance output-category score and instance contour-mask score: the mean combines the three scores, while the standard deviation captures their diversity. It may be computed, for example, as

s_i^j = mean(c_i^j, b_i^j, m_i^j) - std(c_i^j, b_i^j, m_i^j)

where s_i^j denotes the score of the j-th instance in the i-th sample; c_i^j, b_i^j and m_i^j denote respectively the instance output-category score, instance detection-box score and instance contour-mask score of the j-th instance in the i-th sample; std denotes the standard-deviation operator and mean denotes the mean operator.

The final score S_i of the current sample is computed analogously from the mean and standard deviation of the scores of the instances in the current sample, for example

S_i = mean_j(s_i^j) - std_j(s_i^j)
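A minimal sketch of the two-level scoring, under the assumption that each level is computed as the mean of its scores minus their (population) standard deviation, with the mean aggregating the scores and the standard deviation capturing their diversity as described above:

```python
from statistics import mean, pstdev

def instance_score(class_score, bbox_score, mask_score):
    """s_i^j: aggregate the three per-instance branch scores (assumed
    mean-minus-std formulation)."""
    scores = [class_score, bbox_score, mask_score]
    return mean(scores) - pstdev(scores)

def sample_score(instance_scores):
    """S_i: final score of a sample from the scores of its instances."""
    return mean(instance_scores) - pstdev(instance_scores)

# Two hypothetical instances detected in one image.
s1 = instance_score(0.9, 0.8, 0.7)   # confident, consistent instance
s2 = instance_score(0.9, 0.2, 0.4)   # branches disagree -> lower score
final = sample_score([s1, s2])       # a low value flags an informative sample
```

An instance whose three branch scores agree and are high gets a high score; disagreement between branches (high standard deviation) pushes the score down, marking the sample as informative.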
This embodiment can select the first to-be-labeled samples from the unlabeled set during instance segmentation model training, and decides, based on the active-learning algorithm, which data to annotate manually. The present application can therefore screen all unlabeled samples on the basis of the three-branch information metrics above (instance detection-box score, instance output-category score, instance contour-mask score); in implementation, when the annotation time and labor budget suffice for k samples, the top k samples (or fewer) are selected for manual interpretation and annotation. That is, some embodiments of the present application perform manual interpretation and annotation on the selected k unlabeled medical image samples.
For example, some embodiments of the present application may select the plurality of first to-be-labeled samples from the unlabeled set according to a negative correlation between the final score and the amount of information. As shown in FIG. 4 and FIG. 5, an instance has scores in three dimensions (category, detection box and segmentation contour); the lower the combined value of these three scores, the more the corresponding sample needs to be annotated. The top k samples (or fewer) are selected and given to annotators; in this embodiment, the annotation can be performed by relevant annotators (such as medical-domain experts), and the annotated samples can be placed in the training data set directory.
To give the instance segmentation model better performance, some embodiments of the present application may further include the step of computing a loss function. As shown in FIG. 3, the loss function of some embodiments of the present application can include five parts: the output-category loss L_class, the detection-box loss L_bbox, the contour-mask loss L_mask, the detection-box score loss L_bboxIOU and the contour-mask score loss L_MaskIOU; in total, up to five loss functions can be used together for the iterative training and learning of the instance segmentation model.
The loss function L_semi of the semi-supervised part of the instance segmentation model is computed as follows:
L_semi = L_class + L_bbox + L_mask + L_bboxIOU + L_MaskIOU
Combined with the active-learning part, the overall loss function L of the instance segmentation model is computed as follows:
L = L_sup + β * L_semi
where L_sup denotes the loss function of the active-learning part and β denotes the loss balance coefficient; the loss balance coefficient is used to suppress the potential noise introduced by pseudo-labeling, and its default value is 0.01.
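The loss combination above can be sketched as follows; the numeric loss values are illustrative only, and β defaults to 0.01 as stated.

```python
def semi_loss(l_class, l_bbox, l_mask, l_bbox_iou, l_mask_iou):
    """L_semi: sum of the five semi-supervised loss terms."""
    return l_class + l_bbox + l_mask + l_bbox_iou + l_mask_iou

def total_loss(l_sup, l_semi, beta=0.01):
    """L = L_sup + beta * L_semi; beta suppresses pseudo-label noise."""
    return l_sup + beta * l_semi

l_semi = semi_loss(0.5, 0.3, 0.4, 0.1, 0.2)   # illustrative term values
loss = total_loss(l_sup=1.0, l_semi=l_semi)   # 1.0 + 0.01 * 1.5
```

Because β is small, the pseudo-labeled (semi-supervised) part contributes only a gentle correction to the supervised loss, which is the stated purpose of the balance coefficient.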
步骤S3，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本，通过伪标注第二待标注样本的方式得到第二标注集。对高置信度样本通过半监督伪标注策略自动生成标注结果。其中，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本的步骤包括：获取所有剩余样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分；当前样本的实例检测框得分大于第一阈值且实例输出类别得分大于第二阈值且实例轮廓掩码得分大于第三阈值时，判断出当前样本的置信度高于设定值，挑选出当前样本作为第二待标注样本。本申请一些实施例中的第一阈值、第二阈值、第三阈值三者可以相等，例如第一阈值=第二阈值=第三阈值=0.9。本申请能够在实例分割模型训练过程中从所有剩余样本中挑选出三个度量指标得分均大于0.9的第二待标注样本，并进行伪标注，得到近似参考标注结果，从而可实现进一步扩充训练集，有利于模型性能更好地提升。In step S3, second to-be-labeled samples whose confidence is higher than a set value are selected from all remaining samples based on a semi-supervised learning method, and a second label set is obtained by pseudo-labeling them; for these high-confidence samples the annotation results are generated automatically through a semi-supervised pseudo-labeling strategy. Selecting the second to-be-labeled samples from all remaining samples includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than a first threshold, its instance output category score is greater than a second threshold and its instance contour mask score is greater than a third threshold, judging that the confidence of the current sample is higher than the set value and selecting it as a second to-be-labeled sample. In some embodiments of the present application the first, second and third thresholds may be equal, for example first threshold = second threshold = third threshold = 0.9. During instance segmentation model training, the present application can thus select from all remaining samples the second to-be-labeled samples whose three metric scores are all greater than 0.9 and pseudo-label them to obtain approximate reference annotations, further expanding the training set and benefiting model performance.
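The three-threshold filter of step S3 can be sketched as a minimal Python function; the dictionary keys and the 0.9 defaults are illustrative assumptions, not names from the patent:

```python
def select_high_confidence(samples, t_box=0.9, t_cls=0.9, t_mask=0.9):
    """Return the samples whose instance detection frame score, instance
    output category score and instance contour mask score all exceed their
    thresholds; these become the second to-be-labeled (pseudo-label) set."""
    return [s for s in samples
            if s["box_score"] > t_box
            and s["cls_score"] > t_cls
            and s["mask_score"] > t_mask]
```

Only samples passing all three tests are pseudo-labeled, which keeps the automatically generated annotations close to reference quality.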
步骤S4，将第一标注集、第二标注集和已标注集共同作为当前实例分割模型的训练集。以此训练集训练用于医疗影像的分析任务的实例分割模型，本申请可充分发挥出实例分割的潜力。所以本申请可将得到的第一标注集和第二标注集加入到训练集，对模型进行训练更新，从而利用获得的新样本的信息增量，使具有标注的医学图像样本数量得到极大地增加，更新训练提升已有的目标实例分割模型。例如，将本申请应用于医疗影像智能辅助识别领域，可同时进行不同目标位置、关键器官实例的区域勾画及量化评估，特别对于可能相互遮挡的图像区域，本申请能够更有效进行关键目标实例分割。可见本申请能够克服过分依赖精力有限且稀缺的医生专家进行标注的问题，为图像实例分割模型提供大量的有用样本。另外，应当理解的是，本申请上述的步骤可重复执行多次。In step S4, the first label set, the second label set and the labeled set are used together as the training set of the current instance segmentation model. By training an instance segmentation model for medical image analysis tasks on this training set, the present application can give full play to the potential of instance segmentation. The obtained first and second label sets are therefore added to the training set, and the model is trained and updated, so that the information increment of the newly obtained samples greatly increases the number of annotated medical image samples and the existing target instance segmentation model is improved through updated training. For example, when the present application is applied to intelligent assisted recognition of medical images, region delineation and quantitative evaluation of different target positions and key organ instances can be performed simultaneously; especially for image regions that may occlude one another, the present application can segment key target instances more effectively. The present application thus overcomes the problem of over-reliance on scarce doctors and experts with limited time for labeling, and provides a large number of useful samples for the image instance segmentation model. In addition, it should be understood that the above steps of the present application may be repeated multiple times.
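Step S4 amounts to a simple union of the three sets. The sketch below additionally tags each sample as manually or pseudo-labeled, since the loss balance coefficient applies only to the pseudo-labeled part; the `is_pseudo` field is an assumption introduced for illustration:

```python
def build_training_set(labeled_set, first_label_set, second_label_set):
    """Merge the pre-existing labeled set, the manually annotated first
    label set and the pseudo-annotated second label set into one training
    set, keeping a per-sample flag for the pseudo-labeled portion."""
    train = [{"sample": s, "is_pseudo": False}
             for s in labeled_set + first_label_set]
    train += [{"sample": s, "is_pseudo": True} for s in second_label_set]
    return train
```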
如图7和图8所示，本公开的一些实施例在医学影像实例分割任务上进行对比实现。与MC Dropout、Core Set、Class Entropy、Learning Loss等现有方法相比，通过每次逐步地增加500张样本进行标注后训练的结果可发现，本申请在智能挑选的1000~1500张样本进行标注后的训练就能达到现有方法2000~3000张训练才能达到的实例分割模型精度，减少约50%的标注成本。As shown in FIG. 7 and FIG. 8, some embodiments of the present disclosure are compared against existing methods on a medical image instance segmentation task. Compared with existing methods such as MC Dropout, Core Set, Class Entropy and Learning Loss, with 500 additional samples labeled and trained per round, the present application reaches, after labeling only 1000–1500 intelligently selected samples, the instance segmentation model accuracy that the existing methods attain only with 2000–3000 training samples, reducing the annotation cost by about 50%.
如图6所示，以现有Class Entropy方法为例，本实施例给出了在实际模型工作中对脑出血区域和眼底水肿区域的分割结果图。可见本申请实验得到的结果与理论上得到的结论基本符合，在智能挑选少量样本后就能够达到常规方法较多样本才能实现的实例分割效果。在CT脑出血区域分割和眼底水肿区域分割两个任务上的实验表明：本申请能够仅用常规完整的数据集的大约50%样本量实现几乎同等性能，可见本申请提供的方案明显优于现有其他方法，可节省大量的人力和物力。本申请每次挑选的都是对改进和提升目标分割模型最有价值的样本加入训练，在保证任务精度的基础上，有效地减少了标注代价以及工作量，极大地提高了标注效率，最终在少人工标注的前提下得到了大量标注样本。因此，采用本申请的实例分割模型能够具有更大量样本的训练集，极大地提升模型精度。更重要的是，本申请实质上提供了一套高效的人为回环(human in the loop)的样本标注和训练结合的模型习得方法，充分利用了专家知识和人工智能的高置信度预测，为深度学习降低数据集要求提供了新的实现方法，具有较高的实践应用意义以及推广价值。As shown in FIG. 6, taking the existing Class Entropy method as an example, this embodiment presents segmentation results for cerebral hemorrhage regions and fundus edema regions produced by the model in actual operation. The experimental results of the present application are basically consistent with the theoretical conclusions: after intelligently selecting a small number of samples, the instance segmentation performance that conventional methods reach only with far more samples can already be achieved. Experiments on the two tasks of CT cerebral hemorrhage region segmentation and fundus edema region segmentation show that the present application achieves nearly the same performance with only about 50% of the sample size of a conventional complete data set; the scheme provided herein is thus clearly superior to existing methods and saves substantial manpower and material resources. Each round, the samples most valuable for improving the target segmentation model are selected for training, which effectively reduces annotation cost and workload while preserving task accuracy, greatly improves annotation efficiency, and ultimately yields a large number of labeled samples with little manual annotation.
Therefore, the instance segmentation model of the present application can be trained on a training set with far more samples, greatly improving model accuracy. More importantly, the present application essentially provides an efficient human-in-the-loop model acquisition method combining sample annotation and training, making full use of expert knowledge and the high-confidence predictions of artificial intelligence; it offers a new way of reducing the data-set requirements of deep learning and has high practical significance and promotion value.
如图2所示,本申请另一些实施例能够提供一种实例分割模型样本筛选装置,该装置包括但不限于数据读取模块、第一筛选模块、第二筛选模块及数据扩充模块。As shown in FIG. 2 , other embodiments of the present application can provide an instance segmentation model sample screening apparatus, which includes but is not limited to a data reading module, a first screening module, a second screening module, and a data expansion module.
数据读取模块，读取原始数据集，原始数据集包括未标注集和已标注集。The data reading module reads the original data set, which includes an unlabeled set and a labeled set.
第一筛选模块，用于基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本，多个第一待标注样本被人工标注为第一标注集；所有第一待标注样本和所有剩余样本组成未标注集。The first screening module is configured to select, based on an active learning method, a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples from the unlabeled set; the plurality of first to-be-labeled samples are manually labeled as the first label set, and all first to-be-labeled samples and all remaining samples constitute the unlabeled set.
第二筛选模块,用于基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本,第二待标注样本被伪标注为第二标注集。The second screening module is used to select a second to-be-labeled sample with a confidence higher than a set value from all remaining samples based on a semi-supervised learning method, and the second to-be-labeled sample is pseudo-labeled as a second label set.
数据扩充模块,将第一标注集、第二标注集以及已标注集共同作为当前实例分割模型的训练集。The data expansion module uses the first label set, the second label set and the already labelled set as the training set of the current instance segmentation model.
需要强调的是,为进一步保证本申请实施例中的数据的私密和安全性,上述的原始数据集和训练集等数据还可以存储于一区块链的节点中。It should be emphasized that, in order to further ensure the privacy and security of the data in the embodiments of the present application, the above-mentioned data such as the original data set and the training set may also be stored in a node of a blockchain.
基于主动学习策略，本申请从未标注的大量原始医学图像中挑选部分高价值样本给标注人员(如医生)标注，不需要对所有的样本进行标注。每次都挑选对改进深度学习实例分割模型最有价值的样本加入训练，从而在获取理想任务精度的基础上有效减少了标注代价和医生工作量，最大化样本人工标注效率。本申请能选择信息量最大的样本来加速实例分割模型训练，使用人工标注数据量明显降低，为深度学习降低数据集要求提供了新的实现方法，实现高效的数据和计算资源利用，节省了计算资源消耗。结合实例分割模型的预测输出，我们提供的医学图像实例分割的半监督主动学习框架，可以和主流的实例分割模型融合在一起，从而可以显著地节省训练深度神经网络实例分割模型的标注成本。经过上述实验表明，在本申请的基础上能够训练得到泛化能力更强更准确的医学图像实例分割模型，减少网络过拟合以更好的适应医学应用等场景。Based on the active learning strategy, the present application selects some high-value samples from a large number of unlabeled original medical images for annotators (such as doctors) to label, without having to label all samples. Each round, the samples most valuable for improving the deep-learning instance segmentation model are selected for training, which effectively reduces the annotation cost and the doctors' workload while attaining the desired task accuracy, and maximizes the efficiency of manual annotation. The present application can select the most informative samples to accelerate instance segmentation model training, markedly reducing the amount of manually labeled data required; it provides a new way of lowering the data-set requirements of deep learning, realizes efficient utilization of data and computing resources, and saves computational resource consumption. Combined with the predicted output of the instance segmentation model, the semi-supervised active learning framework for medical image instance segmentation provided here can be integrated with mainstream instance segmentation models, significantly saving the annotation cost of training deep neural network instance segmentation models. The above experiments show that, on the basis of the present application, a medical image instance segmentation model with stronger generalization ability and higher accuracy can be trained, reducing network overfitting to better suit scenarios such as medical applications.
如图9所示，本申请还提供了一种计算机设备，包括存储器和处理器，存储器中存储有计算机可读指令，计算机可读指令被处理器执行时，使得处理器执行如本申请任一实施例中的样本筛选方法的步骤。其中，计算机设备可以为PC，还可以为如PAD、平板电脑、手提电脑这种便携电子设备、还可以为如手机这种智能移动终端，不限于这里的描述；计算机设备还可以通过服务器实现，服务器可以是通过集群系统构成的，为实现各单元功能而合并为一或各单元功能分体设置的计算机设备。程序的执行包含以下的步骤的指令：步骤S1，读取原始数据集，本申请中的原始数据集可包括未标注集和已标注集。步骤S2，基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本，通过人工标注多个第一待标注样本的方式得到第一标注集；所有第一待标注样本和所有剩余样本组成未标注集。其中，基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本的步骤包括：计算未标注集中各样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分，以利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分确定各样本的最终得分；具体地，本申请一些实施例中，实例检测框得分为实例的检测框与真实框的交并比，实例输出类别得分为实例的分类值，实例轮廓掩码得分为实例的检测掩码与真实掩码的交并比。利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分确定各样本的最终得分的过程包括：利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分的均值和标准差计算当前样本中各实例的得分；利用当前样本中各实例的得分的均值和标准差计算当前样本的最终得分。依据最终得分与信息量之间的负相关或正相关关系从未标注集中挑选出多个第一待标注样本。可在实例分割模型训练过程中从未标注集中挑选第一待标注样本。步骤S3，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本，通过伪标注第二待标注样本的方式得到第二标注集。其中，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本的步骤包括：获取所有剩余样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分；当前样本的实例检测框得分大于第一阈值且实例输出类别得分大于第二阈值且实例轮廓掩码得分大于第三阈值时，判断出当前样本的置信度高于设定值，挑选出当前样本作为第二待标注样本。本申请可在实例分割模型训练过程中从所有剩余样本中挑选第二待标注样本。步骤S4，将第一标注集、第二标注集以及已标注集共同作为当前实例分割模型的训练集。As shown in FIG. 9, the present application also provides a computer device including a memory and a processor, where the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the sample screening method in any embodiment of the present application. The computer device may be a PC, a portable electronic device such as a PAD, tablet computer or laptop, or a smart mobile terminal such as a mobile phone, and is not limited to the description here; the computer device may also be implemented by a server, which may be built from a cluster system, either merged into one computer device to realize the functions of the units or set up with the units' functions implemented separately.
Execution of the program includes instructions for the following steps. Step S1: read the original data set, which in the present application may include an unlabeled set and a labeled set. Step S2: based on an active learning method, select from the unlabeled set a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples, and obtain a first label set by manually labeling them; all first to-be-labeled samples and all remaining samples constitute the unlabeled set. Selecting the first to-be-labeled samples includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, so as to determine the final score of each sample from these three scores. Specifically, in some embodiments of the present application, the instance detection frame score is the intersection-over-union of the instance's detection box and the ground-truth box, the instance output category score is the classification value of the instance, and the instance contour mask score is the intersection-over-union of the instance's detection mask and the ground-truth mask.
Determining the final score of each sample includes: computing the score of each instance in the current sample from the mean and standard deviation of its instance detection frame score, instance output category score and instance contour mask score, and then computing the final score of the current sample from the mean and standard deviation of the scores of its instances. A plurality of first to-be-labeled samples are selected from the unlabeled set according to the negative or positive correlation between the final score and the amount of information; the first to-be-labeled samples can be selected from the unlabeled set during instance segmentation model training. Step S3: based on a semi-supervised learning method, select from all remaining samples second to-be-labeled samples whose confidence is higher than a set value, and obtain a second label set by pseudo-labeling them. Selecting the second to-be-labeled samples includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than a first threshold, its instance output category score is greater than a second threshold and its instance contour mask score is greater than a third threshold, judging that the confidence of the current sample is higher than the set value and selecting it as a second to-be-labeled sample. The second to-be-labeled samples can be selected from all remaining samples during instance segmentation model training.
Step S4: the first label set, the second label set and the labeled set are used together as the training set of the current instance segmentation model.
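The mean/standard-deviation score aggregation of step S2 can be sketched as follows. The exact formulas are given by the patent's own equations (see claims 4 and 5); here the common "mean minus standard deviation" combination, and a negative correlation between final score and information content, are assumed purely for illustration:

```python
import statistics

def instance_score(cls_s, box_s, mask_s):
    # Score of one instance from the mean and (population) standard
    # deviation of its three metric scores -- an assumed combination.
    vals = [cls_s, box_s, mask_s]
    return statistics.mean(vals) - statistics.pstdev(vals)

def sample_final_score(instance_scores):
    # Final score of a sample from the mean and standard deviation of
    # the scores of its instances.
    return statistics.mean(instance_scores) - statistics.pstdev(instance_scores)

def pick_first_to_label(final_scores, k):
    # Assuming the final score correlates negatively with information
    # content, the k lowest-scoring samples are the most informative.
    return sorted(range(len(final_scores)), key=final_scores.__getitem__)[:k]
```

Under this assumption, an instance whose three scores disagree strongly (high standard deviation) receives a low score, and the samples containing such uncertain instances are the ones handed to human annotators.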
一种存储有计算机可读指令的存储介质，计算机可读指令被一个或多个处理器执行时，使得一个或多个处理器执行如本申请任一实施例中如下的样本筛选方法的步骤。步骤S1，读取原始数据集，本申请中的原始数据集可包括未标注集和已标注集。步骤S2，基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本，通过人工标注多个第一待标注样本的方式得到第一标注集；所有第一待标注样本和所有剩余样本组成未标注集。其中，基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本的步骤包括：计算未标注集中各样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分，以利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分确定各样本的最终得分；具体地，本申请一些实施例中，实例检测框得分为实例的检测框与真实框的交并比，实例输出类别得分为实例的分类值，实例轮廓掩码得分为实例的检测掩码与真实掩码的交并比。利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分确定各样本的最终得分的过程包括：利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分的均值和标准差计算当前样本中各实例的得分；利用当前样本中各实例的得分的均值和标准差计算当前样本的最终得分。依据最终得分与信息量之间的负相关或正相关关系从未标注集中挑选出多个第一待标注样本。可在实例分割模型训练过程中从未标注集中挑选第一待标注样本。步骤S3，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本，通过伪标注第二待标注样本的方式得到第二标注集。其中，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本的步骤包括：获取所有剩余样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分；当前样本的实例检测框得分大于第一阈值且实例输出类别得分大于第二阈值且实例轮廓掩码得分大于第三阈值时，判断出当前样本的置信度高于设定值，挑选出当前样本作为第二待标注样本。本申请可在实例分割模型训练过程中从所有剩余样本中挑选第二待标注样本。步骤S4，将第一标注集、第二标注集以及已标注集共同作为当前实例分割模型的训练集。A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps of the sample screening method in any embodiment of the present application. Step S1: read the original data set, which in the present application may include an unlabeled set and a labeled set. Step S2: based on an active learning method, select from the unlabeled set a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples, and obtain a first label set by manually labeling them; all first to-be-labeled samples and all remaining samples constitute the unlabeled set.
Selecting the first to-be-labeled samples includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, so as to determine the final score of each sample from these three scores. Specifically, in some embodiments of the present application, the instance detection frame score is the intersection-over-union of the instance's detection box and the ground-truth box, the instance output category score is the classification value of the instance, and the instance contour mask score is the intersection-over-union of the instance's detection mask and the ground-truth mask. Determining the final score of each sample includes: computing the score of each instance in the current sample from the mean and standard deviation of the three scores, and then computing the final score of the current sample from the mean and standard deviation of the scores of its instances. A plurality of first to-be-labeled samples are selected from the unlabeled set according to the negative or positive correlation between the final score and the amount of information; the first to-be-labeled samples can be selected from the unlabeled set during instance segmentation model training. Step S3: based on a semi-supervised learning method, select from all remaining samples second to-be-labeled samples whose confidence is higher than the set value, and obtain a second label set by pseudo-labeling them.
Selecting the second to-be-labeled samples includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than the first threshold, its instance output category score is greater than the second threshold and its instance contour mask score is greater than the third threshold, judging that the confidence of the current sample is higher than the set value and selecting it as a second to-be-labeled sample. The second to-be-labeled samples can be selected from all remaining samples during instance segmentation model training. Step S4: the first label set, the second label set and the labeled set are used together as the training set of the current instance segmentation model.
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain)，本质上是一个去中心化的数据库，是一串使用密码学方法相关联产生的数据块，每一个数据块中包含了一批次网络交易的信息，用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
在流程图中表示或在此以其他方式描述的逻辑和/或步骤，例如，可以被认为是用于实现逻辑功能的可执行指令的定序列表，可以具体实现在任何计算机可读存储介质中，以供指令执行系统、装置或设备(如基于计算机的系统、包括处理器的系统或其他可以从指令执行系统、装置或设备取指令并执行指令的系统)使用，或结合这些指令执行系统、装置或设备而使用。就本说明书而言，"计算机可读存储介质"可以是任何可以包含、存储、通信、传播或传输程序以供指令执行系统、装置或设备或结合这些指令执行系统、装置或设备而使用的装置。所述计算机可读存储介质可以是非易失性，也可以是易失性的。计算机可读存储介质的更具体的示例(非穷尽性列表)包括以下：具有一个或多个布线的电连接部(电子装置)，便携式计算机盘盒(磁装置)，随机存取存储器(RAM，Random Access Memory)，只读存储器(ROM，Read-Only Memory)，可擦除可编辑只读存储器(EPROM，Erasable Programmable Read-Only Memory，或闪速存储器)，光纤装置，以及便携式光盘只读存储器(CDROM，Compact Disc Read-Only Memory)。另外，计算机可读存储介质甚至可以是可在其上打印所述程序的纸或其他合适的介质，因为可以例如通过对纸或其他介质进行光学扫描，接着进行编辑、解译或必要时以其他合适方式进行处理来以电子方式获得所述程序，然后将其存储在计算机存储器中。The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing the logical functions, and may be embodied in any computer-readable storage medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable storage medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. The computer-readable storage medium may be non-volatile or volatile. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM, or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). Furthermore, the computer-readable storage medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
应当理解,本申请的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中,多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。例如,如果用硬件来实现,和在另一实施方式中一样,可用本领域公知的下列技术中的任一项或他们的组合来实现:具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路,具有合适的组合逻辑门电路的专用集成电路,可编程门阵列(PGA,Programmable Gate Array),现场可编程门阵列(FPGA,Field Programmable Gate Array)等。It should be understood that various parts of this application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented by any one or a combination of the following techniques known in the art: Discrete logic circuits, application-specific integrated circuits with suitable combinational logic gate circuits, Programmable Gate Arrays (PGA, Programmable Gate Array), Field Programmable Gate Arrays (FPGA, Field Programmable Gate Array), etc.
在本说明书的描述中，参考术语“本实施例”、“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本申请的至少一个实施例或示例中。在本说明书中，对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且，描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外，在不相互矛盾的情况下，本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of this specification, reference to the terms "this embodiment", "one embodiment", "some embodiments", "example", "specific example", "some examples" and the like means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, without contradicting one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of the different embodiments or examples.
此外，术语“第一”、“第二”仅用于描述目的，而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此，限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。在本申请的描述中，“多个”的含义是至少两个，例如两个，三个等，除非另有明确具体的限定。In addition, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined with "first" or "second" may expressly or implicitly include at least one such feature. In the description of this application, "plurality" means at least two, for example two or three, unless expressly and specifically defined otherwise.
以上所述仅为本申请的较佳实施例而已，并不用以限制本申请，凡在本申请实质内容上所作的任何修改、等同替换和简单改进等，均应包含在本申请的保护范围之内。The above descriptions are merely preferred embodiments of the present application and are not intended to limit it; any modification, equivalent replacement or simple improvement made within the substance of the present application shall fall within the protection scope of the present application.

Claims (20)

  1. 一种实例分割模型样本筛选方法,其中,包括:An instance segmentation model sample screening method, comprising:
    读取原始数据集,所述原始数据集包括未标注集和已标注集;reading an original data set, the original data set includes an unlabeled set and a labeled set;
    基于主动学习方式从所述未标注集中挑选出信息量大于剩余样本的多个第一待标注样本，通过人工标注所述多个第一待标注样本的方式得到第一标注集；所有第一待标注样本和所有剩余样本组成所述未标注集；selecting, based on an active learning method, a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples from the unlabeled set, and obtaining a first label set by manually labeling the plurality of first to-be-labeled samples, all first to-be-labeled samples and all remaining samples constituting the unlabeled set;
    基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本,通过伪标注所述第二待标注样本的方式得到第二标注集;Based on the semi-supervised learning method, a second to-be-labeled sample with a confidence level higher than the set value is selected from all the remaining samples, and a second label set is obtained by pseudo-labeling the second to-be-labeled sample;
    将所述第一标注集、所述第二标注集以及所述已标注集共同作为当前实例分割模型的训练集。The first label set, the second label set and the already labelled set are taken together as a training set of the current instance segmentation model.
  2. 根据权利要求1所述的实例分割模型样本筛选方法,其中,基于主动学习方式从所述未标注集中挑选出信息量大于剩余样本的多个第一待标注样本的步骤包括:The sample screening method for an instance segmentation model according to claim 1, wherein the step of selecting a plurality of first to-be-labeled samples with an amount of information greater than the remaining samples from the unlabeled set based on an active learning method comprises:
    计算所述未标注集中各样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分，以利用所述实例检测框得分、所述实例输出类别得分及所述实例轮廓掩码得分确定各样本的最终得分；calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, so as to determine the final score of each sample using the instance detection frame score, the instance output category score and the instance contour mask score;
    依据所述最终得分与所述信息量之间的负相关或正相关关系从所述未标注集中挑选出所述多个第一待标注样本。The plurality of first to-be-labeled samples are selected from the unlabeled set according to the negative or positive correlation between the final score and the amount of information.
  3. 根据权利要求2所述的实例分割模型样本筛选方法,其中,利用所述实例检测框得分、所述实例输出类别得分及所述实例轮廓掩码得分确定各样本的最终得分的过程包括:The sample screening method for an instance segmentation model according to claim 2, wherein the process of determining the final score of each sample by using the instance detection frame score, the instance output category score and the instance contour mask score comprises:
    利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分的均值和标准差计算当前样本中各实例的得分;Calculate the score of each instance in the current sample by using the mean and standard deviation of the instance detection frame score, the instance output category score and the instance outline mask score;
    利用当前样本中各实例的得分的均值和标准差计算当前样本的最终得分。The final score of the current sample is calculated using the mean and standard deviation of the scores of each instance in the current sample.
  4. 根据权利要求3所述的实例分割模型样本筛选方法,其中,所述利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分的均值和标准差计算当前样本中各实例的得分包括:The sample screening method for an instance segmentation model according to claim 3, wherein calculating the score of each instance in the current sample by using the mean value and standard deviation of the instance detection frame score, the instance output category score and the instance outline mask score comprises:
    Figure PCTCN2021096675-appb-100001
    其中，
    Figure PCTCN2021096675-appb-100002
    代表第i个样本中的第j个实例的得分，
    Figure PCTCN2021096675-appb-100003
    分别代表第i个样本中的第j个实例的实例输出类别得分、实例检测框得分、实例轮廓掩码得分，std表示标准差计算符号，mean表示均值计算符号。where
    Figure PCTCN2021096675-appb-100002
    denotes the score of the j-th instance in the i-th sample,
    Figure PCTCN2021096675-appb-100003
    denote, respectively, the instance output category score, the instance detection frame score and the instance contour mask score of the j-th instance in the i-th sample; std denotes the standard-deviation operator and mean denotes the mean operator.
  5. 根据权利要求4所述的实例分割模型样本筛选方法,其中,所述利用当前样本中各实例的得分的均值和标准差计算当前样本的最终得分包括:The sample screening method for an instance segmentation model according to claim 4, wherein calculating the final score of the current sample by using the mean and standard deviation of the scores of each instance in the current sample comprises:
    Figure PCTCN2021096675-appb-100004
    其中，S_i表示当前样本的最终得分。where S_i denotes the final score of the current sample.
  6. 根据权利要求2所述的实例分割模型样本筛选方法,其中,基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本的步骤包括:The sample screening method for an instance segmentation model according to claim 2, wherein the step of selecting a second to-be-labeled sample with a confidence higher than a set value from all remaining samples based on a semi-supervised learning method comprises:
    获取所述所有剩余样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分;Obtain the instance detection frame score, instance output category score and instance outline mask score of all remaining samples;
    当前样本的实例检测框得分大于第一阈值且实例输出类别得分大于第二阈值且实例轮廓掩码得分大于第三阈值时，判断出当前样本的置信度高于设定值，挑选出当前样本作为第二待标注样本。when the instance detection frame score of the current sample is greater than the first threshold, the instance output category score is greater than the second threshold and the instance contour mask score is greater than the third threshold, determining that the confidence of the current sample is higher than the set value and selecting the current sample as a second to-be-labeled sample.
  7. 根据权利要求6所述的实例分割模型样本筛选方法,其中,The instance segmentation model sample screening method according to claim 6, wherein,
    所述第一阈值、所述第二阈值、所述第三阈值三者相等。The first threshold, the second threshold, and the third threshold are equal.
  8. 根据权利要求6所述的实例分割模型样本筛选方法,其中,The instance segmentation model sample screening method according to claim 6, wherein,
    所述第一阈值=第二阈值=第三阈值=0.9。The first threshold=second threshold=third threshold=0.9.
  9. The instance segmentation model sample screening method according to claim 2, wherein
    the instance detection-box score is the intersection-over-union between the instance's detection box and the ground-truth box;
    the instance output-category score is the classification value of the instance;
    the instance contour-mask score is the intersection-over-union between the instance's detected mask and the ground-truth mask.
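Claim 9 defines two of the three scores as intersection-over-union (IoU) ratios. A minimal, framework-free sketch — the function names and the set-of-pixels mask representation are ours, not the patent's:

```python
def box_iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2); returns intersection-over-union
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def mask_iou(m1, m2):
    # m1, m2: masks as sets of (row, col) foreground pixel coordinates
    union = len(m1 | m2)
    return len(m1 & m2) / union if union else 0.0
```

For example, two unit-area boxes overlapping in a quarter of their area share an intersection of 1 against a union of 7, giving IoU = 1/7.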
  10. The instance segmentation model sample screening method according to claim 1, wherein
    the first to-be-labeled samples are selected from the unlabeled set during training of the instance segmentation model.
  11. The instance segmentation model sample screening method according to claim 10, wherein a loss function is calculated during training of the instance segmentation model; the loss function comprises a detection-box loss, an output-category loss, a contour-mask loss, a detection-box score loss, and a contour-mask score loss.
  12. The instance segmentation model sample screening method according to claim 1, wherein
    the second to-be-labeled samples are selected from all the remaining samples during training of the instance segmentation model.
  13. The instance segmentation model sample screening method according to claim 2, wherein calculating the instance detection-box score, the instance output-category score, and the instance contour-mask score of each sample in the unlabeled set comprises:
    scanning the image with the instance segmentation model to generate proposal information;
    generating bounding-box information and mask information by classifying the proposal information;
    determining the instance detection-box score, the instance output-category score, and the instance contour-mask score from the bounding-box information and the mask information.
  14. The instance segmentation model sample screening method according to claim 13, wherein classifying the proposal information comprises:
    performing binary classification and detection-box regression on the proposal information.
  15. The instance segmentation model sample screening method according to claim 13, wherein after calculating the instance detection-box score, the instance output-category score, and the instance contour-mask score of each sample in the unlabeled set, the method comprises:
    outputting the instance detection-box score and the instance output-category score from the detection head of the instance segmentation model, and outputting the instance contour-mask score from the segmentation head of the instance segmentation model.
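The flow of claims 13–15 — scan the image for proposals, classify them into boxes and masks, then read the three scores off the detection head and the segmentation head — can be outlined with a stand-in model object. None of the method names below (`scan`, `classify`) come from an actual library; they only mirror the claim language:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InstancePrediction:
    box_score: float   # from the detection head (claim 15)
    cls_score: float   # from the detection head
    mask_score: float  # from the segmentation head (claim 15)

def score_unlabeled_sample(model, image) -> List[InstancePrediction]:
    # `model` is a hypothetical Mask R-CNN-style wrapper.
    proposals = model.scan(image)             # claim 13: generate proposal information
    boxes, masks = model.classify(proposals)  # claim 14: binary classification + box regression
    # Claim 15: the detection head yields the box and category scores,
    # the segmentation head yields the contour-mask score.
    return [InstancePrediction(b.score, b.cls, m.score)
            for b, m in zip(boxes, masks)]
```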
  16. The instance segmentation model sample screening method according to claim 1, wherein
    the unlabeled set is an unlabeled medical image dataset;
    the labeled set is a labeled medical image dataset.
  17. The instance segmentation model sample screening method according to claim 1, wherein
    the number of first to-be-labeled samples is k = 500.
  18. An instance segmentation model sample screening apparatus, comprising:
    a data reading module, configured to read an original dataset, the original dataset comprising an unlabeled set and a labeled set;
    a first screening module, configured to select, based on active learning, a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples from the unlabeled set, the plurality of first to-be-labeled samples being manually labeled as a first labeled set, all the first to-be-labeled samples and all the remaining samples forming the unlabeled set;
    a second screening module, configured to select, based on semi-supervised learning, second to-be-labeled samples whose confidence is higher than a set value from all the remaining samples, the second to-be-labeled samples being pseudo-labeled as a second labeled set;
    a data expansion module, configured to use the first labeled set, the second labeled set, and the labeled set together as the training set of the current instance segmentation model.
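Taken together, the four modules of claim 18 amount to the following data flow. Every callable parameter is a hypothetical stand-in for the corresponding module (active-learning ranking, manual annotation, confidence test, pseudo-labeling), not an interface defined by the patent:

```python
def build_training_set(unlabeled, labeled, k,
                       pick_informative, annotate, is_confident, pseudo_label):
    # First screening module: active learning picks the k most
    # informative samples, which are then manually labeled.
    first = pick_informative(unlabeled, k)
    remaining = [s for s in unlabeled if s not in first]
    first_labeled = [annotate(s) for s in first]
    # Second screening module: semi-supervised selection pseudo-labels
    # the high-confidence remaining samples.
    second_labeled = [pseudo_label(s) for s in remaining if is_confident(s)]
    # Data expansion module: the two new sets plus the existing labeled
    # set form the training set of the current instance segmentation model.
    return first_labeled + second_labeled + list(labeled)
```

The design point of the claim is that only the k actively selected samples cost human annotation; the rest of the expansion comes for free from pseudo-labels that pass the confidence test.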
  19. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the sample screening method according to any one of claims 1 to 17.
  20. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the sample screening method according to any one of claims 1 to 17.
PCT/CN2021/096675 2020-10-14 2021-05-28 Instance segmentation model sample screening method and apparatus, computer device and medium WO2022077917A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011099366.0A CN112163634B (en) 2020-10-14 2020-10-14 Sample screening method and device for instance segmentation model, computer equipment and medium
CN202011099366.0 2020-10-14

Publications (1)

Publication Number Publication Date
WO2022077917A1 true WO2022077917A1 (en) 2022-04-21

Family

ID=73866927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096675 WO2022077917A1 (en) 2020-10-14 2021-05-28 Instance segmentation model sample screening method and apparatus, computer device and medium

Country Status (2)

Country Link
CN (1) CN112163634B (en)
WO (1) WO2022077917A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482436A (en) * 2022-09-21 2022-12-16 北京百度网讯科技有限公司 Training method and device for image screening model and image screening method
CN116229369A (en) * 2023-03-03 2023-06-06 嘉洋智慧安全科技(北京)股份有限公司 Method, device and equipment for detecting people flow and computer readable storage medium
CN117218132A (en) * 2023-11-09 2023-12-12 铸新科技(苏州)有限责任公司 Whole furnace tube service life analysis method, device, computer equipment and medium
CN117315263A (en) * 2023-11-28 2023-12-29 杭州申昊科技股份有限公司 Target contour segmentation device, training method, segmentation method and electronic equipment

Families Citing this family (17)

Publication number Priority date Publication date Assignee Title
CN112163634B (en) * 2020-10-14 2023-09-05 平安科技(深圳)有限公司 Sample screening method and device for instance segmentation model, computer equipment and medium
CN112381834B (en) * 2021-01-08 2022-06-03 之江实验室 Labeling method for image interactive instance segmentation
CN112884055B (en) * 2021-03-03 2023-02-03 歌尔股份有限公司 Target labeling method and target labeling device
CN113487738B (en) * 2021-06-24 2022-07-05 哈尔滨工程大学 Building based on virtual knowledge migration and shielding area monomer extraction method thereof
CN113255669B (en) * 2021-06-28 2021-10-01 山东大学 Method and system for detecting text of natural scene with any shape
CN113361535B (en) * 2021-06-30 2023-08-01 北京百度网讯科技有限公司 Image segmentation model training, image segmentation method and related device
CN113554068B (en) * 2021-07-05 2023-10-31 华侨大学 Semi-automatic labeling method, device and readable medium for instance segmentation data set
CN113487617A (en) * 2021-07-26 2021-10-08 推想医疗科技股份有限公司 Data processing method, data processing device, electronic equipment and storage medium
CN113705687B (en) * 2021-08-30 2023-03-24 平安科技(深圳)有限公司 Image instance labeling method based on artificial intelligence and related equipment
CN113762286A (en) * 2021-09-16 2021-12-07 平安国际智慧城市科技股份有限公司 Data model training method, device, equipment and medium
CN114612702A (en) * 2022-01-24 2022-06-10 珠高智能科技(深圳)有限公司 Image data annotation system and method based on deep learning
CN114462531A (en) * 2022-01-30 2022-05-10 支付宝(杭州)信息技术有限公司 Model training method and device and electronic equipment
CN114359676B (en) * 2022-03-08 2022-07-19 人民中科(济南)智能技术有限公司 Method, device and storage medium for training target detection model and constructing sample set
CN115439686B (en) * 2022-08-30 2024-01-09 一选(浙江)医疗科技有限公司 Method and system for detecting object of interest based on scanned image
CN115170809B (en) * 2022-09-06 2023-01-03 浙江大华技术股份有限公司 Image segmentation model training method, image segmentation device, image segmentation equipment and medium
CN115393361B (en) * 2022-10-28 2023-02-03 湖南大学 Skin disease image segmentation method, device, equipment and medium with low annotation cost
CN117115568B (en) * 2023-10-24 2024-01-16 浙江啄云智能科技有限公司 Data screening method, device, equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101853400A (en) * 2010-05-20 2010-10-06 武汉大学 Multiclass image classification method based on active learning and semi-supervised learning
CN103150578A (en) * 2013-04-09 2013-06-12 山东师范大学 Training method of SVM (Support Vector Machine) classifier based on semi-supervised learning
CN108985334A (en) * 2018-06-15 2018-12-11 广州深域信息科技有限公司 The generic object detection system and method for Active Learning are improved based on self-supervisory process
CN111401293A (en) * 2020-03-25 2020-07-10 东华大学 Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN112163634A (en) * 2020-10-14 2021-01-01 平安科技(深圳)有限公司 Example segmentation model sample screening method and device, computer equipment and medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20130097103A1 (en) * 2011-10-14 2013-04-18 International Business Machines Corporation Techniques for Generating Balanced and Class-Independent Training Data From Unlabeled Data Set
CN109447169B (en) * 2018-11-02 2020-10-27 北京旷视科技有限公司 Image processing method, training method and device of model thereof and electronic system
CN109949317B (en) * 2019-03-06 2020-12-11 东南大学 Semi-supervised image example segmentation method based on gradual confrontation learning
US10430946B1 (en) * 2019-03-14 2019-10-01 Inception Institute of Artificial Intelligence, Ltd. Medical image segmentation and severity grading using neural network architectures with semi-supervised learning techniques
CN111666993A (en) * 2020-05-28 2020-09-15 平安科技(深圳)有限公司 Medical image sample screening method and device, computer equipment and storage medium


Non-Patent Citations (1)

Title
RONG CHEN, YONG-FENG CAO, HONG SUN: "Multi-class Image Classification with Active Learning and Semi-supervised Learning", ACTA AUTOMATICA SINICA, Kexue Chubanshe, Beijing, CN, vol. 37, no. 8, 31 August 2011 (2011-08-31), CN, XP055921885, ISSN: 0254-4156, DOI: 10.3724/SP.J.1004.2011.00954 *

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN115482436A (en) * 2022-09-21 2022-12-16 北京百度网讯科技有限公司 Training method and device for image screening model and image screening method
CN115482436B (en) * 2022-09-21 2023-06-30 北京百度网讯科技有限公司 Training method and device for image screening model and image screening method
CN116229369A (en) * 2023-03-03 2023-06-06 嘉洋智慧安全科技(北京)股份有限公司 Method, device and equipment for detecting people flow and computer readable storage medium
CN117218132A (en) * 2023-11-09 2023-12-12 铸新科技(苏州)有限责任公司 Whole furnace tube service life analysis method, device, computer equipment and medium
CN117218132B (en) * 2023-11-09 2024-01-19 铸新科技(苏州)有限责任公司 Whole furnace tube service life analysis method, device, computer equipment and medium
CN117315263A (en) * 2023-11-28 2023-12-29 杭州申昊科技股份有限公司 Target contour segmentation device, training method, segmentation method and electronic equipment
CN117315263B (en) * 2023-11-28 2024-03-22 杭州申昊科技股份有限公司 Target contour device, training method, segmentation method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112163634B (en) 2023-09-05
CN112163634A (en) 2021-01-01

Similar Documents

Publication Publication Date Title
WO2022077917A1 (en) Instance segmentation model sample screening method and apparatus, computer device and medium
JP6843086B2 (en) Image processing systems, methods for performing multi-label semantic edge detection in images, and non-temporary computer-readable storage media
Kisilev et al. Medical image description using multi-task-loss CNN
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
WO2018108129A1 (en) Method and apparatus for use in identifying object type, and electronic device
WO2022001623A1 (en) Image processing method and apparatus based on artificial intelligence, and device and storage medium
CN111931931B (en) Deep neural network training method and device for pathology full-field image
US10789712B2 (en) Method and system for image analysis to detect cancer
WO2021189913A1 (en) Method and apparatus for target object segmentation in image, and electronic device and storage medium
US11875512B2 (en) Attributionally robust training for weakly supervised localization and segmentation
Huang et al. Omni-supervised learning: scaling up to large unlabelled medical datasets
CN110472049B (en) Disease screening text classification method, computer device and readable storage medium
WO2022089257A1 (en) Medical image processing method, apparatus, device, storage medium, and product
CN111882560A (en) Lung parenchymal CT image segmentation method based on weighted full-convolution neural network
WO2021057148A1 (en) Brain tissue layering method and device based on neural network, and computer device
CN108154191B (en) Document image recognition method and system
CN113987119A (en) Data retrieval method, cross-modal data matching model processing method and device
Zhang et al. An efficient semi-supervised manifold embedding for crowd counting
CN110889437B (en) Image processing method and device, electronic equipment and storage medium
Fu et al. Medical image retrieval and classification based on morphological shape feature
Ullah et al. DSFMA: Deeply supervised fully convolutional neural networks based on multi-level aggregation for saliency detection
CN116525075A (en) Thyroid nodule computer-aided diagnosis method and system based on few sample learning
WO2023029348A1 (en) Image instance labeling method based on artificial intelligence, and related device
Wu et al. Automatic mass detection from mammograms with region-based convolutional neural network
CN114596435A (en) Semantic segmentation label generation method, device, equipment and storage medium

Legal Events

Code  Description
121   Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21878971; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122   Ep: pct application non-entry in european phase (Ref document number: 21878971; Country of ref document: EP; Kind code of ref document: A1)