WO2022077917A1 - Instance segmentation model sample screening method and apparatus, computer device and medium - Google Patents


Info

Publication number
WO2022077917A1
Authority
WO
WIPO (PCT)
Prior art keywords
instance
score
sample
labeled
segmentation model
Prior art date
Application number
PCT/CN2021/096675
Other languages
French (fr)
Chinese (zh)
Inventor
王俊 (Wang Jun)
高鹏 (Gao Peng)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022077917A1 publication Critical patent/WO2022077917A1/en

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
                • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
              • G06F18/24 Classification techniques
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T7/00 Image analysis
            • G06T7/10 Segmentation; Edge detection
              • G06T7/11 Region-based segmentation
          • G06T2207/00 Indexing scheme for image analysis or image enhancement
            • G06T2207/10 Image acquisition modality
              • G06T2207/10072 Tomographic images
                • G06T2207/10081 Computed x-ray tomography [CT]
            • G06T2207/20 Special algorithmic details
              • G06T2207/20081 Training; Learning
              • G06T2207/20084 Artificial neural networks [ANN]
            • G06T2207/30 Subject of image; Context of image processing
              • G06T2207/30004 Biomedical image processing
                • G06T2207/30016 Brain
                • G06T2207/30041 Eye; Retina; Ophthalmic
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
          • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
            • Y02P90/30 Computing systems specially adapted for manufacturing

Definitions

  • the present application relates to the technical field of artificial intelligence, and can be applied to the field of image instance segmentation.
  • the present application specifically provides an instance segmentation model sample screening method, apparatus, computer equipment and medium.
  • Training datasets are datasets with rich labeling information. Collecting and labeling such datasets usually requires huge labor costs.
  • the present application can provide an instance segmentation model sample screening method, device, computer equipment and medium, which can reduce the amount of manual annotation of samples while still obtaining a large number of samples.
  • the present application discloses a sample screening method for an instance segmentation model, which includes, but is not limited to, the following steps.
  • a plurality of first to-be-labeled samples with more information than the remaining samples are selected from the unlabeled set, and the first label set is obtained by manually labeling the plurality of first to-be-labeled samples. The first to-be-labeled samples and all remaining samples together form the unlabeled set.
  • a second to-be-labeled sample with a confidence higher than a set value is selected from all the remaining samples, and a second label set is obtained by pseudo-labeling the second to-be-labeled sample.
  • the first label set, the second label set and the already labelled set are taken together as the training set of the current instance segmentation model.
  • an instance segmentation model sample screening device which includes but is not limited to a data reading module, a first screening module, a second screening module and a data expansion module.
  • the data reading module reads the original data set, the original data set includes the unlabeled set and the labeled set.
  • the first screening module is configured to select from the unlabeled set, based on an active learning method, a plurality of first to-be-labeled samples with an amount of information greater than the remaining samples; the plurality of first to-be-labeled samples are manually labeled as the first label set. The first to-be-labeled samples and all remaining samples form the unlabeled set.
  • the second screening module is used to select a second to-be-labeled sample with a confidence higher than a set value from all remaining samples based on a semi-supervised learning method, and the second to-be-labeled sample is pseudo-labeled as a second label set.
  • the data expansion module uses the first label set, the second label set and the already labelled set as the training set of the current instance segmentation model.
  • the present application also provides a computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor is made to execute the steps of the sample screening method in any embodiment.
  • the present application also provides a storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the sample screening method in any embodiment.
  • the beneficial effects of the present application are: based on the semi-supervised active learning strategy, the present application can select the samples carrying the largest amount of information about the current model for the labelers to annotate, and effectively expand the training set through semi-supervised pseudo-labeling. The present application can therefore obtain a large number of samples for image instance segmentation model training while reducing the amount of manual annotation, so as to achieve a more ideal instance segmentation accuracy.
  • the present application can obtain a large number of model training samples faster while greatly reducing manual labeling, so that the instance segmentation model of the present application trains faster; the present application therefore has good practical significance and application promotion value.
  • FIG. 1 shows a schematic flowchart of a sample screening method for an instance segmentation model in some embodiments of the present application.
  • FIG. 2 shows a schematic diagram of the working principle of the sample screening apparatus for instance segmentation model in some embodiments of the present application.
  • FIG. 3 shows a schematic diagram of the working principle of the instance segmentation model in some embodiments of the present application.
  • FIG. 4 shows the scores of instance objects in some embodiments of the present application in three dimensions: category, detection frame, and segmentation contour.
  • FIG. 5 shows the scores of instance objects in other embodiments of the present application in three dimensions: category, detection frame, and segmentation contour.
  • FIG. 6 is a schematic diagram showing the comparison of instance segmentation effects that can be achieved by using the present application and the existing method on different numbers of labeled images (taking cerebral hemorrhage area segmentation and fundus edema area segmentation as examples).
  • Figure 7 is a schematic diagram showing the comparison of the model accuracy (applied to the segmentation of intracerebral hemorrhage regions) achieved using the present application and the existing method on different numbers of annotated images.
  • FIG. 8 is a schematic diagram showing the comparison of the model accuracy (applied to fundus edema region segmentation) achieved by using the present application and the existing method on different numbers of labeled images.
  • FIG. 9 shows a block diagram of the internal structure of a computer device in some embodiments of the present application.
  • this application can effectively combine active learning and semi-supervised learning. Active learning is used for its ability to obtain the best possible generalization model from as few labeled samples as possible, while semi-supervised learning is used for its ability to mine the relationship between labeled and unlabeled samples and thereby generalize better. By combining the advantages of these two schemes, the present application can provide a semi-supervised active learning strategy that achieves rapid acquisition and screening of a large number of instance segmentation model samples.
  • some embodiments of the present application may provide an instance segmentation model sample screening method, which is suitable for medical image analysis with complex layouts, for example, is preferably suitable for images in which different areas are occluded from each other.
  • the method may include, but is not limited to, the following steps.
  • Step S1: read the original data set
  • the original data set in some embodiments of the present application may include, but is not limited to, an unlabeled set, a labeled set and a test set. It should be understood that the original dataset contains relatively few labeled samples and many unlabeled samples.
  • a dataset here refers to a medical image dataset: the unlabeled set is an unlabeled medical image dataset, the labeled set is a labeled medical image dataset, and the test set is a medical image dataset that can be used for model evaluation.
  • Step S2: based on the active learning method, the present application first selects from the unlabeled set a plurality of first to-be-labeled samples with more information than the remaining samples, and obtains the first label set by manually labeling the plurality of first to-be-labeled samples.
  • a label set is a partial training set formed by manual labeling. The first to-be-labeled samples and all remaining samples form the unlabeled set of the original data set; that is, the medical image samples to be labeled and the remaining unlabeled medical image samples together constitute all the unlabeled medical image samples.
  • although manual annotation can provide a new training set for the current instance segmentation model, the number of labeled samples that can practically be completed through manual annotation is limited.
  • the dataset includes labeled data {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i)} and unlabeled data {x_{i+1}, ..., x_n}; the first i samples in the dataset form the labeled set, and the remaining n-i samples form the unlabeled set of the original dataset.
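  • As a non-authoritative illustration of this notation (the variable names and toy values below are ours, not the patent's), the labeled/unlabeled partition can be sketched as:

```python
# Illustrative sketch of the data split described above; names and values
# are our own and only mirror the {(x_k, y_k)} / {x_k} notation.
labeled = [("x1", "y1"), ("x2", "y2"), ("x3", "y3")]  # {(x_k, y_k)} for k <= i
unlabeled = ["x4", "x5", "x6", "x7"]                  # {x_k} for i < k <= n

i = len(labeled)             # number of labeled samples
n = i + len(unlabeled)       # total number of samples
assert n - i == len(unlabeled)  # the remaining n-i samples are unlabeled
```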
  • samples with the largest amount of information may be selected from the unlabeled set, and these samples are given to the labeling personnel for annotation.
  • the instance segmentation model formed based on the present application can work as follows: scan images (that is, images in the original data set, including unlabeled images and labeled images) through the instance segmentation model.
  • in FIG. 3, solid lines can represent annotated data streams and dashed lines can represent unannotated data streams; proposals can be generated after scanning the image, bounding box information and mask information can be generated by classifying the proposals, and the subsequent network can then determine, from the bounding box information and mask information, the instance detection box score (bbox_score), instance output category score (class_score) and instance contour mask score (mask_score); the samples with the largest amount of information are then selected according to these three scores.
  • bbox_score: instance detection box score
  • class_score: instance output category score
  • mask_score: instance contour mask score
  • the instance segmentation model of this embodiment can be extended on the basis of the Faster R-CNN model, wherein the FPN network (a feature extraction network) scans the image based on its pyramid structure to obtain proposal information (the scanning process can be a feature map mapping), and the RPN network (a region proposal network) generates bounding box information and mask information by processing the proposal information.
  • the processing methods can include binary classification (foreground vs. background) and bounding box (BB) regression; from the bounding box information and mask information, the coordinates of the detection frame, whether the detection frame contains a target, and the class label of the detection frame can be determined; the bounding box information and mask information can then be aligned.
  • the subsequent network in this embodiment may include a detection head (RCNN Head) and a segmentation head (Mask Head) of the instance segmentation model; the detection head outputs the instance detection frame score and the instance output category score, the segmentation head outputs the instance contour mask score, and the output dimension can be 1.
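  • The three per-instance outputs described above can be sketched as a simple record type; the class and field names below are our own, chosen only to mirror bbox_score, class_score and mask_score, and do not come from the patent:

```python
from dataclasses import dataclass

# Hedged sketch of the three-branch per-instance output; names are ours.
@dataclass
class InstancePrediction:
    bbox_score: float   # instance detection box score (box quality)
    class_score: float  # instance output category score (classification confidence)
    mask_score: float   # instance contour mask score (mask quality)

    def scores(self) -> tuple:
        """Return the three branch scores as one tuple."""
        return (self.bbox_score, self.class_score, self.mask_score)

p = InstancePrediction(bbox_score=0.82, class_score=0.95, mask_score=0.77)
```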
  • the step of selecting a plurality of first to-be-labeled samples with an amount of information greater than the remaining samples from the unlabeled set based on the active learning method in this embodiment specifically includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, and then using these three scores jointly to determine the final score of each sample.
  • the instance detection box score is the intersection over union (IoU) of the instance's predicted bounding box and the ground-truth bounding box
  • the instance output category score is the instance's classification value
  • the instance contour mask score is the IoU between the instance's predicted mask and the ground-truth mask.
  • the process of determining the final score of each sample using the instance detection frame score, the instance output category score and the instance contour mask score includes: using the mean and standard deviation of these three scores to calculate the score of each instance in the current sample, and then using the mean and standard deviation of the per-instance scores to calculate the final score of the current sample. According to the negative or positive correlation between the final score and the amount of information, a plurality of first samples to be labeled are selected from the unlabeled set.
  • This embodiment can select the first samples to be labeled from the unlabeled set during the training process of the instance segmentation model, with the active learning algorithm deciding which data to label manually. This application can therefore screen all unlabeled samples based on the above three-branch information metrics (instance detection frame score, instance output category score, instance contour mask score); when k samples are to be labeled, the top k or fewer samples are selected for manual interpretation and labeling; that is, some embodiments of the present application may perform manual interpretation and labeling on the k selected unlabeled medical image samples.
  • Some embodiments of the present application may, for example, select the plurality of first samples to be labeled from the unlabeled set according to the negative correlation between the final score and the amount of information. As shown in FIG. 4 and FIG. 5, each instance has scores in three dimensions: category, detection frame and segmentation contour; the lower the combined value of these three scores, the more the corresponding sample needs to be labeled. The top k or fewer samples are selected for the labeling personnel to label.
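  • The selection described in the preceding bullets can be sketched as follows. The patent states only that the mean and standard deviation of the scores are used; combining them as mean minus standard deviation, and treating a lower final score as more informative, are our assumptions, and all names are ours:

```python
from statistics import mean, pstdev

def instance_score(bbox_s, class_s, mask_s):
    """Combine the three branch scores of one instance.

    Mean minus standard deviation is one plausible reading of the
    mean-and-standard-deviation combination described in the text."""
    s = [bbox_s, class_s, mask_s]
    return mean(s) - pstdev(s)

def sample_score(instances):
    """Final score of a sample from its per-instance scores (same assumption)."""
    s = [instance_score(*inst) for inst in instances]
    return mean(s) - pstdev(s)

def select_to_label(unlabeled, k):
    """Pick the k samples with the lowest final score (score is negatively
    correlated with information content, per the text)."""
    ranked = sorted(unlabeled, key=lambda item: sample_score(item[1]))
    return [name for name, _ in ranked[:k]]

# Each sample maps to a list of (bbox_score, class_score, mask_score) triples.
pool = [
    ("img_a", [(0.9, 0.95, 0.9)]),                  # confident -> high score
    ("img_b", [(0.3, 0.5, 0.2), (0.4, 0.6, 0.3)]),  # uncertain -> low score
]
```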
  • relevant labeling personnel such as experts in the medical field
  • the labeled samples can be placed in the training data set directory.
  • some embodiments of the present application may further include the step of calculating a loss function.
  • the loss function of some embodiments of the present application may include five parts, namely the output category loss L_class, the detection frame loss L_bbox, the contour mask loss L_mask, the detection frame score loss L_bboxIOU, and the contour mask score loss L_MaskIOU; these five loss functions can be used together for iterative training and learning of the instance segmentation model.
  • the loss function L semi of the semi-supervised part of the instance segmentation model is calculated as follows:
  • L_semi = L_class + L_bbox + L_mask + L_bboxIOU + L_MaskIOU
  • the overall loss function L of the instance segmentation model is calculated as follows: L = L_sup + λ·L_semi
  • L_sup represents the active learning (supervised) part of the loss function
  • λ represents the loss balance coefficient
  • the loss balance coefficient is used to suppress the potential noise caused by pseudo annotations, and its default value is 0.01.
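  • The loss combination above can be sketched as plain functions; the additive overall form and the balance coefficient default of 0.01 follow our reading of the text, and the function names are ours:

```python
# Hedged sketch of the loss combination described in the text.
def semi_loss(l_class, l_bbox, l_mask, l_bbox_iou, l_mask_iou):
    """L_semi: sum of the five loss terms on the semi-supervised part."""
    return l_class + l_bbox + l_mask + l_bbox_iou + l_mask_iou

def total_loss(l_sup, l_semi, balance=0.01):
    """Overall loss L = L_sup + balance * L_semi, where the balance
    coefficient (default 0.01) suppresses pseudo-annotation noise."""
    return l_sup + balance * l_semi
```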
  • Step S3: a second to-be-labeled sample with a confidence level higher than a set value is selected from all remaining samples based on a semi-supervised learning method, and a second label set is obtained by pseudo-labeling the second to-be-labeled sample.
  • for high-confidence samples, annotation results are automatically generated through a semi-supervised pseudo-annotation strategy.
  • the step of selecting the second to-be-labeled sample with a confidence higher than the set value from all the remaining samples based on the semi-supervised learning method includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than a first threshold, the instance output category score is greater than a second threshold, and the instance contour mask score is greater than a third threshold, the confidence of the current sample is judged to be higher than the set value, and the current sample is selected as the second sample to be labeled.
  • during the instance segmentation model training process, the present application can select, from all the remaining samples, second to-be-labeled samples whose three metric scores are all greater than 0.9, and pseudo-label them to obtain approximate reference annotation results, thereby further expanding the training set, which is conducive to better model performance.
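  • Step S3's confidence gate can be sketched as below; the 0.9 threshold figure comes from the text, while the function and variable names are our own:

```python
# Hedged sketch of the three-threshold confidence gate for pseudo-labeling.
T_BBOX, T_CLASS, T_MASK = 0.9, 0.9, 0.9  # first, second, third thresholds

def is_high_confidence(bbox_score, class_score, mask_score):
    """A sample qualifies for pseudo-labeling only when all three
    metric scores exceed their thresholds."""
    return bbox_score > T_BBOX and class_score > T_CLASS and mask_score > T_MASK

def select_for_pseudo_label(remaining):
    """Return the names of remaining samples whose confidence is above the set value."""
    return [name for name, scores in remaining if is_high_confidence(*scores)]

remaining = [("img_c", (0.95, 0.97, 0.93)), ("img_d", (0.95, 0.97, 0.85))]
```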
  • Step S4: the first label set, the second label set and the already labeled set are taken together as the training set of the current instance segmentation model.
  • the present application can fully utilize the potential of instance segmentation: the obtained first and second label sets can be added to the training set, and the model can be trained and updated, so that the information increment of the newly obtained samples greatly increases the number of annotated medical image samples and the update training improves the existing target instance segmentation model.
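  • Step S4's merging of the three sets can be sketched as follows; representing each set as a name-to-annotation mapping is our own simplification, not the patent's data format:

```python
# Hedged sketch of step S4: merge the manually labeled set (active learning),
# the pseudo-labeled set (semi-supervised) and the existing labeled set.
def expand_training_set(labeled_set, first_label_set, second_label_set):
    """Merge the three sets into one training set; on a name collision the
    later update wins, so each sample appears only once."""
    merged = dict(labeled_set)
    merged.update(first_label_set)   # manual annotations
    merged.update(second_label_set)  # pseudo annotations
    return merged

labeled_set = {"img_1": "y1"}
first_label_set = {"img_2": "y2_manual"}
second_label_set = {"img_3": "y3_pseudo"}
train_set = expand_training_set(labeled_set, first_label_set, second_label_set)
```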
  • by applying the present application to the field of intelligent aided recognition of medical images, area delineation or quantitative evaluation of different target positions and key organ instances can be performed simultaneously; especially for image areas that may occlude one another, the present application can segment key target instances more effectively. This application can thus overcome over-reliance on limited and scarce doctors and experts for labeling, and provide a large number of useful samples for the image instance segmentation model. In addition, it should be understood that the above steps of the present application may be performed repeatedly.
  • some embodiments of the present disclosure are evaluated by comparison on a medical image instance segmentation task.
  • training after adding 500 labeled samples step by step shows that, by labeling 1000-1500 intelligently selected samples, this application can reach the instance segmentation model accuracy that the existing method achieves only by training on 2000-3000 images, reducing the labeling cost by about 50%.
  • the present embodiment provides a graph of the segmentation results of the cerebral hemorrhage area and the fundus edema area in the actual model work. It can be seen that the results obtained in the experiments of the present application are basically consistent with the theoretical conclusions. After intelligently selecting a small number of samples, the instance segmentation effect that can only be achieved by conventional methods with more samples can be achieved. Experiments on the two tasks of CT intracerebral hemorrhage area segmentation and fundus edema area segmentation show that the application can achieve almost the same performance with only about 50% of the sample size of the conventional complete data set.
  • this application essentially provides an efficient human-in-the-loop method that combines sample labeling and model training, making full use of expert knowledge and the high-confidence predictions of artificial intelligence; it provides a new way for deep learning to reduce dataset requirements, and has high practical application significance and promotion value.
  • an instance segmentation model sample screening apparatus which includes but is not limited to a data reading module, a first screening module, a second screening module, and a data expansion module.
  • the data reading module reads the original data set, the original data set includes the unlabeled set and the labeled set.
  • the first screening module is used to select from the unlabeled set, based on an active learning method, a plurality of first to-be-labeled samples with more information than the remaining samples, and the plurality of first to-be-labeled samples are manually labeled as the first label set; the first to-be-labeled samples and all remaining samples form the unlabeled set.
  • the second screening module is used to select a second to-be-labeled sample with a confidence higher than a set value from all remaining samples based on a semi-supervised learning method, and the second to-be-labeled sample is pseudo-labeled as a second label set.
  • the data expansion module uses the first label set, the second label set and the already labelled set as the training set of the current instance segmentation model.
  • the above-mentioned data such as the original data set and the training set may also be stored in a node of a blockchain.
  • this application selects some high-value samples from a large number of unlabeled original medical images and labels them for labelers (such as doctors), and does not need to label all samples.
  • labelers such as doctors
  • This application can select the samples with the largest amount of information to speed up the training of the instance segmentation model, and the amount of manually labeled data is significantly reduced, which provides a new implementation method for deep learning to reduce dataset requirements, realizes efficient utilization of data and computing resources, and saves computing power.
  • the present application also provides a computer device, including a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor executes the steps of the sample screening method in any embodiment.
  • the computer equipment can be a PC, a portable electronic device such as a PAD, a tablet computer or a laptop computer, or an intelligent mobile terminal such as a mobile phone, and is not limited to these; the computer equipment can also be implemented by a server, and the server can be a cluster system in which, to realize the functions of each unit, the units are either merged into one computer device or each unit's functions are deployed separately.
  • Step S1: read the original data set; the original data set in this application may include an unlabeled set and a labeled set.
  • Step S2: based on the active learning method, select from the unlabeled set a plurality of first to-be-labeled samples with more information than the remaining samples, and obtain a first label set by manually labeling the plurality of first to-be-labeled samples; the first to-be-labeled samples and all remaining samples form the unlabeled set.
  • the step of selecting a plurality of first to-be-labeled samples with more information than the remaining samples from the unlabeled set based on the active learning method includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, and using these three scores to determine the final score of each sample. Specifically, in some embodiments of the present application, the instance detection frame score is the intersection over union (IoU) of the instance detection frame and the ground-truth frame, the instance output category score is the classification value of the instance, and the instance contour mask score is the IoU between the instance's predicted mask and the ground-truth mask.
  • the process of using the instance detection frame score, instance output category score and instance contour mask score to determine the final score of each sample includes: using the mean and standard deviation of these three scores to calculate the score of each instance in the current sample, and then using the mean and standard deviation of the per-instance scores to calculate the final score of the current sample. According to the negative or positive correlation between the final score and the amount of information, a plurality of first samples to be labeled are selected from the unlabeled set; the first to-be-labeled samples can be selected from the unlabeled set during instance segmentation model training.
  • Step S3: a second to-be-labeled sample with a confidence level higher than a set value is selected from all remaining samples based on a semi-supervised learning method, and a second label set is obtained by pseudo-labeling the second to-be-labeled sample.
  • the step of selecting the second to-be-labeled sample with a confidence higher than the set value from all the remaining samples based on the semi-supervised learning method includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than a first threshold, the instance output category score is greater than a second threshold, and the instance contour mask score is greater than a third threshold, the confidence of the current sample is judged to be higher than the set value, and the current sample is selected as the second sample to be labeled.
  • the present application can select the second to-be-labeled sample from all remaining samples during the instance segmentation model training process. Step S4: the first label set, the second label set and the already labeled set are taken together as the training set of the current instance segmentation model.
  • a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the sample screening method in any embodiment of the present application.
  • Step S1: read the original data set; the original data set in this application may include an unlabeled set and a labeled set.
  • Step S2: based on the active learning method, select from the unlabeled set a plurality of first to-be-labeled samples with more information than the remaining samples, and obtain a first label set by manually labeling the plurality of first to-be-labeled samples; the first to-be-labeled samples and all remaining samples form the unlabeled set.
  • the step of selecting a plurality of first to-be-labeled samples with more information than the remaining samples from the unlabeled set based on the active learning method includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, and using these three scores to determine the final score of each sample. Specifically, in some embodiments of the present application, the instance detection frame score is the intersection over union (IoU) of the instance detection frame and the ground-truth frame, the instance output category score is the classification value of the instance, and the instance contour mask score is the IoU between the instance's predicted mask and the ground-truth mask.
• The process of determining each sample's final score from the instance detection-box score, instance output-category score and instance contour-mask score includes: computing the score of each instance in the current sample from the mean and standard deviation of its three scores, and then computing the final score of the current sample from the mean and standard deviation of its instance scores. According to the negative (or positive) correlation between the final score and the amount of information, a plurality of first to-be-labeled samples are selected from the unlabeled set. The first to-be-labeled samples can thus be selected from the unlabeled set during instance segmentation model training.
• In step S3, second to-be-labeled samples whose confidence is higher than a set value are selected from the remaining samples based on a semi-supervised learning approach, and a second labeled set is obtained by pseudo-labeling the second to-be-labeled samples.
• The step of selecting, based on the semi-supervised learning approach, the second to-be-labeled samples whose confidence is higher than the set value from the remaining samples includes: obtaining the instance detection-box score, instance output-category score and instance contour-mask score of each remaining sample; when a sample's instance detection-box score exceeds a first threshold, its instance output-category score exceeds a second threshold and its instance contour-mask score exceeds a third threshold, its confidence is judged to be higher than the set value, and the sample is selected as a second to-be-labeled sample.
• In this way, the present application can select second to-be-labeled samples from the remaining samples during instance segmentation model training. In step S4, the first labeled set, the second labeled set and the existing labeled set together form the training set of the current instance segmentation model.
• The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms.
• A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
• A blockchain system can include an underlying blockchain platform, a platform product service layer and an application service layer.
• The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing the logical functions, and may be embodied in any computer-readable storage medium for use by, or in conjunction with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or any other system that can fetch and execute instructions from an instruction execution system, apparatus or device).
• Herein, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus or device.
  • the computer-readable storage medium may be non-volatile or volatile.
• Computer-readable storage media include the following: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical device, and a portable compact disc read-only memory (CD-ROM).
• The computer-readable storage medium may even be paper or another suitable medium on which the program is printed, since the paper or other medium can be, for example, optically scanned and then edited, interpreted or otherwise processed in a suitable manner if necessary to obtain the program electronically, which is then stored in computer memory.
• The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or the number of technical features indicated. Thus, a feature delimited by "first" or "second" may expressly or implicitly include at least one such feature.
• "Plurality" means at least two, for example two or three, unless expressly and specifically defined otherwise.

Abstract

An instance segmentation model sample screening method relating to artificial intelligence, usable in medical image analysis assistance scenarios. The method comprises: reading an original data set; selecting from an unlabeled set, on the basis of active learning, first to-be-labeled samples whose information content exceeds that of the remaining samples, and obtaining a first labeled set by manually labeling them; selecting from the remaining samples, on the basis of semi-supervised learning, second to-be-labeled samples whose confidence exceeds a set value, and obtaining a second labeled set by pseudo-labeling them; and taking the first labeled set, the second labeled set and the existing labeled set together as a training set. The method reduces the amount of manual sample labeling while yielding a large number of samples for training an image instance segmentation model, so that a more ideal instance segmentation accuracy can be achieved. The method further relates to blockchain technology: both the original data set and the training set can be stored in a blockchain.

Description

Instance segmentation model sample screening method, apparatus, computer device and medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 14, 2020, with application number 202011099366.0 and entitled "Instance Segmentation Model Sample Screening Method, Apparatus, Computer Device and Medium", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence and can be applied to the field of image instance segmentation. It specifically provides an instance segmentation model sample screening method, apparatus, computer device and medium.
Background
With the continuous development of deep learning, computer vision has achieved ever greater success, thanks to the support of large training data sets. A training data set (training set for short) is a data set with rich annotation information; collecting and annotating such a data set usually requires enormous labor costs.
Compared with image classification, image instance segmentation is considerably more difficult, and a large amount of annotated training data is required to truly realize the instance segmentation function. However, the inventors realized that the number of available annotated samples is often insufficient relative to the scale of the problem, or the cost of obtaining them is too high. In many cases, annotators with the relevant professional knowledge (such as doctors) are scarce or find it difficult to spare the time, the cost of annotation is too high, or the annotation or judgment cycle for images is too long; any of these problems can prevent an instance segmentation model from being trained effectively.
Therefore, how to obtain a large number of samples (training data sets) for training image instance segmentation models has become a research hotspot for those skilled in the art.
Summary of the Invention
To solve the problem in the prior art that it is difficult to obtain a large number of samples for training image instance segmentation models, the present application provides an instance segmentation model sample screening method, apparatus, computer device and medium, which can obtain a large number of samples while reducing the amount of manual sample annotation.
To achieve the above technical purpose, the present application discloses an instance segmentation model sample screening method, which includes, but is not limited to, the following steps.
Read an original data set, which includes an unlabeled set and a labeled set.
Based on an active-learning approach, select from the unlabeled set a plurality of first to-be-labeled samples carrying more information than the remaining samples, and obtain a first labeled set by manually labeling the first to-be-labeled samples. The first to-be-labeled samples and the remaining samples together make up the unlabeled set.
Based on a semi-supervised learning approach, select from the remaining samples second to-be-labeled samples whose confidence is higher than a set value, and obtain a second labeled set by pseudo-labeling the second to-be-labeled samples.
Take the first labeled set, the second labeled set and the existing labeled set together as the training set of the current instance segmentation model.
To achieve the above technical purpose, the present application also discloses an instance segmentation model sample screening apparatus, which includes, but is not limited to, a data reading module, a first screening module, a second screening module and a data expansion module.
The data reading module reads an original data set, which includes an unlabeled set and a labeled set.
The first screening module is configured to select, based on an active-learning approach, a plurality of first to-be-labeled samples carrying more information than the remaining samples from the unlabeled set; the first to-be-labeled samples are manually labeled to form a first labeled set. The first to-be-labeled samples and the remaining samples together make up the unlabeled set.
The second screening module is configured to select, based on a semi-supervised learning approach, second to-be-labeled samples whose confidence is higher than a set value from the remaining samples; the second to-be-labeled samples are pseudo-labeled to form a second labeled set.
The data expansion module takes the first labeled set, the second labeled set and the existing labeled set together as the training set of the current instance segmentation model.
To achieve the above technical purpose, the present application also provides a computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the sample screening method of any embodiment of the present application.
To achieve the above technical purpose, the present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the sample screening method of any embodiment of the present application.
The beneficial effects of the present application are as follows: based on a semi-supervised active-learning strategy, the present application can select the samples that carry the most information for the current model and give them to annotators for labeling, and can effectively expand the training set through semi-supervised pseudo-labeling. The present application can therefore obtain a large number of samples for training image instance segmentation models while reducing the amount of manual annotation, achieving a more ideal instance segmentation accuracy.
The present application greatly reduces manual annotation while obtaining a large number of model-training samples more quickly, so that an instance segmentation model applying the present application trains faster; the present application therefore has good practical significance and promotion value.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the instance segmentation model sample screening method in some embodiments of the present application.
FIG. 2 is a schematic diagram of the working principle of the instance segmentation model sample screening apparatus in some embodiments of the present application.
FIG. 3 is a schematic diagram of the working principle of the instance segmentation model in some embodiments of the present application.
FIG. 4 shows the scores of instance targets in some embodiments of the present application in three dimensions: category, detection box and segmentation contour.
FIG. 5 shows the scores of instance targets in other embodiments of the present application in the same three dimensions: category, detection box and segmentation contour.
FIG. 6 compares the instance segmentation results achievable with the present application and with existing methods for different numbers of annotated images (taking cerebral hemorrhage region segmentation and fundus edema region segmentation as examples).
FIG. 7 compares the model accuracy achieved with the present application and with existing methods for different numbers of annotated images (applied to cerebral hemorrhage region segmentation).
FIG. 8 compares the model accuracy achieved with the present application and with existing methods for different numbers of annotated images (applied to fundus edema region segmentation).
FIG. 9 is a block diagram of the internal structure of a computer device in some embodiments of the present application.
Detailed Description
The instance segmentation model sample screening method, apparatus, computer device and medium provided by the present application are explained and described in detail below with reference to the accompanying drawings.
In medical image intelligent analysis assistance scenarios, to solve the difficulty under conventional techniques of obtaining the large number of training samples that instance segmentation models require, the present application effectively combines active learning and semi-supervised learning. Active learning offers the advantage of obtaining as good a generalization model as possible while sampling and labeling as few samples as possible; semi-supervised learning offers the advantage of mining the relationship between labeled and unlabeled samples to obtain a better generalization model. The present application combines the advantages of both and provides a semi-supervised active-learning strategy to achieve rapid acquisition and screening of a large number of instance segmentation model samples.
As shown in FIG. 1, some embodiments of the present application provide an instance segmentation model sample screening method suitable for the analysis of medical images with complex layouts, for example images in which different regions occlude one another. The method may include, but is not limited to, the following steps.
In step S1, an original data set is read. In some embodiments of the present application, the original data set may include, but is not limited to, an unlabeled set, a labeled set and a test set. It should be understood that the original data set contains relatively few labeled samples and very many unlabeled ones. In some embodiments, the data sets are medical image data sets: the unlabeled set is an unlabeled medical image data set, the labeled set is a labeled medical image data set, and the test set is a medical image data set usable for model evaluation.
In step S2, based on an active-learning approach, the present application first selects from the unlabeled set a plurality of first to-be-labeled samples carrying more information than the remaining samples, and obtains a first labeled set by manually labeling them. The first labeled set is the portion of the training set formed by manual annotation; the first to-be-labeled samples and the remaining samples together make up the unlabeled set of the original data set, i.e. the medical image samples to be labeled and the remaining unlabeled medical image samples together constitute all the unlabeled medical image samples. As shown in FIG. 2, although a new training set can be provided for the current instance segmentation model through manual annotation, the number of samples that can in fact be annotated manually is limited.
In a specific implementation, let D = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), x_{i+1}, ..., x_n} denote the entire data set, where x denotes a sample and y its annotation result. The data set comprises labeled data {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i)} and unlabeled data {x_{i+1}, ..., x_n}: the first i samples form the existing labeled set, and the remaining n-i samples form the unlabeled set of the original data set. This embodiment selects from the unlabeled set the samples carrying the most information (for example the top k samples by information content); these samples are then given to annotators for labeling. The specific value of k can be chosen reasonably according to the actual situation, for example k = 500.
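The top-k selection described above can be sketched as follows. This is a minimal illustration: the sample identifiers and scores are hypothetical, and it assumes (as in the embodiments below) that a lower final score means a more informative sample.

```python
def select_top_k(unlabeled_scores, k):
    """Pick the (at most) k samples whose final score is lowest, i.e.
    which are assumed to carry the most information for the current model.

    unlabeled_scores: dict mapping sample id -> final score S_i.
    """
    ranked = sorted(unlabeled_scores, key=unlabeled_scores.get)
    return ranked[:k]

# Hypothetical pool of unlabeled samples with their final scores.
pool = {"img_a": 0.92, "img_b": 0.31, "img_c": 0.55, "img_d": 0.78}
to_label = select_top_k(pool, k=2)  # these would be sent for manual labeling
```

With the pool above, `to_label` contains the two lowest-scoring (most informative) samples, "img_b" and "img_c".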
As shown in FIG. 3, an instance segmentation model formed on the basis of the present application can work as follows. The model scans images (i.e. images in the original data set, both unlabeled and labeled; in FIG. 3, dashed lines denote unlabeled data flows and solid lines denote labeled data flows). Scanning an image generates proposals; bounding-box information and mask information are generated by classifying the proposals, and the subsequent network then determines, from the bounding-box and mask information, the instance detection-box score (bbox_score), the instance output-category score (class_score) and the instance contour-mask score (mask_score), from which the samples carrying the most information are selected.
The instance segmentation model of this embodiment can be extended from the Faster R-CNN model. The FPN network (a feature extraction network) scans the image on the basis of its pyramid structure to obtain the proposals; the scanning process can be a feature-map mapping. The RPN network (a region proposal network) generates the bounding-box information and mask information by processing the proposals; the processing can include binary classification (foreground/background) and bounding-box (BB) regression, and from the bounding-box and mask information the detection-box coordinates, whether a target exists in the detection box, and the class label of the detection box can be determined. The bounding-box and mask information then undergo region-of-interest alignment (ROI Align) before being fed to the subsequent network; ROI Align is used to bring the pixels of the original image and of the feature map into correspondence. The subsequent network in this embodiment can include the detection head (RCNN Head) and the segmentation head (Mask Head) of the instance segmentation model: the detection head outputs the instance detection-box score and instance output-category score above, the segmentation head outputs the instance contour-mask score above, and each output can have dimension 1.
More specifically, under the overall architecture design of the instance segmentation model in FIG. 3, the step of selecting from the unlabeled set, based on the active-learning approach, a plurality of first to-be-labeled samples carrying more information than the remaining samples specifically includes: computing the instance detection-box score, instance output-category score and instance contour-mask score of each sample in the unlabeled set, and using the three scores jointly to determine each sample's final score. In some embodiments of the present application, the instance detection-box score is the intersection-over-union (IoU) between the instance's predicted bounding box and the ground-truth bounding box, the instance output-category score is the instance's classification value, and the instance contour-mask score is the IoU between the instance's predicted mask and the ground-truth mask.
In this embodiment, the process of determining each sample's final score from the three scores includes: computing the score of each instance in the current sample from the mean and standard deviation of its instance detection-box, output-category and contour-mask scores, and then computing the final score of the current sample from the mean and standard deviation of its instance scores. According to the negative (or positive) correlation between the final score and the amount of information, a plurality of first to-be-labeled samples are selected from the unlabeled set.
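The intersection-over-union used for the detection-box score can be sketched for axis-aligned boxes as below. The (x1, y1, x2, y2) box representation is an assumption for illustration; the mask IoU is computed analogously over pixel sets.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    # Intersection rectangle (clamped to zero area when boxes are disjoint).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted box against its ground-truth box: overlap 1, union 7.
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```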
The score s_i^j of the j-th instance in the i-th sample is computed from the mean and standard deviation of the instance detection-box score, instance output-category score and instance contour-mask score: the mean combines the three scores, while the standard deviation captures their diversity. It may be computed, for example, as

s_i^j = mean(c_i^j, b_i^j, m_i^j) - std(c_i^j, b_i^j, m_i^j)

where s_i^j denotes the score of the j-th instance in the i-th sample; c_i^j, b_i^j and m_i^j denote respectively the instance output-category score, instance detection-box score and instance contour-mask score of the j-th instance in the i-th sample; std denotes the standard-deviation operator and mean denotes the mean operator.

The final score S_i of the current sample is computed analogously from the mean and standard deviation of the scores of the instances in the current sample, for example

S_i = mean_j(s_i^j) - std_j(s_i^j)
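A minimal sketch of the two-level scoring, under the assumption that each level is computed as the mean of its scores minus their (population) standard deviation, with the mean aggregating the scores and the standard deviation capturing their diversity as described above:

```python
from statistics import mean, pstdev

def instance_score(class_score, bbox_score, mask_score):
    """s_i^j: aggregate the three per-instance branch scores (assumed
    mean-minus-std formulation)."""
    scores = [class_score, bbox_score, mask_score]
    return mean(scores) - pstdev(scores)

def sample_score(instance_scores):
    """S_i: final score of a sample from the scores of its instances."""
    return mean(instance_scores) - pstdev(instance_scores)

# Two hypothetical instances detected in one image.
s1 = instance_score(0.9, 0.8, 0.7)   # confident, consistent instance
s2 = instance_score(0.9, 0.2, 0.4)   # branches disagree -> lower score
final = sample_score([s1, s2])       # a low value flags an informative sample
```

An instance whose three branch scores agree and are high gets a high score; disagreement between branches (high standard deviation) pushes the score down, marking the sample as informative.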
This embodiment can select the first to-be-labeled samples from the unlabeled set during instance segmentation model training, and decides, based on the active-learning algorithm, which data to annotate manually. The present application can therefore screen all unlabeled samples on the basis of the three-branch information metrics above (instance detection-box score, instance output-category score, instance contour-mask score); in implementation, when the annotation time and labor budget suffice for k samples, the top k samples (or fewer) are selected for manual interpretation and annotation. That is, some embodiments of the present application perform manual interpretation and annotation on the selected k unlabeled medical image samples.
For example, some embodiments of the present application may select the plurality of first to-be-labeled samples from the unlabeled set according to a negative correlation between the final score and the amount of information. As shown in FIG. 4 and FIG. 5, an instance has scores in three dimensions (category, detection box and segmentation contour); the lower the combined value of these three scores, the more the corresponding sample needs to be annotated. The top k samples (or fewer) are selected and given to annotators; in this embodiment, the annotation can be performed by relevant annotators (such as medical-domain experts), and the annotated samples can be placed in the training data set directory.
To give the instance segmentation model better performance, some embodiments of the present application may further include the step of computing a loss function. As shown in FIG. 3, the loss function of some embodiments of the present application can include five parts: the output-category loss L_class, the detection-box loss L_bbox, the contour-mask loss L_mask, the detection-box score loss L_bboxIOU and the contour-mask score loss L_MaskIOU; in total, up to five loss functions can be used together for the iterative training and learning of the instance segmentation model.
The loss function L_semi of the semi-supervised part of the instance segmentation model is computed as follows:
L_semi = L_class + L_bbox + L_mask + L_bboxIOU + L_MaskIOU
Combined with the active-learning part, the overall loss function L of the instance segmentation model is computed as follows:
L = L_sup + β * L_semi
where L_sup denotes the loss function of the active-learning part and β denotes the loss balance coefficient; the loss balance coefficient is used to suppress the potential noise introduced by pseudo-labeling, and its default value is 0.01.
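The loss combination above can be sketched as follows; the numeric loss values are illustrative only, and β defaults to 0.01 as stated.

```python
def semi_loss(l_class, l_bbox, l_mask, l_bbox_iou, l_mask_iou):
    """L_semi: sum of the five semi-supervised loss terms."""
    return l_class + l_bbox + l_mask + l_bbox_iou + l_mask_iou

def total_loss(l_sup, l_semi, beta=0.01):
    """L = L_sup + beta * L_semi; beta suppresses pseudo-label noise."""
    return l_sup + beta * l_semi

l_semi = semi_loss(0.5, 0.3, 0.4, 0.1, 0.2)   # illustrative term values
loss = total_loss(l_sup=1.0, l_semi=l_semi)   # 1.0 + 0.01 * 1.5
```

Because β is small, the pseudo-labeled (semi-supervised) part contributes only a gentle correction to the supervised loss, which is the stated purpose of the balance coefficient.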
步骤S3，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本，通过伪标注第二待标注样本的方式得到第二标注集。对高置信度样本通过半监督伪标注策略自动生成标注结果。其中，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本的步骤包括：获取所有剩余样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分；当前样本的实例检测框得分大于第一阈值且实例输出类别得分大于第二阈值且实例轮廓掩码得分大于第三阈值时，判断出当前样本的置信度高于设定值，挑选出当前样本作为第二待标注样本。本申请一些实施例中的第一阈值、第二阈值、第三阈值三者可以相等，例如第一阈值=第二阈值=第三阈值=0.9。本申请能够在实例分割模型训练过程中从所有剩余样本中挑选出三个度量指标得分均大于0.9的第二待标注样本，并进行伪标注，得到近似参考标注结果，从而可实现进一步扩充训练集，有利于模型性能更好地提升。In step S3, second to-be-labeled samples whose confidence is higher than a set value are selected from all remaining samples based on a semi-supervised learning method, and a second label set is obtained by pseudo-labeling them; for these high-confidence samples the annotation results are generated automatically through a semi-supervised pseudo-labeling strategy. Selecting the second to-be-labeled samples from all remaining samples includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than a first threshold, its instance output category score is greater than a second threshold and its instance contour mask score is greater than a third threshold, judging that the confidence of the current sample is higher than the set value and selecting it as a second to-be-labeled sample. In some embodiments of the present application the first, second and third thresholds may be equal, for example first threshold = second threshold = third threshold = 0.9. During instance segmentation model training, the present application can thus select from all remaining samples the second to-be-labeled samples whose three metric scores are all greater than 0.9 and pseudo-label them to obtain approximate reference annotations, further expanding the training set and benefiting model performance.
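The three-threshold filter of step S3 can be sketched as a minimal Python function; the dictionary keys and the 0.9 defaults are illustrative assumptions, not names from the patent:

```python
def select_high_confidence(samples, t_box=0.9, t_cls=0.9, t_mask=0.9):
    """Return the samples whose instance detection frame score, instance
    output category score and instance contour mask score all exceed their
    thresholds; these become the second to-be-labeled (pseudo-label) set."""
    return [s for s in samples
            if s["box_score"] > t_box
            and s["cls_score"] > t_cls
            and s["mask_score"] > t_mask]
```

Only samples passing all three tests are pseudo-labeled, which keeps the automatically generated annotations close to reference quality.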
步骤S4，将第一标注集、第二标注集和已标注集共同作为当前实例分割模型的训练集。以此训练集训练用于医疗影像的分析任务的实例分割模型，本申请可充分发挥出实例分割的潜力。所以本申请可将得到的第一标注集和第二标注集加入到训练集，对模型进行训练更新，从而利用获得的新样本的信息增量，使具有标注的医学图像样本数量得到极大地增加，更新训练提升已有的目标实例分割模型。例如，将本申请应用于医疗影像智能辅助识别领域，可同时进行不同目标位置、关键器官实例的区域勾画及量化评估，特别对于可能相互遮挡的图像区域，本申请能够更有效进行关键目标实例分割。可见本申请能够克服过分依赖精力有限且稀缺的医生专家进行标注的问题，为图像实例分割模型提供大量的有用样本。另外，应当理解的是，本申请上述的步骤可重复执行多次。In step S4, the first label set, the second label set and the labeled set are used together as the training set of the current instance segmentation model. By training an instance segmentation model for medical image analysis tasks on this training set, the present application can give full play to the potential of instance segmentation. The obtained first and second label sets are therefore added to the training set, and the model is trained and updated, so that the information increment of the newly obtained samples greatly increases the number of annotated medical image samples and the existing target instance segmentation model is improved through updated training. For example, when the present application is applied to intelligent assisted recognition of medical images, region delineation and quantitative evaluation of different target positions and key organ instances can be performed simultaneously; especially for image regions that may occlude one another, the present application can segment key target instances more effectively. The present application thus overcomes the problem of over-reliance on scarce doctors and experts with limited time for labeling, and provides a large number of useful samples for the image instance segmentation model. In addition, it should be understood that the above steps of the present application may be repeated multiple times.
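Step S4 amounts to a simple union of the three sets. The sketch below additionally tags each sample as manually or pseudo-labeled, since the loss balance coefficient applies only to the pseudo-labeled part; the `is_pseudo` field is an assumption introduced for illustration:

```python
def build_training_set(labeled_set, first_label_set, second_label_set):
    """Merge the pre-existing labeled set, the manually annotated first
    label set and the pseudo-annotated second label set into one training
    set, keeping a per-sample flag for the pseudo-labeled portion."""
    train = [{"sample": s, "is_pseudo": False}
             for s in labeled_set + first_label_set]
    train += [{"sample": s, "is_pseudo": True} for s in second_label_set]
    return train
```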
如图7和图8所示，本公开的一些实施例在医学影像实例分割任务上进行对比实现。与MC Dropout、Core Set、Class Entropy、Learning Loss等现有方法相比，通过每次逐步地增加500张样本进行标注后训练的结果可发现，本申请在智能挑选的1000~1500张样本进行标注后的训练就能达到现有方法2000~3000张训练才能达到的实例分割模型精度，减少约50%的标注成本。As shown in FIG. 7 and FIG. 8, some embodiments of the present disclosure are compared against existing methods on a medical image instance segmentation task. Compared with existing methods such as MC Dropout, Core Set, Class Entropy and Learning Loss, with 500 additional samples labeled and trained per round, the present application reaches, after labeling only 1000–1500 intelligently selected samples, the instance segmentation model accuracy that the existing methods attain only with 2000–3000 training samples, reducing the annotation cost by about 50%.
如图6所示，以现有Class Entropy方法为例，本实施例给出了在实际模型工作中对脑出血区域和眼底水肿区域的分割结果图。可见本申请实验得到的结果与理论上得到的结论基本符合，在智能挑选少量样本后就能够达到常规方法较多样本才能实现的实例分割效果。在CT脑出血区域分割和眼底水肿区域分割两个任务上的实验表明：本申请能够仅用常规完整的数据集的大约50%样本量实现几乎同等性能，可见本申请提供的方案明显优于现有其他方法，可节省大量的人力和物力。本申请每次挑选的都是对改进和提升目标分割模型最有价值的样本加入训练，在保证任务精度的基础上，有效地减少了标注代价以及工作量，极大地提高了标注效率，最终在少人工标注的前提下得到了大量标注样本。因此，采用本申请的实例分割模型能够具有更大量样本的训练集，极大地提升模型精度。更重要的是，本申请实质上提供了一套高效的人为回环(human in the loop)的样本标注和训练结合的模型习得方法，充分利用了专家知识和人工智能的高置信度预测，为深度学习降低数据集要求提供了新的实现方法，具有较高的实践应用意义以及推广价值。As shown in FIG. 6, taking the existing Class Entropy method as an example, this embodiment presents segmentation results for cerebral hemorrhage regions and fundus edema regions produced by the model in actual operation. The experimental results of the present application are basically consistent with the theoretical conclusions: after intelligently selecting a small number of samples, the instance segmentation performance that conventional methods reach only with far more samples can already be achieved. Experiments on the two tasks of CT cerebral hemorrhage region segmentation and fundus edema region segmentation show that the present application achieves nearly the same performance with only about 50% of the sample size of a conventional complete data set; the scheme provided herein is thus clearly superior to existing methods and saves substantial manpower and material resources. Each round, the samples most valuable for improving the target segmentation model are selected for training, which effectively reduces annotation cost and workload while preserving task accuracy, greatly improves annotation efficiency, and ultimately yields a large number of labeled samples with little manual annotation.
Therefore, the instance segmentation model of the present application can be trained on a training set with far more samples, greatly improving model accuracy. More importantly, the present application essentially provides an efficient human-in-the-loop model acquisition method combining sample annotation and training, making full use of expert knowledge and the high-confidence predictions of artificial intelligence; it offers a new way of reducing the data-set requirements of deep learning and has high practical significance and promotion value.
如图2所示,本申请另一些实施例能够提供一种实例分割模型样本筛选装置,该装置包括但不限于数据读取模块、第一筛选模块、第二筛选模块及数据扩充模块。As shown in FIG. 2 , other embodiments of the present application can provide an instance segmentation model sample screening apparatus, which includes but is not limited to a data reading module, a first screening module, a second screening module, and a data expansion module.
数据读取模块，读取原始数据集，原始数据集包括未标注集和已标注集。The data reading module reads the original data set, which includes an unlabeled set and a labeled set.
第一筛选模块，用于基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本，多个第一待标注样本被人工标注为第一标注集；所有第一待标注样本和所有剩余样本组成未标注集。The first screening module is configured to select, based on an active learning method, a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples from the unlabeled set; the plurality of first to-be-labeled samples are manually labeled as the first label set, and all first to-be-labeled samples and all remaining samples constitute the unlabeled set.
第二筛选模块,用于基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本,第二待标注样本被伪标注为第二标注集。The second screening module is used to select a second to-be-labeled sample with a confidence higher than a set value from all remaining samples based on a semi-supervised learning method, and the second to-be-labeled sample is pseudo-labeled as a second label set.
数据扩充模块,将第一标注集、第二标注集以及已标注集共同作为当前实例分割模型的训练集。The data expansion module uses the first label set, the second label set and the already labelled set as the training set of the current instance segmentation model.
需要强调的是,为进一步保证本申请实施例中的数据的私密和安全性,上述的原始数据集和训练集等数据还可以存储于一区块链的节点中。It should be emphasized that, in order to further ensure the privacy and security of the data in the embodiments of the present application, the above-mentioned data such as the original data set and the training set may also be stored in a node of a blockchain.
基于主动学习策略，本申请从未标注的大量原始医学图像中挑选部分高价值样本给标注人员(如医生)标注，不需要对所有的样本进行标注。每次都挑选对改进深度学习实例分割模型最有价值的样本加入训练，从而在获取理想任务精度的基础上有效减少了标注代价和医生工作量，最大化样本人工标注效率。本申请能选择信息量最大的样本来加速实例分割模型训练，使用人工标注数据量明显降低，为深度学习降低数据集要求提供了新的实现方法，实现高效的数据和计算资源利用，节省了计算资源消耗。结合实例分割模型的预测输出，我们提供的医学图像实例分割的半监督主动学习框架，可以和主流的实例分割模型融合在一起，从而可以显著地节省训练深度神经网络实例分割模型的标注成本。经过上述实验表明，在本申请的基础上能够训练得到泛化能力更强更准确的医学图像实例分割模型，减少网络过拟合以更好的适应医学应用等场景。Based on the active learning strategy, the present application selects some high-value samples from a large number of unlabeled original medical images for annotators (such as doctors) to label, without having to label all samples. Each round, the samples most valuable for improving the deep-learning instance segmentation model are selected for training, which effectively reduces the annotation cost and the doctors' workload while attaining the desired task accuracy, and maximizes the efficiency of manual annotation. The present application can select the most informative samples to accelerate instance segmentation model training, markedly reducing the amount of manually labeled data required; it provides a new way of lowering the data-set requirements of deep learning, realizes efficient utilization of data and computing resources, and saves computational resource consumption. Combined with the predicted output of the instance segmentation model, the semi-supervised active learning framework for medical image instance segmentation provided here can be integrated with mainstream instance segmentation models, significantly saving the annotation cost of training deep neural network instance segmentation models. The above experiments show that, on the basis of the present application, a medical image instance segmentation model with stronger generalization ability and higher accuracy can be trained, reducing network overfitting to better suit scenarios such as medical applications.
如图9所示，本申请还提供了一种计算机设备，包括存储器和处理器，存储器中存储有计算机可读指令，计算机可读指令被处理器执行时，使得处理器执行如本申请任一实施例中的样本筛选方法的步骤。其中，计算机设备可以为PC，还可以为如PAD、平板电脑、手提电脑这种便携电子设备、还可以为如手机这种智能移动终端，不限于这里的描述；计算机设备还可以通过服务器实现，服务器可以是通过集群系统构成的，为实现各单元功能而合并为一或各单元功能分体设置的计算机设备。程序的执行包含以下的步骤的指令：步骤S1，读取原始数据集，本申请中的原始数据集可包括未标注集和已标注集。步骤S2，基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本，通过人工标注多个第一待标注样本的方式得到第一标注集；所有第一待标注样本和所有剩余样本组成未标注集。其中，基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本的步骤包括：计算未标注集中各样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分，以利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分确定各样本的最终得分；具体地，本申请一些实施例中，实例检测框得分为实例的检测框与真实框的交并比，实例输出类别得分为实例的分类值，实例轮廓掩码得分为实例的检测掩码与真实掩码的交并比。利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分确定各样本的最终得分的过程包括：利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分的均值和标准差计算当前样本中各实例的得分；利用当前样本中各实例的得分的均值和标准差计算当前样本的最终得分。依据最终得分与信息量之间的负相关或正相关关系从未标注集中挑选出多个第一待标注样本。可在实例分割模型训练过程中从未标注集中挑选第一待标注样本。步骤S3，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本，通过伪标注第二待标注样本的方式得到第二标注集。其中，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本的步骤包括：获取所有剩余样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分；当前样本的实例检测框得分大于第一阈值且实例输出类别得分大于第二阈值且实例轮廓掩码得分大于第三阈值时，判断出当前样本的置信度高于设定值，挑选出当前样本作为第二待标注样本。本申请可在实例分割模型训练过程中从所有剩余样本中挑选第二待标注样本。步骤S4，将第一标注集、第二标注集以及已标注集共同作为当前实例分割模型的训练集。As shown in FIG. 9, the present application also provides a computer device including a memory and a processor, where the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the sample screening method in any embodiment of the present application. The computer device may be a PC, a portable electronic device such as a PAD, tablet computer or laptop, or a smart mobile terminal such as a mobile phone, and is not limited to the description here; the computer device may also be implemented by a server, which may be built from a cluster system, either merged into one computer device to realize the functions of the units or set up with the units' functions implemented separately.
Execution of the program includes instructions for the following steps. Step S1: read the original data set, which in the present application may include an unlabeled set and a labeled set. Step S2: based on an active learning method, select from the unlabeled set a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples, and obtain a first label set by manually labeling them; all first to-be-labeled samples and all remaining samples constitute the unlabeled set. Selecting the first to-be-labeled samples includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, so as to determine the final score of each sample from these three scores. Specifically, in some embodiments of the present application, the instance detection frame score is the intersection-over-union of the instance's detection box and the ground-truth box, the instance output category score is the classification value of the instance, and the instance contour mask score is the intersection-over-union of the instance's detection mask and the ground-truth mask.
Determining the final score of each sample includes: computing the score of each instance in the current sample from the mean and standard deviation of its instance detection frame score, instance output category score and instance contour mask score, and then computing the final score of the current sample from the mean and standard deviation of the scores of its instances. A plurality of first to-be-labeled samples are selected from the unlabeled set according to the negative or positive correlation between the final score and the amount of information; the first to-be-labeled samples can be selected from the unlabeled set during instance segmentation model training. Step S3: based on a semi-supervised learning method, select from all remaining samples second to-be-labeled samples whose confidence is higher than a set value, and obtain a second label set by pseudo-labeling them. Selecting the second to-be-labeled samples includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than a first threshold, its instance output category score is greater than a second threshold and its instance contour mask score is greater than a third threshold, judging that the confidence of the current sample is higher than the set value and selecting it as a second to-be-labeled sample. The second to-be-labeled samples can be selected from all remaining samples during instance segmentation model training.
Step S4: the first label set, the second label set and the labeled set are used together as the training set of the current instance segmentation model.
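The mean/standard-deviation score aggregation of step S2 can be sketched as follows. The exact formulas are given by the patent's own equations (see claims 4 and 5); here the common "mean minus standard deviation" combination, and a negative correlation between final score and information content, are assumed purely for illustration:

```python
import statistics

def instance_score(cls_s, box_s, mask_s):
    # Score of one instance from the mean and (population) standard
    # deviation of its three metric scores -- an assumed combination.
    vals = [cls_s, box_s, mask_s]
    return statistics.mean(vals) - statistics.pstdev(vals)

def sample_final_score(instance_scores):
    # Final score of a sample from the mean and standard deviation of
    # the scores of its instances.
    return statistics.mean(instance_scores) - statistics.pstdev(instance_scores)

def pick_first_to_label(final_scores, k):
    # Assuming the final score correlates negatively with information
    # content, the k lowest-scoring samples are the most informative.
    return sorted(range(len(final_scores)), key=final_scores.__getitem__)[:k]
```

Under this assumption, an instance whose three scores disagree strongly (high standard deviation) receives a low score, and the samples containing such uncertain instances are the ones handed to human annotators.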
一种存储有计算机可读指令的存储介质，计算机可读指令被一个或多个处理器执行时，使得一个或多个处理器执行如本申请任一实施例中如下的样本筛选方法的步骤。步骤S1，读取原始数据集，本申请中的原始数据集可包括未标注集和已标注集。步骤S2，基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本，通过人工标注多个第一待标注样本的方式得到第一标注集；所有第一待标注样本和所有剩余样本组成未标注集。其中，基于主动学习方式从未标注集中挑选出信息量大于剩余样本的多个第一待标注样本的步骤包括：计算未标注集中各样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分，以利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分确定各样本的最终得分；具体地，本申请一些实施例中，实例检测框得分为实例的检测框与真实框的交并比，实例输出类别得分为实例的分类值，实例轮廓掩码得分为实例的检测掩码与真实掩码的交并比。利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分确定各样本的最终得分的过程包括：利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分的均值和标准差计算当前样本中各实例的得分；利用当前样本中各实例的得分的均值和标准差计算当前样本的最终得分。依据最终得分与信息量之间的负相关或正相关关系从未标注集中挑选出多个第一待标注样本。可在实例分割模型训练过程中从未标注集中挑选第一待标注样本。步骤S3，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本，通过伪标注第二待标注样本的方式得到第二标注集。其中，基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本的步骤包括：获取所有剩余样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分；当前样本的实例检测框得分大于第一阈值且实例输出类别得分大于第二阈值且实例轮廓掩码得分大于第三阈值时，判断出当前样本的置信度高于设定值，挑选出当前样本作为第二待标注样本。本申请可在实例分割模型训练过程中从所有剩余样本中挑选第二待标注样本。步骤S4，将第一标注集、第二标注集以及已标注集共同作为当前实例分割模型的训练集。A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps of the sample screening method in any embodiment of the present application. Step S1: read the original data set, which in the present application may include an unlabeled set and a labeled set. Step S2: based on an active learning method, select from the unlabeled set a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples, and obtain a first label set by manually labeling them; all first to-be-labeled samples and all remaining samples constitute the unlabeled set.
Selecting the first to-be-labeled samples includes: calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, so as to determine the final score of each sample from these three scores. Specifically, in some embodiments of the present application, the instance detection frame score is the intersection-over-union of the instance's detection box and the ground-truth box, the instance output category score is the classification value of the instance, and the instance contour mask score is the intersection-over-union of the instance's detection mask and the ground-truth mask. Determining the final score of each sample includes: computing the score of each instance in the current sample from the mean and standard deviation of the three scores, and then computing the final score of the current sample from the mean and standard deviation of the scores of its instances. A plurality of first to-be-labeled samples are selected from the unlabeled set according to the negative or positive correlation between the final score and the amount of information; the first to-be-labeled samples can be selected from the unlabeled set during instance segmentation model training. Step S3: based on a semi-supervised learning method, select from all remaining samples second to-be-labeled samples whose confidence is higher than the set value, and obtain a second label set by pseudo-labeling them.
Selecting the second to-be-labeled samples includes: obtaining the instance detection frame score, instance output category score and instance contour mask score of all remaining samples; when the instance detection frame score of the current sample is greater than the first threshold, its instance output category score is greater than the second threshold and its instance contour mask score is greater than the third threshold, judging that the confidence of the current sample is higher than the set value and selecting it as a second to-be-labeled sample. The second to-be-labeled samples can be selected from all remaining samples during instance segmentation model training. Step S4: the first label set, the second label set and the labeled set are used together as the training set of the current instance segmentation model.
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain)，本质上是一个去中心化的数据库，是一串使用密码学方法相关联产生的数据块，每一个数据块中包含了一批次网络交易的信息，用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
在流程图中表示或在此以其他方式描述的逻辑和/或步骤，例如，可以被认为是用于实现逻辑功能的可执行指令的定序列表，可以具体实现在任何计算机可读存储介质中，以供指令执行系统、装置或设备(如基于计算机的系统、包括处理器的系统或其他可以从指令执行系统、装置或设备取指令并执行指令的系统)使用，或结合这些指令执行系统、装置或设备而使用。就本说明书而言，"计算机可读存储介质"可以是任何可以包含、存储、通信、传播或传输程序以供指令执行系统、装置或设备或结合这些指令执行系统、装置或设备而使用的装置。所述计算机可读存储介质可以是非易失性，也可以是易失性的。计算机可读存储介质的更具体的示例(非穷尽性列表)包括以下：具有一个或多个布线的电连接部(电子装置)，便携式计算机盘盒(磁装置)，随机存取存储器(RAM，Random Access Memory)，只读存储器(ROM，Read-Only Memory)，可擦除可编辑只读存储器(EPROM，Erasable Programmable Read-Only Memory，或闪速存储器)，光纤装置，以及便携式光盘只读存储器(CDROM，Compact Disc Read-Only Memory)。另外，计算机可读存储介质甚至可以是可在其上打印所述程序的纸或其他合适的介质，因为可以例如通过对纸或其他介质进行光学扫描，接着进行编辑、解译或必要时以其他合适方式进行处理来以电子方式获得所述程序，然后将其存储在计算机存储器中。The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing the logical functions, and may be embodied in any computer-readable storage medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable storage medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. The computer-readable storage medium may be non-volatile or volatile. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM, or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). Furthermore, the computer-readable storage medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
应当理解,本申请的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中,多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。例如,如果用硬件来实现,和在另一实施方式中一样,可用本领域公知的下列技术中的任一项或他们的组合来实现:具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路,具有合适的组合逻辑门电路的专用集成电路,可编程门阵列(PGA,Programmable Gate Array),现场可编程门阵列(FPGA,Field Programmable Gate Array)等。It should be understood that various parts of this application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented by any one or a combination of the following techniques known in the art: Discrete logic circuits, application-specific integrated circuits with suitable combinational logic gate circuits, Programmable Gate Arrays (PGA, Programmable Gate Array), Field Programmable Gate Arrays (FPGA, Field Programmable Gate Array), etc.
在本说明书的描述中，参考术语“本实施例”、“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本申请的至少一个实施例或示例中。在本说明书中，对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且，描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外，在不相互矛盾的情况下，本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of this specification, reference to the terms "this embodiment", "one embodiment", "some embodiments", "example", "specific example", "some examples" and the like means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, without contradicting one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of the different embodiments or examples.
此外，术语“第一”、“第二”仅用于描述目的，而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此，限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。在本申请的描述中，“多个”的含义是至少两个，例如两个，三个等，除非另有明确具体的限定。In addition, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined with "first" or "second" may expressly or implicitly include at least one such feature. In the description of this application, "plurality" means at least two, for example two or three, unless expressly and specifically defined otherwise.
以上所述仅为本申请的较佳实施例而已，并不用以限制本申请，凡在本申请实质内容上所作的任何修改、等同替换和简单改进等，均应包含在本申请的保护范围之内。The above descriptions are merely preferred embodiments of the present application and are not intended to limit it; any modification, equivalent replacement or simple improvement made within the substance of the present application shall fall within the protection scope of the present application.

Claims (20)

  1. 一种实例分割模型样本筛选方法,其中,包括:An instance segmentation model sample screening method, comprising:
    读取原始数据集,所述原始数据集包括未标注集和已标注集;reading an original data set, the original data set includes an unlabeled set and a labeled set;
    基于主动学习方式从所述未标注集中挑选出信息量大于剩余样本的多个第一待标注样本，通过人工标注所述多个第一待标注样本的方式得到第一标注集；所有第一待标注样本和所有剩余样本组成所述未标注集；selecting, based on an active learning method, a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples from the unlabeled set, and obtaining a first label set by manually labeling the plurality of first to-be-labeled samples, all first to-be-labeled samples and all remaining samples constituting the unlabeled set;
    基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本,通过伪标注所述第二待标注样本的方式得到第二标注集;Based on the semi-supervised learning method, a second to-be-labeled sample with a confidence level higher than the set value is selected from all the remaining samples, and a second label set is obtained by pseudo-labeling the second to-be-labeled sample;
    将所述第一标注集、所述第二标注集以及所述已标注集共同作为当前实例分割模型的训练集。The first label set, the second label set and the already labelled set are taken together as a training set of the current instance segmentation model.
  2. 根据权利要求1所述的实例分割模型样本筛选方法,其中,基于主动学习方式从所述未标注集中挑选出信息量大于剩余样本的多个第一待标注样本的步骤包括:The sample screening method for an instance segmentation model according to claim 1, wherein the step of selecting a plurality of first to-be-labeled samples with an amount of information greater than the remaining samples from the unlabeled set based on an active learning method comprises:
    计算所述未标注集中各样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分，以利用所述实例检测框得分、所述实例输出类别得分及所述实例轮廓掩码得分确定各样本的最终得分；calculating the instance detection frame score, instance output category score and instance contour mask score of each sample in the unlabeled set, so as to determine the final score of each sample using the instance detection frame score, the instance output category score and the instance contour mask score;
    依据所述最终得分与所述信息量之间的负相关或正相关关系从所述未标注集中挑选出所述多个第一待标注样本。The plurality of first to-be-labeled samples are selected from the unlabeled set according to the negative or positive correlation between the final score and the amount of information.
  3. 根据权利要求2所述的实例分割模型样本筛选方法,其中,利用所述实例检测框得分、所述实例输出类别得分及所述实例轮廓掩码得分确定各样本的最终得分的过程包括:The sample screening method for an instance segmentation model according to claim 2, wherein the process of determining the final score of each sample by using the instance detection frame score, the instance output category score and the instance contour mask score comprises:
    利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分的均值和标准差计算当前样本中各实例的得分;Calculate the score of each instance in the current sample by using the mean and standard deviation of the instance detection frame score, the instance output category score and the instance outline mask score;
    利用当前样本中各实例的得分的均值和标准差计算当前样本的最终得分。The final score of the current sample is calculated using the mean and standard deviation of the scores of each instance in the current sample.
  4. 根据权利要求3所述的实例分割模型样本筛选方法,其中,所述利用实例检测框得分、实例输出类别得分及实例轮廓掩码得分的均值和标准差计算当前样本中各实例的得分包括:The sample screening method for an instance segmentation model according to claim 3, wherein calculating the score of each instance in the current sample by using the mean value and standard deviation of the instance detection frame score, the instance output category score and the instance outline mask score comprises:
    Figure PCTCN2021096675-appb-100001
    其中，
    Figure PCTCN2021096675-appb-100002
    代表第i个样本中的第j个实例的得分，
    Figure PCTCN2021096675-appb-100003
    分别代表第i个样本中的第j个实例的实例输出类别得分、实例检测框得分、实例轮廓掩码得分，std表示标准差计算符号，mean表示均值计算符号。where
    Figure PCTCN2021096675-appb-100002
    denotes the score of the j-th instance in the i-th sample,
    Figure PCTCN2021096675-appb-100003
    denote, respectively, the instance output category score, the instance detection frame score and the instance contour mask score of the j-th instance in the i-th sample; std denotes the standard-deviation operator and mean denotes the mean operator.
  5. 根据权利要求4所述的实例分割模型样本筛选方法,其中,所述利用当前样本中各实例的得分的均值和标准差计算当前样本的最终得分包括:The sample screening method for an instance segmentation model according to claim 4, wherein calculating the final score of the current sample by using the mean and standard deviation of the scores of each instance in the current sample comprises:
    Figure PCTCN2021096675-appb-100004
    其中，S_i表示当前样本的最终得分。where S_i denotes the final score of the current sample.
  6. 根据权利要求2所述的实例分割模型样本筛选方法,其中,基于半监督学习方式从所有剩余样本中挑选出置信度高于设定值的第二待标注样本的步骤包括:The sample screening method for an instance segmentation model according to claim 2, wherein the step of selecting a second to-be-labeled sample with a confidence higher than a set value from all remaining samples based on a semi-supervised learning method comprises:
    获取所述所有剩余样本的实例检测框得分、实例输出类别得分及实例轮廓掩码得分;Obtain the instance detection frame score, instance output category score and instance outline mask score of all remaining samples;
    当前样本的实例检测框得分大于第一阈值且实例输出类别得分大于第二阈值且实例轮廓掩码得分大于第三阈值时，判断出当前样本的置信度高于设定值，挑选出当前样本作为第二待标注样本。when the instance detection frame score of the current sample is greater than the first threshold, the instance output category score is greater than the second threshold and the instance contour mask score is greater than the third threshold, determining that the confidence of the current sample is higher than the set value and selecting the current sample as a second to-be-labeled sample.
  7. 根据权利要求6所述的实例分割模型样本筛选方法,其中,The instance segmentation model sample screening method according to claim 6, wherein,
    所述第一阈值、所述第二阈值、所述第三阈值三者相等。The first threshold, the second threshold, and the third threshold are equal.
  8. 根据权利要求6所述的实例分割模型样本筛选方法,其中,The instance segmentation model sample screening method according to claim 6, wherein,
    所述第一阈值=第二阈值=第三阈值=0.9。The first threshold=second threshold=third threshold=0.9.
  9. The instance segmentation model sample screening method according to claim 2, wherein
    the instance detection-box score is the intersection-over-union between the instance's detection box and the ground-truth box;
    the instance output-category score is the classification value of the instance;
    the instance contour-mask score is the intersection-over-union between the instance's detected mask and the ground-truth mask.
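Claim 9 defines two of the three scores as intersection-over-union (IoU) ratios. A minimal, framework-free sketch — the function names and the set-of-pixels mask representation are ours, not the patent's:

```python
def box_iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2); returns intersection-over-union
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def mask_iou(m1, m2):
    # m1, m2: masks as sets of (row, col) foreground pixel coordinates
    union = len(m1 | m2)
    return len(m1 & m2) / union if union else 0.0
```

For example, two unit-area boxes overlapping in a quarter of their area share an intersection of 1 against a union of 7, giving IoU = 1/7.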
  10. The instance segmentation model sample screening method according to claim 1, wherein
    the first to-be-labeled samples are selected from the unlabeled set during training of the instance segmentation model.
  11. The instance segmentation model sample screening method according to claim 10, wherein a loss function is calculated during training of the instance segmentation model; the loss function comprises a detection-box loss, an output-category loss, a contour-mask loss, a detection-box score loss, and a contour-mask score loss.
  12. The instance segmentation model sample screening method according to claim 1, wherein
    the second to-be-labeled samples are selected from all the remaining samples during training of the instance segmentation model.
  13. The instance segmentation model sample screening method according to claim 2, wherein calculating the instance detection-box score, the instance output-category score, and the instance contour-mask score of each sample in the unlabeled set comprises:
    scanning the image with the instance segmentation model to generate proposal information;
    generating bounding-box information and mask information by classifying the proposal information;
    determining the instance detection-box score, the instance output-category score, and the instance contour-mask score from the bounding-box information and the mask information.
  14. The instance segmentation model sample screening method according to claim 13, wherein classifying the proposal information comprises:
    performing binary classification and detection-box regression on the proposal information.
  15. The instance segmentation model sample screening method according to claim 13, wherein after calculating the instance detection-box score, the instance output-category score, and the instance contour-mask score of each sample in the unlabeled set, the method comprises:
    outputting the instance detection-box score and the instance output-category score from the detection head of the instance segmentation model, and outputting the instance contour-mask score from the segmentation head of the instance segmentation model.
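The flow of claims 13–15 — scan the image for proposals, classify them into boxes and masks, then read the three scores off the detection head and the segmentation head — can be outlined with a stand-in model object. None of the method names below (`scan`, `classify`) come from an actual library; they only mirror the claim language:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InstancePrediction:
    box_score: float   # from the detection head (claim 15)
    cls_score: float   # from the detection head
    mask_score: float  # from the segmentation head (claim 15)

def score_unlabeled_sample(model, image) -> List[InstancePrediction]:
    # `model` is a hypothetical Mask R-CNN-style wrapper.
    proposals = model.scan(image)             # claim 13: generate proposal information
    boxes, masks = model.classify(proposals)  # claim 14: binary classification + box regression
    # Claim 15: the detection head yields the box and category scores,
    # the segmentation head yields the contour-mask score.
    return [InstancePrediction(b.score, b.cls, m.score)
            for b, m in zip(boxes, masks)]
```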
  16. The instance segmentation model sample screening method according to claim 1, wherein
    the unlabeled set is an unlabeled medical image dataset;
    the labeled set is a labeled medical image dataset.
  17. The instance segmentation model sample screening method according to claim 1, wherein
    the number of first to-be-labeled samples is k = 500.
  18. An instance segmentation model sample screening apparatus, comprising:
    a data reading module, configured to read an original dataset, the original dataset comprising an unlabeled set and a labeled set;
    a first screening module, configured to select, based on active learning, a plurality of first to-be-labeled samples whose information content is greater than that of the remaining samples from the unlabeled set, the plurality of first to-be-labeled samples being manually labeled as a first labeled set, all the first to-be-labeled samples and all the remaining samples forming the unlabeled set;
    a second screening module, configured to select, based on semi-supervised learning, second to-be-labeled samples whose confidence is higher than a set value from all the remaining samples, the second to-be-labeled samples being pseudo-labeled as a second labeled set;
    a data expansion module, configured to use the first labeled set, the second labeled set, and the labeled set together as the training set of the current instance segmentation model.
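Taken together, the four modules of claim 18 amount to the following data flow. Every callable parameter is a hypothetical stand-in for the corresponding module (active-learning ranking, manual annotation, confidence test, pseudo-labeling), not an interface defined by the patent:

```python
def build_training_set(unlabeled, labeled, k,
                       pick_informative, annotate, is_confident, pseudo_label):
    # First screening module: active learning picks the k most
    # informative samples, which are then manually labeled.
    first = pick_informative(unlabeled, k)
    remaining = [s for s in unlabeled if s not in first]
    first_labeled = [annotate(s) for s in first]
    # Second screening module: semi-supervised selection pseudo-labels
    # the high-confidence remaining samples.
    second_labeled = [pseudo_label(s) for s in remaining if is_confident(s)]
    # Data expansion module: the two new sets plus the existing labeled
    # set form the training set of the current instance segmentation model.
    return first_labeled + second_labeled + list(labeled)
```

The design point of the claim is that only the k actively selected samples cost human annotation; the rest of the expansion comes for free from pseudo-labels that pass the confidence test.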
  19. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the sample screening method according to any one of claims 1 to 17.
  20. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the sample screening method according to any one of claims 1 to 17.
PCT/CN2021/096675 2020-10-14 2021-05-28 Instance segmentation model sample screening method and apparatus, computer device and medium WO2022077917A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011099366.0A CN112163634B (en) 2020-10-14 2020-10-14 Sample screening method and device for instance segmentation model, computer equipment and medium
CN202011099366.0 2020-10-14

Publications (1)

Publication Number Publication Date
WO2022077917A1 true WO2022077917A1 (en) 2022-04-21

Family

ID=73866927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096675 WO2022077917A1 (en) 2020-10-14 2021-05-28 Instance segmentation model sample screening method and apparatus, computer device and medium

Country Status (2)

Country Link
CN (1) CN112163634B (en)
WO (1) WO2022077917A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482436A (en) * 2022-09-21 2022-12-16 北京百度网讯科技有限公司 Training method and device for image screening model and image screening method
CN116229369A (en) * 2023-03-03 2023-06-06 嘉洋智慧安全科技(北京)股份有限公司 Method, device and equipment for detecting people flow and computer readable storage medium
CN117218132A (en) * 2023-11-09 2023-12-12 铸新科技(苏州)有限责任公司 Whole furnace tube service life analysis method, device, computer equipment and medium
CN117315263A (en) * 2023-11-28 2023-12-29 杭州申昊科技股份有限公司 Target contour segmentation device, training method, segmentation method and electronic equipment

Families Citing this family (17)

Publication number Priority date Publication date Assignee Title
CN112163634B (en) * 2020-10-14 2023-09-05 平安科技(深圳)有限公司 Sample screening method and device for instance segmentation model, computer equipment and medium
CN112381834B (en) * 2021-01-08 2022-06-03 之江实验室 Labeling method for image interactive instance segmentation
CN112884055B (en) * 2021-03-03 2023-02-03 歌尔股份有限公司 Target labeling method and target labeling device
CN113487738B (en) * 2021-06-24 2022-07-05 哈尔滨工程大学 Building based on virtual knowledge migration and shielding area monomer extraction method thereof
CN113255669B (en) * 2021-06-28 2021-10-01 山东大学 Method and system for detecting text of natural scene with any shape
CN113361535B (en) * 2021-06-30 2023-08-01 北京百度网讯科技有限公司 Image segmentation model training, image segmentation method and related device
CN113554068B (en) * 2021-07-05 2023-10-31 华侨大学 Semi-automatic labeling method, device and readable medium for instance segmentation data set
CN113487617A (en) * 2021-07-26 2021-10-08 推想医疗科技股份有限公司 Data processing method, data processing device, electronic equipment and storage medium
CN113705687B (en) * 2021-08-30 2023-03-24 平安科技(深圳)有限公司 Image instance labeling method based on artificial intelligence and related equipment
CN113762286A (en) * 2021-09-16 2021-12-07 平安国际智慧城市科技股份有限公司 Data model training method, device, equipment and medium
CN114612702A (en) * 2022-01-24 2022-06-10 珠高智能科技(深圳)有限公司 Image data annotation system and method based on deep learning
CN114462531A (en) * 2022-01-30 2022-05-10 支付宝(杭州)信息技术有限公司 Model training method and device and electronic equipment
CN114359676B (en) * 2022-03-08 2022-07-19 人民中科(济南)智能技术有限公司 Method, device and storage medium for training target detection model and constructing sample set
CN115439686B (en) * 2022-08-30 2024-01-09 一选(浙江)医疗科技有限公司 Method and system for detecting object of interest based on scanned image
CN115170809B (en) * 2022-09-06 2023-01-03 浙江大华技术股份有限公司 Image segmentation model training method, image segmentation device, image segmentation equipment and medium
CN115393361B (en) * 2022-10-28 2023-02-03 湖南大学 Skin disease image segmentation method, device, equipment and medium with low annotation cost
CN117115568B (en) * 2023-10-24 2024-01-16 浙江啄云智能科技有限公司 Data screening method, device, equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101853400A (en) * 2010-05-20 2010-10-06 武汉大学 Multiclass image classification method based on active learning and semi-supervised learning
CN103150578A (en) * 2013-04-09 2013-06-12 山东师范大学 Training method of SVM (Support Vector Machine) classifier based on semi-supervised learning
CN108985334A (en) * 2018-06-15 2018-12-11 广州深域信息科技有限公司 The generic object detection system and method for Active Learning are improved based on self-supervisory process
CN111401293A (en) * 2020-03-25 2020-07-10 东华大学 Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN112163634A (en) * 2020-10-14 2021-01-01 平安科技(深圳)有限公司 Example segmentation model sample screening method and device, computer equipment and medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20130097103A1 (en) * 2011-10-14 2013-04-18 International Business Machines Corporation Techniques for Generating Balanced and Class-Independent Training Data From Unlabeled Data Set
CN109447169B (en) * 2018-11-02 2020-10-27 北京旷视科技有限公司 Image processing method, training method and device of model thereof and electronic system
CN109949317B (en) * 2019-03-06 2020-12-11 东南大学 Semi-supervised image example segmentation method based on gradual confrontation learning
US10430946B1 (en) * 2019-03-14 2019-10-01 Inception Institute of Artificial Intelligence, Ltd. Medical image segmentation and severity grading using neural network architectures with semi-supervised learning techniques
CN111666993A (en) * 2020-05-28 2020-09-15 平安科技(深圳)有限公司 Medical image sample screening method and device, computer equipment and storage medium


Non-Patent Citations (1)

Title
RONG CHEN, YONG-FENG CAO, HONG SUN: "Multi-class Image Classification with Active Learning and Semi-supervised Learning", ACTA AUTOMATICA SINICA, Kexue Chubanshe, Beijing, CN, vol. 37, no. 8, 31 August 2011 (2011-08-31), CN, XP055921885, ISSN: 0254-4156, DOI: 10.3724/SP.J.1004.2011.00954 *

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN115482436A (en) * 2022-09-21 2022-12-16 北京百度网讯科技有限公司 Training method and device for image screening model and image screening method
CN115482436B (en) * 2022-09-21 2023-06-30 北京百度网讯科技有限公司 Training method and device for image screening model and image screening method
CN116229369A (en) * 2023-03-03 2023-06-06 嘉洋智慧安全科技(北京)股份有限公司 Method, device and equipment for detecting people flow and computer readable storage medium
CN117218132A (en) * 2023-11-09 2023-12-12 铸新科技(苏州)有限责任公司 Whole furnace tube service life analysis method, device, computer equipment and medium
CN117218132B (en) * 2023-11-09 2024-01-19 铸新科技(苏州)有限责任公司 Whole furnace tube service life analysis method, device, computer equipment and medium
CN117315263A (en) * 2023-11-28 2023-12-29 杭州申昊科技股份有限公司 Target contour segmentation device, training method, segmentation method and electronic equipment
CN117315263B (en) * 2023-11-28 2024-03-22 杭州申昊科技股份有限公司 Target contour device, training method, segmentation method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112163634B (en) 2023-09-05
CN112163634A (en) 2021-01-01

Similar Documents

Publication Publication Date Title
WO2022077917A1 (en) Instance segmentation model sample screening method and apparatus, computer device and medium
JP6843086B2 (en) Image processing systems, methods for performing multi-label semantic edge detection in images, and non-temporary computer-readable storage media
Kisilev et al. Medical image description using multi-task-loss CNN
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
WO2018108129A1 (en) Method and apparatus for use in identifying object type, and electronic device
WO2022001623A1 (en) Image processing method and apparatus based on artificial intelligence, and device and storage medium
CN111931931B (en) Deep neural network training method and device for pathology full-field image
US10789712B2 (en) Method and system for image analysis to detect cancer
WO2021189913A1 (en) Method and apparatus for target object segmentation in image, and electronic device and storage medium
US11875512B2 (en) Attributionally robust training for weakly supervised localization and segmentation
Huang et al. Omni-supervised learning: scaling up to large unlabelled medical datasets
CN110472049B (en) Disease screening text classification method, computer device and readable storage medium
WO2022089257A1 (en) Medical image processing method, apparatus, device, storage medium, and product
CN111882560A (en) Lung parenchymal CT image segmentation method based on weighted full-convolution neural network
WO2021057148A1 (en) Brain tissue layering method and device based on neural network, and computer device
CN108154191B (en) Document image recognition method and system
CN113987119A (en) Data retrieval method, cross-modal data matching model processing method and device
Zhang et al. An efficient semi-supervised manifold embedding for crowd counting
CN110889437B (en) Image processing method and device, electronic equipment and storage medium
Fu et al. Medical image retrieval and classification based on morphological shape feature
Ullah et al. DSFMA: Deeply supervised fully convolutional neural networks based on multi-level aggregation for saliency detection
CN116525075A (en) Thyroid nodule computer-aided diagnosis method and system based on few sample learning
WO2023029348A1 (en) Image instance labeling method based on artificial intelligence, and related device
Wu et al. Automatic mass detection from mammograms with region-based convolutional neural network
CN114596435A (en) Semantic segmentation label generation method, device, equipment and storage medium

Legal Events

Code  Description
121   Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21878971; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122   Ep: pct application non-entry in european phase (Ref document number: 21878971; Country of ref document: EP; Kind code of ref document: A1)