CN114463361A - Network model training method, device, equipment, medium and program product - Google Patents


Info

Publication number
CN114463361A
CN114463361A
Authority
CN
China
Prior art keywords
image
training
model
sample
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210138535.XA
Other languages
Chinese (zh)
Inventor
刘佳
王兆玮
孙钦佩
杨叶辉
王晓荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210138535.XA
Publication of CN114463361A

Classifications

    • G06T7/13: Image analysis; Segmentation; Edge detection
    • G06F18/2155: Generating training patterns; Bootstrap methods characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06T5/00: Image enhancement or restoration
    • G06T7/143: Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G06T2207/20076: Probabilistic image processing
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30016: Biomedical image processing; Brain
    • G06T2207/30048: Biomedical image processing; Heart; Cardiac
    • G06T2207/30056: Biomedical image processing; Liver; Hepatic


Abstract

The disclosure provides a network model training method and apparatus, an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product, and relates to the technical field of deep learning, in particular to the technical field of image segmentation. The method first generates a probability map corresponding to a target object based on the labeling information of a plurality of first sample images for the target object. It then pre-trains a model using the probability atlas together with unlabeled images to obtain a pre-training model containing high-quality network parameters, and on this basis further trains the pre-training model with a small number of labeled medical images, so that the resulting image segmentation model achieves higher image segmentation precision.

Description

Network model training method, device, equipment, medium and program product
Technical Field
The disclosure relates to the technical field of deep learning, in particular to the technical field of image segmentation, and discloses a network model training method and device, electronic equipment, a non-transitory computer readable storage medium storing computer instructions, and a computer program product.
Background
Medical image segmentation mainly comprises two types of segmentation: structural segmentation (such as brain tissue, lung, liver, heart, etc.) and lesion segmentation. In recent years, deep learning has achieved good results in medical image segmentation, offering high robustness, high precision, and high speed. Generally speaking, deep learning requires a large amount of labeled data to complete model training. However, medical images are mainly three-dimensional, their quality is poor compared with that of traditional natural images, and labeling them is difficult and time-consuming. As a result, the amount of labeled data in the field of medical image segmentation is small, which greatly limits the application of deep learning in this field.
Disclosure of Invention
The present disclosure provides at least a network model training method, apparatus, device, program product, and storage medium.
According to an aspect of the present disclosure, there is provided a network model training method, including:
generating a probability map corresponding to the target object based on the labeling information of the plurality of first sample images for the target object;
based on the probability map and a plurality of second sample images, performing model training with the objective of recovering the image blocks masked out of the second sample images, to obtain a pre-training model;
and training the pre-training model based on the labeling information of the target object of the plurality of third sample images and the third sample images to obtain an image segmentation model for the target object.
According to another aspect of the present disclosure, there is provided a network model training method, including:
generating a probability map corresponding to the target object based on the labeling information of the plurality of first sample images for the target object;
and, based on the probability atlas and a plurality of second sample images, performing model training with the objective of recovering the masked image blocks of the second sample images, to obtain an image restoration model.
According to another aspect of the present disclosure, there is provided a network model training apparatus including:
the first map determining module is used for generating a probability map corresponding to the target object based on the labeling information of the plurality of first sample images aiming at the target object;
the pre-training module is used for performing model training, based on the probability map and the plurality of second sample images, with the objective of recovering the image blocks masked out of the second sample images, to obtain a pre-training model;
the segmentation model training module is used for training the pre-training model based on the plurality of third sample images and their labeling information for the target object, to obtain an image segmentation model for the target object.
according to another aspect of the present disclosure, there is provided a network model training apparatus including:
the second map determining module is used for generating a probability map corresponding to the target object based on the labeling information of the plurality of first sample images aiming at the target object;
and the restoration model training module is used for performing model training, based on the probability map and the plurality of second sample images, with the objective of recovering the masked image blocks of the second sample images, to obtain an image restoration model.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the method in any of the embodiments of the present disclosure.
According to the disclosed technique, pre-training with the probability map and unlabeled images (the second sample images) yields a pre-training model containing high-quality network parameters. On this basis, a small number of labeled medical images (the third sample images) are used to further train the pre-training model, and the resulting image segmentation model can determine segmentation results for medical images with higher segmentation precision.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is one of the flow diagrams of a network model training method according to the present disclosure;
FIG. 2 is a flow chart of a training method of pre-training a model according to the present disclosure;
FIG. 3 is a schematic block diagram of an encoder according to the present disclosure;
FIG. 4 is a flow chart of a method of training an image segmentation model according to the present disclosure;
FIG. 5 is a second flow chart of a network model training method according to the present disclosure;
FIG. 6 is a third flowchart of a network model training method according to the present disclosure;
FIG. 7 is a fourth flowchart of a network model training method according to the present disclosure;
FIG. 8 is one of the schematic structural diagrams of a network model training apparatus according to the present disclosure;
FIG. 9 is a second schematic diagram of the network model training apparatus according to the present disclosure;
fig. 10 is a schematic structural diagram of an electronic device according to the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As described in the background above, labeled data is scarce in medical image segmentation: medical images are mainly three-dimensional, their quality is poor compared with natural images, and labeling them is difficult and time-consuming, which greatly limits the application of deep learning in this field.
To address this deficiency, the present disclosure provides at least a network model training method, apparatus, device, program product, and storage medium. In this method, pre-training with the probability atlas and unlabeled images (the second sample images) yields a pre-training model containing high-quality network parameters. On this basis, a small number of labeled medical images (the third sample images) are used to further train the pre-training model, and the resulting image segmentation model can determine segmentation results for medical images with high segmentation precision.
The network model training method of the present disclosure is explained below with specific embodiments.
Fig. 1 shows a flowchart of a network model training method of an embodiment of the present disclosure, an execution subject of which may be a device with computing capabilities. As shown in fig. 1, the network model training method of the embodiment of the present disclosure may include the following steps:
and S110, generating a probability map corresponding to the target object based on the labeling information of the plurality of first sample images aiming at the target object.
The first sample image may be a medical image with labeling information, for example, an abdomen image with labeling information corresponding to the liver. The labeling information may specifically include, for each pixel point in the abdomen image, whether that point belongs to the liver.
The probability map gives, for each pixel point in an image of a preset size, the probability that the point belongs to the target object. The image of the preset size has the same resolution and size as the second sample images, so the probability map can be combined with each second sample image for model training, yielding a pre-training model with higher image restoration precision.
And S120, based on the probability atlas and the plurality of second sample images, performing model training with the objective of recovering the masked image blocks of the second sample images, to obtain a pre-training model.
The second sample image is an image without labeling information for the target object, and may be, for example, an abdomen image without labeling information corresponding to the liver. The second sample images need not come from the same source as the first sample images, but both need to include the target object; for example, both need to include the liver.
Because the second sample images do not need labeling information, they are easy to obtain, so a large number of them can be used to train the pre-training model. The pre-training model can therefore be trained sufficiently, giving it strong image recovery (restoration) capability and high precision. High restoration capability indicates that the model can extract accurate image features; further training the model under the guidance of these accurate features yields an image segmentation model with high segmentation precision.
The position of the target object in the corresponding image is relatively fixed, so that the pre-training model is trained by combining the probability map, and the accuracy of the pre-training model for recovering the image block corresponding to the target object can be improved. For example, the position of the liver in the abdominal image is relatively fixed, the probability map can accurately represent whether the corresponding pixel point belongs to the liver, the pre-training model is trained by combining the probability map corresponding to the liver, and the precision of restoring the image block corresponding to the liver by the pre-training model can be improved.
S130, training the pre-training model based on the labeling information of the target object of the multiple third sample images and the third sample images to obtain an image segmentation model for the target object.
The third sample image may be a medical image with labeling information, for example, an abdomen image with labeling information corresponding to the liver. The third sample images may be the same images as the first sample images or different images, which is not limited in this disclosure.
And performing supervised learning on the image segmentation of the target object by using the labeling information of the third sample image on the target object, so as to obtain an image segmentation model with higher segmentation precision on the target object.
Because the pre-training model is obtained by training in combination with the probability map, it can extract accurate image features. On this basis, the parameters of the pre-training model are taken as initial parameters for the further training of the segmentation model. The prior information in the plurality of first sample images, i.e., the information in the probability map, is thereby fused into the pre-training model obtained from the second sample images, which improves the transfer capability of the pre-training model for image segmentation and can therefore improve the segmentation precision for objects or structures in images.
In some embodiments, the probability map may be generated using the following steps:
firstly, generating a mask image of each first sample image for a target object based on the labeling information of the first sample image for the target object; then, based on the mask image corresponding to each first sample image, a probability map corresponding to the target object is generated.
The mask image is the same size and resolution as the second sample image.
The mask image may be a binary image representing whether each pixel point in the mask image is a target object, and therefore, when the mask image is determined, the label information of the target object based on the first sample image is required. Specifically, when the labeling information of a certain pixel point indicates that the pixel point belongs to the target object, the pixel value of the pixel point in the mask image is 1; and when the marking information of the pixel point indicates that the pixel point does not belong to the target object, the pixel value of the pixel point in the mask image is 0.
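The binarization step above can be sketched as follows. This is an illustrative NumPy sketch, not code from the disclosure; the tiny label volume and the label value 2 are assumed for demonstration.

```python
import numpy as np

def make_mask_image(label_volume: np.ndarray, target_label: int) -> np.ndarray:
    # 1 where the voxel is annotated as the target object (e.g. liver), 0 elsewhere
    return (label_volume == target_label).astype(np.uint8)

# toy 3D label volume in which the value 2 marks the target object
labels = np.zeros((2, 2, 2), dtype=np.int32)
labels[0, 0, 0] = 2
mask = make_mask_image(labels, target_label=2)  # a binary mask image, as described above
```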
After obtaining the mask image of each first sample image, summing the pixel values of the pixel points at the same position in each mask image, and then performing an averaging operation to obtain a probability of whether the corresponding pixel point belongs to the target object, for example, the probability of a certain pixel point can be determined by using the following formula:
$$P_{(x,y,z)} = \frac{1}{N} \sum_{i=1}^{N} I_i(x,y,z)$$
where $P_{(x,y,z)}$ denotes the probability for the pixel point at position $(x, y, z)$; $N$ denotes the number of mask images; and $I_i(x,y,z)$ denotes the pixel value at position $(x, y, z)$ in the $i$-th mask image.
According to the method, the accurate probability of whether each pixel point belongs to the target object can be determined, and then the probability of each pixel point is utilized to form a probability map. The probability map can accurately reflect the position information of the target object in the image, and can play a good guiding role in the segmentation and restoration of the target object.
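The sum-then-average construction of the probability map can be sketched as below; this is an illustrative NumPy sketch in which small 2D masks stand in for the aligned 3D mask images.

```python
import numpy as np

def probability_atlas(mask_images):
    # voxel-wise mean of the aligned binary masks: the value at each position is
    # the fraction of samples in which that position belongs to the target object
    stacked = np.stack(mask_images, axis=0).astype(np.float64)
    return stacked.mean(axis=0)

masks = [np.array([[1, 0], [1, 1]]), np.array([[1, 0], [0, 1]])]
atlas = probability_atlas(masks)  # [[1.0, 0.0], [0.5, 1.0]]
```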
Since the resolution and the size of different first sample images may be different, in order to improve the accuracy of the generated probability map, before the mask images of the first sample images are generated, each first sample image may be preprocessed to unify all the first sample images to a preset resolution and a preset size. And then generating a corresponding mask image based on the preprocessed first sample image.
The preprocessing may include a first preprocessing operation and a second preprocessing operation. Illustratively, a certain first sample image may be preprocessed by the following steps, and a mask image corresponding to the first sample image is generated:
firstly, performing a first preprocessing operation on the first sample image to obtain a first image with a preset resolution; then, carrying out second preprocessing operation on the first image to obtain a second image with a preset size; and finally, generating a mask image corresponding to the target object based on the labeling information of the first sample image for the target object and the second image.
Illustratively, the preset resolution may be an image resolution of 1mm by 1 mm; specifically, the first sample image may be unified into a first image having a preset resolution by using a tri-linear interpolation algorithm. The preset size may be the maximum size of all the first sample images, and the second preprocessing operation may be a padding method.
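The two preprocessing operations can be sketched as follows. This is a hedged illustration: `scipy.ndimage.zoom` with `order=1` performs (tri)linear interpolation, and the spacing values and target shape here are assumed for demonstration, not taken from the disclosure.

```python
import numpy as np
from scipy.ndimage import zoom  # order=1 gives (tri)linear interpolation

def preprocess(volume, spacing, target_spacing=(1.0, 1.0, 1.0),
               target_shape=(6, 6, 6)):
    # first preprocessing operation: resample to the preset resolution
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    resampled = zoom(volume, factors, order=1)
    # second preprocessing operation: pad up to the preset (maximum) size
    pads = [(0, max(0, t - s)) for s, t in zip(resampled.shape, target_shape)]
    return np.pad(resampled, pads, mode="constant")

vol = np.ones((4, 4, 4))
out = preprocess(vol, spacing=(1.0, 1.0, 1.0))
```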
In some embodiments, the training of the pre-trained model may be performed using the following steps:
firstly, negating each probability included in the probability map to obtain a target map; and then, based on the target atlas and the plurality of second sample images, performing model training by taking the image blocks masked out in the recovered second sample images as targets until a first cut-off condition of the training is met, and obtaining a pre-training model. The first cut-off condition may specifically be the number of iterations, or may also be the image restoration accuracy of the pre-training model.
The larger a probability value in the probability map, the more likely the corresponding pixel point belongs to the target object, and the easier that point is to learn; the smaller a non-zero value, the harder that point is to segment. Therefore, in order to learn the edge of the target object more accurately, a negation operation can be performed on each probability in the probability map before model training. The purpose of the negation operation is to change originally large probabilities into small ones.
For example, the negation operation may specifically be that no negation operation is performed on the probability with a value of zero, and for the probability with a value of non-zero, a value obtained by subtracting the probability from 1 is calculated, and the obtained value is used as a result of the negation operation to form the probability in the target map.
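The negation operation just described amounts to one vectorized line; a minimal NumPy sketch:

```python
import numpy as np

def negate_atlas(atlas):
    # zero probabilities are left untouched; non-zero probabilities become 1 - p,
    # so confident (large-p) points get small values and hard points get large ones
    return np.where(atlas > 0, 1.0 - atlas, 0.0)

a = np.array([0.0, 0.2, 1.0])
t = negate_atlas(a)  # [0.0, 0.8, 0.0]
```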
Before training of the pre-training model, each second sample image needs to be divided into a plurality of image blocks, and at least one image block in each second sample image needs to be masked. For example, for a certain second sample image, a plurality of image blocks formed by dividing the second sample image may be arranged in a queue, then the order of the image blocks in the queue is scrambled, and then 75% of the image blocks arranged at the tail of the queue are masked.
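The split / shuffle / mask-75% procedure can be sketched as follows. A 2D image and a fixed seed are assumed for brevity; the disclosure's sample images are three-dimensional.

```python
import numpy as np

def split_and_mask(image, patch, mask_ratio=0.75, seed=0):
    # cut the sample into non-overlapping patches and arrange them in a queue
    h, w = image.shape
    patches = [image[i:i + patch, j:j + patch]
               for i in range(0, h, patch) for j in range(0, w, patch)]
    # scramble the order of the queue
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(patches))
    # mask the trailing 75% of the shuffled queue
    n_keep = int(len(patches) * (1 - mask_ratio))
    visible_idx, masked_idx = order[:n_keep], order[n_keep:]
    return patches, visible_idx, masked_idx

img = np.arange(16).reshape(4, 4)
patches, vis, msk = split_and_mask(img, patch=2)
```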
When the pre-training model is trained, the remaining image blocks of each second sample image are input into the pre-training model to be trained, and the model outputs a predicted restored image for each second sample image. Image restoration loss information is then determined based on the second sample images, the predicted restored images, and the target map. Finally, based on this loss information, the model is trained with the objective of recovering the masked image blocks of each second sample image, to obtain the trained pre-training model.
Exemplarily, when determining the image restoration loss information, the loss of each pixel point in each masked image block is first determined based on the second sample images and the predicted restored images. This loss is then weighted by the probability of the corresponding pixel point in the target map, which drives the pre-training model to focus during training on difficult, infrequent pixel points such as edge points, effectively improving training precision. Finally, the weighted losses of all pixel points in all masked image blocks are summed to obtain the image restoration loss information.
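The weighted summation can be sketched as below. A squared-error per-pixel loss is assumed here for illustration (the disclosure does not fix the per-pixel loss), and `masked` marks the pixels belonging to masked-off blocks.

```python
import numpy as np

def weighted_restoration_loss(pred, target, weight_map, masked):
    # per-pixel loss (squared error assumed), weighted by the negated-atlas
    # probability, restricted to pixels of masked-off image blocks, then summed
    per_pixel = (pred - target) ** 2
    return float(np.sum(per_pixel * weight_map * masked))

pred = np.array([1.0, 0.0])
target = np.array([0.0, 0.0])
w = np.array([0.5, 1.0])  # negated-atlas weights
m = np.array([1.0, 0.0])  # 1 where the pixel was masked off
loss = weighted_restoration_loss(pred, target, w, m)  # 0.5
```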
For example, as shown in fig. 2, the image blocks 2B remaining after part of a second sample image 2A is masked off (or their information) may be input into the encoder of the pre-training model. The encoder performs image feature processing on the input to obtain encoded information 2C, and the information of the masked image blocks is inserted into the queue or list formed by the information of the unmasked image blocks, according to the positions of the masked blocks in the image. The information in this queue or list is then input into the decoder of the pre-training model, which processes the input features to obtain decoded information 2D; this includes the information of the restored image blocks and, of course, that of the unmasked image blocks. Finally, the restored second sample image 2E, i.e., the above-described predicted restored image, can be generated based on the decoded information.
All the masked image patches are commonly represented by a learnable vector, i.e., all the masked patches share the vector, so that the pre-trained model knows that the position is masked.
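A minimal sketch of the shared mask token. NumPy stands in for a deep-learning framework here: `mask_token` would be a learnable parameter updated by the optimizer, and the dimensions are assumed for illustration.

```python
import numpy as np

embed_dim, n_patches = 8, 4
# one vector shared by every masked position; in a real model this is a
# learnable parameter, here it is simply initialized randomly
mask_token = np.random.default_rng(0).normal(size=embed_dim)

def insert_mask_tokens(visible_tokens, visible_idx, n_patches):
    # rebuild the full token sequence: encoder outputs at visible positions,
    # the shared mask token everywhere else
    seq = np.tile(mask_token, (n_patches, 1))
    seq[visible_idx] = visible_tokens
    return seq

seq = insert_mask_tokens(np.ones((2, embed_dim)), [0, 2], n_patches)
```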
For example, an image block or information of an image block may be encoded by an encoder as shown in fig. 3.
The training process of the pre-training model needs no supervision information, i.e., no labeling information; it is a self-supervised learning and training process. This reduces the requirements on training samples: because unlabeled training samples are used, a large number of training samples can be obtained easily, which improves training precision.
In some embodiments, the training of the pre-training model based on the labeling information of the target object of the multiple third sample images and the third sample images to obtain the image segmentation model for the target object may specifically be implemented by using the following steps:
firstly, each third sample image is divided into a plurality of image blocks, and the image blocks corresponding to the third sample images are input into a pre-training model to obtain a prediction division image corresponding to each third sample image; then, determining image segmentation loss information based on the labeling information of each third sample image to the target object and each prediction segmentation image; and finally, training the pre-training model based on the image segmentation loss information until a second cutoff condition of training is met, and obtaining an image segmentation model for the target object. The second cutoff condition may specifically be the number of iterations, or the segmentation accuracy of the image segmentation model.
For example, as shown in fig. 4, information of an image block 4B or an image block corresponding to a certain third sample image 4A is input to an encoder in a pre-training model, the encoder performs encoding processing on the input image block or information to obtain encoded information 4C, then further performs information processing on the obtained encoded information 4C to obtain processed information 4D, then inputs the processed information 4D to a decoder in the pre-training model, and obtains decoded information 4E after the input information or feature is processed by the decoder, where the decoded information 4E includes partition information of a target object, and finally, a predicted partition image 4F can be generated based on the decoded information 4E.
The image segmentation loss information includes a plurality of types of image segmentation sub-loss information, and may be, for example, cross entropy sub-loss information or Dice sub-loss information. The pre-training model may be trained based on the image segmentation sub-loss information of the plurality of classes, and the image segmentation model for the target object may be obtained by:
firstly, determining target loss information based on image segmentation sub-loss information of a plurality of categories; and finally, training the pre-training model based on the target loss information to obtain an image segmentation model for the target object.
When determining the target loss information, the sum of the plurality of categories of image segmentation sub-loss information may be used as the target loss information. Alternatively, the target loss information may be obtained by a weighted summation of the plurality of categories of image segmentation sub-loss information.
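A minimal sketch of the two sub-losses and their weighted combination follows. The NumPy formulation, the function names, and the default equal weights are illustrative assumptions; the patent does not fix a specific formula:

```python
import numpy as np

def cross_entropy_sub_loss(pred, target, eps=1e-7):
    # Binary cross-entropy between predicted foreground probabilities and labels.
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def dice_sub_loss(pred, target, eps=1e-7):
    # 1 - Dice coefficient; eps keeps the ratio defined for empty masks.
    inter = np.sum(pred * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def target_loss(pred, target, weights=(1.0, 1.0)):
    # Weighted summation of the sub-losses; equal weights reduce to a plain sum.
    return (weights[0] * cross_entropy_sub_loss(pred, target)
            + weights[1] * dice_sub_loss(pred, target))

target = np.array([[1.0, 0.0], [1.0, 0.0]])
good = np.array([[0.99, 0.01], [0.99, 0.01]])
bad = np.array([[0.01, 0.99], [0.01, 0.99]])
print(target_loss(good, target) < target_loss(bad, target))  # True
```

An accurate prediction yields a smaller target loss than an inverted one, as the final comparison shows.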
The initial parameters of the image segmentation model are the parameters obtained by training the pre-training model; that is, high-quality initial parameters are used for training the image segmentation model. This transfer-learning approach improves segmentation accuracy while reducing the number of labeled training samples required, which makes it well suited to training segmentation models for medical image segmentation. For example, an Adam optimizer with a learning rate of 10^-5 may be used for 200 training epochs in a specific training process, finally obtaining an image segmentation model for liver segmentation.
In summary, as shown in fig. 5, the network model training method of the present disclosure may include the following steps:
firstly, a probability map is generated using a plurality of first sample images having labeling information for the target object;
secondly, self-supervised learning is performed using the probability map and a plurality of second sample images without labeling information, with recovering the masked-off image blocks in each second sample image as the goal, to obtain a pre-training model;
and thirdly, the pre-training model is further trained using a plurality of third sample images having labeling information for the target object, to obtain a trained image segmentation model.
As shown in fig. 6, in the first step, when determining the probability map, a mask image corresponding to each first sample image is first generated, and the probability map is then generated from each mask image and the labeling information of each first sample image.
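The probability map can be understood as a per-pixel frequency over the aligned mask images. The following sketch is illustrative (the helper name `probability_map` is hypothetical, and it assumes binary masks already resampled to the common preset size):

```python
import numpy as np

def probability_map(mask_images):
    # mask_images: binary masks (1 = target object), all at a common preset size;
    # the per-pixel mean estimates P(pixel belongs to the target object).
    return np.stack(mask_images, axis=0).astype(np.float32).mean(axis=0)

masks = [np.array([[1, 0], [1, 1]]),
         np.array([[1, 0], [0, 1]])]
prob = probability_map(masks)  # per-pixel frequencies: [[1., 0.], [0.5, 1.]]
print(prob)
```

A pixel marked as the target object in every mask receives probability 1, while a pixel never marked receives 0.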
As shown in fig. 6, in the second step, when the pre-training model is trained, a self-supervised learning framework is first constructed. The image blocks remaining after part of the image blocks of each second sample image are masked off are then input into the pre-training model to be trained, which outputs a predicted restored image. Next, loss information is determined for each pixel point in each masked-off image block based on the predicted restored image and the second sample image, and the per-pixel losses are weighted. Finally, image restoration loss information is determined based on the weighted loss information; for example, it may be the sum of the weighted individual losses.
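One way to realize a weighted per-pixel restoration loss restricted to the masked-off blocks is sketched below. The squared-error form and the use of a map as a per-pixel weight are assumptions for illustration; the patent leaves the exact loss formula open:

```python
import numpy as np

def restoration_loss(pred, original, weight_map, masked_region):
    """Per-pixel loss on the masked-off blocks, weighted by a map.

    pred, original: restored and ground-truth images, shape (H, W)
    weight_map:     per-pixel weights, e.g. derived from the probability map
    masked_region:  binary map, 1 where image blocks were masked off
    """
    per_pixel = (pred - original) ** 2 * weight_map
    return float(np.sum(per_pixel * masked_region))

original = np.ones((4, 4))
pred = np.zeros((4, 4))
weight_map = np.full((4, 4), 0.5)
masked = np.zeros((4, 4))
masked[:2, :2] = 1  # one 2x2 block was masked off
print(restoration_loss(pred, original, weight_map, masked))  # 2.0
```

Only the four masked pixels contribute, each with squared error 1 scaled by weight 0.5, so the total is 2.0.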
In this embodiment, the pre-training model is first trained on a large number of general training samples without labeling information so that it learns general image features, and is then transferred to the target task in a targeted manner; this pre-training can be realized with self-supervised learning techniques. The trained parameters of the pre-training model are used as the initialization parameters of the image segmentation model for transfer learning, finally yielding a high-accuracy image segmentation model. The method of this embodiment can be applied in the field of medical image segmentation, where labeled data is scarce. By combining self-supervised learning with transfer learning to obtain the pre-training model, and by fusing prior knowledge about the target object, namely the probability map, into the pre-training, the pre-training model is prompted to focus its learning on important regions of the target object during training, which further improves the segmentation accuracy of the image segmentation model obtained after transfer to the target-object segmentation task.
The method of this embodiment is well suited to medical image segmentation: based on the characteristics of objects in the medical imaging field, probability-map priors are applied within a self-supervised learning framework, and the pre-training model obtained by training under that framework is then used for transfer learning in medical image segmentation. On one hand, a pre-training model obtained by self-supervised learning can effectively improve the accuracy of a medical image segmentation task when few samples are available; on the other hand, combining prior knowledge (the probability map) with the self-supervised learning framework allows the framework to focus more specifically on the target object of image segmentation, further improving the transferability of the pre-training model to the downstream task (the medical image segmentation task).
As shown in fig. 7, the present disclosure further provides a training method of an image restoration model, which may specifically include the following steps:
and S710, generating a probability map corresponding to the target object based on the labeling information of the plurality of first sample images aiming at the target object.
And S720, based on the probability map and the plurality of second sample images, performing model training by taking the masked image blocks in the recovered second sample images as targets to obtain an image recovery model.
Steps S710 to S720 are the same as steps S110 to S120 in the above embodiment, and the image restoration model corresponds to the pre-training model, so the same contents are not repeated herein.
In some embodiments, the probability map includes the probability that each pixel point in an image of a preset size belongs to the target object. Performing model training based on the probability map and the plurality of second sample images, with recovering the masked-off image blocks in each second sample image as the goal, to obtain an image recovery model may be realized by the following steps:
firstly, a negation operation is performed on each probability included in the probability map to obtain a target map; then, based on the target map and the plurality of second sample images, model training is performed with recovering the masked-off image blocks in each second sample image as the goal, to obtain the image recovery model.
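The negation operation itself is simple; a sketch follows, assuming probabilities are stored as floats in [0, 1] (the helper name is hypothetical):

```python
import numpy as np

def negate_probability_map(prob_map):
    # Pixels unlikely to belong to the target object (e.g. edges, background)
    # receive large weights, steering training toward the hard-to-learn regions.
    return 1.0 - prob_map

prob_map = np.array([[0.9, 0.2], [0.5, 0.0]])
target_map = negate_probability_map(prob_map)  # [[0.1, 0.8], [0.5, 1.0]]
print(target_map)
```

The resulting target map can then serve as the per-pixel weight source during restoration training.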
Training with the probability map and unlabeled images reduces the requirements on training samples, since a large number of unlabeled samples are easy to obtain. Moreover, combining prior knowledge (the probability map) with self-supervised learning allows the learning to focus more specifically on the target object, yielding a higher-accuracy image recovery model.
The larger a value in the probability map, the more likely the corresponding pixel point belongs to the target object and the easier that pixel is to learn; the smaller the value, the harder the pixel point generally is to segment. Therefore, in order to learn the edges of the target object more accurately, a negation operation can be performed on each probability in the probability map before model training, which helps improve the accuracy of restoring the edges of the target object.
Based on the same inventive concept, an embodiment of the present disclosure further provides a network model training apparatus corresponding to the network model training method, for training an image segmentation model.
Fig. 8 is a schematic structural diagram of a network model training apparatus provided in an embodiment of the present disclosure, which includes:
the first atlas determination module 810 is configured to generate a probability atlas corresponding to the target object based on the labeling information of the plurality of first sample images for the target object.
The pre-training module 820 is configured to perform model training based on the probability map and the plurality of second sample images, with recovering the masked-off image blocks in each second sample image as the goal, to obtain a pre-training model.
The segmentation model training module 830 is configured to train the pre-training model based on a plurality of third sample images and their labeling information for the target object, to obtain an image segmentation model for the target object.
In some embodiments, the probability map includes probabilities that respective pixel points in an image of a preset size belong to a target object;
the pre-training module 820 is specifically configured to:
negation operation is carried out on each probability included in the probability map to obtain a target map;
and performing model training based on the target map and the plurality of second sample images, with recovering the masked-off image blocks in each second sample image as the goal, to obtain a pre-training model.
In some embodiments, pre-training module 820 is specifically configured to:
for each second sample image, dividing the second sample image into a plurality of image blocks, and masking at least one image block;
inputting the residual image blocks of each second sample image into a pre-training model to be trained to obtain a prediction restoration image corresponding to each second sample image;
determining image restoration loss information based on each second sample image, each predicted and restored image and the target map;
and training the pre-training model to be trained based on the image restoration loss information, with recovering the masked-off image blocks in each second sample image as the goal, to obtain the trained pre-training model.
In some embodiments, the image of the preset size has the same resolution and size as the second sample image.
In some embodiments, the first map determination module 810 is specifically configured to:
for each first sample image, generating a mask image of the first sample image for the target object based on the labeling information of the first sample image for the target object;
and generating a probability map corresponding to the target object based on the mask image corresponding to each first sample image.
In some embodiments, the first map determination module 810 is specifically configured to:
performing first preprocessing operation on the first sample image to obtain a first image with a preset resolution;
performing second preprocessing operation on the first image to obtain a second image with a preset size;
and generating a mask image corresponding to the target object based on the labeling information of the first sample image for the target object and the second image.
In some embodiments, the segmentation model training module 830 is specifically configured to:
dividing each third sample image into a plurality of image blocks, and inputting the image blocks corresponding to each third sample image into the pre-training model to obtain a predicted segmentation image corresponding to each third sample image;
determining image segmentation loss information based on the labeling information of each third sample image to the target object and each predicted segmentation image;
and training the pre-training model based on the image segmentation loss information to obtain an image segmentation model for the target object.
In some embodiments, the image segmentation loss information comprises a plurality of classes of image segmentation sub-loss information;
the segmentation model training module 830 is specifically configured to:
determining target loss information based on the image segmentation sub-loss information of the plurality of categories;
and training the pre-training model based on the target loss information to obtain an image segmentation model aiming at the target object.
Based on the same inventive concept, the embodiment of the present disclosure further provides a network model training device corresponding to the network model training method, which is used for training an image recovery model.
As shown in fig. 9, a schematic structural diagram of a network model training apparatus provided in the embodiment of the present disclosure includes:
a second map determining module 910, configured to generate a probability map corresponding to the target object based on the labeling information of the plurality of first sample images for the target object;
and the recovery model training module 920, configured to perform model training based on the probability map and the plurality of second sample images, with recovering the masked-off image blocks in each second sample image as the goal, to obtain an image recovery model.
In some embodiments, the probability map includes probabilities that respective pixel points in an image of a preset size belong to a target object;
the recovery model training module 920 is specifically configured to:
negation operation is carried out on each probability included in the probability map to obtain a target map;
and performing model training based on the target map and the plurality of second sample images, with recovering the masked-off image blocks in each second sample image as the goal, to obtain an image recovery model.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of related users all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the device 1000 includes a computing unit 1010, which may perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1020 or a computer program loaded from a storage unit 1080 into a random access memory (RAM) 1030. The RAM 1030 may also store various programs and data required for the operation of the device 1000. The computing unit 1010, the ROM 1020, and the RAM 1030 are connected to each other by a bus 1040. An input/output (I/O) interface 1050 is also connected to the bus 1040.
A number of components in device 1000 are connected to I/O interface 1050, including: an input unit 1060 such as a keyboard, a mouse, or the like; an output unit 1070 such as various types of displays, speakers, and the like; a storage unit 1080, such as a magnetic disk, optical disk, or the like; and a communication unit 1090 such as a network card, modem, wireless communication transceiver, or the like. A communication unit 1090 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1010 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1010 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1010 performs the various methods and processes described above, such as the network model training method. For example, in some embodiments, the network model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1080. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1020 and/or the communication unit 1090. When loaded into the RAM 1030 and executed by the computing unit 1010, the computer program may perform one or more steps of the network model training method described above. Alternatively, in other embodiments, the computing unit 1010 may be configured to perform the network model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (24)

1. A network model training method, comprising:
generating a probability map corresponding to a target object based on labeling information of a plurality of first sample images for the target object;
performing model training based on the probability map and a plurality of second sample images, with recovering the masked-off image blocks in each second sample image as the goal, to obtain a pre-training model;
and training the pre-training model based on a plurality of third sample images and their labeling information for the target object, to obtain an image segmentation model for the target object.
2. The method according to claim 1, wherein the probability map comprises probabilities that respective pixel points in an image of a preset size belong to a target object;
the model training is performed by taking the image blocks masked off in each second sample image as targets to recover based on the probability map and the plurality of second sample images to obtain a pre-training model, and the method comprises the following steps:
negation operation is carried out on each probability included in the probability map to obtain a target map;
and performing model training by taking the image blocks masked out in the recovered second sample images as targets based on the target atlas and the plurality of second sample images to obtain a pre-training model.
3. The method according to claim 2, wherein the model training with the object of recovering the masked image blocks in the second sample images as the object based on the target atlas and the second sample images to obtain a pre-training model comprises:
for each second sample image, dividing the second sample image into a plurality of image blocks, and masking at least one image block;
inputting the residual image blocks of each second sample image into a pre-training model to be trained to obtain a prediction restoration image corresponding to each second sample image;
determining image restoration loss information based on each second sample image, each predicted and restored image and the target map;
and training the pre-training model to be trained by taking the image blocks masked out in each second sample image as targets based on the image recovery loss information to obtain the trained pre-training model.
4. A method according to claim 2 or 3, wherein the image of the preset size has the same resolution and size as the second sample image.
5. The method according to any one of claims 1 to 4, wherein the generating a probability map corresponding to a target object based on the labeling information of the plurality of first sample images for the target object comprises:
for each first sample image, generating a mask image of the first sample image for a target object based on the labeling information of the first sample image for the target object;
and generating a probability map corresponding to the target object based on the mask image corresponding to each first sample image.
6. The method of claim 5, wherein the generating a mask image of the first sample image for a target object based on the labeling information of the first sample image for the target object comprises:
performing first preprocessing operation on the first sample image to obtain a first image with a preset resolution;
performing second preprocessing operation on the first image to obtain a second image with a preset size;
and generating a mask image corresponding to the target object based on the labeling information of the first sample image for the target object and the second image.
7. The method according to any one of claims 1 to 6, wherein the training the pre-trained model based on labeling information of a target object for a plurality of third sample images and third sample images to obtain an image segmentation model for the target object comprises:
dividing each third sample image into a plurality of image blocks, and inputting the image blocks of each third sample image into the pre-training model to obtain a predicted segmentation image corresponding to each third sample image;
determining image segmentation loss information based on the labeling information of each third sample image to the target object and each predicted segmentation image;
and training the pre-training model based on the image segmentation loss information to obtain an image segmentation model for the target object.
8. The method of claim 7, wherein the image segmentation loss information includes a plurality of classes of image segmentation sub-loss information;
the training the pre-training model based on the image segmentation loss information to obtain an image segmentation model for the target object, including:
determining target loss information based on the image segmentation sub-loss information of the plurality of classes;
and training the pre-training model based on the target loss information to obtain an image segmentation model for the target object.
9. The method of any of claims 1 to 8, wherein the first sample image comprises a medical image; and/or the presence of a gas in the gas,
the second sample image comprises a medical image; and/or the presence of a gas in the gas,
the third sample image comprises a medical image.
10. A network model training method, comprising:
generating a probability map corresponding to a target object based on labeling information of a plurality of first sample images for the target object;
and performing model training based on the probability map and a plurality of second sample images, with recovering the masked-off image blocks in each second sample image as the goal, to obtain an image recovery model.
11. The method according to claim 10, wherein the probability map comprises probabilities that respective pixel points in an image of a preset size belong to a target object;
the model training is performed by taking the image blocks masked off in each second sample image as targets to recover based on the probability atlas and the plurality of second sample images to obtain an image recovery model, and the method comprises the following steps:
negation operation is carried out on each probability included in the probability map to obtain a target map;
and performing model training by taking the image blocks masked out in the recovered second sample images as targets based on the target atlas and the plurality of second sample images to obtain an image recovery model.
12. A network model training apparatus comprising:
the first map determining module is used for generating a probability map corresponding to a target object based on the labeling information of a plurality of first sample images for the target object;
the pre-training module is used for carrying out model training by taking the image blocks which are masked off in the recovered second sample images as targets based on the probability map and the plurality of second sample images to obtain a pre-training model;
and the segmentation model training module is used for training the pre-training model based on the labeling information of the target object of the third sample images and the third sample images to obtain an image segmentation model for the target object.
13. The apparatus according to claim 12, wherein the probability map includes probabilities that respective pixel points in an image of a preset size belong to a target object;
the pre-training module is specifically configured to:
negation operation is carried out on each probability included in the probability map to obtain a target map;
and performing model training by taking the image blocks masked out in the recovered second sample images as targets based on the target atlas and the plurality of second sample images to obtain a pre-training model.
14. The apparatus of claim 13, wherein the pre-training module is specifically configured to:
for each second sample image, dividing the second sample image into a plurality of image blocks, and masking at least one image block;
inputting the residual image blocks of each second sample image into a pre-training model to be trained to obtain a prediction restoration image corresponding to each second sample image;
determining image restoration loss information based on each second sample image, each predicted and restored image and the target map;
and training the pre-training model to be trained by taking the image blocks masked out in each second sample image as targets based on the image recovery loss information to obtain the trained pre-training model.
15. The apparatus of claim 13 or 14, wherein the image of the preset size has the same resolution and size as the second sample image.
16. The apparatus according to any one of claims 12 to 15, wherein the first atlas determination module is specifically configured to:
for each first sample image, generating a mask image of the first sample image for a target object based on the labeling information of the first sample image for the target object;
and generating a probability map corresponding to the target object based on the mask image corresponding to each first sample image.
17. The apparatus of claim 16, wherein the first map determination module is specifically configured to:
performing a first preprocessing operation on the first sample image to obtain a first image with a preset resolution;
performing a second preprocessing operation on the first image to obtain a second image with a preset size;
and generating the mask image corresponding to the target object based on the labeling information of the first sample image for the target object and the second image.
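Claim 17 leaves the two preprocessing operations unspecified; one common pairing is resampling to a preset resolution followed by cropping or padding to a preset size. The second step can be sketched as below (the crop/pad choice and the top-left anchoring are assumptions).

```python
# Hypothetical second preprocessing for claim 17: force a 2-D image to a
# preset height/width by cropping overflow and zero-padding shortfall
# (top-left anchored for brevity; real pipelines often center the crop).
def crop_or_pad(image, out_h, out_w, fill=0):
    rows = [row[:out_w] + [fill] * max(out_w - len(row), 0)
            for row in image[:out_h]]
    rows += [[fill] * out_w for _ in range(out_h - len(rows))]
    return rows
```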
18. The apparatus of any one of claims 12 to 17, wherein the segmentation model training module is specifically configured to:
dividing each third sample image into a plurality of image blocks, and inputting the image blocks corresponding to each third sample image into the pre-training model to obtain a predicted segmented image corresponding to each third sample image;
determining image segmentation loss information based on the labeling information of each third sample image for the target object and each predicted segmented image;
and training the pre-training model based on the image segmentation loss information to obtain an image segmentation model for the target object.
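The block division in claim 18 (and in the masking pipeline of claim 14) presupposes splitting an image into non-overlapping blocks. A minimal sketch, assuming the image dimensions are divisible by the block size:

```python
# Illustrative block division for claims 14 and 18: split a 2-D image into
# non-overlapping bh x bw blocks, in row-major order (divisibility of the
# image dimensions by the block size is assumed).
def split_into_blocks(image, bh, bw):
    h, w = len(image), len(image[0])
    return [[row[x:x + bw] for row in image[y:y + bh]]
            for y in range(0, h, bh) for x in range(0, w, bw)]
```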
19. The apparatus of claim 18, wherein the image segmentation loss information comprises a plurality of classes of image segmentation sub-loss information;
the segmentation model training module is specifically configured to:
determining target loss information based on the image segmentation sub-loss information of the plurality of classes;
and training the pre-training model based on the target loss information to obtain an image segmentation model for the target object.
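Claim 19 does not fix how the sub-loss information of the plurality of classes is combined into the target loss; a weighted sum over the per-class sub-losses (e.g. cross-entropy and Dice terms) is one conventional choice, sketched here under that assumption.

```python
# Hypothetical combination rule for claim 19: the target loss as a weighted
# sum of per-class image segmentation sub-losses (the weights, and the sum
# itself, are assumptions; the claim names no combination rule).
def combine_sub_losses(sub_losses, weights=None):
    """sub_losses: list of scalar losses, one per class of sub-loss."""
    if weights is None:
        weights = [1.0] * len(sub_losses)
    return sum(w * l for w, l in zip(weights, sub_losses))
```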
20. A network model training apparatus comprising:
a second map determination module, configured to generate a probability map corresponding to a target object based on labeling information of a plurality of first sample images for the target object;
and a recovery model training module, configured to perform model training, based on the probability map and a plurality of second sample images, with the objective of recovering the image blocks masked out of the second sample images, to obtain an image recovery model.
21. The apparatus of claim 20, wherein the probability map comprises probabilities that respective pixel points in an image of a preset size belong to the target object;
the recovery model training module is specifically configured to:
performing a negation operation on each probability included in the probability map to obtain a target map;
and performing model training, based on the target map and the plurality of second sample images, with the objective of recovering the image blocks masked out of the second sample images, to obtain the image recovery model.
22. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 11.
23. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 11.
24. A computer program product comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method of any one of claims 1 to 11.
CN202210138535.XA 2022-02-15 2022-02-15 Network model training method, device, equipment, medium and program product Pending CN114463361A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210138535.XA CN114463361A (en) 2022-02-15 2022-02-15 Network model training method, device, equipment, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210138535.XA CN114463361A (en) 2022-02-15 2022-02-15 Network model training method, device, equipment, medium and program product

Publications (1)

Publication Number Publication Date
CN114463361A true CN114463361A (en) 2022-05-10

Family

ID=81412753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210138535.XA Pending CN114463361A (en) 2022-02-15 2022-02-15 Network model training method, device, equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN114463361A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474932A (en) * 2023-12-27 2024-01-30 苏州镁伽科技有限公司 Object segmentation method and device, electronic equipment and storage medium
CN117474932B (en) * 2023-12-27 2024-03-19 苏州镁伽科技有限公司 Object segmentation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112966742A (en) Model training method, target detection method and device and electronic equipment
CN113963110B (en) Texture map generation method and device, electronic equipment and storage medium
CN114186632A (en) Method, device, equipment and storage medium for training key point detection model
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN115565177B (en) Character recognition model training, character recognition method, device, equipment and medium
CN114549840A (en) Training method of semantic segmentation model and semantic segmentation method and device
CN114973279B (en) Training method and device for handwritten text image generation model and storage medium
CN113344089A (en) Model training method and device and electronic equipment
CN113393371A (en) Image processing method and device and electronic equipment
CN114511743B (en) Detection model training, target detection method, device, equipment, medium and product
CN115147680A (en) Pre-training method, device and equipment of target detection model
CN114078097A (en) Method and device for acquiring image defogging model and electronic equipment
CN114549904A (en) Visual processing and model training method, apparatus, storage medium, and program product
CN114494747A (en) Model training method, image processing method, device, electronic device and medium
CN114463361A (en) Network model training method, device, equipment, medium and program product
CN113361519B (en) Target processing method, training method of target processing model and device thereof
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN113379592A (en) Method and device for processing sensitive area in picture and electronic equipment
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN114445668A (en) Image recognition method and device, electronic equipment and storage medium
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN115641481A (en) Method and device for training image processing model and image processing
CN113947146A (en) Sample data generation method, model training method, image detection method and device
CN113361719A (en) Incremental learning method based on image processing model and image processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination