CN114022406A - Image segmentation method, system and terminal for semi-supervised learning - Google Patents
- Publication number
- CN114022406A (application number CN202111082414.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- data set
- image segmentation
- label
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0012 — Biomedical image inspection
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2415 — Classification techniques based on parametric or probabilistic models
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06T7/12 — Edge-based segmentation
- G06T2207/10081 — Computed x-ray tomography [CT]
- G06T2207/20021 — Dividing image into blocks, subimages or windows
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30096 — Tumor; Lesion
Abstract
The application discloses an image segmentation method, system, and terminal for semi-supervised learning. An image segmentation network is trained with a first data set of labeled images; a data set of unlabeled images is input into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images; the pseudo-label data set is merged with the initial labeled-image data set to obtain a second data set; and the image segmentation network is trained again with the second data set, after which the next unlabeled image data set is input and pseudo labels are predicted. Each time pseudo labels are generated and added to the training set, the network continues to learn new features from the new training data until a fully trained segmentation network is finally obtained. By jointly using a small amount of labeled data and a large amount of unlabeled data, the method trains the segmentation network effectively, improving tissue segmentation accuracy and reducing the dependence of deep-learning image segmentation methods on labeled data.
Description
Technical Field
The application relates to the technical field of image segmentation processing, and in particular to an image segmentation method, system, and terminal for semi-supervised learning.
Background
In recent years, deep learning methods represented by deep convolutional neural networks have been able to automatically extract a large number of effective high-level features by learning from large quantities of labeled samples, thereby improving tissue segmentation accuracy. Fully convolutional neural networks can process the whole image directly, realizing end-to-end image segmentation. In 2015, Ronneberger et al. proposed UNet and first applied it to biomedical image segmentation. Because UNet can combine high-level semantic information with low-level information, many researchers at home and abroad have since used UNet as a backbone network in automatic tissue segmentation tasks. Çiçek et al. extended UNet to three-dimensional images. Christ et al. used two cascaded UNet models to segment tissue and tumors. Liu et al. combined an improved UNet with an active-contour boundary-evolution method to achieve accurate segmentation of tissue CT images.
As research progressed, researchers found that feature redundancy occurs when 3D UNet is used for image segmentation. An attention mechanism can increase the weight of effective features, so that the network ignores irrelevant information and focuses on effective information, improving a deep network's ability to learn image features. Oktay et al. applied an attention gating signal at the end of each skip-connection layer of UNet to control the importance of features at different spatial locations. Hu et al. proposed another attention mechanism, Squeeze-and-Excitation Networks (SENet), which adaptively recalibrates channel-wise feature responses by explicitly modeling the interdependencies between feature channels, boosting effective features and suppressing irrelevant ones. Roy et al., inspired by SENet, proposed a parallel spatial/channel squeeze-and-excitation module.
Currently, deep learning completes the image segmentation task mostly in a fully supervised mode, and the performance of fully supervised deep learning depends to a large extent on the quantity and quality of the annotation data. In practice, only a small fraction of the images available for training a neural network are labeled, while a large number of images have no corresponding label; annotating images is a time-consuming and labor-intensive process.
Disclosure of Invention
In order to solve the technical problems, the following technical scheme is provided:
In a first aspect, an embodiment of the present application provides an image segmentation method for semi-supervised learning, the method including: training an image segmentation network using a first data set of labeled images; inputting a data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images; merging the pseudo-label data set with the initial labeled-image data set to obtain a second data set; and training the image segmentation network again with the second data set, then inputting the next unlabeled image data set and predicting pseudo labels.
With this implementation, each time pseudo labels are generated and added to the training set, the network continues to learn new features from the new training data until a fully trained segmentation network is finally obtained. By jointly using a small amount of labeled data and a large amount of unlabeled data, the method trains the segmentation network effectively, improving tissue segmentation accuracy and reducing the dependence of deep-learning image segmentation methods on labeled data.
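The iterative scheme above can be sketched end-to-end with a toy stand-in for the segmentation network (a brightness threshold fitted to the labels); all function names and data here are hypothetical illustrations, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(images, labels):
    # Toy "training": choose the threshold that best reproduces the labels.
    best_t, best_acc = 0.5, -1.0
    for t in np.linspace(0.1, 0.9, 17):
        acc = np.mean([np.mean((img > t) == lab) for img, lab in zip(images, labels)])
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def predict(t, images):
    # Pseudo-label generation: run the trained model on unlabeled images.
    return [(img > t).astype(int) for img in images]

# First data set: a few labeled images (label = bright region).
labeled_imgs = [rng.random((8, 8)) for _ in range(4)]
labeled_lbls = [(img > 0.6).astype(int) for img in labeled_imgs]

# Unlabeled pool, split into batches (sub-data sets).
unlabeled_batches = [[rng.random((8, 8)) for _ in range(4)] for _ in range(3)]

train_imgs, train_lbls = list(labeled_imgs), list(labeled_lbls)
for batch in unlabeled_batches:
    t = train(train_imgs, train_lbls)   # (re)train on the current training set
    pseudo = predict(t, batch)          # predict pseudo labels for this batch
    train_imgs += batch                 # merge the pseudo-labeled data
    train_lbls += pseudo                # into the training set
final_t = train(train_imgs, train_lbls)
print(len(train_imgs), round(final_t, 2))   # training set has grown to 16 images
```

The loop mirrors the claimed steps: train, pseudo-label the next unlabeled batch, merge, retrain.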
With reference to the first aspect, in a first possible implementation of the first aspect, inputting a data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images includes: randomly dividing the data set of unlabeled images into n sub-data sets; inputting the sub-data sets into the image segmentation network in sequence, the image segmentation network successively predicting the unlabeled images in each sub-data set to generate the corresponding segmentation results; performing edge refinement on the segmentation results with a fully connected conditional random field (Dense CRF); and adding the refined pseudo labels to the sub-data set to obtain the pseudo-label data set.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, performing edge refinement on the segmentation result with the fully connected conditional random field (Dense CRF) includes: representing the original image as a graph model in which each pixel is a node; forming connecting edges between each pixel and all other pixels; assigning two pixels to the same label if their similarity is high; or assigning them to different labels if their similarity is low.
With reference to the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the energy function of the Dense CRF is:

E(f) = Σ_i ψ_u(f_i) + Σ_{i<j} ψ_p(f_i, f_j)

where ψ_u(f_i) = −log P(f_i) is the unary potential, i.e. the pixel-wise class probability obtained by the model through the softmax activation function; f_i denotes the prediction obtained for pixel i by the segmentation network, and P(f_i) is the probability of the prediction for pixel i. The second term ψ_p(f_i, f_j) is the binary potential over the predictions f_i, f_j at pixels i and j, used to describe the relationship between the pixels of the original image:

ψ_p(f_i, f_j) = μ(f_i, f_j) [ w₁ exp(−|x_i − x_j|²/(2θ_α²) − |y_i − y_j|²/(2θ_β²)) + w₂ exp(−|x_i − x_j|²/(2θ_γ²)) ]

where μ(f_i, f_j) is the label-compatibility term, with μ(f_i, f_j) = 1 when f_i ≠ f_j and 0 otherwise; x denotes the position information of a pixel and y its intensity value, both provided by the original image.
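The energy can be evaluated directly. In the sketch below, `dense_crf_energy` is a hypothetical helper, and the pairwise kernel weights and bandwidths are illustrative assumptions (the patent gives no values); it scores a tiny image and shows that an intensity-coherent labeling has lower energy than a noisy one:

```python
import numpy as np
from itertools import combinations

def dense_crf_energy(probs, labels, pos, intensity,
                     w1=1.0, w2=1.0, theta_a=3.0, theta_b=0.5, theta_g=3.0):
    """E(f) = sum_i psi_u(f_i) + sum_{i<j} psi_p(f_i, f_j).

    probs:     (N, C) softmax class probabilities (source of the unary term)
    labels:    (N,) labeling f being scored
    pos:       (N, d) pixel positions x
    intensity: (N,) pixel intensities y
    """
    # Unary potential: psi_u(f_i) = -log P(f_i)
    unary = -np.log(probs[np.arange(len(labels)), labels]).sum()

    # Pairwise potential over the fully connected graph:
    # mu(f_i, f_j) * (appearance kernel + smoothness kernel)
    pairwise = 0.0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:          # mu = 0 when labels agree
            continue
        d2 = np.sum((pos[i] - pos[j]) ** 2)
        dy2 = (intensity[i] - intensity[j]) ** 2
        appearance = w1 * np.exp(-d2 / (2 * theta_a**2) - dy2 / (2 * theta_b**2))
        smoothness = w2 * np.exp(-d2 / (2 * theta_g**2))
        pairwise += appearance + smoothness
    return unary + pairwise

# 2x2 image: left column dark, right column bright.
pos = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
intensity = np.array([0.1, 0.9, 0.1, 0.9])
probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])

coherent = np.array([0, 1, 0, 1])   # labels follow the intensity pattern
noisy    = np.array([0, 1, 1, 0])   # labels fight the image
print(dense_crf_energy(probs, coherent, pos, intensity) <
      dense_crf_energy(probs, noisy, pos, intensity))
```

Lower energy for the coherent labeling is exactly what drives the edge refinement: minimizing E(f) pushes similar neighboring pixels toward the same label.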
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the image segmentation network is a 3D scSE-UNet segmentation network, which is a U-shaped network comprising an encoding part and a decoding part, wherein: the encoding part extracts and analyzes features of different levels and resolutions of the input image through convolution and downsampling operations; and the decoding part fuses the pre-downsampling information through skip connections and generates, via upsampling, a feature map of the same size as the original image, gradually restoring the image to its original size.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, an scSE-block+ module is added at the end of each skip-connection layer of the decoding part; the scSE-block+ module recalibrates the feature-channel weights, strengthening useful feature channels and suppressing irrelevant ones.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the scSE-block+ module includes a channel SE module (cSE-block) and a spatial SE module (sSE-block); the cSE-block adds a new parallel channel attention based on global max pooling, and the two parallel channel attentions select different feature tensors when compressing the spatial information and squeezing the feature tensors into vectors.
With reference to the fourth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the 3D scSE-UNet model includes a plurality of downsampling layers in the encoding path; in each downsampling layer, two convolutional layers extract image features of a different level, the convolution operations are activated with the ReLU function, and each is followed by a max-pooling layer with a preset stride that compresses the features and reduces the number of parameters.
With reference to the fourth or seventh possible implementation of the first aspect, in an eighth possible implementation of the first aspect, in the decoding path of the 3D scSE-UNet model, each layer includes an upsampling layer with a preset stride followed by two convolutional layers, each convolutional layer followed by a ReLU layer; the feature map obtained in the encoding stage and the same-resolution feature map obtained in the decoding stage are fused together through a skip connection, refining the image by combining shallow and deep features; the fused features are fed into scSE-block+ to suppress unimportant features and improve the accuracy of the segmentation result; and finally a convolutional layer outputs a prediction segmentation map whose number of channels equals the number of label categories.
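At the level of tensor shapes, the skip-connection fusion described above can be sketched as follows; max pooling stands in for the encoder's downsampling and nearest-neighbour repetition for the decoder's upsampling, and the channel counts are illustrative assumptions:

```python
import numpy as np

def max_pool2(x):
    # 2x2x2 max pooling with stride 2 on a (C, D, H, W) feature map.
    c, d, h, w = x.shape
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).max(axis=(2, 4, 6))

def upsample2(x):
    # Nearest-neighbour upsampling with stride 2 (stand-in for transposed conv).
    return x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)

def skip_fuse(encoder_feat, decoder_feat):
    # Skip connection: concatenate the same-resolution encoder map with the
    # upsampled decoder map along the channel axis, combining shallow and
    # deep features.
    return np.concatenate([encoder_feat, upsample2(decoder_feat)], axis=0)

enc = np.random.default_rng(1).random((8, 16, 16, 16))   # encoder map, 8 channels
deep = max_pool2(enc)                                    # downsampled: (8, 8, 8, 8)
fused = skip_fuse(enc, deep)                             # (16, 16, 16, 16)
print(deep.shape, fused.shape)
```

The fused map keeps the encoder's spatial resolution while doubling the channel count, which is what the subsequent convolutions and scSE-block+ then process.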
In a second aspect, an embodiment of the present application provides an image segmentation system for semi-supervised learning, the system including: a training module for training an image segmentation network using a first data set of labeled images; a first acquisition module for inputting a data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images; a second acquisition module for merging the pseudo-label data set with the initial labeled-image data set to obtain a second data set; and a third acquisition module for training the image segmentation network again with the second data set, then inputting the next batch of unlabeled image data and predicting pseudo labels.
Drawings
Fig. 1 is a schematic flowchart of an image segmentation method for semi-supervised learning according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the 3D scSE-UNet network structure according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the scSE-block+ module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an image segmentation system for semi-supervised learning according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a terminal according to an embodiment of the present application.
Detailed Description
The present invention will be described with reference to the accompanying drawings and embodiments.
Fig. 1 is a schematic flowchart of an image segmentation method for semi-supervised learning according to an embodiment of the present application, and referring to fig. 1, the image segmentation method for semi-supervised learning according to the present embodiment includes:
S101, training an image segmentation network using a first data set of labeled images.
In this embodiment, X denotes a grayscale image and Y the label image corresponding to it. The method uses two data sets. One is a labeled-image data set D_L = {X_L, Y_L}, in which the label image Y_L is a ground-truth annotation from an expert, i.e. a manual segmentation of the grayscale image X_L. The other is an unlabeled-image data set D_U = {X_U}, which contains only grayscale images X_U whose corresponding labels are unknown. By training the 3D scSE-UNet segmentation model, a pseudo label Y_U is predicted for X_U and added to the unlabeled data set D_U, yielding D′_U = {X_U, Y_U}.
And S102, inputting the data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image.
In the self-training semi-supervised segmentation process, after the 3D scSE-UNet segmentation network extracts features through convolution and downsampling, the decoding stage can gradually upsample the feature map back to the size of the input image and output a segmentation result of the same size as the original image. However, some features are lost during the network's downsampling, so the boundaries of the segmentation result are blurred, and the 3D scSE-UNet model easily makes errors when predicting unlabeled samples. Once such blurred or wrong labels and their corresponding samples are added to the training set, they affect the quality of the training data in subsequent rounds and degrade the next training iteration, eventually causing errors to accumulate and even be amplified. To reduce the effect of error accumulation, the accuracy of the pseudo labels must be improved as far as possible.
To obtain more accurate pseudo labels Y_U, a fully connected conditional random field is used to post-process the predicted images. The original image is represented as a graph model in which each pixel is a node, and every pixel is connected to all other pixels by connecting edges. For any two pixels in the image, if their similarity is high they tend to be assigned the same label, i.e. they are unlikely to be separated; if their similarity is low they tend to be assigned different labels, i.e. they are more likely to be separated. The Dense CRF is therefore introduced into the self-training process, and the pseudo label obtained in each self-training iteration is refined, yielding more accurate and detailed pseudo labels. The energy function of the Dense CRF is as follows.
E(f) = Σ_i ψ_u(f_i) + Σ_{i<j} ψ_p(f_i, f_j)

where ψ_u(f_i) = −log P(f_i) is the unary potential, i.e. the pixel-wise class probability obtained by the model through the softmax activation function; f_i denotes the prediction obtained for pixel i by the segmentation network, and P(f_i) is the probability of that prediction. The second term ψ_p(f_i, f_j) is the binary potential over the predictions f_i, f_j at pixels i and j, used to describe the relationship between the pixels of the original image:

ψ_p(f_i, f_j) = μ(f_i, f_j) [ w₁ exp(−|x_i − x_j|²/(2θ_α²) − |y_i − y_j|²/(2θ_β²)) + w₂ exp(−|x_i − x_j|²/(2θ_γ²)) ]

where μ(f_i, f_j) is the label-compatibility term, with μ(f_i, f_j) = 1 when f_i ≠ f_j and 0 otherwise; x denotes the position information of a pixel and y its intensity value, both provided by the original image. The binary potential pays closer attention to pixels with similar positions x and similar intensities y but different labels f. Combining the unary and binary potentials takes the relationships between pixels into account more comprehensively and yields an optimized result. The smaller the Dense CRF energy E(f), the more accurate the predicted class labeling.
Specifically, when processing the segmentation result with the Dense CRF, the energy function E(f) is iteratively minimized; each image is inferred through 5 iterations to find the most probable class for each voxel, giving the prediction result. Finally, each optimized segmentation result is added as a pseudo label to the next self-training iteration. Optimizing the segmentation network's predictions with the Dense CRF improves the accuracy of the pseudo labels, reducing the error between pseudo labels and ground truth, avoiding the performance loss caused by excessive error accumulation, guaranteeing the quality of the training set throughout the self-training semi-supervised learning process, and keeping the average gradient learned by the network approximately correct.
S103, combining the pseudo label data set and the data set of the initial labeled image to obtain a second data set.
And S104, training the image segmentation network again with the second data set, inputting the next unlabeled image data set after training is finished, and predicting pseudo labels.
Because image boundaries are blurred and gradients are complex, segmentation needs more high-resolution information; the image segmentation network in this embodiment is therefore a 3D scSE-UNet, since the skip connections of 3D UNet meet this need well. The network structure is shown in Fig. 2, where the numbers above the blocks denote the number of channels of the feature maps.
The overall framework of 3D scSE-UNet is similar to that of 3D UNet: it also adopts a U-shaped structure consisting of two parts, an encoding part and a decoding part. The encoding part extracts and analyzes features of different levels and resolutions of the input image through convolution and downsampling operations, essentially as in 3D UNet, although the low-resolution information after downsampling is no longer interpretable by the human eye. The corresponding decoding part on the right fuses the pre-downsampling information through skip connections and generates, via upsampling, a feature map of the same size as the original image, gradually restoring the image to its original size. The decoding part is improved here by adding an scSE-block+ at the end of each skip-connection layer. This module recalibrates the weights of the feature channels, strengthening useful channels and suppressing irrelevant ones, thereby improving the segmentation network's ability to learn effective features automatically.
As shown in Fig. 3, the scSE-block+ module in this embodiment of the application is the sum of a channel SE block (cSE-block) and a spatial SE block (sSE-block), with the cSE-block improved: a new parallel channel attention based on global max pooling is added to the original cSE-block. The two parallel channel attentions select different feature tensors when compressing the spatial information and squeezing the feature tensors into vectors. The global max-pooling layer takes the maximum over the whole feature map; because image edges tend to produce the largest feature values, it preserves image texture and edge features well. The global average-pooling layer outputs the average of the whole feature map, emphasizing downsampling of the feature as a whole, and preserves the background well. For extracting image edge information, therefore, the global max-pooling layer is more effective than the global average-pooling layer; using the two layers in parallel retains both edge and background characteristics and brings a clear improvement in model performance.
Specifically, cSE-block measures the importance of each channel by compressing the spatial information, exciting only along the channel direction. The module passes the H × W × S × C feature map through a global average-pooling layer and a global max-pooling layer in parallel; the two pooling layers each compress the global spatial information of every channel into a single tensor value, producing two 1 × 1 × 1 × C feature vectors. A three-dimensional convolution with a 1 × 1 × 1 kernel and C channels is then applied to each, the convolution results are passed through the nonlinear activation function ReLU, the outputs go through the same convolution again, and finally two 1 × 1 × 1 × C tensors of the same dimensions but different values are obtained. After the two tensors are added, each value is normalized to the range [0, 1] by a sigmoid layer. Multiplying the result with the original feature map clearly suppresses the information in unimportant channels while leaving the information in important channels almost unchanged, in effect promoting the extraction of effective features.
For image segmentation, pixel-wise spatial information is more informative, so an sSE-block is introduced in parallel. It measures the importance of each spatial location by compressing the channel information, squeezing along the channel direction and exciting spatially. For an input feature map, the module performs the spatial squeeze through a convolution, normalizes the result with a sigmoid, and finally multiplies it with the original feature tensor. Thus, if the information at a spatial location is unimportant, it is multiplied by a small value and suppressed; otherwise it is left unsuppressed.
The original feature map is passed through the improved cSE-block and the sSE-block respectively to generate two recalibrated feature maps, the two maps are added, and the final feature map is output through an activation function. In this way, scSE-block+ recalibrates the feature map along the channel and spatial dimensions separately before merging the outputs, effectively exploiting the useful information of the feature map in both space and channel, providing finer information to the network and helping to improve the model's representation.
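A minimal numerical sketch of scSE-block+ following the description above, with dense layers standing in for the 1 × 1 × 1 convolutions, random illustrative weights, and the final activation after the addition omitted; this is a hedged illustration, not the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cse_plus(x, w1, w2):
    """Improved channel SE: parallel global average and global max pooling,
    a shared conv -> ReLU -> conv excitation, tensors added, then sigmoid."""
    c = x.shape[0]
    avg = x.reshape(c, -1).mean(axis=1)      # 1x1x1xC from average pooling
    mx = x.reshape(c, -1).max(axis=1)        # 1x1x1xC from max pooling
    def excite(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # dense layers stand in for convs
    gate = sigmoid(excite(avg) + excite(mx)) # add tensors, normalize to [0, 1]
    return x * gate[:, None, None, None]     # recalibrate the channels

def sse(x, w):
    # Spatial SE: squeeze channels with a 1x1x1 conv (weights w), sigmoid, scale.
    gate = sigmoid(np.tensordot(w, x, axes=1))   # (D, H, W) spatial map
    return x * gate[None]

def scse_plus(x, w1, w2, ws):
    # scSE-block+: channel and spatial recalibrations, added together.
    return cse_plus(x, w1, w2) + sse(x, ws)

rng = np.random.default_rng(0)
c, d, h, wd = 4, 2, 3, 3
x = rng.random((c, d, h, wd))                 # (C, D, H, W) feature map
w1 = rng.standard_normal((c, c))
w2 = rng.standard_normal((c, c))
ws = rng.standard_normal(c)
out = scse_plus(x, w1, w2, ws)
print(out.shape)
```

The channel gate lies in (0, 1), so multiplying by it can only attenuate a channel, which is exactly the suppression behavior the module is meant to provide.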
In this embodiment, the 3D scSE-UNet segmentation network contains 4 downsampling levels in the encoding path. Each level extracts image features of a different scale with two convolutional layers activated by the ReLU function, followed by a max pooling layer with a stride of 2 that compresses the features and reduces the number of parameters. In the decoding path, each level contains an upsampling layer with a stride of 2 followed by two 3 × 3 × 3 convolutional layers, each still followed by a ReLU layer. The feature map obtained in the encoding stage and the feature map of the same resolution obtained in the decoding stage are fused through a skip connection, refining the image by combining shallow and deep features. The fused features are then input into the scSE-block+ to suppress unimportant features and improve the accuracy of the segmentation result. The last layer is a 1 × 1 × 1 convolutional layer whose number of output channels equals the number of label categories, producing the predicted segmentation map.
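The resolutions implied by the 4 stride-2 pooling steps can be checked with a short sketch, assuming the 128 × 128 × 64 input volume to which images are resampled elsewhere in this embodiment.

```python
def encoder_shapes(shape=(128, 128, 64), levels=4):
    """Feature-map sizes along the encoding path: each of the
    `levels` max-pooling steps with stride 2 halves every axis."""
    shapes = [shape]
    for _ in range(levels):
        shape = tuple(s // 2 for s in shape)  # stride-2 max pooling
        shapes.append(shape)
    return shapes

# encoder_shapes() -> [(128, 128, 64), (64, 64, 32), (32, 32, 16),
#                      (16, 16, 8), (8, 8, 4)]
```

The decoding path reverses this sequence with its stride-2 upsampling layers, which is why the skip connections always find an encoder feature map of matching resolution.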
Table 3.1 3D scSE-UNet network architecture
The second column of the table gives the size and number of channels of the output feature map of the current layer. In the third column, [ ] denotes a convolution operation; "3 × 3 × 3, 8" denotes a convolutional layer with 3 × 3 × 3 kernels and 8 channels. The "+" in "Dropout_1 + UpSampling3D_1" indicates that Concatenate_1 is a skip connection between Dropout_1 and UpSampling3D_1.
Table 3.2 3D scSE-UNet training parameter settings
The parameter settings used in the experiments are listed in Table 3.2. During training, the data set of unlabeled images is randomly divided into 5 sub-data sets, which are input into the segmentation model in sequence for prediction, generating the corresponding segmentation results. The Dice loss is used as the objective function during training, as shown in formula 3.2, where N denotes the total number of pixels in an image, p_i the probability that the i-th pixel is predicted as foreground, and g_i the ground-truth label of the i-th pixel.
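Formula 3.2 is not reproduced in this text; the sketch below assumes the commonly used Dice-loss form 1 − 2·Σp_i·g_i / (Σp_i + Σg_i), with a small smoothing term to avoid division by zero.

```python
import numpy as np

def dice_loss(p, g, eps=1e-6):
    """Dice loss over one image.

    p: predicted foreground probabilities, g: ground-truth labels
    (both flattened over the N pixels). Assumes the common
    1 - 2*sum(p*g)/(sum(p)+sum(g)) formulation.
    """
    p, g = p.ravel(), g.ravel()
    dice = (2.0 * np.sum(p * g) + eps) / (np.sum(p) + np.sum(g) + eps)
    return 1.0 - dice
```

A perfect prediction gives a loss near 0, while predicting foreground everywhere the label is background drives the loss toward 1.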
An Adam optimizer implements the gradient descent algorithm to find the network parameters that minimize the error function. To prevent differing input image sizes from affecting the accuracy of liver-region segmentation across methods, and to keep GPU memory usage manageable, the original images are resampled to 128 × 128 × 64. Based on repeated parameter-tuning experiments, the batch size is set to 1, the number of epochs to 150 (training stops when the maximum epoch count is reached), and the network learning rate to 0.0001.
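For reference, a single Adam update with the quoted learning rate can be sketched as follows; this is a hypothetical single-parameter illustration, whereas the embodiment would rely on a deep learning framework's built-in optimizer.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (lr = 0.0001 as quoted above).

    m, v are the running first/second moment estimates and t is the
    1-based step count; returns the new weight and moments.
    """
    m = b1 * m + (1 - b1) * grad          # biased first moment
    v = b2 * v + (1 - b2) * grad ** 2     # biased second moment
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

On the first step the bias-corrected moments make the update approximately lr · sign(grad), so the effective step size is bounded by the learning rate.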
The above embodiments provide a semi-supervised tissue segmentation method based on a self-trained 3D scSE-UNet segmentation network. The proposed 3D scSE-UNet introduces an improved scSE-block+ into the 3D UNet. Compared with the classical scSE-block, the scSE-block+ better retains image edge information by including global max pooling (GMP), further improving the capability of learning edge features and helping raise the segmentation precision of the network. A Dense CRF is added to the self-training process to perform edge refinement on the predicted pseudo labels, improving the accuracy of the pseudo labels generated by the semi-supervised segmentation network. The method trains the 3D scSE-UNet effectively with a small amount of labeled data and a large amount of unlabeled data, improving tissue segmentation precision while reducing the dependence of deep-learning image segmentation methods on labeled data.
Corresponding to the image segmentation method for semi-supervised learning provided by the above embodiment, the present application also provides an embodiment of an image segmentation system for semi-supervised learning.
Referring to fig. 4, the image segmentation system 20 for semi-supervised learning includes: a training module 201, a first acquisition module 202, a second acquisition module 203, and a third acquisition module 204.
The training module 201 is configured to train the image segmentation network using the first data set with the labeled image. The first obtaining module 202 is configured to input a data set of an unlabeled image into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled image. The second obtaining module 203 is configured to combine the pseudo tag data set and the data set of the initial tagged image to obtain a second data set. The third obtaining module 204 is configured to train the image segmentation network again by using the second data set, and input a next batch of unlabeled image data sets after the training is completed, so as to predict and generate a pseudo label.
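The interplay of the four modules can be sketched as a self-training loop. Here `train`, `predict`, and `refine_with_dense_crf` are hypothetical stand-ins for the network operations described above, not APIs from the embodiment.

```python
def self_training(model, labeled, unlabeled_batches,
                  train, predict, refine_with_dense_crf):
    """Schematic of the four-module pipeline: train on labeled data,
    then repeatedly pseudo-label a batch of unlabeled images, merge
    it with the training set, and retrain."""
    train(model, labeled)                          # training module
    dataset = list(labeled)
    for batch in unlabeled_batches:                # next batch each round
        pseudo = [(x, refine_with_dense_crf(predict(model, x)))
                  for x in batch]                  # first obtaining module
        dataset = dataset + pseudo                 # second obtaining module
        train(model, dataset)                      # third obtaining module
    return model
```

Whether the pseudo-labeled sets accumulate across rounds or each round merges only with the initial labeled set is a design choice; the sketch assumes accumulation.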
The present application further provides an embodiment of a terminal, and referring to fig. 5, the terminal 30 includes: a processor 301, a memory 302, and a communication interface 303.
In fig. 5, the processor 301, the memory 302, and the communication interface 303 may be connected to each other by a bus; the bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The processor 301 generally controls the overall functions of the terminal 30, such as the start-up of the terminal 30 and the training of the image segmentation network using the first data set of tagged images after the terminal 30 is started up; inputting a data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image; merging the pseudo label data set with the data set of the initial labeled image to obtain a second data set; and training the image segmentation network again by adopting the second data set, inputting the next unlabeled image data set after training is finished, and predicting to generate a pseudo label.
The same and similar parts among the various embodiments in the specification of the present application may be referred to each other. Especially, for the system and terminal embodiments, since the method therein is basically similar to the method embodiments, the description is relatively simple, and the relevant points can be referred to the description in the method embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Claims (10)
1. An image segmentation method for semi-supervised learning, the method comprising:
training an image segmentation network using a first dataset of tagged images;
inputting a data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image;
merging the pseudo label data set with the data set of the initial labeled image to obtain a second data set;
and training the image segmentation network again by adopting the second data set, inputting the next unlabeled image data set after training is finished, and predicting to generate a pseudo label.
2. The image segmentation method for semi-supervised learning according to claim 1, wherein the inputting of the data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images comprises:
randomly dividing a data set of the unlabeled image into n sub-data sets;
inputting the sub-data sets into the image segmentation network in sequence, the image segmentation network successively predicting the unlabeled images in the sub-data sets to generate corresponding segmentation results;
performing edge refinement processing on the segmentation result by using a fully connected conditional random field Dense CRF;
and adding the pseudo labels obtained by the refinement to the sub-data set to obtain a pseudo label data set.
3. The image segmentation method for semi-supervised learning according to claim 2, wherein the edge refinement processing on the segmentation result by using a fully connected conditional random field Dense CRF comprises:
representing an original image as a node graph model, wherein each pixel is a node in the graph;
determining a connecting edge formed by connecting each pixel with all pixels;
if the similarity of two pixels is higher, assigning the two pixels to the same label; or, if the similarity of the two pixels is lower, assigning them to different labels.
4. The image segmentation method for semi-supervised learning according to claim 2 or 3, wherein the energy function of the Dense CRF is as follows:
E(f) = Σ_i ψ_u(f_i) + Σ_{i<j} ψ_p(f_i, f_j)
wherein ψ_u(f_i) = −log P(f_i) represents the unary potential energy, namely the pixel-wise class probability obtained by the model through the activation function softmax; f_i represents the prediction obtained for pixel i after segmentation by the network, and P(f_i) is the probability of the prediction result of pixel i; the second term ψ_p(f_i, f_j) is the binary potential energy over the prediction results f_i, f_j on pixel i and pixel j, and is used to describe the relationship between pixel points in the original image;
wherein μ(f_i, f_j) is the label compatibility term, with μ(f_i, f_j) = 1 when f_i ≠ f_j and 0 otherwise; x represents the position information between pixels, y represents the intensity value of a pixel, and both x and y are provided by the original image.
5. The image segmentation method for semi-supervised learning according to any one of claims 1 to 4, wherein the image segmentation network is a 3D scSE-UNet segmentation network, and the 3D scSE-UNet segmentation network is a U-shaped structure network and comprises an encoding and decoding part, wherein:
the coding part extracts and analyzes the features of different levels and resolutions of the input image through convolution and downsampling operations;
the decoding part is connected by jumping to fuse the information before down sampling, and generates a feature map with the same size as the original image by up sampling, thereby gradually restoring the original size of the image.
6. The image segmentation method for semi-supervised learning according to claim 5, wherein a scSE-block + module is added at the end of each jump connection layer of the decoding part, and the scSE-block + module recalibrates the weight of the feature channel, strengthens the useful feature channel, and suppresses the irrelevant feature channel.
7. The semi-supervised learning image segmentation method of claim 6, wherein the scSE-block + module comprises a channel SE module cSE-block and a space SE module sSE-block; the cSE-block adds a new parallel global max-pooling based channel attention, and the two parallel channel attentions select different feature tensors when compressing the spatial information and compressing the feature tensors into vectors, respectively.
8. The image segmentation method for semi-supervised learning according to claim 5, wherein in the encoding path the 3D scSE-UNet model comprises a plurality of downsampling layers, each downsampling layer extracting image features of a different level with two convolutional layers activated by a ReLU function, each followed by a max pooling layer with a preset stride that compresses the features and reduces the number of parameters.
9. The image segmentation method for semi-supervised learning according to claim 5 or 8, wherein in the decoding path of the 3D scSE-UNet model each layer comprises an upsampling layer with a preset stride followed by two convolutional layers, each convolutional layer being followed by a ReLU layer; the feature map obtained in the encoding stage and the feature map of the same resolution obtained in the decoding stage are fused through a skip connection, refining the image by combining shallow and deep features; the fused features are input into the scSE-block+ to suppress unimportant features so as to improve the accuracy of the segmentation result; and the last convolutional layer outputs a predicted segmentation map whose number of channels equals the number of label categories.
10. An image segmentation system for semi-supervised learning, the system comprising:
a training module for training an image segmentation network using a first dataset of tagged images;
the first acquisition module is used for inputting a data set of a label-free image into the trained image segmentation network to acquire a pseudo label data set corresponding to the label-free image;
the second acquisition module is used for combining the pseudo tag data set and the data set of the initial tagged image to obtain a second data set;
and the third acquisition module is used for adopting the second data set to train the image segmentation network again, inputting the next batch of label-free image data sets after the training is finished, and predicting to generate a pseudo label.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111082414.XA CN114022406A (en) | 2021-09-15 | 2021-09-15 | Image segmentation method, system and terminal for semi-supervised learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111082414.XA CN114022406A (en) | 2021-09-15 | 2021-09-15 | Image segmentation method, system and terminal for semi-supervised learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114022406A true CN114022406A (en) | 2022-02-08 |
Family
ID=80054430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111082414.XA Pending CN114022406A (en) | 2021-09-15 | 2021-09-15 | Image segmentation method, system and terminal for semi-supervised learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114022406A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114758125A (en) * | 2022-03-31 | 2022-07-15 | 江苏庆慈机械制造有限公司 | Gear surface defect detection method and system based on deep learning |
CN114782384A (en) * | 2022-04-28 | 2022-07-22 | 东南大学 | Heart chamber image segmentation method and device based on semi-supervision method |
CN114862770A (en) * | 2022-04-18 | 2022-08-05 | 华南理工大学 | Gastric cancer pathological section image segmentation prediction method based on SEnet |
CN115049945A (en) * | 2022-06-10 | 2022-09-13 | 安徽农业大学 | Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image |
CN115147426A (en) * | 2022-09-06 | 2022-10-04 | 北京大学 | Model training and image segmentation method and system based on semi-supervised learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232416A (en) * | 2020-10-16 | 2021-01-15 | 浙江大学 | Semi-supervised learning method based on pseudo label weighting |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN112381098A (en) * | 2020-11-19 | 2021-02-19 | 上海交通大学 | Semi-supervised learning method and system based on self-learning in target segmentation field |
CN113159048A (en) * | 2021-04-23 | 2021-07-23 | 杭州电子科技大学 | Weak supervision semantic segmentation method based on deep learning |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232416A (en) * | 2020-10-16 | 2021-01-15 | 浙江大学 | Semi-supervised learning method based on pseudo label weighting |
CN112381098A (en) * | 2020-11-19 | 2021-02-19 | 上海交通大学 | Semi-supervised learning method and system based on self-learning in target segmentation field |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN113159048A (en) * | 2021-04-23 | 2021-07-23 | 杭州电子科技大学 | Weak supervision semantic segmentation method based on deep learning |
Non-Patent Citations (1)
Title |
---|
LIU Qingqing: "Research on automatic segmentation methods for liver and tumor CT images based on semi-supervised deep learning", China Master's Theses Full-text Database (Electronic Journal), Medicine & Health Sciences, no. 2021, 15 August 2021 (2021-08-15), pages 25 - 31 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114758125A (en) * | 2022-03-31 | 2022-07-15 | 江苏庆慈机械制造有限公司 | Gear surface defect detection method and system based on deep learning |
CN114862770A (en) * | 2022-04-18 | 2022-08-05 | 华南理工大学 | Gastric cancer pathological section image segmentation prediction method based on SEnet |
CN114862770B (en) * | 2022-04-18 | 2024-05-14 | 华南理工大学 | SENet-based gastric cancer pathological section image segmentation prediction method |
CN114782384A (en) * | 2022-04-28 | 2022-07-22 | 东南大学 | Heart chamber image segmentation method and device based on semi-supervision method |
CN114782384B (en) * | 2022-04-28 | 2024-06-18 | 东南大学 | Cardiac chamber image segmentation method and device based on semi-supervision method |
CN115049945A (en) * | 2022-06-10 | 2022-09-13 | 安徽农业大学 | Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image |
CN115049945B (en) * | 2022-06-10 | 2023-10-20 | 安徽农业大学 | Unmanned aerial vehicle image-based wheat lodging area extraction method and device |
CN115147426A (en) * | 2022-09-06 | 2022-10-04 | 北京大学 | Model training and image segmentation method and system based on semi-supervised learning |
CN115147426B (en) * | 2022-09-06 | 2022-11-29 | 北京大学 | Model training and image segmentation method and system based on semi-supervised learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114022406A (en) | Image segmentation method, system and terminal for semi-supervised learning | |
EP3298576B1 (en) | Training a neural network | |
CN109299373B (en) | Recommendation system based on graph convolution technology | |
CN108062754B (en) | Segmentation and identification method and device based on dense network image | |
CN109711481B (en) | Neural networks for drawing multi-label recognition, related methods, media and devices | |
AU2019451948B2 (en) | Real-time video ultra resolution | |
Chen et al. | Nas-dip: Learning deep image prior with neural architecture search | |
CN111079532A (en) | Video content description method based on text self-encoder | |
CN111382555B (en) | Data processing method, medium, device and computing equipment | |
CN111832570A (en) | Image semantic segmentation model training method and system | |
US20220254146A1 (en) | Method for filtering image feature points and terminal | |
CN112634296A (en) | RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism | |
US11948281B2 (en) | Guided up-sampling for image inpainting | |
CN111931779A (en) | Image information extraction and generation method based on condition predictable parameters | |
CN111696110A (en) | Scene segmentation method and system | |
CN113111716B (en) | Remote sensing image semiautomatic labeling method and device based on deep learning | |
CN113435430B (en) | Video behavior identification method, system and equipment based on self-adaptive space-time entanglement | |
CN113284155A (en) | Video object segmentation method and device, storage medium and electronic equipment | |
CN116612280A (en) | Vehicle segmentation method, device, computer equipment and computer readable storage medium | |
CN113393435B (en) | Video saliency detection method based on dynamic context sensing filter network | |
CN111914949B (en) | Zero sample learning model training method and device based on reinforcement learning | |
CN117593275A (en) | Medical image segmentation system | |
CN115082840B (en) | Action video classification method and device based on data combination and channel correlation | |
CN116844032A (en) | Target detection and identification method, device, equipment and medium in marine environment | |
CN116957964A (en) | Small sample image generation method and system based on diffusion model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||