CN114022406A - Image segmentation method, system and terminal for semi-supervised learning - Google Patents

Image segmentation method, system and terminal for semi-supervised learning

Info

Publication number
CN114022406A
Authority
CN
China
Prior art keywords
image
data set
image segmentation
label
training
Prior art date
Legal status
Pending
Application number
CN202111082414.XA
Other languages
Chinese (zh)
Inventor
周志勇
戴亚康
刘燕
耿辰
胡冀苏
钱旭升
Current Assignee
Jinan Guoke Medical Engineering Technology Development Co ltd
Original Assignee
Jinan Guoke Medical Engineering Technology Development Co ltd
Priority date
Filing date
Publication date
Application filed by Jinan Guoke Medical Engineering Technology Development Co ltd filed Critical Jinan Guoke Medical Engineering Technology Development Co ltd
Priority to CN202111082414.XA priority Critical patent/CN114022406A/en
Publication of CN114022406A publication Critical patent/CN114022406A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image segmentation method, system and terminal for semi-supervised learning. An image segmentation network is first trained using a first data set of labeled images; a data set of unlabeled images is then input into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images; the pseudo-label data set is merged with the initial labeled data set to obtain a second data set; and the image segmentation network is trained again on the second data set, after which the next unlabeled image data set is input and pseudo labels are predicted for it. Each time pseudo labels are generated and added to the training set, the network keeps learning new features from the newly added training data, until a fully trained segmentation network is finally obtained. By jointly using a small amount of labeled data and a large amount of unlabeled data, the method trains the segmentation network effectively, improving tissue segmentation accuracy and reducing the dependence of deep-learning image segmentation methods on labeled data.

Description

Image segmentation method, system and terminal for semi-supervised learning
Technical Field
The application relates to the technical field of image segmentation processing, in particular to an image segmentation method, an image segmentation system and an image segmentation terminal for semi-supervised learning.
Background
In recent years, deep learning methods represented by deep convolutional neural networks have been able to extract large numbers of effective high-level features automatically by learning from large quantities of labeled samples, thereby improving tissue segmentation accuracy. Fully convolutional neural networks can process an entire image directly, realizing end-to-end image segmentation. In 2015, Ronneberger et al. proposed UNet and applied it to biomedical image segmentation, the first application of such a network in that field. Because UNet can combine high-level semantic information with low-level information, many researchers have in recent years applied UNet as a backbone network in numerous automatic tissue segmentation tasks. Çiçek et al. extended UNet to three-dimensional images. Christ et al. used two cascaded UNet models to achieve tissue and tumor segmentation. Liu et al. combined an improved UNet with an active-contour boundary evolution method to achieve accurate segmentation of tissue CT images.
As research progressed, researchers found that feature redundancy occurs when 3D UNet is used for image segmentation. An attention mechanism can increase the weight of effective features so that the network ignores irrelevant information and focuses on useful information, improving a deep network's ability to learn image features. Oktay et al. use an attention gating signal at the end of each skip-connection layer of UNet to control the importance of features at different spatial locations. Hu et al. proposed another attention mechanism, Squeeze-and-Excitation Networks (SENet), which adaptively recalibrates channel-wise feature responses by explicitly modeling the interdependencies between feature channels, boosting effective features and suppressing irrelevant ones. Roy et al., inspired by SENet, proposed a parallel spatial/channel squeeze-and-excitation module.
Currently, image segmentation tasks are mostly completed by deep learning in a fully supervised mode, and the performance of fully supervised deep learning depends to a large extent on the quantity and quality of the annotated data. In practice, however, often only a small number of the images available for training a neural network are labeled while a large number have no corresponding label, because annotating images is a time-consuming and labor-intensive process.
Disclosure of Invention
In order to solve the technical problems, the following technical scheme is provided:
in a first aspect, an embodiment of the present application provides an image segmentation method for semi-supervised learning, where the method includes: training an image segmentation network using a first dataset of tagged images; inputting a data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image; merging the pseudo label data set with the data set of the initial labeled image to obtain a second data set; and training the image segmentation network again by adopting the second data set, inputting the next unlabeled image data set after training is finished, and predicting to generate a pseudo label.
With this implementation, each time pseudo labels are generated and added to the training set, the network keeps learning new features from the newly added training data, until a fully trained segmentation network is finally obtained. The method realizes effective training of the segmentation network by jointly using a small amount of labeled data and a large amount of unlabeled data, thereby improving tissue segmentation accuracy and reducing the dependence of the deep-learning image segmentation method on labeled data.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the inputting a data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images includes: randomly dividing the data set of unlabeled images into n sub-data sets; inputting the sub-data sets into the image segmentation network in sequence, the image segmentation network successively predicting the unlabeled images in the sub-data sets to generate corresponding segmentation results; performing edge refinement processing on the segmentation results by using a fully connected conditional random field (Dense CRF); and adding the refined pseudo labels to the sub-data sets to obtain the pseudo-label data set.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the performing edge refinement processing on the segmentation results by using a fully connected conditional random field (Dense CRF) includes: representing the original image as a graph model of nodes, wherein each pixel is a node in the graph; determining the connecting edges formed by connecting each pixel with all other pixels; and assigning two pixels to the same label if their similarity is high, or to different labels if their similarity is low.
With reference to the first or second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the energy function of the Dense CRF is:

$$E(\mathbf{f}) = \sum_{i} \psi_u(f_i) + \sum_{i<j} \psi_p(f_i, f_j)$$

where $\psi_u(f_i) = -\log P(f_i)$ is the unary potential, i.e. the pixel-wise class probability obtained by the model through the softmax activation function; $f_i$ denotes the prediction obtained for pixel $i$ from the segmentation network, and $P(f_i)$ is the probability that the prediction result of pixel $i$ is $f_i$. The second term $\psi_p(f_i, f_j)$ is the binary potential between the predictions $f_i, f_j$ on pixels $i$ and $j$, describing the relationship between pixel points in the original image:

$$\psi_p(f_i, f_j) = \mu(f_i, f_j) \left[ w_1 \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\theta_\alpha^2} - \frac{\lVert y_i - y_j \rVert^2}{2\theta_\beta^2} \right) + w_2 \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\theta_\gamma^2} \right) \right]$$

where $\mu(f_i, f_j)$ is the label-compatibility term, equal to 1 when $f_i \neq f_j$ and 0 otherwise; $x$ denotes the position information of the pixels and $y$ the intensity value of a pixel, both provided by the original image.
With reference to the first aspect or any one of the first to third possible implementation manners of the first aspect, in a fourth possible implementation manner of the first aspect, the image segmentation network is a 3D scSE-UNet segmentation network, which is a U-shaped network comprising an encoding part and a decoding part, wherein: the encoding part extracts and analyses features of the input image at different levels and resolutions through convolution and downsampling operations; the decoding part fuses the pre-downsampling information through skip connections and generates, by upsampling, a feature map of the same size as the original image, gradually restoring the original image size.
With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, an scSE-block+ module is added at the end of each skip-connection layer of the decoding part, and the scSE-block+ module recalibrates the feature-channel weights, strengthening useful feature channels and suppressing irrelevant feature channels.
With reference to the fifth possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, the scSE-block+ module includes a channel SE module, cSE-block, and a spatial SE module, sSE-block; the cSE-block adds a new parallel channel attention based on global max pooling, and the two parallel channel attentions select different feature values when compressing the spatial information of the feature tensor into vectors.
With reference to the fourth possible implementation manner of the first aspect, in a seventh possible implementation manner of the first aspect, the 3D scSE-UNet model includes a plurality of downsampling layers in the coding path; in each downsampling layer, two convolutional layers extract image features at different levels, a ReLU function is used for activation in the convolution operations, and each is followed by a max-pooling layer with a preset stride that compresses the features to reduce the number of parameters.
With reference to the fourth or seventh possible implementation manner of the first aspect, in an eighth possible implementation manner of the first aspect, in the decoding path of the 3D scSE-UNet model, each layer contains an upsampling layer with a preset stride followed by two convolutional layers, with a ReLU layer connected after each convolutional layer; the feature map obtained in the encoding stage and the feature map of the same resolution obtained in the decoding stage are fused together through a skip connection, refining the image by combining shallow and deep features; the features fused by the skip connection are input into the scSE-block+ to suppress unimportant features and improve the accuracy of the segmentation result; and finally a convolutional layer outputs a predicted segmentation map whose number of channels equals the number of label classes.
In a second aspect, an embodiment of the present application provides an image segmentation system for semi-supervised learning, where the system includes: a training module for training an image segmentation network using a first dataset of tagged images; the first acquisition module is used for inputting a data set of a label-free image into the trained image segmentation network to acquire a pseudo label data set corresponding to the label-free image; the second acquisition module is used for combining the pseudo tag data set and the data set of the initial tagged image to obtain a second data set; and the third acquisition module is used for adopting the second data set to train the image segmentation network again, inputting the next batch of label-free image data sets after the training is finished, and predicting to generate a pseudo label.
Drawings
Fig. 1 is a schematic flowchart of an image segmentation method for semi-supervised learning according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram provided by an embodiment of the present application;
FIG. 3 is a schematic diagram provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an image segmentation system for semi-supervised learning according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a terminal according to an embodiment of the present application.
Detailed Description
The present invention will be described with reference to the accompanying drawings and embodiments.
Fig. 1 is a schematic flowchart of an image segmentation method for semi-supervised learning according to an embodiment of the present application, and referring to fig. 1, the image segmentation method for semi-supervised learning according to the present embodiment includes:
s101, training an image segmentation network by using a first data set with tagged images.
In this embodiment, X denotes a grayscale image and Y the label image corresponding to it. The method uses two data sets. One is a labeled image data set $D_L = \{X_L, Y_L\}$, in which the label images $Y_L$ are ground-truth annotations obtained by expert manual segmentation of the grayscale images $X_L$. The other is an unlabeled image data set $D_U = \{X_U\}$, which contains only grayscale images $X_U$ whose corresponding labels are unknown. By training the 3D scSE-UNet segmentation model, pseudo labels $Y_U$ are predicted for $X_U$ and added to the unlabeled data set $D_U$, giving $D'_U = \{X_U, Y_U\}$.
And S102, inputting the data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image.
In the self-training semi-supervised segmentation process, after the 3D scSE-UNet segmentation network extracts features through convolution and downsampling, the decoding stage gradually upsamples the feature map back to the input size, so that a segmentation result of the same size as the original image is output. However, some features are lost during the network's downsampling, which blurs the boundary of the segmentation result, and the 3D scSE-UNet model is also prone to errors when predicting unlabeled samples. Once such blurred or erroneous labels and their corresponding samples are added to the training set, they degrade the quality of the training data in subsequent rounds and reduce the effect of the next iteration of training, eventually causing errors to accumulate and even amplify. To limit error accumulation, the accuracy of the pseudo labels must be improved as far as possible.
To obtain more accurate pseudo labels $Y_U$, we post-process the predicted images with a fully connected conditional random field. The original image is represented as a graph model of nodes, in which each pixel is a node and every pixel is connected to all other pixels by connecting edges. For any two pixels in the image, if their similarity is high they are assigned to the same label, i.e. the probability of their being separated is low; if their similarity is low they are assigned to different labels, i.e. they are more likely to be separated. The Dense CRF is therefore introduced into the self-training process, and the pseudo label obtained in each self-training iteration is refined, yielding more accurate and detailed pseudo labels. The energy function of the Dense CRF is shown below.
$$E(\mathbf{f}) = \sum_{i} \psi_u(f_i) + \sum_{i<j} \psi_p(f_i, f_j)$$

where $\psi_u(f_i) = -\log P(f_i)$ is the unary potential, i.e. the pixel-wise class probability obtained by the model through the softmax activation function; $f_i$ denotes the prediction obtained for pixel $i$ from the segmentation network, and $P(f_i)$ is the probability that the prediction result of pixel $i$ is $f_i$. The second term $\psi_p(f_i, f_j)$ is the binary potential between the predictions $f_i, f_j$ on pixels $i$ and $j$, describing the relationship between pixel points in the original image:

$$\psi_p(f_i, f_j) = \mu(f_i, f_j) \left[ w_1 \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\theta_\alpha^2} - \frac{\lVert y_i - y_j \rVert^2}{2\theta_\beta^2} \right) + w_2 \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\theta_\gamma^2} \right) \right]$$

where $\mu(f_i, f_j)$ is the label-compatibility term, equal to 1 when $f_i \neq f_j$ and 0 otherwise; $x$ denotes the position information of the pixels and $y$ the intensity value of a pixel, both provided by the original image. The binary potential pays particular attention to pixels that have similar positions $x$ and similar intensities $y$ but different labels $f$. Combining the unary and binary potentials takes the relationships between pixels into account more comprehensively and yields an optimized result. The smaller the energy $E(\mathbf{f})$ of the Dense CRF, the more accurate the predicted class labels.
Specifically, when processing the segmentation results with the Dense CRF, the energy function $E(\mathbf{f})$ is iteratively minimized; each image is inferred through 5 iterations to find the class to which each voxel most probably belongs, giving the prediction result. Finally, each optimized segmentation result is added as a pseudo label to the self-training iterations. Optimizing the segmentation network's predictions with the Dense CRF improves the accuracy of the pseudo labels, reducing the error between the pseudo labels and the ground truth, avoiding the degradation in segmentation-model performance that excessive error accumulation would cause, ensuring the quality of the training set during self-training semi-supervised learning, and keeping the average gradient of the network's learning approximately correct.
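As an illustration of this refinement step, the sketch below uses the publicly available pydensecrf library on a single 2D slice; the kernel parameters (sxy, srgb, compat) are assumed values, not ones disclosed by this application, and a CT volume would have to be processed slice by slice or with the library's generic DenseCRF class.

```python
# Illustrative sketch of Dense CRF edge refinement with the pydensecrf library.
# Kernel parameters are assumptions, not values disclosed by this application.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def refine_with_dense_crf(image, softmax_probs, n_iters=5):
    """image: H x W x 3 uint8 slice; softmax_probs: C x H x W network output."""
    n_classes, h, w = softmax_probs.shape
    d = dcrf.DenseCRF2D(w, h, n_classes)
    # Unary potential psi_u(f_i) = -log P(f_i), taken from the softmax output.
    d.setUnaryEnergy(unary_from_softmax(softmax_probs))
    # Smoothness kernel: penalizes label changes between nearby pixels (position x).
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: pixels with similar position x and similar intensity y
    # prefer the same label, which sharpens the segmentation boundary.
    d.addPairwiseBilateral(sxy=50, srgb=5,
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(n_iters)  # 5 mean-field iterations, as in the text
    return np.argmax(np.array(q).reshape(n_classes, h, w), axis=0)
```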
S103, combining the pseudo label data set and the data set of the initial labeled image to obtain a second data set.
And S104, adopting the second data set to train the image segmentation network again, inputting the next non-label image data set after training is finished, and predicting to generate a pseudo label.
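Taken together, steps S101 to S104 form a self-training loop. A minimal sketch of that loop follows, assuming a model object exposing fit/predict methods and an optional refine callback for the Dense CRF step of S102; all names here are illustrative assumptions, not the application's reference implementation, as is the accumulation of pseudo-labeled subsets across rounds.

```python
# Minimal sketch of the self-training loop of steps S101-S104. The `model`
# interface (fit/predict) and the accumulation of pseudo-labeled subsets
# across rounds are illustrative assumptions.
import numpy as np

def self_train(model, labeled_set, unlabeled_images, n_splits=5, refine=None):
    """labeled_set: list of (image, label) pairs; unlabeled_images: list of images."""
    train_set = list(labeled_set)
    model.fit(train_set)                       # S101: train on labeled data set D_L
    rng = np.random.default_rng(seed=0)
    subsets = np.array_split(rng.permutation(len(unlabeled_images)), n_splits)
    for subset in subsets:                     # one batch of unlabeled images per round
        batch = [unlabeled_images[i] for i in subset]
        pseudo = [model.predict(img) for img in batch]   # S102: predict pseudo labels
        if refine is not None:                 # optional Dense CRF edge refinement
            pseudo = [refine(img, p) for img, p in zip(batch, pseudo)]
        train_set += list(zip(batch, pseudo))  # S103: merge with the labeled data
        model.fit(train_set)                   # S104: retrain, then next batch
    return model
```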
Because image boundaries are blurred, gradients are complex, and more high-resolution information is needed, the image segmentation network in this embodiment is a 3D scSE-UNet segmentation network; 3D UNet contains skip connections, which serve this purpose well. The network structure is shown in Fig. 2, where the numbers above the blocks represent the number of channels of the feature maps.
The overall framework of the 3D scSE-UNet is similar to that of 3D UNet: it also adopts a U-shaped structure comprising two parts, an encoding part and a decoding part. The encoding part extracts and analyses features of the input image at different levels and resolutions through convolution and downsampling operations, essentially as in 3D UNet, although the low-resolution information after downsampling is no longer recognizable to the human eye. The corresponding decoding part on the right fuses the pre-downsampling information through skip connections and generates, by upsampling, a feature map of the same size as the original image, gradually restoring the original image size. We improve the decoding part by adding an scSE-block+ at the end of each skip-connection layer. This module recalibrates the feature-channel weights, strengthening useful feature channels and suppressing irrelevant ones, thereby improving the segmentation network's ability to learn effective features automatically.
As shown in Fig. 3, the scSE-block+ module in the embodiment of the present application is the sum of a channel SE block (cSE-block) and a spatial SE block (sSE-block), with the cSE-block improved: we add a new parallel channel attention based on global max pooling. The two parallel channel attentions select different feature values when compressing the spatial information of the feature tensor into vectors. The global max-pooling layer takes the maximum over the whole feature map and preserves image texture and edge features well, because image edges tend to produce the largest feature values. The global average-pooling layer takes the average of the whole feature map as output, emphasizing downsampling of the feature as a whole, and preserves the background well. For extracting image edge information, the global max-pooling layer is therefore more effective than the global average-pooling layer. Using the two layers in parallel retains both edge and background characteristics, with a clear benefit to model performance.
Specifically, the cSE-block measures the importance of each channel by compressing the spatial information, exciting only in the channel direction. The module passes the H × W × S × C feature map simultaneously through a global average-pooling layer and a global max-pooling layer in parallel; the two pooling layers each compress the global spatial information on every channel into a single tensor value, producing two 1 × 1 × 1 × C sets of feature values. A three-dimensional convolution with a 1 × 1 × 1 kernel and C channels is then applied to each; the convolution results pass through the nonlinear activation function ReLU, and the resulting outputs pass through the same convolution operation again, finally giving two 1 × 1 × 1 × C tensors of the same dimensions but different values. After the two tensors are added, each value is normalized to the range [0, 1] by a sigmoid layer. Multiplying these values with the original feature matrix noticeably suppresses the information in unimportant channels while leaving the information in important channels almost unchanged, which in effect promotes the extraction of effective features.
For image segmentation, pixel-wise spatial information is more informative, so an sSE-block is introduced in parallel. It measures the importance of spatial locations by compressing the channel information, squeezing along the channels and exciting spatially. For an input feature map, the module performs the spatial squeeze by convolution, normalizes the result with a sigmoid, and finally multiplies it with the original feature tensor. Thus, if the information contained at a spatial location is unimportant, it is multiplied by a smaller value and suppressed; otherwise it is not suppressed.
The improved cSE-block and the sSE-block are applied to the original feature map separately to generate recalibrated feature maps; the two feature maps are added, and the final feature map is output through an activation function. In this way, the scSE-block+ recalibrates the feature map along the channel and spatial dimensions respectively before combining the outputs, effectively exploiting the feature map's useful information in both the spatial and the channel sense, providing the network with finer information and helping to improve the model's representation.
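A PyTorch sketch of one plausible reading of the scSE-block+ described above follows; the weight sharing between the two pooling branches and the absence of channel reduction are assumptions taken from the wording of the text, and the text's final activation after the sum is left implicit.

```python
# Sketch of scSE-block+ as described above; weight sharing between the two
# pooling branches and the absence of channel reduction are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCSEBlockPlus3D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Two 1x1x1 convolutions with C channels, shared by both pooling branches.
        self.fc = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=1),
        )
        self.spatial = nn.Conv3d(channels, 1, kernel_size=1)  # sSE channel squeeze

    def forward(self, x):
        # cSE-block+: compress spatial info by global average AND max pooling.
        avg = self.fc(F.adaptive_avg_pool3d(x, 1))   # keeps background statistics
        mx = self.fc(F.adaptive_max_pool3d(x, 1))    # keeps edge/texture responses
        cse = x * torch.sigmoid(avg + mx)            # channel recalibration in [0, 1]
        # sSE-block: squeeze along channels, excite spatially.
        sse = x * torch.sigmoid(self.spatial(x))
        return cse + sse                             # combine both recalibrated maps
```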
In this embodiment, the 3D scSE-UNet segmentation network contains 4 downsampling layers in the coding path. Each downsampling layer uses two convolutional layers to extract image features at different levels, with ReLU activation in the convolution operations, followed by a stride-2 max-pooling layer that compresses the features to reduce the number of parameters. In the decoding path, each layer contains an upsampling layer with stride 2 followed by two 3 × 3 × 3 convolutional layers, each still followed by a ReLU layer. The feature map obtained in the encoding stage and the feature map of the same resolution obtained in the decoding stage are fused together through a skip connection, refining the image by combining shallow and deep features. The features fused by the skip connection are input into the scSE-block+ to suppress unimportant features and improve the accuracy of the segmentation result. The last layer is a 1 × 1 × 1 convolutional layer that outputs a predicted segmentation map whose number of channels equals the number of label classes.
Table 3.1 3D scSE-UNet network architecture

(The table is provided as an image in the original publication and is not reproduced here.)
The second column of the table gives the output feature size and number of channels of the current layer. In the third column, brackets [ ] denote a convolution operation; "3 × 3 × 3, 8" denotes a convolutional layer with 3 × 3 × 3 kernels and 8 channels. The "+" in "Dropout_1 + UpSampling3D_1" indicates that Concatenate_1 is a skip connection between Dropout_1 and UpSampling3D_1.
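Since Table 3.1 is only available as an image, the skeleton below sketches the coding/decoding layout described above, reusing SCSEBlockPlus3D from the previous sketch; the base width of 8 channels follows the "3 × 3 × 3, 8" entry quoted above, while the deeper channel widths are assumptions.

```python
# Skeleton of the 3D scSE-UNet coding/decoding paths described above; channel
# widths beyond the first layer are assumptions, not the application's table.
import torch
import torch.nn as nn

def double_conv(cin, cout):
    # Two 3x3x3 convolutional layers, each followed by a ReLU layer.
    return nn.Sequential(
        nn.Conv3d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class SCSEUNet3D(nn.Module):
    def __init__(self, in_ch=1, n_classes=2, base=8):
        super().__init__()
        chs = [base * 2 ** i for i in range(5)]          # e.g. 8, 16, 32, 64, 128
        self.enc = nn.ModuleList(
            [double_conv(in_ch, chs[0])] +
            [double_conv(chs[i], chs[i + 1]) for i in range(4)])
        self.pool = nn.MaxPool3d(2)                      # stride-2 feature compression
        self.up = nn.ModuleList(
            [nn.ConvTranspose3d(chs[i + 1], chs[i], 2, stride=2) for i in range(4)])
        self.scse = nn.ModuleList([SCSEBlockPlus3D(2 * chs[i]) for i in range(4)])
        self.dec = nn.ModuleList([double_conv(2 * chs[i], chs[i]) for i in range(4)])
        self.head = nn.Conv3d(chs[0], n_classes, 1)      # 1x1x1 prediction layer

    def forward(self, x):
        skips = []
        for enc in self.enc[:-1]:                        # 4 downsampling layers
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.enc[-1](x)                              # bottleneck
        for i in reversed(range(4)):                     # decoding path
            x = torch.cat([skips[i], self.up[i](x)], dim=1)  # skip-connection fusion
            x = self.dec[i](self.scse[i](x))             # scSE-block+, then convs
        return self.head(x)                              # channels = label classes
```

With inputs resampled to 128 × 128 × 64 as described below, every dimension divides cleanly through the four stride-2 poolings.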
Table 3.2 3D scSE-UNet training parameter settings

(The table is provided as an image in the original publication and is not reproduced here; the parameter values it lists are described below.)

In the training process, the data set of unlabeled images is randomly divided into 5 sub-data sets, which are input into the segmentation model in turn for prediction, generating the corresponding segmentation results. The Dice loss function is used as the objective function during training, as shown in Equation 3.2, where $N$ denotes the total number of pixels in an image, $p_i$ the probability that the $i$-th pixel is predicted as foreground, and $g_i$ the true label of the $i$-th pixel.
$$L_{\mathrm{Dice}} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i} \qquad (3.2)$$
An Adam optimizer is used to implement the gradient-descent algorithm and find the network parameters that minimize the error function. To avoid differences in the input image size affecting the accuracy of liver-region segmentation across the compared methods, and in view of GPU memory constraints, the original images are resampled to 128 × 128 × 64. Based on repeated parameter-tuning experiments, the batch size is set to 1 and the number of epochs to 150, training stopping when the maximum epoch count is reached; the network learning rate is set to 0.0001.
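A direct transcription of Equation 3.2 and of the quoted training settings might look as follows; the smoothing constant eps is an assumed numerical safeguard that the text does not mention.

```python
# Dice loss of Equation 3.2; `eps` is an assumed guard against division by zero.
import torch

def dice_loss(p, g, eps=1e-6):
    """p: predicted foreground probabilities; g: binary ground-truth labels."""
    p, g = p.reshape(-1), g.reshape(-1)
    return 1.0 - (2.0 * (p * g).sum() + eps) / (p.sum() + g.sum() + eps)

# Training settings quoted above (model construction is assumed):
# model = SCSEUNet3D()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# inputs resampled to 128 x 128 x 64, batch size 1, 150 epochs
```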
The above embodiments provide a semi-supervised tissue segmentation method based on a self-training 3D scSE-UNet segmentation network. The proposed 3D scSE-UNet is characterized by the introduction of an improved scSE-block+ into 3D UNet. Compared with the classical scSE-block, the scSE-block+ better retains image edge information by including a global max pooling (GMP), further improving the network's ability to learn edge features and benefiting segmentation accuracy. In addition, a Dense CRF is added to the self-training process to perform edge refinement on the predicted pseudo labels, improving the accuracy of the pseudo labels generated by the semi-supervised segmentation network. The method realizes effective training of the 3D scSE-UNet by jointly using a small amount of labeled data and a large amount of unlabeled data, improving tissue segmentation accuracy and reducing the dependence of deep-learning image segmentation methods on labeled data.
Corresponding to the image segmentation method for semi-supervised learning provided by the above embodiment, the present application also provides an embodiment of an image segmentation system for semi-supervised learning.
Referring to fig. 4, the image segmentation system 20 for semi-supervised learning includes: a training module 201, a first acquisition module 202, a second acquisition module 203, and a third acquisition module 204.
The training module 201 is configured to train the image segmentation network using the first data set with the labeled image. The first obtaining module 202 is configured to input a data set of an unlabeled image into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled image. The second obtaining module 203 is configured to combine the pseudo tag data set and the data set of the initial tagged image to obtain a second data set. The third obtaining module 204 is configured to train the image segmentation network again by using the second data set, and input a next batch of unlabeled image data sets after the training is completed, so as to predict and generate a pseudo label.
The present application further provides an embodiment of a terminal, and referring to fig. 5, the terminal 30 includes: a processor 301, a memory 302, and a communication interface 303.
In fig. 5, the processor 301, the memory 302, and the communication interface 303 may be connected to each other by a bus; the bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The processor 301 generally controls the overall functions of the terminal 30, such as the start-up of the terminal 30 and the training of the image segmentation network using the first data set of tagged images after the terminal 30 is started up; inputting a data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image; merging the pseudo label data set with the data set of the initial labeled image to obtain a second data set; and training the image segmentation network again by adopting the second data set, inputting the next unlabeled image data set after training is finished, and predicting to generate a pseudo label.
The same and similar parts among the various embodiments in the specification of the present application may be referred to each other. Especially, for the system and terminal embodiments, since the method therein is basically similar to the method embodiments, the description is relatively simple, and the relevant points can be referred to the description in the method embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. An image segmentation method for semi-supervised learning, the method comprising:
training an image segmentation network using a first dataset of tagged images;
inputting a data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image;
merging the pseudo label data set with the data set of the initial labeled image to obtain a second data set;
and training the image segmentation network again by adopting the second data set, inputting the next unlabeled image data set after training is finished, and predicting to generate a pseudo label.
2. The image segmentation method for semi-supervised learning according to claim 1, wherein the inputting of the data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images comprises:
randomly dividing a data set of the unlabeled image into n sub-data sets;
inputting the sub-data sets into the image segmentation network in sequence, the image segmentation network successively predicting the unlabeled images in the sub-data sets to generate corresponding segmentation results;
performing edge refinement processing on the segmentation results by using a fully connected conditional random field (Dense CRF);
and adding the pseudo labels obtained by the refinement to the sub-data sets to obtain the pseudo-label data set.
3. The image segmentation method for semi-supervised learning according to claim 2, wherein the performing edge refinement processing on the segmentation results by using a fully connected conditional random field (Dense CRF) comprises:
representing the original image as a graph model of nodes, wherein each pixel is a node in the graph;
determining the connecting edges formed by connecting each pixel with all other pixels;
and assigning two pixels to the same label if their similarity is high, or to different labels if their similarity is low.
4. The image segmentation method for semi-supervised learning according to claim 2 or 3, wherein the energy function of the Dense CRF is:

$$E(\mathbf{f}) = \sum_{i} \psi_u(f_i) + \sum_{i<j} \psi_p(f_i, f_j)$$

where $\psi_u(f_i) = -\log P(f_i)$ is the unary potential, i.e. the pixel-wise class probability obtained by the model through the softmax activation function; $f_i$ denotes the prediction obtained for pixel $i$ from the segmentation network, and $P(f_i)$ is the probability that the prediction result of pixel $i$ is $f_i$; the second term $\psi_p(f_i, f_j)$ is the binary potential between the predictions $f_i, f_j$ on pixels $i$ and $j$, describing the relationship between pixel points in the original image:

$$\psi_p(f_i, f_j) = \mu(f_i, f_j) \left[ w_1 \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\theta_\alpha^2} - \frac{\lVert y_i - y_j \rVert^2}{2\theta_\beta^2} \right) + w_2 \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\theta_\gamma^2} \right) \right]$$

where $\mu(f_i, f_j)$ is the label-compatibility term, equal to 1 when $f_i \neq f_j$ and 0 otherwise; $x$ denotes the position information of the pixels and $y$ the intensity value of a pixel, both provided by the original image.
5. The image segmentation method for semi-supervised learning according to any one of claims 1 to 4, wherein the image segmentation network is a 3D scSE-UNet segmentation network, and the 3D scSE-UNet segmentation network is a U-shaped structure network and comprises an encoding and decoding part, wherein:
the coding part extracts and analyzes the features of different levels and resolutions of the input image through convolution and downsampling operations;
the decoding part fuses the pre-downsampling information through skip connections and generates, by upsampling, a feature map of the same size as the original image, gradually restoring the original image size.
6. The image segmentation method for semi-supervised learning according to claim 5, wherein an scSE-block+ module is added at the end of each skip-connection layer of the decoding part, and the scSE-block+ module recalibrates the feature-channel weights, strengthening useful feature channels and suppressing irrelevant feature channels.
7. The semi-supervised learning image segmentation method of claim 6, wherein the scSE-block+ module comprises a channel SE module, cSE-block, and a spatial SE module, sSE-block; the cSE-block adds a new parallel channel attention based on global max pooling, and the two parallel channel attentions select different feature values when compressing the spatial information of the feature tensor into vectors.
8. The image segmentation method for semi-supervised learning according to claim 5, wherein the 3D scSE-UNet model comprises a plurality of downsampling layers in the coding path; in each downsampling layer, two convolutional layers extract image features at different levels, a ReLU function is used for activation in the convolution operations, and each is followed by a max-pooling layer with a preset stride that compresses the features to reduce the number of parameters.
9. The image segmentation method for semi-supervised learning according to claim 5 or 8, wherein, in the decoding path of the 3D scSE-UNet model, each layer comprises an upsampling layer with a preset stride followed by two convolutional layers, with a ReLU layer connected after each convolutional layer; the feature map obtained in the encoding stage and the feature map of the same resolution obtained in the decoding stage are fused together through a skip connection, refining the image by combining shallow and deep features; the features fused by the skip connection are input into the scSE-block+ to suppress unimportant features and improve the accuracy of the segmentation result; and finally a convolutional layer outputs a predicted segmentation map whose number of channels equals the number of label classes.
10. An image segmentation system for semi-supervised learning, the system comprising:
a training module for training an image segmentation network using a first dataset of tagged images;
the first acquisition module is used for inputting a data set of a label-free image into the trained image segmentation network to acquire a pseudo label data set corresponding to the label-free image;
the second acquisition module is used for combining the pseudo tag data set and the data set of the initial tagged image to obtain a second data set;
and the third acquisition module is used for adopting the second data set to train the image segmentation network again, inputting the next batch of label-free image data sets after the training is finished, and predicting to generate a pseudo label.
CN202111082414.XA 2021-09-15 2021-09-15 Image segmentation method, system and terminal for semi-supervised learning Pending CN114022406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111082414.XA CN114022406A (en) 2021-09-15 2021-09-15 Image segmentation method, system and terminal for semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111082414.XA CN114022406A (en) 2021-09-15 2021-09-15 Image segmentation method, system and terminal for semi-supervised learning

Publications (1)

Publication Number Publication Date
CN114022406A true CN114022406A (en) 2022-02-08

Family

ID=80054430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111082414.XA Pending CN114022406A (en) 2021-09-15 2021-09-15 Image segmentation method, system and terminal for semi-supervised learning

Country Status (1)

Country Link
CN (1) CN114022406A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758125A (en) * 2022-03-31 2022-07-15 江苏庆慈机械制造有限公司 Gear surface defect detection method and system based on deep learning
CN114782384A (en) * 2022-04-28 2022-07-22 东南大学 Heart chamber image segmentation method and device based on semi-supervision method
CN114862770A (en) * 2022-04-18 2022-08-05 华南理工大学 Gastric cancer pathological section image segmentation prediction method based on SEnet
CN115049945A (en) * 2022-06-10 2022-09-13 安徽农业大学 Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image
CN115147426A (en) * 2022-09-06 2022-10-04 北京大学 Model training and image segmentation method and system based on semi-supervised learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232416A (en) * 2020-10-16 2021-01-15 浙江大学 Semi-supervised learning method based on pseudo label weighting
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN112381098A (en) * 2020-11-19 2021-02-19 上海交通大学 Semi-supervised learning method and system based on self-learning in target segmentation field
CN113159048A (en) * 2021-04-23 2021-07-23 杭州电子科技大学 Weak supervision semantic segmentation method based on deep learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232416A (en) * 2020-10-16 2021-01-15 浙江大学 Semi-supervised learning method based on pseudo label weighting
CN112381098A (en) * 2020-11-19 2021-02-19 上海交通大学 Semi-supervised learning method and system based on self-learning in target segmentation field
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN113159048A (en) * 2021-04-23 2021-07-23 杭州电子科技大学 Weak supervision semantic segmentation method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘清清 (Liu Qingqing): "Research on automatic segmentation methods for liver and tumor CT images based on semi-supervised deep learning", China Masters' Theses Full-text Database (Electronic Journal), Medicine & Health Sciences, no. 2021, 15 August 2021 (2021-08-15), pages 25-31 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758125A (en) * 2022-03-31 2022-07-15 江苏庆慈机械制造有限公司 Gear surface defect detection method and system based on deep learning
CN114862770A (en) * 2022-04-18 2022-08-05 华南理工大学 Gastric cancer pathological section image segmentation prediction method based on SEnet
CN114862770B (en) * 2022-04-18 2024-05-14 华南理工大学 SENet-based gastric cancer pathological section image segmentation prediction method
CN114782384A (en) * 2022-04-28 2022-07-22 东南大学 Heart chamber image segmentation method and device based on semi-supervision method
CN114782384B (en) * 2022-04-28 2024-06-18 东南大学 Cardiac chamber image segmentation method and device based on semi-supervision method
CN115049945A (en) * 2022-06-10 2022-09-13 安徽农业大学 Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image
CN115049945B (en) * 2022-06-10 2023-10-20 安徽农业大学 Unmanned aerial vehicle image-based wheat lodging area extraction method and device
CN115147426A (en) * 2022-09-06 2022-10-04 北京大学 Model training and image segmentation method and system based on semi-supervised learning
CN115147426B (en) * 2022-09-06 2022-11-29 北京大学 Model training and image segmentation method and system based on semi-supervised learning

Similar Documents

Publication Publication Date Title
CN114022406A (en) Image segmentation method, system and terminal for semi-supervised learning
EP3298576B1 (en) Training a neural network
CN109299373B (en) Recommendation system based on graph convolution technology
CN108062754B (en) Segmentation and identification method and device based on dense network image
CN109711481B (en) Neural networks for drawing multi-label recognition, related methods, media and devices
AU2019451948B2 (en) Real-time video ultra resolution
Chen et al. Nas-dip: Learning deep image prior with neural architecture search
CN111079532A (en) Video content description method based on text self-encoder
CN111382555B (en) Data processing method, medium, device and computing equipment
CN111832570A (en) Image semantic segmentation model training method and system
US20220254146A1 (en) Method for filtering image feature points and terminal
CN112634296A (en) RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
US11948281B2 (en) Guided up-sampling for image inpainting
CN111931779A (en) Image information extraction and generation method based on condition predictable parameters
CN111696110A (en) Scene segmentation method and system
CN113111716B (en) Remote sensing image semiautomatic labeling method and device based on deep learning
CN113435430B (en) Video behavior identification method, system and equipment based on self-adaptive space-time entanglement
CN113284155A (en) Video object segmentation method and device, storage medium and electronic equipment
CN116612280A (en) Vehicle segmentation method, device, computer equipment and computer readable storage medium
CN113393435B (en) Video saliency detection method based on dynamic context sensing filter network
CN111914949B (en) Zero sample learning model training method and device based on reinforcement learning
CN117593275A (en) Medical image segmentation system
CN115082840B (en) Action video classification method and device based on data combination and channel correlation
CN116844032A (en) Target detection and identification method, device, equipment and medium in marine environment
CN116957964A (en) Small sample image generation method and system based on diffusion model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination