CN114022406A - Image segmentation method, system and terminal for semi-supervised learning - Google Patents
- Publication number
- CN114022406A (application number CN202111082414.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- data set
- image segmentation
- label
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0012 — Biomedical image inspection
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2415 — Classification techniques based on parametric or probabilistic models
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06T7/12 — Edge-based segmentation
- G06T2207/10081 — Computed x-ray tomography [CT]
- G06T2207/20021 — Dividing image into blocks, subimages or windows
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30096 — Tumor; Lesion
Abstract
The application discloses an image segmentation method, system, and terminal for semi-supervised learning. An image segmentation network is trained with a first data set of labeled images; a data set of unlabeled images is input into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images; the pseudo-label data set is merged with the initial labeled-image data set to obtain a second data set; and the image segmentation network is trained again with the second data set, after which the next unlabeled image data set is input and pseudo labels are predicted. Each time pseudo labels are generated and added to the training set, the network continues to learn new features from the new training data until a fully trained segmentation network is finally obtained. By jointly using a small amount of labeled data and a large amount of unlabeled data, the method trains the segmentation network effectively, improving tissue segmentation accuracy and reducing the dependence of deep-learning image segmentation methods on labeled data.
Description
Technical Field
The application relates to the technical field of image segmentation processing, and in particular to an image segmentation method, system, and terminal for semi-supervised learning.
Background
In recent years, deep learning methods represented by deep convolutional neural networks have been able to automatically extract a large number of effective high-level features by learning from large quantities of labeled samples, thereby improving tissue segmentation accuracy. Fully convolutional neural networks can process the whole image directly, realizing end-to-end image segmentation. In 2015, Ronneberger et al. proposed UNet and first applied it to biomedical image segmentation. Because UNet can combine high-level semantic information with low-level information, many researchers at home and abroad have since used UNet as a backbone network in automatic tissue segmentation tasks. Çiçek et al. extended UNet to three-dimensional images. Christ et al. used two cascaded UNet models to segment tissue and tumors. Liu et al. combined an improved UNet with an active-contour boundary-evolution method to achieve accurate segmentation of tissue CT images.
As research progressed, researchers found that feature redundancy occurs when 3D UNet is used for image segmentation. An attention mechanism can increase the weight of effective features, so that the network ignores irrelevant information and focuses on effective information, improving a deep network's ability to learn image features. Oktay et al. applied an attention gating signal at the end of each skip-connection layer of UNet to control the importance of features at different spatial locations. Hu et al. proposed another attention mechanism, Squeeze-and-Excitation Networks (SENet), which adaptively recalibrates channel-wise feature responses by explicitly modeling the interdependencies between feature channels, boosting effective features and suppressing irrelevant ones. Roy et al., inspired by SENet, proposed a parallel spatial/channel squeeze-and-excitation module.
Currently, deep learning completes the image segmentation task mostly in a fully supervised mode, and the performance of fully supervised deep learning depends to a large extent on the quantity and quality of the annotation data. In practice, only a small fraction of the images available for training a neural network are labeled, while a large number of images have no corresponding label; annotating images is a time-consuming and labor-intensive process.
Disclosure of Invention
In order to solve the technical problems, the following technical scheme is provided:
In a first aspect, an embodiment of the present application provides an image segmentation method for semi-supervised learning, the method including: training an image segmentation network using a first data set of labeled images; inputting a data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images; merging the pseudo-label data set with the initial labeled-image data set to obtain a second data set; and training the image segmentation network again with the second data set, then inputting the next unlabeled image data set and predicting pseudo labels.
With this implementation, each time pseudo labels are generated and added to the training set, the network continues to learn new features from the new training data until a fully trained segmentation network is finally obtained. By jointly using a small amount of labeled data and a large amount of unlabeled data, the method trains the segmentation network effectively, improving tissue segmentation accuracy and reducing the dependence of deep-learning image segmentation methods on labeled data.
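The iterative scheme above can be sketched end-to-end with a toy stand-in for the segmentation network (a brightness threshold fitted to the labels); all function names and data here are hypothetical illustrations, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(images, labels):
    # Toy "training": choose the threshold that best reproduces the labels.
    best_t, best_acc = 0.5, -1.0
    for t in np.linspace(0.1, 0.9, 17):
        acc = np.mean([np.mean((img > t) == lab) for img, lab in zip(images, labels)])
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def predict(t, images):
    # Pseudo-label generation: run the trained model on unlabeled images.
    return [(img > t).astype(int) for img in images]

# First data set: a few labeled images (label = bright region).
labeled_imgs = [rng.random((8, 8)) for _ in range(4)]
labeled_lbls = [(img > 0.6).astype(int) for img in labeled_imgs]

# Unlabeled pool, split into batches (sub-data sets).
unlabeled_batches = [[rng.random((8, 8)) for _ in range(4)] for _ in range(3)]

train_imgs, train_lbls = list(labeled_imgs), list(labeled_lbls)
for batch in unlabeled_batches:
    t = train(train_imgs, train_lbls)   # (re)train on the current training set
    pseudo = predict(t, batch)          # predict pseudo labels for this batch
    train_imgs += batch                 # merge the pseudo-labeled data
    train_lbls += pseudo                # into the training set
final_t = train(train_imgs, train_lbls)
print(len(train_imgs), round(final_t, 2))   # training set has grown to 16 images
```

The loop mirrors the claimed steps: train, pseudo-label the next unlabeled batch, merge, retrain.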
With reference to the first aspect, in a first possible implementation of the first aspect, inputting a data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images includes: randomly dividing the data set of unlabeled images into n sub-data sets; inputting the sub-data sets into the image segmentation network in sequence, the image segmentation network successively predicting the unlabeled images in each sub-data set to generate the corresponding segmentation results; performing edge refinement on the segmentation results with a fully connected conditional random field (Dense CRF); and adding the refined pseudo labels to the sub-data set to obtain the pseudo-label data set.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, performing edge refinement on the segmentation result with the fully connected conditional random field (Dense CRF) includes: representing the original image as a graph model in which each pixel is a node; forming connecting edges between each pixel and all other pixels; assigning two pixels to the same label if their similarity is high; or assigning them to different labels if their similarity is low.
With reference to the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the energy function of the Dense CRF is:

E(f) = Σ_i ψ_u(f_i) + Σ_{i<j} ψ_p(f_i, f_j)

where ψ_u(f_i) = −log P(f_i) is the unary potential, i.e. the pixel-wise class probability obtained by the model through the softmax activation function; f_i denotes the prediction obtained for pixel i by the segmentation network, and P(f_i) is the probability of the prediction for pixel i. The second term ψ_p(f_i, f_j) is the binary potential over the predictions f_i, f_j at pixels i and j, used to describe the relationship between the pixels of the original image:

ψ_p(f_i, f_j) = μ(f_i, f_j) [ w₁ exp(−|x_i − x_j|²/(2θ_α²) − |y_i − y_j|²/(2θ_β²)) + w₂ exp(−|x_i − x_j|²/(2θ_γ²)) ]

where μ(f_i, f_j) is the label-compatibility term, with μ(f_i, f_j) = 1 when f_i ≠ f_j and 0 otherwise; x denotes the position information of a pixel and y its intensity value, both provided by the original image.
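The energy can be evaluated directly. In the sketch below, `dense_crf_energy` is a hypothetical helper, and the pairwise kernel weights and bandwidths are illustrative assumptions (the patent gives no values); it scores a tiny image and shows that an intensity-coherent labeling has lower energy than a noisy one:

```python
import numpy as np
from itertools import combinations

def dense_crf_energy(probs, labels, pos, intensity,
                     w1=1.0, w2=1.0, theta_a=3.0, theta_b=0.5, theta_g=3.0):
    """E(f) = sum_i psi_u(f_i) + sum_{i<j} psi_p(f_i, f_j).

    probs:     (N, C) softmax class probabilities (source of the unary term)
    labels:    (N,) labeling f being scored
    pos:       (N, d) pixel positions x
    intensity: (N,) pixel intensities y
    """
    # Unary potential: psi_u(f_i) = -log P(f_i)
    unary = -np.log(probs[np.arange(len(labels)), labels]).sum()

    # Pairwise potential over the fully connected graph:
    # mu(f_i, f_j) * (appearance kernel + smoothness kernel)
    pairwise = 0.0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:          # mu = 0 when labels agree
            continue
        d2 = np.sum((pos[i] - pos[j]) ** 2)
        dy2 = (intensity[i] - intensity[j]) ** 2
        appearance = w1 * np.exp(-d2 / (2 * theta_a**2) - dy2 / (2 * theta_b**2))
        smoothness = w2 * np.exp(-d2 / (2 * theta_g**2))
        pairwise += appearance + smoothness
    return unary + pairwise

# 2x2 image: left column dark, right column bright.
pos = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
intensity = np.array([0.1, 0.9, 0.1, 0.9])
probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])

coherent = np.array([0, 1, 0, 1])   # labels follow the intensity pattern
noisy    = np.array([0, 1, 1, 0])   # labels fight the image
print(dense_crf_energy(probs, coherent, pos, intensity) <
      dense_crf_energy(probs, noisy, pos, intensity))
```

Lower energy for the coherent labeling is exactly what drives the edge refinement: minimizing E(f) pushes similar neighboring pixels toward the same label.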
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the image segmentation network is a 3D scSE-UNet segmentation network, which is a U-shaped network comprising an encoding part and a decoding part, wherein: the encoding part extracts and analyzes features of different levels and resolutions of the input image through convolution and downsampling operations; and the decoding part fuses the pre-downsampling information through skip connections and generates, via upsampling, a feature map of the same size as the original image, gradually restoring the image to its original size.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, an scSE-block+ module is added at the end of each skip-connection layer of the decoding part; the scSE-block+ module recalibrates the feature-channel weights, strengthening useful feature channels and suppressing irrelevant ones.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the scSE-block+ module includes a channel SE module (cSE-block) and a spatial SE module (sSE-block); the cSE-block adds a new parallel channel attention based on global max pooling, and the two parallel channel attentions select different feature tensors when compressing the spatial information and squeezing the feature tensors into vectors.
With reference to the fourth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the 3D scSE-UNet model includes a plurality of downsampling layers in the encoding path; in each downsampling layer, two convolutional layers extract image features of a different level, the convolution operations are activated with the ReLU function, and each is followed by a max-pooling layer with a preset stride that compresses the features and reduces the number of parameters.
With reference to the fourth or seventh possible implementation of the first aspect, in an eighth possible implementation of the first aspect, in the decoding path of the 3D scSE-UNet model, each layer includes an upsampling layer with a preset stride followed by two convolutional layers, each convolutional layer followed by a ReLU layer; the feature map obtained in the encoding stage and the same-resolution feature map obtained in the decoding stage are fused together through a skip connection, refining the image by combining shallow and deep features; the fused features are fed into scSE-block+ to suppress unimportant features and improve the accuracy of the segmentation result; and finally a convolutional layer outputs a prediction segmentation map whose number of channels equals the number of label categories.
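At the level of tensor shapes, the skip-connection fusion described above can be sketched as follows; max pooling stands in for the encoder's downsampling and nearest-neighbour repetition for the decoder's upsampling, and the channel counts are illustrative assumptions:

```python
import numpy as np

def max_pool2(x):
    # 2x2x2 max pooling with stride 2 on a (C, D, H, W) feature map.
    c, d, h, w = x.shape
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).max(axis=(2, 4, 6))

def upsample2(x):
    # Nearest-neighbour upsampling with stride 2 (stand-in for transposed conv).
    return x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)

def skip_fuse(encoder_feat, decoder_feat):
    # Skip connection: concatenate the same-resolution encoder map with the
    # upsampled decoder map along the channel axis, combining shallow and
    # deep features.
    return np.concatenate([encoder_feat, upsample2(decoder_feat)], axis=0)

enc = np.random.default_rng(1).random((8, 16, 16, 16))   # encoder map, 8 channels
deep = max_pool2(enc)                                    # downsampled: (8, 8, 8, 8)
fused = skip_fuse(enc, deep)                             # (16, 16, 16, 16)
print(deep.shape, fused.shape)
```

The fused map keeps the encoder's spatial resolution while doubling the channel count, which is what the subsequent convolutions and scSE-block+ then process.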
In a second aspect, an embodiment of the present application provides an image segmentation system for semi-supervised learning, the system including: a training module for training an image segmentation network using a first data set of labeled images; a first acquisition module for inputting a data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images; a second acquisition module for merging the pseudo-label data set with the initial labeled-image data set to obtain a second data set; and a third acquisition module for training the image segmentation network again with the second data set, then inputting the next batch of unlabeled image data and predicting pseudo labels.
Drawings
Fig. 1 is a schematic flowchart of an image segmentation method for semi-supervised learning according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the 3D scSE-UNet network structure according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the scSE-block+ module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an image segmentation system for semi-supervised learning according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a terminal according to an embodiment of the present application.
Detailed Description
The present invention will be described with reference to the accompanying drawings and embodiments.
Fig. 1 is a schematic flowchart of an image segmentation method for semi-supervised learning according to an embodiment of the present application, and referring to fig. 1, the image segmentation method for semi-supervised learning according to the present embodiment includes:
S101, training an image segmentation network using a first data set of labeled images.
In this embodiment, X denotes a grayscale image and Y the label image corresponding to it. The method uses two data sets. One is a labeled-image data set D_L = {X_L, Y_L}, in which the label image Y_L is a ground-truth annotation from an expert, i.e. a manual segmentation of the grayscale image X_L. The other is an unlabeled-image data set D_U = {X_U}, which contains only grayscale images X_U whose corresponding labels are unknown. By training the 3D scSE-UNet segmentation model, a pseudo label Y_U is predicted for X_U and added to the unlabeled data set D_U, yielding D′_U = {X_U, Y_U}.
And S102, inputting the data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image.
In the self-training semi-supervised segmentation process, after the 3D scSE-UNet segmentation network extracts features through convolution and downsampling, the decoding stage can gradually upsample the feature map back to the size of the input image and output a segmentation result of the same size as the original image. However, some features are lost during the network's downsampling, so the boundaries of the segmentation result are blurred, and the 3D scSE-UNet model easily makes errors when predicting unlabeled samples. Once such blurred or wrong labels and their corresponding samples are added to the training set, they affect the quality of the training data in subsequent rounds and degrade the next training iteration, eventually causing errors to accumulate and even be amplified. To reduce the effect of error accumulation, the accuracy of the pseudo labels must be improved as far as possible.
To obtain more accurate pseudo labels Y_U, a fully connected conditional random field is used to post-process the predicted images. The original image is represented as a graph model in which each pixel is a node, and every pixel is connected to all other pixels by connecting edges. For any two pixels in the image, if their similarity is high they tend to be assigned the same label, i.e. they are unlikely to be separated; if their similarity is low they tend to be assigned different labels, i.e. they are more likely to be separated. The Dense CRF is therefore introduced into the self-training process, and the pseudo label obtained in each self-training iteration is refined, yielding more accurate and detailed pseudo labels. The energy function of the Dense CRF is as follows.
E(f) = Σ_i ψ_u(f_i) + Σ_{i<j} ψ_p(f_i, f_j)

where ψ_u(f_i) = −log P(f_i) is the unary potential, i.e. the pixel-wise class probability obtained by the model through the softmax activation function; f_i denotes the prediction obtained for pixel i by the segmentation network, and P(f_i) is the probability of that prediction. The second term ψ_p(f_i, f_j) is the binary potential over the predictions f_i, f_j at pixels i and j, used to describe the relationship between the pixels of the original image:

ψ_p(f_i, f_j) = μ(f_i, f_j) [ w₁ exp(−|x_i − x_j|²/(2θ_α²) − |y_i − y_j|²/(2θ_β²)) + w₂ exp(−|x_i − x_j|²/(2θ_γ²)) ]

where μ(f_i, f_j) is the label-compatibility term, with μ(f_i, f_j) = 1 when f_i ≠ f_j and 0 otherwise; x denotes the position information of a pixel and y its intensity value, both provided by the original image. The binary potential pays closer attention to pixels with similar positions x and similar intensities y but different labels f. Combining the unary and binary potentials takes the relationships between pixels into account more comprehensively and yields an optimized result. The smaller the Dense CRF energy E(f), the more accurate the predicted class labeling.
Specifically, when processing the segmentation result with the Dense CRF, the energy function E(f) is iteratively minimized; each image is inferred through 5 iterations to find the most probable class for each voxel, giving the prediction result. Finally, each optimized segmentation result is added as a pseudo label to the next self-training iteration. Optimizing the segmentation network's predictions with the Dense CRF improves the accuracy of the pseudo labels, reducing the error between pseudo labels and ground truth, avoiding the performance loss caused by excessive error accumulation, guaranteeing the quality of the training set throughout the self-training semi-supervised learning process, and keeping the average gradient learned by the network approximately correct.
S103, combining the pseudo label data set and the data set of the initial labeled image to obtain a second data set.
And S104, training the image segmentation network again with the second data set, inputting the next unlabeled image data set after training is finished, and predicting pseudo labels.
Because image boundaries are blurred and gradients are complex, segmentation needs more high-resolution information; the image segmentation network in this embodiment is therefore a 3D scSE-UNet, since the skip connections of 3D UNet meet this need well. The network structure is shown in Fig. 2, where the numbers above the blocks denote the number of channels of the feature maps.
The overall framework of 3D scSE-UNet is similar to that of 3D UNet: it also adopts a U-shaped structure consisting of two parts, an encoding part and a decoding part. The encoding part extracts and analyzes features of different levels and resolutions of the input image through convolution and downsampling operations, essentially as in 3D UNet, although the low-resolution information after downsampling is no longer interpretable by the human eye. The corresponding decoding part on the right fuses the pre-downsampling information through skip connections and generates, via upsampling, a feature map of the same size as the original image, gradually restoring the image to its original size. The decoding part is improved here by adding an scSE-block+ at the end of each skip-connection layer. This module recalibrates the weights of the feature channels, strengthening useful channels and suppressing irrelevant ones, thereby improving the segmentation network's ability to learn effective features automatically.
As shown in Fig. 3, the scSE-block+ module in this embodiment of the application is the sum of a channel SE block (cSE-block) and a spatial SE block (sSE-block), with the cSE-block improved: a new parallel channel attention based on global max pooling is added to the original cSE-block. The two parallel channel attentions select different feature tensors when compressing the spatial information and squeezing the feature tensors into vectors. The global max-pooling layer takes the maximum over the whole feature map; because image edges tend to produce the largest feature values, it preserves image texture and edge features well. The global average-pooling layer outputs the average of the whole feature map, emphasizing downsampling of the feature as a whole, and preserves the background well. For extracting image edge information, therefore, the global max-pooling layer is more effective than the global average-pooling layer; using the two layers in parallel retains both edge and background characteristics and brings a clear improvement in model performance.
Specifically, cSE-block measures the importance of each channel by compressing the spatial information, exciting only along the channel direction. The module passes the H × W × S × C feature map through a global average-pooling layer and a global max-pooling layer in parallel; the two pooling layers each compress the global spatial information of every channel into a single tensor value, producing two 1 × 1 × 1 × C feature vectors. A three-dimensional convolution with a 1 × 1 × 1 kernel and C channels is then applied to each, the convolution results are passed through the nonlinear activation function ReLU, the outputs go through the same convolution again, and finally two 1 × 1 × 1 × C tensors of the same dimensions but different values are obtained. After the two tensors are added, each value is normalized to the range [0, 1] by a sigmoid layer. Multiplying the result with the original feature map clearly suppresses the information in unimportant channels while leaving the information in important channels almost unchanged, in effect promoting the extraction of effective features.
For image segmentation, pixel-wise spatial information is more informative, so an sSE-block is introduced in parallel. It measures the importance of each spatial location by compressing the channel information, squeezing along the channel direction and exciting spatially. For an input feature map, the module performs the spatial squeeze through a convolution, normalizes the result with a sigmoid, and finally multiplies it with the original feature tensor. Thus, if the information at a spatial location is unimportant, it is multiplied by a small value and suppressed; otherwise it is left unsuppressed.
The original feature map is passed through the improved cSE-block and the sSE-block respectively to generate two recalibrated feature maps, the two maps are added, and the final feature map is output through an activation function. In this way, scSE-block+ recalibrates the feature map along the channel and spatial dimensions separately before merging the outputs, effectively exploiting the useful information of the feature map in both space and channel, providing finer information to the network and helping to improve the model's representation.
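A minimal numerical sketch of scSE-block+ following the description above, with dense layers standing in for the 1 × 1 × 1 convolutions, random illustrative weights, and the final activation after the addition omitted; this is a hedged illustration, not the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cse_plus(x, w1, w2):
    """Improved channel SE: parallel global average and global max pooling,
    a shared conv -> ReLU -> conv excitation, tensors added, then sigmoid."""
    c = x.shape[0]
    avg = x.reshape(c, -1).mean(axis=1)      # 1x1x1xC from average pooling
    mx = x.reshape(c, -1).max(axis=1)        # 1x1x1xC from max pooling
    def excite(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # dense layers stand in for convs
    gate = sigmoid(excite(avg) + excite(mx)) # add tensors, normalize to [0, 1]
    return x * gate[:, None, None, None]     # recalibrate the channels

def sse(x, w):
    # Spatial SE: squeeze channels with a 1x1x1 conv (weights w), sigmoid, scale.
    gate = sigmoid(np.tensordot(w, x, axes=1))   # (D, H, W) spatial map
    return x * gate[None]

def scse_plus(x, w1, w2, ws):
    # scSE-block+: channel and spatial recalibrations, added together.
    return cse_plus(x, w1, w2) + sse(x, ws)

rng = np.random.default_rng(0)
c, d, h, wd = 4, 2, 3, 3
x = rng.random((c, d, h, wd))                 # (C, D, H, W) feature map
w1 = rng.standard_normal((c, c))
w2 = rng.standard_normal((c, c))
ws = rng.standard_normal(c)
out = scse_plus(x, w1, w2, ws)
print(out.shape)
```

The channel gate lies in (0, 1), so multiplying by it can only attenuate a channel, which is exactly the suppression behavior the module is meant to provide.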
In this embodiment, the 3D scSE-UNet segmentation network contains 4 downsampling levels in the encoding path. Each level extracts image features of a different scale with two convolutional layers activated by the ReLU function, followed by a max pooling layer with a stride of 2 that compresses the features and reduces the number of parameters. In the decoding path, each level contains an upsampling layer with a stride of 2 followed by two 3 × 3 × 3 convolutional layers, each still followed by a ReLU layer. The feature map obtained in the encoding stage and the feature map of the same resolution obtained in the decoding stage are fused through a skip connection, refining the image by combining shallow and deep features. The fused features are then input into the scSE-block+ to suppress unimportant features and improve the accuracy of the segmentation result. The last layer is a 1 × 1 × 1 convolutional layer whose number of output channels equals the number of label categories, producing the predicted segmentation map.
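The resolutions implied by the 4 stride-2 pooling steps can be checked with a short sketch, assuming the 128 × 128 × 64 input volume to which images are resampled elsewhere in this embodiment.

```python
def encoder_shapes(shape=(128, 128, 64), levels=4):
    """Feature-map sizes along the encoding path: each of the
    `levels` max-pooling steps with stride 2 halves every axis."""
    shapes = [shape]
    for _ in range(levels):
        shape = tuple(s // 2 for s in shape)  # stride-2 max pooling
        shapes.append(shape)
    return shapes

# encoder_shapes() -> [(128, 128, 64), (64, 64, 32), (32, 32, 16),
#                      (16, 16, 8), (8, 8, 4)]
```

The decoding path reverses this sequence with its stride-2 upsampling layers, which is why the skip connections always find an encoder feature map of matching resolution.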
Table 3.1 3D scSE-UNet network architecture
The second column of the table gives the size and number of channels of the output feature map of the current layer. In the third column, [ ] denotes a convolution operation; "3 × 3 × 3, 8" denotes a convolutional layer with 3 × 3 × 3 kernels and 8 channels. The "+" in "Dropout_1 + UpSampling3D_1" indicates that Concatenate_1 is a skip connection between Dropout_1 and UpSampling3D_1.
Table 3.2 3D scSE-UNet training parameter settings
The parameter settings used in the experiments are listed in Table 3.2. During training, the data set of unlabeled images is randomly divided into 5 sub-data sets, which are input into the segmentation model in sequence for prediction, generating the corresponding segmentation results. The Dice loss is used as the objective function during training, as shown in formula 3.2, where N denotes the total number of pixels in an image, p_i the probability that the i-th pixel is predicted as foreground, and g_i the ground-truth label of the i-th pixel.
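Formula 3.2 is not reproduced in this text; the sketch below assumes the commonly used Dice-loss form 1 − 2·Σp_i·g_i / (Σp_i + Σg_i), with a small smoothing term to avoid division by zero.

```python
import numpy as np

def dice_loss(p, g, eps=1e-6):
    """Dice loss over one image.

    p: predicted foreground probabilities, g: ground-truth labels
    (both flattened over the N pixels). Assumes the common
    1 - 2*sum(p*g)/(sum(p)+sum(g)) formulation.
    """
    p, g = p.ravel(), g.ravel()
    dice = (2.0 * np.sum(p * g) + eps) / (np.sum(p) + np.sum(g) + eps)
    return 1.0 - dice
```

A perfect prediction gives a loss near 0, while predicting foreground everywhere the label is background drives the loss toward 1.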
An Adam optimizer implements the gradient descent algorithm to find the network parameters that minimize the error function. To prevent differing input image sizes from affecting the accuracy of liver-region segmentation across methods, and to keep GPU memory usage manageable, the original images are resampled to 128 × 128 × 64. Based on repeated parameter-tuning experiments, the batch size is set to 1, the number of epochs to 150 (training stops when the maximum epoch count is reached), and the network learning rate to 0.0001.
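For reference, a single Adam update with the quoted learning rate can be sketched as follows; this is a hypothetical single-parameter illustration, whereas the embodiment would rely on a deep learning framework's built-in optimizer.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (lr = 0.0001 as quoted above).

    m, v are the running first/second moment estimates and t is the
    1-based step count; returns the new weight and moments.
    """
    m = b1 * m + (1 - b1) * grad          # biased first moment
    v = b2 * v + (1 - b2) * grad ** 2     # biased second moment
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

On the first step the bias-corrected moments make the update approximately lr · sign(grad), so the effective step size is bounded by the learning rate.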
The above embodiments provide a semi-supervised tissue segmentation method based on a self-trained 3D scSE-UNet segmentation network. The proposed 3D scSE-UNet introduces an improved scSE-block+ into the 3D UNet. Compared with the classical scSE-block, the scSE-block+ better retains image edge information by including global max pooling (GMP), further improving the capability of learning edge features and helping raise the segmentation precision of the network. A Dense CRF is added to the self-training process to perform edge refinement on the predicted pseudo labels, improving the accuracy of the pseudo labels generated by the semi-supervised segmentation network. The method trains the 3D scSE-UNet effectively with a small amount of labeled data and a large amount of unlabeled data, improving tissue segmentation precision while reducing the dependence of deep-learning image segmentation methods on labeled data.
Corresponding to the image segmentation method for semi-supervised learning provided by the above embodiment, the present application also provides an embodiment of an image segmentation system for semi-supervised learning.
Referring to fig. 4, the image segmentation system 20 for semi-supervised learning includes: a training module 201, a first acquisition module 202, a second acquisition module 203, and a third acquisition module 204.
The training module 201 is configured to train the image segmentation network using the first data set with the labeled image. The first obtaining module 202 is configured to input a data set of an unlabeled image into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled image. The second obtaining module 203 is configured to combine the pseudo tag data set and the data set of the initial tagged image to obtain a second data set. The third obtaining module 204 is configured to train the image segmentation network again by using the second data set, and input a next batch of unlabeled image data sets after the training is completed, so as to predict and generate a pseudo label.
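The interplay of the four modules can be sketched as a self-training loop. Here `train`, `predict`, and `refine_with_dense_crf` are hypothetical stand-ins for the network operations described above, not APIs from the embodiment.

```python
def self_training(model, labeled, unlabeled_batches,
                  train, predict, refine_with_dense_crf):
    """Schematic of the four-module pipeline: train on labeled data,
    then repeatedly pseudo-label a batch of unlabeled images, merge
    it with the training set, and retrain."""
    train(model, labeled)                          # training module
    dataset = list(labeled)
    for batch in unlabeled_batches:                # next batch each round
        pseudo = [(x, refine_with_dense_crf(predict(model, x)))
                  for x in batch]                  # first obtaining module
        dataset = dataset + pseudo                 # second obtaining module
        train(model, dataset)                      # third obtaining module
    return model
```

Whether the pseudo-labeled sets accumulate across rounds or each round merges only with the initial labeled set is a design choice; the sketch assumes accumulation.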
The present application further provides an embodiment of a terminal, and referring to fig. 5, the terminal 30 includes: a processor 301, a memory 302, and a communication interface 303.
In fig. 5, the processor 301, the memory 302, and the communication interface 303 may be connected to each other by a bus; the bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The processor 301 generally controls the overall functions of the terminal 30, such as the start-up of the terminal 30 and the training of the image segmentation network using the first data set of tagged images after the terminal 30 is started up; inputting a data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image; merging the pseudo label data set with the data set of the initial labeled image to obtain a second data set; and training the image segmentation network again by adopting the second data set, inputting the next unlabeled image data set after training is finished, and predicting to generate a pseudo label.
The same and similar parts among the various embodiments in the specification of the present application may be referred to each other. Especially, for the system and terminal embodiments, since the method therein is basically similar to the method embodiments, the description is relatively simple, and the relevant points can be referred to the description in the method embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Claims (10)
1. An image segmentation method for semi-supervised learning, the method comprising:
training an image segmentation network using a first dataset of tagged images;
inputting a data set of the unlabeled image into the trained image segmentation network to obtain a pseudo label data set corresponding to the unlabeled image;
merging the pseudo label data set with the data set of the initial labeled image to obtain a second data set;
and training the image segmentation network again by adopting the second data set, inputting the next unlabeled image data set after training is finished, and predicting to generate a pseudo label.
2. The image segmentation method for semi-supervised learning according to claim 1, wherein the inputting of the data set of unlabeled images into the trained image segmentation network to obtain a pseudo-label data set corresponding to the unlabeled images comprises:
randomly dividing a data set of the unlabeled image into n sub-data sets;
inputting the sub-data sets into the image segmentation network in sequence, the image segmentation network successively predicting the unlabeled images in the sub-data sets to generate corresponding segmentation results;
performing edge refinement processing on the segmentation result by using a fully connected conditional random field Dense CRF;
and adding the pseudo labels obtained by the refinement to the sub-data set to obtain a pseudo label data set.
3. The image segmentation method for semi-supervised learning according to claim 2, wherein the edge refinement processing on the segmentation result by using a fully connected conditional random field Dense CRF comprises:
representing an original image as a node graph model, wherein each pixel is a node in the graph;
determining a connecting edge formed by connecting each pixel with all pixels;
if the similarity of two pixels is higher, assigning the two pixels to the same label; or, if the similarity of the two pixels is lower, assigning them to different labels.
4. The image segmentation method for semi-supervised learning according to claim 2 or 3, wherein the energy function of the Dense CRF is as follows:
E(f) = Σ_i ψ_u(f_i) + Σ_{i<j} ψ_p(f_i, f_j)
wherein ψ_u(f_i) = −log P(f_i) represents the unary potential energy, namely the pixel-wise class probability obtained by the model through the activation function softmax; f_i represents the prediction obtained for pixel i after segmentation by the network, and P(f_i) is the probability of the prediction result of pixel i; the second term ψ_p(f_i, f_j) is the binary potential energy over the prediction results f_i, f_j on pixel i and pixel j, and is used to describe the relationship between pixel points in the original image;
wherein μ(f_i, f_j) is the label compatibility term, with μ(f_i, f_j) = 1 when f_i ≠ f_j and 0 otherwise; x represents the position information between pixels, y represents the intensity value of a pixel, and both x and y are provided by the original image.
5. The image segmentation method for semi-supervised learning according to any one of claims 1 to 4, wherein the image segmentation network is a 3D scSE-UNet segmentation network, and the 3D scSE-UNet segmentation network is a U-shaped structure network and comprises an encoding and decoding part, wherein:
the coding part extracts and analyzes the features of different levels and resolutions of the input image through convolution and downsampling operations;
the decoding part is connected by jumping to fuse the information before down sampling, and generates a feature map with the same size as the original image by up sampling, thereby gradually restoring the original size of the image.
6. The image segmentation method for semi-supervised learning according to claim 5, wherein a scSE-block + module is added at the end of each jump connection layer of the decoding part, and the scSE-block + module recalibrates the weight of the feature channel, strengthens the useful feature channel, and suppresses the irrelevant feature channel.
7. The semi-supervised learning image segmentation method of claim 6, wherein the scSE-block + module comprises a channel SE module cSE-block and a space SE module sSE-block; the cSE-block adds a new parallel global max-pooling based channel attention, and the two parallel channel attentions select different feature tensors when compressing the spatial information and compressing the feature tensors into vectors, respectively.
8. The image segmentation method for semi-supervised learning according to claim 5, wherein in the encoding path the 3D scSE-UNet model comprises a plurality of downsampling layers, each downsampling layer extracting image features of a different level with two convolutional layers activated by a ReLU function, each followed by a max pooling layer with a preset stride that compresses the features and reduces the number of parameters.
9. The image segmentation method for semi-supervised learning according to claim 5 or 8, wherein in the decoding path of the 3D scSE-UNet model each layer comprises an upsampling layer with a preset stride followed by two convolutional layers, each convolutional layer being followed by a ReLU layer; the feature map obtained in the encoding stage and the feature map of the same resolution obtained in the decoding stage are fused through a skip connection, refining the image by combining shallow and deep features; the fused features are input into the scSE-block+ to suppress unimportant features so as to improve the accuracy of the segmentation result; and the last convolutional layer outputs a predicted segmentation map whose number of channels equals the number of label categories.
10. An image segmentation system for semi-supervised learning, the system comprising:
a training module for training an image segmentation network using a first dataset of tagged images;
the first acquisition module is used for inputting a data set of a label-free image into the trained image segmentation network to acquire a pseudo label data set corresponding to the label-free image;
the second acquisition module is used for combining the pseudo tag data set and the data set of the initial tagged image to obtain a second data set;
and the third acquisition module is used for adopting the second data set to train the image segmentation network again, inputting the next batch of label-free image data sets after the training is finished, and predicting to generate a pseudo label.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111082414.XA CN114022406A (en) | 2021-09-15 | 2021-09-15 | Image segmentation method, system and terminal for semi-supervised learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111082414.XA CN114022406A (en) | 2021-09-15 | 2021-09-15 | Image segmentation method, system and terminal for semi-supervised learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114022406A true CN114022406A (en) | 2022-02-08 |
Family
ID=80054430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111082414.XA Pending CN114022406A (en) | 2021-09-15 | 2021-09-15 | Image segmentation method, system and terminal for semi-supervised learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114022406A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114758125A (en) * | 2022-03-31 | 2022-07-15 | 江苏庆慈机械制造有限公司 | Gear surface defect detection method and system based on deep learning |
CN114782384A (en) * | 2022-04-28 | 2022-07-22 | 东南大学 | Heart chamber image segmentation method and device based on semi-supervision method |
CN114862770A (en) * | 2022-04-18 | 2022-08-05 | 华南理工大学 | Gastric cancer pathological section image segmentation prediction method based on SEnet |
CN115049945A (en) * | 2022-06-10 | 2022-09-13 | 安徽农业大学 | Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image |
CN115147426A (en) * | 2022-09-06 | 2022-10-04 | 北京大学 | Model training and image segmentation method and system based on semi-supervised learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232416A (en) * | 2020-10-16 | 2021-01-15 | 浙江大学 | Semi-supervised learning method based on pseudo label weighting |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN112381098A (en) * | 2020-11-19 | 2021-02-19 | 上海交通大学 | Semi-supervised learning method and system based on self-learning in target segmentation field |
CN113159048A (en) * | 2021-04-23 | 2021-07-23 | 杭州电子科技大学 | Weak supervision semantic segmentation method based on deep learning |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232416A (en) * | 2020-10-16 | 2021-01-15 | 浙江大学 | Semi-supervised learning method based on pseudo label weighting |
CN112381098A (en) * | 2020-11-19 | 2021-02-19 | 上海交通大学 | Semi-supervised learning method and system based on self-learning in target segmentation field |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN113159048A (en) * | 2021-04-23 | 2021-07-23 | 杭州电子科技大学 | Weak supervision semantic segmentation method based on deep learning |
Non-Patent Citations (1)
Title |
---|
LIU Qingqing: "Research on automatic segmentation methods for liver and tumor CT images based on semi-supervised deep learning", China Master's Theses Full-text Database (Electronic Journal), Medicine & Health Sciences, no. 2021, 15 August 2021 (2021-08-15), pages 25 - 31 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114758125A (en) * | 2022-03-31 | 2022-07-15 | 江苏庆慈机械制造有限公司 | Gear surface defect detection method and system based on deep learning |
CN114862770A (en) * | 2022-04-18 | 2022-08-05 | 华南理工大学 | Gastric cancer pathological section image segmentation prediction method based on SEnet |
CN114862770B (en) * | 2022-04-18 | 2024-05-14 | 华南理工大学 | SENet-based gastric cancer pathological section image segmentation prediction method |
CN114782384A (en) * | 2022-04-28 | 2022-07-22 | 东南大学 | Heart chamber image segmentation method and device based on semi-supervision method |
CN114782384B (en) * | 2022-04-28 | 2024-06-18 | 东南大学 | Cardiac chamber image segmentation method and device based on semi-supervision method |
CN115049945A (en) * | 2022-06-10 | 2022-09-13 | 安徽农业大学 | Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image |
CN115049945B (en) * | 2022-06-10 | 2023-10-20 | 安徽农业大学 | Unmanned aerial vehicle image-based wheat lodging area extraction method and device |
CN115147426A (en) * | 2022-09-06 | 2022-10-04 | 北京大学 | Model training and image segmentation method and system based on semi-supervised learning |
CN115147426B (en) * | 2022-09-06 | 2022-11-29 | 北京大学 | Model training and image segmentation method and system based on semi-supervised learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114022406A (en) | Image segmentation method, system and terminal for semi-supervised learning | |
EP3298576B1 (en) | Training a neural network | |
CN109299373B (en) | Recommendation system based on graph convolution technology | |
CN108062754B (en) | Segmentation and identification method and device based on dense network image | |
CN109711481B (en) | Neural networks for drawing multi-label recognition, related methods, media and devices | |
AU2019451948B2 (en) | Real-time video ultra resolution | |
Chen et al. | Nas-dip: Learning deep image prior with neural architecture search | |
CN111079532A (en) | Video content description method based on text self-encoder | |
CN111382555B (en) | Data processing method, medium, device and computing equipment | |
CN111832570A (en) | Image semantic segmentation model training method and system | |
US20220254146A1 (en) | Method for filtering image feature points and terminal | |
CN112634296A (en) | RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism | |
US11948281B2 (en) | Guided up-sampling for image inpainting | |
CN111931779A (en) | Image information extraction and generation method based on condition predictable parameters | |
CN111696110A (en) | Scene segmentation method and system | |
CN113111716B (en) | Remote sensing image semiautomatic labeling method and device based on deep learning | |
CN113435430B (en) | Video behavior identification method, system and equipment based on self-adaptive space-time entanglement | |
CN113284155A (en) | Video object segmentation method and device, storage medium and electronic equipment | |
CN116612280A (en) | Vehicle segmentation method, device, computer equipment and computer readable storage medium | |
CN113393435B (en) | Video saliency detection method based on dynamic context sensing filter network | |
CN111914949B (en) | Zero sample learning model training method and device based on reinforcement learning | |
CN117593275A (en) | Medical image segmentation system | |
CN115082840B (en) | Action video classification method and device based on data combination and channel correlation | |
CN116844032A (en) | Target detection and identification method, device, equipment and medium in marine environment | |
CN116957964A (en) | Small sample image generation method and system based on diffusion model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||