Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides a medical image liver segmentation method based on unsupervised learning, which achieves higher accuracy than manual segmentation. The technical scheme of the invention is as follows:
step 1, obtaining an original image of abdominal tomography, obtaining a segmentation mask of the liver region through manual annotation, and carrying out padding preprocessing and normalization preprocessing on all image data, wherein all the image data comprise the original image of abdominal tomography and the segmentation mask; randomly selecting 80% of the preprocessed images to form a training set, and taking the other 20% of the preprocessed images as a test set;
selecting MRI images, constructing a training set and a test set containing images of different liver types, and normalizing the liver images in the training set and the test set to pixel values in (0, 1) so that they share a common scale, obtaining a normalized image set, which facilitates faster convergence during training; in addition, the image data in the training set and the test set are processed by a padding method so that all the image data have the same dimensions;
step 2, constructing an unsupervised learning network parameter model by superposing two U-Nets with Attention Gate modules, wherein the unsupervised learning network parameter model consists of 18 convolution modules, each convolution module consists of 2 three-dimensional convolution layers with a convolution kernel size of 3, and each three-dimensional convolution layer is followed by a nonlinear activation function and an instance normalization layer; the unsupervised learning network parameter model comprises 46 3D convolution layers in total;
the unsupervised learning network parameter model is divided into a self-encoding module and a self-decoding module: the U-Net on the left side is the self-encoding module, the U-Net on the right side is the self-decoding module, and each comprises 9 convolution modules; the self-encoding module is used for predicting a segmentation mapping of the input image, and the self-decoding module is used for reconstructing its input image, taking the original image as the target, from the segmentation mapping output by the self-encoding module; the image data output by the self-encoding module is transmitted to a fully-connected 3D convolution layer with a convolution kernel size of 1 and a stride of 1, and is then fed through a Softmax layer into the input end of the self-decoding module;
step 3, constructing a loss function module at the end of the U-Net of the self-encoding module and of the self-decoding module respectively, and training the unsupervised learning network parameter model on the training set to obtain a trained unsupervised learning network parameter model;
the Loss function module comprises an N-Cuts Loss and a Reconstruction Loss, wherein the N-Cuts Loss is arranged at the tail of the U-Net of the self-coding module, the Loss of the segmentation output is calculated, and only the U-Net of the self-coding module is optimized; the reconstraction Loss is at the end of the U-Net of the automatic decoder, SSIM is used for calculating Loss of reconstructed output, a conditional random field CRF is used as a post-processing step for fine-tuning the result, and the reconstraction Loss optimizes the U-Net of the self-encoding module and the U-Net of the self-decoding module;
step 4, testing the trained unsupervised learning network parameter model, verifying the effect of the model, and obtaining a feasible unsupervised learning network parameter model if the effect is in line with expectation;
and 5, inputting the image to be segmented into the feasible unsupervised learning network parameter model to obtain a segmented liver region image.
Further, the nonlinear activation function used by the unsupervised learning network parameter model in the training process in step 3 is the parametric rectified linear unit (PReLU), which adaptively learns, during training, a parameter alpha that is applied to the negative inputs; liver region images of the same batch size are used when training the unsupervised learning network parameter model, so that the number of pixels is equal when data padding is carried out while constructing batches of liver region images from the training set.
Further, the N-Cuts Loss in step 3 is specifically:
The Reconstruction Loss function is based on the structural similarity index SSIM:
SSIM(x, y) = ((2μ_x μ_y + C1)(2σ_xy + C2)) / ((μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2))
wherein μ_x and μ_y are the means of x and y, σ_x² and σ_y² are the variances of x and y, σ_xy is the covariance of x and y, and C1 and C2 are small positive constants that prevent the denominator from approaching 0.
The invention has the beneficial effects that:
through the superposed U-Nets, the Attention Gates, and the loss functions described above, medical image segmentation of the liver based on unsupervised learning can be realized.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a medical image liver segmentation method based on unsupervised learning, which is implemented according to the following steps:
step 1, acquiring an MRI data set of 100 subjects to form original images of abdominal tomography; selecting the T1-DUAL sequence for the experiment, and carrying out padding preprocessing and normalization preprocessing on all image data, wherein all the image data comprise the original images of abdominal tomography and the segmentation masks; to facilitate subsequent network training, the size of the images is set to x × y × z during preprocessing; 80% of the preprocessed images are randomly selected to form a training set, and the other 20% are used as a test set;
selecting MRI images, constructing a training set and a test set containing images of different liver types, and normalizing the liver images in the training set and the test set to pixel values in (0, 1) so that they share a common scale, obtaining a normalized image set, which facilitates faster convergence during training; in addition, the image data in the training set and the test set are processed by a padding method so that all the image data have the same dimensions;
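The preprocessing described above (min-max normalization into (0, 1) followed by padding all volumes to a common size) can be sketched as follows; the symmetric zero-padding scheme and the helper name are illustrative assumptions, since the patent does not fix the exact padding method:

```python
import numpy as np

def preprocess(volume, target_shape):
    """Min-max normalize a 3-D volume to (0, 1), then zero-pad it to a
    common target shape (symmetric padding assumed for illustration)."""
    v = volume.astype(np.float64)
    v = (v - v.min()) / (v.max() - v.min() + 1e-8)   # pixel values in [0, 1)
    pads = []
    for dim, target in zip(v.shape, target_shape):
        total = target - dim
        pads.append((total // 2, total - total // 2))  # split pad over both sides
    return np.pad(v, pads, mode="constant")

vol = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
out = preprocess(vol, (4, 4, 4))   # all volumes now share one dimension
```

After this step every training and test volume has identical dimensions, which is what allows them to be stacked into fixed-size batches.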
step 2, as shown in fig. 1, constructing an unsupervised learning network parameter model by superposing two U-Nets with Attention Gate modules; the unsupervised learning network parameter model consists of 18 convolution modules, each convolution module consists of 2 three-dimensional convolution layers with a convolution kernel size of 3, and each three-dimensional convolution layer is followed by a nonlinear activation function and an instance normalization layer; the unsupervised learning network parameter model comprises 46 3D convolution layers in total;
the unsupervised learning network parameter model is divided into a self-encoding module and a self-decoding module: the U-Net on the left side is the self-encoding module, the U-Net on the right side is the self-decoding module, and each comprises 9 convolution modules; the self-encoding module is used for predicting a segmentation mapping of the input image, and the self-decoding module is used for reconstructing its input image, taking the original image as the target, from the segmentation mapping output by the self-encoding module; the image data output by the self-encoding module is transmitted to a fully-connected 3D convolution layer with a convolution kernel size of 1 and a stride of 1, and is then fed through a Softmax layer into the input end of the self-decoding module;
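The bottleneck between the two U-Nets described above (a 1 × 1 × 1 convolution followed by a channel-wise Softmax) is, per voxel, just a linear map over channels followed by a softmax. A minimal NumPy sketch, with the channel counts chosen as illustrative assumptions:

```python
import numpy as np

def pointwise_conv_softmax(feats, weight, bias):
    """1x1x1 convolution (per-voxel linear map over channels) followed by
    a channel-wise Softmax, as between the self-encoding and self-decoding
    U-Nets. feats: (C_in, X, Y, Z); weight: (C_out, C_in); bias: (C_out,)."""
    c_in, x, y, z = feats.shape
    flat = feats.reshape(c_in, -1)                # (C_in, X*Y*Z)
    logits = weight @ flat + bias[:, None]        # (C_out, X*Y*Z)
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)     # softmax over channels
    return probs.reshape(-1, x, y, z)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 2, 2, 2))             # 8 input channels (assumed)
w, b = rng.normal(size=(3, 8)), np.zeros(3)       # 3 segmentation classes (assumed)
probs = pointwise_conv_softmax(feats, w, b)
```

Each voxel of the output is a probability distribution over the segmentation classes, which is the "segmentation mapping" handed to the self-decoding module.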
in the embodiment of the present application, in the training process, as shown in fig. 1, the terminal first feeds an image of size x × y × z to the first convolution module of the self-encoding module; after each convolution operation the image passes through a PReLU activation and an instance normalization layer, and then proceeds to the next convolution module. The upper convolution module is connected to the lower convolution module through a three-dimensional max pooling layer; the max pooling operation performs down-sampling to reduce the size of the feature map, halving each dimension, so that the feature map becomes of size x/2 × y/2 × z/2; by analogy, the feature map size at the i-th layer of the contraction path of the two U-Nets is x/2^i × y/2^i × z/2^i. The original image size is also stored before each max pooling operation, so that the image size can be restored in the expansion path of the two U-Nets. The first convolution module of the self-encoding module produces 64 feature maps as output, and the number of feature maps doubles after each subsequent convolution module. In the contraction paths of the two U-Nets, the upper convolution module and the lower convolution module are connected through a three-dimensional max pooling layer; in the expansion paths of the two U-Nets, the lower convolution module is connected with the upper convolution module through an up-sampling layer. Up-sampling is performed using tri-linear interpolation, the output size of the interpolation being set to the image size saved before the corresponding max pooling operation.
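The size progression along the contraction path can be sketched directly: each 2 × 2 × 2 max pooling halves every spatial dimension, so level i has size x/2^i × y/2^i × z/2^i. A small illustration (the input size and pooling depth are assumptions, not values fixed by the patent):

```python
def contraction_sizes(x, y, z, depth):
    """Spatial size of the feature map at each level of the contraction
    path: level i is (x / 2**i, y / 2**i, z / 2**i)."""
    return [(x >> i, y >> i, z >> i) for i in range(depth + 1)]

# Hypothetical input volume of 128 x 128 x 64 with 3 pooling steps:
sizes = contraction_sizes(128, 128, 64, 3)
```

These stored sizes are exactly what the tri-linear interpolation in the expansion path restores, one level at a time.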
The self-decoding module comprises Attention Gates (indicated by circles in fig. 1). The input of each Attention Gate comprises a first input from the self-encoding module at the same layer and a second input from the previous layer of the self-decoding module. The specific operation is as shown in fig. 1: the first input and the second input are each convolved, summed, and activated by the ReLU function; a 1 × 1 × 1 convolution is then carried out, followed by Sigmoid activation and resampling; the resampled features are fused with the first input and the second input, and the fused features are fed into the deconvolution layer of the convolution module in the self-decoding module. The structure of the Attention Gate module is shown in fig. 2; both skip connections of each U-Net pass through Attention Gates, which suppress irrelevant regions and noise responses.
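The additive Attention Gate described above can be sketched with channel maps flattened to 2-D arrays; the weights here are random stand-ins, not trained parameters, and the shapes are illustrative assumptions:

```python
import numpy as np

def attention_gate(x_skip, g, w_x, w_g, psi):
    """Additive Attention Gate sketch: the skip input x and gating signal g
    are each linearly mapped (stand-ins for 1x1x1 convolutions), summed,
    passed through ReLU, reduced to one attention channel by psi, squashed
    with Sigmoid, and used to rescale the skip features."""
    q = np.maximum(w_x @ x_skip + w_g @ g, 0.0)     # ReLU(W_x x + W_g g)
    alpha = 1.0 / (1.0 + np.exp(-(psi @ q)))        # Sigmoid -> coefficients in (0, 1)
    return x_skip * alpha                           # attenuate irrelevant regions

rng = np.random.default_rng(1)
x_skip = rng.normal(size=(4, 10))                   # (channels, voxels), assumed shape
g = rng.normal(size=(4, 10))
w_x, w_g = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
psi = rng.normal(size=(1, 4))
gated = attention_gate(x_skip, g, w_x, w_g, psi)
```

Because the Sigmoid output lies in (0, 1), the gate can only attenuate skip features, never amplify them, which is how irrelevant regions and noise responses are suppressed.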
Step 3, constructing a loss function module at the end of the U-Net of the self-encoding module and of the self-decoding module respectively, and training the unsupervised learning network parameter model on the training set to obtain a trained unsupervised learning network parameter model;
the loss function module comprises an N-Cuts Loss and a Reconstruction Loss, as shown in figure 1, wherein the N-Cuts Loss is at the end of the U-Net of the self-encoding module, calculates the loss of the segmentation output, and optimizes only the U-Net of the self-encoding module; the Reconstruction Loss is at the end of the U-Net of the self-decoding module and uses SSIM to calculate the loss of the reconstructed output, a conditional random field (CRF) is used as a post-processing step for fine-tuning the result, and the Reconstruction Loss optimizes both the U-Net of the self-encoding module and the U-Net of the self-decoding module;
Through the above steps, since the image is subjected to the max pooling operation multiple times, invariance may increase, resulting in decreased positioning accuracy. In order to obtain finer image boundaries at the output stage, a conditional random field (CRF) is used as a post-processing step to fine-tune the result. The energy function of the CRF is as follows:
E(X) = ∑_u φ(u) + ∑_(u,v) ψ(u, v)    (4)
where u and v are voxels, φ(u) is the unary potential, and ψ(u, v) is the pairwise potential. After the conditional random field (CRF) processing, the cluster corresponding to the liver region is manually identified as one volume, and the remaining clusters are then merged, thereby obtaining the liver region segmentation.
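Equation (4) can be illustrated on a toy 1-D chain of voxels; the Potts-style pairwise potential (a constant penalty beta for neighboring voxels with different labels) is an assumption for illustration, not the dense-CRF potential the method actually uses:

```python
def crf_energy(labels, unary, beta=1.0):
    """Energy of a labeling on a 1-D chain, per equation (4):
    E(X) = sum of unary potentials phi(u) plus pairwise potentials
    psi(u, v) = beta for each neighboring pair with different labels."""
    phi = sum(unary[i][lab] for i, lab in enumerate(labels))      # data term
    psi = sum(beta for a, b in zip(labels, labels[1:]) if a != b)  # smoothness term
    return phi + psi

# Three voxels, two labels (0 = background, 1 = liver); costs are made up.
unary = [[0.1, 2.0], [0.2, 1.5], [1.8, 0.3]]
energy = crf_energy([0, 0, 1], unary)   # 0.1 + 0.2 + 0.3 unary, one label change
```

Minimizing this energy trades fidelity to the per-voxel predictions (φ) against spatial smoothness (ψ), which is what sharpens the boundaries after the pooling-induced blurring.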
Step 4, testing the trained unsupervised learning network parameter model, verifying the effect of the model, and obtaining a feasible unsupervised learning network parameter model if the effect is in line with expectation;
and 5, inputting the image to be segmented into the feasible unsupervised learning network parameter model to obtain a segmented liver region image.
Further, the nonlinear activation function used by the unsupervised learning network parameter model in the training process in step 3 is the parametric rectified linear unit (PReLU), which adaptively learns, during training, a parameter alpha that is applied to the negative inputs; liver region images of a uniform batch size are used in training the unsupervised learning network parameter model.
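The PReLU activation described above is the identity for positive inputs and a learned slope alpha for negative inputs; a minimal sketch (the alpha value shown is a stand-in for the learned parameter):

```python
def prelu(x, alpha):
    """Parametric ReLU: identity for x >= 0, slope alpha (learned per
    channel during training) for x < 0."""
    return x if x >= 0 else alpha * x

# alpha = 0.25 is an illustrative value, not the trained one.
vals = [prelu(v, 0.25) for v in (-4.0, -1.0, 0.0, 2.0)]
```

Unlike a plain ReLU, the negative slope lets gradients flow through inactive units, and learning alpha lets each channel tune how much.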
Further, the N-Cuts Loss in step 3 is specifically:
wherein ω (-) is
The Reconstruction Loss function is based on the structural similarity index SSIM:
SSIM(x, y) = ((2μ_x μ_y + C1)(2σ_xy + C2)) / ((μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2))
wherein μ_x and μ_y are the means of x and y, σ_x² and σ_y² are the variances of x and y, σ_xy is the covariance of x and y, and C1 and C2 are small positive constants that prevent the denominator from approaching 0.
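A minimal NumPy sketch of the SSIM computation used in the Reconstruction Loss; it uses whole-image statistics rather than the sliding windows of the full SSIM, and the constant values are illustrative assumptions:

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window structural similarity index between two images,
    matching the formula above: means, variances, and covariance are
    taken over the whole image. c1, c2 are small stabilizing constants."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

a = np.linspace(0.0, 1.0, 16).reshape(4, 4)   # toy "image"
```

An identical pair scores 1 (the maximum), and dissimilar images score lower, so a loss of the form 1 − SSIM rewards faithful reconstructions.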
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.