Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides a medical image liver segmentation method based on unsupervised learning, which achieves higher accuracy than manual segmentation. The technical scheme of the invention is as follows:
step 1, obtaining an original image of abdominal tomography, obtaining a segmentation mask of the liver region through manual annotation, and carrying out padding preprocessing and normalization preprocessing on all image data, wherein all the image data comprise the original image of abdominal tomography and the segmentation mask; randomly selecting 80% of the preprocessed images to form a training set, and taking the other 20% of the preprocessed images as a test set;
selecting MRI images, constructing a training set and a test set containing images of different liver types, and normalizing the liver images in the training set and the test set to pixel values in (0, 1) so that they share a common scale, obtaining a normalized image set, which facilitates faster convergence during training; in addition, the image data in the training set and the test set are processed by a padding method so that all the image data have the same dimensions;
step 2, constructing an unsupervised learning network parameter model by superposing two U-Nets with Attention Gate modules, wherein the unsupervised learning network parameter model consists of 18 convolution modules, each convolution module consists of 2 three-dimensional convolution layers with a convolution kernel size of 3, and each three-dimensional convolution layer is followed by a nonlinear activation function and an instance normalization layer; the unsupervised learning network parameter model comprises 46 3D convolution layers in total;
the unsupervised learning network parameter model is divided into a self-encoding module and a self-decoding module: the U-Net on the left side is the self-encoding module, the U-Net on the right side is the self-decoding module, and each comprises 9 convolution modules; the self-encoding module is used for predicting a segmentation mapping of the input image, and the self-decoding module is used for reconstructing its input image, taking the original image as the target, from the segmentation mapping output by the self-encoding module; the image data output by the self-encoding module is transmitted to a fully-connected 3D convolution layer with a convolution kernel size of 1 and a stride of 1, and is then fed through a Softmax layer into the input end of the self-decoding module;
step 3, constructing a loss function module at the end of the U-Net of the self-encoding module and of the self-decoding module respectively, and training the unsupervised learning network parameter model on the training set to obtain a trained unsupervised learning network parameter model;
the Loss function module comprises an N-Cuts Loss and a Reconstruction Loss, wherein the N-Cuts Loss is arranged at the tail of the U-Net of the self-coding module, the Loss of the segmentation output is calculated, and only the U-Net of the self-coding module is optimized; the reconstraction Loss is at the end of the U-Net of the automatic decoder, SSIM is used for calculating Loss of reconstructed output, a conditional random field CRF is used as a post-processing step for fine-tuning the result, and the reconstraction Loss optimizes the U-Net of the self-encoding module and the U-Net of the self-decoding module;
step 4, testing the trained unsupervised learning network parameter model, verifying the effect of the model, and obtaining a feasible unsupervised learning network parameter model if the effect is in line with expectation;
and 5, inputting the image to be segmented into the feasible unsupervised learning network parameter model to obtain a segmented liver region image.
Further, the nonlinear activation function used by the unsupervised learning network parameter model in the training process in step 3 is the parametric rectified linear unit (PReLU), which adaptively learns, during training, a parameter alpha that is applied to the negative inputs; liver region images of the same batch size are used when training the unsupervised learning network parameter model, so that the number of pixels is equal when data padding is carried out while constructing batches of liver region images from the training set.
Further, the N-Cuts Loss in step 3 is specifically:
The Reconstruction Loss function is based on the structural similarity index SSIM:
SSIM(x, y) = ((2μ_x μ_y + C1)(2σ_xy + C2)) / ((μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2))
wherein μ_x and μ_y are the means of x and y, σ_x² and σ_y² are the variances of x and y, σ_xy is the covariance of x and y, and C1 and C2 are small positive constants that prevent the denominator from approaching 0.
The invention has the beneficial effects that:
through the superposed U-Nets, the Attention Gates, and the loss functions described above, medical image segmentation of the liver based on unsupervised learning can be realized.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a medical image liver segmentation method based on unsupervised learning, which is implemented according to the following steps:
step 1, acquiring an MRI data set of 100 subjects to form original images of abdominal tomography; selecting the T1-DUAL sequence for the experiment, and carrying out padding preprocessing and normalization preprocessing on all image data, wherein all the image data comprise the original images of abdominal tomography and the segmentation masks; to facilitate subsequent network training, the size of the images is set to x × y × z during preprocessing; 80% of the preprocessed images are randomly selected to form a training set, and the other 20% are used as a test set;
selecting MRI images, constructing a training set and a test set containing images of different liver types, and normalizing the liver images in the training set and the test set to pixel values in (0, 1) so that they share a common scale, obtaining a normalized image set, which facilitates faster convergence during training; in addition, the image data in the training set and the test set are processed by a padding method so that all the image data have the same dimensions;
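The preprocessing described above (min-max normalization into (0, 1) followed by padding all volumes to a common size) can be sketched as follows; the symmetric zero-padding scheme and the helper name are illustrative assumptions, since the patent does not fix the exact padding method:

```python
import numpy as np

def preprocess(volume, target_shape):
    """Min-max normalize a 3-D volume to (0, 1), then zero-pad it to a
    common target shape (symmetric padding assumed for illustration)."""
    v = volume.astype(np.float64)
    v = (v - v.min()) / (v.max() - v.min() + 1e-8)   # pixel values in [0, 1)
    pads = []
    for dim, target in zip(v.shape, target_shape):
        total = target - dim
        pads.append((total // 2, total - total // 2))  # split pad over both sides
    return np.pad(v, pads, mode="constant")

vol = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
out = preprocess(vol, (4, 4, 4))   # all volumes now share one dimension
```

After this step every training and test volume has identical dimensions, which is what allows them to be stacked into fixed-size batches.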
step 2, as shown in fig. 1, constructing an unsupervised learning network parameter model by superposing two U-Nets with Attention Gate modules; the unsupervised learning network parameter model consists of 18 convolution modules, each convolution module consists of 2 three-dimensional convolution layers with a convolution kernel size of 3, and each three-dimensional convolution layer is followed by a nonlinear activation function and an instance normalization layer; the unsupervised learning network parameter model comprises 46 3D convolution layers in total;
the unsupervised learning network parameter model is divided into a self-encoding module and a self-decoding module: the U-Net on the left side is the self-encoding module, the U-Net on the right side is the self-decoding module, and each comprises 9 convolution modules; the self-encoding module is used for predicting a segmentation mapping of the input image, and the self-decoding module is used for reconstructing its input image, taking the original image as the target, from the segmentation mapping output by the self-encoding module; the image data output by the self-encoding module is transmitted to a fully-connected 3D convolution layer with a convolution kernel size of 1 and a stride of 1, and is then fed through a Softmax layer into the input end of the self-decoding module;
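The bottleneck between the two U-Nets described above (a 1 × 1 × 1 convolution followed by a channel-wise Softmax) is, per voxel, just a linear map over channels followed by a softmax. A minimal NumPy sketch, with the channel counts chosen as illustrative assumptions:

```python
import numpy as np

def pointwise_conv_softmax(feats, weight, bias):
    """1x1x1 convolution (per-voxel linear map over channels) followed by
    a channel-wise Softmax, as between the self-encoding and self-decoding
    U-Nets. feats: (C_in, X, Y, Z); weight: (C_out, C_in); bias: (C_out,)."""
    c_in, x, y, z = feats.shape
    flat = feats.reshape(c_in, -1)                # (C_in, X*Y*Z)
    logits = weight @ flat + bias[:, None]        # (C_out, X*Y*Z)
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)     # softmax over channels
    return probs.reshape(-1, x, y, z)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 2, 2, 2))             # 8 input channels (assumed)
w, b = rng.normal(size=(3, 8)), np.zeros(3)       # 3 segmentation classes (assumed)
probs = pointwise_conv_softmax(feats, w, b)
```

Each voxel of the output is a probability distribution over the segmentation classes, which is the "segmentation mapping" handed to the self-decoding module.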
in the embodiment of the present application, in the training process, as shown in fig. 1, the terminal first feeds an image of size x × y × z to the first convolution module of the self-encoding module; after each convolution operation the image passes through a PReLU activation and an instance normalization layer, and then proceeds to the next convolution module. The upper convolution module is connected to the lower convolution module through a three-dimensional max pooling layer; the max pooling operation performs down-sampling to reduce the size of the feature map, halving each dimension, so that the feature map becomes of size x/2 × y/2 × z/2; by analogy, the feature map size at the i-th layer of the contraction path of the two U-Nets is x/2^i × y/2^i × z/2^i. The original image size is also stored before each max pooling operation, so that the image size can be restored in the expansion path of the two U-Nets. The first convolution module of the self-encoding module produces 64 feature maps as output, and the number of feature maps doubles after each subsequent convolution module. In the contraction paths of the two U-Nets, the upper convolution module and the lower convolution module are connected through a three-dimensional max pooling layer; in the expansion paths of the two U-Nets, the lower convolution module is connected with the upper convolution module through an up-sampling layer. Up-sampling is performed using tri-linear interpolation, the output size of the interpolation being set to the image size saved before the corresponding max pooling operation.
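The size progression along the contraction path can be sketched directly: each 2 × 2 × 2 max pooling halves every spatial dimension, so level i has size x/2^i × y/2^i × z/2^i. A small illustration (the input size and pooling depth are assumptions, not values fixed by the patent):

```python
def contraction_sizes(x, y, z, depth):
    """Spatial size of the feature map at each level of the contraction
    path: level i is (x / 2**i, y / 2**i, z / 2**i)."""
    return [(x >> i, y >> i, z >> i) for i in range(depth + 1)]

# Hypothetical input volume of 128 x 128 x 64 with 3 pooling steps:
sizes = contraction_sizes(128, 128, 64, 3)
```

These stored sizes are exactly what the tri-linear interpolation in the expansion path restores, one level at a time.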
The self-decoding module comprises Attention Gates (indicated by circles in fig. 1). The input of each Attention Gate comprises a first input from the self-encoding module at the same layer and a second input from the previous layer of the self-decoding module. The specific operation is as shown in fig. 1: the first input and the second input are each convolved, summed, and activated by the ReLU function; a 1 × 1 × 1 convolution is then carried out, followed by Sigmoid activation and resampling; the resampled features are fused with the first input and the second input, and the fused features are fed into the deconvolution layer of the convolution module in the self-decoding module. The structure of the Attention Gate module is shown in fig. 2; both skip connections of each U-Net pass through Attention Gates, which suppress irrelevant regions and noise responses.
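The additive Attention Gate described above can be sketched with channel maps flattened to 2-D arrays; the weights here are random stand-ins, not trained parameters, and the shapes are illustrative assumptions:

```python
import numpy as np

def attention_gate(x_skip, g, w_x, w_g, psi):
    """Additive Attention Gate sketch: the skip input x and gating signal g
    are each linearly mapped (stand-ins for 1x1x1 convolutions), summed,
    passed through ReLU, reduced to one attention channel by psi, squashed
    with Sigmoid, and used to rescale the skip features."""
    q = np.maximum(w_x @ x_skip + w_g @ g, 0.0)     # ReLU(W_x x + W_g g)
    alpha = 1.0 / (1.0 + np.exp(-(psi @ q)))        # Sigmoid -> coefficients in (0, 1)
    return x_skip * alpha                           # attenuate irrelevant regions

rng = np.random.default_rng(1)
x_skip = rng.normal(size=(4, 10))                   # (channels, voxels), assumed shape
g = rng.normal(size=(4, 10))
w_x, w_g = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
psi = rng.normal(size=(1, 4))
gated = attention_gate(x_skip, g, w_x, w_g, psi)
```

Because the Sigmoid output lies in (0, 1), the gate can only attenuate skip features, never amplify them, which is how irrelevant regions and noise responses are suppressed.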
Step 3, constructing a loss function module at the end of the U-Net of the self-encoding module and of the self-decoding module respectively, and training the unsupervised learning network parameter model on the training set to obtain a trained unsupervised learning network parameter model;
the loss function module comprises an N-Cuts Loss and a Reconstruction Loss, as shown in figure 1, wherein the N-Cuts Loss is at the end of the U-Net of the self-encoding module, calculates the loss of the segmentation output, and optimizes only the U-Net of the self-encoding module; the Reconstruction Loss is at the end of the U-Net of the self-decoding module and uses SSIM to calculate the loss of the reconstructed output, a conditional random field (CRF) is used as a post-processing step for fine-tuning the result, and the Reconstruction Loss optimizes both the U-Net of the self-encoding module and the U-Net of the self-decoding module;
Through the above steps, since the image is subjected to the max pooling operation multiple times, invariance may increase, resulting in decreased positioning accuracy. In order to obtain finer image boundaries at the output stage, a conditional random field (CRF) is used as a post-processing step to fine-tune the result. The energy function of the CRF is as follows:
E(X) = ∑_u φ(u) + ∑_(u,v) ψ(u, v)    (4)
where u and v are voxels, φ(u) is the unary potential, and ψ(u, v) is the pairwise potential. After the conditional random field (CRF) processing, the cluster corresponding to the liver region is manually identified as one volume, and the remaining clusters are then merged, thereby obtaining the liver region segmentation.
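Equation (4) can be illustrated on a toy 1-D chain of voxels; the Potts-style pairwise potential (a constant penalty beta for neighboring voxels with different labels) is an assumption for illustration, not the dense-CRF potential the method actually uses:

```python
def crf_energy(labels, unary, beta=1.0):
    """Energy of a labeling on a 1-D chain, per equation (4):
    E(X) = sum of unary potentials phi(u) plus pairwise potentials
    psi(u, v) = beta for each neighboring pair with different labels."""
    phi = sum(unary[i][lab] for i, lab in enumerate(labels))      # data term
    psi = sum(beta for a, b in zip(labels, labels[1:]) if a != b)  # smoothness term
    return phi + psi

# Three voxels, two labels (0 = background, 1 = liver); costs are made up.
unary = [[0.1, 2.0], [0.2, 1.5], [1.8, 0.3]]
energy = crf_energy([0, 0, 1], unary)   # 0.1 + 0.2 + 0.3 unary, one label change
```

Minimizing this energy trades fidelity to the per-voxel predictions (φ) against spatial smoothness (ψ), which is what sharpens the boundaries after the pooling-induced blurring.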
Step 4, testing the trained unsupervised learning network parameter model, verifying the effect of the model, and obtaining a feasible unsupervised learning network parameter model if the effect is in line with expectation;
and 5, inputting the image to be segmented into the feasible unsupervised learning network parameter model to obtain a segmented liver region image.
Further, the nonlinear activation function used by the unsupervised learning network parameter model in the training process in step 3 is the parametric rectified linear unit (PReLU), which adaptively learns, during training, a parameter alpha that is applied to the negative inputs; liver region images of a uniform batch size are used in training the unsupervised learning network parameter model.
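The PReLU activation described above is the identity for positive inputs and a learned slope alpha for negative inputs; a minimal sketch (the alpha value shown is a stand-in for the learned parameter):

```python
def prelu(x, alpha):
    """Parametric ReLU: identity for x >= 0, slope alpha (learned per
    channel during training) for x < 0."""
    return x if x >= 0 else alpha * x

# alpha = 0.25 is an illustrative value, not the trained one.
vals = [prelu(v, 0.25) for v in (-4.0, -1.0, 0.0, 2.0)]
```

Unlike a plain ReLU, the negative slope lets gradients flow through inactive units, and learning alpha lets each channel tune how much.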
Further, the N-Cuts Loss in step 3 is specifically:
wherein ω (-) is
The Reconstruction Loss function is based on the structural similarity index SSIM:
SSIM(x, y) = ((2μ_x μ_y + C1)(2σ_xy + C2)) / ((μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2))
wherein μ_x and μ_y are the means of x and y, σ_x² and σ_y² are the variances of x and y, σ_xy is the covariance of x and y, and C1 and C2 are small positive constants that prevent the denominator from approaching 0.
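A minimal NumPy sketch of the SSIM computation used in the Reconstruction Loss; it uses whole-image statistics rather than the sliding windows of the full SSIM, and the constant values are illustrative assumptions:

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window structural similarity index between two images,
    matching the formula above: means, variances, and covariance are
    taken over the whole image. c1, c2 are small stabilizing constants."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

a = np.linspace(0.0, 1.0, 16).reshape(4, 4)   # toy "image"
```

An identical pair scores 1 (the maximum), and dissimilar images score lower, so a loss of the form 1 − SSIM rewards faithful reconstructions.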
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.