Liver tumor segmentation method based on cascade network
Technical Field
The invention relates to the technical field of image segmentation, in particular to a liver tumor segmentation method based on a cascade network.
Background
Liver cancer is one of the most common and most lethal malignant tumors worldwide and seriously threatens human life and health. According to data from the National Cancer Center, liver cancer ranks 5th in incidence and 2nd in mortality among all malignant tumors in China. With advances in computer technology and medical informatization, medical imaging facilities have steadily improved. Computed tomography (CT) offers short scanning times and high image resolution, and is currently a commonly used modality for diagnosing liver lesions. Because liver tumors in CT images typically show low contrast and blurred boundaries, and vary in size, shape, position and number, clinical liver tumor segmentation still relies on manual intervention: an experienced doctor delineates the contour by hand. This is time-consuming and labor-intensive, adapts poorly to the complexity and diversity of liver tumors, gives limited accuracy and efficiency, and does not achieve automatic segmentation of the tumor region. Moreover, different doctors may produce different annotations for the tumor region in the liver CT images of the same patient, so the result depends heavily on the doctor's experience and skill. Research on an accurate and efficient automatic liver tumor segmentation method is therefore of great significance for the clinical diagnosis and treatment of liver cancer.
In recent years, deep learning has developed rapidly and has been widely applied to medical image segmentation. Ronneberger et al. proposed the U-shaped convolutional network (Unet), which adds skip connections to an encoder-decoder convolutional network and achieves end-to-end semantic segmentation of images: the encoder downsamples the extracted features to capture the contextual information of the image, and the decoder upsamples the features to precisely locate the segmented regions. Li et al. proposed a bottleneck-supervised Unet model (bottleneck supervised Unet, BS-Unet), a hybrid densely connected structure that performs segmentation by making full use of the information between the layers of the network. Schlemper et al. incorporated the attention mechanism into Unet and proposed the attention Unet (A-Unet) model, which automatically learns region features relevant to the segmentation task and suppresses irrelevant features. Although these networks are widely used for liver tumor segmentation, problems remain: boundaries are segmented inaccurately, small tumors are difficult to detect, and class imbalance in the data makes tumors difficult to segment.
Disclosure of Invention
The invention aims to solve the problems of difficult segmentation and inaccurate boundary segmentation in the existing liver tumor segmentation and provides a liver tumor segmentation method based on a cascade network.
In order to solve the problems, the invention is realized by the following technical scheme:
a liver tumor segmentation method based on cascade network comprises the following steps:
firstly, constructing a liver tumor segmentation model based on a cascade network; the liver tumor segmentation model based on the cascade network consists of a liver segmentation network, a liver tumor segmentation network and a feature addition layer; the input of the liver segmentation network is used as the input of the liver tumor segmentation model based on the cascade network, the input and the output of the liver segmentation network are both connected with the input of the feature addition layer, the output of the feature addition layer is connected with the input of the liver tumor segmentation network, and the output of the liver tumor segmentation network is used as the output of the liver tumor segmentation model based on the cascade network;
then, constructing a training sample set by utilizing CT images which have been segmented into liver tumors in advance, and performing deep learning training on the constructed liver tumor segmentation model based on the cascade network by utilizing the training sample set to obtain a trained liver tumor segmentation model based on the cascade network;
and finally, sending the CT image which is currently required to be segmented into a trained liver tumor segmentation model based on a cascade network, and obtaining the CT image of segmented liver tumor.
The liver segmentation network and the liver tumor segmentation network are both mixed depth separable convolution residual segmentation networks; the mixed depth separable convolution residual segmentation network consists of 1 input layer, 2 convolution layers, 5 first residual modules, 4 second residual modules, 4 pooling modules, 4 up-sampling modules and 1 output layer; the input of the input layer is used as the input of the mixed depth separable convolution residual segmentation network; the output of the input layer is connected through the first convolution layer with the input of the first first residual module, the output of the first first residual module is connected with the input of the first pooling module and the first input of the fourth up-sampling module, the output of the first pooling module is connected with the input of the second first residual module, the output of the second first residual module is connected with the input of the second pooling module and the first input of the third up-sampling module, the output of the second pooling module is connected with the input of the third first residual module, the output of the third first residual module is connected with the input of the third pooling module and the first input of the second up-sampling module, the output of the third pooling module is connected with the input of the fourth first residual module, the output of the fourth first residual module is connected with the input of the fourth pooling module and the first input of the first up-sampling module, and the output of the fourth pooling module is connected with the input of the fifth first residual module; the output of the fifth first residual module is connected with the second input of the first up-sampling module, the output of the first up-sampling module is connected with the input of the first second residual module, the output of the first second residual module is connected with the second input of the second up-sampling module, the output of the second up-sampling module is connected with the input of the second second residual module, the output of the second second residual module is connected with the second input of the third up-sampling module, the output of the third up-sampling module is connected with the input of the third second residual module, the output of the third second residual module is connected with the second input of the fourth up-sampling module, the output of the fourth up-sampling module is connected with the input of the fourth second residual module, and the output of the fourth second residual module is connected through the second convolution layer with the input of the output layer; the output of the output layer is used as the output of the mixed depth separable convolution residual segmentation network.
The first residual module of the liver segmentation network is different from the first residual module of the liver tumor segmentation network. The first residual module of the liver segmentation network consists of 2 mixed depth separable convolution layers, 2 convolution layers and 1 feature addition layer; the input of the first mixed depth separable convolution layer is used as the input of the first residual module of the liver segmentation network, the output of the first mixed depth separable convolution layer is connected with the input of the second mixed depth separable convolution layer, and the output of the second mixed depth separable convolution layer is connected with the input of the first convolution layer; the input of the second convolution layer is connected with the input of the first mixed depth separable convolution layer; the outputs of the first convolution layer and the second convolution layer are both connected with the input of the feature addition layer, and the output of the feature addition layer is used as the output of the first residual module of the liver segmentation network.
The first residual module of the liver tumor segmentation network consists of 2 mixed depth separable convolution layers, 2 convolution layers, 1 coordinate attention mechanism layer and 1 feature addition layer; the input of the first mixed depth separable convolution layer is used as the input of the first residual module of the liver tumor segmentation network, the output of the first mixed depth separable convolution layer is connected with the input of the second mixed depth separable convolution layer, the output of the second mixed depth separable convolution layer is connected with the input of the coordinate attention mechanism layer, and the output of the coordinate attention mechanism layer is connected with the input of the first convolution layer; the input of the second convolution layer is connected with the input of the first mixed depth separable convolution layer; the outputs of the first convolution layer and the second convolution layer are both connected with the input of the feature addition layer, and the output of the feature addition layer is used as the output of the first residual module of the liver tumor segmentation network.
The second residual module consists of 3 convolution layers and 1 feature addition layer; the input of the first convolution layer is used as the input of the second residual module, and the output of the first convolution layer is connected with the input of the second convolution layer; the input of the third convolution layer is connected with the input of the first convolution layer; the outputs of the second convolution layer and the third convolution layer are both connected with the input of the feature addition layer; the output of the feature addition layer is used as the output of the second residual module.
The pooling module consists of 1 maximum pooling layer, 1 convolution layer and 1 splicing layer; the input of the maximum pooling layer and the input of the convolution layer are jointly used as the input of the pooling module, the output of the maximum pooling layer and the output of the convolution layer are both connected with the input of the splicing layer, and the output of the splicing layer is used as the output of the pooling module.
The up-sampling module consists of 1 bilinear interpolation layer and 1 splicing layer; the input of the bilinear interpolation layer is used as the first input of the up-sampling module, the output of the bilinear interpolation layer is connected with one input of the splicing layer, the other input of the splicing layer is used as the second input of the up-sampling module, and the output of the splicing layer is used as the output of the up-sampling module.
Compared with the prior art, the invention has the following characteristics:
1. The liver segmentation network and the liver tumor segmentation network are cascaded: the liver segmentation network first segments the liver from the CT image to extract the tumor region of interest, which is then used as the input of the liver tumor segmentation network to segment the liver tumor accurately; this alleviates the data imbalance caused by the small proportion of the whole CT image occupied by the tumor.
2. The liver segmentation network and the liver tumor segmentation network both use a residual network as the backbone; the residual network greatly increases the depth at which a network can be trained effectively, accelerates convergence and reduces model degradation, thereby mitigating the vanishing-gradient problem that arises as the number of network layers grows.
3. The mixed depth separable convolution applies convolution kernels of different sizes to different channel groups and fuses the multi-scale kernels into a single convolution operation, so that feature patterns at different resolutions are captured and edge details and deeper small-target features are extracted; by enlarging the receptive field of the feature maps of the segmentation network and making full use of channel and spatial structure information, pixel-level detail and spatial information are captured better, which improves the segmentation performance of the network on medical images.
4. The coordinate attention mechanism captures cross-channel information, so that the model can locate and identify the lesion region more accurately.
Drawings
Fig. 1 is a schematic diagram of a liver tumor segmentation model based on a cascade network.
Fig. 2 is a schematic diagram of the mixed depth separable convolution residual segmentation network (CMDCRA-UNet).
Fig. 3 is a schematic diagram of the first residual module, in which (a) is the first residual module of the liver segmentation network and (b) is the first residual module of the liver tumor segmentation network.
Fig. 4 is a schematic diagram of the second residual module (Residual block 2).
Fig. 5 is a schematic diagram of a pooling module (Pool).
Fig. 6 is a schematic diagram of an Up sampling module (Up Sample).
Detailed Description
The present invention will be further described in detail with reference to specific examples in order to make the objects, technical solutions and advantages of the present invention more apparent.
A liver tumor segmentation method based on a cascade network proceeds as follows. First, a liver tumor segmentation model based on a cascade network is constructed. Then, a training sample set is constructed from CT images in which the liver tumors have been segmented in advance, and the constructed model is trained by deep learning on this training sample set to obtain a trained liver tumor segmentation model based on the cascade network. Finally, the CT image to be segmented is fed into the trained model to obtain the CT image with the liver tumor segmented.
Although the density of a liver tumor differs from that of normal liver tissue, it is similar to the tissue density of other abdominal organs, so segmenting the tumor directly with a single network rarely gives satisfactory results. A liver region of interest retains only the liver region of the original CT image and thereby effectively avoids interference from other abdominal organs in liver tumor segmentation. Therefore, the liver segmentation network is first used to extract the liver region from the CT image, and the liver tumor segmentation network is then used to extract the tumor region within the liver region. Accordingly, the liver tumor segmentation model based on the cascade network constructed by the invention consists of a liver segmentation network, a liver tumor segmentation network and a feature addition layer, as shown in Fig. 1. The input of the liver segmentation network is used as the input of the model; the input and the output of the liver segmentation network are both connected with the input of the feature addition layer; the output of the feature addition layer is connected with the input of the liver tumor segmentation network; and the output of the liver tumor segmentation network is used as the output of the model. The liver segmentation network segments the liver from the original CT image to extract the tumor region of interest, and the tumor region of interest is used as the input of the liver tumor segmentation network, which then segments the liver tumor accurately.
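The cascade wiring described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the two networks are replaced by hypothetical threshold functions (`toy_liver_net`, `toy_tumor_net`), and the feature addition layer is assumed to be an element-wise sum of the input and the output of the liver segmentation network, which is one literal reading of the description.

```python
import numpy as np

def cascade_forward(ct_slice, liver_net, tumor_net):
    """Wire the cascade: liver network -> feature addition layer -> tumor network."""
    liver_out = liver_net(ct_slice)   # output of the liver segmentation network
    fused = ct_slice + liver_out      # feature addition layer (assumed element-wise sum)
    return tumor_net(fused)           # output of the cascade model

# Hypothetical stand-ins for the two trained networks (plain thresholds).
def toy_liver_net(x):
    return (x > 0.5).astype(x.dtype)

def toy_tumor_net(x):
    return (x > 1.8).astype(x.dtype)

ct = np.array([[0.1, 0.6], [0.9, 0.4]])
tumor_mask = cascade_forward(ct, toy_liver_net, toy_tumor_net)
print(tumor_mask)
```

In a real system both stand-ins would be trained segmentation networks; only the order of the three stages is taken from the description.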
The liver segmentation network and the liver tumor segmentation network are both mixed depth separable convolution residual segmentation networks. The network as a whole adopts an encoding-decoding architecture with a residual network as its backbone. It consists of 1 input layer, 2 convolution layers, 5 first residual modules, 4 second residual modules, 4 pooling modules, 4 up-sampling modules and 1 output layer, as shown in Fig. 2. The input of the input layer is used as the input of the network. The output of the input layer is connected through the first convolution layer with the input of the first first residual module; the output of the first first residual module is connected with the input of the first pooling module and the first input of the fourth up-sampling module; the output of the first pooling module is connected with the input of the second first residual module; the output of the second first residual module is connected with the input of the second pooling module and the first input of the third up-sampling module; the output of the second pooling module is connected with the input of the third first residual module; the output of the third first residual module is connected with the input of the third pooling module and the first input of the second up-sampling module; the output of the third pooling module is connected with the input of the fourth first residual module; the output of the fourth first residual module is connected with the input of the fourth pooling module and the first input of the first up-sampling module; and the output of the fourth pooling module is connected with the input of the fifth first residual module.
The output of the fifth first residual module is connected with the second input of the first up-sampling module; the output of the first up-sampling module is connected with the input of the first second residual module; the output of the first second residual module is connected with the second input of the second up-sampling module; the output of the second up-sampling module is connected with the input of the second second residual module; the output of the second second residual module is connected with the second input of the third up-sampling module; the output of the third up-sampling module is connected with the input of the third second residual module; the output of the third second residual module is connected with the second input of the fourth up-sampling module; the output of the fourth up-sampling module is connected with the input of the fourth second residual module; and the output of the fourth second residual module is connected through the second convolution layer with the input of the output layer. The output of the output layer is used as the output of the mixed depth separable convolution residual segmentation network. In this network, the input image (the original CT image or the tumor region of interest) is first passed through a 3×3 convolution that increases its channel dimension; the encoder then uses the first residual modules, together with pooling operations, to extract feature maps at different levels; the decoder uses the second residual modules, together with up-sampling operations, and fuses the features of the corresponding encoding layers to recover the information lost during down-sampling; finally, a 1×1 convolution performs pixel-level classification to segment the liver region or the tumor region.
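The connection order of the encoder and decoder can be traced with a short wiring sketch. Every module below is a hypothetical stand-in chosen only so that the shapes work out (identity for the residual modules, 2× subsampling for pooling, pixel repetition for up-sampling, a channel mean for the second residual modules); none of them is the actual layer of the invention.

```python
import numpy as np

def encoder_decoder_forward(x, conv_in, enc, pool, up, dec, conv_out):
    """enc: 5 first residual modules; pool, up, dec: 4 modules each."""
    f = conv_in(x)
    skips = []
    for i in range(4):
        f = enc[i](f)               # i-th first residual module
        skips.append(f)             # kept for the matching up-sampling module
        f = pool[i](f)              # i-th pooling module
    f = enc[4](f)                   # fifth first residual module (bottom of the U)
    for i in range(4):
        f = up[i](f, skips[3 - i])  # 1st up-sampling pairs with the 4th encoder stage, etc.
        f = dec[i](f)               # i-th second residual module
    return conv_out(f)

# Hypothetical stand-in modules (shape bookkeeping only).
def identity(f):
    return f

def halve(f):
    return f[:, ::2, ::2]

def grow_and_concat(f, skip):
    grown = f.repeat(2, axis=1).repeat(2, axis=2)
    return np.concatenate([grown, skip], axis=0)

def channel_mean(f):
    return f.mean(axis=0, keepdims=True)

x = np.random.default_rng(0).random((1, 16, 16))
out = encoder_decoder_forward(x, identity, [identity] * 5, [halve] * 4,
                              [grow_and_concat] * 4, [channel_mean] * 4, identity)
print(out.shape)
```

The index `skips[3 - i]` encodes the pairing stated in the text: the first up-sampling module receives the output of the fourth first residual module, and the fourth up-sampling module receives the output of the first.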
In the mixed depth separable convolution residual segmentation network, the first residual module is used to extract features on the encoding path and to obtain context information. In the invention, the first residual module of the liver segmentation network differs slightly from the first residual module of the liver tumor segmentation network.
In the liver segmentation network, the first residual module of the mixed depth separable convolution residual segmentation network consists of 2 mixed depth separable convolution layers, 2 convolution layers and 1 feature addition layer, as shown in Fig. 3 (a). The input of the first mixed depth separable convolution layer is used as the input of the first residual module of the liver segmentation network, the output of the first mixed depth separable convolution layer is connected with the input of the second mixed depth separable convolution layer, and the output of the second mixed depth separable convolution layer is connected with the input of the first convolution layer; the input of the second convolution layer is connected with the input of the first mixed depth separable convolution layer; the outputs of the first convolution layer and the second convolution layer are both connected with the input of the feature addition layer, and the output of the feature addition layer is used as the output of the first residual module of the liver segmentation network. In the first residual module of the liver segmentation network, the input feature map undergoes two mixed depth separable convolutions followed by a 1×1 convolution; the result is added to the feature map obtained by applying a separate 1×1 convolution to the original input, and the sum is the output of the first residual module.
In the liver tumor segmentation network, the first residual module of the mixed depth separable convolution residual segmentation network consists of 2 mixed depth separable convolution layers, 2 convolution layers, 1 coordinate attention mechanism layer and 1 feature addition layer, as shown in Fig. 3 (b). The input of the first mixed depth separable convolution layer is used as the input of the first residual module of the liver tumor segmentation network, the output of the first mixed depth separable convolution layer is connected with the input of the second mixed depth separable convolution layer, the output of the second mixed depth separable convolution layer is connected with the input of the coordinate attention mechanism layer, and the output of the coordinate attention mechanism layer is connected with the input of the first convolution layer; the input of the second convolution layer is connected with the input of the first mixed depth separable convolution layer; the outputs of the first convolution layer and the second convolution layer are both connected with the input of the feature addition layer, and the output of the feature addition layer is used as the output of the first residual module of the liver tumor segmentation network. In the first residual module of the liver tumor segmentation network, the input feature map undergoes two mixed depth separable convolutions, then the coordinate attention mechanism, then a 1×1 convolution; the result is added to the feature map obtained by applying a separate 1×1 convolution to the original input, and the sum is the output of the first residual module.
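The two variants of the first residual module share the same data flow and differ only in the optional attention step, which the following sketch makes explicit. The mixed depth separable convolutions and the coordinate attention layer are passed in as callables and replaced here by hypothetical stand-ins (an identity function and a constant scaling); the weights are made-up values, and activation or normalization layers are omitted because the description does not specify them.

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def first_residual_module(x, mdconv1, mdconv2, w_main, w_shortcut, attention=None):
    y = mdconv2(mdconv1(x))                # two mixed depth separable convolutions
    if attention is not None:              # tumor network only: coordinate attention
        y = attention(y)
    main = pointwise_conv(y, w_main)           # first 1x1 convolution layer
    shortcut = pointwise_conv(x, w_shortcut)   # second 1x1 convolution layer, on the input
    return main + shortcut                 # feature addition layer

# Hypothetical stand-ins so the wiring can be run end to end.
def identity(f):
    return f

def toy_attention(f):
    return 0.5 * f

x = np.ones((2, 3, 3))
w_main = np.ones((4, 2))
w_shortcut = np.full((4, 2), 0.5)
liver_out = first_residual_module(x, identity, identity, w_main, w_shortcut)
tumor_out = first_residual_module(x, identity, identity, w_main, w_shortcut,
                                  attention=toy_attention)
print(liver_out[0, 0, 0], tumor_out[0, 0, 0])
```

The 1×1 convolution on the shortcut lets the input and the main branch have different channel counts before they are added.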
Because the receptive field of the convolutions in a residual network is limited, the network extracts insufficient high-resolution features of the liver edge and the tumor. The invention therefore adds the mixed depth separable convolution to the first residual module: the channels are divided into groups and each group is convolved with a kernel of a different size, giving a mixed receptive field that captures both high-resolution and low-resolution features. In the mixed depth separable convolution, the channels of the input feature map are divided evenly into 4 groups, which are convolved with kernels of size 3×3, 5×5, 7×7 and 9×9 respectively, and the four resulting feature maps are concatenated.
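The channel grouping can be illustrated with a small NumPy sketch of a mixed depthwise convolution. It assumes stride 1, zero "same" padding and one filter per channel, and it uses averaging (box) kernels as placeholder weights; these choices, and the helper names, are assumptions made for illustration rather than details taken from the description.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def depthwise_conv_same(x, kernels):
    """x: (C, H, W); kernels: (C, k, k), one filter per channel; zero 'same' padding."""
    k = kernels.shape[-1]
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return np.einsum("chwij,cij->chw", windows, kernels)

def mixed_depthwise_conv(x, group_kernels):
    """Split channels evenly into len(group_kernels) groups, one kernel size per group."""
    groups = np.split(x, len(group_kernels), axis=0)
    outputs = [depthwise_conv_same(g, k) for g, k in zip(groups, group_kernels)]
    return np.concatenate(outputs, axis=0)   # splice the four feature maps back together

x = np.ones((8, 12, 12))                      # 8 channels -> 4 groups of 2
box_kernels = [np.full((2, k, k), 1.0 / (k * k)) for k in (3, 5, 7, 9)]
y = mixed_depthwise_conv(x, box_kernels)
print(y.shape)
```

Because each group keeps its channel count and spatial size, the concatenated output has the same shape as the input, so the layer can be dropped into the residual module without further reshaping.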
In the mixed depth separable convolution residual segmentation network, the second residual module is used to precisely locate the liver region and the tumor region on the decoding path. In the invention, the second residual module of the liver segmentation network is identical to the second residual module of the liver tumor segmentation network. The second residual module consists of 3 convolution layers and 1 feature addition layer, as shown in Fig. 4. The input of the first convolution layer is used as the input of the second residual module, and the output of the first convolution layer is connected with the input of the second convolution layer; the input of the third convolution layer is connected with the input of the first convolution layer; the outputs of the second convolution layer and the third convolution layer are both connected with the input of the feature addition layer; the output of the feature addition layer is used as the output of the second residual module. In the second residual module, the input feature map undergoes two 3×3 convolutions, and the result is added to the feature map obtained by applying a 1×1 convolution to the original input; the sum is the output of the second residual module.
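A minimal NumPy sketch of the second residual module follows. It assumes stride 1 and zero padding for all three convolutions and leaves out activation and normalization layers, which the description does not specify; the identity-like weights are placeholders chosen so the result is easy to check.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, w):
    """x: (C_in, H, W); w: (C_out, C_in, k, k); stride 1, zero 'same' padding."""
    k = w.shape[-1]
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", windows, w)

def second_residual_module(x, w1, w2, w_shortcut):
    main = conv2d_same(conv2d_same(x, w1), w2)   # two 3x3 convolutions in series
    shortcut = conv2d_same(x, w_shortcut)        # 1x1 convolution of the original input
    return main + shortcut                       # feature addition layer

# Placeholder weights: each convolution copies its input unchanged.
identity3 = np.zeros((2, 2, 3, 3))
identity3[0, 0, 1, 1] = identity3[1, 1, 1, 1] = 1.0
identity1 = np.eye(2).reshape(2, 2, 1, 1)

x = np.random.default_rng(0).random((2, 5, 5))
out = second_residual_module(x, identity3, identity3, identity1)
print(np.allclose(out, 2 * x))
```

With these placeholder weights both branches reproduce the input, so the output is exactly twice the input, which confirms that the two branches are summed rather than concatenated.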
In the mixed depth separable convolution residual segmentation network, the pooling module consists of 1 maximum pooling layer, 1 convolution layer and 1 splicing layer, as shown in Fig. 5. The input of the maximum pooling layer and the input of the convolution layer are jointly used as the input of the pooling module, the output of the maximum pooling layer and the output of the convolution layer are both connected with the input of the splicing layer, and the output of the splicing layer is used as the output of the pooling module. In the pooling module, the size of the feature map is reduced in parallel by maximum pooling and by a 3×3 convolution with a stride of 2, and the pooled feature map and the convolved feature map are spliced, which enlarges the receptive field.
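The two parallel down-sampling branches can be sketched as follows. The sketch assumes a 2×2 maximum pooling window with stride 2, zero padding of 1 for the 3×3 convolution, and even input height and width; only the 3×3 kernel and the stride of 2 for the convolution come from the description.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_2x2(x):
    """x: (C, H, W) with even H and W (assumed 2x2 window, stride 2)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def conv3x3_stride2(x, w):
    """x: (C_in, H, W); w: (C_out, C_in, 3, 3); zero padding 1, stride 2."""
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    windows = sliding_window_view(padded, (3, 3), axis=(1, 2))[:, ::2, ::2]
    return np.einsum("chwij,ocij->ohw", windows, w)

def pooling_module(x, w):
    # Splice (concatenate) the two half-resolution feature maps along the channel axis.
    return np.concatenate([max_pool_2x2(x), conv3x3_stride2(x, w)], axis=0)

x = np.arange(32, dtype=float).reshape(2, 4, 4)
w = np.zeros((2, 2, 3, 3))
w[0, 0, 1, 1] = w[1, 1, 1, 1] = 1.0   # placeholder weights: copy the centre pixel
out = pooling_module(x, w)
print(out.shape)
```

Both branches halve the height and width, so they can be concatenated directly; the channel count of the output is the sum of the channel counts of the two branches.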
In the mixed depth separable convolution residual segmentation network, the up-sampling module consists of 1 bilinear interpolation layer and 1 splicing layer, as shown in Fig. 6. The input of the bilinear interpolation layer is used as the first input of the up-sampling module, the output of the bilinear interpolation layer is connected with one input of the splicing layer, the other input of the splicing layer is used as the second input of the up-sampling module, and the output of the splicing layer is used as the output of the up-sampling module. In the up-sampling module, the feature map is enlarged by bilinear interpolation, and the enlarged feature map is spliced with the corresponding feature map from the encoding path, which gives a better feature reconstruction effect.
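The up-sampling module can be sketched with a hand-written bilinear resize. The sketch assumes that the low-resolution feature map is resized to the spatial size of the encoder feature map and that corner pixels are aligned; neither the scale factor nor the alignment convention is stated in the description, and the function names are illustrative.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """x: (C, H, W) -> (C, out_h, out_w), corner-aligned bilinear interpolation."""
    _, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]          # vertical interpolation weights
    wx = (xs - x0)[None, None, :]          # horizontal interpolation weights
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def upsampling_module(low_res, encoder_feature):
    enlarged = bilinear_resize(low_res, *encoder_feature.shape[1:])
    return np.concatenate([enlarged, encoder_feature], axis=0)   # splicing layer

low_res = np.arange(4, dtype=float).reshape(1, 2, 2)
encoder_feature = np.zeros((3, 4, 4))
out = upsampling_module(low_res, encoder_feature)
print(out.shape)
```

Bilinear interpolation has no trainable weights, so all learning in the decoder happens in the second residual module that follows the splice.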
It should be noted that, although the examples described above are illustrative, this is not a limitation of the present invention, and thus the present invention is not limited to the above-described specific embodiments. Other embodiments, which are apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein, are considered to be within the scope of the invention as claimed.