CN115690066A - Automatic accurate segmentation method for liver region in abdominal CT sequence image - Google Patents


Info

Publication number
CN115690066A
CN115690066A (Application number CN202211403625.3A)
Authority
CN
China
Prior art keywords: layer, convolution, network, aspp, unet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211403625.3A
Other languages
Chinese (zh)
Inventor
廖苗
邸拴虎
梁伟
赵于前
杨振
曾业战
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Science and Technology
Original Assignee
Hunan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2022-11-10
Filing date: 2022-11-10
Publication date: 2023-02-03
Application filed by Hunan University of Science and Technology filed Critical Hunan University of Science and Technology
Priority: CN202211403625.3A
Publication: CN115690066A
Legal status: Pending

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a method for automatically and accurately segmenting the liver region in abdominal CT sequence images, which mainly comprises the following steps: (1) for the CT sequence to be segmented, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes; (2) the two-dimensional slices in the different view directions are segmented with a U-shaped 2D convolutional network based on void space pyramid convolution; (3) the segmentation results of the different view directions are fused by a lightweight 3D convolutional network to obtain the probability that each voxel in the CT sequence belongs to the target; (4) a fully connected conditional random field energy function is constructed from the obtained probabilities, and an accurate liver segmentation result is obtained by minimizing the energy function. The method extracts three-dimensional features of the CT sequence by fusing information from the different view directions and obtains the three-dimensional liver segmentation result by introducing a fully connected conditional random field, and therefore offers high accuracy and robustness.

Description

Automatic accurate segmentation method for liver region in abdominal CT sequence image
Technical Field
The invention relates to the technical field of medical image processing, in particular to an automatic accurate segmentation method for a liver region in an abdominal CT sequence image.
Background
The liver is the largest solid organ in the human body; it is richly vascularized, structurally complex and performs an important detoxification function. Liver diseases are diverse and seriously harm human health: currently about 33% of the world's population suffers from some type of liver disease. Treatment mainly comprises chemotherapy, surgery and radiotherapy. Both surgery and radiotherapy require the liver tissue to be accurately segmented from medical images in order to obtain pathological, physical and anatomical information and to provide a basis for the treatment plan. Because the liver has a complex structure, blurred boundaries and a highly variable shape, even manual delineation by experienced experts is time-consuming, labor-intensive and easily affected by subjective factors. Automatic liver segmentation has therefore become one of the current research hotspots and has important clinical application value.
Many methods have been proposed for liver segmentation in CT sequence images; they fall mainly into methods based on hand-crafted features and methods based on deep learning. Methods based on hand-crafted features mainly include region growing, thresholding, model-based methods and machine-learning methods. These methods extract manually designed image features such as intensity, shape, edges or texture and then generate liver contours or regions from them. Most of them are semi-automatic, require seed points or regions of interest to be set manually, are sensitive to the choice of features and generalize poorly. In recent years, deep learning has been widely used for medical image segmentation because of its powerful feature extraction capability. For reasons of space and time efficiency, most deep-learning liver segmentation methods use a 2D segmentation network and obtain the final liver segmentation result of a patient by segmenting the liver region of each slice of the CT sequence in turn. Such methods ignore the correlation between slices, which limits segmentation accuracy. To extract three-dimensional feature information from the CT sequence, some researchers have proposed 3D segmentation networks for CT sequence images. Limited by computing resources, these three-dimensional networks usually require the CT sequence to be down-sampled or cropped into small three-dimensional blocks in advance, which loses image detail and reduces the segmentation accuracy of the network.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a method for the automatic and accurate segmentation of the liver region in abdominal CT sequence images which, by combining a 2D deep convolutional network, a lightweight 3D convolutional network and a fully connected conditional random field, achieves accurate and effective segmentation of the liver region in CT sequence images and improves the accuracy and efficiency of computer-aided diagnosis and treatment.
The method for automatic and accurate segmentation of the liver region in abdominal CT sequence images comprises the following steps:
(1) Establishing an original training data set A and an original training data set B, each comprising original CT sequence images and the corresponding manual liver segmentation results;
(2) Constructing a U-shaped 2D convolutional network based on void space pyramid convolution, denoted ASPP-UNet, with the following specific steps:
(2-a) A U-shaped network is adopted as the backbone network. The backbone comprises three coding layers, two skip connections, a void space pyramid convolutional layer, three decoding layers and a 1 × 1 convolutional layer, wherein: the output of the first coding layer is used both as the input of the second coding layer and, through the first skip connection, as an input of the second decoding layer; the output of the second coding layer is used both as the input of the third coding layer and, through the second skip connection, as an input of the first decoding layer; the output of the third coding layer is used as the input of the void space pyramid convolutional layer, and the output of the void space pyramid convolutional layer is used as the input of the first decoding layer; in addition, the output of each decoding layer is used as the input of the next decoding layer. To obtain the segmentation result, the last decoding layer is connected to a 1 × 1 convolutional layer: the output of the last decoding layer is the input of the 1 × 1 convolutional layer, the output of the 1 × 1 convolutional layer is the probability that each pixel belongs to the target, and a threshold ε_1 is introduced to obtain the segmentation result; ε_1 is preferably a constant between 0.3 and 0.7;
(2-b) in the backbone network described in the step (2-a), each coding layer is formed by connecting two 2D convolution modules, namely 2D double convolution modules, wherein each 2D convolution module comprises a convolution layer with the size of 3 × 3, a batch normalization layer and a Relu activation layer; in order to down-sample the image, in the second and third coding layers, 1 maximum pooling layer with the size of 2 × 2 is added at the end of the 2D double convolution module;
(2-c) In the backbone network of step (2-a), the void space pyramid convolutional layer is specifically as follows: n 3 × 3 convolution kernels with different sampling radii {r_v | v = 1, 2, ..., n} are used to perform hole convolution on the input feature map, and the hole convolution results are spliced together as the output of the void space pyramid convolutional layer, where n is a natural number greater than 1, preferably between 2 and 10. To enlarge the receptive field of the convolution kernels and obtain multi-scale context information, the sampling radii are set to r_v = k × v + 1, where k is a natural number greater than 0, preferably between 1 and 8;
(2-D) in the backbone network of step (2-a), the first and second decoding layers are composed of a 2D double convolution module of step (2-b), a 2 x 2 deconvolution layer and a concatenation operation, and the third decoding layer is composed of only a 2D double convolution module of step (2-b), wherein: the input of the 2D double convolution module in the first decoding layer is the output of the void space pyramid convolution layer, and the input of the 2D double convolution module in the next decoding layer is the output of the previous decoding layer; splicing operation in the first decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the second coding layer, and the splicing result is used as the output of the decoding layer; the splicing operation in the second decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the first coding layer, and the splicing result is used as the output of the decoding layer;
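For illustration, the following Python (PyTorch) sketch shows one possible reading of the ASPP-UNet described in steps (2-a) to (2-d): three coding layers of 2D double convolution modules (the second and third ending in 2 × 2 max pooling), a void space pyramid convolution layer with sampling radii r_v = k × v + 1, two decoding layers each consisting of a double convolution, a 2 × 2 deconvolution and a splicing with the corresponding coding layer output, a third decoding layer consisting of a double convolution only, and a final 1 × 1 convolution. The channel widths, the sigmoid used to turn the 1 × 1 convolution output into per-pixel probabilities, and the default values n = 4, k = 2 are assumptions of this sketch, not values fixed by the description.

# Hypothetical sketch of the ASPP-UNet of steps (2-a)-(2-d); channel widths and the
# sigmoid output are assumptions.
import torch
import torch.nn as nn


class DoubleConv2D(nn.Module):
    """2D double convolution module of step (2-b): two (3x3 conv -> BN -> ReLU) blocks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)


class VoidSpacePyramid(nn.Module):
    """Step (2-c): n parallel 3x3 hole convolutions with radii r_v = k*v + 1, outputs spliced."""
    def __init__(self, in_ch, branch_ch, n=4, k=2):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, 3, padding=k * v + 1, dilation=k * v + 1)
             for v in range(1, n + 1)])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)


class ASPPUNet(nn.Module):
    def __init__(self, in_ch=1, base=32, n=4, k=2):
        super().__init__()
        self.enc1 = DoubleConv2D(in_ch, base)                                          # coding layer 1
        self.enc2 = nn.Sequential(DoubleConv2D(base, 2 * base), nn.MaxPool2d(2))       # coding layer 2
        self.enc3 = nn.Sequential(DoubleConv2D(2 * base, 4 * base), nn.MaxPool2d(2))   # coding layer 3
        self.aspp = VoidSpacePyramid(4 * base, 4 * base, n=n, k=k)
        # decoding layer 1: double conv -> 2x2 deconv -> splice with coding layer 2 output
        self.dec1_conv = DoubleConv2D(n * 4 * base, 4 * base)
        self.dec1_up = nn.ConvTranspose2d(4 * base, 2 * base, 2, stride=2)
        # decoding layer 2: double conv -> 2x2 deconv -> splice with coding layer 1 output
        self.dec2_conv = DoubleConv2D(4 * base, 2 * base)
        self.dec2_up = nn.ConvTranspose2d(2 * base, base, 2, stride=2)
        self.dec3 = DoubleConv2D(2 * base, base)                                       # decoding layer 3
        self.head = nn.Conv2d(base, 1, 1)                                              # 1x1 convolution

    def forward(self, x):
        e1 = self.enc1(x)                                           # skip for decoding layer 2
        e2 = self.enc2(e1)                                          # skip for decoding layer 1
        a = self.aspp(self.enc3(e2))
        d1 = torch.cat([self.dec1_up(self.dec1_conv(a)), e2], dim=1)
        d2 = torch.cat([self.dec2_up(self.dec2_conv(d1)), e1], dim=1)
        return torch.sigmoid(self.head(self.dec3(d2)))              # per-pixel target probability


if __name__ == "__main__":
    prob = ASPPUNet()(torch.randn(1, 1, 128, 128))
    print(prob.shape)               # torch.Size([1, 1, 128, 128])
    mask = (prob > 0.5).float()     # threshold eps_1 = 0.5, as in the embodiment below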
(3) Constructing a lightweight 3D convolutional network, denoted LW-3DNet. The network has three inputs and one output, and its specific structure is as follows: first, three 3D double convolution modules convolve the three inputs respectively; the convolution results are then spliced together, and the spliced result is convolved by one further 3D double convolution module to obtain a feature map F; finally, the feature map F is convolved by a 1 × 1 × 1 convolutional layer, whose output is the probability that each voxel belongs to the target. Each 3D double convolution module in step (3) is formed by connecting two 3D convolution modules, where each 3D convolution module comprises a convolutional layer of size 3 × 3 × 3, a batch normalization layer and a ReLU activation layer;
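A corresponding sketch of the LW-3DNet of step (3), again with assumed channel widths and an assumed sigmoid output turning the 1 × 1 × 1 convolution result into voxel probabilities:

# Hypothetical sketch of the LW-3DNet of step (3); channel widths and sigmoid are assumptions.
import torch
import torch.nn as nn


class DoubleConv3D(nn.Module):
    """3D double convolution module: two (3x3x3 conv -> BN -> ReLU) blocks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)


class LW3DNet(nn.Module):
    """Fuses the three per-view segmentation volumes into one voxel-wise probability map."""
    def __init__(self, ch=8):
        super().__init__()
        self.branch_x = DoubleConv3D(1, ch)    # sagittal-view result
        self.branch_y = DoubleConv3D(1, ch)    # coronal-view result
        self.branch_z = DoubleConv3D(1, ch)    # transverse-view result
        self.fuse = DoubleConv3D(3 * ch, ch)   # double conv on the spliced features -> feature map F
        self.head = nn.Conv3d(ch, 1, 1)        # 1x1x1 convolution

    def forward(self, fx, fy, fz):
        f = torch.cat([self.branch_x(fx), self.branch_y(fy), self.branch_z(fz)], dim=1)
        return torch.sigmoid(self.head(self.fuse(f)))  # probability that each voxel is liver


if __name__ == "__main__":
    views = [torch.rand(1, 1, 32, 64, 64) for _ in range(3)]   # three single-channel volumes
    print(LW3DNet()(*views).shape)                              # torch.Size([1, 1, 32, 64, 64])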
(4) Using ASPP-UNet, train multiple network models that can be used to segment two-dimensional slices in different view directions, with the following specific steps: for each CT sequence in training data set A, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slice sets of the three view directions. The two-dimensional slices in the sagittal view direction obtained from the training data set and their manual segmentation results are then input into an ASPP-UNet network for training, giving the network model ASPP-UNet_X for segmenting sagittal slices; the two-dimensional slices in the coronal view direction obtained from the training data set and their manual segmentation results are input into an ASPP-UNet network for training, giving the network model ASPP-UNet_Y for segmenting coronal slices; and the two-dimensional slices in the transverse view direction obtained from the training data set and their manual segmentation results are input into an ASPP-UNet network for training, giving the network model ASPP-UNet_Z for segmenting transverse slices. When training the network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z, the loss function is preferably a hybrid loss based on cross entropy and Dice, specifically defined as follows:
l = l_c + η · l_d

l_c = -(1/T) · Σ_{w=1..T} [ g_w · log(p_w) + (1 - g_w) · log(1 - p_w) ]

l_d = 1 - 2 · Σ_{w=1..T} (g_w · p_w) / ( Σ_{w=1..T} g_w + Σ_{w=1..T} p_w )

where l_c and l_d denote the cross-entropy and Dice losses respectively, η is a weight parameter, preferably a constant between 0.5 and 2, g_w is the expert manual segmentation label of the w-th pixel in the CT image (background labelled 0, target labelled 1), p_w is the probability, predicted by the network model, that the w-th pixel belongs to the target, and T is the number of pixels in the CT image;
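A minimal sketch of this hybrid loss; the small constant eps added for numerical stability is an implementation assumption not stated in the description:

# Hypothetical implementation of the hybrid loss l = l_c + eta * l_d of step (4).
import torch


def hybrid_loss(p, g, eta=1.0, eps=1e-6):
    """p: predicted probabilities in [0, 1]; g: manual labels (0 background, 1 liver); same shape."""
    p, g = p.reshape(-1), g.reshape(-1).float()
    l_c = -(g * torch.log(p + eps) + (1 - g) * torch.log(1 - p + eps)).mean()   # cross entropy
    l_d = 1 - 2 * (p * g).sum() / (p.sum() + g.sum() + eps)                     # Dice loss
    return l_c + eta * l_d


if __name__ == "__main__":
    pred = torch.rand(2, 1, 64, 64)
    gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
    print(hybrid_loss(pred, gt).item())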
(5) Training a network model for fusing segmentation results of different view directions by using LW-3DNet, which specifically comprises the following steps:
(5-a) Construct the training data set C of the LW-3DNet network, specifically as follows: first, for each CT sequence in the original training data set B, two-dimensional slices are reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slice sets of the three view directions; these slice sets are then input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results S_X, S_Y and S_Z for the different view directions; finally, the network predictions S_X, S_Y and S_Z are used as the inputs for LW-3DNet network training, and the three-dimensional manual segmentation results of the CT sequences in training data set B are used as the labels, which together constitute the training data set C of the LW-3DNet network;
(5-b) Inputting the training data set C into the LW-3DNet network for training, with the Dice loss preferably selected as the loss function, to obtain the trained network model LW-3DNet_F;
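A possible training loop for step (5-b), assuming LW3DNet is defined as in the sketch after step (3) and that data set C is available as an iterable of tensor pairs; the optimizer, learning rate and number of epochs are assumptions:

# Hypothetical training loop for LW-3DNet on data set C, using the Dice loss.
import torch


def dice_loss(p, g, eps=1e-6):
    p, g = p.reshape(-1), g.reshape(-1).float()
    return 1 - 2 * (p * g).sum() / (p.sum() + g.sum() + eps)


def train_lw3dnet(model, dataset_c, epochs=50, lr=1e-3, device="cpu"):
    """dataset_c yields ((S_X, S_Y, S_Z), label) pairs of 5D tensors (N, 1, D, H, W)."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)   # optimizer choice is an assumption
    for _ in range(epochs):
        for (sx, sy, sz), label in dataset_c:
            sx, sy, sz, label = (t.to(device) for t in (sx, sy, sz, label))
            loss = dice_loss(model(sx, sy, sz), label)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model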
(6) For the CT sequence to be segmented, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slices of the CT sequence image in the different view directions, denoted T_X, T_Y and T_Z; then T_X, T_Y and T_Z are input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results F_X, F_Y and F_Z for the different view directions; finally, F_X, F_Y and F_Z are input into the LW-3DNet_F network model for testing, giving the probability P_obj = {p_i^obj | i = 1, ..., N} that each voxel of the CT sequence belongs to the target, where p_i^obj is the probability that the i-th voxel belongs to the liver and N is the number of voxels of the CT sequence to be segmented;
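The reconstruction and fusion of step (6) can be sketched as slicing the CT volume along each axis, segmenting the slices with the corresponding ASPP-UNet model, re-stacking the slice outputs into the volumes F_X, F_Y and F_Z, and feeding them to LW-3DNet_F. The axis conventions and the (D, H, W) tensor layout assumed below are illustrative, not prescribed by the description:

# Hypothetical sketch of the multi-view inference of step (6).
import torch


@torch.no_grad()
def segment_along_axis(model, volume, axis):
    """Run a 2D model on every slice taken along `axis` of a (D, H, W) volume."""
    vol = volume.movedim(axis, 0)                       # bring the slicing axis to the front
    slices = vol.unsqueeze(1)                           # (n_slices, 1, h, w) batch of 2D slices
    probs = torch.cat([model(s.unsqueeze(0)) for s in slices], dim=0)
    return probs.squeeze(1).movedim(0, axis)            # back to the original (D, H, W) layout


@torch.no_grad()
def fuse_views(net_x, net_y, net_z, lw3dnet_f, volume):
    f_x = segment_along_axis(net_x, volume, axis=2)     # sagittal slices (assumed axis)
    f_y = segment_along_axis(net_y, volume, axis=1)     # coronal slices (assumed axis)
    f_z = segment_along_axis(net_z, volume, axis=0)     # transverse slices (assumed axis)
    inputs = [f.unsqueeze(0).unsqueeze(0) for f in (f_x, f_y, f_z)]   # (1, 1, D, H, W)
    return lw3dnet_f(*inputs).squeeze()                 # P_obj: per-voxel liver probability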
(7) The method for acquiring the accurate liver segmentation result by using the full-connection conditional random field specifically comprises the following steps:
(7-a) For the CT sequence to be segmented, construct the fully connected conditional random field energy function:

E(x) = Σ_i φ_u(x_i) + Σ_{i<j} φ_p(x_i, x_j)

where x = {x_i | i = 1, ..., N}, x_i is the label assigned to the i-th voxel, and φ_u(x_i) and φ_p(x_i, x_j) are the first-order and second-order energy terms respectively. φ_u(x_i) is the cost of assigning the label x_i to the i-th voxel and is computed as:

φ_u(x_i) = -log(P(x_i))

where P(x_i) denotes the probability of assigning the label x_i to the i-th voxel:

P(x_i) = p_i^obj if x_i = 1, and P(x_i) = 1 - p_i^obj if x_i = 0

where p_i^obj is the probability that the i-th voxel belongs to the liver, obtained as described in step (6). φ_p(x_i, x_j) is the cost of assigning the labels x_i and x_j to the i-th and j-th voxels respectively, and is computed as:

φ_p(x_i, x_j) = μ(x_i, x_j) · G(f_i, f_j)

where f_i and f_j are the feature vectors of the i-th and j-th voxels, containing position and intensity features, G(f_i, f_j) is a Gaussian potential function applied to the feature vectors f_i and f_j, and μ(x_i, x_j) is the class compatibility function, which constrains the transfer of energy so that only voxel pairs with the same class label influence each other. For the two-class problem of liver segmentation in CT sequence images, a contrast-sensitive dual-kernel Gaussian potential is adopted:

G(f_i, f_j) = G_a(f_i, f_j) + G_s(f_i, f_j)

where G_a and G_s are the surface kernel and the smoothing kernel respectively. The surface kernel assigns the same label to pixels that are close in position and similar in intensity, and the smoothing kernel removes small isolated regions. They are computed as:

G_a(f_i, f_j) = β_1 · exp( -||L_i - L_j||² / (2·σ_α²) - |I_i - I_j|² / (2·σ_β²) )

G_s(f_i, f_j) = β_2 · exp( -||L_i - L_j||² / (2·σ_γ²) )

where β_1 and β_2 are the weight parameters of the surface kernel and the smoothing kernel, L_i and L_j are the spatial positions of the i-th and j-th voxels, I_i and I_j are their intensities, ||L_i - L_j|| denotes the Euclidean distance between L_i and L_j, |I_i - I_j| denotes the absolute value of the intensity difference, the parameters σ_α and σ_β respectively control the spatial proximity and the intensity similarity between voxels assigned the same class label, and the parameter σ_γ controls the smoothness of the regions; β_1 is preferably 0.8 to 1.2, β_2 preferably 0.8 to 1.2, σ_α preferably 3.0 to 7.0, σ_β preferably 1.0 to 4.0 and σ_γ preferably a constant of 3.0 to 7.0;
and (7-b) minimizing the fully connected conditional random field energy function E(x) by the mean-field approximation method, and obtaining the optimal label assignment result, namely the final liver segmentation result.
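A brute-force sketch of steps (7-a) and (7-b): the unary term is built from P_obj, the dual-kernel Gaussian potential is evaluated for every voxel pair, and a few mean-field iterations are run. The standard Potts compatibility μ(x_i, x_j) = [x_i ≠ x_j] and the default kernel parameters (picked from the preferred ranges above) are assumptions; the O(N²) pairwise computation is only feasible for tiny volumes and is meant to illustrate the energy, not to replace the efficient filtering used in practice:

# Hypothetical mean-field inference for the fully connected CRF of step (7).
import numpy as np


def dense_crf_mean_field(prob_obj, intensity, spacing=(1.0, 1.0, 1.0),
                         beta1=1.0, beta2=1.0, sigma_a=5.0, sigma_b=2.5, sigma_g=5.0,
                         n_iters=5):
    """prob_obj, intensity: (D, H, W) arrays; returns a binary liver mask of the same shape."""
    shape = prob_obj.shape
    n = prob_obj.size
    p = prob_obj.reshape(n).clip(1e-6, 1 - 1e-6)
    unary = -np.log(np.stack([1 - p, p], axis=1))                 # phi_u for labels 0 and 1

    coords = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"), axis=-1)
    loc = coords.reshape(n, 3) * np.asarray(spacing)              # voxel positions L_i
    inten = intensity.reshape(n)                                  # voxel intensities I_i

    d2 = ((loc[:, None, :] - loc[None, :, :]) ** 2).sum(-1)       # ||L_i - L_j||^2
    di2 = (inten[:, None] - inten[None, :]) ** 2                  # |I_i - I_j|^2
    k = (beta1 * np.exp(-d2 / (2 * sigma_a ** 2) - di2 / (2 * sigma_b ** 2))   # surface kernel
         + beta2 * np.exp(-d2 / (2 * sigma_g ** 2)))                           # smoothing kernel
    np.fill_diagonal(k, 0.0)                                      # exclude i == j

    q = np.exp(-unary)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iters):                                      # mean-field updates
        msg = k @ q                                               # sum_j G(f_i, f_j) Q_j(label)
        pairwise = msg[:, ::-1]                                   # Potts: cost from the opposite label
        q = np.exp(-unary - pairwise)
        q /= q.sum(axis=1, keepdims=True)
    return (q[:, 1] > q[:, 0]).reshape(shape).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prob = rng.random((4, 8, 8))           # toy LW-3DNet_F output P_obj
    ct = rng.random((4, 8, 8)) * 100       # toy intensities
    print(dense_crf_mean_field(prob, ct).shape)   # (4, 8, 8)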
Drawings
FIG. 1 is a schematic diagram of an ASPP-UNet network structure
FIG. 2 is a schematic diagram of the void space pyramid convolution layer according to an embodiment of the present invention
FIG. 3 is a schematic diagram of LW-3DNet network architecture
FIG. 4 is a schematic diagram of a network model training process that may be used to segment two-dimensional slices from different view directions
FIG. 5 is a schematic diagram of a liver segmentation process of a CT sequence image according to an embodiment of the present invention
FIG. 6 is a three-dimensional example of the liver segmentation result according to an embodiment of the present invention
Detailed Description
A method for automatically and accurately segmenting a liver region in an abdominal CT sequence image comprises the following specific implementation steps:
(1) Randomly select 100 original abdominal CT sequence images and the corresponding manual liver segmentation results from the public LiTS database; 50 of these cases are used as training data set A and the remaining 50 as training data set B. In the manual segmentation results, the liver region (i.e. the target region) is labelled "1" and the background region is labelled "0";
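A small sketch of this data preparation; the LiTS directory layout and file names assumed here (volume-*.nii paired with segmentation-*.nii) and the random seed are assumptions:

# Hypothetical sketch of step (1): pick 100 LiTS cases at random and split them 50/50.
import random
from pathlib import Path

random.seed(42)                                    # assumed seed, for reproducibility
lits_root = Path("LiTS")                           # assumed location of the LiTS data
cases = sorted(lits_root.glob("volume-*.nii"))     # CT volumes
picked = random.sample(cases, 100)
set_a, set_b = picked[:50], picked[50:]            # training set A / training set B


def label_for(volume_path):
    """Manual liver segmentation paired with a CT volume (liver voxels = 1, background = 0)."""
    return volume_path.with_name(volume_path.name.replace("volume", "segmentation"))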
(2) A U-shaped 2D convolution network based on void space pyramid convolution is constructed and recorded as ASPP-UNet, and the structure is shown in figure 1, and specifically comprises the following steps:
(2-a) A U-shaped network is adopted as the backbone network. The backbone comprises three coding layers, two skip connections, a void space pyramid convolutional layer, three decoding layers and a 1 × 1 convolutional layer, wherein: the output of the first coding layer is used both as the input of the second coding layer and, through the first skip connection, as an input of the second decoding layer; the output of the second coding layer is used both as the input of the third coding layer and, through the second skip connection, as an input of the first decoding layer; the output of the third coding layer is used as the input of the void space pyramid convolutional layer, and the output of the void space pyramid convolutional layer is used as the input of the first decoding layer; in addition, the output of each decoding layer is used as the input of the next decoding layer. To obtain the segmentation result, the last decoding layer is connected to a 1 × 1 convolutional layer: the output of the last decoding layer is the input of the 1 × 1 convolutional layer, the output of the 1 × 1 convolutional layer is the probability that each pixel belongs to the target, and a threshold ε_1 is introduced to obtain the segmentation result; in this embodiment, ε_1 = 0.5 is preferred;
(2-b) in the backbone network described in the step (2-a), each coding layer is formed by connecting two 2D convolution modules, namely 2D double convolution modules, wherein each 2D convolution module comprises a convolution layer with the size of 3 × 3, a batch normalization layer and a Relu activation layer; in order to down-sample the image, in the second and third coding layers, 1 maximum pooling layer with the size of 2 × 2 is added at the end of the 2D double convolution module;
(2-c) In the backbone network of step (2-a), the void space pyramid convolutional layer is specifically as follows: n 3 × 3 convolution kernels with different sampling radii {r_v | v = 1, 2, ..., n} are used to perform hole convolution on the input feature map, and the hole convolution results are spliced together as the output of the void space pyramid convolutional layer, where n is a natural number greater than 1. To enlarge the receptive field of the convolution kernels and obtain multi-scale context information, the sampling radii are set to r_v = k × v + 1, where k is a natural number greater than 0. In this embodiment n = 4 and k = 2 are preferred (see the short check after step (2-d)), and the specific structure of the void space pyramid convolution layer is as shown in FIG. 2: first, four 3 × 3 convolution kernels with radii 3, 5, 7 and 9 perform hole convolution on the input features respectively, and the hole convolution results are then spliced together as the output of the void space pyramid convolution layer;
(2-D) in the backbone network of step (2-a), the first and second decoding layers are each composed of a 2D double convolution module of step (2-b), a 2 x 2 deconvolution layer and a concatenation operation, and the third decoding layer is composed of only a 2D double convolution module of step (2-b), wherein: the input of the 2D double convolution module in the first decoding layer is the output of the void space pyramid convolution layer, and the input of the 2D double convolution module in the next decoding layer is the output of the previous decoding layer; splicing operation in the first decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the second coding layer, and the splicing result is used as the output of the decoding layer; the splicing operation in the second decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the first coding layer, and the splicing result is used as the output of the decoding layer;
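As a quick check of the radius formula referenced in step (2-c), the embodiment's values n = 4 and k = 2 give exactly the four radii quoted above:

# Sampling radii of the void space pyramid convolution for n = 4, k = 2 (r_v = k*v + 1).
n, k = 4, 2
radii = [k * v + 1 for v in range(1, n + 1)]
print(radii)   # [3, 5, 7, 9] -> matches the four dilated 3x3 kernels of FIG. 2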
(3) A lightweight 3D convolutional network, denoted LW-3DNet, is constructed; its specific structure is shown in FIG. 3. The network has three inputs and one output: first, three 3D double convolution modules convolve the three inputs respectively; the convolution results are then spliced together, and the spliced result is convolved by one further 3D double convolution module to obtain a feature map F; finally, the feature map F is convolved by a 1 × 1 × 1 convolutional layer, whose output is the probability that each voxel belongs to the target. Each 3D double convolution module in step (3) is formed by connecting two 3D convolution modules, where each 3D convolution module comprises a convolutional layer of size 3 × 3 × 3, a batch normalization layer and a ReLU activation layer;
(4) ASPP-UNet is used to train multiple network models that can be used to segment two-dimensional slices in different view directions; the training process is shown in FIG. 4, and the specific steps are: for each CT sequence in training data set A, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slice sets of the three view directions. The two-dimensional slices of the different view directions obtained from the training data set and the corresponding manual slice segmentation results are then input into ASPP-UNet networks for training, giving three network models that can be used to segment two-dimensional slices in the different view directions, denoted ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively. During training, the loss function is preferably a hybrid loss based on cross entropy and Dice, specifically defined as follows:

l = l_c + η · l_d

l_c = -(1/T) · Σ_{w=1..T} [ g_w · log(p_w) + (1 - g_w) · log(1 - p_w) ]

l_d = 1 - 2 · Σ_{w=1..T} (g_w · p_w) / ( Σ_{w=1..T} g_w + Σ_{w=1..T} p_w )

where l_c and l_d denote the cross-entropy and Dice losses respectively, η is a weight parameter (η = 1 is preferred in this embodiment), g_w is the expert manual segmentation label of the w-th pixel in the CT image (background labelled 0, target labelled 1), p_w is the probability, predicted by the network model, that the w-th pixel belongs to the target, and T is the number of pixels in the CT image;
(5) Training a network model for fusing segmentation results of different view directions by using LW-3DNet, which specifically comprises the following steps:
(5-a) Construct the training data set C of the LW-3DNet network, specifically as follows: first, for each CT sequence in the original training data set B, two-dimensional slices are reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slice sets of the three view directions; these slice sets are then input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results S_X, S_Y and S_Z for the different view directions; finally, the network predictions S_X, S_Y and S_Z are used as the inputs for LW-3DNet network training, and the three-dimensional manual segmentation results of the CT sequences in training data set B are used as the labels, which together constitute the training data set C of the LW-3DNet network;
(5-b) The training data set C is input into the LW-3DNet network for training, with the Dice loss preferably selected as the loss function, to obtain the trained network model LW-3DNet_F;
(6) For the CT sequence to be segmented, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slices of the CT sequence image in the different view directions, denoted T_X, T_Y and T_Z; then T_X, T_Y and T_Z are input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results F_X, F_Y and F_Z for the different view directions; finally, F_X, F_Y and F_Z are input into the LW-3DNet_F network model for testing, giving the probability P_obj = {p_i^obj | i = 1, ..., N} that each voxel of the CT sequence belongs to the target, where p_i^obj is the probability that the i-th voxel belongs to the target (i.e. the liver) and N is the number of voxels of the CT sequence;
(7) The method for acquiring the accurate liver segmentation result by using the full-connection conditional random field specifically comprises the following steps:
(7-a) For the CT sequence to be segmented, construct the fully connected conditional random field energy function:

E(x) = Σ_i φ_u(x_i) + Σ_{i<j} φ_p(x_i, x_j)

where x = {x_i | i = 1, ..., N}, x_i is the label assigned to the i-th voxel, and φ_u(x_i) and φ_p(x_i, x_j) are the first-order and second-order energy terms respectively. φ_u(x_i) is the cost of assigning the label x_i to the i-th voxel and is computed as:

φ_u(x_i) = -log(P(x_i))

where P(x_i) denotes the probability of assigning the label x_i to the i-th voxel:

P(x_i) = p_i^obj if x_i = 1, and P(x_i) = 1 - p_i^obj if x_i = 0

where p_i^obj is the probability that the i-th voxel belongs to the target (i.e. the liver), obtained as described in step (6). φ_p(x_i, x_j) is the cost of assigning the labels x_i and x_j to the i-th and j-th voxels respectively, and is computed as:

φ_p(x_i, x_j) = μ(x_i, x_j) · G(f_i, f_j)

where f_i and f_j are the feature vectors of the i-th and j-th voxels, containing position and intensity features, G(f_i, f_j) is a Gaussian potential function applied to the feature vectors f_i and f_j, and μ(x_i, x_j) is the class compatibility function, which constrains the transfer of energy so that only voxel pairs with the same class label influence each other. For the two-class problem of liver segmentation in CT sequence images, a contrast-sensitive dual-kernel Gaussian potential is adopted:

G(f_i, f_j) = G_a(f_i, f_j) + G_s(f_i, f_j)

where G_a and G_s are the surface kernel and the smoothing kernel respectively. The surface kernel assigns the same label to pixels that are close in position and similar in intensity, and the smoothing kernel removes small isolated regions. They are computed as:

G_a(f_i, f_j) = β_1 · exp( -||L_i - L_j||² / (2·σ_α²) - |I_i - I_j|² / (2·σ_β²) )

G_s(f_i, f_j) = β_2 · exp( -||L_i - L_j||² / (2·σ_γ²) )

where β_1 and β_2 are the weight parameters of the surface kernel and the smoothing kernel, L_i and L_j are the spatial positions of the i-th and j-th voxels, I_i and I_j are their intensities, ||L_i - L_j|| denotes the Euclidean distance between L_i and L_j, |I_i - I_j| denotes the absolute value of the intensity difference, the parameters σ_α and σ_β respectively control the spatial proximity and the intensity similarity between voxels assigned the same class label, and the parameter σ_γ controls the smoothness of the regions; in this embodiment, β_1 = 1.0, β_2 = 1.0, σ_α = 0.5, σ_β = 2.5 and σ_γ = 5.0 are preferred;
(7-b) The fully connected conditional random field energy function E(x) is minimized by the mean-field approximation method, and the optimal label assignment, i.e. the final liver segmentation result, is obtained.
FIG. 5 is a schematic view of the liver segmentation process of a CT sequence image according to an embodiment of the present invention. For the original CT sequence image to be segmented, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slices T_X, T_Y and T_Z of the CT sequence image in the different view directions; T_X, T_Y and T_Z are then input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results F_X, F_Y and F_Z for the different view directions; F_X, F_Y and F_Z are then input into the LW-3DNet_F network model for testing, giving the probability P_obj that each voxel of the CT sequence belongs to the target; finally, the fully connected conditional random field energy function is constructed from the CT sequence image and the probability P_obj, and the accurate liver segmentation result is obtained by minimizing the energy function.
FIG. 6 shows a three-dimensional example of a liver segmentation result obtained with this embodiment. The gray-scale bar represents the minimum distance between each surface voxel of the segmentation result and the surface of the real liver; a smaller distance gives a lower gray value and indicates a more accurate segmentation. It can be seen that the method of the present invention can effectively and accurately segment the liver region in a CT sequence, and that the surface distance error is small and close to 0 over most of the liver surface in the obtained segmentation result.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. A method for automatically and accurately segmenting a liver region in an abdominal CT sequence image is characterized by comprising the following steps:
(1) Establishing an original training data set A and an original training data set B which comprise an original CT sequence image and a liver region manual segmentation result;
(2) Constructing a U-shaped 2D convolution network based on void space pyramid convolution, recording the network as ASPP-UNet, and specifically comprising the following steps:
(2-a) adopting a U-shaped network as the backbone network, the backbone network comprising three coding layers, two skip connections, a void space pyramid convolutional layer, three decoding layers and a 1 × 1 convolutional layer, wherein: the output of the first coding layer is used both as the input of the second coding layer and, through the first skip connection, as an input of the second decoding layer; the output of the second coding layer is used both as the input of the third coding layer and, through the second skip connection, as an input of the first decoding layer; the output of the third coding layer is used as the input of the void space pyramid convolutional layer, and the output of the void space pyramid convolutional layer is used as the input of the first decoding layer; in addition, the output of the previous decoding layer is used as the input of the next decoding layer; to obtain the segmentation result, the last decoding layer is connected to a 1 × 1 convolutional layer, wherein the output of the last decoding layer is used as the input of the 1 × 1 convolutional layer, the output of the 1 × 1 convolutional layer is the probability that each pixel belongs to the target, and a threshold ε_1 is introduced to obtain the segmentation result;
(2-b) in the backbone network described in the step (2-a), each coding layer is formed by connecting two 2D convolution modules, namely 2D double convolution modules, wherein each 2D convolution module comprises a convolution layer with the size of 3 × 3, a batch normalization layer and a Relu activation layer; in order to down-sample the image, in the second and third coding layers, 1 maximum pooling layer with the size of 2 × 2 is added at the end of the 2D double convolution module;
(2-c) in the backbone network described in step (2-a), the void space pyramid convolutional layer is specifically as follows:
n 3 × 3 convolution kernels with different sampling radii {r_v | v = 1, 2, ..., n} are used to perform hole convolution on the input feature map, and the hole convolution results are spliced together as the output of the void space pyramid convolutional layer, wherein n is a natural number greater than 1; in order to enlarge the receptive field of the convolution kernels and obtain multi-scale context information, the sampling radii are set to r_v = k × v + 1, wherein k is a natural number greater than 0;
(2-D) in the backbone network of step (2-a), the first and second decoding layers are composed of a 2D double convolution module of step (2-b), a 2 x 2 deconvolution layer and a concatenation operation, and the third decoding layer is composed of only a 2D double convolution module of step (2-b), wherein: the input of the 2D double convolution module in the first decoding layer is the output of the void space pyramid convolution layer, and the input of the 2D double convolution module in the next decoding layer is the output of the previous decoding layer; splicing operation in the first decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the second coding layer, and the splicing result is used as the output of the decoding layer; the splicing operation in the second decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the first coding layer, and the splicing result is used as the output of the decoding layer;
(3) Constructing a lightweight 3D convolutional network, denoted LW-3DNet, wherein the network has three inputs and one output and its specific structure comprises: firstly, three 3D double convolution modules are adopted to convolve the three inputs respectively; the convolution results are spliced by a splicing operation; the spliced result is convolved by one 3D double convolution module to obtain a feature map F; finally, the feature map F is convolved by a 1 × 1 × 1 convolutional layer, wherein the output of the 1 × 1 × 1 convolutional layer is the probability that each voxel belongs to the target; the 3D double convolution modules in step (3) are formed by connecting two 3D convolution modules, wherein each 3D convolution module comprises a convolutional layer of size 3 × 3 × 3, a batch normalization layer and a ReLU activation layer;
(4) Using ASPP-UNet, training multiple network models that can be used to segment two-dimensional slices in different view directions, with the following specific steps: for each CT sequence in training data set A, firstly reconstructing two-dimensional slices from the three view directions of the sagittal, coronal and transverse planes to obtain the two-dimensional slice sets of the three view directions; then inputting the two-dimensional slices in the sagittal view direction obtained from the training data set and their manual segmentation results into an ASPP-UNet network for training to obtain the network model ASPP-UNet_X for segmenting sagittal slices; inputting the two-dimensional slices in the coronal view direction obtained from the training data set and their manual segmentation results into an ASPP-UNet network for training to obtain the network model ASPP-UNet_Y for segmenting coronal slices; and inputting the two-dimensional slices in the transverse view direction obtained from the training data set and their manual segmentation results into an ASPP-UNet network for training to obtain the network model ASPP-UNet_Z for segmenting transverse slices;
(5) Training a network model for fusing segmentation results of different view directions by using LW-3DNet, which specifically comprises the following steps:
(5-a) constructing the training data set C of the LW-3DNet network, specifically as follows: firstly, for each CT sequence in the original training data set B, reconstructing two-dimensional slices from the three view directions of the sagittal, coronal and transverse planes to obtain the two-dimensional slice sets of the three view directions; then inputting these slice sets into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing to obtain the two-dimensional slice segmentation results S_X, S_Y and S_Z for the different view directions; finally, using the network predictions S_X, S_Y and S_Z as the inputs for LW-3DNet network training and using the three-dimensional manual segmentation results of the CT sequences in training data set B as the labels, so as to construct the training data set C of the LW-3DNet network;
(5-b) inputting the training data set C into the LW-3DNet network for training to obtain the trained network model LW-3DNet_F;
(6) for the CT sequence to be segmented, firstly reconstructing two-dimensional slices from the three view directions of the sagittal, coronal and transverse planes to obtain the two-dimensional slices of the CT sequence image in the different view directions, denoted T_X, T_Y and T_Z respectively; then inputting T_X, T_Y and T_Z into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing to obtain the two-dimensional slice segmentation results F_X, F_Y and F_Z for the different view directions; finally, inputting F_X, F_Y and F_Z into the LW-3DNet_F network model for testing to obtain the probability P_obj = {p_i^obj | i = 1, ..., N} that each voxel of the CT sequence belongs to the target, wherein p_i^obj represents the probability that the i-th voxel belongs to the liver and N is the number of voxels of the CT sequence to be segmented;
(7) The method for acquiring the accurate liver segmentation result by using the full-connection conditional random field specifically comprises the following steps:
(7-a) for the CT sequence to be segmented, constructing the fully connected conditional random field energy function:

E(x) = Σ_i φ_u(x_i) + Σ_{i<j} φ_p(x_i, x_j)

wherein x = {x_i | i = 1, ..., N}, x_i is the label assigned to the i-th voxel, and φ_u(x_i) and φ_p(x_i, x_j) are the first-order and second-order energy terms respectively; φ_u(x_i) is the cost of assigning the label x_i to the i-th voxel and is computed as:

φ_u(x_i) = -log(P(x_i))

wherein P(x_i) denotes the probability of assigning the label x_i to the i-th voxel:

P(x_i) = p_i^obj if x_i = 1, and P(x_i) = 1 - p_i^obj if x_i = 0

wherein p_i^obj is the probability that the i-th voxel belongs to the liver, obtained as described in step (6); φ_p(x_i, x_j) is the cost of assigning the labels x_i and x_j to the i-th and j-th voxels respectively, and is computed as:

φ_p(x_i, x_j) = μ(x_i, x_j) · G(f_i, f_j)

wherein f_i and f_j are the feature vectors of the i-th and j-th voxels, containing position and intensity features, G(f_i, f_j) is a Gaussian potential function applied to the feature vectors f_i and f_j, and μ(x_i, x_j) is the class compatibility function, which constrains the transfer of energy so that only voxel pairs with the same class label influence each other; for the two-class problem of liver segmentation in CT sequence images, a contrast-sensitive dual-kernel Gaussian potential is adopted:

G(f_i, f_j) = G_a(f_i, f_j) + G_s(f_i, f_j)

wherein G_a and G_s are the surface kernel and the smoothing kernel respectively; the surface kernel assigns the same label to pixels that are close in position and similar in intensity, and the smoothing kernel removes small isolated regions; they are computed as:

G_a(f_i, f_j) = β_1 · exp( -||L_i - L_j||² / (2·σ_α²) - |I_i - I_j|² / (2·σ_β²) )

G_s(f_i, f_j) = β_2 · exp( -||L_i - L_j||² / (2·σ_γ²) )

wherein β_1 and β_2 are the weight parameters of the surface kernel and the smoothing kernel respectively, L_i and L_j are the spatial positions of the i-th and j-th voxels, I_i and I_j are the intensities of the i-th and j-th voxels, ||L_i - L_j|| denotes the Euclidean distance between L_i and L_j, |I_i - I_j| denotes the absolute value of the intensity difference, the parameters σ_α and σ_β are used to control the spatial proximity and the intensity similarity, respectively, between voxels assigned the same class label, and the parameter σ_γ is used to control the smoothness of the regions;
and (7-b) minimizing the fully connected conditional random field energy function E(x) by the mean-field approximation method, and obtaining the optimal label assignment result, namely the final liver segmentation result.
2. The method for automatically and precisely segmenting the liver region in an abdominal CT sequence image as claimed in claim 1, wherein: in step (4), when training and acquiring the network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z, the loss function is preferably a hybrid loss function based on cross entropy and Dice, specifically defined as follows:

l = l_c + η · l_d

l_c = -(1/T) · Σ_{w=1..T} [ g_w · log(p_w) + (1 - g_w) · log(1 - p_w) ]

l_d = 1 - 2 · Σ_{w=1..T} (g_w · p_w) / ( Σ_{w=1..T} g_w + Σ_{w=1..T} p_w )

wherein l_c and l_d respectively represent the cross-entropy and Dice losses, η is a weight parameter, g_w represents the expert manual segmentation label of the w-th pixel in the CT image (background labelled 0, target labelled 1), p_w represents the probability, predicted by the network model, that the w-th pixel belongs to the target, and T is the number of pixels in the CT image.
3. The method for automatically and precisely segmenting the liver region in an abdominal CT sequence image as claimed in claim 1, wherein: in step (5-b), when training and acquiring the network model LW-3DNet_F, the loss function is preferably a Dice loss function.
4. The method for automatically and precisely segmenting the liver region in an abdominal CT sequence image as claimed in claim 1, wherein: ε_1 is preferably a constant of 0.3 to 0.7, n is a natural number of 2 to 10, k is a natural number of 1 to 8, η is a constant of 0.5 to 2, β_1 is preferably 0.8 to 1.2, β_2 is preferably 0.8 to 1.2, σ_α is preferably 3.0 to 7.0, σ_β is preferably 1.0 to 4.0, and σ_γ is preferably 3.0 to 7.0.
CN202211403625.3A (priority date 2022-11-10, filing date 2022-11-10): Automatic accurate segmentation method for liver region in abdominal CT sequence image. Status: Pending. Publication: CN115690066A (en).

Priority Applications (1)

Application number: CN202211403625.3A · Publication: CN115690066A (en) · Priority date: 2022-11-10 · Filing date: 2022-11-10 · Title: Automatic accurate segmentation method for liver region in abdominal CT sequence image

Publications (1)

Publication number: CN115690066A (en) · Publication date: 2023-02-03

Family ID: 85049746

Family Applications (1)

Application number: CN202211403625.3A · Status: Pending · Publication: CN115690066A (en) · Priority date: 2022-11-10 · Filing date: 2022-11-10 · Title: Automatic accurate segmentation method for liver region in abdominal CT sequence image

Country Status (1)

CN: CN115690066A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination