CN115690066A - Automatic accurate segmentation method for liver region in abdominal CT sequence image - Google Patents


Info

Publication number
CN115690066A
CN115690066A (Application number CN202211403625.3A)
Authority
CN
China
Prior art keywords: layer, convolution, network, aspp, unet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211403625.3A
Other languages
Chinese (zh)
Inventor
廖苗
邸拴虎
梁伟
赵于前
杨振
曾业战
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Science and Technology
Original Assignee
Hunan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2022-11-10
Filing date: 2022-11-10
Publication date: 2023-02-03
Application filed by Hunan University of Science and Technology filed Critical Hunan University of Science and Technology
Priority: CN202211403625.3A
Publication: CN115690066A
Legal status: Pending

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a method for automatically and accurately segmenting the liver region in abdominal CT sequence images, which mainly comprises the following steps: (1) for the CT sequence to be segmented, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes; (2) the two-dimensional slices in the different view directions are segmented with a U-shaped 2D convolutional network based on void space pyramid convolution; (3) the segmentation results of the different view directions are fused by a lightweight 3D convolutional network to obtain the probability that each voxel in the CT sequence belongs to the target; (4) a fully connected conditional random field energy function is constructed from the obtained probabilities, and an accurate liver segmentation result is obtained by minimizing the energy function. The method extracts three-dimensional features of the CT sequence by fusing information from the different view directions and obtains the three-dimensional liver segmentation result by introducing a fully connected conditional random field, and therefore offers high accuracy and robustness.

Description

Automatic accurate segmentation method for liver region in abdominal CT sequence image
Technical Field
The invention relates to the technical field of medical image processing, in particular to an automatic accurate segmentation method for a liver region in an abdominal CT sequence image.
Background
The liver is the largest solid organ in the human body; it is richly vascularized, structurally complex and performs an important detoxification function. Liver diseases are diverse and seriously harm human health: currently about 33% of the world's population suffers from some type of liver disease. Treatment mainly comprises chemotherapy, surgery and radiotherapy. Both surgery and radiotherapy require the liver tissue to be accurately segmented from medical images in order to obtain pathological, physical and anatomical information and to provide a basis for the treatment plan. Because the liver has a complex structure, blurred boundaries and a highly variable shape, even manual delineation by experienced experts is time-consuming, labor-intensive and easily affected by subjective factors. Automatic liver segmentation has therefore become one of the current research hotspots and has important clinical application value.
Many methods have been proposed for liver segmentation in CT sequence images; they fall mainly into methods based on hand-crafted features and methods based on deep learning. Methods based on hand-crafted features mainly include region growing, thresholding, model-based methods and machine-learning methods. These methods extract manually designed image features such as intensity, shape, edges or texture and then generate liver contours or regions from them. Most of them are semi-automatic, require seed points or regions of interest to be set manually, are sensitive to the choice of features and generalize poorly. In recent years, deep learning has been widely used for medical image segmentation because of its powerful feature extraction capability. For reasons of space and time efficiency, most deep-learning liver segmentation methods use a 2D segmentation network and obtain the final liver segmentation result of a patient by segmenting the liver region of each slice of the CT sequence in turn. Such methods ignore the correlation between slices, which limits segmentation accuracy. To extract three-dimensional feature information from the CT sequence, some researchers have proposed 3D segmentation networks for CT sequence images. Limited by computing resources, these three-dimensional networks usually require the CT sequence to be down-sampled or cropped into small three-dimensional blocks in advance, which loses image detail and reduces the segmentation accuracy of the network.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a method for the automatic and accurate segmentation of the liver region in abdominal CT sequence images which, by combining a 2D deep convolutional network, a lightweight 3D convolutional network and a fully connected conditional random field, achieves accurate and effective segmentation of the liver region in CT sequence images and improves the accuracy and efficiency of computer-aided diagnosis and treatment.
The method for automatic and accurate segmentation of the liver region in abdominal CT sequence images comprises the following steps:
(1) Establishing an original training data set A and an original training data set B, each comprising original CT sequence images and the corresponding manual liver segmentation results;
(2) Constructing a U-shaped 2D convolutional network based on void space pyramid convolution, denoted ASPP-UNet, with the following specific steps:
(2-a) A U-shaped network is adopted as the backbone network. The backbone comprises three coding layers, two skip connections, a void space pyramid convolutional layer, three decoding layers and a 1 × 1 convolutional layer, wherein: the output of the first coding layer is used both as the input of the second coding layer and, through the first skip connection, as an input of the second decoding layer; the output of the second coding layer is used both as the input of the third coding layer and, through the second skip connection, as an input of the first decoding layer; the output of the third coding layer is used as the input of the void space pyramid convolutional layer, and the output of the void space pyramid convolutional layer is used as the input of the first decoding layer; in addition, the output of each decoding layer is used as the input of the next decoding layer. To obtain the segmentation result, the last decoding layer is connected to a 1 × 1 convolutional layer: the output of the last decoding layer is the input of the 1 × 1 convolutional layer, the output of the 1 × 1 convolutional layer is the probability that each pixel belongs to the target, and a threshold ε_1 is introduced to obtain the segmentation result; ε_1 is preferably a constant between 0.3 and 0.7;
(2-b) in the backbone network described in the step (2-a), each coding layer is formed by connecting two 2D convolution modules, namely 2D double convolution modules, wherein each 2D convolution module comprises a convolution layer with the size of 3 × 3, a batch normalization layer and a Relu activation layer; in order to down-sample the image, in the second and third coding layers, 1 maximum pooling layer with the size of 2 × 2 is added at the end of the 2D double convolution module;
(2-c) In the backbone network of step (2-a), the void space pyramid convolutional layer is specifically as follows: n 3 × 3 convolution kernels with different sampling radii {r_v | v = 1, 2, ..., n} are used to perform hole convolution on the input feature map, and the hole convolution results are spliced together as the output of the void space pyramid convolutional layer, where n is a natural number greater than 1, preferably between 2 and 10. To enlarge the receptive field of the convolution kernels and obtain multi-scale context information, the sampling radii are set to r_v = k × v + 1, where k is a natural number greater than 0, preferably between 1 and 8;
(2-D) in the backbone network of step (2-a), the first and second decoding layers are composed of a 2D double convolution module of step (2-b), a 2 x 2 deconvolution layer and a concatenation operation, and the third decoding layer is composed of only a 2D double convolution module of step (2-b), wherein: the input of the 2D double convolution module in the first decoding layer is the output of the void space pyramid convolution layer, and the input of the 2D double convolution module in the next decoding layer is the output of the previous decoding layer; splicing operation in the first decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the second coding layer, and the splicing result is used as the output of the decoding layer; the splicing operation in the second decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the first coding layer, and the splicing result is used as the output of the decoding layer;
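For illustration, the following Python (PyTorch) sketch shows one possible reading of the ASPP-UNet described in steps (2-a) to (2-d): three coding layers of 2D double convolution modules (the second and third ending in 2 × 2 max pooling), a void space pyramid convolution layer with sampling radii r_v = k × v + 1, two decoding layers each consisting of a double convolution, a 2 × 2 deconvolution and a splicing with the corresponding coding layer output, a third decoding layer consisting of a double convolution only, and a final 1 × 1 convolution. The channel widths, the sigmoid used to turn the 1 × 1 convolution output into per-pixel probabilities, and the default values n = 4, k = 2 are assumptions of this sketch, not values fixed by the description.

# Hypothetical sketch of the ASPP-UNet of steps (2-a)-(2-d); channel widths and the
# sigmoid output are assumptions.
import torch
import torch.nn as nn


class DoubleConv2D(nn.Module):
    """2D double convolution module of step (2-b): two (3x3 conv -> BN -> ReLU) blocks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)


class VoidSpacePyramid(nn.Module):
    """Step (2-c): n parallel 3x3 hole convolutions with radii r_v = k*v + 1, outputs spliced."""
    def __init__(self, in_ch, branch_ch, n=4, k=2):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, 3, padding=k * v + 1, dilation=k * v + 1)
             for v in range(1, n + 1)])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)


class ASPPUNet(nn.Module):
    def __init__(self, in_ch=1, base=32, n=4, k=2):
        super().__init__()
        self.enc1 = DoubleConv2D(in_ch, base)                                          # coding layer 1
        self.enc2 = nn.Sequential(DoubleConv2D(base, 2 * base), nn.MaxPool2d(2))       # coding layer 2
        self.enc3 = nn.Sequential(DoubleConv2D(2 * base, 4 * base), nn.MaxPool2d(2))   # coding layer 3
        self.aspp = VoidSpacePyramid(4 * base, 4 * base, n=n, k=k)
        # decoding layer 1: double conv -> 2x2 deconv -> splice with coding layer 2 output
        self.dec1_conv = DoubleConv2D(n * 4 * base, 4 * base)
        self.dec1_up = nn.ConvTranspose2d(4 * base, 2 * base, 2, stride=2)
        # decoding layer 2: double conv -> 2x2 deconv -> splice with coding layer 1 output
        self.dec2_conv = DoubleConv2D(4 * base, 2 * base)
        self.dec2_up = nn.ConvTranspose2d(2 * base, base, 2, stride=2)
        self.dec3 = DoubleConv2D(2 * base, base)                                       # decoding layer 3
        self.head = nn.Conv2d(base, 1, 1)                                              # 1x1 convolution

    def forward(self, x):
        e1 = self.enc1(x)                                           # skip for decoding layer 2
        e2 = self.enc2(e1)                                          # skip for decoding layer 1
        a = self.aspp(self.enc3(e2))
        d1 = torch.cat([self.dec1_up(self.dec1_conv(a)), e2], dim=1)
        d2 = torch.cat([self.dec2_up(self.dec2_conv(d1)), e1], dim=1)
        return torch.sigmoid(self.head(self.dec3(d2)))              # per-pixel target probability


if __name__ == "__main__":
    prob = ASPPUNet()(torch.randn(1, 1, 128, 128))
    print(prob.shape)               # torch.Size([1, 1, 128, 128])
    mask = (prob > 0.5).float()     # threshold eps_1 = 0.5, as in the embodiment below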
(3) Constructing a lightweight 3D convolutional network, denoted LW-3DNet. The network has three inputs and one output, and its specific structure is as follows: first, three 3D double convolution modules convolve the three inputs respectively; the convolution results are then spliced together, and the spliced result is convolved by one further 3D double convolution module to obtain a feature map F; finally, the feature map F is convolved by a 1 × 1 × 1 convolutional layer, whose output is the probability that each voxel belongs to the target. Each 3D double convolution module in step (3) is formed by connecting two 3D convolution modules, where each 3D convolution module comprises a convolutional layer of size 3 × 3 × 3, a batch normalization layer and a ReLU activation layer;
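A corresponding sketch of the LW-3DNet of step (3), again with assumed channel widths and an assumed sigmoid output turning the 1 × 1 × 1 convolution result into voxel probabilities:

# Hypothetical sketch of the LW-3DNet of step (3); channel widths and sigmoid are assumptions.
import torch
import torch.nn as nn


class DoubleConv3D(nn.Module):
    """3D double convolution module: two (3x3x3 conv -> BN -> ReLU) blocks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)


class LW3DNet(nn.Module):
    """Fuses the three per-view segmentation volumes into one voxel-wise probability map."""
    def __init__(self, ch=8):
        super().__init__()
        self.branch_x = DoubleConv3D(1, ch)    # sagittal-view result
        self.branch_y = DoubleConv3D(1, ch)    # coronal-view result
        self.branch_z = DoubleConv3D(1, ch)    # transverse-view result
        self.fuse = DoubleConv3D(3 * ch, ch)   # double conv on the spliced features -> feature map F
        self.head = nn.Conv3d(ch, 1, 1)        # 1x1x1 convolution

    def forward(self, fx, fy, fz):
        f = torch.cat([self.branch_x(fx), self.branch_y(fy), self.branch_z(fz)], dim=1)
        return torch.sigmoid(self.head(self.fuse(f)))  # probability that each voxel is liver


if __name__ == "__main__":
    views = [torch.rand(1, 1, 32, 64, 64) for _ in range(3)]   # three single-channel volumes
    print(LW3DNet()(*views).shape)                              # torch.Size([1, 1, 32, 64, 64])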
(4) Using ASPP-UNet, train multiple network models that can be used to segment two-dimensional slices in different view directions, with the following specific steps: for each CT sequence in training data set A, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slice sets of the three view directions. The two-dimensional slices in the sagittal view direction obtained from the training data set and their manual segmentation results are then input into an ASPP-UNet network for training, giving the network model ASPP-UNet_X for segmenting sagittal slices; the two-dimensional slices in the coronal view direction obtained from the training data set and their manual segmentation results are input into an ASPP-UNet network for training, giving the network model ASPP-UNet_Y for segmenting coronal slices; and the two-dimensional slices in the transverse view direction obtained from the training data set and their manual segmentation results are input into an ASPP-UNet network for training, giving the network model ASPP-UNet_Z for segmenting transverse slices. When training the network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z, the loss function is preferably a hybrid loss based on cross entropy and Dice, specifically defined as follows:
l = l_c + η · l_d

l_c = -(1/T) · Σ_{w=1..T} [ g_w · log(p_w) + (1 - g_w) · log(1 - p_w) ]

l_d = 1 - 2 · Σ_{w=1..T} (g_w · p_w) / ( Σ_{w=1..T} g_w + Σ_{w=1..T} p_w )

where l_c and l_d denote the cross-entropy and Dice losses respectively, η is a weight parameter, preferably a constant between 0.5 and 2, g_w is the expert manual segmentation label of the w-th pixel in the CT image (background labelled 0, target labelled 1), p_w is the probability, predicted by the network model, that the w-th pixel belongs to the target, and T is the number of pixels in the CT image;
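A minimal sketch of this hybrid loss; the small constant eps added for numerical stability is an implementation assumption not stated in the description:

# Hypothetical implementation of the hybrid loss l = l_c + eta * l_d of step (4).
import torch


def hybrid_loss(p, g, eta=1.0, eps=1e-6):
    """p: predicted probabilities in [0, 1]; g: manual labels (0 background, 1 liver); same shape."""
    p, g = p.reshape(-1), g.reshape(-1).float()
    l_c = -(g * torch.log(p + eps) + (1 - g) * torch.log(1 - p + eps)).mean()   # cross entropy
    l_d = 1 - 2 * (p * g).sum() / (p.sum() + g.sum() + eps)                     # Dice loss
    return l_c + eta * l_d


if __name__ == "__main__":
    pred = torch.rand(2, 1, 64, 64)
    gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
    print(hybrid_loss(pred, gt).item())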
(5) Training a network model for fusing segmentation results of different view directions by using LW-3DNet, which specifically comprises the following steps:
(5-a) Construct the training data set C of the LW-3DNet network, specifically as follows: first, for each CT sequence in the original training data set B, two-dimensional slices are reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slice sets of the three view directions; these slice sets are then input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results S_X, S_Y and S_Z for the different view directions; finally, the network predictions S_X, S_Y and S_Z are used as the inputs for LW-3DNet network training, and the three-dimensional manual segmentation results of the CT sequences in training data set B are used as the labels, which together constitute the training data set C of the LW-3DNet network;
(5-b) Inputting the training data set C into the LW-3DNet network for training, with the Dice loss preferably selected as the loss function, to obtain the trained network model LW-3DNet_F;
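A possible training loop for step (5-b), assuming LW3DNet is defined as in the sketch after step (3) and that data set C is available as an iterable of tensor pairs; the optimizer, learning rate and number of epochs are assumptions:

# Hypothetical training loop for LW-3DNet on data set C, using the Dice loss.
import torch


def dice_loss(p, g, eps=1e-6):
    p, g = p.reshape(-1), g.reshape(-1).float()
    return 1 - 2 * (p * g).sum() / (p.sum() + g.sum() + eps)


def train_lw3dnet(model, dataset_c, epochs=50, lr=1e-3, device="cpu"):
    """dataset_c yields ((S_X, S_Y, S_Z), label) pairs of 5D tensors (N, 1, D, H, W)."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)   # optimizer choice is an assumption
    for _ in range(epochs):
        for (sx, sy, sz), label in dataset_c:
            sx, sy, sz, label = (t.to(device) for t in (sx, sy, sz, label))
            loss = dice_loss(model(sx, sy, sz), label)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model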
(6) For the CT sequence to be segmented, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slices of the CT sequence image in the different view directions, denoted T_X, T_Y and T_Z; then T_X, T_Y and T_Z are input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results F_X, F_Y and F_Z for the different view directions; finally, F_X, F_Y and F_Z are input into the LW-3DNet_F network model for testing, giving the probability P_obj = {p_i^obj | i = 1, ..., N} that each voxel of the CT sequence belongs to the target, where p_i^obj is the probability that the i-th voxel belongs to the liver and N is the number of voxels of the CT sequence to be segmented;
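The reconstruction and fusion of step (6) can be sketched as slicing the CT volume along each axis, segmenting the slices with the corresponding ASPP-UNet model, re-stacking the slice outputs into the volumes F_X, F_Y and F_Z, and feeding them to LW-3DNet_F. The axis conventions and the (D, H, W) tensor layout assumed below are illustrative, not prescribed by the description:

# Hypothetical sketch of the multi-view inference of step (6).
import torch


@torch.no_grad()
def segment_along_axis(model, volume, axis):
    """Run a 2D model on every slice taken along `axis` of a (D, H, W) volume."""
    vol = volume.movedim(axis, 0)                       # bring the slicing axis to the front
    slices = vol.unsqueeze(1)                           # (n_slices, 1, h, w) batch of 2D slices
    probs = torch.cat([model(s.unsqueeze(0)) for s in slices], dim=0)
    return probs.squeeze(1).movedim(0, axis)            # back to the original (D, H, W) layout


@torch.no_grad()
def fuse_views(net_x, net_y, net_z, lw3dnet_f, volume):
    f_x = segment_along_axis(net_x, volume, axis=2)     # sagittal slices (assumed axis)
    f_y = segment_along_axis(net_y, volume, axis=1)     # coronal slices (assumed axis)
    f_z = segment_along_axis(net_z, volume, axis=0)     # transverse slices (assumed axis)
    inputs = [f.unsqueeze(0).unsqueeze(0) for f in (f_x, f_y, f_z)]   # (1, 1, D, H, W)
    return lw3dnet_f(*inputs).squeeze()                 # P_obj: per-voxel liver probability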
(7) The method for acquiring the accurate liver segmentation result by using the full-connection conditional random field specifically comprises the following steps:
(7-a) For the CT sequence to be segmented, construct the fully connected conditional random field energy function:

E(x) = Σ_i φ_u(x_i) + Σ_{i<j} φ_p(x_i, x_j)

where x = {x_i | i = 1, ..., N}, x_i is the label assigned to the i-th voxel, and φ_u(x_i) and φ_p(x_i, x_j) are the first-order and second-order energy terms respectively. φ_u(x_i) is the cost of assigning the label x_i to the i-th voxel and is computed as:

φ_u(x_i) = -log(P(x_i))

where P(x_i) denotes the probability of assigning the label x_i to the i-th voxel:

P(x_i) = p_i^obj if x_i = 1, and P(x_i) = 1 - p_i^obj if x_i = 0

where p_i^obj is the probability that the i-th voxel belongs to the liver, obtained as described in step (6). φ_p(x_i, x_j) is the cost of assigning the labels x_i and x_j to the i-th and j-th voxels respectively, and is computed as:

φ_p(x_i, x_j) = μ(x_i, x_j) · G(f_i, f_j)

where f_i and f_j are the feature vectors of the i-th and j-th voxels, containing position and intensity features, G(f_i, f_j) is a Gaussian potential function applied to the feature vectors f_i and f_j, and μ(x_i, x_j) is the class compatibility function, which constrains the transfer of energy so that only voxel pairs with the same class label influence each other. For the two-class problem of liver segmentation in CT sequence images, a contrast-sensitive dual-kernel Gaussian potential is adopted:

G(f_i, f_j) = G_a(f_i, f_j) + G_s(f_i, f_j)

where G_a and G_s are the surface kernel and the smoothing kernel respectively. The surface kernel assigns the same label to pixels that are close in position and similar in intensity, and the smoothing kernel removes small isolated regions. They are computed as:

G_a(f_i, f_j) = β_1 · exp( -||L_i - L_j||² / (2·σ_α²) - |I_i - I_j|² / (2·σ_β²) )

G_s(f_i, f_j) = β_2 · exp( -||L_i - L_j||² / (2·σ_γ²) )

where β_1 and β_2 are the weight parameters of the surface kernel and the smoothing kernel, L_i and L_j are the spatial positions of the i-th and j-th voxels, I_i and I_j are their intensities, ||L_i - L_j|| denotes the Euclidean distance between L_i and L_j, |I_i - I_j| denotes the absolute value of the intensity difference, the parameters σ_α and σ_β respectively control the spatial proximity and the intensity similarity between voxels assigned the same class label, and the parameter σ_γ controls the smoothness of the regions; β_1 is preferably 0.8 to 1.2, β_2 preferably 0.8 to 1.2, σ_α preferably 3.0 to 7.0, σ_β preferably 1.0 to 4.0 and σ_γ preferably a constant of 3.0 to 7.0;
and (7-b) minimizing the fully connected conditional random field energy function E(x) by the mean-field approximation method, and obtaining the optimal label assignment result, namely the final liver segmentation result.
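A brute-force sketch of steps (7-a) and (7-b): the unary term is built from P_obj, the dual-kernel Gaussian potential is evaluated for every voxel pair, and a few mean-field iterations are run. The standard Potts compatibility μ(x_i, x_j) = [x_i ≠ x_j] and the default kernel parameters (picked from the preferred ranges above) are assumptions; the O(N²) pairwise computation is only feasible for tiny volumes and is meant to illustrate the energy, not to replace the efficient filtering used in practice:

# Hypothetical mean-field inference for the fully connected CRF of step (7).
import numpy as np


def dense_crf_mean_field(prob_obj, intensity, spacing=(1.0, 1.0, 1.0),
                         beta1=1.0, beta2=1.0, sigma_a=5.0, sigma_b=2.5, sigma_g=5.0,
                         n_iters=5):
    """prob_obj, intensity: (D, H, W) arrays; returns a binary liver mask of the same shape."""
    shape = prob_obj.shape
    n = prob_obj.size
    p = prob_obj.reshape(n).clip(1e-6, 1 - 1e-6)
    unary = -np.log(np.stack([1 - p, p], axis=1))                 # phi_u for labels 0 and 1

    coords = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"), axis=-1)
    loc = coords.reshape(n, 3) * np.asarray(spacing)              # voxel positions L_i
    inten = intensity.reshape(n)                                  # voxel intensities I_i

    d2 = ((loc[:, None, :] - loc[None, :, :]) ** 2).sum(-1)       # ||L_i - L_j||^2
    di2 = (inten[:, None] - inten[None, :]) ** 2                  # |I_i - I_j|^2
    k = (beta1 * np.exp(-d2 / (2 * sigma_a ** 2) - di2 / (2 * sigma_b ** 2))   # surface kernel
         + beta2 * np.exp(-d2 / (2 * sigma_g ** 2)))                           # smoothing kernel
    np.fill_diagonal(k, 0.0)                                      # exclude i == j

    q = np.exp(-unary)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iters):                                      # mean-field updates
        msg = k @ q                                               # sum_j G(f_i, f_j) Q_j(label)
        pairwise = msg[:, ::-1]                                   # Potts: cost from the opposite label
        q = np.exp(-unary - pairwise)
        q /= q.sum(axis=1, keepdims=True)
    return (q[:, 1] > q[:, 0]).reshape(shape).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prob = rng.random((4, 8, 8))           # toy LW-3DNet_F output P_obj
    ct = rng.random((4, 8, 8)) * 100       # toy intensities
    print(dense_crf_mean_field(prob, ct).shape)   # (4, 8, 8)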
Drawings
FIG. 1 is a schematic diagram of an ASPP-UNet network structure
FIG. 2 is a schematic diagram of the void space pyramid convolution layer according to an embodiment of the present invention
FIG. 3 is a schematic diagram of LW-3DNet network architecture
FIG. 4 is a schematic diagram of a network model training process that may be used to segment two-dimensional slices from different view directions
FIG. 5 is a schematic diagram of a liver segmentation process of a CT sequence image according to an embodiment of the present invention
FIG. 6 is a three-dimensional example of the liver segmentation result according to an embodiment of the present invention
Detailed Description
A method for automatically and accurately segmenting a liver region in an abdominal CT sequence image comprises the following specific implementation steps:
(1) Randomly select 100 original abdominal CT sequence images and the corresponding manual liver segmentation results from the public LiTS database; 50 of these cases are used as training data set A and the remaining 50 as training data set B. In the manual segmentation results, the liver region (i.e. the target region) is labelled "1" and the background region is labelled "0";
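A small sketch of this data preparation; the LiTS directory layout and file names assumed here (volume-*.nii paired with segmentation-*.nii) and the random seed are assumptions:

# Hypothetical sketch of step (1): pick 100 LiTS cases at random and split them 50/50.
import random
from pathlib import Path

random.seed(42)                                    # assumed seed, for reproducibility
lits_root = Path("LiTS")                           # assumed location of the LiTS data
cases = sorted(lits_root.glob("volume-*.nii"))     # CT volumes
picked = random.sample(cases, 100)
set_a, set_b = picked[:50], picked[50:]            # training set A / training set B


def label_for(volume_path):
    """Manual liver segmentation paired with a CT volume (liver voxels = 1, background = 0)."""
    return volume_path.with_name(volume_path.name.replace("volume", "segmentation"))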
(2) A U-shaped 2D convolution network based on void space pyramid convolution is constructed and recorded as ASPP-UNet, and the structure is shown in figure 1, and specifically comprises the following steps:
(2-a) A U-shaped network is adopted as the backbone network. The backbone comprises three coding layers, two skip connections, a void space pyramid convolutional layer, three decoding layers and a 1 × 1 convolutional layer, wherein: the output of the first coding layer is used both as the input of the second coding layer and, through the first skip connection, as an input of the second decoding layer; the output of the second coding layer is used both as the input of the third coding layer and, through the second skip connection, as an input of the first decoding layer; the output of the third coding layer is used as the input of the void space pyramid convolutional layer, and the output of the void space pyramid convolutional layer is used as the input of the first decoding layer; in addition, the output of each decoding layer is used as the input of the next decoding layer. To obtain the segmentation result, the last decoding layer is connected to a 1 × 1 convolutional layer: the output of the last decoding layer is the input of the 1 × 1 convolutional layer, the output of the 1 × 1 convolutional layer is the probability that each pixel belongs to the target, and a threshold ε_1 is introduced to obtain the segmentation result; in this embodiment, ε_1 = 0.5 is preferred;
(2-b) in the backbone network described in the step (2-a), each coding layer is formed by connecting two 2D convolution modules, namely 2D double convolution modules, wherein each 2D convolution module comprises a convolution layer with the size of 3 × 3, a batch normalization layer and a Relu activation layer; in order to down-sample the image, in the second and third coding layers, 1 maximum pooling layer with the size of 2 × 2 is added at the end of the 2D double convolution module;
(2-c) In the backbone network of step (2-a), the void space pyramid convolutional layer is specifically as follows: n 3 × 3 convolution kernels with different sampling radii {r_v | v = 1, 2, ..., n} are used to perform hole convolution on the input feature map, and the hole convolution results are spliced together as the output of the void space pyramid convolutional layer, where n is a natural number greater than 1. To enlarge the receptive field of the convolution kernels and obtain multi-scale context information, the sampling radii are set to r_v = k × v + 1, where k is a natural number greater than 0. In this embodiment n = 4 and k = 2 are preferred (see the short check after step (2-d)), and the specific structure of the void space pyramid convolution layer is as shown in FIG. 2: first, four 3 × 3 convolution kernels with radii 3, 5, 7 and 9 perform hole convolution on the input features respectively, and the hole convolution results are then spliced together as the output of the void space pyramid convolution layer;
(2-D) in the backbone network of step (2-a), the first and second decoding layers are each composed of a 2D double convolution module of step (2-b), a 2 x 2 deconvolution layer and a concatenation operation, and the third decoding layer is composed of only a 2D double convolution module of step (2-b), wherein: the input of the 2D double convolution module in the first decoding layer is the output of the void space pyramid convolution layer, and the input of the 2D double convolution module in the next decoding layer is the output of the previous decoding layer; splicing operation in the first decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the second coding layer, and the splicing result is used as the output of the decoding layer; the splicing operation in the second decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the first coding layer, and the splicing result is used as the output of the decoding layer;
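As a quick check of the radius formula referenced in step (2-c), the embodiment's values n = 4 and k = 2 give exactly the four radii quoted above:

# Sampling radii of the void space pyramid convolution for n = 4, k = 2 (r_v = k*v + 1).
n, k = 4, 2
radii = [k * v + 1 for v in range(1, n + 1)]
print(radii)   # [3, 5, 7, 9] -> matches the four dilated 3x3 kernels of FIG. 2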
(3) A lightweight 3D convolutional network, denoted LW-3DNet, is constructed; its specific structure is shown in FIG. 3. The network has three inputs and one output: first, three 3D double convolution modules convolve the three inputs respectively; the convolution results are then spliced together, and the spliced result is convolved by one further 3D double convolution module to obtain a feature map F; finally, the feature map F is convolved by a 1 × 1 × 1 convolutional layer, whose output is the probability that each voxel belongs to the target. Each 3D double convolution module in step (3) is formed by connecting two 3D convolution modules, where each 3D convolution module comprises a convolutional layer of size 3 × 3 × 3, a batch normalization layer and a ReLU activation layer;
(4) ASPP-UNet is used to train multiple network models that can be used to segment two-dimensional slices in different view directions; the training process is shown in FIG. 4, and the specific steps are: for each CT sequence in training data set A, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slice sets of the three view directions. The two-dimensional slices of the different view directions obtained from the training data set and the corresponding manual slice segmentation results are then input into ASPP-UNet networks for training, giving three network models that can be used to segment two-dimensional slices in the different view directions, denoted ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively. During training, the loss function is preferably a hybrid loss based on cross entropy and Dice, specifically defined as follows:

l = l_c + η · l_d

l_c = -(1/T) · Σ_{w=1..T} [ g_w · log(p_w) + (1 - g_w) · log(1 - p_w) ]

l_d = 1 - 2 · Σ_{w=1..T} (g_w · p_w) / ( Σ_{w=1..T} g_w + Σ_{w=1..T} p_w )

where l_c and l_d denote the cross-entropy and Dice losses respectively, η is a weight parameter (η = 1 is preferred in this embodiment), g_w is the expert manual segmentation label of the w-th pixel in the CT image (background labelled 0, target labelled 1), p_w is the probability, predicted by the network model, that the w-th pixel belongs to the target, and T is the number of pixels in the CT image;
(5) Training a network model for fusing segmentation results of different view directions by using LW-3DNet, which specifically comprises the following steps:
(5-a) Construct the training data set C of the LW-3DNet network, specifically as follows: first, for each CT sequence in the original training data set B, two-dimensional slices are reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slice sets of the three view directions; these slice sets are then input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results S_X, S_Y and S_Z for the different view directions; finally, the network predictions S_X, S_Y and S_Z are used as the inputs for LW-3DNet network training, and the three-dimensional manual segmentation results of the CT sequences in training data set B are used as the labels, which together constitute the training data set C of the LW-3DNet network;
(5-b) The training data set C is input into the LW-3DNet network for training, with the Dice loss preferably selected as the loss function, to obtain the trained network model LW-3DNet_F;
(6) For the CT sequence to be segmented, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slices of the CT sequence image in the different view directions, denoted T_X, T_Y and T_Z; then T_X, T_Y and T_Z are input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results F_X, F_Y and F_Z for the different view directions; finally, F_X, F_Y and F_Z are input into the LW-3DNet_F network model for testing, giving the probability P_obj = {p_i^obj | i = 1, ..., N} that each voxel of the CT sequence belongs to the target, where p_i^obj is the probability that the i-th voxel belongs to the target (i.e. the liver) and N is the number of voxels of the CT sequence;
(7) The method for acquiring the accurate liver segmentation result by using the full-connection conditional random field specifically comprises the following steps:
(7-a) For the CT sequence to be segmented, construct the fully connected conditional random field energy function:

E(x) = Σ_i φ_u(x_i) + Σ_{i<j} φ_p(x_i, x_j)

where x = {x_i | i = 1, ..., N}, x_i is the label assigned to the i-th voxel, and φ_u(x_i) and φ_p(x_i, x_j) are the first-order and second-order energy terms respectively. φ_u(x_i) is the cost of assigning the label x_i to the i-th voxel and is computed as:

φ_u(x_i) = -log(P(x_i))

where P(x_i) denotes the probability of assigning the label x_i to the i-th voxel:

P(x_i) = p_i^obj if x_i = 1, and P(x_i) = 1 - p_i^obj if x_i = 0

where p_i^obj is the probability that the i-th voxel belongs to the target (i.e. the liver), obtained as described in step (6). φ_p(x_i, x_j) is the cost of assigning the labels x_i and x_j to the i-th and j-th voxels respectively, and is computed as:

φ_p(x_i, x_j) = μ(x_i, x_j) · G(f_i, f_j)

where f_i and f_j are the feature vectors of the i-th and j-th voxels, containing position and intensity features, G(f_i, f_j) is a Gaussian potential function applied to the feature vectors f_i and f_j, and μ(x_i, x_j) is the class compatibility function, which constrains the transfer of energy so that only voxel pairs with the same class label influence each other. For the two-class problem of liver segmentation in CT sequence images, a contrast-sensitive dual-kernel Gaussian potential is adopted:

G(f_i, f_j) = G_a(f_i, f_j) + G_s(f_i, f_j)

where G_a and G_s are the surface kernel and the smoothing kernel respectively. The surface kernel assigns the same label to pixels that are close in position and similar in intensity, and the smoothing kernel removes small isolated regions. They are computed as:

G_a(f_i, f_j) = β_1 · exp( -||L_i - L_j||² / (2·σ_α²) - |I_i - I_j|² / (2·σ_β²) )

G_s(f_i, f_j) = β_2 · exp( -||L_i - L_j||² / (2·σ_γ²) )

where β_1 and β_2 are the weight parameters of the surface kernel and the smoothing kernel, L_i and L_j are the spatial positions of the i-th and j-th voxels, I_i and I_j are their intensities, ||L_i - L_j|| denotes the Euclidean distance between L_i and L_j, |I_i - I_j| denotes the absolute value of the intensity difference, the parameters σ_α and σ_β respectively control the spatial proximity and the intensity similarity between voxels assigned the same class label, and the parameter σ_γ controls the smoothness of the regions; in this embodiment, β_1 = 1.0, β_2 = 1.0, σ_α = 0.5, σ_β = 2.5 and σ_γ = 5.0 are preferred;
(7-b) The fully connected conditional random field energy function E(x) is minimized by the mean-field approximation method, and the optimal label assignment, i.e. the final liver segmentation result, is obtained.
FIG. 5 is a schematic view of the liver segmentation process of a CT sequence image according to an embodiment of the present invention. For the original CT sequence image to be segmented, two-dimensional slices are first reconstructed from the three view directions of the sagittal, coronal and transverse planes, giving the two-dimensional slices T_X, T_Y and T_Z of the CT sequence image in the different view directions; T_X, T_Y and T_Z are then input into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing, giving the two-dimensional slice segmentation results F_X, F_Y and F_Z for the different view directions; F_X, F_Y and F_Z are then input into the LW-3DNet_F network model for testing, giving the probability P_obj that each voxel of the CT sequence belongs to the target; finally, the fully connected conditional random field energy function is constructed from the CT sequence image and the probability P_obj, and the accurate liver segmentation result is obtained by minimizing the energy function.
FIG. 6 shows a three-dimensional example of a liver segmentation result obtained with this embodiment. The gray-scale bar represents the minimum distance between each surface voxel of the segmentation result and the surface of the real liver; a smaller distance gives a lower gray value and indicates a more accurate segmentation. It can be seen that the method of the present invention can effectively and accurately segment the liver region in a CT sequence, and that the surface distance error is small and close to 0 over most of the liver surface in the obtained segmentation result.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. A method for automatically and accurately segmenting a liver region in an abdominal CT sequence image is characterized by comprising the following steps:
(1) Establishing an original training data set A and an original training data set B which comprise an original CT sequence image and a liver region manual segmentation result;
(2) Constructing a U-shaped 2D convolution network based on void space pyramid convolution, recording the network as ASPP-UNet, and specifically comprising the following steps:
(2-a) adopting a U-shaped network as the backbone network, the backbone network comprising three coding layers, two skip connections, a void space pyramid convolutional layer, three decoding layers and a 1 × 1 convolutional layer, wherein: the output of the first coding layer is used both as the input of the second coding layer and, through the first skip connection, as an input of the second decoding layer; the output of the second coding layer is used both as the input of the third coding layer and, through the second skip connection, as an input of the first decoding layer; the output of the third coding layer is used as the input of the void space pyramid convolutional layer, and the output of the void space pyramid convolutional layer is used as the input of the first decoding layer; in addition, the output of the previous decoding layer is used as the input of the next decoding layer; to obtain the segmentation result, the last decoding layer is connected to a 1 × 1 convolutional layer, wherein the output of the last decoding layer is used as the input of the 1 × 1 convolutional layer, the output of the 1 × 1 convolutional layer is the probability that each pixel belongs to the target, and a threshold ε_1 is introduced to obtain the segmentation result;
(2-b) in the backbone network described in the step (2-a), each coding layer is formed by connecting two 2D convolution modules, namely 2D double convolution modules, wherein each 2D convolution module comprises a convolution layer with the size of 3 × 3, a batch normalization layer and a Relu activation layer; in order to down-sample the image, in the second and third coding layers, 1 maximum pooling layer with the size of 2 × 2 is added at the end of the 2D double convolution module;
(2-c) in the backbone network described in step (2-a), the void space pyramid convolutional layer is specifically as follows:
n 3 × 3 convolution kernels with different sampling radii {r_v | v = 1, 2, ..., n} are used to perform hole convolution on the input feature map, and the hole convolution results are spliced together as the output of the void space pyramid convolutional layer, wherein n is a natural number greater than 1; in order to enlarge the receptive field of the convolution kernels and obtain multi-scale context information, the sampling radii are set to r_v = k × v + 1, wherein k is a natural number greater than 0;
(2-D) in the backbone network of step (2-a), the first and second decoding layers are composed of a 2D double convolution module of step (2-b), a 2 x 2 deconvolution layer and a concatenation operation, and the third decoding layer is composed of only a 2D double convolution module of step (2-b), wherein: the input of the 2D double convolution module in the first decoding layer is the output of the void space pyramid convolution layer, and the input of the 2D double convolution module in the next decoding layer is the output of the previous decoding layer; splicing operation in the first decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the second coding layer, and the splicing result is used as the output of the decoding layer; the splicing operation in the second decoding layer is used for splicing the deconvolution result in the decoding layer and the output of the first coding layer, and the splicing result is used as the output of the decoding layer;
(3) Constructing a lightweight 3D convolutional network, denoted LW-3DNet, wherein the network has three inputs and one output and its specific structure comprises: firstly, three 3D double convolution modules are adopted to convolve the three inputs respectively; the convolution results are spliced by a splicing operation; the spliced result is convolved by one 3D double convolution module to obtain a feature map F; finally, the feature map F is convolved by a 1 × 1 × 1 convolutional layer, wherein the output of the 1 × 1 × 1 convolutional layer is the probability that each voxel belongs to the target; the 3D double convolution modules in step (3) are formed by connecting two 3D convolution modules, wherein each 3D convolution module comprises a convolutional layer of size 3 × 3 × 3, a batch normalization layer and a ReLU activation layer;
(4) Using ASPP-UNet, training multiple network models that can be used to segment two-dimensional slices in different view directions, with the following specific steps: for each CT sequence in training data set A, firstly reconstructing two-dimensional slices from the three view directions of the sagittal, coronal and transverse planes to obtain the two-dimensional slice sets of the three view directions; then inputting the two-dimensional slices in the sagittal view direction obtained from the training data set and their manual segmentation results into an ASPP-UNet network for training to obtain the network model ASPP-UNet_X for segmenting sagittal slices; inputting the two-dimensional slices in the coronal view direction obtained from the training data set and their manual segmentation results into an ASPP-UNet network for training to obtain the network model ASPP-UNet_Y for segmenting coronal slices; and inputting the two-dimensional slices in the transverse view direction obtained from the training data set and their manual segmentation results into an ASPP-UNet network for training to obtain the network model ASPP-UNet_Z for segmenting transverse slices;
(5) Training a network model for fusing segmentation results of different view directions by using LW-3DNet, which specifically comprises the following steps:
(5-a) constructing the training data set C of the LW-3DNet network, specifically as follows: firstly, for each CT sequence in the original training data set B, reconstructing two-dimensional slices from the three view directions of the sagittal, coronal and transverse planes to obtain the two-dimensional slice sets of the three view directions; then inputting these slice sets into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing to obtain the two-dimensional slice segmentation results S_X, S_Y and S_Z for the different view directions; finally, using the network predictions S_X, S_Y and S_Z as the inputs for LW-3DNet network training and using the three-dimensional manual segmentation results of the CT sequences in training data set B as the labels, so as to construct the training data set C of the LW-3DNet network;
(5-b) inputting the training data set C into the LW-3DNet network for training to obtain the trained network model LW-3DNet_F;
(6) for the CT sequence to be segmented, firstly reconstructing two-dimensional slices from the three view directions of the sagittal, coronal and transverse planes to obtain the two-dimensional slices of the CT sequence image in the different view directions, denoted T_X, T_Y and T_Z respectively; then inputting T_X, T_Y and T_Z into the trained network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z respectively for testing to obtain the two-dimensional slice segmentation results F_X, F_Y and F_Z for the different view directions; finally, inputting F_X, F_Y and F_Z into the LW-3DNet_F network model for testing to obtain the probability P_obj = {p_i^obj | i = 1, ..., N} that each voxel of the CT sequence belongs to the target, wherein p_i^obj represents the probability that the i-th voxel belongs to the liver and N is the number of voxels of the CT sequence to be segmented;
(7) The method for acquiring the accurate liver segmentation result by using the full-connection conditional random field specifically comprises the following steps:
(7-a) for the CT sequence to be segmented, constructing the fully connected conditional random field energy function:

E(x) = Σ_i φ_u(x_i) + Σ_{i<j} φ_p(x_i, x_j)

wherein x = {x_i | i = 1, ..., N}, x_i is the label assigned to the i-th voxel, and φ_u(x_i) and φ_p(x_i, x_j) are the first-order and second-order energy terms respectively; φ_u(x_i) is the cost of assigning the label x_i to the i-th voxel and is computed as:

φ_u(x_i) = -log(P(x_i))

wherein P(x_i) denotes the probability of assigning the label x_i to the i-th voxel:

P(x_i) = p_i^obj if x_i = 1, and P(x_i) = 1 - p_i^obj if x_i = 0

wherein p_i^obj is the probability that the i-th voxel belongs to the liver, obtained as described in step (6); φ_p(x_i, x_j) is the cost of assigning the labels x_i and x_j to the i-th and j-th voxels respectively, and is computed as:

φ_p(x_i, x_j) = μ(x_i, x_j) · G(f_i, f_j)

wherein f_i and f_j are the feature vectors of the i-th and j-th voxels, containing position and intensity features, G(f_i, f_j) is a Gaussian potential function applied to the feature vectors f_i and f_j, and μ(x_i, x_j) is the class compatibility function, which constrains the transfer of energy so that only voxel pairs with the same class label influence each other; for the two-class problem of liver segmentation in CT sequence images, a contrast-sensitive dual-kernel Gaussian potential is adopted:

G(f_i, f_j) = G_a(f_i, f_j) + G_s(f_i, f_j)

wherein G_a and G_s are the surface kernel and the smoothing kernel respectively; the surface kernel assigns the same label to pixels that are close in position and similar in intensity, and the smoothing kernel removes small isolated regions; they are computed as:

G_a(f_i, f_j) = β_1 · exp( -||L_i - L_j||² / (2·σ_α²) - |I_i - I_j|² / (2·σ_β²) )

G_s(f_i, f_j) = β_2 · exp( -||L_i - L_j||² / (2·σ_γ²) )

wherein β_1 and β_2 are the weight parameters of the surface kernel and the smoothing kernel respectively, L_i and L_j are the spatial positions of the i-th and j-th voxels, I_i and I_j are the intensities of the i-th and j-th voxels, ||L_i - L_j|| denotes the Euclidean distance between L_i and L_j, |I_i - I_j| denotes the absolute value of the intensity difference, the parameters σ_α and σ_β are used to control the spatial proximity and the intensity similarity, respectively, between voxels assigned the same class label, and the parameter σ_γ is used to control the smoothness of the regions;
and (7-b) minimizing the fully connected conditional random field energy function E(x) by the mean-field approximation method, and obtaining the optimal label assignment result, namely the final liver segmentation result.
2. The method for automatically and precisely segmenting the liver region in an abdominal CT sequence image as claimed in claim 1, wherein: in step (4), when training and acquiring the network models ASPP-UNet_X, ASPP-UNet_Y and ASPP-UNet_Z, the loss function is preferably a hybrid loss function based on cross entropy and Dice, specifically defined as follows:

l = l_c + η · l_d

l_c = -(1/T) · Σ_{w=1..T} [ g_w · log(p_w) + (1 - g_w) · log(1 - p_w) ]

l_d = 1 - 2 · Σ_{w=1..T} (g_w · p_w) / ( Σ_{w=1..T} g_w + Σ_{w=1..T} p_w )

wherein l_c and l_d respectively represent the cross-entropy and Dice losses, η is a weight parameter, g_w represents the expert manual segmentation label of the w-th pixel in the CT image (background labelled 0, target labelled 1), p_w represents the probability, predicted by the network model, that the w-th pixel belongs to the target, and T is the number of pixels in the CT image.
3. The method for automatically and precisely segmenting the liver region in an abdominal CT sequence image as claimed in claim 1, wherein: in step (5-b), when training and acquiring the network model LW-3DNet_F, the loss function is preferably a Dice loss function.
4. The method for automatically and precisely segmenting the liver region in an abdominal CT sequence image as claimed in claim 1, wherein: ε_1 is preferably a constant of 0.3 to 0.7, n is a natural number of 2 to 10, k is a natural number of 1 to 8, η is a constant of 0.5 to 2, β_1 is preferably 0.8 to 1.2, β_2 is preferably 0.8 to 1.2, σ_α is preferably 3.0 to 7.0, σ_β is preferably 1.0 to 4.0, and σ_γ is preferably 3.0 to 7.0.
CN202211403625.3A (priority date 2022-11-10, filing date 2022-11-10): Automatic accurate segmentation method for liver region in abdominal CT sequence image. Status: Pending. Publication: CN115690066A (en).

Priority Applications (1)

Application number: CN202211403625.3A · Publication: CN115690066A (en) · Priority date: 2022-11-10 · Filing date: 2022-11-10 · Title: Automatic accurate segmentation method for liver region in abdominal CT sequence image

Publications (1)

Publication number: CN115690066A (en) · Publication date: 2023-02-03

Family ID: 85049746

Family Applications (1)

Application number: CN202211403625.3A · Status: Pending · Publication: CN115690066A (en) · Priority date: 2022-11-10 · Filing date: 2022-11-10 · Title: Automatic accurate segmentation method for liver region in abdominal CT sequence image

Country Status (1)

CN: CN115690066A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination