CN115170582A - Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism - Google Patents

Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism

Info

Publication number
CN115170582A
CN115170582A
Authority
CN
China
Prior art keywords
liver
attention
convolution
grid
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210666323.9A
Other languages
Chinese (zh)
Inventor
张晓龙
郑帅
邓鹤
任宏伟
邵赛
边小勇
李波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Science and Engineering WUSE
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE filed Critical Wuhan University of Science and Engineering WUSE
Priority to CN202210666323.9A priority Critical patent/CN115170582A/en
Publication of CN115170582A publication Critical patent/CN115170582A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30056 Liver; Hepatic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism, which comprises the following steps: selecting a liver image data set needing liver segmentation and dividing it into a training set and a test set; preprocessing the liver images in the training set; at the encoder stage, obtaining a feature map of the liver by using a multi-scale feature fusion module and a convolution network; at the decoder stage, obtaining a segmented image of the liver by using a multi-scale feature fusion module, a grid attention mechanism, an attention-guided connection module, transposed convolution and a deep supervision mechanism; and performing morphological post-processing on the segmented liver image. The method improves the segmentation of three-dimensional liver images, realizes more accurate segmentation of the three-dimensional liver image, and provides a strong aid to doctors in medical diagnosis.

Description

Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism
Technical Field
The invention relates to a three-dimensional medical image segmentation method, in particular to a three-dimensional liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism.
Background
In recent years, computed tomography (CT) and magnetic resonance imaging (MRI) have become the main imaging methods for doctors to diagnose and evaluate liver cancer. In medical imaging, accurate segmentation of the liver is important for the qualitative analysis and treatment planning of liver cancer. In clinical diagnosis, liver segmentation is usually performed by an experienced expert who manually delineates the edges of the liver according to the anatomical structure; this is tedious, time-consuming and labor-intensive, and requires a high level of expertise. Because the segmentation result is influenced by factors such as the subjective experience and cognitive ability of experts, liver segmentation is a challenging task.
In general, segmentation methods for medical images can be divided into three categories: manual segmentation, semi-automatic segmentation and automatic segmentation. Manual segmentation is highly subjective, poorly repeatable and time-consuming; it relies heavily on human-recognizable features and requires highly skilled personnel, which makes it impractical for routine application. Semi-automatic segmentation combines manual and computer operations: the manual operation provides some useful information, and the computer then performs the segmentation based on this information; here, however, manual intervention may bias the segmentation. Automatic segmentation depends entirely on a computer. In recent years, research in the field of medical image segmentation has focused mainly on automatic segmentation, i.e. segmentation performed by a computer-executable algorithm. At present, liver segmentation algorithms mainly comprise traditional methods and deep learning methods. The traditional methods mainly include threshold-based methods, region growing, active contour models and edge detection-based methods. These methods rely mainly on information such as gray scale, texture and edges, but automated segmentation becomes very difficult due to variations in liver structure, the similarity of the liver to its neighboring organs, the complexity of three-dimensional spatial features and the influence of noise.
In recent years, deep neural network (DNN) methods have developed rapidly in the fields of computer vision and image processing. Deep learning methods, particularly convolutional neural networks (CNNs), have achieved tremendous success in medical image segmentation: by learning from a large number of labeled samples, their outstanding feature learning capacity makes the automatic segmentation of images accurate. Later, fully convolutional networks (FCN) made it possible to classify images at the pixel level, thereby solving the semantic-level image segmentation problem; the most classical network models of this kind are Unet and Vnet. Both are U-like network structures that use skip-connections to connect low-level features and high-level features. The difference between the two is that Unet processes two-dimensional data, while Vnet processes three-dimensional data and adds residual blocks. These methods have made some progress in the field of liver segmentation, but the spatial information among the slices of a three-dimensional liver image is not fully utilized, and the high-level and low-level features are simply connected rather than sufficiently fused.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism, which is realized by the following technical scheme:
a liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism comprises the following steps:
S1, selecting a medical image data set needing liver segmentation, and dividing the medical image data set into a training set and a testing set;
S2, preprocessing the three-dimensional liver images in the selected training set, initializing network model parameters, and inputting the preprocessed images into a network model, wherein the network model comprises an encoder network and a decoder network;
S3, obtaining a feature map of the liver by using a multi-scale feature fusion module and a convolution network at the encoder stage;
and S4, at the decoder stage, obtaining a segmented image of the liver by using a multi-scale feature fusion module, a grid attention mechanism, an attention-guided connection module, transposed convolution and a deep supervision mechanism.
Further, in step S2, the preprocessing of the three-dimensional liver image in the selected training set specifically includes:
S21, selecting a suitable window for the CT images in the training set, and setting the CT window values to a preset interval;
S22, performing down-sampling and re-sampling on the training set, and adjusting the inter-slice spacing of the image data to 1 mm;
S23, finding the starting and ending slices of the liver region, and expanding 20 slices outward in each direction;
S24, carrying out histogram equalization on the acquired slice images;
and S25, randomly selecting 32 continuous slices as the input of the network model, wherein the size of the input image is 1 × 32 × 256 × 256.
Further, in step S2, initializing network model parameters specifically includes:
S26, initializing network model parameters, including the batch processing size, learning rate, number of iterations, learning rate attenuation strategy and deep supervision attenuation coefficient;
and S27, initializing the network model weight by using a kaiming weight initialization method.
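By way of illustration only (this sketch is not part of the patent text), steps S26 and S27 might look as follows in PyTorch; the hyperparameter values shown are placeholders rather than the patent's settings, and kaiming_normal_ is one common realization of the kaiming initialization method.

```python
import torch.nn as nn

# S26) network model parameters (values are illustrative placeholders)
config = {
    "batch_size": 2,         # batch processing size
    "learning_rate": 1e-3,   # initial learning rate
    "epochs": 200,           # number of training iterations
    "ds_coefficient": 0.33,  # deep supervision attenuation coefficient
}

# S27) kaiming initialization of the network model weights
def init_weights(module: nn.Module) -> None:
    if isinstance(module, (nn.Conv3d, nn.ConvTranspose3d)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# usage: model.apply(init_weights)
```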
Further, step S3 specifically includes:
S31, adding a multi-scale feature fusion module in each layer of the encoder network;
S32, performing feature extraction and max-pooling downsampling to obtain a feature map of the liver;
wherein the encoder network comprises four downsampling layers, each composed of a multi-scale feature fusion module, two convolutions with a convolution kernel size of 3 × 3 × 3, a batch normalization and a ReLU activation function; the numbers of convolution filters in successive layers are [32, 64, 128, 256, 512]. Each layer ends with a max pooling operation with a step size of 2, finally yielding the feature map of the liver.
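For illustration, one such downsampling layer might be sketched in PyTorch as follows; the class name, the exact ordering of the modules within a layer, and the use of nn.Identity as a stand-in for the multi-scale feature fusion module (sketched after the module formulas below) are assumptions.

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: MSFF -> (Conv3d + BN + ReLU) x 2 -> max pooling."""
    def __init__(self, in_ch, out_ch, msff=None):
        super().__init__()
        # stand-in for the multi-scale feature fusion module sketched below
        self.msff = msff if msff is not None else nn.Identity()
        self.convs = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)  # step size 2

    def forward(self, x):
        skip = self.convs(self.msff(x))  # features kept for the skip connection
        return self.pool(skip), skip
```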
Further, in the multi-scale feature fusion module, the number of channels of the input feature map is adjusted by a three-dimensional convolution with a kernel size of 1 × 1 × 1, and the feature map is divided along the channel dimension into four feature maps, denoted x_i, i = 1, 2, 3, 4; the number of channels in each group is one quarter of the number of input channels, and the spatial size of the feature map is unchanged. Except for x_1, each x_i undergoes a convolution with a kernel size of 3 × 3 × 3 followed by batch normalization and ReLU activation, giving features x'_i at four different scales. The four features are added element by element and convolved with a kernel size of 1 × 1 × 1 to obtain h; following a residual-style idea, h is added element by element to each x'_i to obtain four features x''_i (i = 1, 2, 3, 4). The four multi-scale features x''_i are combined by a concat operation, so that the number of channels of the resulting feature map matches that of the input; the final output feature map is obtained by a convolution layer with a kernel size of 1 × 1 × 1.
Further, the formulas of the multi-scale feature fusion module are as follows:

x'_i = x_i, i = 1;  x'_i = ReLU(BN(Conv3(x_i))), i = 2, 3, 4

h = Conv1(x'_1 + x'_2 + x'_3 + x'_4)

x''_i = x'_i + h, i = 1, 2, 3, 4

O = Conv1(Concat(x''_1, x''_2, x''_3, x''_4))

where Conv3 denotes a convolution operation with a kernel size of 3 × 3 × 3, BN denotes batch normalization, x''_i (i = 1, 2, 3, 4) denotes the feature information at the four scales, Conv1 denotes a convolution operation with a kernel size of 1 × 1 × 1, and O is the output feature map of the multi-scale feature fusion module.
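The formulas above could be realized, for example, by the following PyTorch sketch; class and variable names are illustrative and not taken from the patent.

```python
import torch
import torch.nn as nn

class MSFF(nn.Module):
    """Multi-scale feature fusion module following the four formulas above."""
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 4 == 0
        branch = channels // 4
        self.conv_in = nn.Conv3d(channels, channels, kernel_size=1)  # adjust channels
        # Conv3 + BN + ReLU branches for x_2, x_3, x_4 (x_1 is passed through)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv3d(branch, branch, kernel_size=3, padding=1),
                nn.BatchNorm3d(branch), nn.ReLU(inplace=True),
            ) for _ in range(3))
        self.conv_h = nn.Conv3d(branch, branch, kernel_size=1)        # produces h
        self.conv_out = nn.Conv3d(channels, channels, kernel_size=1)  # final Conv1

    def forward(self, x):
        xs = torch.chunk(self.conv_in(x), 4, dim=1)                      # x_1 .. x_4
        xp = [xs[0]] + [b(xi) for b, xi in zip(self.branches, xs[1:])]   # x'_i
        h = self.conv_h(sum(xp))                                         # h = Conv1(sum of x'_i)
        xpp = [xi + h for xi in xp]                                      # x''_i = x'_i + h
        return self.conv_out(torch.cat(xpp, dim=1))                      # O
```

With channels = 64, for example, each branch operates on 16 channels, matching the one-quarter split described above.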
Further, in step S4, the decoder network includes four upsampling layers, each layer including a deconvolution with a step size of 2 followed by two convolutions with convolution kernel sizes of 3 × 3 × 3, a batch normalization, a ReLU activation function, and a multi-scale feature fusion module.
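A matching sketch of one upsampling layer (PyTorch assumed; in the full model the skip feature would come from the attention-guided connection module described below, and the MSFF stand-in can be replaced by the module sketched above):

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One decoder layer: transposed conv (stride 2) -> concat with skip ->
    (Conv3d + BN + ReLU) x 2 -> multi-scale feature fusion."""
    def __init__(self, in_ch, out_ch, msff=None):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
        self.convs = nn.Sequential(
            nn.Conv3d(out_ch * 2, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )
        self.msff = msff if msff is not None else nn.Identity()

    def forward(self, x, skip):
        x = torch.cat([self.up(x), skip], dim=1)
        return self.msff(self.convs(x))
```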
Further, step S4 specifically includes:
S41, taking the features finally obtained through the multi-scale feature fusion module at each layer of the decoder network and the features finally obtained through convolution at the corresponding encoder layer as the high-level and low-level features respectively, and obtaining an attention map through the attention-guided connection module;
S42, performing a series of convolution operations on the attention map obtained by each layer through the attention-guided connection module to extract features; except for the bottom layer, the result of every other layer in the decoder network is restored to the image size by upsampling at the appropriate scale to produce one output, giving four outputs in total, of which the first three can serve as auxiliary losses in the deep supervision mechanism and the last serves as the final output mask map;
and S43, in the back-propagation process of the iterative training of the network model, calculating a loss value from the difference between the segmentation result predicted by the network model and the label value, and continuously updating the iteration parameter values on the basis of the loss value so that the predicted segmentation result approaches the label value, thereby obtaining the segmented image of the liver.
Furthermore, in the attention-guided connection module, a high-level feature and a low-level feature are input. The high-level feature is first upsampled by a transposed convolution operation, the two features are fed into the grid attention module to obtain an attention feature map, the generated attention feature map is then multiplied element by element with the low-level feature, and the result is concatenated (concat operation) with the feature map obtained by the transposed convolution of the high-level feature to output a feature map;
in the grid attention mechanism, a high-level feature and a low-level feature are likewise input. First, their numbers of channels are adjusted by three-dimensional convolutions with a kernel size of 1 × 1 × 1, and the low-level feature is downsampled once by a three-dimensional convolution operation with a step size of 2 so that the two feature maps can be added. The fused features then undergo a ReLU nonlinear transformation, and an attention coefficient is generated by a Sigmoid activation function; the attention coefficient is upsampled by a transposed convolution to match the dimensions of the low-level feature input from the upper layer, and is finally multiplied element by element with the low-level feature to obtain the final attention map.
Further, the formula of the grid attention mechanism is as follows:

c_i = σ_2(Ψ(σ_1(W_f f_i + W_g g_i)))

where Ψ, W_f and W_g are convolution operations, σ_1 is the ReLU activation function, σ_2 is the Sigmoid activation function, and c_i is the grid attention coefficient;

the formula of the attention-guided connection module is as follows:

O = Concat(G(f, g) ⊗ f, Up(g))

where G is the grid attention mechanism, Up is the transposed-convolution upsampling, f and g are the low-level and high-level input features, and ⊗ is an element-by-element multiplication.
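For illustration, the two formulas might be sketched in PyTorch as follows; the channel widths and the placement of the stride-2 downsampling on the low-level branch are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class GridAttention(nn.Module):
    """Grid attention: c = Sigmoid(Psi(ReLU(W_f(f) + W_g(g)))), upsampled."""
    def __init__(self, low_ch, high_ch, inter_ch):
        super().__init__()
        self.w_f = nn.Sequential(  # 1x1x1 channel adjustment + stride-2 downsampling
            nn.Conv3d(low_ch, inter_ch, kernel_size=1),
            nn.Conv3d(inter_ch, inter_ch, kernel_size=2, stride=2))
        self.w_g = nn.Conv3d(high_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv3d(inter_ch, 1, kernel_size=1)
        self.up = nn.ConvTranspose3d(1, 1, kernel_size=2, stride=2)

    def forward(self, f, g):
        q = torch.relu(self.w_f(f) + self.w_g(g))  # additive fusion + ReLU
        c = torch.sigmoid(self.psi(q))             # grid attention coefficient
        return self.up(c)                          # matched to f's resolution

class AttentionGuidedConnection(nn.Module):
    """O = Concat(G(f, g) (x) f, Up(g))."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.up = nn.ConvTranspose3d(high_ch, low_ch, kernel_size=2, stride=2)
        self.attn = GridAttention(low_ch, high_ch, max(low_ch // 2, 1))

    def forward(self, f, g):
        return torch.cat([self.attn(f, g) * f, self.up(g)], dim=1)
```

In this sketch the element-by-element multiplication with the low-level feature f is performed once, inside the connection module, so the upsampled coefficients are applied exactly as in the formula for O.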
The invention has the following beneficial technical effects:
the method firstly trains the three-dimensional liver image by utilizing the multi-scale feature fusion module and the convolution block, enlarges the receptive field of the neural network and the representation capability of the enhanced feature, and obtains the liver feature map with more representative feature information by fully utilizing the information among slices and in the slices in the three-dimensional liver image; then, the attention guide connection module and the grid attention mechanism in the jumping structure are utilized to highlight the characteristics of the segmentation region and suppress other noise parts; and a deep supervision mechanism is utilized to reduce training and verification errors, and problems of gradient disappearance, gradient explosion, over-slow convergence speed and the like are reduced, so that a segmented image of the liver is finally obtained. The multi-scale semantic information and the important context information are obtained by using the multi-scale feature fusion module, the attention guide connection module and the grid attention machine mechanism for multiple times at the joint of the encoder and the decoder. And finally, optimizing the model segmentation result by using a morphological post-processing method.
Therefore, the method extracts the semantic information of the three-dimensional liver image and performs segmentation by utilizing the multi-scale feature fusion module, the attention guide connection module, the grid attention mechanism and the three-dimensional convolution neural network training and fusion.
Drawings
FIG. 1 is a schematic flow chart of the method according to an embodiment of the present invention.
FIG. 2 is a comparison of the liver before and after preprocessing in the present invention.
FIG. 3 is a diagram of the overall network architecture in an embodiment of the present invention.
FIG. 4 is a structural design diagram of the multi-scale feature fusion module in an embodiment of the present invention.
FIG. 5 is a structural design diagram of the attention-guided connection module in an embodiment of the present invention.
FIG. 6 is a structural design diagram of the grid attention mechanism in an embodiment of the present invention.
FIG. 7 is a visualization of the segmentation results of the method of the present invention on the 3Dircadb test set.
Detailed Description
In order to facilitate a better understanding of the invention for those skilled in the art, the invention will be described in further detail with reference to the accompanying drawings and specific examples, which are given by way of illustration only and do not limit the scope of the invention.
Interpretation of terms:
1. kaiming: a weight initialization method for neural networks.
2. ReLU: an activation function of a neural network that maps the output of the fitted curve to the interval [0, +∞).
3. skip-connection: the skip connection part in the middle of a U-shaped network.
4. concat: the connection of two tensors along the channel dimension.
5. Sigmoid: an activation function of a neural network that maps the output of the fitted curve to the interval (0, 1).
This embodiment discloses a three-dimensional liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism (MAGNet method for short), taking as examples LiTS (a public medical liver data set published at https://competitions.codalab.org/competitions/17094), 3Dircadb (a public medical liver data set published at https://www.ircad.fr/research/3dircadb) and Sliver07 (a public medical liver data set published at https://sliver07.grand-challenge.org/Download). LiTS consists of 131 labeled three-dimensional liver sequences, of which sequences 28-47 constitute the 3Dircadb data set, and the Sliver07 data set consists of 20 sequences; the slices are all 512 × 512 pixels. The LiTS data set with the 3Dircadb sequences removed is used as the training set, and the 3Dircadb and Sliver07 data sets are used as the test sets.
As shown in fig. 1, the three-dimensional liver image segmentation method based on multi-scale feature fusion and the grid attention mechanism described in this embodiment specifically includes the following steps:
Step 1) data set division;
selecting a medical image data set which needs to be subjected to liver segmentation, and dividing the medical image data set into a training set and a test set;
Step 2) data preprocessing; the comparison of the liver before and after preprocessing is shown in fig. 2 (a) (before processing) and fig. 2 (b) (after processing);
2.1 Select a suitable window for the window of CT images in the training set, and set the CT value of the window between-200,200;
2.2 Down-sampling and re-sampling the training set, and adjusting the interlayer spacing of the image data to 1mm;
2.3 Find the beginning and ending slices of the liver region and expand 20 slices outward in both directions;
2.4 Histogram equalization of images in the training set;
2.5 32 consecutive slices are randomly selected as input to the network model, where the input size of the network is 1 × 32 × 256 × 256.
Step 3) obtaining a liver feature map by using a multi-scale feature fusion module and a convolution network at an encoder stage, wherein the whole network structure is shown in fig. 3;
3.1 Initializing network parameters including batch processing size, learning rate, iteration number, learning rate attenuation strategy and deep supervision attenuation coefficient;
3.2 Initializing network weights using a kaiming weight initialization method;
3.3 The preprocessed three-dimensional image is input into a convolutional neural network of the network model.
3.4 A multi-scale feature fusion module is added in each layer of the encoder stage and the decoder stage, and feature extraction is performed by maximum pooling downsampling in the encoder stage. Four downsampling layers are included in the encoder path, and each downsampling layer is composed of a multi-scale feature fusion module, two convolutions with convolution kernel size of 3 x 3, a batch normalization and a ReLU activation function. Each layer is finally followed by a maximal pooling operation with a step size of 2, and finally a liver feature map is obtained, and the overall network structure is shown in fig. 3.
The numbers of convolution filters in successive layers are [32, 64, 128, 256, 512]. In the multi-scale feature fusion module, the number of channels of the input feature map is adjusted by a three-dimensional convolution with a kernel size of 1 × 1 × 1, and the feature map is divided along the channel dimension into four feature maps, denoted x_i, i = 1, 2, 3, 4. The number of channels in each group is one quarter of the number of input channels, and the spatial size of the feature map is unchanged. Except for x_1, each x_i undergoes a convolution with a kernel size of 3 × 3 × 3 followed by batch normalization and ReLU activation, giving features x'_i at four different scales. The four features are added element by element and convolved with a kernel size of 1 × 1 × 1 to obtain h; following a residual-style idea, h is added element by element to each x'_i to obtain four features x''_i (i = 1, 2, 3, 4). The four multi-scale features x''_i are combined by a concat operation, so that the number of channels of the resulting feature map matches that of the input. The final output feature map is obtained by a convolution layer with a kernel size of 1 × 1 × 1, as shown in fig. 4.
In step 3.4), the formulas of the multi-scale feature fusion module are as follows:

x'_i = x_i, i = 1;  x'_i = ReLU(BN(Conv3(x_i))), i = 2, 3, 4 (1)

h = Conv1(x'_1 + x'_2 + x'_3 + x'_4) (2)

x''_i = x'_i + h, i = 1, 2, 3, 4 (3)

O = Conv1(Concat(x''_1, x''_2, x''_3, x''_4)) (4)

where Conv3 denotes a convolution operation with a kernel size of 3 × 3 × 3, BN denotes batch normalization, x''_i (i = 1, 2, 3, 4) denotes the feature information at the four scales, Conv1 denotes a convolution operation with a kernel size of 1 × 1 × 1, and O is the output feature map of the multi-scale feature fusion module.
Step 4) obtaining a segmented image of the liver by using the multi-scale feature fusion module, the grid attention mechanism, the attention-guided connection module, transposed convolution and a deep supervision mechanism;
4.1 At the decoder stage, there are four upsampled layers, each layer containing a deconvolution with step size 2 followed by two convolution blocks with convolution kernel size 3 x 3, a batch normalization, a ReLU activation function, and a multi-scale feature fusion module.
4.2 Using the features finally obtained by the multi-scale feature fusion module at each layer at the decoder stage and the features finally obtained by convolution of the corresponding layer on the encoder, as the high-layer information and the low-layer information, respectively, to obtain an attention map through the attention-directed connection module, as shown in fig. 5.
4.3 In the attention guiding connection module, a high-level feature and a low-level feature are respectively input, firstly, the high-level feature is up-sampled by using a transposition convolution operation, the two features are respectively input into the grid attention module to obtain an attention feature map, then, the generated attention feature map and the low-level feature are multiplied element by element, the obtained result and the feature map obtained by the transposition convolution of the high-level feature are subjected to concat operation, and a feature map is output. In the grid attention module, the high-level features and the low-level features are input separately. First, the high-level feature and the low-level feature adjust the number of channels by three-dimensional convolution with a convolution kernel size of 1 × 1 × 1, perform down-sampling once by three-dimensional convolution operation with a step size of 2, and then perform simple addition operation thereon. ReLU activation nonlinear transformation is performed on the features obtained through addition and fusion, an attention coefficient is generated through a Sigmoid activation function, the attention coefficient is up-sampled by using transposition convolution to match the dimensionality of the upper-layer input low-layer features, and finally the feature is multiplied by the low-layer features element by element to obtain a final feature map, as shown in FIG. 6.
In step 4.3), the grid attention formula is as follows:

c_i = σ_2(Ψ(σ_1(W_f f_i + W_g g_i))) (5)

where Ψ, W_f and W_g are convolution operations, σ_1 is the ReLU activation function, σ_2 is the Sigmoid activation function, and c_i is the grid attention coefficient;

the formula of the attention-guided connection module is as follows:

O = Concat(G(f, g) ⊗ f, Up(g)) (6)

where G is the grid attention mechanism, Up is the transposed-convolution upsampling, f and g are the low-level and high-level input features, and ⊗ is an element-by-element multiplication.
4.4 Carrying out a series of convolution operations on the feature map obtained by each layer through the attention-guiding connection module to extract features; except for the bottom layer, the results of each layer except the bottom layer in the decoder are subjected to upsampling recovery image size with different scales to obtain one output, and finally four outputs are obtained, wherein the first three outputs are used as auxiliary losses in a depth supervision mechanism, and the final output is used as a final segmentation graph;
4.5 In a back propagation process in iterative training of the neural network, a loss value is calculated by calculating a difference between a segmentation result predicted by the network and a label value, and then an iteration parameter value is continuously updated on the basis of the loss value, so that the segmentation result predicted by the network approaches the label value.
The losses calculated in steps 4.4) and 4.5) adopt the Tversky loss; the specific formula is as follows:

TL(A, B) = 1 - |A ∩ B| / (|A ∩ B| + α|A - B| + β|B - A|) (7)

where A and B represent the predicted value and the label value respectively, and α and β are hyper-parameters. The balance between false positives and false negatives can be controlled by adjusting α and β;
the deep supervised network joint loss function is as follows:
loss=(loss1+loss2+loss3)*ε+loss4 (8)
and loss1-loss4 are loss functions of up-sampling results of output results of each layer of the decoder, and epsilon is a deep supervision coefficient.
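As an illustration, the Tversky loss (7) and the joint loss (8) might be implemented as follows; PyTorch is assumed, and the α, β and ε values are placeholders.

```python
import torch

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Formula (7): pred holds predicted probabilities A, target labels B."""
    tp = (pred * target).sum()        # |A ∩ B|
    fp = (pred * (1 - target)).sum()  # false positives, weighted by alpha
    fn = ((1 - pred) * target).sum()  # false negatives, weighted by beta
    return 1.0 - tp / (tp + alpha * fp + beta * fn + eps)

def joint_loss(outputs, target, epsilon=0.33):
    """Formula (8): outputs = [out1, out2, out3, out4] from the decoder heads."""
    aux = sum(tversky_loss(o, target) for o in outputs[:3])  # loss1 + loss2 + loss3
    return aux * epsilon + tversky_loss(outputs[3], target)  # ... * eps + loss4
```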
Step 5) performing morphological post-processing on the liver image obtained after model segmentation.
5.1 Maximum connected component extraction is carried out on the liver segmentation region;
5.2 The divided fine regions are removed, and internal void filling is performed.
The visualization of the segmentation results of the three-dimensional liver image segmentation method based on multi-scale feature fusion and the grid attention mechanism (MAGNet) on the 3Dircadb test set is shown in fig. 7; it can be seen that the segmentation results obtained by the MAGNet method are very close to the labels. In addition, Table 1 below compares the segmentation results (mean ± std) of the MAGNet method with those of other advanced methods on the 3Dircadb data set, and Table 2 below compares the segmentation results (mean ± std) of the MAGNet method with those of other advanced methods on the Sliver07 data set.
TABLE 1
[Table 1, provided as an image in the original: comparison (mean ± std) of the segmentation results of the MAGNet method and other advanced methods on the 3Dircadb data set.]
As shown in Table 1, we compare the results predicted by the MAGNet method of this embodiment on the 3Dircadb data set with other advanced models. Compared with UNet and ResNet, the performance of MAGNet on the DSC index is obviously improved, and compared with other advanced models it is also improved to some extent. For example, compared with U³-Net+DC, the model that performs best on the DSC index among the compared methods, MAGNet improves by 0.1 percentage points. On the ASD and RMSD indexes, MAGNet is greatly reduced compared with the other advanced models; there is also some reduction in the RVD index, while on the VOE index it is slightly above the U³-Net+DC model.
TABLE 2
[Table 2, provided as an image in the original: comparison (mean ± std) of the segmentation results of the MAGNet method and other advanced methods on the Sliver07 data set.]
As can be seen from Table 2, the prediction results of the proposed method reach 97.3% on the DSC index, and compared with other advanced methods the proposed method shows some improvement on the DSC, VOE and RVD indexes, demonstrating its superiority and generalization capability.
The foregoing merely illustrates the principles and preferred embodiments of the invention. Many variations and modifications may be made by those skilled in the art in light of the foregoing description, and all of them fall within the scope of the invention.

Claims (10)

1. A liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism is characterized by comprising the following steps:
S1, selecting a medical image data set needing liver segmentation, and dividing the medical image data set into a training set and a testing set;
S2, preprocessing the three-dimensional liver images in the selected training set, initializing network model parameters, and inputting the preprocessed images into a network model, wherein the network model comprises an encoder network and a decoder network;
S3, obtaining a feature map of the liver by using a multi-scale feature fusion module and a convolution network at the encoder stage;
and S4, at the decoder stage, obtaining a segmented image of the liver by using a multi-scale feature fusion module, a grid attention mechanism, an attention-guided connection module, transposed convolution and a deep supervision mechanism.
2. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 1, wherein the step S2 of preprocessing the three-dimensional liver image in the selected training set specifically comprises:
S21, selecting a suitable window for the CT images in the training set, and setting the CT window values to a preset interval;
S22, performing down-sampling and re-sampling on the training set, and adjusting the inter-slice spacing of the image data to 1 mm;
S23, finding the starting and ending slices of the liver region, and expanding 20 slices outward in each direction;
S24, carrying out histogram equalization on the acquired slice images;
and S25, randomly selecting 32 continuous slices as the input of the network model, wherein the size of the input image is 1 × 32 × 256 × 256.
3. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism according to claim 2, wherein in step S2, initializing network model parameters specifically comprises:
S26, initializing network model parameters, including the batch processing size, learning rate, number of iterations, learning rate attenuation strategy and deep supervision attenuation coefficient;
and S27, initializing the network model weight by using a kaiming weight initialization method.
4. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 3, wherein step S3 specifically comprises:
S31, adding a multi-scale feature fusion module in each layer of the encoder network;
S32, performing feature extraction and max-pooling downsampling to obtain a feature map of the liver;
wherein the encoder network comprises four downsampling layers, each composed of a multi-scale feature fusion module, two convolutions with a convolution kernel size of 3 × 3 × 3, a batch normalization and a ReLU activation function; the numbers of convolution filters in successive layers are [32, 64, 128, 256, 512]; and each layer ends with a max pooling operation with a step size of 2, finally yielding the feature map of the liver.
5. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 4, wherein: in the multi-scale feature fusion module, the number of channels of the input feature map is adjusted by a three-dimensional convolution with a kernel size of 1 × 1 × 1, and the feature map is divided along the channel dimension into four feature maps, denoted x_i, i = 1, 2, 3, 4; the number of channels in each group is one quarter of the number of input channels, and the spatial size of the feature map is unchanged; except for x_1, each x_i undergoes a convolution with a kernel size of 3 × 3 × 3 followed by batch normalization and ReLU activation, giving features x'_i at four different scales; the four features are added element by element and convolved with a kernel size of 1 × 1 × 1 to obtain h; following a residual-style idea, h is added element by element to each x'_i to obtain four features x''_i (i = 1, 2, 3, 4); the four multi-scale features x''_i are combined by a concat operation, so that the number of channels of the resulting feature map matches that of the input; and the final output feature map is obtained by a convolution layer with a kernel size of 1 × 1 × 1.
6. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 5, wherein the formulas of the multi-scale feature fusion module are as follows:

x'_i = x_i, i = 1;  x'_i = ReLU(BN(Conv3(x_i))), i = 2, 3, 4

h = Conv1(x'_1 + x'_2 + x'_3 + x'_4)

x''_i = x'_i + h, i = 1, 2, 3, 4

O = Conv1(Concat(x''_1, x''_2, x''_3, x''_4))

where Conv3 denotes a convolution operation with a kernel size of 3 × 3 × 3, BN denotes batch normalization, x''_i (i = 1, 2, 3, 4) denotes the feature information at the four scales, Conv1 denotes a convolution operation with a kernel size of 1 × 1 × 1, and O is the output feature map of the multi-scale feature fusion module.
7. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 6, wherein in step S4 the decoder network comprises four upsampling layers, each comprising a deconvolution with a step size of 2 followed by two convolutions with a convolution kernel size of 3 × 3 × 3, a batch normalization, a ReLU activation function and a multi-scale feature fusion module.
8. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 7, wherein step S4 specifically comprises:
S41, taking the features finally obtained by the multi-scale feature fusion module at each layer of the decoder network and the features finally obtained by convolution at the corresponding encoder layer as the high-level and low-level features respectively, and obtaining an attention map through the attention-guided connection module;
S42, performing a series of convolution operations on the attention maps obtained by each layer through the attention-guided connection module to extract features; except for the bottom layer, the result of every other layer in the decoder network is restored to the image size by upsampling at the appropriate scale to produce one output, giving four outputs in total, of which the first three can serve as auxiliary losses in the deep supervision mechanism and the last serves as the final output mask map;
and S43, in the back-propagation process of the iterative training of the network model, calculating a loss value from the difference between the segmentation result predicted by the network model and the label value, and continuously updating the iteration parameter values on the basis of the loss value so that the predicted segmentation result approaches the label value, thereby obtaining the segmented image of the liver.
9. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 8, wherein:
in the attention-guided connection module, a high-level feature and a low-level feature are input; the high-level feature is first upsampled by a transposed convolution operation, the two features are fed into the grid attention module to obtain an attention feature map, the generated attention feature map is then multiplied element by element with the low-level feature, and the result is concatenated (concat operation) with the feature map obtained by the transposed convolution of the high-level feature to output a feature map;
in the grid attention mechanism, a high-level feature and a low-level feature are likewise input; first, their numbers of channels are adjusted by three-dimensional convolutions with a kernel size of 1 × 1 × 1, and the low-level feature is downsampled once by a three-dimensional convolution operation with a step size of 2 so that the two feature maps can be added; the fused features then undergo a ReLU nonlinear transformation, and an attention coefficient is generated by a Sigmoid activation function; the attention coefficient is upsampled by a transposed convolution to match the dimensions of the low-level feature input from the upper layer, and is finally multiplied element by element with the low-level feature to obtain the final attention map.
10. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 9, wherein:
the formula of the grid attention mechanism is as follows:

c_i = σ_2(Ψ(σ_1(W_f f_i + W_g g_i)))

where Ψ, W_f and W_g are convolution operations, σ_1 is the ReLU activation function, σ_2 is the Sigmoid activation function, and c_i is the grid attention coefficient;

the formula of the attention-guided connection module is as follows:

O = Concat(G(f, g) ⊗ f, Up(g))

where G is the grid attention mechanism, Up is the transposed-convolution upsampling, f and g are the low-level and high-level input features, and ⊗ is an element-by-element multiplication.
CN202210666323.9A 2022-06-13 2022-06-13 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism Pending CN115170582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210666323.9A CN115170582A (en) 2022-06-13 2022-06-13 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210666323.9A CN115170582A (en) 2022-06-13 2022-06-13 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism

Publications (1)

Publication Number Publication Date
CN115170582A true CN115170582A (en) 2022-10-11

Family

ID=83485205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210666323.9A Pending CN115170582A (en) 2022-06-13 2022-06-13 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism

Country Status (1)

Country Link
CN (1) CN115170582A (en)


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115345889B (en) * 2022-10-13 2023-01-03 西南科技大学 Liver and tumor image segmentation method thereof
CN115345889A (en) * 2022-10-13 2022-11-15 西南科技大学 Liver and tumor image segmentation method thereof
CN116258672A (en) * 2022-12-26 2023-06-13 浙江大学 Medical image segmentation method, system, storage medium and electronic equipment
CN116258672B (en) * 2022-12-26 2023-11-17 浙江大学 Medical image segmentation method, system, storage medium and electronic equipment
CN116229065A (en) * 2023-02-14 2023-06-06 湖南大学 Multi-branch fusion-based robotic surgical instrument segmentation method
CN116229065B (en) * 2023-02-14 2023-12-01 湖南大学 Multi-branch fusion-based robotic surgical instrument segmentation method
CN116843696B (en) * 2023-04-27 2024-04-09 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention
CN116843696A (en) * 2023-04-27 2023-10-03 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention
CN116563265A (en) * 2023-05-23 2023-08-08 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN116563265B (en) * 2023-05-23 2024-03-01 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN117635942A (en) * 2023-12-05 2024-03-01 齐鲁工业大学(山东省科学院) Cardiac MRI image segmentation method based on edge feature enhancement
CN117635942B (en) * 2023-12-05 2024-05-07 齐鲁工业大学(山东省科学院) Cardiac MRI image segmentation method based on edge feature enhancement
CN117422880A (en) * 2023-12-18 2024-01-19 齐鲁工业大学(山东省科学院) Segmentation method and system combining improved attention mechanism and CV model
CN117422880B (en) * 2023-12-18 2024-03-22 齐鲁工业大学(山东省科学院) Segmentation method and system combining improved attention mechanism and CV model
CN117745745A (en) * 2024-02-18 2024-03-22 湖南大学 CT image segmentation method based on context fusion perception
CN117745745B (en) * 2024-02-18 2024-05-10 湖南大学 CT image segmentation method based on context fusion perception

Similar Documents

Publication Publication Date Title
CN115170582A (en) Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism
CN110889853B (en) Tumor segmentation method based on residual error-attention deep neural network
CN109191476B (en) Novel biomedical image automatic segmentation method based on U-net network structure
CN110889852B (en) Liver segmentation method based on residual error-attention deep neural network
Kumar et al. Breast cancer classification of image using convolutional neural network
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
CN107016681B (en) Brain MRI tumor segmentation method based on full convolution network
CN111798462B (en) Automatic delineation method of nasopharyngeal carcinoma radiotherapy target area based on CT image
CN112927255B (en) Three-dimensional liver image semantic segmentation method based on context attention strategy
CN114581662B (en) Brain tumor image segmentation method, system, device and storage medium
CN113808146B (en) Multi-organ segmentation method and system for medical image
CN110706214B (en) Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error
CN111583285B (en) Liver image semantic segmentation method based on edge attention strategy
Lameski et al. Skin lesion segmentation with deep learning
CN111640120A (en) Pancreas CT automatic segmentation method based on significance dense connection expansion convolution network
CN114972362A (en) Medical image automatic segmentation method and system based on RMAU-Net network
Osadebey et al. Three-stage segmentation of lung region from CT images using deep neural networks
Shan et al. SCA-Net: A spatial and channel attention network for medical image segmentation
CN112750137A (en) Liver tumor segmentation method and system based on deep learning
Tan et al. Automatic prostate segmentation based on fusion between deep network and variational methods
Dong et al. Supervised learning-based retinal vascular segmentation by m-unet full convolutional neural network
CN112990359B (en) Image data processing method, device, computer and storage medium
Khattar et al. Computer assisted diagnosis of skin cancer: a survey and future recommendations
CN113436127A (en) Method and device for constructing automatic liver segmentation model based on deep learning, computer equipment and storage medium
CN112489062B (en) Medical image segmentation method and system based on boundary and neighborhood guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination