CN115170582A - Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism - Google Patents

Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism

Info

Publication number
CN115170582A
CN115170582A
Authority
CN
China
Prior art keywords
liver
attention
convolution
grid
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210666323.9A
Other languages
Chinese (zh)
Inventor
张晓龙
郑帅
邓鹤
任宏伟
邵赛
边小勇
李波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Science and Engineering WUSE
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE filed Critical Wuhan University of Science and Engineering WUSE
Priority to CN202210666323.9A priority Critical patent/CN115170582A/en
Publication of CN115170582A publication Critical patent/CN115170582A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30056 Liver; Hepatic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism, which comprises the following steps: selecting a liver image data set needing liver segmentation and dividing it into a training set and a test set; preprocessing the liver images in the training set; at the encoder stage, obtaining a feature map of the liver by using a multi-scale feature fusion module and a convolution network; at the decoder stage, obtaining a segmented image of the liver by using a multi-scale feature fusion module, a grid attention mechanism, an attention-guided connection module, transposed convolution and a deep supervision mechanism; and performing morphological post-processing on the segmented liver image. The method improves the segmentation of three-dimensional liver images, realizes more accurate segmentation of the three-dimensional liver image, and provides a strong aid to doctors in medical diagnosis.

Description

Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism
Technical Field
The invention relates to a three-dimensional medical image segmentation method, in particular to a three-dimensional liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism.
Background
In recent years, computed tomography (CT) and magnetic resonance imaging (MRI) have become the main imaging methods for doctors to diagnose and evaluate liver cancer. In medical imaging, accurate segmentation of the liver is important for the qualitative analysis and treatment planning of liver cancer. In clinical diagnosis, liver segmentation is usually performed by an experienced expert who manually delineates the edges of the liver according to the anatomical structure; this is tedious, time-consuming and labor-intensive, and requires a high level of expertise. Because the segmentation result is influenced by factors such as the subjective experience and cognitive ability of experts, liver segmentation is a challenging task.
In general, segmentation methods for medical images can be divided into three categories: manual segmentation, semi-automatic segmentation and automatic segmentation. Manual segmentation is highly subjective, poorly repeatable and time-consuming; it relies heavily on human-recognizable features and requires highly skilled personnel, which makes it impractical for routine application. Semi-automatic segmentation combines manual and computer operations: the manual operation provides some useful information, and the computer then performs the segmentation based on this information; here, however, manual intervention may bias the segmentation. Automatic segmentation depends entirely on a computer. In recent years, research in the field of medical image segmentation has focused mainly on automatic segmentation, i.e. segmentation performed by a computer-executable algorithm. At present, liver segmentation algorithms mainly comprise traditional methods and deep learning methods. The traditional methods mainly include threshold-based methods, region growing, active contour models and edge detection-based methods. These methods rely mainly on information such as gray scale, texture and edges, but automated segmentation becomes very difficult due to variations in liver structure, the similarity of the liver to its neighboring organs, the complexity of three-dimensional spatial features and the influence of noise.
In recent years, deep neural network (DNN) methods have developed rapidly in the fields of computer vision and image processing. Deep learning methods, particularly convolutional neural networks (CNNs), have achieved tremendous success in medical image segmentation: by learning from a large number of labeled samples, their outstanding feature learning capacity makes the automatic segmentation of images accurate. Later, fully convolutional networks (FCN) made it possible to classify images at the pixel level, thereby solving the semantic-level image segmentation problem; the most classical network models of this kind are Unet and Vnet. Both are U-like network structures that use skip-connections to connect low-level features and high-level features. The difference between the two is that Unet processes two-dimensional data, while Vnet processes three-dimensional data and adds residual blocks. These methods have made some progress in the field of liver segmentation, but the spatial information among the slices of a three-dimensional liver image is not fully utilized, and the high-level and low-level features are simply connected rather than sufficiently fused.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism, which is realized by the following technical scheme:
a liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism comprises the following steps:
S1, selecting a medical image data set needing liver segmentation, and dividing the medical image data set into a training set and a testing set;
S2, preprocessing the three-dimensional liver images in the selected training set, initializing network model parameters, and inputting the preprocessed images into a network model, wherein the network model comprises an encoder network and a decoder network;
S3, obtaining a feature map of the liver by using a multi-scale feature fusion module and a convolution network at the encoder stage;
and S4, at the decoder stage, obtaining a segmented image of the liver by using a multi-scale feature fusion module, a grid attention mechanism, an attention-guided connection module, transposed convolution and a deep supervision mechanism.
Further, in step S2, the preprocessing of the three-dimensional liver image in the selected training set specifically includes:
S21, selecting a suitable window for the CT images in the training set, and setting the CT window values to a preset interval;
S22, performing down-sampling and re-sampling on the training set, and adjusting the inter-slice spacing of the image data to 1 mm;
S23, finding the starting and ending slices of the liver region, and expanding 20 slices outward in each direction;
S24, carrying out histogram equalization on the acquired slice images;
and S25, randomly selecting 32 continuous slices as the input of the network model, wherein the size of the input image is 1 × 32 × 256 × 256.
Further, in step S2, initializing network model parameters specifically includes:
S26, initializing network model parameters, including the batch processing size, learning rate, number of iterations, learning rate attenuation strategy and deep supervision attenuation coefficient;
and S27, initializing the network model weight by using a kaiming weight initialization method.
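By way of illustration only (this sketch is not part of the patent text), steps S26 and S27 might look as follows in PyTorch; the hyperparameter values shown are placeholders rather than the patent's settings, and kaiming_normal_ is one common realization of the kaiming initialization method.

```python
import torch.nn as nn

# S26) network model parameters (values are illustrative placeholders)
config = {
    "batch_size": 2,         # batch processing size
    "learning_rate": 1e-3,   # initial learning rate
    "epochs": 200,           # number of training iterations
    "ds_coefficient": 0.33,  # deep supervision attenuation coefficient
}

# S27) kaiming initialization of the network model weights
def init_weights(module: nn.Module) -> None:
    if isinstance(module, (nn.Conv3d, nn.ConvTranspose3d)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# usage: model.apply(init_weights)
```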
Further, step S3 specifically includes:
S31, adding a multi-scale feature fusion module in each layer of the encoder network;
S32, performing feature extraction and max-pooling downsampling to obtain a feature map of the liver;
wherein the encoder network comprises four downsampling layers, each composed of a multi-scale feature fusion module, two convolutions with a convolution kernel size of 3 × 3 × 3, a batch normalization and a ReLU activation function; the numbers of convolution filters in successive layers are [32, 64, 128, 256, 512]. Each layer ends with a max pooling operation with a step size of 2, finally yielding the feature map of the liver.
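For illustration, one such downsampling layer might be sketched in PyTorch as follows; the class name, the exact ordering of the modules within a layer, and the use of nn.Identity as a stand-in for the multi-scale feature fusion module (sketched after the module formulas below) are assumptions.

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: MSFF -> (Conv3d + BN + ReLU) x 2 -> max pooling."""
    def __init__(self, in_ch, out_ch, msff=None):
        super().__init__()
        # stand-in for the multi-scale feature fusion module sketched below
        self.msff = msff if msff is not None else nn.Identity()
        self.convs = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)  # step size 2

    def forward(self, x):
        skip = self.convs(self.msff(x))  # features kept for the skip connection
        return self.pool(skip), skip
```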
Further, in the multi-scale feature fusion module, the number of channels of the input feature map is adjusted by a three-dimensional convolution with a kernel size of 1 × 1 × 1, and the feature map is divided along the channel dimension into four feature maps, denoted x_i, i = 1, 2, 3, 4; the number of channels in each group is one quarter of the number of input channels, and the spatial size of the feature map is unchanged. Except for x_1, each x_i undergoes a convolution with a kernel size of 3 × 3 × 3 followed by batch normalization and ReLU activation, giving features x'_i at four different scales. The four features are added element by element and convolved with a kernel size of 1 × 1 × 1 to obtain h; following a residual-style idea, h is added element by element to each x'_i to obtain four features x''_i (i = 1, 2, 3, 4). The four multi-scale features x''_i are combined by a concat operation, so that the number of channels of the resulting feature map matches that of the input; the final output feature map is obtained by a convolution layer with a kernel size of 1 × 1 × 1.
Further, the formulas of the multi-scale feature fusion module are as follows:

x'_i = x_i, i = 1;  x'_i = ReLU(BN(Conv3(x_i))), i = 2, 3, 4

h = Conv1(x'_1 + x'_2 + x'_3 + x'_4)

x''_i = x'_i + h, i = 1, 2, 3, 4

O = Conv1(Concat(x''_1, x''_2, x''_3, x''_4))

where Conv3 denotes a convolution operation with a kernel size of 3 × 3 × 3, BN denotes batch normalization, x''_i (i = 1, 2, 3, 4) denotes the feature information at the four scales, Conv1 denotes a convolution operation with a kernel size of 1 × 1 × 1, and O is the output feature map of the multi-scale feature fusion module.
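The formulas above could be realized, for example, by the following PyTorch sketch; class and variable names are illustrative and not taken from the patent.

```python
import torch
import torch.nn as nn

class MSFF(nn.Module):
    """Multi-scale feature fusion module following the four formulas above."""
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 4 == 0
        branch = channels // 4
        self.conv_in = nn.Conv3d(channels, channels, kernel_size=1)  # adjust channels
        # Conv3 + BN + ReLU branches for x_2, x_3, x_4 (x_1 is passed through)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv3d(branch, branch, kernel_size=3, padding=1),
                nn.BatchNorm3d(branch), nn.ReLU(inplace=True),
            ) for _ in range(3))
        self.conv_h = nn.Conv3d(branch, branch, kernel_size=1)        # produces h
        self.conv_out = nn.Conv3d(channels, channels, kernel_size=1)  # final Conv1

    def forward(self, x):
        xs = torch.chunk(self.conv_in(x), 4, dim=1)                      # x_1 .. x_4
        xp = [xs[0]] + [b(xi) for b, xi in zip(self.branches, xs[1:])]   # x'_i
        h = self.conv_h(sum(xp))                                         # h = Conv1(sum of x'_i)
        xpp = [xi + h for xi in xp]                                      # x''_i = x'_i + h
        return self.conv_out(torch.cat(xpp, dim=1))                      # O
```

With channels = 64, for example, each branch operates on 16 channels, matching the one-quarter split described above.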
Further, in step S4, the decoder network includes four upsampling layers, each layer including a deconvolution with a step size of 2 followed by two convolutions with convolution kernel sizes of 3 × 3 × 3, a batch normalization, a ReLU activation function, and a multi-scale feature fusion module.
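A matching sketch of one upsampling layer (PyTorch assumed; in the full model the skip feature would come from the attention-guided connection module described below, and the MSFF stand-in can be replaced by the module sketched above):

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One decoder layer: transposed conv (stride 2) -> concat with skip ->
    (Conv3d + BN + ReLU) x 2 -> multi-scale feature fusion."""
    def __init__(self, in_ch, out_ch, msff=None):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
        self.convs = nn.Sequential(
            nn.Conv3d(out_ch * 2, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )
        self.msff = msff if msff is not None else nn.Identity()

    def forward(self, x, skip):
        x = torch.cat([self.up(x), skip], dim=1)
        return self.msff(self.convs(x))
```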
Further, step S4 specifically includes:
S41, taking the features finally obtained through the multi-scale feature fusion module at each layer of the decoder network and the features finally obtained through convolution at the corresponding encoder layer as the high-level and low-level features respectively, and obtaining an attention map through the attention-guided connection module;
S42, performing a series of convolution operations on the attention map obtained by each layer through the attention-guided connection module to extract features; except for the bottom layer, the result of every other layer in the decoder network is restored to the image size by upsampling at the appropriate scale to produce one output, giving four outputs in total, of which the first three can serve as auxiliary losses in the deep supervision mechanism and the last serves as the final output mask map;
and S43, in the back-propagation process of the iterative training of the network model, calculating a loss value from the difference between the segmentation result predicted by the network model and the label value, and continuously updating the iteration parameter values on the basis of the loss value so that the predicted segmentation result approaches the label value, thereby obtaining the segmented image of the liver.
Furthermore, in the attention-guided connection module, a high-level feature and a low-level feature are input. The high-level feature is first upsampled by a transposed convolution operation, the two features are fed into the grid attention module to obtain an attention feature map, the generated attention feature map is then multiplied element by element with the low-level feature, and the result is concatenated (concat operation) with the feature map obtained by the transposed convolution of the high-level feature to output a feature map;
in the grid attention mechanism, a high-level feature and a low-level feature are likewise input. First, their numbers of channels are adjusted by three-dimensional convolutions with a kernel size of 1 × 1 × 1, and the low-level feature is downsampled once by a three-dimensional convolution operation with a step size of 2 so that the two feature maps can be added. The fused features then undergo a ReLU nonlinear transformation, and an attention coefficient is generated by a Sigmoid activation function; the attention coefficient is upsampled by a transposed convolution to match the dimensions of the low-level feature input from the upper layer, and is finally multiplied element by element with the low-level feature to obtain the final attention map.
Further, the formula of the grid attention mechanism is as follows:

c_i = σ_2(Ψ(σ_1(W_f f_i + W_g g_i)))

where Ψ, W_f and W_g are convolution operations, σ_1 is the ReLU activation function, σ_2 is the Sigmoid activation function, and c_i is the grid attention coefficient;

the formula of the attention-guided connection module is as follows:

O = Concat(G(f, g) ⊗ f, Up(g))

where G is the grid attention mechanism, Up is the transposed-convolution upsampling, f and g are the low-level and high-level input features, and ⊗ is an element-by-element multiplication.
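For illustration, the two formulas might be sketched in PyTorch as follows; the channel widths and the placement of the stride-2 downsampling on the low-level branch are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class GridAttention(nn.Module):
    """Grid attention: c = Sigmoid(Psi(ReLU(W_f(f) + W_g(g)))), upsampled."""
    def __init__(self, low_ch, high_ch, inter_ch):
        super().__init__()
        self.w_f = nn.Sequential(  # 1x1x1 channel adjustment + stride-2 downsampling
            nn.Conv3d(low_ch, inter_ch, kernel_size=1),
            nn.Conv3d(inter_ch, inter_ch, kernel_size=2, stride=2))
        self.w_g = nn.Conv3d(high_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv3d(inter_ch, 1, kernel_size=1)
        self.up = nn.ConvTranspose3d(1, 1, kernel_size=2, stride=2)

    def forward(self, f, g):
        q = torch.relu(self.w_f(f) + self.w_g(g))  # additive fusion + ReLU
        c = torch.sigmoid(self.psi(q))             # grid attention coefficient
        return self.up(c)                          # matched to f's resolution

class AttentionGuidedConnection(nn.Module):
    """O = Concat(G(f, g) (x) f, Up(g))."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.up = nn.ConvTranspose3d(high_ch, low_ch, kernel_size=2, stride=2)
        self.attn = GridAttention(low_ch, high_ch, max(low_ch // 2, 1))

    def forward(self, f, g):
        return torch.cat([self.attn(f, g) * f, self.up(g)], dim=1)
```

In this sketch the element-by-element multiplication with the low-level feature f is performed once, inside the connection module, so the upsampled coefficients are applied exactly as in the formula for O.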
The invention has the following beneficial technical effects:
the method firstly trains the three-dimensional liver image by utilizing the multi-scale feature fusion module and the convolution block, enlarges the receptive field of the neural network and the representation capability of the enhanced feature, and obtains the liver feature map with more representative feature information by fully utilizing the information among slices and in the slices in the three-dimensional liver image; then, the attention guide connection module and the grid attention mechanism in the jumping structure are utilized to highlight the characteristics of the segmentation region and suppress other noise parts; and a deep supervision mechanism is utilized to reduce training and verification errors, and problems of gradient disappearance, gradient explosion, over-slow convergence speed and the like are reduced, so that a segmented image of the liver is finally obtained. The multi-scale semantic information and the important context information are obtained by using the multi-scale feature fusion module, the attention guide connection module and the grid attention machine mechanism for multiple times at the joint of the encoder and the decoder. And finally, optimizing the model segmentation result by using a morphological post-processing method.
Therefore, the method extracts the semantic information of the three-dimensional liver image and performs segmentation by utilizing the multi-scale feature fusion module, the attention guide connection module, the grid attention mechanism and the three-dimensional convolution neural network training and fusion.
Drawings
FIG. 1 is a schematic flow chart of the method according to an embodiment of the present invention.
FIG. 2 is a comparison of the liver before and after preprocessing in the present invention.
FIG. 3 is a diagram of the overall network architecture in an embodiment of the present invention.
FIG. 4 is a structural design diagram of the multi-scale feature fusion module in an embodiment of the present invention.
FIG. 5 is a structural design diagram of the attention-guided connection module in an embodiment of the present invention.
FIG. 6 is a structural design diagram of the grid attention mechanism in an embodiment of the present invention.
FIG. 7 is a visualization of the segmentation results of the method of the present invention on the 3Dircadb test set.
Detailed Description
In order to facilitate a better understanding of the invention for those skilled in the art, the invention will be described in further detail with reference to the accompanying drawings and specific examples, which are given by way of illustration only and do not limit the scope of the invention.
Interpretation of terms:
1. kaiming: a weight initialization method for neural networks.
2. ReLU: an activation function of a neural network that maps the output of the fitted curve to the interval [0, +∞).
3. skip-connection: the skip connection part in the middle of a U-shaped network.
4. concat: the connection of two tensors along the channel dimension.
5. Sigmoid: an activation function of a neural network that maps the output of the fitted curve to the interval (0, 1).
This embodiment discloses a three-dimensional liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism (MAGNet method for short), taking as examples LiTS (a public medical liver data set published at https://competitions.codalab.org/competitions/17094), 3Dircadb (a public medical liver data set published at https://www.ircad.fr/research/3dircadb) and Sliver07 (a public medical liver data set published at https://sliver07.grand-challenge.org/Download). LiTS consists of 131 labeled three-dimensional liver sequences, of which sequences 28-47 constitute the 3Dircadb data set, and the Sliver07 data set consists of 20 sequences; the slices are all 512 × 512 pixels. The LiTS data set with the 3Dircadb sequences removed is used as the training set, and the 3Dircadb and Sliver07 data sets are used as the test sets.
As shown in fig. 1, the three-dimensional liver image segmentation method based on multi-scale feature fusion and the grid attention mechanism described in this embodiment specifically includes the following steps:
Step 1) data set division;
selecting a medical image data set which needs to be subjected to liver segmentation, and dividing the medical image data set into a training set and a test set;
Step 2) data preprocessing; the comparison of the liver before and after preprocessing is shown in fig. 2 (a) (before processing) and fig. 2 (b) (after processing);
2.1 Select a suitable window for the window of CT images in the training set, and set the CT value of the window between-200,200;
2.2 Down-sampling and re-sampling the training set, and adjusting the interlayer spacing of the image data to 1mm;
2.3 Find the beginning and ending slices of the liver region and expand 20 slices outward in both directions;
2.4 Histogram equalization of images in the training set;
2.5 32 consecutive slices are randomly selected as input to the network model, where the input size of the network is 1 × 32 × 256 × 256.
Step 3) obtaining a liver feature map by using a multi-scale feature fusion module and a convolution network at an encoder stage, wherein the whole network structure is shown in fig. 3;
3.1 Initializing network parameters including batch processing size, learning rate, iteration number, learning rate attenuation strategy and deep supervision attenuation coefficient;
3.2 Initializing network weights using a kaiming weight initialization method;
3.3 The preprocessed three-dimensional image is input into a convolutional neural network of the network model.
3.4 A multi-scale feature fusion module is added in each layer of the encoder stage and the decoder stage, and feature extraction is performed by maximum pooling downsampling in the encoder stage. Four downsampling layers are included in the encoder path, and each downsampling layer is composed of a multi-scale feature fusion module, two convolutions with convolution kernel size of 3 x 3, a batch normalization and a ReLU activation function. Each layer is finally followed by a maximal pooling operation with a step size of 2, and finally a liver feature map is obtained, and the overall network structure is shown in fig. 3.
The numbers of convolution filters in successive layers are [32, 64, 128, 256, 512]. In the multi-scale feature fusion module, the number of channels of the input feature map is adjusted by a three-dimensional convolution with a kernel size of 1 × 1 × 1, and the feature map is divided along the channel dimension into four feature maps, denoted x_i, i = 1, 2, 3, 4. The number of channels in each group is one quarter of the number of input channels, and the spatial size of the feature map is unchanged. Except for x_1, each x_i undergoes a convolution with a kernel size of 3 × 3 × 3 followed by batch normalization and ReLU activation, giving features x'_i at four different scales. The four features are added element by element and convolved with a kernel size of 1 × 1 × 1 to obtain h; following a residual-style idea, h is added element by element to each x'_i to obtain four features x''_i (i = 1, 2, 3, 4). The four multi-scale features x''_i are combined by a concat operation, so that the number of channels of the resulting feature map matches that of the input. The final output feature map is obtained by a convolution layer with a kernel size of 1 × 1 × 1, as shown in fig. 4.
In step 3.4), the formulas of the multi-scale feature fusion module are as follows:

x'_i = x_i, i = 1;  x'_i = ReLU(BN(Conv3(x_i))), i = 2, 3, 4 (1)

h = Conv1(x'_1 + x'_2 + x'_3 + x'_4) (2)

x''_i = x'_i + h, i = 1, 2, 3, 4 (3)

O = Conv1(Concat(x''_1, x''_2, x''_3, x''_4)) (4)

where Conv3 denotes a convolution operation with a kernel size of 3 × 3 × 3, BN denotes batch normalization, x''_i (i = 1, 2, 3, 4) denotes the feature information at the four scales, Conv1 denotes a convolution operation with a kernel size of 1 × 1 × 1, and O is the output feature map of the multi-scale feature fusion module.
Step 4) obtaining a segmented image of the liver by using the multi-scale feature fusion module, the grid attention mechanism, the attention-guided connection module, transposed convolution and a deep supervision mechanism;
4.1 At the decoder stage, there are four upsampled layers, each layer containing a deconvolution with step size 2 followed by two convolution blocks with convolution kernel size 3 x 3, a batch normalization, a ReLU activation function, and a multi-scale feature fusion module.
4.2 Using the features finally obtained by the multi-scale feature fusion module at each layer at the decoder stage and the features finally obtained by convolution of the corresponding layer on the encoder, as the high-layer information and the low-layer information, respectively, to obtain an attention map through the attention-directed connection module, as shown in fig. 5.
4.3 In the attention guiding connection module, a high-level feature and a low-level feature are respectively input, firstly, the high-level feature is up-sampled by using a transposition convolution operation, the two features are respectively input into the grid attention module to obtain an attention feature map, then, the generated attention feature map and the low-level feature are multiplied element by element, the obtained result and the feature map obtained by the transposition convolution of the high-level feature are subjected to concat operation, and a feature map is output. In the grid attention module, the high-level features and the low-level features are input separately. First, the high-level feature and the low-level feature adjust the number of channels by three-dimensional convolution with a convolution kernel size of 1 × 1 × 1, perform down-sampling once by three-dimensional convolution operation with a step size of 2, and then perform simple addition operation thereon. ReLU activation nonlinear transformation is performed on the features obtained through addition and fusion, an attention coefficient is generated through a Sigmoid activation function, the attention coefficient is up-sampled by using transposition convolution to match the dimensionality of the upper-layer input low-layer features, and finally the feature is multiplied by the low-layer features element by element to obtain a final feature map, as shown in FIG. 6.
In step 4.3), the grid attention formula is as follows:

c_i = σ_2(Ψ(σ_1(W_f f_i + W_g g_i))) (5)

where Ψ, W_f and W_g are convolution operations, σ_1 is the ReLU activation function, σ_2 is the Sigmoid activation function, and c_i is the grid attention coefficient;

the formula of the attention-guided connection module is as follows:

O = Concat(G(f, g) ⊗ f, Up(g)) (6)

where G is the grid attention mechanism, Up is the transposed-convolution upsampling, f and g are the low-level and high-level input features, and ⊗ is an element-by-element multiplication.
4.4 Carrying out a series of convolution operations on the feature map obtained by each layer through the attention-guiding connection module to extract features; except for the bottom layer, the results of each layer except the bottom layer in the decoder are subjected to upsampling recovery image size with different scales to obtain one output, and finally four outputs are obtained, wherein the first three outputs are used as auxiliary losses in a depth supervision mechanism, and the final output is used as a final segmentation graph;
4.5 In a back propagation process in iterative training of the neural network, a loss value is calculated by calculating a difference between a segmentation result predicted by the network and a label value, and then an iteration parameter value is continuously updated on the basis of the loss value, so that the segmentation result predicted by the network approaches the label value.
The losses calculated in steps 4.4) and 4.5) adopt the Tversky loss; the specific formula is as follows:

TL(A, B) = 1 - |A ∩ B| / (|A ∩ B| + α|A - B| + β|B - A|) (7)

where A and B represent the predicted value and the label value respectively, and α and β are hyper-parameters. The balance between false positives and false negatives can be controlled by adjusting α and β;
the deep supervised network joint loss function is as follows:
loss=(loss1+loss2+loss3)*ε+loss4 (8)
and loss1-loss4 are loss functions of up-sampling results of output results of each layer of the decoder, and epsilon is a deep supervision coefficient.
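As an illustration, the Tversky loss (7) and the joint loss (8) might be implemented as follows; PyTorch is assumed, and the α, β and ε values are placeholders.

```python
import torch

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Formula (7): pred holds predicted probabilities A, target labels B."""
    tp = (pred * target).sum()        # |A ∩ B|
    fp = (pred * (1 - target)).sum()  # false positives, weighted by alpha
    fn = ((1 - pred) * target).sum()  # false negatives, weighted by beta
    return 1.0 - tp / (tp + alpha * fp + beta * fn + eps)

def joint_loss(outputs, target, epsilon=0.33):
    """Formula (8): outputs = [out1, out2, out3, out4] from the decoder heads."""
    aux = sum(tversky_loss(o, target) for o in outputs[:3])  # loss1 + loss2 + loss3
    return aux * epsilon + tversky_loss(outputs[3], target)  # ... * eps + loss4
```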
Step 5) performing morphological post-processing on the liver image obtained after model segmentation.
5.1 Maximum connected component extraction is carried out on the liver segmentation region;
5.2 The divided fine regions are removed, and internal void filling is performed.
The visualization of the segmentation results of the three-dimensional liver image segmentation method based on multi-scale feature fusion and the grid attention mechanism (MAGNet) on the 3Dircadb test set is shown in fig. 7; it can be seen that the segmentation results obtained by the MAGNet method are very close to the labels. In addition, Table 1 below compares the segmentation results (mean ± std) of the MAGNet method with those of other advanced methods on the 3Dircadb data set, and Table 2 below compares the segmentation results (mean ± std) of the MAGNet method with those of other advanced methods on the Sliver07 data set.
TABLE 1
[Table 1, provided as an image in the original: comparison (mean ± std) of the segmentation results of the MAGNet method and other advanced methods on the 3Dircadb data set.]
As shown in Table 1, we compare the results predicted by the MAGNet method of this embodiment on the 3Dircadb data set with other advanced models. Compared with UNet and ResNet, the performance of MAGNet on the DSC index is obviously improved, and compared with other advanced models it is also improved to some extent. For example, compared with U³-Net+DC, the model that performs best on the DSC index among the compared methods, MAGNet improves by 0.1 percentage points. On the ASD and RMSD indexes, MAGNet is greatly reduced compared with the other advanced models; there is also some reduction in the RVD index, while on the VOE index it is slightly above the U³-Net+DC model.
TABLE 2
[Table 2, provided as an image in the original: comparison (mean ± std) of the segmentation results of the MAGNet method and other advanced methods on the Sliver07 data set.]
As can be seen from Table 2, the prediction results of the proposed method reach 97.3% on the DSC index, and compared with other advanced methods the proposed method shows some improvement on the DSC, VOE and RVD indexes, demonstrating its superiority and generalization capability.
The foregoing merely illustrates the principles and preferred embodiments of the invention. Many variations and modifications may be made by those skilled in the art in light of the foregoing description, and all of them fall within the scope of the invention.

Claims (10)

1. A liver image segmentation method based on multi-scale feature fusion and a grid attention mechanism is characterized by comprising the following steps:
S1, selecting a medical image data set needing liver segmentation, and dividing the medical image data set into a training set and a testing set;
S2, preprocessing the three-dimensional liver images in the selected training set, initializing network model parameters, and inputting the preprocessed images into a network model, wherein the network model comprises an encoder network and a decoder network;
S3, obtaining a feature map of the liver by using a multi-scale feature fusion module and a convolution network at the encoder stage;
and S4, at the decoder stage, obtaining a segmented image of the liver by using a multi-scale feature fusion module, a grid attention mechanism, an attention-guided connection module, transposed convolution and a deep supervision mechanism.
2. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 1, wherein the step S2 of preprocessing the three-dimensional liver image in the selected training set specifically comprises:
S21, selecting a suitable window for the CT images in the training set, and setting the CT window values to a preset interval;
S22, performing down-sampling and re-sampling on the training set, and adjusting the inter-slice spacing of the image data to 1 mm;
S23, finding the starting and ending slices of the liver region, and expanding 20 slices outward in each direction;
S24, carrying out histogram equalization on the acquired slice images;
and S25, randomly selecting 32 continuous slices as the input of the network model, wherein the size of the input image is 1 × 32 × 256 × 256.
3. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism according to claim 2, wherein in step S2, initializing network model parameters specifically comprises:
S26, initializing network model parameters, including the batch processing size, learning rate, number of iterations, learning rate attenuation strategy and deep supervision attenuation coefficient;
and S27, initializing the network model weight by using a kaiming weight initialization method.
4. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 3, wherein step S3 specifically comprises:
S31, adding a multi-scale feature fusion module in each layer of the encoder network;
S32, performing feature extraction and max-pooling downsampling to obtain a feature map of the liver;
wherein the encoder network comprises four downsampling layers, each composed of a multi-scale feature fusion module, two convolutions with a convolution kernel size of 3 × 3 × 3, a batch normalization and a ReLU activation function; the numbers of convolution filters in successive layers are [32, 64, 128, 256, 512]; and each layer ends with a max pooling operation with a step size of 2, finally yielding the feature map of the liver.
5. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 4, wherein: in the multi-scale feature fusion module, the number of channels of the input feature map is adjusted by a three-dimensional convolution with a kernel size of 1 × 1 × 1, and the feature map is divided along the channel dimension into four feature maps, denoted x_i, i = 1, 2, 3, 4; the number of channels in each group is one quarter of the number of input channels, and the spatial size of the feature map is unchanged; except for x_1, each x_i undergoes a convolution with a kernel size of 3 × 3 × 3 followed by batch normalization and ReLU activation, giving features x'_i at four different scales; the four features are added element by element and convolved with a kernel size of 1 × 1 × 1 to obtain h; following a residual-style idea, h is added element by element to each x'_i to obtain four features x''_i (i = 1, 2, 3, 4); the four multi-scale features x''_i are combined by a concat operation, so that the number of channels of the resulting feature map matches that of the input; and the final output feature map is obtained by a convolution layer with a kernel size of 1 × 1 × 1.
6. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 5, wherein the formulas of the multi-scale feature fusion module are as follows:

x'_i = x_i, i = 1;  x'_i = ReLU(BN(Conv3(x_i))), i = 2, 3, 4

h = Conv1(x'_1 + x'_2 + x'_3 + x'_4)

x''_i = x'_i + h, i = 1, 2, 3, 4

O = Conv1(Concat(x''_1, x''_2, x''_3, x''_4))

where Conv3 denotes a convolution operation with a kernel size of 3 × 3 × 3, BN denotes batch normalization, x''_i (i = 1, 2, 3, 4) denotes the feature information at the four scales, Conv1 denotes a convolution operation with a kernel size of 1 × 1 × 1, and O is the output feature map of the multi-scale feature fusion module.
7. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 6, wherein in step S4 the decoder network comprises four upsampling layers, each comprising a deconvolution with a step size of 2 followed by two convolutions with a convolution kernel size of 3 × 3 × 3, a batch normalization, a ReLU activation function and a multi-scale feature fusion module.
8. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 7, wherein step S4 specifically comprises:
S41, taking the features finally obtained by the multi-scale feature fusion module at each layer of the decoder network and the features finally obtained by convolution at the corresponding encoder layer as the high-level and low-level features respectively, and obtaining an attention map through the attention-guided connection module;
S42, performing a series of convolution operations on the attention maps obtained by each layer through the attention-guided connection module to extract features; except for the bottom layer, the result of every other layer in the decoder network is restored to the image size by upsampling at the appropriate scale to produce one output, giving four outputs in total, of which the first three can serve as auxiliary losses in the deep supervision mechanism and the last serves as the final output mask map;
and S43, in the back-propagation process of the iterative training of the network model, calculating a loss value from the difference between the segmentation result predicted by the network model and the label value, and continuously updating the iteration parameter values on the basis of the loss value so that the predicted segmentation result approaches the label value, thereby obtaining the segmented image of the liver.
9. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 8, wherein:
in the attention-guided connection module, a high-level feature and a low-level feature are input; the high-level feature is first upsampled by a transposed convolution operation, the two features are fed into the grid attention module to obtain an attention feature map, the generated attention feature map is then multiplied element by element with the low-level feature, and the result is concatenated (concat operation) with the feature map obtained by the transposed convolution of the high-level feature to output a feature map;
in the grid attention mechanism, a high-level feature and a low-level feature are likewise input; first, their numbers of channels are adjusted by three-dimensional convolutions with a kernel size of 1 × 1 × 1, and the low-level feature is downsampled once by a three-dimensional convolution operation with a step size of 2 so that the two feature maps can be added; the fused features then undergo a ReLU nonlinear transformation, and an attention coefficient is generated by a Sigmoid activation function; the attention coefficient is upsampled by a transposed convolution to match the dimensions of the low-level feature input from the upper layer, and is finally multiplied element by element with the low-level feature to obtain the final attention map.
10. The liver image segmentation method based on multi-scale feature fusion and grid attention mechanism as claimed in claim 9, wherein:
the formula of the grid attention mechanism is as follows:

c_i = σ_2(Ψ(σ_1(W_f f_i + W_g g_i)))

where Ψ, W_f and W_g are convolution operations, σ_1 is the ReLU activation function, σ_2 is the Sigmoid activation function, and c_i is the grid attention coefficient;

the formula of the attention-guided connection module is as follows:

O = Concat(G(f, g) ⊗ f, Up(g))

where G is the grid attention mechanism, Up is the transposed-convolution upsampling, f and g are the low-level and high-level input features, and ⊗ is an element-by-element multiplication.
CN202210666323.9A 2022-06-13 2022-06-13 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism Pending CN115170582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210666323.9A CN115170582A (en) 2022-06-13 2022-06-13 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210666323.9A CN115170582A (en) 2022-06-13 2022-06-13 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism

Publications (1)

Publication Number Publication Date
CN115170582A true CN115170582A (en) 2022-10-11

Family

ID=83485205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210666323.9A Pending CN115170582A (en) 2022-06-13 2022-06-13 Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism

Country Status (1)

Country Link
CN (1) CN115170582A (en)


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115345889B (en) * 2022-10-13 2023-01-03 西南科技大学 Liver and tumor image segmentation method thereof
CN115345889A (en) * 2022-10-13 2022-11-15 西南科技大学 Liver and tumor image segmentation method thereof
CN116258672A (en) * 2022-12-26 2023-06-13 浙江大学 Medical image segmentation method, system, storage medium and electronic equipment
CN116258672B (en) * 2022-12-26 2023-11-17 浙江大学 Medical image segmentation method, system, storage medium and electronic equipment
CN116229065A (en) * 2023-02-14 2023-06-06 湖南大学 Multi-branch fusion-based robotic surgical instrument segmentation method
CN116229065B (en) * 2023-02-14 2023-12-01 湖南大学 Multi-branch fusion-based robotic surgical instrument segmentation method
CN116843696B (en) * 2023-04-27 2024-04-09 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention
CN116843696A (en) * 2023-04-27 2023-10-03 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention
CN116563265A (en) * 2023-05-23 2023-08-08 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN116563265B (en) * 2023-05-23 2024-03-01 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN117635942A (en) * 2023-12-05 2024-03-01 齐鲁工业大学(山东省科学院) Cardiac MRI image segmentation method based on edge feature enhancement
CN117635942B (en) * 2023-12-05 2024-05-07 齐鲁工业大学(山东省科学院) Cardiac MRI image segmentation method based on edge feature enhancement
CN117422880A (en) * 2023-12-18 2024-01-19 齐鲁工业大学(山东省科学院) Segmentation method and system combining improved attention mechanism and CV model
CN117422880B (en) * 2023-12-18 2024-03-22 齐鲁工业大学(山东省科学院) Segmentation method and system combining improved attention mechanism and CV model
CN117745745A (en) * 2024-02-18 2024-03-22 湖南大学 CT image segmentation method based on context fusion perception
CN117745745B (en) * 2024-02-18 2024-05-10 湖南大学 CT image segmentation method based on context fusion perception

Similar Documents

Publication Publication Date Title
CN115170582A (en) Liver image segmentation method based on multi-scale feature fusion and grid attention mechanism
CN110889853B (en) Tumor segmentation method based on residual error-attention deep neural network
CN109191476B (en) Novel biomedical image automatic segmentation method based on U-net network structure
CN110889852B (en) Liver segmentation method based on residual error-attention deep neural network
Kumar et al. Breast cancer classification of image using convolutional neural network
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
CN107016681B (en) Brain MRI tumor segmentation method based on full convolution network
CN111798462B (en) Automatic delineation method of nasopharyngeal carcinoma radiotherapy target area based on CT image
CN112927255B (en) Three-dimensional liver image semantic segmentation method based on context attention strategy
CN114581662B (en) Brain tumor image segmentation method, system, device and storage medium
CN113808146B (en) Multi-organ segmentation method and system for medical image
CN110706214B (en) Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error
CN111583285B (en) Liver image semantic segmentation method based on edge attention strategy
Lameski et al. Skin lesion segmentation with deep learning
CN111640120A (en) Pancreas CT automatic segmentation method based on significance dense connection expansion convolution network
CN114972362A (en) Medical image automatic segmentation method and system based on RMAU-Net network
Osadebey et al. Three-stage segmentation of lung region from CT images using deep neural networks
Shan et al. SCA-Net: A spatial and channel attention network for medical image segmentation
CN112750137A (en) Liver tumor segmentation method and system based on deep learning
Tan et al. Automatic prostate segmentation based on fusion between deep network and variational methods
Dong et al. Supervised learning-based retinal vascular segmentation by m-unet full convolutional neural network
CN112990359B (en) Image data processing method, device, computer and storage medium
Khattar et al. Computer assisted diagnosis of skin cancer: a survey and future recommendations
CN113436127A (en) Method and device for constructing automatic liver segmentation model based on deep learning, computer equipment and storage medium
CN112489062B (en) Medical image segmentation method and system based on boundary and neighborhood guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination