CN115457051A - Liver CT image segmentation method based on global self-attention and multi-scale feature fusion - Google Patents
Info
- Publication number
- CN115457051A (application CN202211064580.1A)
- Authority
- CN
- China
- Prior art keywords
- attention
- features
- feature
- convolution
- scale
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 210000004185 liver Anatomy 0.000 title claims abstract description 49
- 238000000034 method Methods 0.000 title claims abstract description 42
- 230000004927 fusion Effects 0.000 title claims abstract description 38
- 238000003709 image segmentation Methods 0.000 title claims abstract description 19
- 230000011218 segmentation Effects 0.000 claims abstract description 52
- 238000012545 processing Methods 0.000 claims abstract description 13
- 238000007781 pre-processing Methods 0.000 claims abstract description 11
- 230000003187 abdominal effect Effects 0.000 claims abstract description 6
- 238000005070 sampling Methods 0.000 claims abstract description 5
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 3
- 230000006870 function Effects 0.000 claims description 17
- 238000012549 training Methods 0.000 claims description 13
- 210000000056 organ Anatomy 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 10
- 230000000694 effects Effects 0.000 claims description 9
- 238000000605 extraction Methods 0.000 claims description 8
- 238000013507 mapping Methods 0.000 claims description 7
- 238000010606 normalization Methods 0.000 claims description 7
- 230000009466 transformation Effects 0.000 claims description 6
- 239000011159 matrix material Substances 0.000 claims description 4
- 238000012360 testing method Methods 0.000 claims description 3
- 238000012795 verification Methods 0.000 claims description 3
- 238000002591 computed tomography Methods 0.000 description 36
- 208000014018 liver neoplasm Diseases 0.000 description 6
- 238000013135 deep learning Methods 0.000 description 5
- 230000007246 mechanism Effects 0.000 description 5
- 206010019695 Hepatic neoplasm Diseases 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 206010028980 Neoplasm Diseases 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000000137 annealing Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000003745 diagnosis Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 201000007270 liver cancer Diseases 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 210000001015 abdomen Anatomy 0.000 description 1
- 238000003759 clinical diagnosis Methods 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000013170 computed tomography imaging Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30056—Liver; Hepatic
Abstract
The invention relates to a liver CT image segmentation method based on global self-attention and multi-scale feature fusion, and belongs to the technical field of medical image processing. The method comprises the following steps: (1) acquiring an abdominal CT dataset and preprocessing it; (2) extracting multi-scale features with a ResNeXt convolutional neural network, introducing multi-scale spatial information; (3) passing the multi-scale features through a global self-attention module to obtain a globally fused self-attention feature; (4) extracting features from the fused feature with an improved convolution module and finally upsampling to obtain the segmentation result. The method is validated on the public LiTS dataset: the mean Dice value of the overlap between the predicted and ground-truth segmentations reaches 96.4%, 4.3 percentage points higher than the classical UNet model.
Description
Technical Field
The invention relates to a liver CT image segmentation method based on global self-attention and multi-scale feature fusion, and belongs to the technical field of medical image processing.
Background
Liver cancer is among the cancers with the fastest-growing morbidity and mortality worldwide. Computed tomography (CT) is a common clinical method for tumor diagnosis, mainly because CT imaging largely avoids the organ-overlap problem found in other imaging techniques and is therefore better suited to tumor identification. Liver segmentation is a key step in the interventional clinical diagnosis and analysis of liver cancer; accurate liver segmentation results can greatly improve the efficiency with which physicians read CT images, so that diagnosis and treatment plans can be made as early as possible.
As the number of CT images grows, the scan data of a single case typically comprises hundreds of CT slices, and analyzing them manually one by one suffers from subjective interference, inconsistent standards, complex workflow, heavy time and labor cost, and poor repeatability. Accurate automatic segmentation of the liver in abdominal CT images therefore has far higher value than manual segmentation. The current difficulty of liver segmentation mainly lies in the low internal contrast of the liver, the small intensity difference between the liver and neighboring organs, the blurred boundaries between adjacent organs, and the large variation in liver shape. Liver segmentation from CT images is thus a challenging task.
Automatic liver segmentation is mainly addressed by three families of methods. 1) Traditional image segmentation: the segmentation task is solved with shallow features such as grayscale and texture, which makes these methods sensitive to noisy pixels and unable to exploit deeper image features. 2) Machine learning: data patterns are learned from large-scale data, but most machine learning algorithms require carefully hand-crafted image features, and both the expressiveness of the features and the final segmentation result are limited by how the features are chosen. 3) Deep learning: increasingly abstract features are extracted without additional intermediate steps, and the feature selection is continuously adjusted according to the results, greatly improving accuracy. Existing deep learning methods generally segment better than traditional image processing, but they remain insufficient for segmenting the liver and liver tumors, and do not adequately account for characteristics such as the blurred boundaries and variable positions that the liver and liver tumors exhibit in CT images. During downsampling, many extracted features contribute little or nothing to the segmentation result, yet they are not attenuated and are expressed in the same way as the key segmentation features, which harms the segmentation result.
In addition, the skip-connection scheme of the traditional U-Net introduces a semantic gap that causes feature mismatch, and some multi-scale methods do not fully consider the associations among features, which degrades the performance of the segmentation model.
Disclosure of Invention
In order to solve the above problems, the invention provides a liver CT image segmentation method based on global self-attention and multi-scale feature fusion. ResNeXt, which uses grouped convolution, is selected as the image feature extraction network to obtain richer image features without increasing computation time. The problem of blurred liver boundaries is addressed by extracting and fusing features at different scales through a multi-scale architecture. Since relationships necessarily exist between the liver and other organs in a CT image, a self-attention mechanism is introduced to capture the relationships among the extracted features. Finally, the features are fused through a residual convolution block with an improved attention method, so that the features are better expressed and a better liver segmentation result is obtained.
The technical solution of the invention is as follows: a liver CT image segmentation method based on global self-attention and multi-scale feature fusion, comprising the following specific steps:
Step 1, image preprocessing: process the CT images in the LiTS dataset according to the HU value range to increase contrast, and expand the dataset with random flipping and similar augmentations.
Step 2, obtaining same-dimension features and multi-scale features: after the Step 1 preprocessing, extract image features with a ResNeXt convolutional neural network, obtain convolution features of uniform dimension via linear transformation, and build the multi-scale features from these convolution features.
Step 3, obtaining the global self-attention fusion feature: feed the multi-scale features obtained in Step 2 through a global self-attention (Non-Local) module to obtain a self-attention fusion feature containing global information, so as to capture the relationship between target features and surrounding features.
Step 4, extract features from the self-attention fusion feature obtained in Step 3 through an improved convolution module, highlighting the contribution of important semantic features in the channel dimension, and finally upsample to obtain the segmentation result.
Further, the specific steps of Step1 are as follows:
Step1.1, process the CT images in the LiTS dataset according to the HU value range corresponding to the liver to increase contrast: clip CT values to the range −130 HU to 230 HU, i.e., window width 360 HU and window level 50 HU, and then normalize the processed CT images.
Step1.2, data expansion: apply random horizontal flipping, vertical flipping, scaling, and cropping for augmentation. After random expansion, the data are split, with 82% used as the training set and the remaining 18% as the test set; the training set is further divided into training data and validation data in a ratio of 8.
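The windowing and normalization in Step1.1 can be sketched as follows; the function name and array-based interface are illustrative, not from the patent:

```python
import numpy as np

def window_and_normalize(ct_hu, hu_min=-130.0, hu_max=230.0):
    """Clip a CT slice to the liver window described above
    (window width 360 HU, window level 50 HU, i.e. [-130, 230] HU)
    and linearly rescale the result to [0, 1]."""
    clipped = np.clip(ct_hu.astype(np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)
```

With this window, the window level of 50 HU maps exactly to the middle of the normalized range, 0.5.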
Further, the specific steps of Step2 are as follows:
Step2.1, after image preprocessing, use the first five layers of the ResNeXt-101 network as the feature extraction layers. The convolution in each ResNeXt block is divided into 32 paths, each with an intermediate channel dimension of 4; different paths act as different feature subspaces for extracting different semantic features, and the sparser relationship between the convolution kernels of different paths reduces the risk of overfitting.
Step2.2, unify the channel dimensions of the Layer1–4 outputs of the ResNeXt network to 64 via linear transformation, and upsample the feature maps to the Layer1 size. Concatenate the four features and compress the concatenated feature to 64 channels with a 1 × 1 convolution to obtain the multi-scale feature, whose channel count and feature map size are consistent with those of the features after Layer1–4 processing.
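A minimal PyTorch sketch of Step2.2; the ResNeXt-101 stage channel counts (256, 512, 1024, 2048) and the use of bilinear upsampling are assumptions, since the patent only fixes the unified dimension of 64:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Project four backbone stage outputs to 64 channels, upsample each
    to the Layer1 resolution, concatenate, and compress back to 64
    channels with a 1x1 convolution."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), dim=64):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in in_channels)
        self.compress = nn.Conv2d(4 * dim, dim, 1)  # 1x1 fusion conv

    def forward(self, feats):
        target = feats[0].shape[-2:]  # Layer1 spatial size
        mapped = [F.interpolate(p(f), size=target, mode='bilinear',
                                align_corners=False)
                  for p, f in zip(self.proj, feats)]
        return self.compress(torch.cat(mapped, dim=1))
```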
Further, the Step3 comprises the following specific steps:
Step3.1, different organs in an abdominal CT image are interrelated, and exploiting this relationship can improve liver segmentation. Inspired by the non-local means algorithm, which computes the correlation between the current position and other positions in the image, the multi-scale feature from Step 2 is passed through three separate linear mappings, implemented as 1 × 1 convolutions, to obtain the Key, Query, and Value embedded-space features.
Step3.2 calculates the similarity between the Key and Query features. The correlation function follows the Gaussian form chosen by the non-local means method:

$f(x_i, x_j) = e^{\theta(x_i)^{\mathsf T} \phi(x_j)}$

where $x_i$ is the i-th position of the input feature map, j ranges over all positions that may be associated with i, and $\theta$ and $\phi$ denote the Query and Key embeddings. The computed similarity is then used to weight Value to obtain the self-attention feature.
Step3.3 passes the self-attention feature through a Softmax layer to obtain the self-attention-weighted output, so that the learned long-range dependencies are incorporated into the output feature. The overall calculation is:

$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$

where C(x) is the Softmax normalization function, g linearly maps the representation of input position j (typically implemented as a 1 × 1 convolution), and f computes the correlation between the i-th and j-th input positions.
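Steps 3.1–3.3 correspond to a standard embedded-Gaussian non-local block. The following PyTorch sketch assumes an inner embedding dimension of 32 and a residual output path, details the patent does not pin down:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local (global self-attention) block: 1x1
    convolutions produce Query/Key/Value, a softmax over the pairwise
    similarities plays the role of C(x), and a residual connection
    merges the attended feature back into the input."""
    def __init__(self, channels, inner=32):
        super().__init__()
        self.query = nn.Conv2d(channels, inner, 1)
        self.key = nn.Conv2d(channels, inner, 1)
        self.value = nn.Conv2d(channels, inner, 1)  # the mapping g
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, inner)
        k = self.key(x).flatten(2)                     # (b, inner, hw)
        v = self.value(x).flatten(2).transpose(1, 2)   # (b, hw, inner)
        attn = torch.softmax(q @ k, dim=-1)            # f(x_i, x_j) / C(x)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return self.out(y) + x                         # residual fusion
```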
Further, the Step4 comprises the following specific steps:
Step4.1, the fusion feature from Step 3, containing multi-scale information and self-attention relationships, undergoes further feature extraction through an improved convolution module. The multi-scale self-attention fusion feature is first mapped to the specified channel dimension by a 1 × 1 convolution; the sum of the features produced by a 1 × 1 convolution and a 3 × 3 convolution is then taken.
Step4.2 uses a channel attention (CA) module acting on the channel dimension to recalibrate the feature channels, and uses a residual path to fuse the original feature with the channel-attention feature, giving the output of the residual module:

$Y_{MRA}(X) = Y_{CA}(W_L X + W_E X) + X$

where $Y_{MRA}(X)$ denotes the multi-level residual attention convolution operation and X the input feature; $W_L$ is a 1 × 1 convolution matrix that linearly maps the original input, equivalent to a residual path; $W_E$ is a 3 × 3 convolution matrix for feature extraction of the input; and $Y_{CA}$ denotes the channel attention operation.
Step4.3, the extracted features follow the multi-path parallel idea of ensemble learning to produce four groups of segmentation outputs, which are averaged to form the final output.
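The formula $Y_{MRA}(X) = Y_{CA}(W_L X + W_E X) + X$ can be sketched in PyTorch as below; the squeeze-and-excitation form of the channel attention $Y_{CA}$ and its reduction ratio are assumptions, as the patent does not specify them:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention standing in for Y_CA (an assumed
    variant): global average pool, bottleneck MLP, sigmoid gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)

class MRABlock(nn.Module):
    """Y_MRA(X) = Y_CA(W_L X + W_E X) + X: a 1x1 linear path W_L, a 3x3
    extraction path W_E, channel attention on their sum, and an outer
    residual connection back to the input."""
    def __init__(self, channels):
        super().__init__()
        self.w_l = nn.Conv2d(channels, channels, 1)             # W_L
        self.w_e = nn.Conv2d(channels, channels, 3, padding=1)  # W_E
        self.ca = ChannelAttention(channels)                    # Y_CA

    def forward(self, x):
        return self.ca(self.w_l(x) + self.w_e(x)) + x
```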
The invention is further explained below with respect to Step 1 and Step 4:
1) Data preprocessing method:
The original CT image covers a wide range of CT values, its overall contrast is poor, and the gray-level differences between organs are too small to distinguish. Medical CT images are usually processed with the HU value range of the target organ to increase contrast. The liver is usually processed with a window width of 150 and a window level of 30, but the liver and liver tumors differ in gray level; processing only by the liver's HU values inevitably loses gray information in part of the tumor region, discarding important information and hurting training. To address this, the CT value range of −130 HU to 230 HU, i.e., window width 360 HU and window level 50 HU, was obtained by analyzing the histogram distribution of HU values. After normalization, the processed image enhances the contrast between organs while maximally retaining the information of the target region, and is better suited to model training; the image before processing is shown in Fig. 3(a) and the image after processing in Fig. 3(b).
2) Designing a loss function:
For the imbalance between positive and negative samples in the dataset, binary cross-entropy (BCE) and Dice loss (DL) are combined as a weighted sum to form the training loss. The loss function L is:

$L = \omega\, L_{BCE}(y, \hat y) + (1 - \omega)\left(1 - \frac{2\sum y \hat y + \epsilon}{\sum y + \sum \hat y + \epsilon}\right)$

where y denotes the ground-truth segmentation map, $\hat y$ the segmentation map predicted by the model, $\omega = 0.5$ the weight between the two losses, and $\epsilon = 1.0$ the smoothing term set to avoid a zero denominator.
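A sketch of the weighted BCE + Dice loss with ω = 0.5 and ε = 1.0; the exact soft-Dice formulation (intersection over the sum of masses) is a common variant assumed here:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(pred, target, omega=0.5, eps=1.0):
    """Weighted sum of binary cross-entropy and Dice loss; the
    smoothing term eps=1.0 guards against a zero denominator."""
    bce = F.binary_cross_entropy(pred, target)
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return omega * bce + (1 - omega) * dice
```

When prediction and ground truth agree perfectly, both terms vanish and the loss is zero.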
The invention has the beneficial effects that:
1. For the characteristics of the liver segmentation task, the proposed liver CT image segmentation method based on global self-attention and multi-scale feature fusion selects a multi-scale strategy to extract varied features: the multi-scale features extracted by different network layers introduce multi-scale spatial information and address problems such as blurred boundaries in liver segmentation. A global self-attention mechanism is introduced to model the relationships among image features of different semantic categories, so that the model better captures the association between the semantic features of the liver and those of other organs, alleviating the problem of large variation in liver shape.
2. Each channel dimension corresponds to one type of semantic information, which maps to one type of image feature in the original image. The image features corresponding to the liver should carry higher importance, and identical channel weights clearly do not favor the expression of key features. For this problem, the invention designs a multi-level residual attention convolution (MRA) module to highlight important semantic features in the channel dimension.
In summary, the liver CT image segmentation method based on global self-attention and multi-scale feature fusion first uses a ResNeXt convolutional neural network to obtain multi-scale features from the abdominal CT image, then uses the global self-attention module to capture spatial positional relationships, combined with the multi-level residual attention module to highlight important semantic features in the channel dimension; finally, the accuracy of liver image segmentation is improved.
Drawings
FIG. 1 is a diagram of the liver CT image segmentation method based on global self-attention and multi-scale feature fusion;
FIG. 2 is a schematic diagram of a global self-attention-based module according to the present invention;
FIG. 3 compares images before and after the preprocessing of the invention; (a) the image before processing; (b) the image after processing;
FIG. 4 is a visual comparison of the segmentation results of the invention; (a) the CT picture; (b) the reference segmentation standard; (c) the segmentation result of the original model; (d) the result after adding the self-attention module; (e) the result after adding the improved convolution module.
Detailed Description
Example 1: as shown in Figs. 1–4, a liver CT image segmentation method based on global self-attention and multi-scale feature fusion specifically comprises the following steps:
Step 1, image preprocessing: process the CT images in the LiTS dataset according to the HU value range to increase contrast, and expand the dataset with random flipping and similar augmentations.
Further, the specific steps of Step1 are as follows:
Step1.1, process the CT images in the LiTS dataset according to the HU value range corresponding to the liver to increase contrast: clip CT values to the range −130 HU to 230 HU, i.e., window width 360 HU and window level 50 HU, and then normalize the processed CT images.
Step1.2, data expansion: apply random horizontal flipping, vertical flipping, scaling, and cropping for augmentation. After random expansion, the data are split, with 82% used as the training set and the remaining 18% as the test set; the training set is further divided into training data and validation data in a ratio of 8.
Step 2, obtaining same-dimension features and multi-scale features: after the Step 1 preprocessing, extract image features with a ResNeXt convolutional neural network, obtain convolution features of uniform dimension via linear transformation, and build the multi-scale features from these convolution features.
Further, the specific steps of Step2 are as follows:
Step2.1, after image preprocessing, use the first five layers of the ResNeXt-101 network as the feature extraction layers; the convolution in each ResNeXt block is divided into 32 paths, each with an intermediate channel dimension of 4, and different paths act as different feature subspaces for extracting different semantic features.
Step2.2, unify the channel dimensions of the Layer1–4 outputs of the ResNeXt network to 64 via linear transformation, and upsample the feature maps to the Layer1 size. Concatenate the four features and compress the concatenated feature to 64 channels with a 1 × 1 convolution to obtain the multi-scale feature, whose channel count and feature map size are consistent with those of the features after Layer1–4 processing.
Step 3, obtaining the global self-attention fusion feature: feed the multi-scale features obtained in Step 2 through a global self-attention (Non-Local) module to obtain a self-attention fusion feature containing global information, so as to capture the relationship between target features and surrounding features.
Further, the specific steps of Step3 are as follows:
Step3.1, certain relationships exist among different organs in abdominal CT images, and capturing them can improve liver segmentation. Inspired by the non-local means algorithm, the invention adopts Non-Local global self-attention: starting from the multi-scale feature obtained in Step 2, three separate linear mappings, implemented as 1 × 1 convolutions, yield the Key, Query, and Value embedded-space features. The multi-scale fusion features under different attention methods are compared in Table 1.
TABLE 1 Comparison of multi-scale fusion features using different attention methods
Step3.2 calculates the similarity between the Key and Query features. The correlation function follows the Gaussian form chosen by the non-local means method:

$f(x_i, x_j) = e^{\theta(x_i)^{\mathsf T} \phi(x_j)}$

where $x_i$ is the i-th position of the input feature map, j ranges over all positions that may be associated with i, and $\theta$ and $\phi$ denote the Query and Key embeddings. The computed similarity is then used to weight Value to obtain the self-attention feature.
Step3.3 passes the self-attention feature through a Softmax layer to obtain the self-attention-weighted output, so that the learned long-range dependencies are incorporated into the output feature. The overall calculation is:

$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$

where C(x) is the Softmax normalization function, g linearly maps the representation of input position j (typically implemented as a 1 × 1 convolution), and f computes the correlation between the i-th and j-th input positions.
Step 4, extract features from the self-attention fusion feature obtained in Step 3 through an improved convolution module, highlighting the contribution of important semantic features in the channel dimension, and finally upsample to obtain the segmentation result.
Further, the specific steps of Step4 are as follows:
Step4.1, the fusion feature from Step 3, containing multi-scale information and self-attention relationships, undergoes further information extraction through an improved convolution module. The multi-scale self-attention fusion feature is first mapped to the specified channel dimension by a 1 × 1 convolution; the sum of the features produced by a 1 × 1 convolution and a 3 × 3 convolution is then taken.
Step4.2 uses a channel attention (CA) module acting on the channel dimension to recalibrate the feature channels, and uses a residual path to fuse the original feature with the channel-attention feature, giving the output of the residual module:

$Y_{MRA}(X) = Y_{CA}(W_L X + W_E X) + X$

where $Y_{MRA}(X)$ denotes the multi-level residual attention convolution operation and X the input feature; $W_L$ is a 1 × 1 convolution matrix that linearly maps the original input, equivalent to a residual path; $W_E$ is a 3 × 3 convolution matrix for feature extraction of the input; and $Y_{CA}$ denotes the channel attention operation.
Step4.3, the extracted features follow the multi-path parallel idea of ensemble learning to produce four groups of segmentation outputs, which are averaged to form the final output.
The experiments were run on a hardware platform with an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10 GHz and an NVIDIA TITAN Xp GPU, under Ubuntu 18.04.1; the software platform comprises the CUDA GPU parallel computing architecture and the PyTorch deep learning framework based on the Python programming language. The Adam optimizer is used, and the learning rate follows a cosine annealing strategy with an initial learning rate of 0.001 and a minimum of 0.00001, reset every 30 epochs. Training runs for 80 epochs with a batch size of 4.
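The optimizer and learning-rate schedule described above map directly onto PyTorch's built-ins; the one-layer model below is a stand-in for the actual segmentation network:

```python
import torch
import torch.nn as nn

# Adam optimizer with cosine annealing restarted every 30 epochs,
# decaying from 1e-3 down to 1e-5, as described in the experiment setup.
model = nn.Conv2d(1, 1, 3, padding=1)  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=30, eta_min=1e-5)
```

In a training loop, `scheduler.step()` is called once per epoch; at epoch 30 the learning rate resets to its initial value.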
Table 2 compares the proposed method with methods from the medical image segmentation field on the LiTS dataset, including classical segmentation algorithms such as UNet and FCN, methods such as DAF and MsAUNet that likewise use multi-scale feature fusion, and well-known liver segmentation algorithms such as H-DenseUNet and Multiple UNet. The liver segmentation result of the proposed method far exceeds classical segmentation algorithms including FCN: the mean Dice value of the overlap between the predicted and ground-truth segmentations reaches 96.4%, 4.3 percentage points higher than the classical UNet model. Since the invention achieves results comparable to 3D methods while using a 2D model, it is strongly competitive for the liver segmentation task.
TABLE 2 comparison with existing Process
Fig. 4 shows the experimental segmentation results of the invention on the LiTS dataset: (a) the CT picture; (b) the reference segmentation standard; (c) the segmentation result of the original model; (d) the result after adding the self-attention module; (e) the result after adding the improved convolution module. After the self-attention mechanism is added, erroneous predictions in regions outside the liver are alleviated to some extent and some false-positive predictions are eliminated, further demonstrating the effect of self-attention in the liver segmentation task. Further adding the MRA module with its refined channel attention mechanism successfully eliminates most false-positive predictions by enhancing or suppressing semantic features along the channel dimension, and the segmentation edges become closer to the true segmentation result.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.
Claims (5)
1. The liver CT image segmentation method based on the fusion of global self-attention and multi-scale features is characterized by comprising the following specific operation steps:
step1, image preprocessing: processing the CT image in the LiTS data set according to the HU value range to increase the contrast, and then expanding the data set;
step2, acquiring the same dimension characteristic and the multi-scale characteristic: after the Step1 preprocessing operation, extracting image features by using a ResNeXt convolutional neural network, and obtaining convolution features of uniform dimensions and multi-scale features based on the convolution features through linear transformation;
step3, obtaining a global self-attention fusion characteristic: obtaining a self-attention fusion feature containing global information through a global self-attention module Non-Local by using the multi-scale feature obtained in Step2 so as to capture the relation between the target feature and the surrounding features;
and Step4, extracting the features of the self-attention fusion features obtained in Step3 through an improved convolution module, highlighting the effect of important semantic features in channel dimensions, and finally performing up-sampling to obtain a segmentation result.
2. The liver CT image segmentation method based on the fusion of the global self-attention and the multi-scale features as claimed in claim 1, wherein Step1 comprises the following specific steps:
Step1.1 the CT image in the LiTS dataset is processed according to the HU value range corresponding to the liver organ to increase contrast; processing is performed with CT values ranging from -130 HU to 230 HU, i.e., window width 360 HU and window level 50 HU, and then a normalization operation is performed on the processed CT image;
the Step1.2 data expansion adopts the modes of random horizontal turning, vertical turning, zooming and cutting to carry out data enhancement; after random expansion, the data are divided, wherein 82% is used as a training set, the rest 18% is used as a test set, and the training set is further divided into training data and verification data according to the proportion of 8.
3. The liver CT image segmentation method based on the fusion of the global self-attention and the multi-scale features as claimed in claim 1, wherein Step2 comprises the following specific steps:
Step2.1 after image preprocessing, the first five layers of a ResNeXt-101 network are used as feature extraction layers; the convolution in each ResNeXt block is divided into 32 paths, and the intermediate channel dimension processed by each path is 4; different paths are equivalent to different feature subspaces used for extracting different semantic features, while the convolution kernels of different paths have sparser relations, reducing the risk of overfitting;
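The "sparser relations" of the 32-path grouped convolution can be made concrete by comparing weight counts; the 128-channel stage below (32 paths × width 4) is a standard ResNeXt configuration used here for illustration, not a figure taken from the patent.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k×k convolution whose channels are split into
    `groups` independent paths (bias terms ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

# A 3×3 stage with cardinality 32 and per-path width 4 (32 × 4 = 128
# channels) versus the equivalent dense convolution:
dense = conv_params(128, 128, 3, groups=1)     # every input feeds every output
grouped = conv_params(128, 128, 3, groups=32)  # each path sees only 4 channels
```

The grouped form carries 32× fewer weights (4 608 vs. 147 456), since each output channel connects only to the 4 channels of its own path.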
Step2.2 unifies the channel dimensions of the Layer1-4 output results in the ResNeXt network structure to 64 through linear transformation, and upsamples the feature maps to the same size as Layer1; the four features are then concatenated, and the concatenated features are compressed to 64 channels through a 1×1 convolution to obtain the multi-scale features, whose channel number and feature map size are consistent with the dimensions of the features processed from Layer1-4.
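Step 2.2 can be sketched in NumPy as follows; 1×1 convolutions become channel-mixing matrix products and upsampling is nearest-neighbour. The channel widths 256-2048 are the standard ResNeXt-101 layer widths, the spatial sizes are a small hypothetical patch, and random matrices stand in for learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1×1 convolution as a channel mix: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

# Hypothetical Layer1-4 outputs: spatial size halves, channels double.
feats = [rng.standard_normal((c, s, s))
         for c, s in [(256, 16), (512, 8), (1024, 4), (2048, 2)]]

# Project every map to 64 channels, upsample to the Layer1 size, concatenate.
unified = [upsample_nn(conv1x1(f, rng.standard_normal((64, f.shape[0]))),
                       16 // f.shape[1])
           for f in feats]
stacked = np.concatenate(unified, axis=0)                        # (4*64, 16, 16)
multi_scale = conv1x1(stacked, rng.standard_normal((64, 256)))   # back to 64 ch
```

The final `multi_scale` tensor has 64 channels at the Layer1 resolution, matching the claim that the fused feature keeps the unified channel number and map size.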
4. The liver CT image segmentation method based on the fusion of the global self-attention and the multi-scale features as claimed in claim 1, wherein Step3 comprises the following specific steps:
Step3.1 certain relations exist among the different organs in an abdominal CT image, and acquiring these relations can improve the liver organ segmentation effect; inspired by the idea in the non-local mean algorithm of calculating the correlation between the current position and other positions in the image, the multi-scale features obtained in Step2 are passed through three separate linear mappings to obtain the Key, Query and Value embedded-space features, each linear mapping being realized by a 1×1 convolution;
Step3.2 calculates the similarity between the features Key and Query; the correlation function follows the Gaussian form selected in the non-local mean, with the calculation formula:

f(x_i, x_j) = e^(θ(x_i)^T φ(x_j))

where x_i is the i-th position of the input feature map, j indexes all positions possibly related to i, and θ, φ are the Query and Key embeddings; the calculated similarity is weighted onto Value to obtain the self-attention feature;
Step3.3 obtains the self-attention weights from the attention features through a Softmax layer, thereby integrating the learned long-range dependencies into the output features; the overall calculation formula is:

y_i = (1 / C(x)) Σ_j f(x_i, x_j) g(x_j)
where C(x) is the Softmax normalization function, the function g linearly maps the representation of the input position j, typically through a 1×1 convolution, and the function f calculates the correlation between the input i-th position and the j-th position.
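A minimal NumPy sketch of the non-local block described in Steps 3.1-3.3: positions are flattened so the 1×1 convolutions producing Query, Key and Value become matrix products, and random matrices stand in for the learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def nonlocal_block(x, w_q, w_k, w_v):
    """y_i = (1/C(x)) * sum_j f(x_i, x_j) g(x_j), with the Gaussian
    similarity f(x_i, x_j) = exp(q_i . k_j); dividing by C(x) is softmax."""
    q, k, g = x @ w_q, x @ w_k, x @ w_v           # Query, Key, Value embeddings
    scores = q @ k.T                               # pairwise q_i . k_j
    scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
    f = np.exp(scores)
    attn = f / f.sum(axis=1, keepdims=True)        # softmax normalization C(x)
    return attn @ g                                # weight Values by similarity

x = rng.standard_normal((16, 64))                  # a 4×4 map, 64 ch, flattened
y = nonlocal_block(x, *(rng.standard_normal((64, 32)) for _ in range(3)))
```

Each output position is a similarity-weighted mixture over all 16 positions, which is how the block captures relations between the target feature and distant surrounding features.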
5. The liver CT image segmentation method based on the fusion of the global self-attention and the multi-scale features as claimed in claim 1, wherein Step4 comprises the following specific steps:
Step4.1 the fusion features containing multi-scale information and self-attention relations obtained in Step3 are further processed by the improved convolution module to extract their information; the multi-scale self-attention fusion feature first passes through a 1×1 convolution that maps the feature channels to a specified dimension, and then the sum of the features produced by a 1×1 convolution and a 3×3 convolution is obtained;
Step4.2 uses an attention module acting on the channel dimension to perform channel recalibration on the feature channels, and uses a residual path to fuse the original feature with the channel attention feature to obtain the output feature of the residual module; the specific calculation is as shown in the formula:
Y_MRA(X) = Y_CA(W_L X + W_E X) + X

where Y_MRA(X) denotes the multi-level residual attention convolution operation, X denotes the input feature, W_L is a 1×1 convolution matrix used for linear mapping of the original input and is equivalent to a residual path; W_E is a 3×3 convolution matrix used for feature extraction from the input features, and Y_CA denotes the channel attention operation;
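The MRA formula can be sketched in NumPy as follows. Two elements are assumptions, flagged in the comments: the channel attention Y_CA is written as a squeeze-and-excitation-style gate (the claim does not spell out its internals), and the 3×3 path W_E is reduced to a channel mix for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Assumed SE-style Y_CA: global average pool -> two linear layers -> gate."""
    s = x.mean(axis=(1, 2))                         # squeeze to one value per channel
    gate = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))    # excitation, weights in (0, 1)
    return x * gate[:, None, None]                  # recalibrate each channel

def mra_block(x, w_l, w_e, w1, w2):
    """Y_MRA(X) = Y_CA(W_L X + W_E X) + X; both conv paths are written as
    channel mixes here (the real W_E is a 3×3 convolution)."""
    mix = lambda w: np.tensordot(w, x, axes=([1], [0]))
    return channel_attention(mix(w_l) + mix(w_e), w1, w2) + x  # residual add of X

c = 64
x = rng.standard_normal((c, 8, 8))
y = mra_block(x, rng.standard_normal((c, c)), rng.standard_normal((c, c)),
              rng.standard_normal((c // 4, c)), rng.standard_normal((c, c // 4)))
```

The sigmoid gate only scales channels (it cannot flip their sign), which is how the module enhances or suppresses semantic features along the channel dimension, while the `+ x` term preserves the original input as a residual.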
the characteristics extracted by Step4.3 adopt a multi-path parallel idea of ensemble learning to obtain four groups of segmentation outputs, and the four groups of outputs are calculated and averaged to be used as a final output result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211064580.1A CN115457051A (en) | 2022-08-31 | 2022-08-31 | Liver CT image segmentation method based on global self-attention and multi-scale feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115457051A true CN115457051A (en) | 2022-12-09 |
Family
ID=84299992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211064580.1A Pending CN115457051A (en) | 2022-08-31 | 2022-08-31 | Liver CT image segmentation method based on global self-attention and multi-scale feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115457051A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115984574A (en) * | 2023-03-20 | 2023-04-18 | 北京航空航天大学 | Image information extraction model and method based on cyclic transform and application thereof |
CN116152278A (en) * | 2023-04-17 | 2023-05-23 | 杭州堃博生物科技有限公司 | Medical image segmentation method and device and nonvolatile storage medium |
CN116248959A (en) * | 2023-05-12 | 2023-06-09 | 深圳市橙视科技发展有限公司 | Network player fault detection method, device, equipment and storage medium |
CN116681958A (en) * | 2023-08-04 | 2023-09-01 | 首都医科大学附属北京妇产医院 | Fetal lung ultrasonic image maturity prediction method based on machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113077471B (en) | Medical image segmentation method based on U-shaped network | |
CN110930397B (en) | Magnetic resonance image segmentation method and device, terminal equipment and storage medium | |
CN115457051A (en) | Liver CT image segmentation method based on global self-attention and multi-scale feature fusion | |
CN112927255B (en) | Three-dimensional liver image semantic segmentation method based on context attention strategy | |
CN111784671A (en) | Pathological image focus region detection method based on multi-scale deep learning | |
CN112270666A (en) | Non-small cell lung cancer pathological section identification method based on deep convolutional neural network | |
CN111612008A (en) | Image segmentation method based on convolution network | |
CN110853011B (en) | Method for constructing convolutional neural network model for pulmonary nodule detection | |
CN113569724B (en) | Road extraction method and system based on attention mechanism and dilation convolution | |
CN115393584A (en) | Establishment method based on multi-task ultrasonic thyroid nodule segmentation and classification model, segmentation and classification method and computer equipment | |
KR20220144687A (en) | Dual attention multiple instance learning method | |
CN115457057A (en) | Multi-scale feature fusion gland segmentation method adopting deep supervision strategy | |
CN112750137A (en) | Liver tumor segmentation method and system based on deep learning | |
CN114511523B (en) | Gastric cancer molecular subtype classification method and device based on self-supervision learning | |
CN116363081A (en) | Placenta implantation MRI sign detection classification method and device based on deep neural network | |
CN116825363B (en) | Early lung adenocarcinoma pathological type prediction system based on fusion deep learning network | |
CN117611599B (en) | Blood vessel segmentation method and system integrating centre line diagram and contrast enhancement network | |
CN114693671A (en) | Lung nodule semi-automatic segmentation method, device, equipment and medium based on deep learning | |
CN112489062B (en) | Medical image segmentation method and system based on boundary and neighborhood guidance | |
CN117409201A (en) | MR medical image colorectal cancer segmentation method and system based on semi-supervised learning | |
CN116884597A (en) | Pathological image breast cancer molecular typing method and system based on self-supervision pre-training and multi-example learning | |
CN115131628A (en) | Mammary gland image classification method and equipment based on typing auxiliary information | |
CN111598144B (en) | Training method and device for image recognition model | |
Salunkhe et al. | Rapid tri-net: breast cancer classification from histology images using rapid tri-attention network | |
Sun et al. | DARMF-UNet: A dual-branch attention-guided refinement network with multi-scale features fusion U-Net for gland segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||