CN115661462A - Medical image segmentation method based on convolution and deformable self-attention mechanism

Medical image segmentation method based on convolution and deformable self-attention mechanism

Info

Publication number
CN115661462A
CN115661462A (application CN202211422579.1A)
Authority
CN
China
Prior art keywords
convolution
attention mechanism
medical image
deformable self
image segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211422579.1A
Other languages
Chinese (zh)
Inventor
高宇飞
马自行
石磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202211422579.1A
Publication of CN115661462A
Legal status: Pending

Landscapes

  • Image Processing (AREA)

Abstract

The invention pertains to the field of medical image segmentation and provides a medical image segmentation method based on convolution and a deformable self-attention mechanism, comprising the following steps. S1: preprocess and augment the CT images; S2: construct a U-Net architecture model, Med-CaDA, based on convolution and a deformable self-attention mechanism; S3: adopt a Dice loss function for the U-Net architecture model constructed in step S2; S4: train the U-Net architecture model constructed in step S2 with the Adam optimization algorithm; S5: measure the segmentation accuracy with two indices, the Dice score and the 95% Hausdorff distance. The invention constructs a segmentation model on the U-Net framework and proposes a CaDA block based on convolution and a deformable self-attention mechanism, which retains the advantage of convolution in extracting local information while exploiting the ability of the deformable self-attention mechanism to capture global dependencies, achieving good medical image segmentation accuracy at low computational cost.

Description

Medical image segmentation method based on convolution and deformable self-attention mechanism
Technical Field
The invention belongs to the technical field of medical image segmentation, and particularly relates to a medical image segmentation method based on convolution and a deformable self-attention mechanism.
Background
With the continuous progress of medical imaging and computer vision technology, medical image analysis has become an indispensable tool and technical means in medical research and clinical disease diagnosis. As an important component of medical image analysis, medical image segmentation plays a very important role in clinical diagnosis and treatment.
In recent years, the fully convolutional neural network architecture U-Net has held a dominant position in medical image segmentation. By constructing a symmetric encoder-decoder architecture with skip connections, U-Net has achieved great success in various medical image segmentation tasks. The encoder consists of a series of convolutional and down-sampling layers and extracts deep features; the decoder up-samples the deep features back to the input size and, through skip connections, fuses the multi-scale features produced by the encoder to compensate for the spatial information lost during down-sampling. Inspired by the U-Net design, methods such as Res-UNet, Att-UNet, U-Net++, and UNet3+ have been developed for the segmentation of organs and lesions, and their excellent performance demonstrates the strong feature-learning capability of convolutional neural networks.
However, the lack of ability to capture long-range dependencies makes it difficult for convolutional-neural-network-based methods to meet the segmentation accuracy requirements of medical applications. Transformers have achieved breakthroughs in several areas of computer vision, yielding a series of high-performing methods such as ViT, PVT, and Swin Transformer. Inspired by this, Transformers are increasingly being applied in the field of medical image segmentation. Recently, the Deformable Attention Transformer, by designing a deformable self-attention mechanism, overcame the drawback of methods such as PVT and Swin Transformer that critical related information may be lost, and achieved state-of-the-art accuracy on the ImageNet dataset.
Disclosure of Invention
The invention provides a medical image segmentation method based on convolution and a deformable self-attention mechanism, and aims to solve the problem that convolutional neural networks lack the ability to capture long-range dependencies.
The invention is realized in such a way that a medical image segmentation method based on convolution and a deformable self-attention mechanism comprises the following steps:
Step S1: preprocessing and augmenting the CT images;
Step S2: constructing a U-Net architecture model, Med-CaDA, based on convolution and a deformable self-attention mechanism;
Step S3: adopting a Dice loss function for the U-Net architecture model constructed in step S2;
Step S4: training the U-Net architecture model constructed in step S2 with the Adam optimization algorithm;
Step S5: measuring the segmentation accuracy with two indices, the Dice score and the 95% Hausdorff distance.
Further, in step S1, the preprocessing is standardized using the Z-score method, and the data augmentation comprises padding, random cropping, random flipping, and random intensity shifting.
Further, in step S2, the U-Net architecture model is composed of an encoder, a decoder, and skip connections; the encoder comprises an Embedding layer, CaDA blocks composed of convolution and a deformable self-attention mechanism, and down-sampling layers, and the decoder comprises up-sampling layers, CaDA blocks, and an expansion layer.
Further, the Embedding layer maps the input image to a high-dimensional space, with an output size of H/4 × W/4 × D/4 × C.
Further, the CaDA block is used to extract local and global information and comprises a bottleneck residual module and a deformable self-attention mechanism. The bottleneck residual module is composed of two 1 × 1 × 1 convolutions and a depthwise separable convolution; it extracts local information while keeping the input and output feature map sizes consistent. The bottleneck residual module can be expressed as:
Bottleneck(X) = Conv(F(Conv(X)))    (1)
F(X) = DWConv(X) + X    (2)
where Conv() denotes convolution and DWConv() denotes depthwise separable convolution; BatchNorm normalization and the GELU activation function are omitted from the formulas.
The deformable self-attention mechanism is a variant of self-attention that captures global dependencies without a large computational cost; given an input of size H × W × D × C, its output is also H × W × D × C, so the input and output dimensions are consistent.
Further, the down-sampling layer and the up-sampling layer are respectively 2 × 2 × 2 convolution and 2 × 2 × 2 deconvolution.
Furthermore, the expansion layer applies a 4 × 4 × 4 deconvolution with stride 4 to restore the feature map from H/4 × W/4 × D/4 × C to the original image size H × W × D × K, where K is the number of final segmentation classes.
Further, in step S3, the Dice loss function is implemented by the following formula:
L_Dice = 1 − 2|A ∩ B| / (|A| + |B|)
where |A ∩ B| denotes the intersection of A and B, and |A| and |B| denote the number of elements of A and B, respectively.
Further, in step S4, the learning rate of the Adam optimization algorithm is set to 1e-4.
Further, in step S5, the Dice score index is:
Dice(A, B) = 2|A ∩ B| / (|A| + |B|)
and the 95% Hausdorff distance index is:
HD95(A, B) = max( P95{ min_{b∈B} distance(a, b) : a ∈ A }, P95{ min_{a∈A} distance(a, b) : b ∈ B } )
where distance(a, b) is the distance between points a and b, P95 denotes the 95th percentile, and A and B are two irregular regions.
Compared with the prior art, the invention has the following advantage: it provides a medical image segmentation method based on convolution and a deformable self-attention mechanism, which fully extracts local and global context information by combining convolution with a Transformer attention mechanism, greatly improving medical image segmentation accuracy while keeping the computational cost low.
Drawings
FIG. 1 is a schematic diagram of the process steps of the present invention;
FIG. 2 is a model diagram of a U-Net architecture model Med-CaDA in the present invention;
FIG. 3 is a block model diagram of the CaDA of the present invention;
FIG. 4 is a flow chart of the deformable self-attention mechanism of the present invention.
Detailed Description
In order to make the above objects, technical solutions, and effects of the present invention clearly understood, the invention is described fully below with reference to the accompanying drawings and the detailed description. It should be noted that numerous specific details are set forth in the following description to provide a thorough understanding of the present invention; however, the invention may be practiced otherwise than as specifically described, and the scope of the invention is therefore not limited by the specific embodiments disclosed below.
Referring to fig. 1, the present invention provides a technical solution: a medical image segmentation method based on convolution and a deformable self-attention mechanism comprises the following steps:
Step S1: preprocessing and augmenting the CT images;
in this embodiment, the pre-processing is normalized using the method of Z-score to enhance the foreground and background discrimination. The implementation formula is as follows:
z = (x − μ) / σ    (1)
where μ is the mean of all sample data and σ is the standard deviation of all sample data.
The data augmentation comprises padding, random cropping, random flipping, and random intensity shifting. Specifically, the input image of size 240 × 240 × 155 is padded to 240 × 240 × 160, then randomly cropped to 128 × 128 × 128, and finally random flipping and random intensity shifting are applied in sequence, as sketched below; this alleviates the small size of the dataset and improves the generalization ability of the model.
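The following is a minimal NumPy sketch of this preprocessing and augmentation pipeline; the function names, flip probability, and shift range are illustrative assumptions, not values taken from the patent.

import numpy as np

def z_score(volume):
    # Z-score normalization: z = (x - mu) / sigma over the whole volume.
    mu, sigma = volume.mean(), volume.std()
    return (volume - mu) / (sigma + 1e-8)

def pad_depth(volume, target_d=160):
    # Zero-pad the last (depth) axis, e.g. 240 x 240 x 155 -> 240 x 240 x 160.
    pad = target_d - volume.shape[-1]
    return np.pad(volume, [(0, 0)] * (volume.ndim - 1) + [(0, pad)])

def random_crop(volume, size=(128, 128, 128)):
    # Randomly crop a 128 x 128 x 128 patch from the three spatial axes.
    h, w, d = volume.shape[-3:]
    y = np.random.randint(0, h - size[0] + 1)
    x = np.random.randint(0, w - size[1] + 1)
    z = np.random.randint(0, d - size[2] + 1)
    return volume[..., y:y + size[0], x:x + size[1], z:z + size[2]]

def random_flip_and_shift(volume, shift=0.1):
    # Random flips along each spatial axis, then a random additive intensity shift.
    for axis in (-3, -2, -1):
        if np.random.rand() < 0.5:
            volume = np.flip(volume, axis=axis)
    return volume + np.random.uniform(-shift, shift)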
Step S2: constructing a U-Net architecture model Med-CaDA based on convolution and a deformable self-attention mechanism;
in this embodiment, the U-Net architecture model Med-CaDA is composed of an encoder, a decoder and a jump connection as shown in fig. 2, the encoder includes an Embedding layer, a CaDA block composed of convolution and a deformable self-attention mechanism, and a down-sampling layer, and the decoder includes an up-sampling layer, a CaDA block, and an extension layer.
The Embedding layer maps the original image of input size H × W × D × 4 to a high-dimensional space, producing an output of size H/4 × W/4 × D/4 × C. Its convolution sequence consists of a 3 × 3 × 3 convolution with stride 2 and padding 1, a 3 × 3 × 3 convolution with stride 1 and padding 1, a 3 × 3 × 3 convolution with stride 2 and padding 1, and a 3 × 3 × 3 convolution with stride 1 and padding 1, as sketched below.
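As a hedged PyTorch sketch of this Embedding layer (the intermediate channel width and the normalization/activation placement are assumptions not fixed by the text):

import torch
import torch.nn as nn

class Embedding3D(nn.Module):
    # Four 3x3x3 convolutions with strides 2, 1, 2, 1 and padding 1:
    # (N, 4, H, W, D) -> (N, C, H/4, W/4, D/4).
    def __init__(self, in_ch=4, embed_dim=48):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv3d(in_ch, embed_dim // 2, 3, stride=2, padding=1),
            nn.BatchNorm3d(embed_dim // 2), nn.GELU(),
            nn.Conv3d(embed_dim // 2, embed_dim // 2, 3, stride=1, padding=1),
            nn.BatchNorm3d(embed_dim // 2), nn.GELU(),
            nn.Conv3d(embed_dim // 2, embed_dim, 3, stride=2, padding=1),
            nn.BatchNorm3d(embed_dim), nn.GELU(),
            nn.Conv3d(embed_dim, embed_dim, 3, stride=1, padding=1),
        )

    def forward(self, x):
        return self.proj(x)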
The CaDA block is used to extract local and global information and comprises a bottleneck residual module and a deformable self-attention mechanism. The bottleneck residual module consists, in sequence, of a 1 × 1 × 1 convolution with stride 1, a 3 × 3 × 3 depthwise separable convolution, and a 1 × 1 × 1 convolution with stride 1; it extracts local information while keeping the input and output feature map sizes consistent. The first 1 × 1 × 1 convolution reduces the channel dimension and the second 1 × 1 × 1 convolution restores it.
The bottleneck residual module can be expressed as:
Bottleneck(X)=Conv(F(Conv(X))) (2)
F(X)=DWConv(X)+X (3)
where Conv() denotes convolution and DWConv() denotes depthwise separable convolution. The first Conv() reduces the channel dimension to 1/R of the original (R = 4 in practice), and the second Conv() restores the reduced dimension, thereby reducing the number of parameters; see the sketch after this paragraph. In addition, BatchNorm normalization and the GELU activation function are omitted from the formulas.
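A minimal PyTorch sketch of this bottleneck residual module, assuming R = 4 and a channel-first (N, C, D, H, W) layout; the normalization and activation placement are assumptions:

import torch
import torch.nn as nn

class BottleneckResidual(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Sequential(nn.Conv3d(channels, mid, 1),
                                    nn.BatchNorm3d(mid), nn.GELU())
        # Depthwise 3x3x3 convolution: groups == channels.
        self.dwconv = nn.Conv3d(mid, mid, 3, padding=1, groups=mid)
        self.expand = nn.Conv3d(mid, channels, 1)

    def forward(self, x):
        y = self.reduce(x)       # Conv(): C -> C/R
        y = self.dwconv(y) + y   # F(X) = DWConv(X) + X
        return self.expand(y)    # Conv(): C/R -> C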
The deformable self-attention mechanism is a variant of self-attention that captures global dependencies without a large computational cost; given an input of size H × W × D × C, its output is also H × W × D × C, so the input and output dimensions are consistent.
As shown in fig. 4, specifically: for an input X, a set of grid points P ∈ H_g × W_g × D_g × 3 is first generated. The query Q learns the offsets ΔP of the grid points P through the Offset network (to prevent ΔP from becoming too large, its magnitude is controlled with a learnable scale parameter s and tanh()); sampling points X_sampled are then obtained from the original feature map X by bilinear interpolation. Finally, the keys K and values V are generated from the sampling points, self-attention is computed together with the query Q, and the output X_o is produced.
This can be expressed as:
Q = X·W_q    (4)
ΔP = s·tanh(Offset(Q))    (5)
X_sampled = BI(X, P + ΔP)    (6)
K = X_sampled·W_k;  V = X_sampled·W_v    (7)
Z = softmax(Q·K^T / √d)·V    (8)
X_o = Z·W_o    (9)
where Offset() is an offset-learning network consisting of a depthwise separable convolution and a 1 × 1 × 1 convolution, used to generate the offsets ΔP of size H_g × W_g × D_g × 3, and BI denotes bilinear interpolation. A simplified sketch follows.
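Below is a simplified, single-head PyTorch 2.x sketch of this deformable self-attention. The grid size, the offset scale s, and the pooling used to feed the offset network are illustrative assumptions (the formulas above derive the offsets from Q directly); multi-head splitting and normalization layers are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSelfAttention3D(nn.Module):
    def __init__(self, dim, grid_size=(8, 8, 8), offset_scale=2.0):
        super().__init__()
        self.grid_size, self.s = grid_size, offset_scale
        self.w_q = nn.Linear(dim, dim)
        self.w_k = nn.Linear(dim, dim)
        self.w_v = nn.Linear(dim, dim)
        self.w_o = nn.Linear(dim, dim)
        # Offset network: depthwise separable conv, then 1x1x1 conv -> 3 offsets.
        self.offset = nn.Sequential(
            nn.Conv3d(dim, dim, 3, padding=1, groups=dim),
            nn.GELU(),
            nn.Conv3d(dim, 3, 1),
        )

    def forward(self, x):
        # x: (N, C, D, H, W); output has the same shape.
        n, c, d, h, w = x.shape
        q = self.w_q(x.flatten(2).transpose(1, 2))          # Q = X W_q, (N, DHW, C)

        # Reference grid P in normalized [-1, 1] coordinates, order (x, y, z).
        gd, gh, gw = self.grid_size
        zs, ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, gd, device=x.device),
            torch.linspace(-1, 1, gh, device=x.device),
            torch.linspace(-1, 1, gw, device=x.device), indexing="ij")
        p = torch.stack([xs, ys, zs], dim=-1).expand(n, gd, gh, gw, 3)

        # Delta P = s * tanh(Offset(.)); here predicted from a pooled feature map
        # for brevity, rather than from the projected queries.
        q_map = F.adaptive_avg_pool3d(x, self.grid_size)
        dp = self.s * torch.tanh(self.offset(q_map))        # (N, 3, gd, gh, gw)
        dp = dp.permute(0, 2, 3, 4, 1)                      # (N, gd, gh, gw, 3)

        # X_sampled = BI(X, P + Delta P) via (tri)linear grid sampling.
        x_sampled = F.grid_sample(x, p + dp, align_corners=True)
        kv = x_sampled.flatten(2).transpose(1, 2)           # (N, G, C)
        k, v = self.w_k(kv), self.w_v(kv)

        z = F.scaled_dot_product_attention(q, k, v)         # softmax(QK^T/sqrt(d))V
        return self.w_o(z).transpose(1, 2).reshape(n, c, d, h, w)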
The down-sampling layer and the up-sampling layer are, respectively, a 2 × 2 × 2 convolution with stride 2 and a 2 × 2 × 2 deconvolution with stride 2. Starting from the H/4 × W/4 × D/4 × C feature map produced by the Embedding layer, the cascaded down-sampling layers progressively halve each spatial dimension of the feature map, and the cascaded up-sampling layers progressively restore it to the H/4 × W/4 × D/4 × C size.
The expansion layer applies a 4 × 4 × 4 deconvolution with stride 4 to restore the feature map from H/4 × W/4 × D/4 × C to the original image size H × W × D × K, where K is the number of final segmentation classes.
Step S3: adopting a Dice loss function for the U-Net architecture model constructed in step S2;
in this embodiment, the Dice loss function is implemented by the following formula:
L_Dice = 1 − 2|A ∩ B| / (|A| + |B|)
where |A ∩ B| denotes the intersection of A and B, and |A| and |B| denote the number of elements of A and B, respectively.
During training, the Dice loss function measures the degree of difference between the prediction and the ground truth, and the network weights are updated to bring the prediction closer to it; a minimal implementation sketch is given below.
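A common soft-Dice implementation consistent with the formula above, assuming softmax class probabilities and a one-hot ground truth; the smoothing term eps is an implementation detail not given in the patent:

import torch

def dice_loss(pred, target, eps=1e-5):
    # pred, target: (N, K, H, W, D); pred holds per-class probabilities.
    dims = (0, 2, 3, 4)
    intersection = (pred * target).sum(dims)           # |A ∩ B| per class
    cardinality = pred.sum(dims) + target.sum(dims)    # |A| + |B| per class
    dice = (2 * intersection + eps) / (cardinality + eps)
    return 1 - dice.mean()                             # averaged over K classes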
Step S4: training the U-Net architecture model constructed in step S2 with the Adam optimization algorithm;
in this embodiment, the learning rate of the Adam optimization algorithm is set to 1e-4, the training round is 800, and the size of the batch size is 4. The overfitting was mitigated with L2 regularization, with the weight decay set to 1e-5. In addition, a cosine learning rate attenuation strategy is adopted, and the optimal solution can be achieved in the training process.
Step S5: measuring the segmentation accuracy with two indices, namely the Dice score and the 95% Hausdorff distance.
In this embodiment, the Dice score index is:
Dice(A, B) = 2|A ∩ B| / (|A| + |B|)
and the 95% Hausdorff distance index is:
HD95(A, B) = max( P95{ min_{b∈B} distance(a, b) : a ∈ A }, P95{ min_{a∈A} distance(a, b) : b ∈ B } )
where P95 denotes the 95th percentile.
Here distance(a, b) is the distance between points a and b, and A and B are two irregular regions. Both metrics can be computed as sketched below.
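A NumPy/SciPy sketch of both metrics for non-empty binary masks; it materializes the full pairwise distance matrix, so it is suitable only for modest region sizes:

import numpy as np
from scipy.spatial.distance import cdist

def dice_score(a, b):
    # Dice = 2|A ∩ B| / (|A| + |B|) for boolean masks a, b.
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hd95(a, b):
    # 95th percentile of directed point-to-set distances, symmetrized.
    pa, pb = np.argwhere(a), np.argwhere(b)       # voxel coordinates of each region
    dist = cdist(pa, pb)                          # pairwise distances distance(a, b)
    return max(np.percentile(dist.min(axis=1), 95),   # A -> B
               np.percentile(dist.min(axis=0), 95))   # B -> A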
The method applies the deformable self-attention Transformer to the medical image segmentation task for the first time and, combined with the bottleneck residual convolution module, proposes the CaDA block, which preserves the model's local-information extraction capability while enabling it to capture global dependencies. Finally, experiments on the BraTS2020 dataset show that, compared with the classical U-Net architecture model, the proposed model achieves large improvements on both indices in the ET (enhancing tumor), WT (whole tumor), and TC (tumor core) regions, and that, compared with current state-of-the-art methods, it achieves comparable segmentation accuracy at a lower computational cost. The segmentation results are given in Table 1 below:
TABLE 1 Comparison of segmentation results
The above description covers preferred embodiments of the present invention and should not be taken as limiting the invention; rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (10)

1. A medical image segmentation method based on convolution and a deformable self-attention mechanism, characterized in that it comprises the following steps:
Step S1: preprocessing and augmenting the CT images;
Step S2: constructing a U-Net architecture model, Med-CaDA, based on convolution and a deformable self-attention mechanism;
Step S3: adopting a Dice loss function for the U-Net architecture model constructed in step S2;
Step S4: training the U-Net architecture model constructed in step S2 with the Adam optimization algorithm;
Step S5: measuring the segmentation accuracy with two indices, the Dice score and the 95% Hausdorff distance.
2. A method for medical image segmentation based on convolution and a deformable self-attention mechanism as claimed in claim 1, wherein: in step S1, the preprocessing is standardized using the Z-score method, and the data augmentation comprises padding, random cropping, random flipping, and random intensity shifting.
3. A method for medical image segmentation based on convolution and a deformable self-attention mechanism as claimed in claim 1, wherein: in step S2, the U-Net architecture model is composed of an encoder, a decoder, and skip connections; the encoder comprises an Embedding layer, CaDA blocks composed of convolution and a deformable self-attention mechanism, and down-sampling layers, and the decoder comprises up-sampling layers, CaDA blocks, and an expansion layer.
4. A method for medical image segmentation based on convolution and a deformable self-attention mechanism as claimed in claim 3, wherein: the Embedding layer maps the input image to a high-dimensional space, with an output size of H/4 × W/4 × D/4 × C.
5. A method for medical image segmentation based on convolution and a deformable self-attention mechanism as claimed in claim 3, wherein: the CaDA block is used to extract local and global information and comprises a bottleneck residual module and a deformable self-attention mechanism; the bottleneck residual module is composed of two 1 × 1 × 1 convolutions and a depthwise separable convolution, and is used to extract local information while keeping the input and output feature map sizes consistent; the bottleneck residual module can be expressed as:
Bottleneck(X) = Conv(F(Conv(X)))    (1)
F(X) = DWConv(X) + X    (2)
where Conv() denotes convolution and DWConv() denotes depthwise separable convolution.
6. A method for medical image segmentation based on convolution and deformable self-attention mechanism as claimed in claim 3 wherein: the down-sampling layer and the up-sampling layer are respectively 2 × 2 × 2 convolution and 2 × 2 × 2 deconvolution.
7. A method for medical image segmentation based on convolution and a deformable self-attention mechanism as claimed in claim 4, wherein: the expansion layer applies a 4 × 4 × 4 deconvolution to restore the feature map from H/4 × W/4 × D/4 × C to the original image size H × W × D × K, where K is the number of final segmentation classes.
8. A method for medical image segmentation based on convolution and a deformable self-attention mechanism as claimed in claim 1, wherein: step S3 is implemented by the following formula:
L_Dice = 1 − 2|A ∩ B| / (|A| + |B|)
where |A ∩ B| denotes the intersection of A and B, and |A| and |B| denote the number of elements of A and B, respectively.
9. A method for medical image segmentation based on convolution and a deformable self-attention mechanism as claimed in claim 1, wherein: in step S4, the learning rate of the Adam optimization algorithm is set to 1e-4.
10. A method for medical image segmentation based on convolution and a deformable self-attention mechanism as claimed in claim 1, wherein: the Dice score index of step S5 is:
Dice(A, B) = 2|A ∩ B| / (|A| + |B|)
and the 95% Hausdorff distance index is:
HD95(A, B) = max( P95{ min_{b∈B} distance(a, b) : a ∈ A }, P95{ min_{a∈A} distance(a, b) : b ∈ B } )
where distance(a, b) is the distance between points a and b, P95 denotes the 95th percentile, and A and B are two irregular regions.
CN202211422579.1A 2022-11-14 2022-11-14 Medical image segmentation method based on convolution and deformable self-attention mechanism Pending CN115661462A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211422579.1A CN115661462A (en) 2022-11-14 2022-11-14 Medical image segmentation method based on convolution and deformable self-attention mechanism

Publications (1)

Publication Number Publication Date
CN115661462A (en) 2023-01-31

Family

ID=85020547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211422579.1A Pending CN115661462A (en) 2022-11-14 2022-11-14 Medical image segmentation method based on convolution and deformable self-attention mechanism

Country Status (1)

Country Link
CN (1) CN115661462A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030260A (en) * 2023-03-27 2023-04-28 湖南大学 Surgical whole-scene semantic segmentation method based on long-strip convolution attention



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination