CN117274147A - Lung CT image segmentation method based on mixed Swin Transformer U-Net - Google Patents
Lung CT image segmentation method based on mixed Swin Transformer U-Net
- Publication number
- CN117274147A CN117274147A CN202211412454.0A CN202211412454A CN117274147A CN 117274147 A CN117274147 A CN 117274147A CN 202211412454 A CN202211412454 A CN 202211412454A CN 117274147 A CN117274147 A CN 117274147A
- Authority
- CN
- China
- Prior art keywords
- swin
- module
- lung
- image
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30061—Lung
Abstract
The invention relates to a lung CT image segmentation method based on a hybrid Swin Transformer U-Net. The method comprises data preprocessing and data enhancement; construction of the segmentation model HySwinUNet; setting a training strategy and a loss function and training the model; and verifying the trained model. The HySwinUNet model combines convolution with the Transformer and adds a pre-activation residual module, exploiting the inductive bias of convolutions on images to avoid large-scale pre-training; during the forward and backward propagation of the network, any module can pass information directly to any other module, which reduces the training burden and lets the network train better. An adaptive attention module integrates two attention mechanisms to acquire multi-scale global features, raising the weight ratio of target-region features. The invention combines the Swin Transformer and U-Net to enhance the functionality and flexibility of the traditional encoder-decoder architecture, realizes automatic segmentation of lung-infection regions in lung CT, and can accurately segment infected areas from CT images.
Description
Technical Field
The invention belongs to the technical field of image segmentation, and relates to a lung CT image segmentation method based on a hybrid Swin Transformer U-Net.
Background
Medical images play a critical role in helping clinicians diagnose and treat patients. The study of medical images has traditionally depended on the radiologist's visual interpretation, which typically takes a great deal of time and is highly subjective, varying with the radiologist's experience. To overcome these limitations, computer-aided systems have become necessary. Automated medical image segmentation plays an important role in medical imaging applications, with wide use in diagnosis, pathology localization, the study of anatomical structure, treatment planning, computer-integrated surgery, and other fields. However, the variability and complexity of human anatomy mean that medical image segmentation remains a difficult problem.
The current standard for diagnosing COVID-19 is the real-time reverse transcription polymerase chain reaction (RT-PCR) swab test. However, RT-PCR results take several hours to process, and the assay's false-negative rate is high, often requiring repeated testing. Compared with RT-PCR, chest computed tomography (CT) imaging enables efficient screening for COVID-19, with high sensitivity and ease of use in a clinical setting.
Applying deep learning to medical diagnosis can improve the detection rate and efficiency for many diseases, and it has had great success in medical image recognition. To diagnose lung cancer, lung tumors, and lung nodules, many researchers have studied deep-learning-based lung CT image recognition, and CT image recognition has proven very useful for diagnosing lung diseases. If infected lung areas can be accurately segmented from CT images, this is of great value for the quantification and diagnosis of lung diseases, including COVID-19.
However, accurate segmentation of lung infection lesions on CT images remains challenging for two reasons: 1. on CT images, infection boundaries are irregular, vary in size and shape, and appear blurred with low contrast, so small ground-glass lesions are easily missed or the infection is over-segmented; 2. labeled datasets are scarce, and large-scale infection annotations by clinicians are not readily available.
Disclosure of Invention
The invention aims to provide a lung CT image segmentation method based on a hybrid Swin Transformer U-Net that accurately segments lung infection areas from CT images.
The method specifically comprises the following steps:
step one, data preprocessing and data enhancement:
collecting a large number of public lung-infection CT images, performing data enhancement to expand the number of samples, and normalizing the images, which then serve as the model's training set; the data enhancement specifically applies random cropping, flipping, rotation, scaling, and translation to the images.
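As a minimal sketch of the enhancement and normalization in step one (the patent does not fix the exact parameters, so the flip probability, the 90-degree rotation steps, and the zero-mean/unit-variance normalization below are assumptions for illustration):

```python
import numpy as np

def augment_and_normalize(img, rng):
    """Randomly flip/rotate a square CT slice, then normalize it.

    Toy stand-ins for the augmentations named in step one; cropping,
    scaling and translation are omitted for brevity.
    """
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)                 # random horizontal flip
    img = np.rot90(img, k=int(rng.integers(0, 4)))  # rotation by a multiple of 90 degrees
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)  # zero mean, unit variance

rng = np.random.default_rng(0)
ct_slice = rng.integers(0, 255, size=(64, 64))      # a fake 64x64 CT slice
out = augment_and_normalize(ct_slice, rng)
```

Rotations are restricted to multiples of 90 degrees so the slice shape is preserved without interpolation; arbitrary-angle rotation would need a resampling library.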
Step two, constructing a segmentation model HySwinUNet:
the segmentation model HySwinUNet is built on the encoder-decoder structure of U-Net and comprises an encoder, an adaptive attention module, a decoder, and skip connections; the basic unit of HySwinUNet is the Swin Transformer block (Swin Transformer Block), and the Swin Transformer serves as the backbone network of the U-Net;
in the encoder, the input image is first divided into 4×4 patches by patch partition (Patch Partition); after linear embedding (Linear Embedding), the dimension of each token vector becomes a preset value C; the feature map of dimension C and resolution H/4 × W/4 is fed into two successive Swin Transformer blocks for representation learning, during which feature dimension and resolution remain unchanged; the Swin Transformer block is responsible for feature representation learning, after which patch merging (Patch Merging) downsamples and increases the dimension, halving the spatial size and doubling the feature dimension, so that a hierarchical design is formed; this procedure is repeated three times in the encoder, and a pre-activation residual block (PRB) is traversed first during each layer's propagation;
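The shape bookkeeping of patch partition and patch merging can be traced with plain array reshapes; the learned linear embedding to dimension C and the merging's linear reduction to 2C are omitted, and the 224×224 input size is an assumption for illustration:

```python
import numpy as np

def patch_partition(img, patch=4):
    """Split an H x W x ch image into non-overlapping patch x patch tiles and
    flatten each tile into one token, as in the encoder's Patch Partition."""
    H, W, ch = img.shape
    t = img.reshape(H // patch, patch, W // patch, patch, ch)
    return t.transpose(0, 2, 1, 3, 4).reshape(H // patch, W // patch, patch * patch * ch)

def patch_merging(x):
    """Concatenate each 2x2 neighbourhood of tokens: halves the resolution and
    quadruples the channels (a learned linear layer would then halve them to 2C)."""
    h, w, c = x.shape
    m = x.reshape(h // 2, 2, w // 2, 2, c).transpose(0, 2, 1, 3, 4)
    return m.reshape(h // 2, w // 2, 4 * c)

img = np.zeros((224, 224, 1))      # assumed single-channel CT input
tok = patch_partition(img)         # resolution H/4 x W/4
merged = patch_merging(tok)        # spatial size halved, channels x4
```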
during encoding, an adaptive dual-attention module (ADM) is adopted to locate the feature information of the region of interest (RoI) and suppress uncorrelated regional features, so that feature information is extracted effectively and lesion regions are segmented more accurately; this raises the weight ratio of target-region features and improves the network's segmentation precision;
a symmetric decoder is constructed from Swin Transformer blocks; upsampling reshapes feature maps of adjacent stages into higher-resolution feature maps and correspondingly halves the feature dimension; the extracted context features are fused with the encoder's multi-scale features through skip connections to compensate for the spatial information lost during downsampling and recover valuable spatial detail.
A pre-activation residual block is placed at the entrance of the encoding stage and the exit of the decoding stage; it initializes the Transformer with a convolutional network, using convolutional layers to extract local intensity features and thereby sparing the Transformer large-scale pre-training, which makes the Swin Transformer easier to train. Since misclassified regions typically lie on the boundary of the region of interest (RoI), high-resolution context information plays a critical role in segmentation. The block applies two successive stages of batch normalization (BN), ReLU activation, and convolution (Conv), then performs element-wise addition with the original input; the convolution is applied after the ReLU layer and does not change the dimension or resolution of the feature map. The pre-activation residual block lets information flow more smoothly during the forward and backward propagation of the network;
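A toy single-channel sketch of the BN → ReLU → Conv, BN → ReLU → Conv, add structure of the pre-activation residual block (the per-map normalization and the hand-rolled 3×3 "same" convolution are assumed stand-ins for the learned layers):

```python
import numpy as np

def bn_relu_conv(x, w):
    """One BN -> ReLU -> 3x3 Conv stage (single channel, 'same' padding)."""
    x = (x - x.mean()) / (x.std() + 1e-8)   # toy per-map batch norm
    x = np.maximum(x, 0.0)                  # ReLU
    pad = np.pad(x, 1)
    out = np.zeros_like(x)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = (pad[i:i + 3, j:j + 3] * w).sum()
    return out

def pre_activation_residual_block(x, w1, w2):
    """Two successive BN-ReLU-Conv stages, then element-wise addition with the
    original input; resolution and dimension are unchanged, as described above."""
    return x + bn_relu_conv(bn_relu_conv(x, w1), w2)

rng = np.random.default_rng(3)
x = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3)) * 0.1
y = pre_activation_residual_block(x, w, w)
```

Because normalization and activation precede the convolution, the skip path is a pure identity and gradients reach earlier layers unscaled, which is the reason this "pre-activation" ordering trains more smoothly.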
the Swin converter modules are constructed based on moving windows, comprising two successive Swin converters, each Swin converter comprising a multi-headed self-attention Module (MSA) and a multi-layer perceptron (MLP), further employing a Layer Normalization (LN) layer prior to each MSA module and MLP module; on the basis of the multi-head self-attention module, the Swin converter provides a window-based multi-head self-attention module (W-MSA) and a moving window-based multi-head self-attention module (SW-MSA), and the calculation formula is as follows:
ẑ^l = W-MSA(LN(z^{l−1})) + z^{l−1}
z^l = MLP(LN(ẑ^l)) + ẑ^l
ẑ^{l+1} = SW-MSA(LN(z^l)) + z^l
z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1}
where ẑ^l and z^l denote the outputs of the W-MSA module and the MLP of layer l, and ẑ^{l+1} and z^{l+1} denote the outputs of the SW-MSA module and the MLP of layer l+1; the self-attention of W-MSA and SW-MSA is computed as
Attention(Q, K, V) = SoftMax(QK^T/√d + B)V
where Q, K, V ∈ R^{M²×d} are the query, key, and value matrices; M² is the number of patches in a window and d is the dimension of the query or key; the values of B are taken from the bias matrix B̂ ∈ R^{(2M−1)×(2M−1)}.
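The windowed self-attention formula can be exercised numerically; for brevity Q, K, and V are taken equal to the window's tokens rather than learned projections (an assumption), and the relative position bias B is random:

```python
import numpy as np

def window_attention(x, B, d):
    """Attention(Q, K, V) = SoftMax(QK^T / sqrt(d) + B) V inside one window."""
    Q = K = V = x                                          # (M*M, d) tokens
    scores = Q @ K.T / np.sqrt(d) + B                      # (M*M, M*M) with bias B
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)         # row-wise SoftMax
    return attn @ V                                        # convex mix of values

M, d = 7, 32                                               # 7x7 window, head dim 32
rng = np.random.default_rng(1)
x = rng.standard_normal((M * M, d))
B = rng.standard_normal((M * M, M * M)) * 0.01
out = window_attention(x, B, d)
```

Because each SoftMax row is non-negative and sums to one, every output token is a convex combination of the value vectors, so the output stays within the range of the inputs.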
The input channel of the adaptive dual-attention module (ADM) is passed through 3×3 convolutions with dilation rates of 1 and 3, and the results are combined into a dual-attention input so that two different attention mechanisms discover different global information; from the C×H×W feature matrix, channel statistics are obtained by global average pooling (Global Average Pool) and pixel-wise correlation (Pixel-wise Correlation), combined by a concatenation operation (Concat), and then normalized with a Sigmoid function; fully connected layers (FC) then generate further non-linear features; finally, a Softmax operation is applied over the channels, and this cross-channel attention adaptively selects receptive fields of different sizes.
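A sketch of the two channel descriptors the module combines; the patent does not give the exact pixel-wise correlation formula, so the squared-mean statistic below is an assumed stand-in, and the convolutions and FC layers are omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dual_channel_descriptors(feat):
    """Two per-channel statistics in the spirit of the dual-attention input:
    global average pooling summarizes overall channel response, while a
    pixel-wise statistic (assumed: mean of the squared map) reacts to local
    intensity structure; both are squashed with Sigmoid and concatenated."""
    gap = feat.mean(axis=(1, 2))            # (C,) global average pool
    pwc = (feat ** 2).mean(axis=(1, 2))     # (C,) pixel-wise statistic (assumption)
    return np.concatenate([sigmoid(gap), sigmoid(pwc)])  # (2C,) in (0, 1)

rng = np.random.default_rng(2)
feat = rng.standard_normal((8, 16, 16))     # C=8 feature maps of 16x16
desc = dual_channel_descriptors(feat)
```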
Step three, setting a training strategy and a loss function, and training the model;
dividing the preprocessed dataset into a training set, a test set, and a verification set; adopting random initialization and the Adam optimization algorithm; setting the batch size, the number of epochs, and an appropriate learning rate, while adopting a regularization strategy to prevent overfitting; updating the weights and biases of the segmentation model HySwinUNet with the back-propagation algorithm; and updating the parameters during training iterations using the loss function;
step four, verifying the trained network model: the verification set is fed into the trained segmentation model HySwinUNet, whose output segments the lesion region in each lung CT image; the model is evaluated by comparing expert-segmented CT images with the images segmented by the trained network;
after verification, any lung CT image is input into a segmentation model HySwinUNet, and a lung CT image of the segmented focus is output.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
the invention effectively combines the Swin transducer and the U-Net to enhance the functionality and flexibility of the traditional encoder-decoder architecture, is applied to the field of medical image segmentation, realizes automatic segmentation of lung CT lung infection parts, and can accurately segment lung infection areas from CT images.
Because the Transformer has no inductive bias for images, it performs poorly on small-scale datasets; even pre-trained on ImageNet, a Transformer may not perform as well as a residual network. The HySwinUNet model combines convolution with the Transformer and adds a pre-activation residual block (PRB); it exploits the inductive bias of convolutions on images to avoid large-scale pre-training, and during the forward and backward propagation of the network any module can pass information directly to any other module, reducing the training burden and letting the network train better;
An adaptive dual-attention module integrates two attention mechanisms to acquire multi-scale global features, raising the weight ratio of target-region features so that the irregular, low-contrast lesion regions of novel coronavirus infection in CT images are segmented more accurately. On CT images, infected areas may have discontinuous boundaries and irregular shapes, and the images may be blurred with low contrast; channel and pixel information is key to acquiring representative features of the region of interest, so the adaptive dual-attention module (Adaptive Dual-attention Module) is used to extract feature information and raise the weight ratio of target-region features.
Drawings
Fig. 1 is a block diagram of the HySwinUNet network of the invention;
FIG. 2 is a block diagram of the pre-activation residual block of the invention;
FIG. 3 illustrates the Swin Transformer block;
Fig. 4 illustrates the adaptive attention module.
Detailed Description
The invention is further described with reference to the accompanying drawings:
a lung CT image segmentation method based on a mixture Swin Transformer U-Net specifically comprises the following steps:
step one, data preprocessing and data enhancement;
in this example, novel-coronavirus-infection CT images collected by the Italian Society of Medical and Interventional Radiology (SIRM) and those in the MosMedData dataset are used to train the model; random cropping, flipping, rotation, scaling, translation, and similar operations enlarge the dataset, increase the number of training samples, and improve the robustness of the model; finally, the images are normalized;
step two, constructing a segmentation model HySwinUNet:
as shown in fig. 1, the segmentation model HySwinUNet is a U-Net-based encoder-decoder architecture comprising an encoder, an adaptive attention module, a decoder, and skip connections; the basic unit of HySwinUNet is the Swin Transformer block (Swin Transformer Block), with the Swin Transformer as the backbone network of the U-Net;
for the encoder, the input image is first divided into 4×4 patches (Patch) by patch partition (Patch Partition); after linear embedding (Linear Embedding), the dimension of each token vector becomes a preset value C; the feature map of dimension C and resolution H/4 × W/4 is fed into two successive Swin Transformer blocks for representation learning, during which both feature dimension and resolution remain unchanged; the Swin Transformer block is responsible for feature representation learning, while patch merging (Patch Merging) downsamples and increases the dimension, halving the spatial size and doubling the feature dimension, thereby forming a hierarchical design; during each layer's propagation a pre-activation residual block (Pre-activation Residual Block) is traversed first; this procedure is repeated three times in the encoder;
during encoding, in order to extract feature information effectively and segment lesion regions more accurately, an adaptive dual-attention module (Adaptive Dual-attention Module) is adopted to locate the feature information of the region of interest (RoI) and suppress uncorrelated regional features, raising the weight ratio of target-region features and improving segmentation precision;
for the decoder, a symmetric decoder is constructed from Swin Transformer blocks; upsampling reshapes feature maps of adjacent stages into higher-resolution feature maps and correspondingly halves the feature dimension; the extracted context features are fused with the encoder's multi-scale features through skip connections to compensate for the spatial information lost during downsampling and recover valuable spatial detail;
as shown in fig. 2, the Transformer has no inductive bias for images and therefore performs poorly on small-scale datasets; even pre-trained on ImageNet it may not perform as well as a residual network; the HySwinUNet model combines convolution with the Transformer: a pre-activation residual block (Pre-activation Residual Block) initializes the Transformer with a convolutional network, using convolutional layers to extract local intensity features and thereby sparing the Transformer large-scale pre-training, which makes the Swin Transformer easier to train; since misclassified regions typically lie on the boundary of the region of interest (RoI), high-resolution context information plays a critical role in segmentation; the block applies two successive stages of batch normalization (BN), ReLU activation, and convolution (Conv), then performs element-wise addition with the original input; the convolution is applied after the ReLU layer and does not change the dimension or resolution of the feature map; the pre-activation residual block lets information flow more smoothly during the forward and backward propagation of the network; pre-activation residual blocks are adopted at both the entrance of the encoding stage and the exit of the decoding stage;
as shown in fig. 3, unlike a conventional multi-head self-attention module (MSA), the Swin Transformer block is constructed around shifted windows and consists of two successive Swin Transformer layers, each composed of a multi-head self-attention module (Multi-head Self-attention Module) and a multi-layer perceptron (MLP), with a layer normalization (LN) layer applied before each MSA and MLP module; building on multi-head self-attention, the Swin Transformer provides a window-based multi-head self-attention module (W-MSA) and a shifted-window-based multi-head self-attention module (SW-MSA), computed as follows:
ẑ^l = W-MSA(LN(z^{l−1})) + z^{l−1}
z^l = MLP(LN(ẑ^l)) + ẑ^l
ẑ^{l+1} = SW-MSA(LN(z^l)) + z^l
z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1}
where ẑ^l and z^l denote the outputs of the W-MSA module and the MLP of layer l, and ẑ^{l+1} and z^{l+1} denote the outputs of the SW-MSA module and the MLP of layer l+1; the self-attention of W-MSA and SW-MSA is
Attention(Q, K, V) = SoftMax(QK^T/√d + B)V
where Q, K, V ∈ R^{M²×d} are the query, key, and value matrices; M² is the number of patches in a window and d is the dimension of the query or key; the values of B are taken from the bias matrix B̂ ∈ R^{(2M−1)×(2M−1)};
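The "shifted window" in SW-MSA is commonly realized as a cyclic shift of the feature map before it is re-partitioned into windows, so that tokens near old window borders can attend to each other; this detail follows the original Swin Transformer design, which the patent does not spell out:

```python
import numpy as np

def cyclic_shift(x, shift):
    """Cyclically shift a 2-D token map by `shift` along both spatial axes;
    np.roll reproduces the shift applied before window re-partitioning in
    SW-MSA, and the inverse shift restores the map afterwards."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))

x = np.arange(16).reshape(4, 4)   # a toy 4x4 token map
shifted = cyclic_shift(x, 2)      # shift by half a (window) size of 4
restored = cyclic_shift(shifted, -2)
```

After the shift, tokens that sat on opposite sides of a window boundary fall into the same window; in the full model a mask keeps wrapped-around tokens from attending across the image edge.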
The infected area on a lung CT image may have discontinuous boundaries and irregular shapes, and the image may be blurred with low contrast; channel and pixel information is key to acquiring representative features of the region of interest; therefore an adaptive dual-attention module (Adaptive Dual-attention Module) is used to extract more comprehensive and discriminative feature information and raise the weight ratio of target-region features so as to identify lesion boundaries; the module captures the discontinuous boundaries of a lung CT lesion with global average pooling and handles shape irregularity with a separate pixel-wise correlation;
the structure of the adaptive attention module is shown in fig. 4; the input channel is passed through 3×3 convolutions with dilation rates (dilation rate) of 1 and 3 respectively, and the results are combined into a dual-attention input so that two different attention mechanisms discover different global information; from the C×H×W feature matrix, channel statistics are obtained by global average pooling (Global Average Pool) and pixel-wise correlation (Pixel-wise Correlation), combined by a concatenation operation (Concat), and then normalized with a Sigmoid function; fully connected layers (FC) then generate further non-linear features; finally, a Softmax operation is applied over the channels and a feature map of the original size is output; this cross-channel attention adaptively selects receptive fields of different sizes;
step three, setting a training strategy and a loss function, and training the model;
dividing the preprocessed dataset sequentially into a training set, a test set, and a verification set in the ratio 5:3:2; adopting random initialization and the Adam optimization algorithm; setting the batch size, the number of epochs, and an appropriate learning rate, while adopting a regularization strategy to prevent overfitting; updating the weights and biases of the HySwinUNet network model with the back-propagation algorithm; and updating the parameters during training iterations using the loss function;
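The 5:3:2 split can be sketched as follows (the shuffle and seed are implementation assumptions):

```python
import numpy as np

def split_dataset(n, seed=0):
    """Shuffle sample indices and split them 5:3:2 into
    train / test / verification subsets, as specified in step three."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = n * 5 // 10
    n_test = n * 3 // 10
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

train, test, val = split_dataset(100)
```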
the HySwinUNet network model is trained according to the set training strategy; in the training stage, HySwinUNet is trained end-to-end using an objective function, with the loss function updating the parameters at each iteration; for the loss function, all networks are trained with a combination of the Dice loss (Dice Loss) and the binary cross-entropy loss (Binary Cross Entropy Loss), so the loss function is
Loss = αL_Dice + βL_BCE
L_Dice = 1 − 2Σ_i y_i ŷ_i / (Σ_i y_i + Σ_i ŷ_i)
L_BCE = −Σ_i [y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i)]
where y_i is the true label of sample i and ŷ_i is the predicted probability for sample i; L_Dice and L_BCE denote the Dice loss and the binary cross-entropy loss respectively; Loss is the final loss function, combining the Dice loss and the binary cross-entropy loss in one term; giving L_Dice the larger weight handles the class-imbalance problem better; α is set to 0.9 and β to 0.1.
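The combined loss with α = 0.9 and β = 0.1 can be sketched as follows (the clipping epsilon and the mean-reduced BCE are implementation assumptions):

```python
import numpy as np

def combined_loss(y_true, y_pred, alpha=0.9, beta=0.1, eps=1e-7):
    """Loss = alpha * Dice loss + beta * binary cross-entropy,
    with the weights alpha = 0.9 and beta = 0.1 from the patent."""
    y_pred = np.clip(y_pred, eps, 1 - eps)          # avoid log(0)
    dice = 1 - (2 * (y_true * y_pred).sum() + eps) / (y_true.sum() + y_pred.sum() + eps)
    bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return alpha * dice + beta * bce

y = np.array([1.0, 1.0, 0.0, 0.0])
perfect = combined_loss(y, np.array([1.0, 1.0, 0.0, 0.0]))  # near zero
bad = combined_loss(y, np.array([0.0, 0.0, 1.0, 1.0]))      # large
```

The Dice term directly rewards overlap with the (typically small) lesion region, which is why it receives the larger weight when foreground and background are imbalanced.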
Step 4, verifying the trained network model;
the verification set is fed into the trained HySwinUNet network model, whose output segments the lesion region in each lung CT image; the model is evaluated by comparing expert-segmented CT images with the images segmented by the trained network;
four widely adopted evaluation criteria are used to measure the performance of the HySwinUNet model; the evaluation indices are as follows:
dice similarity coefficient (Dice similarity coefficient): DSC is used to measure the similarity between predicted pulmonary infections and facts, where V Seg Represents the region divided by the model algorithm, V GT Representing a real segmentation area; TP, TN, FP, FN each represents a true positive, a true negative, a false positive, and a false negative;
sensitivity (Sensitivity): SEN represents the percentage of correctly segmented lung infections;
specificity (Specificity): SPE represents the percentage of non-infected areas that are correctly segmented;
positive predictive value (Precision): PRE = TP / (TP + FP), the accuracy of the segmentation of the lung infection area.
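The four evaluation indices can be computed from the TP/TN/FP/FN counts in a few lines; a NumPy sketch over binary masks (the function name is illustrative):

```python
import numpy as np

def segmentation_metrics(gt, pred):
    """Compute DSC, SEN, SPE and PRE from binary ground-truth and
    predicted segmentation masks (any shape)."""
    gt = np.asarray(gt).astype(bool)
    pred = np.asarray(pred).astype(bool)
    tp = np.sum(pred & gt)     # infected, predicted infected
    tn = np.sum(~pred & ~gt)   # healthy, predicted healthy
    fp = np.sum(pred & ~gt)    # healthy, predicted infected
    fn = np.sum(~pred & gt)    # infected, predicted healthy
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "PRE": tp / (tp + fp),
    }
```

Note that DSC = 2·TP / (2·TP + FP + FN) is the count form of 2|V_Seg ∩ V_GT| / (|V_Seg| + |V_GT|).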
after verification, any lung CT image is input into a segmentation model HySwinUNet, and a lung CT image of the segmented focus is output.
The drawings in the disclosed embodiments of the invention relate only to the structures involved in those embodiments; the above description is only a preferred embodiment of the invention and is applicable to various fields of adaptation, and the invention is therefore not limited to the specific details and illustrations shown and described herein, provided there is no departure from the general concepts defined by the claims and their equivalents.
Claims (5)
1. A lung CT image segmentation method based on a hybrid Swin Transformer U-Net, characterized in that the method specifically comprises the following steps:
step one, data preprocessing and data enhancement:
collecting a large number of public lung infection CT images, performing data enhancement, expanding the number of samples, normalizing the images to serve as a training set of a model, and training the model;
step two, constructing a segmentation model HySwinUNet:
constructing a segmentation model HySwinUNet based on a U-Net encoder-decoder structure, wherein the segmentation model HySwinUNet comprises an encoder, an adaptive attention module, a decoder and jump connection;
in the encoder, the input image is divided into small 4×4 patches by patch partition; after linear embedding, the vector dimension becomes a preset value C; the feature map, of dimension C and resolution H/4 × W/4, is fed into two successive Swin Transformer blocks for representation learning, during which the feature dimension and resolution remain unchanged; the Swin Transformer module is responsible for feature representation learning, and after learning is completed a patch-merging layer performs downsampling and dimension expansion, reducing the spatial size by 1/2 and doubling the feature dimension, thereby forming a hierarchical design; this process is repeated three times in the encoder, with a pre-activation residual module applied in the propagation of each layer;
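A NumPy sketch of how patch partition and patch merging change tensor shapes in the encoder; the random projection stands in for the learned linear layer that maps 4C to 2C (in the real model a learned linear embedding would also map the 48-dimensional raw patches to C):

```python
import numpy as np

def patch_partition(img, patch=4):
    """Split an H×W×3 image into non-overlapping patch×patch tokens."""
    H, W, C = img.shape
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(H // patch, W // patch,
                                           patch * patch * C)
    return x

def patch_merging(x):
    """Concatenate 2×2 neighbouring tokens (halving resolution, giving 4C
    channels), then project 4C -> 2C; a random matrix stands in for the
    learned projection weights."""
    H, W, C = x.shape
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]],
        axis=-1)                                   # (H/2, W/2, 4C)
    W_proj = np.random.default_rng(0).standard_normal((4 * C, 2 * C))
    return merged @ W_proj                         # (H/2, W/2, 2C)

img = np.zeros((224, 224, 3))
tokens = patch_partition(img)   # (56, 56, 48): spatial size /4
merged = patch_merging(tokens)  # (28, 28, 96): spatial size /2, channels ×2
print(tokens.shape, merged.shape)
```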
during encoding, the adaptive attention module is adopted to locate the feature information of the region of interest and suppress uncorrelated region features, so that feature information is effectively extracted and the lesion region is segmented more accurately; this raises the weight of the target-region features and improves the network segmentation precision;
constructing a symmetric decoder based on Swin Transformer blocks; the feature maps of adjacent stages are reshaped into higher-resolution feature maps by upsampling, with the feature dimension correspondingly reduced to half of the original; the extracted context features are fused with the multi-scale features of the encoder through skip connections to compensate for the spatial information lost in downsampling and recover valuable spatial information;
step three, setting a training strategy and a loss function;
dividing the preprocessed data set into a training set, a test set and a verification set; adopting random initialization and the Adam optimization algorithm; setting the batch size, the number of epochs and a suitable learning rate, and adopting a regularization strategy to prevent overfitting; updating the weights and biases in the segmentation model HySwinUNet with the back-propagation algorithm; updating parameters with the loss function during the training iterations;
step four, verifying the trained network model: inputting the verification set into the trained segmentation model HySwinUNet, whose output segments the lesion region in the lung CT image to obtain a segmented image; the model is evaluated by comparing the CT images segmented by experts with the images segmented by the trained network model;
after verification, any lung CT image is input into a segmentation model HySwinUNet, and a lung CT image of the segmented focus is output.
2. The hybrid Swin Transformer U-Net based lung CT image segmentation method of claim 1, wherein the data enhancement specifically comprises: subjecting the images to random cropping, flipping, rotation, scaling and translation.
3. The hybrid Swin Transformer U-Net based lung CT image segmentation method of claim 1, wherein a pre-activation residual module is adopted at the entry of the encoding stage and the exit of the decoding stage; the pre-activation residual module initializes the Transformer as a convolutional network, using convolution layers to extract local intensity features and avoid large-scale pre-training of the Transformer, which makes the Swin Transformer easier to train; since misclassified regions are typically located on the boundary of the region of interest, high-resolution context information plays a critical role in segmentation; the module performs batch normalization (BN), ReLU activation and convolution (Conv) twice in succession, then element-wise addition with the original input; because the Conv operation follows the ReLU layer, the dimension and resolution of the feature map are unchanged, and information flows more smoothly in the forward and backward propagation of the network.
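A minimal single-channel NumPy sketch of the pre-activation residual module of claim 3 (BN → ReLU → Conv, twice, plus the identity); the hand-rolled 3×3 operator and unscaled normalization are simplifications for illustration:

```python
import numpy as np

def bn(x, eps=1e-5):
    # Simplified batch norm: normalize, no learned scale/shift
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    return np.maximum(x, 0)

def conv3x3(x, k):
    # 3×3 'same' cross-correlation (the CNN convention for convolution)
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def pre_activation_residual(x, k1, k2):
    """BN -> ReLU -> Conv applied twice, then element-wise addition with
    the original input; dimension and resolution are unchanged."""
    y = conv3x3(relu(bn(x)), k1)
    y = conv3x3(relu(bn(y)), k2)
    return x + y

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
out = pre_activation_residual(x,
                              rng.standard_normal((3, 3)) * 0.1,
                              rng.standard_normal((3, 3)) * 0.1)
print(out.shape)  # (8, 8): same as the input
```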
4. The hybrid Swin Transformer U-Net based lung CT image segmentation method of claim 1, wherein the Swin Transformer module is constructed on shifted windows and comprises two successive Swin Transformer blocks, each containing a multi-head self-attention module MSA and a multi-layer perceptron MLP, with a layer normalization layer LN applied before each MSA module and each MLP module; on the basis of the multi-head self-attention module, the Swin Transformer provides a window-based multi-head self-attention module W-MSA and a shifted-window-based multi-head self-attention module SW-MSA, computed as follows:
ẑ^l = W-MSA(LN(z^(l-1))) + z^(l-1); z^l = MLP(LN(ẑ^l)) + ẑ^l; ẑ^(l+1) = SW-MSA(LN(z^l)) + z^l; z^(l+1) = MLP(LN(ẑ^(l+1))) + ẑ^(l+1); wherein ẑ^l and z^l denote the outputs of the W-MSA module and the MLP of the l-th layer, respectively; ẑ^(l+1) and z^(l+1) denote the outputs of the SW-MSA module and the MLP of the (l+1)-th layer, respectively; W-MSA and SW-MSA compute self-attention as Attention(Q, K, V) = SoftMax(QKᵀ/√d + B)V, wherein Q, K, V ∈ R^(M²×d) denote the query, key and value matrices; M² and d denote the number of patches in a window and the dimension of the query or key, respectively; the values of B are taken from the bias matrix B̂ ∈ R^((2M-1)×(2M-1)).
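A NumPy sketch of the windowed self-attention formula of claim 4 for a single window and a single head; the random Q, K, V and bias B are stand-ins for learned values (in Swin, B is gathered from the (2M-1)×(2M-1) bias table by relative position, which is omitted here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(Q, K, V, B):
    """Attention(Q, K, V) = SoftMax(Q K^T / sqrt(d) + B) V for one window.
    Q, K, V: (M*M, d); B: (M*M, M*M) relative position bias."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + B
    return softmax(scores) @ V

M, d = 4, 8  # 4×4 window (M² = 16 patches), query/key dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((M * M, d)) for _ in range(3))
B = rng.standard_normal((M * M, M * M)) * 0.01  # stand-in bias values
out = window_attention(Q, K, V, B)
print(out.shape)  # (16, 8): one output token per patch in the window
```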
5. The hybrid Swin Transformer U-Net based lung CT image segmentation method of claim 1, wherein the input channels of the adaptive attention module are combined into a dual-attention input through 3×3 convolutions with dilation rates of 1 and 3, and different global information is captured by two different attention mechanisms; the feature map of size C×H×W is reduced by global average pooling over the pixel-wise correlations to a matrix of size C×1, which after the concatenation operation is normalized with a Sigmoid function; further, more non-linear features are generated with fully connected layers; finally, a Softmax operation applied across the channels allows the cross-channel attention to adaptively select receptive fields of different sizes.
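A hedged NumPy sketch of the channel-selection idea in claim 5: two branch feature maps (as if produced by dilated convolutions with rates 1 and 3) are squeezed to C×1 descriptors by global average pooling, gated with Sigmoid, and a Softmax across the branches selects the receptive field per channel; the dilated convolutions and fully connected layers of the full module are omitted, and the function name is illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_channel_attention(feat_a, feat_b):
    """feat_a, feat_b: (C, H, W) branch features; returns their fusion
    weighted by per-channel, per-branch attention."""
    za = sigmoid(feat_a.mean(axis=(1, 2)))     # (C,) global average pooling
    zb = sigmoid(feat_b.mean(axis=(1, 2)))
    w = softmax(np.stack([za, zb]), axis=0)    # Softmax across the branches
    return w[0][:, None, None] * feat_a + w[1][:, None, None] * feat_b

rng = np.random.default_rng(0)
feat_a = rng.standard_normal((4, 8, 8))  # C=4, H=W=8
feat_b = rng.standard_normal((4, 8, 8))
fused = adaptive_channel_attention(feat_a, feat_b)
print(fused.shape)  # (4, 8, 8)
```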
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211412454.0A CN117274147A (en) | 2022-11-11 | 2022-11-11 | Lung CT image segmentation method based on mixed Swin Transformer U-Net |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117274147A true CN117274147A (en) | 2023-12-22 |
Family
ID=89205088
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211412454.0A Pending CN117274147A (en) | 2022-11-11 | 2022-11-11 | Lung CT image segmentation method based on mixed Swin Transformer U-Net |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117274147A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117876370A (en) * | 2024-03-11 | 2024-04-12 | 南京信息工程大学 | CT image kidney tumor segmentation system based on three-dimensional axial transducer model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||