CN117274147A - Lung CT image segmentation method based on hybrid Swin Transformer U-Net - Google Patents

Lung CT image segmentation method based on hybrid Swin Transformer U-Net

Info

Publication number
CN117274147A
CN117274147A
Authority
CN
China
Prior art keywords
swin
module
lung
image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211412454.0A
Other languages
Chinese (zh)
Inventor
张聚
应长钢
龚伟伟
马栋
上官之博
孙晓燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Normal University
Original Assignee
Hangzhou Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Normal University filed Critical Hangzhou Normal University
Priority to CN202211412454.0A
Publication of CN117274147A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The invention relates to a lung CT image segmentation method based on a hybrid Swin Transformer U-Net. The method comprises: data preprocessing and data enhancement; constructing the segmentation model HySwinUNet; setting a training strategy and a loss function and training the model; and verifying the trained model. The HySwinUNet model combines convolution with the Transformer and adds a pre-activation residual module, exploiting the inductive bias of convolutions on images to avoid large-scale preprocessing; during the network's forward and backward propagation, one module can pass information directly to any other module, which reduces the training burden and lets the network train better. An adaptive dual-attention module integrates two attention mechanisms to acquire multi-scale global features, raising the weight of the target region's features. By combining the Swin Transformer with U-Net, the invention enhances the functionality and flexibility of the traditional encoder-decoder architecture, realizes automatic segmentation of infected regions in lung CT, and can accurately segment lung infection areas from CT images.

Description

Lung CT image segmentation method based on hybrid Swin Transformer U-Net
Technical Field
The invention belongs to the technical field of image segmentation and relates to a lung CT image segmentation method based on a hybrid Swin Transformer U-Net.
Background
Medical images play a critical role in helping healthcare providers diagnose and treat patients. The study of medical images still depends primarily on the radiologist's visual interpretation, which typically takes a great deal of time and is highly subjective, varying with the radiologist's experience. To overcome these limitations, computer-aided systems have become necessary. Automated medical image segmentation plays an important role in medical imaging applications, with wide use in diagnosis, pathology localization, the study of anatomical structure, treatment planning, computer-integrated surgery, and other fields. However, the variability and complexity of human anatomy mean that medical image segmentation remains a difficult problem.
The current standard for diagnosing COVID-19 is the real-time reverse transcription polymerase chain reaction (RT-PCR) swab test. However, RT-PCR results take several hours to process, and the test's false-negative rate is high, so testing often must be repeated. Compared with RT-PCR, chest computed tomography (CT) imaging enables efficient screening for COVID-19, with high sensitivity and ease of use in a clinical setting.
Applying deep learning to medical diagnosis can improve the detection rate and efficiency for disease, and deep learning has seen great success in medical image recognition. To diagnose lung cancer, lung tumors, and lung nodules, many researchers have studied deep-learning-based lung CT image recognition, and CT image recognition has proven very useful for diagnosing lung disease. If the infected lung area can be accurately segmented from CT images, this is critical for the quantification and diagnosis of lung diseases, including COVID-19.
However, accurate segmentation of lung infection lesions on CT images remains a challenging task for two reasons: 1. on CT images, infection boundaries are irregular and vary in size and shape, with blurred appearance and low contrast, which easily causes small ground-glass lesions to be missed or infections to be over-segmented; 2. labeled datasets are scarce, as large-scale infection annotations from clinicians are not readily available.
Disclosure of Invention
The invention aims to provide a lung CT image segmentation method based on a hybrid Swin Transformer U-Net that accurately segments lung infection areas from CT images.
The method specifically comprises the following steps:
step one, data preprocessing and data enhancement:
collecting a large number of public lung-infection CT images, performing data enhancement to expand the number of samples, and normalizing the images; these serve as the training set for the model. The data enhancement specifically applies random cropping, flipping, rotation, scaling, and shifting to the images.
Step two, constructing a segmentation model HySwinUNet:
the segmentation model HySwinUNet is built on the U-Net encoder-decoder structure and comprises an encoder, an adaptive dual-attention module, a decoder, and skip connections. The basic unit of HySwinUNet is the Swin Transformer block, and the Swin Transformer serves as the backbone network of the U-Net;
in the encoder, the input image is first split into 4×4 patches by patch partition; after linear embedding, the dimension of each patch token becomes a preset value C. The resulting features, of dimension C and resolution (H/4)×(W/4), are fed into two successive Swin Transformer blocks for representation learning, during which the feature dimension and resolution remain unchanged. After the Swin Transformer blocks complete representation learning, patch merging downsamples and raises the dimension: the spatial size is halved and the feature dimension is doubled, forming a hierarchical design. This procedure is repeated three times in the encoder, with a pre-activation residual block (PRB) applied during each stage's propagation;
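The patch operations described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the patent's implementation: the module names (PatchEmbed, PatchMerging), the embedding dimension, and the use of a strided convolution for the partition-plus-embedding step are assumptions.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Patch partition + linear embedding: 4x4 patches projected to dim C."""
    def __init__(self, in_ch=1, embed_dim=96):
        super().__init__()
        # A 4x4 convolution with stride 4 performs the patch split and the
        # linear embedding in one step.
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=4, stride=4)

    def forward(self, x):                    # x: (B, 1, H, W)
        x = self.proj(x)                     # (B, C, H/4, W/4)
        return x.flatten(2).transpose(1, 2)  # (B, (H/4)*(W/4), C)

class PatchMerging(nn.Module):
    """Downsample 2x and double the feature dimension (one hierarchy stage)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x, H, W):              # x: (B, H*W, C)
        B, L, C = x.shape
        x = x.view(B, H, W, C)
        # Stack each 2x2 spatial neighborhood along the channel axis: C -> 4C.
        x = torch.cat([x[:, 0::2, 0::2], x[:, 1::2, 0::2],
                       x[:, 0::2, 1::2], x[:, 1::2, 1::2]], dim=-1)
        x = x.view(B, -1, 4 * C)             # (B, (H/2)*(W/2), 4C)
        return self.reduction(self.norm(x))  # halve resolution, double dim
```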
in the encoding process, an adaptive dual-attention module (ADM) locates the feature information of the region of interest (RoI) and suppresses uncorrelated regional features, so that feature information is extracted effectively and the lesion region is segmented more accurately; this raises the weight of the target region's features and improves the network's segmentation precision;
constructing a symmetric decoder from Swin Transformer blocks; upsampling reshapes feature maps of adjacent stages into higher-resolution feature maps while halving the feature dimension; the extracted context features are fused with the encoder's multi-scale features through skip connections to compensate for the spatial information lost to downsampling and recover valuable spatial information.
The pre-activation residual module is placed at the entrance of the encoding stage and the exit of the decoding stage. It initializes the Transformer with a convolutional network, using convolution layers to extract local intensity features so that the Transformer needs no large-scale preprocessing, which makes the Swin Transformer easier to train. Since misclassified regions are typically located on the boundary of the region of interest (RoI), high-resolution context information plays a critical role in segmentation. The module applies batch normalization (BN), the ReLU activation function, and a convolution (Conv) twice in succession, then performs element-wise addition with the original input; because each Conv follows a ReLU layer, neither the dimension nor the resolution of the feature map changes. The pre-activation residual block lets information flow more smoothly through the network's forward and backward propagation.
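As a concrete reference, a minimal PyTorch sketch of this pre-activation residual block follows; the 3×3 kernel size and padding are assumptions chosen so that, as stated above, neither the dimension nor the resolution of the feature map changes.

```python
import torch.nn as nn

class PreActResidualBlock(nn.Module):
    """Pre-activation residual block: (BN -> ReLU -> Conv) twice, then
    element-wise addition with the original input."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # The identity path gives information a direct route through the
        # network in both forward and backward propagation.
        return x + self.body(x)
```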
the Swin converter modules are constructed based on moving windows, comprising two successive Swin converters, each Swin converter comprising a multi-headed self-attention Module (MSA) and a multi-layer perceptron (MLP), further employing a Layer Normalization (LN) layer prior to each MSA module and MLP module; on the basis of the multi-head self-attention module, the Swin converter provides a window-based multi-head self-attention module (W-MSA) and a moving window-based multi-head self-attention module (SW-MSA), and the calculation formula is as follows:
wherein the method comprises the steps ofAnd z l Respectively representing the output of the first layer W-MSA and the MLP; />And z l+1 Output of the first +1 layer SW-MSA and MLP are shown respectively; the self-attentiveness of the W-MSA and SM-MSA were calculated as: /> Wherein Q, K, < >>Representing a matrix of queries, keys, and values; m is M 2 And 3 represents the patch number of the window and the dimension of the query or key, respectively; the value of B is taken from the bias matrix +.>
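To make the window attention concrete, here is a compact PyTorch sketch of W-MSA with the relative position bias B; the window size, head count, and module name are illustrative assumptions, and the shifted-window variant (SW-MSA) would additionally cyclically shift the feature map before windowing.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Self-attention within an M x M window: SoftMax(QK^T/sqrt(d) + B)V,
    with B indexed from a learned (2M-1) x (2M-1) bias table per head."""
    def __init__(self, dim=96, window_size=7, num_heads=3):
        super().__init__()
        self.h = num_heads
        self.scale = (dim // num_heads) ** -0.5          # 1/sqrt(d)
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.bias_table = nn.Parameter(
            torch.zeros((2 * window_size - 1) ** 2, num_heads))
        # Precompute, for every pair of patches in the window, the index of
        # its relative offset into the bias table.
        coords = torch.stack(torch.meshgrid(
            torch.arange(window_size), torch.arange(window_size),
            indexing="ij")).flatten(1)                   # (2, M*M)
        rel = (coords[:, :, None] - coords[:, None, :]).permute(1, 2, 0)
        rel = rel + window_size - 1                      # shift offsets to >= 0
        self.register_buffer(
            "bias_index", rel[..., 0] * (2 * window_size - 1) + rel[..., 1])

    def forward(self, x):                                # x: (B, M*M, dim)
        B, N, C = x.shape
        q, k, v = self.qkv(x).reshape(B, N, 3, self.h, C // self.h) \
                             .permute(2, 0, 3, 1, 4)     # each: (B, h, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale    # QK^T / sqrt(d)
        bias = self.bias_table[self.bias_index]          # (N, N, h)
        attn = attn + bias.permute(2, 0, 1).unsqueeze(0) # add bias B per head
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```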
The input to the adaptive dual-attention module (ADM) is passed through 3×3 convolutions with dilation rates of 1 and 3, and the results are combined into a dual-attention input in which two different attention mechanisms discover different global information. From the C×H×W feature matrix, channel descriptors of size C×1 are obtained by global average pooling and by pixel-wise correlation; these are combined by a concatenation operation and then normalized with a Sigmoid function. Fully connected (FC) layers then generate further non-linear features. Finally, a Softmax operation applied over the channels lets the cross-channel attention adaptively select receptive fields of different sizes.
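A hedged PyTorch sketch of this module follows. The text does not fully pin down the pixel-wise correlation branch or the fusion, so the sketch makes explicit assumptions: the correlation descriptor is taken as the per-channel mean of the element-wise product of the fused features with the input, and the Softmax weights blend the two dilated branches per channel.

```python
import torch
import torch.nn as nn

class AdaptiveDualAttention(nn.Module):
    """Sketch of the adaptive dual-attention module (ADM) under the stated
    assumptions: two dilated 3x3 branches, GAP and pixel-wise-correlation
    channel descriptors, Sigmoid + FC, then channel-wise Softmax selection."""
    def __init__(self, channels):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels, 3, padding=1, dilation=1)
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=3, dilation=3)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, 2 * channels))

    def forward(self, x):                        # x: (B, C, H, W)
        u1, u3 = self.branch1(x), self.branch3(x)
        u = u1 + u3                              # fused dual-attention input
        gap = u.mean(dim=(2, 3))                 # global average pooling: (B, C)
        pwc = (u * x).mean(dim=(2, 3))           # assumed pixel-wise correlation
        desc = torch.sigmoid(torch.cat([gap, pwc], dim=1))       # concat + Sigmoid
        w = self.fc(desc).view(-1, 2, u1.size(1)).softmax(dim=1) # channel Softmax
        # Each channel adaptively picks between the two receptive fields.
        return w[:, 0, :, None, None] * u1 + w[:, 1, :, None, None] * u3
```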
Step three, setting a training strategy and a loss function, and training the model;
dividing the preprocessed dataset into a training set, a test set, and a validation set; adopting random initialization and the Adam optimization algorithm; setting the batch size, the number of epochs, and a suitable learning rate, while adopting a regularization strategy to prevent overfitting; updating the weights and biases of the segmentation model HySwinUNet with the backpropagation algorithm; and updating parameters with the loss function during the training iterations;
step four, verifying the trained network model: inputting the validation set into the trained segmentation model HySwinUNet, whose output segments the lesion region in each lung CT image to yield a segmented image; the model is evaluated by comparing expert-segmented CT images with the images segmented by the trained network model;
after verification, any lung CT image can be input into the segmentation model HySwinUNet, which outputs the lung CT image with the lesion segmented.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
the invention effectively combines the Swin Transformer with U-Net to enhance the functionality and flexibility of the traditional encoder-decoder architecture; applied to medical image segmentation, it realizes automatic segmentation of infected regions in lung CT and can accurately segment lung infection areas from CT images.
Because the Transformer has no inductive bias for images, it performs poorly on small-scale datasets; even pretrained on ImageNet, a Transformer may not perform as well as a residual network. The HySwinUNet model combines convolution with the Transformer and adds a pre-activation residual block (PRB), using the inductive bias of convolutions on images to avoid large-scale preprocessing; during the network's forward and backward propagation, one module can pass information directly to any other module, which reduces the training burden and lets the network train better.
an adaptive dual-attention module acquires multi-scale global features by integrating two attention mechanisms, raising the weight of the target region's features so that the irregular, low-contrast lesion regions of novel coronavirus infection on CT images can be segmented more accurately. On CT images, infected areas may have discontinuous boundaries and irregular shapes, and the images may be blurred with low contrast; channel and pixel information is key to obtaining representative features of the region of interest. The adaptive dual-attention module is therefore used to extract feature information and increase the weight of the target region's features.
Drawings
Fig. 1 is a block diagram of the HySwinUNet network of the invention;
Fig. 2 is a block diagram of the pre-activation residual block of the invention;
Fig. 3 illustrates the Swin Transformer module;
Fig. 4 illustrates the adaptive dual-attention module.
Detailed Description
The invention is further described with reference to the accompanying drawings:
a lung CT image segmentation method based on a mixture Swin Transformer U-Net specifically comprises the following steps:
step one, data preprocessing and data enhancement;
in this embodiment, novel-coronavirus-infection CT images collected by the Italian Society of Medical and Interventional Radiology (SIRM) and novel-coronavirus-infection CT images from the MosMedData dataset are used to train the model; the dataset is enlarged by random cropping, flipping, rotation, scaling, shifting, and similar operations, increasing the number of training samples and improving the robustness of the model; finally, the images are normalized;
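For illustration, the augmentation pipeline can be expressed with torchvision transforms; the crop size, angle range, and probabilities below are assumptions, and for segmentation the same geometric transform must also be applied to the mask (a library such as Albumentations pairs image and mask automatically).

```python
import torchvision.transforms as T

# A sketch of the described augmentations: random crop, flip, rotation,
# scaling, shift, followed by normalization. All values are illustrative.
train_transform = T.Compose([
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),        # random crop + scaling
    T.RandomHorizontalFlip(p=0.5),                     # flipping
    T.RandomAffine(degrees=15, translate=(0.1, 0.1)),  # rotation + shift
    T.ToTensor(),
    T.Normalize(mean=[0.5], std=[0.5]),                # image normalization
])
```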
step two, constructing a segmentation model HySwinUNet:
as shown in fig. 1, the segmentation model HySwinUNet is a U-Net-based encoder-decoder architecture comprising an encoder, an adaptive dual-attention module, a decoder, and skip connections; the basic unit of HySwinUNet is the Swin Transformer block, and the Swin Transformer serves as the backbone network of the U-Net;
for the encoder, the input image is first split into 4×4 patches by patch partition; after linear embedding, the dimension of each patch token becomes a preset value C. The features, of dimension C and resolution (H/4)×(W/4), are fed into two successive Swin Transformer blocks for representation learning, during which both the feature dimension and the resolution remain unchanged. The Swin Transformer blocks are responsible for feature representation learning, while patch merging downsamples and raises the dimension, halving the spatial size and doubling the feature dimension to form a hierarchical design; during each stage's propagation, a pre-activation residual block is applied. This procedure is repeated three times in the encoder;
in the encoding process, in order to extract feature information effectively and segment the lesion region more accurately, an adaptive dual-attention module locates the feature information of the region of interest (RoI) and suppresses uncorrelated regional features, raising the weight of the target region's features and improving the network's segmentation precision;
for the decoder, a symmetric decoder is constructed from Swin Transformer blocks; upsampling reshapes feature maps of adjacent stages into higher-resolution feature maps while halving the feature dimension; the extracted context features are fused with the encoder's multi-scale features through skip connections to compensate for the spatial information lost to downsampling and recover valuable spatial information, as sketched below;
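A minimal sketch of the decoder-side patch expanding and skip fusion follows; the module names and the linear-projection fusion are assumptions that mirror the encoder's patch merging.

```python
import torch
import torch.nn as nn

class PatchExpand(nn.Module):
    """Upsample 2x and halve the feature dimension (mirror of patch merging)."""
    def __init__(self, dim):
        super().__init__()
        self.expand = nn.Linear(dim, 2 * dim, bias=False)  # C -> 4 * (C/2)
        self.norm = nn.LayerNorm(dim // 2)

    def forward(self, x, H, W):                  # x: (B, H*W, C)
        B, L, C = x.shape
        x = self.expand(x).view(B, H, W, 2, 2, C // 2)
        # Spread each token's 4 sub-vectors over a 2x2 spatial neighborhood.
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, (2 * H) * (2 * W), C // 2)
        return self.norm(x)

class SkipFusion(nn.Module):
    """Skip connection: concatenate encoder features, project 2C -> C."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, dec, enc):                 # both: (B, H*W, C)
        return self.proj(torch.cat([dec, enc], dim=-1))
```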
as shown in fig. 2, the Transformer has no inductive bias for images and therefore performs poorly on small-scale datasets; even pretrained on ImageNet, it may not perform as well as a residual network. The HySwinUNet model combines convolution with the Transformer: a pre-activation residual block initializes the Transformer with a convolutional network, using convolution layers to extract local intensity features so that the Transformer needs no large-scale preprocessing, which makes the Swin Transformer easier to train. Since misclassified regions are typically located on the boundary of the region of interest (RoI), high-resolution context information plays a critical role in segmentation. The module applies batch normalization (BN), the ReLU activation function, and a convolution (Conv) twice in succession, then performs element-wise addition with the original input; because each Conv follows a ReLU layer, neither the dimension nor the resolution of the feature map changes. The pre-activation residual block lets information flow more smoothly through the network's forward and backward propagation, and it is used at both the entrance of the encoding stage and the exit of the decoding stage;
as shown in fig. 3, unlike the conventional multi-head self-attention module (MSA), the Swin Transformer module is built on shifted windows. It consists of two successive Swin Transformer blocks, each composed of a multi-head self-attention module and a multi-layer perceptron (MLP), with a layer normalization (LN) layer before each MSA module and MLP module. Building on multi-head self-attention, the Swin Transformer provides a window-based multi-head self-attention module (W-MSA) and a shifted-window-based multi-head self-attention module (SW-MSA), computed as:

ẑ^l = W-MSA(LN(z^{l-1})) + z^{l-1},
z^l = MLP(LN(ẑ^l)) + ẑ^l,
ẑ^{l+1} = SW-MSA(LN(z^l)) + z^l,
z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1},

where ẑ^l and z^l denote the outputs of the W-MSA module and the MLP of layer l, respectively, and ẑ^{l+1} and z^{l+1} denote the outputs of the SW-MSA module and the MLP of layer l+1; the self-attention in W-MSA and SW-MSA is Attention(Q, K, V) = SoftMax(QK^T / √d + B) V, where Q, K, V ∈ R^{M²×d} denote the query, key, and value matrices, M² and d denote the number of patches in a window and the dimension of the query or key, respectively, and the values of B are taken from the bias matrix B̂ ∈ R^{(2M-1)×(2M-1)};
The infected area on a lung CT image may have discontinuous boundaries and irregular shapes, and the image may be blurred with low contrast; channel and pixel information is key to obtaining representative features of the region of interest. The adaptive dual-attention module is therefore used to extract more comprehensive and discriminative feature information and to raise the weight of the target region's features so as to identify lesion boundaries: the module captures the discontinuous boundaries of lung CT lesions via global average pooling and handles shape irregularity via a pixel-wise correlation branch;
the structure of the adaptive dual-attention module is shown in fig. 4. The input is convolved by 3×3 convolutions with dilation rates of 1 and 3, respectively, and the results are combined into a dual-attention input in which two different attention mechanisms discover different global information. From the C×H×W feature matrix, channel descriptors of size C×1 are obtained by global average pooling and by pixel-wise correlation, combined by a concatenation operation, and normalized with a Sigmoid function; fully connected (FC) layers then generate further non-linear features. Finally, a Softmax operation is applied over the channels and a feature map of the original size is output; the cross-channel attention can adaptively select receptive fields of different sizes;
step three, setting a training strategy and a loss function, and training the model;
dividing the preprocessed dataset into a training set, a test set, and a validation set in the ratio 5:3:2; adopting random initialization and the Adam optimization algorithm; setting the batch size, the number of epochs, and a suitable learning rate, while adopting a regularization strategy to prevent overfitting; in the HySwinUNet network model, updating weights and biases with the backpropagation algorithm, and updating parameters with the loss function during the training iterations, as sketched below;
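The following sketch shows how such a training setup might look in PyTorch. The 5:3:2 split, Adam, random initialization, and backpropagation follow the text; the learning rate, weight decay (standing in for the regularization strategy), batch size, epoch count, and the names dataset, HySwinUNet, and criterion are assumptions.

```python
import torch
from torch.utils.data import DataLoader, random_split

n = len(dataset)                                   # preprocessed dataset (assumed)
n_train, n_test = int(0.5 * n), int(0.3 * n)
train_set, test_set, val_set = random_split(       # 5:3:2 split
    dataset, [n_train, n_test, n - n_train - n_test])

model = HySwinUNet()                               # randomly initialized (assumed name)
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-4,              # assumed learning rate
                             weight_decay=1e-5)    # regularization vs. overfitting
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)

for epoch in range(100):                           # assumed epoch count
    for image, mask in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(image), mask)       # combined loss, defined below
        loss.backward()                            # backpropagation
        optimizer.step()                           # update weights and biases
```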
training the HySwinUNet network model according to the set training strategy; in the training stage, HySwinUNet is trained end-to-end with an objective function, and the loss function updates the parameters at each iteration. For the loss function, all networks are trained with a combination of Dice loss and binary cross-entropy loss:

Loss = α·L_Dice + β·L_BCE,
L_Dice = 1 - (2·Σ_i y_i·ŷ_i) / (Σ_i y_i + Σ_i ŷ_i),
L_BCE = -Σ_i [ y_i·log ŷ_i + (1 - y_i)·log(1 - ŷ_i) ],

where y_i is the true probability of sample i and ŷ_i is the predicted probability for sample i; L_Dice and L_BCE denote the Dice loss and the binary cross-entropy loss, and Loss is the final loss function combining the two in one term. Giving L_Dice more weight handles the class imbalance problem better; α is set to 0.9 and β to 0.1.
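A direct PyTorch rendering of this combined loss, assuming the model outputs Sigmoid probabilities:

```python
import torch
import torch.nn as nn

class DiceBCELoss(nn.Module):
    """Loss = alpha * L_Dice + beta * L_BCE, with alpha=0.9, beta=0.1."""
    def __init__(self, alpha=0.9, beta=0.1, eps=1e-6):
        super().__init__()
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.bce = nn.BCELoss()

    def forward(self, pred, target):     # pred: probabilities in [0, 1]
        p, t = pred.flatten(1), target.flatten(1)
        inter = (p * t).sum(dim=1)
        dice = 1 - (2 * inter + self.eps) / (p.sum(1) + t.sum(1) + self.eps)
        return self.alpha * dice.mean() + self.beta * self.bce(pred, target)

criterion = DiceBCELoss()                # usable in the training loop above
```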
Step four, verifying the trained network model;
feeding the validation set into the trained HySwinUNet network model, whose output segments the lesion region in each lung CT image to yield segmented images; the model is evaluated by comparing expert-segmented CT images with the images segmented by the trained network model;
Four widely adopted evaluation criteria are used to measure the performance of the HySwinUNet model (a computation sketch follows the list):
Dice similarity coefficient (DSC): measures the similarity between the predicted lung infection and the ground truth, DSC = 2|V_Seg ∩ V_GT| / (|V_Seg| + |V_GT|), where V_Seg is the region segmented by the model algorithm and V_GT is the ground-truth segmentation region; TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively;
Sensitivity (SEN): the percentage of lung infection correctly segmented, SEN = TP / (TP + FN);
Specificity (SPE): the percentage of non-infected area correctly segmented, SPE = TN / (TN + FP);
Positive predictive value (PRE): the precision of the segmented lung infection area, PRE = TP / (TP + FP).
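These four criteria reduce to counting TP, TN, FP, and FN over binary masks, as in this NumPy sketch:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """DSC, sensitivity, specificity, precision from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)               # infected, correctly segmented
    tn = np.sum(~pred & ~gt)             # non-infected, correctly excluded
    fp = np.sum(pred & ~gt)              # false alarms
    fn = np.sum(~pred & gt)              # missed infection
    dsc = 2 * tp / (2 * tp + fp + fn)    # Dice similarity coefficient
    sen = tp / (tp + fn)                 # sensitivity
    spe = tn / (tn + fp)                 # specificity
    pre = tp / (tp + fp)                 # positive predictive value
    return dsc, sen, spe, pre
```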
after verification, any lung CT image can be input into the segmentation model HySwinUNet, which outputs the lung CT image with the lesion segmented.
The drawings in the disclosed embodiments relate only to the structures involved in those embodiments. The above description is only a preferred embodiment of the invention, which is applicable to various fields of adaptation; the invention is therefore not limited to the specific details and illustrations shown and described herein, without departing from the general concepts defined by the claims and their equivalents.

Claims (5)

1. A lung CT image segmentation method based on a hybrid Swin Transformer U-Net, characterized in that the method specifically comprises the following steps:
step one, data preprocessing and data enhancement:
collecting a large number of public lung-infection CT images, performing data enhancement to expand the number of samples, and normalizing the images to serve as a training set for training the model;
step two, constructing a segmentation model HySwinUNet:
constructing the segmentation model HySwinUNet on a U-Net encoder-decoder structure, wherein the segmentation model HySwinUNet comprises an encoder, an adaptive dual-attention module, a decoder, and skip connections;
in the encoder, the input image is split into 4×4 patches by patch partition; after linear embedding, the dimension of each patch token becomes a preset value C; features of dimension C and resolution (H/4)×(W/4) are fed into two successive Swin Transformer blocks for representation learning, during which the feature dimension and resolution remain unchanged; the Swin Transformer module is responsible for feature representation learning, and after learning, patch merging downsamples and raises the dimension, halving the spatial size and doubling the feature dimension to form a hierarchical design; the above process is repeated three times in the encoder, and a pre-activation residual block is applied during each stage's propagation;
in the encoding process, the adaptive dual-attention module locates the feature information of the region of interest and suppresses uncorrelated regional features, so that feature information is extracted effectively and the lesion region is segmented more accurately, raising the weight of the target region's features and improving the network's segmentation precision;
constructing a symmetric decoder from Swin Transformer blocks; upsampling reshapes feature maps of adjacent stages into higher-resolution feature maps while halving the feature dimension; the extracted context features are fused with the encoder's multi-scale features through skip connections to compensate for the spatial information lost to downsampling and recover valuable spatial information;
step three, setting a training strategy and a loss function;
dividing the preprocessed dataset into a training set, a test set, and a validation set; adopting random initialization and the Adam optimization algorithm; setting the batch size, the number of epochs, and a suitable learning rate, while adopting a regularization strategy to prevent overfitting; updating weights and biases in the segmentation model HySwinUNet with a backpropagation algorithm; and updating parameters with the loss function during the training iterations;
step four, verifying the trained network model: inputting the validation set into the trained segmentation model HySwinUNet, whose output segments the lesion region in the lung CT image to obtain a segmented image, and evaluating the model by comparing expert-segmented CT images with images segmented by the trained network model;
after verification, any lung CT image can be input into the segmentation model HySwinUNet, which outputs the lung CT image with the lesion segmented.
2. The hybrid Swin Transformer U-Net based lung CT image segmentation method of claim 1, wherein the data enhancement specifically comprises: applying random cropping, flipping, rotation, scaling, and shifting to the images.
3. The hybrid Swin Transformer U-Net based lung CT image segmentation method of claim 1, wherein a pre-activation residual module is used at the entrance of the encoding stage and the exit of the decoding stage; the pre-activation residual module initializes the Transformer with a convolutional network, using convolution layers to extract local intensity features so that the Transformer needs no large-scale preprocessing, making the Swin Transformer easier to train; since the misclassified region is typically located on the boundary of the region of interest, high-resolution context information plays a critical role in segmentation; the module applies batch normalization (BN), the ReLU activation function, and a convolution (Conv) twice in succession, then performs element-wise addition with the original input; each Conv operation follows the ReLU layer and changes neither the dimension nor the resolution of the feature map, and information flows more smoothly through the forward and backward propagation of the network.
4. The hybrid Swin Transformer U-Net based lung CT image segmentation method of claim 1, wherein the Swin Transformer module is built on shifted windows and comprises two successive Swin Transformer blocks, each comprising a multi-head self-attention module MSA and a multi-layer perceptron MLP, with a layer normalization layer LN applied before each MSA module and MLP module; based on the multi-head self-attention module, the Swin Transformer provides a window-based multi-head self-attention module W-MSA and a shifted-window-based multi-head self-attention module SW-MSA, computed as:

ẑ^l = W-MSA(LN(z^{l-1})) + z^{l-1},
z^l = MLP(LN(ẑ^l)) + ẑ^l,
ẑ^{l+1} = SW-MSA(LN(z^l)) + z^l,
z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1},

wherein ẑ^l and z^l respectively denote the outputs of the W-MSA module and the MLP of layer l, and ẑ^{l+1} and z^{l+1} respectively denote the outputs of the SW-MSA module and the MLP of layer l+1; the self-attention of W-MSA and SW-MSA is computed as Attention(Q, K, V) = SoftMax(QK^T / √d + B) V, wherein Q, K, V ∈ R^{M²×d} denote the query, key, and value matrices; M² and d respectively denote the number of patches in a window and the dimension of the query or key; and the values of B are taken from the bias matrix B̂ ∈ R^{(2M-1)×(2M-1)}.
5. The hybrid Swin Transformer U-Net based lung CT image segmentation method of claim 1, wherein the input to the adaptive dual-attention module is passed through 3×3 convolutions with dilation rates of 1 and 3 and combined into a dual-attention input, with two different attention mechanisms discovering different global information; from the C×H×W feature matrix, channel descriptors of size C×1 are obtained by global average pooling and pixel-wise correlation, combined by a concatenation operation, and normalized with a Sigmoid function; fully connected layers then generate further non-linear features; finally, a Softmax operation is applied over the channels so that the cross-channel attention can adaptively select receptive fields of different sizes.
CN202211412454.0A 2022-11-11 2022-11-11 Lung CT image segmentation method based on mixed Swin Transformer U-Net Pending CN117274147A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211412454.0A CN117274147A (en) 2022-11-11 2022-11-11 Lung CT image segmentation method based on mixed Swin Transformer U-Net

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211412454.0A CN117274147A (en) 2022-11-11 2022-11-11 Lung CT image segmentation method based on mixed Swin Transformer U-Net

Publications (1)

Publication Number Publication Date
CN117274147A true CN117274147A (en) 2023-12-22

Family

ID=89205088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211412454.0A Pending CN117274147A (en) 2022-11-11 2022-11-11 Lung CT image segmentation method based on mixed Swin Transformer U-Net

Country Status (1)

Country Link
CN (1) CN117274147A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876370A (en) * 2024-03-11 2024-04-12 南京信息工程大学 CT image kidney tumor segmentation system based on three-dimensional axial transducer model
CN117876370B (en) * 2024-03-11 2024-06-07 南京信息工程大学 CT image kidney tumor segmentation system based on three-dimensional axial transducer model

Similar Documents

Publication Publication Date Title
Wu et al. Jcs: An explainable covid-19 diagnosis system by joint classification and segmentation
Zhou et al. Lung cancer cell identification based on artificial neural network ensembles
Horng et al. DeepNerve: a new convolutional neural network for the localization and segmentation of the median nerve in ultrasound image sequences
CN112767417B (en) Multi-modal image segmentation method based on cascaded U-Net network
Khan et al. Classification and region analysis of COVID-19 infection using lung CT images and deep convolutional neural networks
JP2022547722A (en) Weakly Supervised Multitask Learning for Cell Detection and Segmentation
CN117152433A (en) Medical image segmentation method based on multi-scale cross-layer attention fusion network
CN115457049A (en) Lung CT image segmentation method based on transfer learning and attention mechanism
CN112102343A (en) Ultrasound image-based PTC diagnostic system
Sadeghibakhi et al. Multiple sclerosis lesions segmentation using attention-based CNNs in FLAIR images
CN117274147A (en) Lung CT image segmentation method based on mixed Swin Transformer U-Net
Nawaz et al. Deep Learning ResNet101 Deep Features of Portable Chest X-Ray Accurately Classify COVID-19 Lung Infection.
Chutia et al. Classification of lung diseases using an attention-based modified DenseNet model
Khalifa et al. Deep learning for image segmentation: a focus on medical imaging
Nie et al. Semantic-guided encoder feature learning for blurry boundary delineation
Penso et al. A token-mixer architecture for CAD-RADS classification of coronary stenosis on multiplanar reconstruction CT images
CN114119558B (en) Method for automatically generating nasopharyngeal carcinoma image diagnosis structured report
CN115409812A (en) CT image automatic classification method based on fusion time attention mechanism
Mani Deep learning models for semantic multi-modal medical image segmentation
Ji et al. ResDSda_U-Net: A novel U-Net based residual network for segmentation of pulmonary nodules in lung CT images
Zhang et al. Boundary-oriented network for automatic breast tumor segmentation in ultrasound images
Mittal et al. CoviSegNet-Covid-19 Disease Area Segmentation using Machine Learning Analyses for Lung Imaging
Xiao et al. GAEI-UNet: Global Attention and Elastic Interaction U-Net for Vessel Image Segmentation
Mao et al. Studies on Category Prediction of Ovarian Cancers Based on Magnetic Resonance Images
Micomyiza et al. An effective automatic segmentation of abdominal adipose tissue using a convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination