CN114049314A - Medical image segmentation method based on feature rearrangement and gated axial attention - Google Patents

Medical image segmentation method based on feature rearrangement and gated axial attention

Info

Publication number
CN114049314A
CN114049314A
Authority
CN
China
Prior art keywords
attention
feature
block
rearrangement
axial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111262731.XA
Other languages
Chinese (zh)
Inventor
俞俊
于云杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202111262731.XA priority Critical patent/CN114049314A/en
Publication of CN114049314A publication Critical patent/CN114049314A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30024Cell structures in vitro; Tissue sections in vitro

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a medical image segmentation method based on feature rearrangement and gated axial attention. First, all original images and ground-truth segmentation maps are unified to the same size, and the resized training images are randomly flipped horizontally/vertically to improve sample diversity. Second, downsampling is performed by feature rearrangement, which better preserves the features of the original image, and upsampling is performed by inverse feature rearrangement, which improves the decoding capability of the network. Then, a global branch and a local branch are trained cooperatively to extract the global and local information interactions of the image, respectively. Finally, the two kinds of information are combined to segment the image. Together with optimized parameter settings and suitable training techniques, a more accurate medical image segmentation result is achieved.

Description

Medical image segmentation method based on feature rearrangement and gated axial attention
Technical Field
The invention relates to the field of medical image segmentation, in particular to an end-to-end medical image segmentation method.
Background
With the popularization of deep convolutional neural networks (CNNs) in computer vision, CNNs have been applied to medical image segmentation tasks. Networks such as U-Net, Res-UNet and U-Net++ have been proposed specifically for image segmentation across various medical imaging modalities. These methods achieve good performance on many difficult datasets, demonstrating the effectiveness of CNNs in learning discriminative features to segment organs or lesions from medical scans. However, because of the inherent locality of the convolution operation, it is difficult for CNN-based methods to learn exact global and long-range semantic information interactions. The resulting segmentation accuracy does not fully satisfy medical applications, and medical image segmentation remains a challenging task in medical image analysis.
Recently, Transformer-based approaches have become highly popular in the field of computer vision. The main reason for the success of Transformers is their ability to learn long-term dependencies between input tokens, i.e., global and long-range semantic information interactions. The axial attention block, a Transformer variant, decomposes 2D self-attention into two 1D self-attentions and introduces position-sensitive axial attention, which has been used for panoptic segmentation. Gated axial attention, a further Transformer variant, adds a gating mechanism that controls the flow of attention information in the network.
Most existing medical image segmentation models use a convolutional layer or a pooling layer to downsample the original image, and use convolution plus bilinear interpolation for upsampling. Downsampling with a convolutional or pooling layer loses part of the original image information; the invention instead uses feature rearrangement to store the original information in the C channel. Because bilinear interpolation depends on a fixed interpolation formula, it adapts poorly to different datasets, so the invention performs upsampling by inverse feature rearrangement. At the same time, the convolution block is replaced by a gated axial attention block with global and long-range semantic information interaction, so as to achieve better segmentation accuracy on medical images.
Disclosure of Invention
The invention provides a medical image segmentation method based on feature rearrangement and gated axial attention. The method adopts feature rearrangement and a gated axial attention mechanism, and trains a global branch and a local branch cooperatively in an end-to-end manner, so that the global and local information interaction features of the input original image can be effectively extracted. The flow of information in the network is controlled by the gating mechanism, allowing the model to learn good positional bias information on small-sample datasets. Experimental results show that the method segments medical images more accurately.
A medical image segmentation method based on feature rearrangement and gated axial attention comprises the following steps:
step 1, acquiring a data set; selecting 3 data sets from existing public medical image segmentation data sets;
The 3 data sets are: the gland segmentation dataset GlaS, containing 85 training images and 80 test images; the nucleus segmentation dataset MoNuSeg, containing 30 training images and 14 test images; and the nucleus segmentation dataset TNBC, containing 35 training images and 15 test images.
Step 2, data processing; on the medical image segmentation datasets acquired in step 1, the images are adjusted to the same size; the adjusted training sample images are then randomly flipped horizontally/vertically, thereby increasing the diversity of the training samples;
step 3, defining a medical image segmentation model based on feature rearrangement and gated axial attention, wherein the model comprises a global branch and a local branch; taking the training image processed in the step 2 and a real segmentation graph of the training image as input;
step 4, loss function; the loss function measures the error between the predicted value and the ground-truth label; a cross-entropy loss function is employed here;
step 5, defining an Adam optimizer and setting a reasonable learning rate for the model; the initial learning rate is set to 0.001 and is decayed as the number of epochs increases during training, being multiplied by 0.8 every 50 epochs, which effectively suppresses oscillation and helps find better network parameters; meanwhile, L2 regularization is adopted to effectively reduce overfitting;
the learning rate attenuation formula is as follows (3):
l_p = l_0 \times 0.8^{\lfloor p/50 \rfloor} \qquad (3)
where p is the number of training epochs and l_0 is the initial learning rate. The hyper-parameter of the L2 regularization term is set to 0.0005.
Step 6, network training and testing; the global branch and the local branch in step 3 are trained cooperatively, and evaluation is performed on the test set provided by each dataset during training, using the average IoU and average F1 score as metrics.
Further, the data processing in step 2 is specifically implemented as follows:
firstly, the original images and ground-truth segmentation maps in the dataset are resized to 128 × 128;
then, the resized training images and the corresponding segmentation maps are randomly flipped horizontally/vertically with a probability of 50%.
Further, the global branch and the local branch of the model described in step 3 are specifically implemented as follows:
global branch code (encode) part:
3-1, a 7 × 7 convolution kernel with stride 1 and Padding 3 is applied to the input training sample image, preserving the input height H and width W, followed by a BatchNorm layer and a ReLU activation function to obtain a feature block;
3-2, feature rearrangement is performed on the feature block: it is divided into 2 × 2 patch blocks along H and W, downsampling (B, C, H, W) to (B, 4C, H/2, W/2); B is the number of images input at one time, C is the number of channels of the feature block, and H and W are the height and width of the feature block, respectively; feature rearrangement keeps the features of adjacent elements in the C channel and retains information better than a pooling layer;
3-3, the rearranged feature block is passed through a convolution that keeps the height and width unchanged, plus a BatchNorm layer and a ReLU activation function, to obtain a feature block x, enhancing information flow within local patch blocks;
3-4, the feature block x is input into a gated axial attention block; the gated axial attention block first applies a 1 × 1 convolution plus a BatchNorm layer and a ReLU activation function; gated axial attention is then applied along the width axis of the tensor, as expressed by the following equation (1):
y_{ij} = \sum_{w=1}^{N} \operatorname{softmax}\big( q_{ij}^{\top} k_{iw} + G_Q\, q_{ij}^{\top} r^{q}_{iw} + G_K\, k_{iw}^{\top} r^{k}_{iw} \big)\big( G_{V1}\, v_{iw} + G_{V2}\, r^{v}_{iw} \big) \qquad (1)
where N denotes the width after the last downsampling; q_{ij} = W_q x, k_{ij} = W_k x and v_{ij} = W_v x denote the query, key and value at the j-th element of the i-th row, respectively; W_q, W_k and W_v are the parameters of the linear transformations applied to the input x, and y is the output;
r^q, r^k and r^v are learnable position bias terms, commonly called relative position encodings;
G_Q, G_K, G_V1 and G_V2 are learnable gating parameters used to control the effect of the learned relative position encodings on encoding non-local context; gated axial attention is then applied along the height axis of the tensor, in the same way as along the width axis; finally, a 1 × 1 convolution plus a BatchNorm layer and a ReLU activation function are applied, and a residual connection is established around each gated axial attention block;
3-5, the feature block obtained after the 1 gated axial attention block undergoes feature rearrangement for downsampling and finally passes through two gated axial attention blocks, yielding a compact feature block x1 with long-range dependence and global information;
Global branch decoding (decoder) section:
Firstly, the feature block x1 passes through two gated axial attention blocks, and a 1 × 1 convolution expands the C channel so that upsampling can be performed by inverse feature rearrangement; inverse feature rearrangement is then applied for upsampling; compared with bilinear interpolation upsampling, inverse feature rearrangement is more flexible, since its parameters are learned by the network whereas bilinear interpolation is fixed by a formula; secondly, a gated axial attention block and a 1 × 1 convolution expand the C channel again, followed by inverse feature rearrangement for upsampling, yielding a feature block x2 that has long-range dependence and global information and the same size as the input original image; all corresponding blocks of the decoding part and the encoding part are connected by skip connections implemented as addition;
Local branch portion:
301. The input training sample image is divided into 32 × 32 patch blocks along H and W, and the following operations are then performed for each patch block:
a 7 × 7 convolution kernel with stride 1 and Padding 3 is applied, preserving the input height and width, followed by a BatchNorm layer and a ReLU activation function to obtain a feature block;
302. The feature block obtained in step 301 undergoes feature rearrangement downsampling, followed twice by a convolution that keeps the height and width unchanged, a BatchNorm layer and a ReLU activation function, giving a feature block x';
303. x' is passed through 3 stages of gated axial attention blocks plus feature rearrangement downsampling, using 3, 4 and 1 gated axial attention blocks respectively, giving a feature block x'1; x'1 is then upsampled through 3 stages of gated axial attention blocks plus inverse feature rearrangement, using 1, 4 and 3 gated axial attention blocks respectively;
304. The patch blocks are spliced in their original order to obtain a feature block x'2 that has local information and the same size as the input original image;
Further, the loss function in step 4 is as follows (2):
L_{CE}(p, p') = -\frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \Big( p(x,y)\log p'(x,y) + \big(1 - p(x,y)\big)\log\big(1 - p'(x,y)\big) \Big) \qquad (2)
where H and W are dimensions of the image, p (x, y) corresponds to a pixel in the image, and p' (x, y) represents the output prediction for a particular location (x, y).
Further, the cooperative training and evaluation indexes in step 6 are as follows:
for cooperative training of the global branch and the local branch, the feature blocks x2 and x'2 obtained in step 3 are added, and a 1 × 1 convolution reduces the number of C channels to the number of classes to be segmented;
the average IoU and average F1 score mean that the IoU and F1 score of each class are first calculated between each predicted segmentation map and the corresponding ground-truth segmentation map and then averaged; the mean IoU and mean F1 score over all test images are then summed and divided by the number of test images to obtain the average IoU and average F1 score; these two indices effectively evaluate the segmentation accuracy of the model.
The invention has the following beneficial effects:
The method adopts feature rearrangement for downsampling, which preserves more original image information than convolution or pooling, and inverse feature rearrangement for upsampling, which is more flexible than bilinear interpolation because the corresponding parameters are learned by the network rather than fixed by a formula. A gated axial attention model with a global branch and a local branch is adopted, so that both local and global information interactions are taken into account, while the gating mechanism controls the flow of information in the network. Together with suitable training techniques and well-chosen network parameters, optimization algorithm and learning rate settings, the accuracy of medical image segmentation is improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of the network framework of the present invention.
Fig. 3 is a graph comparing the segmentation effect of the present invention with other methods.
Detailed Description
The invention is further illustrated by the following figures and examples.
As shown in fig. 1, a method for segmenting a medical image based on feature rearrangement and gated axial attention specifically includes the following steps:
Step 1, data set acquisition: 3 data sets are selected from the existing public medical image segmentation data sets. The data sets used in the invention are the gland segmentation dataset GlaS, containing 85 training images and 80 test images; the nucleus segmentation dataset MoNuSeg, containing 30 training images and 14 test images; and the nucleus segmentation dataset TNBC, containing 35 training images and 15 test images.
Step 2, data processing: the original images and ground-truth segmentation maps in the data sets are unified to a size of 128 × 128. The resized training images and the corresponding segmentation maps are then randomly flipped horizontally/vertically with a probability of 50%, which helps avoid overfitting and improves model performance to a certain extent.
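As an illustration, a minimal sketch of this preprocessing in PyTorch/torchvision is given below; the function name and the joint handling of image and mask are assumptions, since the invention does not prescribe a particular implementation.

```python
import random
import torchvision.transforms.functional as TF

def preprocess(image, mask, size=(128, 128), train=True):
    # Resize the original image and its ground-truth segmentation map to 128 x 128.
    image = TF.resize(image, list(size))
    mask = TF.resize(mask, list(size), interpolation=TF.InterpolationMode.NEAREST)
    if train:
        # Random horizontal / vertical flips with 50% probability, applied jointly
        # so that the image and its segmentation map stay aligned.
        if random.random() < 0.5:
            image, mask = TF.hflip(image), TF.hflip(mask)
        if random.random() < 0.5:
            image, mask = TF.vflip(image), TF.vflip(mask)
    return image, mask
```

Nearest-neighbour interpolation is used for the mask so that resizing does not introduce fractional label values.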
Step 3, as shown in FIG. 2, the network framework of the medical image segmentation model based on feature rearrangement and gated axial attention consists of 2 parts: a global branch and a local branch. The training image processed in step 2 and its ground-truth segmentation map are taken as input.
The global branch and the local branch are specifically realized as follows:
global branch code (encode) part:
Firstly, a 7 × 7 convolution kernel with stride 1 and Padding 3 is applied to the input original image, preserving the input height and width, followed by a BatchNorm layer and a ReLU activation function.
Next, feature rearrangement is performed: the feature map is divided into 2 × 2 patch blocks along H and W, i.e., (B, C, H, W) is downsampled to (B, 4C, H/2, W/2), where B is the number of images input at a time, C is the number of channels of the feature block, and H and W are its height and width. This preserves the features of neighbouring elements in the C channel and retains information better than a pooling layer.
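The shape change described above corresponds to a 2 × 2 space-to-depth operation; a minimal sketch using PyTorch's pixel_unshuffle / pixel_shuffle is shown below, which the invention does not name explicitly and is given here only as an assumed equivalent.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 16, 128, 128)                   # (B, C, H, W)

# Feature rearrangement: each 2x2 spatial patch is stacked onto the channel axis,
# so no pixel values are discarded (unlike pooling or strided convolution).
down = F.pixel_unshuffle(x, downscale_factor=2)    # (2, 64, 64, 64) = (B, 4C, H/2, W/2)

# Inverse feature rearrangement, used later in the decoder for upsampling.
up = F.pixel_shuffle(down, upscale_factor=2)       # back to (2, 16, 128, 128)

assert torch.equal(up, x)                          # the rearrangement is lossless
```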
Then, the rearranged feature block is passed 2 times through a convolution that keeps the height and width unchanged, plus a BatchNorm layer and a ReLU activation function, to obtain a feature block x, enhancing information flow within local patch blocks.
The feature block x is then input into the gated axial attention block. The gated axial attention block first applies a 1 × 1 convolution plus a BatchNorm layer and a ReLU activation function. Gated axial attention is then applied along the width axis of the tensor, as given by equation (1):
y_{ij} = \sum_{w=1}^{N} \operatorname{softmax}\big( q_{ij}^{\top} k_{iw} + G_Q\, q_{ij}^{\top} r^{q}_{iw} + G_K\, k_{iw}^{\top} r^{k}_{iw} \big)\big( G_{V1}\, v_{iw} + G_{V2}\, r^{v}_{iw} \big) \qquad (1)
where N denotes the width after the last downsampling; q_{ij} = W_q x, k_{ij} = W_k x and v_{ij} = W_v x denote the query, key and value at the j-th element of the i-th row, respectively; W_q, W_k and W_v are the parameters of the linear transformations applied to the input x, and y is the output; r^q, r^k and r^v are learnable position bias terms, commonly called relative position encodings; G_Q, G_K, G_V1 and G_V2 are learnable gating parameters used to control the effect of the learned relative position encodings on encoding non-local context. Gated axial attention is then applied along the height axis of the tensor, in the same way as along the width axis. Finally, a 1 × 1 convolution plus a BatchNorm layer and a ReLU activation function are applied, and the feature block establishes a residual connection around each gated axial attention block.
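For illustration only, a simplified single-head sketch of gated attention along the width axis follows; it uses per-position (rather than relative) encodings and an assumed gate layout (G_Q, G_K, G_V1, G_V2), so it should be read as a rough approximation of equation (1), not as the exact block used in the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAxialAttentionW(nn.Module):
    """Simplified single-head gated attention applied along the width axis."""
    def __init__(self, channels, width):
        super().__init__()
        self.to_q = nn.Linear(channels, channels, bias=False)   # W_q
        self.to_k = nn.Linear(channels, channels, bias=False)   # W_k
        self.to_v = nn.Linear(channels, channels, bias=False)   # W_v
        # Learnable position bias terms (simplified: one vector per width position).
        self.r_q = nn.Parameter(0.02 * torch.randn(width, channels))
        self.r_k = nn.Parameter(0.02 * torch.randn(width, channels))
        self.r_v = nn.Parameter(0.02 * torch.randn(width, channels))
        # Learnable gates controlling how much the positional terms contribute.
        self.g_q = nn.Parameter(torch.zeros(1))
        self.g_k = nn.Parameter(torch.zeros(1))
        self.g_v1 = nn.Parameter(torch.ones(1))
        self.g_v2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.permute(0, 2, 3, 1)                # (B, H, W, C); attention runs over W per row
        q, k, v = self.to_q(t), self.to_k(t), self.to_v(t)
        # Content term q^T k plus gated positional terms q^T r^q and k^T r^k.
        logits = torch.einsum('bhic,bhjc->bhij', q, k)
        logits = logits + self.g_q * torch.einsum('bhic,jc->bhij', q, self.r_q)
        logits = logits + self.g_k * torch.einsum('bhjc,jc->bhj', k, self.r_k).unsqueeze(2)
        attn = F.softmax(logits / c ** 0.5, dim=-1)
        # Gated value aggregation: G_V1 * v + G_V2 * r^v.
        out = self.g_v1 * torch.einsum('bhij,bhjc->bhic', attn, v)
        out = out + self.g_v2 * torch.einsum('bhij,jc->bhic', attn, self.r_v)
        return out.permute(0, 3, 1, 2)           # back to (B, C, H, W)

# Usage: a 64-channel feature block of width 32.
block = GatedAxialAttentionW(channels=64, width=32)
y = block(torch.randn(2, 64, 32, 32))            # output shape (2, 64, 32, 32)
```

Attention along the height axis is obtained by swapping the roles of H and W, and in the full block a 1 × 1 convolution, BatchNorm, ReLU and a residual connection wrap the attention, as described above.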
The feature block x passes through one gated axial attention block, then feature rearrangement is performed for downsampling, and finally two gated axial attention blocks are applied, yielding a compact feature block x1 with long-range dependence and global information.
Global branch decoding (decoder) section:
Firstly, the feature block x1 passes through two gated axial attention blocks and the C channel is expanded using a 1 × 1 convolution so that upsampling can be performed by inverse feature rearrangement; inverse feature rearrangement is then applied for upsampling. Compared with bilinear interpolation upsampling, inverse feature rearrangement is more flexible, since its parameters are learned by the network whereas bilinear interpolation is fixed by a formula. Secondly, a gated axial attention block and a 1 × 1 convolution expand the C channel again, followed by inverse feature rearrangement for upsampling, yielding a feature block x2 that has long-range dependence and global information and the same size as the input original image. All corresponding blocks of the decoding part and the encoding part are connected by skip connections implemented as addition.
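A minimal sketch of one such upsampling block follows, assuming the channel expansion is a 1 × 1 convolution to four times the target channel count followed by inverse rearrangement (PixelShuffle); the module name and channel numbers are illustrative.

```python
import torch
import torch.nn as nn

class InverseRearrangeUp(nn.Module):
    """1x1 conv expands the C channel, then inverse feature rearrangement upsamples 2x."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Expand to 4*out_ch so that PixelShuffle(2) yields out_ch channels at 2x resolution.
        self.expand = nn.Conv2d(in_ch, 4 * out_ch, kernel_size=1)
        self.up = nn.PixelShuffle(2)       # the learned parameters live in the 1x1 conv

    def forward(self, x, skip=None):
        x = self.up(self.expand(x))        # (B, out_ch, 2H, 2W)
        if skip is not None:
            x = x + skip                   # skip connection from the encoder, by addition
        return x

# Usage: upsample a (B, 128, 32, 32) feature block to (B, 64, 64, 64).
up_block = InverseRearrangeUp(128, 64)
y = up_block(torch.randn(2, 128, 32, 32), skip=torch.randn(2, 64, 64, 64))
```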
Partial branching portion:
Firstly, the input original image is divided into 32 × 32 patch blocks along H and W; the following operations are then performed for each patch block:
using 7 × 7 convolution kernel, step size is 1, Padding is set to 3, the height and width of the input are preserved, and then mapped through BatchNorm layer and ReLU activation function.
Secondly, the feature block obtained in the previous step undergoes feature rearrangement downsampling, followed 2 times by a convolution that keeps the height and width unchanged, a BatchNorm layer and a ReLU activation function, giving a feature block x'.
Then, x' is passed through 3 stages of gated axial attention blocks plus feature rearrangement downsampling, using 3, 4 and 1 gated axial attention blocks respectively, giving a feature block x'1. x'1 is then upsampled through 3 stages of gated axial attention blocks plus inverse feature rearrangement, using 1, 4 and 3 gated axial attention blocks respectively.
Finally, the patch blocks are spliced in their original order to obtain a feature block x'2 that has local information and the same size as the input original image.
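The split-and-splice of the local branch can be illustrated with plain tensor reshapes; the helper names below are hypothetical and the per-patch processing itself is omitted.

```python
import torch

def split_into_patches(x, p=32):
    # (B, C, H, W) -> (B * num_patches, C, p, p), preserving the original order.
    b, c, h, w = x.shape
    x = x.reshape(b, c, h // p, p, w // p, p).permute(0, 2, 4, 1, 3, 5)
    return x.reshape(-1, c, p, p), (b, c, h, w)

def splice_patches(patches, shape, p=32):
    # Inverse operation: splice the (processed) patch blocks back in their original order.
    b, c, h, w = shape
    x = patches.reshape(b, h // p, w // p, c, p, p).permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, c, h, w)

img = torch.randn(2, 3, 128, 128)
patches, shape = split_into_patches(img)                  # 2 x 16 patch blocks of 32 x 32
assert torch.equal(splice_patches(patches, shape), img)   # splicing restores the image
```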
Step 4, defining a loss function, which measures the error between the predicted value and the ground-truth label. The cross-entropy loss commonly used in segmentation tasks is adopted, as in equation (2):
L_{CE}(p, p') = -\frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \Big( p(x,y)\log p'(x,y) + \big(1 - p(x,y)\big)\log\big(1 - p'(x,y)\big) \Big) \qquad (2)
where W and H are the dimensions of the image, p(x, y) corresponds to a pixel in the image, and p'(x, y) represents the output prediction at a particular location (x, y).
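A sketch of equation (2) for a single-channel prediction follows, assuming the prediction has already been mapped through the sigmoid; the function name is hypothetical.

```python
import torch

def cross_entropy_loss(pred, target, eps=1e-7):
    # pred, target: tensors of shape (B, 1, H, W); pred holds sigmoid probabilities p'(x, y),
    # target holds the ground-truth labels p(x, y) in {0, 1}.
    pred = pred.clamp(eps, 1 - eps)
    loss = -(target * torch.log(pred) + (1 - target) * torch.log(1 - pred))
    return loss.mean()    # average over the W x H pixels (and the batch), as in equation (2)
```

Up to the numerical clamping, this is the same quantity as torch.nn.functional.binary_cross_entropy(pred, target).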
Step 5, defining an Adam optimizer and setting a reasonable learning rate for the model: the initial learning rate is set to 0.001 and is decayed as the number of epochs increases during training, being multiplied by 0.8 every 50 epochs, which effectively suppresses oscillation and helps find better network parameters. Meanwhile, L2 regularization is adopted to effectively reduce overfitting, with the hyper-parameter of the regularization term set to 0.0005. The learning rate decay formula is defined as equation (3):
l_p = l_0 \times 0.8^{\lfloor p/50 \rfloor} \qquad (3)
where p is the number of training epochs and l_0 is the initial learning rate.
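A sketch of the optimizer and schedule described here, using a placeholder model; mapping the L2 regularization term onto Adam's weight_decay is a reasonable but assumed choice.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=1)   # placeholder network, for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)

# l_p = l_0 * 0.8 ** (p // 50): multiply the base learning rate by 0.8 every 50 epochs.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 0.8 ** (epoch // 50))

for epoch in range(400):
    # ... one cooperative training epoch of the global and local branches ...
    scheduler.step()
```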
Step 6, network training and testing: the global branch and the local branch are trained cooperatively; the feature blocks x2 and x'2 obtained in step 3 are added, followed by a 1 × 1 convolution, so that the C channel is reduced to the number of classes to be segmented. The resulting prediction block is then mapped by a sigmoid function along channel C, the cross-entropy loss between the prediction and the ground-truth segmentation map is computed, and the gradients are finally updated with the Adam optimizer defined in step 5. A total of 400 training epochs are run.
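The fusion step can be sketched as follows; the channel count 64 and class count 2 are assumed purely for illustration.

```python
import torch
import torch.nn as nn

num_classes = 2                                   # assumed number of segmentation classes
fuse = nn.Conv2d(64, num_classes, kernel_size=1)  # 1x1 conv reduces C to the class count

x2 = torch.randn(1, 64, 128, 128)         # global-branch feature block x2 (assumed 64 channels)
x2_local = torch.randn(1, 64, 128, 128)   # local-branch feature block x'2

# Add the two branch outputs, reduce the C channel, then map channel C with a sigmoid
# to obtain per-class prediction maps.
pred = torch.sigmoid(fuse(x2 + x2_local))         # (1, num_classes, 128, 128)
```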
Testing is performed once every 50 training epochs: the C channel is first mapped by a sigmoid function, and each pixel is then assigned to the class with the maximum probability. The indices used for evaluation are the average IoU and average F1 score: the IoU and F1 score of each class are first calculated between each predicted segmentation map and the corresponding ground-truth segmentation map and then averaged; the mean IoU and mean F1 score over all test images are then summed and divided by the number of test images to obtain the average IoU and average F1 score. These two indices effectively evaluate the segmentation accuracy of the model.
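A per-class IoU/F1 sketch consistent with this evaluation is given below; averaging over classes and over test images is left to the caller, and the helper name is hypothetical.

```python
import torch

def iou_and_f1(pred, target, eps=1e-7):
    # pred, target: boolean masks of one class for one image (after the per-pixel argmax).
    pred, target = pred.bool(), target.bool()
    inter = (pred & target).sum().float()
    union = (pred | target).sum().float()
    iou = inter / (union + eps)
    # F1 (Dice): 2*TP / (2*TP + FP + FN) = 2*inter / (|pred| + |target|)
    f1 = 2 * inter / (pred.sum().float() + target.sum().float() + eps)
    return iou, f1
```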
The comparison model used in the experiments is MedT, the recent best-performing model on the GlaS and MoNuSeg datasets. The comparison of experimental indices is given in Table 1 below, and the comparison of segmentation effects is shown in fig. 3.
Table 1. Comparison of the present invention with the MedT model on the evaluation indices (the table is provided as an image in the original publication).

Claims (5)

1. A medical image segmentation method based on feature rearrangement and gated axial attention is characterized by comprising the following steps:
step 1, acquiring a data set; selecting 3 data sets from existing public medical image segmentation data sets;
step 2, data processing; on the medical image segmentation data set acquired in the step 1, adjusting images in the data set to be the same in size; then, randomly turning the adjusted training sample image horizontally/vertically, thereby increasing the diversity of the training sample;
step 3, defining a medical image segmentation model based on feature rearrangement and gated axial attention, wherein the model comprises a global branch and a local branch; taking the training image processed in the step 2 and a real segmentation graph of the training image as input;
step 4, a loss function; the function of the loss function is to measure the error between the predicted value and the real sample mark; here a cross entropy loss function is employed;
step 5, defining an Adam optimizer and setting a reasonable learning rate for the model; the initial learning rate is set to 0.001 and is decayed as the number of epochs increases during training, being multiplied by 0.8 every 50 epochs, so that oscillation is effectively suppressed and better network parameters are found; meanwhile, L2 regularization is adopted to effectively reduce overfitting;
and 6, training and testing the network, cooperatively training the global branch and the local branch in the step 3, and evaluating on the test set provided by each data set while training, wherein the evaluation adopts average IoU and average F1 score.
2. The method for segmenting medical images based on feature rearrangement and gated axial attention according to claim 1, wherein the data processing in step 2 is implemented as follows:
firstly, an original image in a data set and a real segmentation map are formed into a size of 128 x 128 through resize;
and finally, randomly turning the training image subjected to resize and the corresponding segmentation image horizontally/vertically with the probability of 50%.
3. The method for segmenting medical images based on feature rearrangement and gated axial attention according to claim 2, wherein the model global branch and the model local branch in step 3 are implemented as follows:
global branch code (encode) part:
3-1, using a 7 × 7 convolution kernel for an input training sample image, setting the step length to be 1 and Padding to be 3, reserving the input height H and width W, and mapping by a BatchNorm layer and a ReLU activation function to obtain a feature block;
3-2, performing characteristic rearrangement on the characteristic blocks, dividing H and W patches into 2 multiplied by 2 patch blocks, and downsampling (B, C, H, W) into (B,4C, H/2, W/2); b is the number of pictures input at one time, C is the number of channels of the feature block, and H and W are the height and width of the feature block respectively; the characteristic rearrangement can keep the characteristics of the adjacent elements on the C channel, and compared with the adoption of a pooling layer, the information can be better kept;
3-3, carrying out convolution with unchanged height and width and a BatchNorm layer and a ReLU activation function mapping on the feature block after feature rearrangement to obtain a feature block x, and enhancing information circulation of a local panel block;
3-4, inputting the characteristic block x into a gated axial attention block; a gated axial attention block is first mapped through a 1 × 1 convolution plus the BatchNorm layer and the ReLU activation function; second, applying a gated axial attention along the width axis of the tensor, as expressed by the following equation (1):
y_{ij} = \sum_{w=1}^{N} \operatorname{softmax}\big( q_{ij}^{\top} k_{iw} + G_Q\, q_{ij}^{\top} r^{q}_{iw} + G_K\, k_{iw}^{\top} r^{k}_{iw} \big)\big( G_{V1}\, v_{iw} + G_{V2}\, r^{v}_{iw} \big) \qquad (1)
where N denotes the width after the last downsampling; q_{ij} = W_q x, k_{ij} = W_k x and v_{ij} = W_v x denote the query, key and value at the j-th element of the i-th row, respectively; W_q, W_k and W_v are the parameters of the linear transformations applied to the input x, and y is the output; r^q, r^k and r^v are learnable position bias terms, commonly called relative position encodings; G_Q, G_K, G_V1 and G_V2 are learnable gating parameters used to control the effect of the learned relative position encodings on encoding non-local context; gated axial attention is then applied along the height axis of the tensor, in the same way as along the width axis; finally, a 1 × 1 convolution plus a BatchNorm layer and a ReLU activation function are applied, and a residual connection is established around each gated axial attention block;
3-5, the feature block obtained after the 1 gated axial attention block undergoes feature rearrangement for downsampling and finally passes through two gated axial attention blocks, yielding a compact feature block x1 with long-range dependence and global information;
Global branch decoding (decoder) section:
firstly, the feature block x1 passes through two gated axial attention blocks, and a 1 × 1 convolution expands the C channel so that upsampling can be performed by inverse feature rearrangement; inverse feature rearrangement is then applied for upsampling; compared with bilinear interpolation upsampling, inverse feature rearrangement is more flexible, since its parameters are learned by the network whereas bilinear interpolation is fixed by a formula; secondly, a gated axial attention block and a 1 × 1 convolution expand the C channel again, followed by inverse feature rearrangement for upsampling, yielding a feature block x2 that has long-range dependence and global information and the same size as the input original image; all corresponding blocks of the decoding part and the encoding part are connected by skip connections implemented as addition;
local branch portion:
301. the input training sample image is divided into 32 x 32 patch blocks on H and W patches, and then for each patch block the following operations are performed:
7 multiplied by 7 convolution kernel, the step length is 1, Padding is set to be 3, the input height and width are reserved, and a characteristic block is obtained through BatchNorm layer and ReLU activation function mapping;
302. performing feature rearrangement downsampling on the feature block obtained in the step 301, and performing convolution without changing height and width, a BatchNorm layer and a ReLU activation function mapping for 2 times to obtain a feature block x';
303. x' is passed through 3 stages of gated axial attention blocks plus feature rearrangement downsampling, using 3, 4 and 1 gated axial attention blocks respectively, giving a feature block x'1; x'1 is then upsampled through 3 stages of gated axial attention blocks plus inverse feature rearrangement, using 1, 4 and 3 gated axial attention blocks respectively;
304. The patch blocks are spliced in their original order to obtain a feature block x'2 that has local information and the same size as the input original image;
4. The method for medical image segmentation based on feature rearrangement and gated axial attention according to claim 3, wherein the loss function in step 4 is as follows (2):
L_{CE}(p, p') = -\frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \Big( p(x,y)\log p'(x,y) + \big(1 - p(x,y)\big)\log\big(1 - p'(x,y)\big) \Big) \qquad (2)
where H and W are dimensions of the image, p (x, y) corresponds to a pixel in the image, and p' (x, y) represents the output prediction for a particular location (x, y).
5. The method of feature rebinning and gated axial attention based medical image segmentation according to claim 4, wherein the co-training and evaluation index of step 6 is as follows:
for cooperative training of the global branch and the local branch, the feature blocks x2 and x'2 obtained in step 3 are added, and a 1 × 1 convolution reduces the number of C channels to the number of classes to be segmented;
the average IoU and the average F1 score mean that IoU and F1 scores of different categories of each predicted segmentation map and each real segmentation map are calculated firstly, and then the average values are taken; adding the mean IoU and the mean F1 score of all test sets respectively, and dividing by the number of test images to obtain a mean IoU and a mean F1 score; the two indexes can effectively evaluate the accuracy of model segmentation.
CN202111262731.XA 2021-10-28 2021-10-28 Medical image segmentation method based on feature rearrangement and gated axial attention Pending CN114049314A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111262731.XA CN114049314A (en) 2021-10-28 2021-10-28 Medical image segmentation method based on feature rearrangement and gated axial attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111262731.XA CN114049314A (en) 2021-10-28 2021-10-28 Medical image segmentation method based on feature rearrangement and gated axial attention

Publications (1)

Publication Number Publication Date
CN114049314A true CN114049314A (en) 2022-02-15

Family

ID=80206285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111262731.XA Pending CN114049314A (en) 2021-10-28 2021-10-28 Medical image segmentation method based on feature rearrangement and gated axial attention

Country Status (1)

Country Link
CN (1) CN114049314A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638842A (en) * 2022-03-15 2022-06-17 桂林电子科技大学 Medical image segmentation method based on MLP
CN114638842B (en) * 2022-03-15 2024-03-22 桂林电子科技大学 Medical image segmentation method based on MLP
CN114863165A (en) * 2022-04-12 2022-08-05 南通大学 Vertebral body bone density classification method based on fusion of image omics and deep learning features
CN116433660A (en) * 2023-06-12 2023-07-14 吉林禾熙科技开发有限公司 Medical image data processing device, electronic apparatus, and computer-readable storage medium
CN116433660B (en) * 2023-06-12 2023-09-15 吉林禾熙科技开发有限公司 Medical image data processing device, electronic apparatus, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN111681252B (en) Medical image automatic segmentation method based on multipath attention fusion
CN108734659B (en) Sub-pixel convolution image super-resolution reconstruction method based on multi-scale label
CN111627019B (en) Liver tumor segmentation method and system based on convolutional neural network
CN110020989B (en) Depth image super-resolution reconstruction method based on deep learning
CN114049314A (en) Medical image segmentation method based on feature rearrangement and gated axial attention
CN113177882B (en) Single-frame image super-resolution processing method based on diffusion model
CN112634296A (en) RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
CN111626927B (en) Binocular image super-resolution method, system and device adopting parallax constraint
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
KR20200144398A (en) Apparatus for performing class incremental learning and operation method thereof
JP7337268B2 (en) Three-dimensional edge detection method, device, computer program and computer equipment
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
KR102543690B1 (en) Image Upscaling Apparatus And Method Based On Learning With Privileged Information
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN116563682A (en) Attention scheme and strip convolution semantic line detection method based on depth Hough network
CN113436224B (en) Intelligent image clipping method and device based on explicit composition rule modeling
CN115457057A (en) Multi-scale feature fusion gland segmentation method adopting deep supervision strategy
CN109741258B (en) Image super-resolution method based on reconstruction
CN103208109A (en) Local restriction iteration neighborhood embedding-based face hallucination method
CN113962905A (en) Single image rain removing method based on multi-stage feature complementary network
CN117593275A (en) Medical image segmentation system
CN116912268A (en) Skin lesion image segmentation method, device, equipment and storage medium
CN116452619A (en) MRI image segmentation method based on high-resolution network and boundary enhancement
Wang et al. Deep locally linear embedding network
CN113793269B (en) Super-resolution image reconstruction method based on improved neighborhood embedding and priori learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination