CN114519719A - Brain tumor MR image segmentation method - Google Patents

Brain tumor MR image segmentation method

Info

Publication number
CN114519719A
Authority
CN
China
Prior art keywords
size
image
dimensional
convolution
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210017015.3A
Other languages
Chinese (zh)
Inventor
孙家阔
郭立君
张�荣
高琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo University
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University filed Critical Ningbo University
Priority to CN202210017015.3A
Publication of CN114519719A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • G16H30/20ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10088Magnetic resonance imaging [MRI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30016Brain
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a brain tumor MR image segmentation method. A multi-scale feature fusion and additive attention segmentation network is constructed by adding a multi-scale feature fusion module, a fused feature additive attention module and an encoder-decoder additive attention splicing module to a U-Net network. The multi-scale feature fusion module fuses the encoder feature maps of different scales and receptive fields into a fused feature map; the fused feature additive attention module introduces an additive attention mechanism between the fused feature map and the feature map of each encoder scale; and the encoder-decoder additive attention splicing module applies an additive attention mechanism between the encoder-scale features guided by the fused feature map and the decoder feature maps, reducing the semantic feature gap that arises between the encoder module and the decoder module at skip connections because of their different network depths. The advantage is higher segmentation accuracy.

Description

Brain tumor MR image segmentation method
Technical Field
The invention relates to an MR image segmentation method, in particular to a brain tumor MR image segmentation method.
Background
Gliomas are the most common primary central nervous system brain tumors, accounting for about 27% of all central nervous system tumors and about 80% of malignant central nervous system tumors. Magnetic Resonance Imaging (MRI) is non-invasive, harmless to the human body and produces clear images, and is therefore widely used in clinical brain tumor diagnosis. To better assist doctors in diagnosing and treating patients, the size, position and shape of the brain tumor must be known accurately. However, manually segmenting a brain tumor from MR images costs a physician a great deal of time and effort, and errors are easily introduced because the appearance of brain tumors is highly complex. Automatic segmentation of brain tumor MR images can therefore greatly improve the diagnostic efficiency of doctors. Nevertheless, accurate automatic segmentation remains a difficult task because the size, shape and location of brain tumors vary greatly between patients, and the boundary between normal soft tissue and diseased tissue in the same patient's brain is blurred.
In the field of brain tumor MR image segmentation, U-Net is the most widely applied basic segmentation network. In U-Net and its enhanced networks, the encoder progressively extracts features of the input image through a series of convolution and downsampling operations. As the convolution and downsampling proceed, the size of the feature maps at each level of the encoder gradually decreases while the receptive field gradually increases, and as the network is trained, the features at each level of the encoder learn the critical discriminative feature information at the current scale. The skip connections between encoder features and decoder features in U-Net introduce shallow feature information and further improve feature utilization. However, in U-Net and its enhanced networks the skip connection only splices features of the same scale by channel; this ignores the guiding effect that the complementary information between multi-scale features with different receptive fields in the encoder can have on the current-scale feature. In addition, the encoder features lie at relatively shallow positions in the overall network structure while the decoder features lie at relatively deep positions, so a semantic feature gap exists between the encoder features and the decoder features when they are spliced directly through a skip connection. Therefore, the accuracy of existing U-Net and its enhanced networks in segmenting brain tumor MR images still needs to be improved.
Disclosure of Invention
The invention aims to provide a brain tumor MR image segmentation method with high segmentation accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows: a brain tumor MR image segmentation method, comprising the steps of:
step 1: dividing the data in the BraTS2020 training set, wherein the BraTS2020 training set comprises data of 293 High-grade Glioma (HGG) patients and 74 Low-grade Glioma (LGG) patients, 367 patients in total; the data of each patient comprise 5 three-dimensional images, namely a three-dimensional MR image of the FLAIR modality, a three-dimensional MR image of the T1 modality, a three-dimensional MR image of the T1ce modality, a three-dimensional MR image of the T2 modality and a three-dimensional Ground Truth (GT) segmentation mask image; each of the 5 three-dimensional images is 240pt in length, 240pt in width and 155 in channel number, namely of size 240pt × 240pt × 155; the GT segmentation mask image labels the Whole Tumor region (WT), the Tumor Core region (TC), the Enhancing Tumor region (ET) and all other regions, with the tumor core region labelled with tag 1, the whole tumor region labelled with tag 2, the enhancing tumor region labelled with tag 4 and all other regions labelled with tag 0; the specific division process is as follows: taking the data of 234 high-grade glioma patients and 59 low-grade glioma patients in the BraTS2020 training set as training data, and taking the data of the remaining 59 high-grade glioma patients and 15 low-grade glioma patients in the BraTS2020 training set as verification data;
step 2: carrying out data preprocessing, wherein the specific processing process is as follows:
step 2.1, standardizing the three-dimensional MR images of the four modalities of each patient in the training data and the verification data, namely the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality, in the Z-Score manner, namely standardizing the voxel values of the three-dimensional MR image of each modality to zero mean and unit standard deviation, specifically as follows: firstly, determining the number and positions of voxels whose value is greater than zero in the three-dimensional MR image of each modality of each patient; then calculating the mean and standard deviation of the region with voxel values greater than zero in the three-dimensional MR image of each modality; finally, taking the three-dimensional MR image of each modality as an original three-dimensional MR image and applying the Z-Score standardization operation of formula (1) to the region with voxel values greater than zero, obtaining the standardized three-dimensional MR image of each modality, in which the voxel values follow a standard normal distribution:
v' = (v − μ) / σ    (1)
in the formula (1), v represents a voxel value in the original three-dimensional MR image, μ represents a mean value of a region of which the voxel value is greater than zero in the original three-dimensional MR image, σ represents a standard deviation of the region of which the voxel value is greater than zero in the original three-dimensional MR image, and v' represents a voxel value in a normalized three-dimensional MR image obtained by normalizing the original three-dimensional MR image by Z-Score;
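For illustration, the per-modality Z-Score operation of formula (1), restricted to the region with voxel values greater than zero, can be sketched as follows (a minimal NumPy sketch; the function name and array handling are assumptions made for illustration and are not part of the patent text):

```python
import numpy as np

def z_score_nonzero(volume: np.ndarray) -> np.ndarray:
    """Standardize one modality volume according to formula (1),
    using only the region where the voxel value is greater than zero."""
    normalized = volume.astype(np.float32).copy()
    mask = volume > 0                      # brain region, background excluded
    mu = normalized[mask].mean()           # mean of the non-zero region
    sigma = normalized[mask].std()         # standard deviation of the non-zero region
    normalized[mask] = (normalized[mask] - mu) / sigma
    return normalized
```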
step 2.2, stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality of each patient in the training data and the verification data, respectively, in the following manner:
the images of the same channel in the FLAIR-modality standardized three-dimensional MR image, the T1-modality standardized three-dimensional MR image, the T1ce-modality standardized three-dimensional MR image and the T2-modality standardized three-dimensional MR image of the same patient are combined in sequence into a four-channel RGBA image of size 240pt × 240pt × 4; at the same time, the image of the same channel in the patient's three-dimensional Ground Truth segmentation mask image is saved as the single-channel segmentation mask image corresponding to that four-channel RGBA image, of size 240pt × 240pt × 1, and this segmentation mask image serves as the label of the four-channel RGBA image; at this point, the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities, namely the FLAIR-modality, T1-modality, T1ce-modality and T2-modality three-dimensional MR images, of each patient in the training data yield, after stitching, [(234+59) × 155] four-channel RGBA images and [(234+59) × 155] single-channel segmentation mask images, and the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data yield, after stitching, [(59+15) × 155] four-channel RGBA images and [(59+15) × 155] single-channel segmentation mask images;
step 2.3, deleting the images that contain no tumor region from the [(234+59) × 155] four-channel RGBA images and the [(234+59) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities, namely the FLAIR-modality, T1-modality, T1ce-modality and T2-modality three-dimensional MR images, of each patient in the training data, obtaining 24422 four-channel RGBA images and 24422 single-channel segmentation mask images, and using the obtained 24422 four-channel RGBA images and 24422 single-channel segmentation mask images to form the training samples;
likewise deleting the images that contain no tumor region from the [(59+15) × 155] four-channel RGBA images and the [(59+15) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data, obtaining 4794 four-channel RGBA images and 4794 single-channel segmentation mask images, and using the obtained 4794 four-channel RGBA images and 4794 single-channel segmentation mask images to form the verification samples;
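A minimal sketch of the stitching and slice-filtering of steps 2.2 and 2.3 might look as follows, assuming each standardized modality volume and the GT mask are available as 240 × 240 × 155 NumPy arrays (the function name and data layout are illustrative assumptions):

```python
import numpy as np

def slice_patient(flair, t1, t1ce, t2, gt):
    """Split four standardized 240x240x155 modality volumes and the GT mask
    volume into per-channel four-channel images and single-channel mask images,
    keeping only the slices that contain a tumor region (labels 1, 2 or 4)."""
    images, masks = [], []
    for k in range(flair.shape[-1]):                          # 155 channels (slices)
        rgba = np.stack([flair[..., k], t1[..., k],
                         t1ce[..., k], t2[..., k]], axis=-1)  # 240 x 240 x 4
        mask = gt[..., k][..., np.newaxis]                    # 240 x 240 x 1
        if (mask > 0).any():                                  # discard tumor-free slices
            images.append(rgba)
            masks.append(mask)
    return images, masks
```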
and step 3: adding a multi-scale feature fusion module (MSFF), a fused feature additive attention module (FFAA) and an encoder-decoder additive attention splicing module (E-DAAC) to an existing U-Net network consisting of an encoder module, a decoder module and a bridging module to obtain a multi-scale feature fusion and additive attention segmentation network, wherein the multi-scale feature fusion and additive attention segmentation network comprises the encoder module, the decoder module, the bridging module, the multi-scale feature fusion module (MSFF), the fused feature additive attention module (FFAA) and the encoder-decoder additive attention splicing module (E-DAAC);
the encoder module is used for extracting feature information of a brain tumor in the four-channel RGBA image input into the encoder module, generating a feature map output with the size of 15pt multiplied by 256, and calling the feature map as an encoding feature map; the encoder module comprises a 1 st convolution block, a 1 st sampling layer, a 2 nd convolution block, a 2 nd sampling layer, a 3 rd convolution block, a 3 rd sampling layer, a 4 th convolution block and a 4 th sampling layer which are sequentially arranged, wherein the 1 st convolution block is used for accessing an image with the size of 240pt multiplied by 4, and sequentially carrying out first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the image to obtain characteristic diagram output with the size of 240pt multiplied by 32, wherein the first convolution processing and the second convolution processing are respectively realized by 32 convolution kernels with the step length of 1 and the size of 3 multiplied by 3; the 1 st sampling layer is used for accessing a feature map with the size of 240pt × 240pt × 32 output by the 1 st convolution block, and performing maximum pooling operation with the step length of 2 on the feature map to obtain feature map output with the size of 120 × 120 × 32; the 2 nd convolution block is used for accessing a feature map with the size of 120pt × 120pt × 32 output by the 1 st sampling layer, and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the feature map to obtain feature map output with the size of 120pt × 120pt × 64, wherein the first convolution processing and the second convolution processing are respectively realized by 64 convolution kernels with the step size of 1 and the size of 3 × 3; the 2 nd sampling layer is used for accessing a feature map with the size of 120pt × 120pt × 64 output by the 2 nd convolution block, and performing maximum pooling operation with the step size of 2 on the feature map to obtain feature map output with the size of 60pt × 60pt × 64; the 3 rd convolution block is used for accessing a characteristic diagram with the size of 60pt × 60pt × 64 output by the 2 nd sampling layer, and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the characteristic diagram to obtain characteristic diagram output with the size of 60pt × 60pt × 128, wherein the first convolution processing and the second convolution processing are respectively realized by 128 convolution kernels with the step size of 1 and the size of 3 × 3; the 3 rd sampling layer is used for accessing a feature map with the size of 60pt × 60pt × 128 output by the 3 rd convolution block, and performing maximum pooling operation with the step size of 2 on the feature map to obtain feature map output with the size of 30pt × 30pt × 128; the 4 th convolution block is used for accessing a feature map with the size of 30pt × 30pt × 128 output by the 3 rd sampling layer, and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function 
activation and second convolution processing on the feature map to obtain feature map output with the size of 30pt × 30pt × 256, wherein the first convolution processing and the second convolution processing are respectively realized by 256 convolution kernels with the step length of 1 and the size of 3 × 3; the 4th sampling layer is used for accessing the feature map with the size of 30pt × 30pt × 256 output by the 4th convolution block, and performing maximum pooling operation with the step length of 2 on the feature map to obtain feature map output with the size of 15pt × 15pt × 256, and this feature map is the coding feature map;
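As an illustration of the encoder convolution blocks described above, a minimal PyTorch sketch of one block (batch normalization, Mish activation and a 3 × 3 convolution, applied twice) could look as follows; the class name, the use of torch.nn.Mish and the padding that keeps the spatial size unchanged are assumptions made for this sketch:

```python
import torch
import torch.nn as nn

class EncoderConvBlock(nn.Module):
    """One encoder convolution block: (BN -> Mish -> 3x3 Conv) applied twice,
    matching the 1st-4th convolution blocks (out_channels = 32, 64, 128, 256)."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Mish(),
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.Mish(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```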
the bridge module is used for accessing the coding characteristic diagram output by the coder module and processing the coding characteristic diagram to obtain a characteristic diagram output with the size of 15pt multiplied by 512;
the multi-scale feature fusion module comprises a 1st up-sampling layer, a 2nd up-sampling layer, a 3rd up-sampling layer, a 4th up-sampling layer, a feature splicing layer and a convolution block which are sequentially arranged; the L-th up-sampling layer of the multi-scale feature fusion module is used for accessing the feature map output by the L-th convolution block of the encoder module, whose size is (240/2^(L-1))pt × (240/2^(L-1))pt × (2^(L-1) × 32), L = 1, 2, 3, 4, and performing bilinear interpolation up-sampling on the accessed feature map to obtain a feature map of size 240pt × 240pt × (2^(L-1) × 32); the feature splicing layer of the multi-scale feature fusion module is used for accessing the feature maps output by the 1st to 4th up-sampling layers of the multi-scale feature fusion module and splicing them by channel to obtain a feature map of size 240pt × 240pt × (32 + 64 + 128 + 256); the convolution block of the multi-scale feature fusion module is used for accessing the feature map output by the feature splicing layer and performing a convolution operation on it with 32 convolution kernels of step length 1 and size 3 × 3 to obtain feature map output of size 240pt × 240pt × 32;
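A minimal PyTorch sketch of the multi-scale feature fusion module described above might look as follows (class and argument names are illustrative assumptions; bilinear up-sampling, channel-wise splicing and the final 3 × 3 convolution follow the description):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatureFusion(nn.Module):
    """Bilinearly upsample the four encoder feature maps to 240x240, splice
    them by channel (32+64+128+256 = 480 channels) and fuse them back to 32
    channels with a 3x3 convolution."""
    def __init__(self, encoder_channels=(32, 64, 128, 256), fused_channels: int = 32):
        super().__init__()
        self.fuse = nn.Conv2d(sum(encoder_channels), fused_channels,
                              kernel_size=3, stride=1, padding=1)

    def forward(self, encoder_features):
        # encoder_features: outputs of the 1st-4th encoder convolution blocks
        target_size = encoder_features[0].shape[-2:]          # 240 x 240
        upsampled = [F.interpolate(f, size=target_size, mode="bilinear",
                                   align_corners=False) for f in encoder_features]
        fused = torch.cat(upsampled, dim=1)                   # channel-wise splice
        return self.fuse(fused)                               # 240 x 240 x 32
```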
the fused feature additive attention module comprises a 1st down-sampling layer, a 1st additive attention block, a 2nd down-sampling layer, a 2nd additive attention block, a 3rd down-sampling layer, a 3rd additive attention block, a 4th down-sampling layer and a 4th additive attention block which are sequentially arranged; the L-th down-sampling layer of the fused feature additive attention module is used for accessing the feature map of size 240pt × 240pt × 32 output by the convolution block of the multi-scale feature fusion module and performing a maximum pooling operation with step length 2^L on the accessed feature map to obtain a down-sampled fused feature map; the L-th additive attention block of the fused feature additive attention module is used for accessing the feature map output by the L-th down-sampling layer of the fused feature additive attention module and the feature map output by the L-th convolution block of the encoder module, and processing the two feature maps in the following manner: firstly, a convolution operation with 2^(L-1) × 32 convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element; the result is then passed sequentially through Mish function activation, a convolution operation with 2^(L-1) × 32 convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; finally, the attention feature map and the feature map output by the L-th convolution block of the encoder module are multiplied element by element to obtain the feature map output of the L-th additive attention block;
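The additive attention block described above can be sketched roughly as follows in PyTorch; the class name is an assumption, and the sketch assumes that the final Sigmoid attention map gates the encoder feature map by element-wise multiplication, as described above:

```python
import torch
import torch.nn as nn

class AdditiveAttentionBlock(nn.Module):
    """One additive attention block of the fused-feature additive attention module:
    1x1 convolutions on the (pooled) fused feature map and on the encoder feature
    map, element-wise addition, Mish, another 1x1 convolution, Sigmoid, and element-
    wise multiplication of the attention map with the encoder feature map."""
    def __init__(self, fused_channels: int, encoder_channels: int, inter_channels: int):
        super().__init__()
        # inter_channels corresponds to 2^(L-1) x 32 for the L-th block
        self.project_fused = nn.Conv2d(fused_channels, inter_channels, kernel_size=1)
        self.project_encoder = nn.Conv2d(encoder_channels, inter_channels, kernel_size=1)
        self.attention = nn.Sequential(
            nn.Mish(),
            nn.Conv2d(inter_channels, inter_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, fused: torch.Tensor, encoder: torch.Tensor) -> torch.Tensor:
        # assumes fused and encoder have matching spatial size for the addition
        gate = self.attention(self.project_fused(fused) + self.project_encoder(encoder))
        return encoder * gate            # attention-weighted encoder feature map
```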
the encoder-decoder additive attention splicing module comprises a 1st encoder-decoder additive attention splicing block, a 2nd encoder-decoder additive attention splicing block, a 3rd encoder-decoder additive attention splicing block and a 4th encoder-decoder additive attention splicing block which are sequentially arranged;
the 1st encoder-decoder additive attention splicing block is used for accessing the feature map output by the 4th additive attention block of the fused feature additive attention module and the feature map of size 15pt × 15pt × 512 output by the bridging module, and processing the two feature maps in the following manner: firstly, a convolution operation with (2^3 × 32) convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element, and the result is passed sequentially through Mish function activation, a convolution operation with (2^3 × 32) convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; the attention feature map and the feature map output by the 4th additive attention block of the fused feature additive attention module are then multiplied element by element; finally, the feature map obtained by this multiplication and the feature map output by the bridging module are spliced by channel to obtain the feature map output of the 1st encoder-decoder additive attention splicing block;
the 2nd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 3rd additive attention block of the fused feature additive attention module and the feature map output by the 2nd up-sampling layer of the decoder module, and processing the two feature maps in the following manner: firstly, a convolution operation with (2^2 × 32) convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element, and the result is passed sequentially through Mish function activation, a convolution operation with (2^2 × 32) convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; the attention feature map and the feature map output by the 3rd additive attention block of the fused feature additive attention module are then multiplied element by element; finally, the feature map obtained by this multiplication and the feature map output by the 2nd up-sampling layer of the decoder module are spliced by channel to obtain the feature map output of the 2nd encoder-decoder additive attention splicing block;
the 3rd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 2nd additive attention block of the fused feature additive attention module and the feature map output by the 3rd up-sampling layer of the decoder module, and processing the two feature maps in the following manner: firstly, a convolution operation with (2^1 × 32) convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element, and the result is passed sequentially through Mish function activation, a convolution operation with (2^1 × 32) convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; the attention feature map and the feature map output by the 2nd additive attention block of the fused feature additive attention module are then multiplied element by element; finally, the feature map obtained by this multiplication and the feature map output by the 3rd up-sampling layer of the decoder module are spliced by channel to obtain the feature map output of the 3rd encoder-decoder additive attention splicing block;
the 4th encoder-decoder additive attention splicing block is used for accessing the feature map output by the 1st additive attention block of the fused feature additive attention module and the feature map output by the 4th up-sampling layer of the decoder module, and processing the two feature maps in the following manner: firstly, a convolution operation with (2^0 × 32) convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element, and the result is passed sequentially through Mish function activation, a convolution operation with (2^0 × 32) convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; the attention feature map and the feature map output by the 1st additive attention block of the fused feature additive attention module are then multiplied element by element; finally, the feature map obtained by this multiplication and the feature map output by the 4th up-sampling layer of the decoder module are spliced by channel to obtain the feature map output of the 4th encoder-decoder additive attention splicing block;
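A rough PyTorch sketch of one encoder-decoder additive attention splicing block is given below; it reuses the same additive attention computation and then splices by channel. The class name is an assumption, and the sketch assumes that the attention map gates the skip-branch feature map (the fused-feature-guided encoder feature) and that the gated result is spliced with the decoder-side feature map:

```python
import torch
import torch.nn as nn

class EncoderDecoderAttentionConcat(nn.Module):
    """One encoder-decoder additive attention splicing block: additive attention
    between the skip-branch (FFAA output) feature map and the decoder-side feature
    map, followed by channel-wise splicing of the gated skip feature with the
    decoder-side feature."""
    def __init__(self, skip_channels: int, decoder_channels: int, inter_channels: int):
        super().__init__()
        self.project_skip = nn.Conv2d(skip_channels, inter_channels, kernel_size=1)
        self.project_decoder = nn.Conv2d(decoder_channels, inter_channels, kernel_size=1)
        self.attention = nn.Sequential(
            nn.Mish(),
            nn.Conv2d(inter_channels, inter_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, skip: torch.Tensor, decoder: torch.Tensor) -> torch.Tensor:
        # assumes skip and decoder have matching spatial size, and that
        # skip_channels == inter_channels so the gate can multiply the skip map
        gate = self.attention(self.project_skip(skip) + self.project_decoder(decoder))
        return torch.cat([skip * gate, decoder], dim=1)   # channel-wise splice
```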
the decoder module comprises a 1st convolution block, a 1st up-sampling layer, a 2nd convolution block, a 2nd up-sampling layer, a 3rd convolution block, a 3rd up-sampling layer, a 4th convolution block and an output convolution block which are sequentially arranged; the L-th convolution block of the decoder module is configured to access the feature map output by the L-th encoder-decoder additive attention splicing block of the encoder-decoder additive attention splicing module and to process the feature map input to it in the following manner: batch normalization and Mish function activation are carried out sequentially, followed by a convolution operation with 2^(5-L-1) × 32 convolution kernels of step length 1 and size 3 × 3, and the resulting feature map is output; the L-th up-sampling layer of the decoder module is used for accessing the feature map output by the L-th convolution block of the decoder module and performing bilinear interpolation up-sampling on the feature map input to it; the output convolution block of the decoder module is used for accessing the feature map output by the 4th convolution block of the decoder module and performing a convolution operation with 4 convolution kernels of step length 1 and size 1 × 1 to obtain feature map output of size 240pt × 240pt × 4;
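For completeness, one decoder convolution block (batch normalization, Mish activation, then a 3 × 3 convolution with 2^(5-L-1) × 32 kernels) could be sketched as follows in PyTorch; the class name and the padding choice are assumptions:

```python
import torch
import torch.nn as nn

class DecoderConvBlock(nn.Module):
    """One decoder convolution block: BN -> Mish -> 3x3 convolution, applied to the
    output of the corresponding encoder-decoder additive attention splicing block."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Mish(),
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```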
and step 4: training the multi-scale feature fusion and additive attention segmentation network, wherein the specific process comprises the following steps:
(1) initializing the multi-scale feature fusion and additive attention segmentation network with the he_normal parameter initialization method;
(2) randomly dividing the training samples obtained in step 2 into a plurality of batches, each batch containing batchsize training samples, wherein the batchsize is 16: if the number of training samples is evenly divisible by the batchsize, the training samples are divided into (number of training samples / batchsize) batches; if it is not evenly divisible, the remainder is discarded and ⌊number of training samples / batchsize⌋ batches are obtained, where ⌊·⌋ denotes rounding down to an integer;
(3) taking one batch and applying image enhancement processing to all the four-channel RGBA images in the batch by randomly flipping them in the left-right direction and in the up-down direction, each with a probability of 50%;
(4) taking the RGBA images of all four channels after image enhancement processing in the selected batch as input, and inputting the input into the multi-scale feature fusion and additive attention segmentation network to obtain the output result of the multi-scale feature fusion and additive attention segmentation network;
(5) according to the output result of the multi-scale feature fusion and additive attention segmentation network and the single-channel segmentation mask images corresponding to all the four-channel RGBA images in the selected batch, computing the segmentation loss value of each four-channel RGBA image after image enhancement in the selected batch, and taking the average of these segmentation loss values as the final loss value, wherein the segmentation loss value CE of each four-channel RGBA image is computed as:
CE = −(1/N) × Σ_{n=1}^{N} Σ_{c=1}^{C} y_n^c · log(p_n^c)
where N represents the total number of pixels in the four-channel RGBA image; C represents the number of categories to which each pixel can be assigned, which takes the value 4, namely the four categories of whole tumor, tumor core, enhancing tumor and background; y_n^c indicates whether the n-th pixel in the single-channel segmentation mask image corresponding to the four-channel RGBA image belongs to the true category c; and p_n^c represents the probability that the n-th pixel of the input four-channel RGBA image is predicted as category c by the multi-scale feature fusion and additive attention segmentation network;
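The per-image segmentation loss of step (5) corresponds to a pixel-wise cross-entropy; a minimal PyTorch sketch is given below, assuming the GT labels 0/1/2/4 have already been remapped to consecutive class indices 0-3 (the function name and tensor layout are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def segmentation_loss(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Pixel-wise cross-entropy between the network output and the single-channel
    segmentation mask, averaged over all pixels.
    logits: (B, 4, 240, 240) raw network output for the four categories
    mask:   (B, 240, 240) per-pixel class index (0 background, 1 tumor core,
            2 whole tumor, 3 enhancing tumor after remapping label 4 -> 3)."""
    return F.cross_entropy(logits, mask.long())
```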
(6) training parameters of the multi-scale feature fusion and additive attention segmentation network by using an ADAM optimizer with a learning rate of 1e-4 according to the segmentation loss value obtained by calculation in the step (5);
(7) repeating steps (3) to (6) until all batches of the training data have trained the multi-scale feature fusion and additive attention segmentation network once, then sequentially inputting the verification samples into the multi-scale feature fusion and additive attention segmentation network obtained at this point, obtaining the segmentation loss value of each four-channel RGBA image in the verification samples with the same method as in step (5), and computing the average segmentation loss value over all the verification samples;
(8) repeating the steps (2) - (7) until the loss of the multi-scale feature fusion and additive attention segmentation network on the verification sample is converged, and finally obtaining the trained multi-scale feature fusion and additive attention segmentation network;
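Steps (2) to (8) amount to a fairly standard training loop; a condensed sketch under the stated settings (batch size 16 with the remainder dropped, Adam with learning rate 1e-4, validation after every epoch) might look as follows. The dataset objects, the fixed epoch count and the assumption that the random flips are applied inside the training dataset are illustrative; the patent trains until the validation loss converges:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train(model, train_dataset, val_dataset, epochs: int = 100, device: str = "cuda"):
    """Condensed training loop for step 4: batches of 16 (remainder dropped),
    Adam with learning rate 1e-4, average validation loss after each epoch."""
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, drop_last=True)
    val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.to(device)
    for epoch in range(epochs):
        model.train()
        for images, masks in train_loader:          # masks: class indices 0-3 per pixel
            images, masks = images.to(device), masks.to(device)
            loss = F.cross_entropy(model(images), masks.long())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(F.cross_entropy(model(x.to(device)), y.to(device).long()).item()
                           for x, y in val_loader) / max(len(val_loader), 1)
        print(f"epoch {epoch + 1}: average validation loss {val_loss:.4f}")
```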
and step 5: processing the brain tumor MR image to be segmented into four-channel RGBA images according to the methods of step 1 and step 2, and inputting them into the trained multi-scale feature fusion and additive attention segmentation network, which outputs the segmentation prediction result.
The bridge module comprises a convolution block, wherein the convolution block is used for accessing a characteristic diagram with the size of 15 multiplied by 256, and sequentially carrying out first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the characteristic diagram to obtain the characteristic diagram with the size of 15 multiplied by 512 and output the characteristic diagram to the multi-scale characteristic fusion module, wherein the first convolution processing and the second convolution processing are respectively realized by 512 convolution kernels with the step length of 1 and the size of 3 multiplied by 3.
Compared with the prior art, the invention has the following advantages. The U-Net network is improved by adding a multi-scale feature fusion module, a fused feature additive attention module and an encoder-decoder additive attention splicing module to construct a multi-scale feature fusion and additive attention segmentation network. Through the multi-scale feature fusion module, the feature maps of different scales and receptive fields in the encoder module can be fused into a fused feature map that contains both detail feature information and high-level semantic feature information. Through the fused feature additive attention module, an additive attention mechanism is introduced between the fused feature map and the feature map of each scale of the encoder module, and the fused feature map is used to guide the feature map of each encoder scale, so that the encoder-scale feature maps guided by the fused feature map become more discriminative. Through the encoder-decoder additive attention splicing module, an additive attention mechanism is applied between the encoder-scale feature maps guided by the fused feature map and the feature maps of the decoder module, so that the semantic feature gap produced between the encoder module and the decoder module at the skip connections because of their different network depths is reduced; that is, the important feature information in the skip-connection features is learned adaptively through the additive attention mechanism, which narrows the semantic feature gap between the encoder feature maps and the decoder feature maps, and the segmentation accuracy is therefore higher.
Detailed Description
The present invention will be described in further detail with reference to examples.
Example (b): a brain tumor MR image segmentation method, comprising the steps of:
step 1: the data in the BraTS2020 training set are divided; the BraTS2020 training set contains data of 293 High-grade Glioma (HGG) patients and 74 Low-grade Glioma (LGG) patients, 367 patients in total; the data of each patient include 5 three-dimensional images, namely a three-dimensional MR image of the FLAIR modality, a three-dimensional MR image of the T1 modality, a three-dimensional MR image of the T1ce modality, a three-dimensional MR image of the T2 modality and a three-dimensional Ground Truth (GT) segmentation mask image; each of the 5 three-dimensional images is 240pt long, 240pt wide and has 155 channels, namely is of size 240pt × 240pt × 155; the GT segmentation mask image labels the Whole Tumor region (WT), the Tumor Core region (TC), the Enhancing Tumor region (ET) and all other regions, with the tumor core region labelled with tag 1, the whole tumor region labelled with tag 2, the enhancing tumor region labelled with tag 4 and all other regions labelled with tag 0; the specific division process is as follows: taking the data of 234 high-grade glioma patients and 59 low-grade glioma patients in the BraTS2020 training set as training data, and taking the data of the remaining 59 high-grade glioma patients and 15 low-grade glioma patients in the BraTS2020 training set as verification data;
step 2: carrying out data preprocessing, wherein the specific processing process is as follows:
step 2.1, standardizing the three-dimensional MR images of the four modalities of each patient in the training data and the verification data, namely the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality, in the Z-Score manner, namely standardizing the voxel values of the three-dimensional MR image of each modality to zero mean and unit standard deviation, specifically as follows: firstly, determining the number and positions of voxels whose value is greater than zero in the three-dimensional MR image of each modality of each patient; then calculating the mean and standard deviation of the region with voxel values greater than zero in the three-dimensional MR image of each modality; finally, taking the three-dimensional MR image of each modality as an original three-dimensional MR image and applying the Z-Score standardization operation of formula (1) to the region with voxel values greater than zero, obtaining the standardized three-dimensional MR image of each modality, in which the voxel values follow a standard normal distribution:
v' = (v − μ) / σ    (1)
in the formula (1), v represents a voxel value in the original three-dimensional MR image, μ represents a mean value of a region of which the voxel value is greater than zero in the original three-dimensional MR image, σ represents a standard deviation of the region of which the voxel value is greater than zero in the original three-dimensional MR image, and v' represents a voxel value in a normalized three-dimensional MR image obtained by normalizing the original three-dimensional MR image by Z-Score;
step 2.2, stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality of each patient in the training data and the verification data, respectively, in the following manner:
the images of the same channel in the FLAIR-modality standardized three-dimensional MR image, the T1-modality standardized three-dimensional MR image, the T1ce-modality standardized three-dimensional MR image and the T2-modality standardized three-dimensional MR image of the same patient are combined in sequence into a four-channel RGBA image of size 240pt × 240pt × 4; at the same time, the image of the same channel in the patient's three-dimensional Ground Truth segmentation mask image is saved as the single-channel segmentation mask image corresponding to that four-channel RGBA image, of size 240pt × 240pt × 1, and this segmentation mask image serves as the label of the four-channel RGBA image; at this point, the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities, namely the FLAIR-modality, T1-modality, T1ce-modality and T2-modality three-dimensional MR images, of each patient in the training data yield, after stitching, [(234+59) × 155] four-channel RGBA images and [(234+59) × 155] single-channel segmentation mask images, and the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data yield, after stitching, [(59+15) × 155] four-channel RGBA images and [(59+15) × 155] single-channel segmentation mask images;
step 2.3, deleting the images that contain no tumor region from the [(234+59) × 155] four-channel RGBA images and the [(234+59) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities, namely the FLAIR-modality, T1-modality, T1ce-modality and T2-modality three-dimensional MR images, of each patient in the training data, obtaining 24422 four-channel RGBA images and 24422 single-channel segmentation mask images, and using the obtained 24422 four-channel RGBA images and 24422 single-channel segmentation mask images to form the training samples;
likewise deleting the images that contain no tumor region from the [(59+15) × 155] four-channel RGBA images and the [(59+15) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data, obtaining 4794 four-channel RGBA images and 4794 single-channel segmentation mask images, and using the obtained 4794 four-channel RGBA images and 4794 single-channel segmentation mask images to form the verification samples;
and step 3: adding a multi-scale feature fusion module (MSFF), a fused feature additive attention module (FFAA) and an encoder-decoder additive attention splicing module (E-DAAC) to an existing U-Net network consisting of an encoder module, a decoder module and a bridging module to obtain a multi-scale feature fusion and additive attention segmentation network, wherein the multi-scale feature fusion and additive attention segmentation network comprises the encoder module, the decoder module, the bridging module, the multi-scale feature fusion module (MSFF), the fused feature additive attention module (FFAA) and the encoder-decoder additive attention splicing module (E-DAAC);
the encoder module is used for extracting feature information of a brain tumor in the four-channel RGBA image input into the encoder module, generating a feature map output with the size of 15pt multiplied by 256, and calling the feature map as an encoding feature map; the encoder module comprises a 1 st convolution block, a 1 st sampling layer, a 2 nd convolution block, a 2 nd sampling layer, a 3 rd convolution block, a 3 rd sampling layer, a 4 th convolution block and a 4 th sampling layer which are sequentially arranged according to the sequence, wherein the 1 st convolution block is used for accessing an image with the size of 240pt multiplied by 4, and sequentially carrying out first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the image to obtain a characteristic diagram output with the size of 240pt multiplied by 32, wherein the first convolution processing and the second convolution processing are respectively realized by 32 convolution kernels with the step length of 1 and the size of 3 multiplied by 3; the 1 st sampling layer is used for accessing a feature map with the size of 240pt multiplied by 32 output by the 1 st convolution block, and performing maximum pooling operation with the step length of 2 on the feature map to obtain feature map output with the size of 120 multiplied by 32; the 2 nd convolution block is used for accessing a feature map with the size of 120pt × 120pt × 32 output by the 1 st sampling layer, and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the feature map to obtain feature map output with the size of 120pt × 120pt × 64, wherein the first convolution processing and the second convolution processing are respectively realized by 64 convolution kernels with the step length of 1 and the size of 3 × 3; the 2 nd sampling layer is used for accessing a feature map with the size of 120pt multiplied by 64 output by the 2 nd convolution block, and performing maximum pooling operation with the step size of 2 on the feature map to obtain feature map output with the size of 60pt multiplied by 64; the 3 rd convolution block is used for accessing a characteristic diagram which is output by the 2 nd sampling layer and has the size of 60pt multiplied by 64, and sequentially carrying out first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the characteristic diagram to obtain characteristic diagram output with the size of 60pt multiplied by 128, wherein the first convolution processing and the second convolution processing are respectively realized by adopting 128 convolution kernels with the step length of 1 and the size of 3 multiplied by 3; the 3 rd sampling layer is used for accessing a feature map with the size of 60pt multiplied by 128 output by the 3 rd convolution block, and performing maximum pooling operation with the step size of 2 on the feature map to obtain feature map output with the size of 30pt multiplied by 128; the 4 th convolution block is used for accessing a feature map with the size of 30pt × 30pt × 128 output by the 3 rd sampling layer, and sequentially performing first Batch Normalization (BN), first Mish 
function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the feature map to obtain feature map output with the size of 30pt × 30pt × 256, wherein the first convolution processing and the second convolution processing are respectively realized by 256 convolution kernels with the step length of 1 and the size of 3 × 3; the 4th sampling layer is used for accessing the feature map with the size of 30pt × 30pt × 256 output by the 4th convolution block, and performing maximum pooling operation with the step length of 2 on the feature map to obtain feature map output with the size of 15pt × 15pt × 256, and this feature map is the coding feature map;
the bridging module is used for accessing the coding characteristic graph output by the coder module and processing the coding characteristic graph to obtain characteristic graph output with the size of 15pt multiplied by 512;
the multi-scale feature fusion module comprises a 1st up-sampling layer, a 2nd up-sampling layer, a 3rd up-sampling layer, a 4th up-sampling layer, a feature splicing layer and a convolution block which are sequentially arranged; the L-th up-sampling layer of the multi-scale feature fusion module is used for accessing the feature map output by the L-th convolution block of the encoder module, whose size is (240/2^(L-1))pt × (240/2^(L-1))pt × (2^(L-1) × 32), L = 1, 2, 3, 4, and performing bilinear interpolation up-sampling on the accessed feature map to obtain a feature map of size 240pt × 240pt × (2^(L-1) × 32); the feature splicing layer of the multi-scale feature fusion module is used for accessing the feature maps output by the 1st to 4th up-sampling layers of the multi-scale feature fusion module and splicing them by channel to obtain a feature map of size 240pt × 240pt × (32 + 64 + 128 + 256); the convolution block of the multi-scale feature fusion module is used for accessing the feature map output by the feature splicing layer and performing a convolution operation on it with 32 convolution kernels of step length 1 and size 3 × 3 to obtain feature map output of size 240pt × 240pt × 32;
the fusion feature additive attention module comprises a 1st down-sampling layer, a 1st additive attention block, a 2nd down-sampling layer, a 2nd additive attention block, a 3rd down-sampling layer, a 3rd additive attention block, a 4th down-sampling layer and a 4th additive attention block arranged in sequence, wherein the L-th down-sampling layer of the fusion feature additive attention module is used for accessing the feature map with the size of 240 × 240 × 32 output by the convolution block of the multi-scale feature fusion module and performing a max pooling operation with a step size of 2^L on it to obtain a down-sampled fused feature map; the L-th additive attention block of the fusion feature additive attention module is used for accessing the feature map output by the L-th down-sampling layer of the fusion feature additive attention module and the feature map output by the L-th convolution block of the encoder module, and processes the two accessed feature maps as follows: firstly, a convolution operation with 2^(L-1) × 32 convolution kernels with a step size of 1 and a size of 1 × 1 is applied to each of the two feature maps, giving two feature maps of equal size; the two feature maps are then added element by element, and the result is sequentially subjected to a Mish function activation, a convolution operation with 2^(L-1) × 32 convolution kernels with a step size of 1 and a size of 1 × 1, and a Sigmoid function activation to obtain an attention map; finally, the attention map is multiplied element by element with the encoder feature map accessed by the block, and the result is output as the feature map of the L-th additive attention block;
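A minimal sketch of one such additive attention block is given below, assuming PyTorch. In this sketch the attention map rescales the encoder feature map, which follows the reading of the text above, and the example sizes (120 × 120 inputs, 2^(2-1) × 32 = 64 projection kernels) correspond to L = 2; both the gating target and the matching spatial sizes are assumptions.

```python
import torch
import torch.nn as nn

class AdditiveAttentionBlock(nn.Module):
    """1x1 convs on both inputs, element-wise sum, then Mish -> 1x1 conv -> Sigmoid
    to form an attention map that rescales the encoder feature map element by element."""
    def __init__(self, fused_ch, enc_ch, mid_ch):
        super().__init__()
        self.proj_fused = nn.Conv2d(fused_ch, mid_ch, kernel_size=1, stride=1)
        self.proj_enc = nn.Conv2d(enc_ch, mid_ch, kernel_size=1, stride=1)
        self.gate = nn.Sequential(nn.Mish(), nn.Conv2d(mid_ch, mid_ch, 1, 1), nn.Sigmoid())

    def forward(self, fused, enc):
        att = self.gate(self.proj_fused(fused) + self.proj_enc(enc))  # attention map in (0, 1)
        return enc * att                                              # gated encoder features

# L = 2: down-sampled fused map at 120x120x32, encoder block 2 output at 120x120x64
blk = AdditiveAttentionBlock(fused_ch=32, enc_ch=64, mid_ch=64)
print(blk(torch.randn(1, 32, 120, 120), torch.randn(1, 64, 120, 120)).shape)
```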
the encoder-decoder additive attention splicing module comprises a 1st encoder-decoder additive attention splicing block, a 2nd encoder-decoder additive attention splicing block, a 3rd encoder-decoder additive attention splicing block and a 4th encoder-decoder additive attention splicing block arranged in sequence;
the 1st encoder-decoder additive attention splicing block is used for accessing the feature map output by the 4th additive attention block of the fusion feature additive attention module and the feature map with the size of 15 × 15 × 512 output by the bridging module, and processes the two accessed feature maps as follows: firstly, a convolution operation with 2^3 × 32 convolution kernels with a step size of 1 and a size of 1 × 1 is applied to each of the two feature maps, giving two feature maps of equal size; the two feature maps are then added element by element, and the result is sequentially subjected to a Mish function activation, a convolution operation with 2^3 × 32 convolution kernels with a step size of 1 and a size of 1 × 1, and a Sigmoid function activation to obtain an attention map; the attention map is multiplied element by element with one of the two accessed feature maps, and the resulting feature map is finally spliced according to channels with the other accessed feature map to form the output of the 1st encoder-decoder additive attention splicing block;
the 2nd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 3rd additive attention block of the fusion feature additive attention module and the feature map output by the 2nd up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^2 × 32 convolution kernels;
the 3rd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 2nd additive attention block of the fusion feature additive attention module and the feature map output by the 3rd up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^1 × 32 convolution kernels;
the 4th encoder-decoder additive attention splicing block is used for accessing the feature map output by the 1st additive attention block of the fusion feature additive attention module and the feature map output by the 4th up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^0 × 32 convolution kernels;
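The four splicing blocks share one structure and differ only in the number of 1 × 1 convolution kernels; a minimal PyTorch sketch of one such block follows. Which of the two accessed feature maps is gated and which is concatenated unchanged is not legible from the formula images, so the attention-gate-style choice below (gating the skip feature, concatenating the decoder feature) and the example channel and spatial sizes are assumptions.

```python
import torch
import torch.nn as nn

class EncoderDecoderAttentionConcat(nn.Module):
    """Project both inputs with 1x1 convs, sum, then Mish -> 1x1 conv -> Sigmoid to form an
    attention map; the gated skip feature is concatenated with the decoder feature."""
    def __init__(self, skip_ch, dec_ch, mid_ch):
        super().__init__()
        self.proj_skip = nn.Conv2d(skip_ch, mid_ch, kernel_size=1, stride=1)
        self.proj_dec = nn.Conv2d(dec_ch, mid_ch, kernel_size=1, stride=1)
        self.gate = nn.Sequential(nn.Mish(), nn.Conv2d(mid_ch, mid_ch, 1, 1), nn.Sigmoid())

    def forward(self, skip, dec):
        att = self.gate(self.proj_skip(skip) + self.proj_dec(dec))  # shared-resolution attention map
        return torch.cat([skip * att, dec], dim=1)                  # channel-wise splicing

# 1st splicing block: 2**3 * 32 = 256 projection kernels, bridge feature with 512 channels
blk = EncoderDecoderAttentionConcat(skip_ch=256, dec_ch=512, mid_ch=256)
print(blk(torch.randn(1, 256, 15, 15), torch.randn(1, 512, 15, 15)).shape)  # (1, 768, 15, 15)
```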
the decoder module comprises a 1st convolution block, a 1st up-sampling layer, a 2nd convolution block, a 2nd up-sampling layer, a 3rd convolution block, a 3rd up-sampling layer, a 4th convolution block and an output convolution block arranged in sequence; the L-th convolution block of the decoder module is used for accessing the feature map output by the L-th encoder-decoder additive attention splicing block of the encoder-decoder additive attention splicing module and processing it as follows: the input feature map is sequentially subjected to batch normalization, Mish function activation and a convolution operation with 2^(5-L-1) × 32 convolution kernels with a size of 3 × 3 and a step size of 1; the L-th up-sampling layer of the decoder module is used for accessing the feature map output by the L-th convolution block of the decoder module and performing a bilinear interpolation operation on it to obtain an up-sampled feature map; the output convolution block of the decoder module is used for accessing the feature map output by the 4th convolution block of the decoder module and performing a convolution operation on it with 4 convolution kernels with a step size of 1 and a size of 1 × 1 to obtain a feature map output with the size of 240 × 240 × 4;
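For completeness, a sketch of one decoder stage (batch normalization, Mish activation, 3 × 3 convolution, bilinear upsampling by a factor of 2) under the same PyTorch assumption; the input channel count used in the example is illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """One decoder stage: BN -> Mish -> 3x3 conv on the spliced feature map,
    followed by bilinear upsampling by a factor of 2."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(nn.BatchNorm2d(in_ch), nn.Mish(),
                                  nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1))

    def forward(self, x):
        x = self.conv(x)
        return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

# 1st decoder stage: 2**(5-1-1) * 32 = 256 kernels; an input of 768 channels is assumed
stage = DecoderBlock(in_ch=768, out_ch=256)
print(stage(torch.randn(1, 768, 15, 15)).shape)   # (1, 256, 30, 30)
```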
step 4: training the multi-scale feature fusion and additive attention segmentation network, the specific process comprising the following steps:
(1) initializing the multi-scale feature fusion and additive attention segmentation network by adopting the he_normal parameter initialization method;
(2) randomly dividing the training samples obtained in step 2 into a number of batches, each batch containing batchsize training samples: if the number of training samples is evenly divisible by batchsize, the training samples are divided into (number of training samples / batchsize) batches; if not, the remainder is discarded and ⌊number of training samples / batchsize⌋ batches are obtained, where batchsize is 16 and ⌊·⌋ denotes rounding down to an integer;
(3) taking one batch, and performing image enhancement processing in which the four-channel RGBA images in the batch are randomly flipped in the left-right direction and the up-down direction, each with a probability of 50%;
(4) taking the RGBA images of all four channels after image enhancement processing in the selected batch as input, and inputting the input into a multi-scale feature fusion and additive attention segmentation network to obtain output results of the multi-scale feature fusion and additive attention segmentation network;
(5) according to the output result of the multi-scale feature fusion and additive attention segmentation network and the single-channel segmentation mask images corresponding to the four-channel RGBA images in the selected batch, computing the segmentation loss value of each image-enhanced four-channel RGBA image in the selected batch and taking the average as the final loss value, wherein the segmentation loss value CE of each four-channel RGBA image is computed as:

CE = -(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} y_{n,c} · log(p_{n,c})

wherein N represents the total number of pixels in the four-channel RGBA image; C represents the number of classes to which each pixel can belong and takes the value 4, namely the four classes whole tumor, tumor core, enhanced tumor and background; y_{n,c} takes the value 1 when the true class of the n-th pixel in the single-channel segmentation mask image corresponding to the four-channel RGBA image is c and 0 otherwise; and p_{n,c} represents the probability with which the multi-scale feature fusion and additive attention segmentation network predicts the n-th pixel of the input four-channel RGBA image to be of class c (an illustrative code sketch of steps (3) to (6) is given after this list);
(6) training parameters of the multi-scale feature fusion and additive attention segmentation network by using an ADAM optimizer with a learning rate of 1e-4 according to the segmentation loss value obtained by calculation in the step (5);
(7) repeating the steps (3) to (6) until all the training data of the batch train the multi-scale feature fusion and the additive attention segmentation network for one time, then sequentially inputting the verification samples into the multi-scale feature fusion and the additive attention segmentation network, obtaining the segmentation loss value of each four-channel RGBA image in the verification samples by adopting the same method in the step (5), and calculating and obtaining the average segmentation loss value of all the verification samples;
(8) repeating the steps (2) - (7) until the loss of the multi-scale feature fusion and the additive attention segmentation network on the verification sample is converged, and finally obtaining the trained multi-scale feature fusion and additive attention segmentation network;
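An illustrative training-loop sketch for steps (3) to (6) above, assuming PyTorch; the flips are applied at batch granularity here for brevity, and the he_normal initialization of step (1) corresponds to Kaiming-normal initialization in this framework.

```python
import torch
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer, device="cuda"):
    """One pass over the training batches: random left-right/up-down flips with probability 0.5,
    forward pass, pixel-wise cross-entropy over the 4 classes, ADAM step."""
    model.train()
    for images, masks in loader:                      # images: (B, 4, 240, 240), masks: (B, 240, 240)
        if torch.rand(1).item() < 0.5:                # random left-right flip
            images, masks = images.flip(-1), masks.flip(-1)
        if torch.rand(1).item() < 0.5:                # random up-down flip
            images, masks = images.flip(-2), masks.flip(-2)
        images, masks = images.to(device), masks.to(device)
        logits = model(images)                        # (B, 4, 240, 240)
        loss = F.cross_entropy(logits, masks.long())  # averaged over pixels and batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# optimizer as described in step (6): ADAM with a learning rate of 1e-4, batch size 16 in the loader
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```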
step 5: processing the brain tumor MR image to be segmented into four-channel RGBA images according to the methods in step 1 and step 2, and inputting them into the trained multi-scale feature fusion and additive attention segmentation network, which outputs the segmentation prediction result.
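A slice-wise inference sketch matching step 5, again assuming PyTorch and that the preprocessed volume is provided as a stack of 155 four-channel slices; the function name and tensor layout are illustrative.

```python
import torch

@torch.no_grad()
def segment_volume(model, volume_4ch, device="cuda"):
    """Slice-wise inference: volume_4ch is a (155, 4, 240, 240) tensor of preprocessed,
    channel-stacked FLAIR/T1/T1ce/T2 slices; returns a (155, 240, 240) label map."""
    model.eval()
    preds = []
    for sl in volume_4ch:                              # one 4-channel slice at a time
        logits = model(sl.unsqueeze(0).to(device))     # (1, 4, 240, 240)
        preds.append(logits.argmax(dim=1).squeeze(0).cpu())
    return torch.stack(preds)
```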
In this embodiment, the bridge module includes a convolution block, where the convolution block is used to access a feature map with a size of 15 × 15 × 256 and sequentially perform first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the feature map, so as to obtain a feature map with a size of 15 × 15 × 512, which is output to the multi-scale feature fusion module; the first convolution processing and the second convolution processing are each implemented by 512 convolution kernels with a step size of 1 and a size of 3 × 3.
To verify the superiority of the present invention, the performance of the multi-scale feature fusion and additive attention segmentation network of the present invention was evaluated. On the BraTS2020 validation dataset, the Dice coefficient and the 95% Hausdorff distance (HD95) were used as evaluation indices for the WT, TC and ET regions; the output results of the multi-scale feature fusion and additive attention segmentation network and the real labels were input into the evaluation code to obtain the quantitative evaluation results. The ablation experiment results are shown in Table 1:
TABLE 1 ablation test results
In Table 1, group a denotes the baseline U-Net network, i.e. the existing U-Net network, and group b denotes the baseline U-Net network with the proposed multi-scale feature fusion module MSFF added. Comparing the segmentation results of groups a and b shows that introducing the MSFF module improves the segmentation performance of the network: the average Dice for ET, WT and TC increases by 2.23%, 2.13% and 0.97%, respectively. Group c adds the proposed fusion feature additive attention module FFAA on top of the group b experiment; comparing groups b and c shows that the average Dice of the network on all three regions is further improved after the additive attention operation is added, which indicates that the feature maps of the individual encoder scales become more discriminative after being guided by the additive attention of the global fused feature map. Comparing group c with group a, the average Dice values for ET, WT and TC improve by 3.77%, 2.71% and 2.42%, respectively, which results from the joint action of multi-scale feature fusion and additive attention. Group d denotes the baseline U-Net network with the encoder-decoder additive attention splicing module E-DAAC added; comparing groups a and d shows that the average Dice for ET, WT and TC increases by 2.94%, 2.21% and 2.25%, respectively, after E-DAAC is added to the baseline U-Net network. This shows that introducing an additive attention mechanism between the feature maps of the encoder module and the decoder module can reduce the semantic gap caused by the different network depths at which the two sets of feature maps are produced. Comparing the results of groups c and d further shows that introducing the MSFF and FFAA modules improves the segmentation performance more than introducing the E-DAAC module. Group e adds the Attention Gate module of Attention U-Net to the network of the group c experiment; comparing groups e and c shows that adding the Attention Gate module increases the network parameters by about 2M while improving the segmentation performance only slightly. Group f denotes the baseline U-Net network with all three proposed modules added simultaneously, i.e. the multi-scale feature fusion and additive attention segmentation network of the present invention. Comparing the five groups of experiments a, b, c, d and f, the multi-scale feature fusion and additive attention segmentation network achieves the best segmentation performance on all three tumor regions; compared with the baseline U-Net network, it improves the average Dice for ET, WT and TC by 6.23%, 3.53% and 5.93%, respectively. Furthermore, comparing groups e and f, the network of group f has about 1.6M fewer parameters than that of group e, yet its segmentation results are clearly higher.
This shows that the encoder-decoder additive attention splicing module E-DAAC proposed in the multi-scale feature fusion and additive attention segmentation network of the present invention uses fewer network parameters and improves the segmentation results more than the Attention Gate module. These results demonstrate the effectiveness of the proposed MSFF, FFAA and E-DAAC modules for the brain tumor MR image segmentation problem when used to improve the baseline U-Net network.
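The quantitative values in Table 1 and Table 2 are obtained from the evaluation code mentioned above; the snippet below is only an illustrative re-implementation of the Dice coefficient for a single binary region (HD95 is omitted).

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2 * |P ∩ G| / (|P| + |G|) for one binary region (e.g. WT, TC or ET)."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, target).sum() / denom

# example: Dice of the whole-tumor region (labels 1, 2 and 4 merged)
pred_labels = np.random.choice([0, 1, 2, 4], size=(240, 240))
gt_labels = np.random.choice([0, 1, 2, 4], size=(240, 240))
print(dice_coefficient(np.isin(pred_labels, [1, 2, 4]), np.isin(gt_labels, [1, 2, 4])))
```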
The method of the present invention, which uses the multi-scale feature fusion and additive attention segmentation network, is further compared with existing segmentation methods based on several other networks, and the results of the comparison experiments are shown in Table 2:
table 2 comparative experimental results
Note: the bold numbers indicate the optimal values of the column.
Analysis of Table 2 shows that the present invention achieves remarkable segmentation performance for ET, WT and TC, and that its segmentation results for ET, WT and TC are superior in all indices to those of segmentation methods based on the other existing networks. This demonstrates the superiority of the multi-scale feature fusion and additive attention segmentation network and of the method of the present invention.

Claims (2)

1. A brain tumor MR image segmentation method is characterized by comprising the following steps:
step 1: dividing the data in the BraTS2020 training set, wherein the BraTS2020 training set comprises data of 293 High-grade Glioma (HGG) patients and 74 Low-grade Glioma (LGG) patients, 367 patients in all; the data of each patient comprises 5 three-dimensional images, namely a three-dimensional MR image of the FLAIR modality, a three-dimensional MR image of the T1 modality, a three-dimensional MR image of the T1ce modality, a three-dimensional MR image of the T2 modality and a three-dimensional Ground Truth (GT) segmentation mask image, each 240 in length, 240 in width and 155 in channel number, i.e. of size 240 × 240 × 155; the GT segmentation mask image comprises a Whole Tumor region (Whole Tumor, WT), a Tumor Core region (Tumor Core, TC), an Enhanced Tumor region (Enhanced Tumor, ET) and all other regions, wherein the tumor core region is labeled with tag 1, the whole tumor region is labeled with tag 2, the enhanced tumor region is labeled with tag 4, and all other regions are labeled with tag 0; the specific division process is as follows: taking the data of 234 high-grade glioma patients and 59 low-grade glioma patients in the BraTS2020 training set as training data, and taking the data of the remaining 59 high-grade glioma patients and 15 low-grade glioma patients in the BraTS2020 training set as verification data;
step 2: carrying out data preprocessing, wherein the specific processing process is as follows:
step 2.1, standardizing the three-dimensional MR images of the four modalities, namely the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality of each patient in the training data and the verification data respectively in a Z-Score mode, namely standardizing voxel values of the three-dimensional MR images of the four modalities into a zero mean value and a unit standard deviation, and specifically comprises the following steps: firstly, determining the number and the positions of pixel points with the voxel value larger than zero in the three-dimensional MR image of each modality of each patient, then calculating the mean value and the standard deviation of a region with the voxel value larger than zero in the three-dimensional MR image of each modality, finally, respectively taking the three-dimensional MR image of each modality as an original three-dimensional MR image, respectively carrying out Z-Score standardization operation on the region with the voxel value larger than zero in the three-dimensional MR image of each modality by adopting a formula (1) to obtain a standardized three-dimensional MR image of each modality, wherein the voxel value in the standardized three-dimensional MR image of each modality conforms to standard normal distribution:
v' = (v - μ) / σ        (1)
in the formula (1), v represents a voxel value in the original three-dimensional MR image, mu represents a mean value of a region of which the voxel value is greater than zero in the original three-dimensional MR image, sigma represents a standard deviation of the region of which the voxel value is greater than zero in the original three-dimensional MR image, and v' represents a voxel value in a normalized three-dimensional MR image obtained after the original three-dimensional MR image is normalized by Z-Score;
step 2.2, respectively carrying out stitching processing on the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the standardized three-dimensional MR image corresponding to the three-dimensional MR image of the T2 modality of each patient in the training data and the verification data according to the following modes:
sequentially combining the images of the same channel in the standardized three-dimensional MR image of the FLAIR modality, the standardized three-dimensional MR image of the T1 modality, the standardized three-dimensional MR image of the T1ce modality and the standardized three-dimensional MR image of the T2 modality of the same patient into a four-channel RGBA image with the size of 240 × 240 × 4, and meanwhile saving the image of the same channel in the three-dimensional Ground Truth segmentation mask image of the patient as the single-channel segmentation mask image, with the size of 240 × 240 × 1, corresponding to that four-channel RGBA image, the single-channel segmentation mask image serving as the label of the four-channel RGBA image; in this way, stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the training data yields [(234+59) × 155] four-channel RGBA images and [(234+59) × 155] single-channel segmentation mask images in total, and stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data yields [(59+15) × 155] four-channel RGBA images and [(59+15) × 155] single-channel segmentation mask images;
step 2.3, deleting the images containing no tumor region from the [(234+59) × 155] four-channel RGBA images and the [(234+59) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the training data, obtaining 24422 four-channel RGBA images and 24422 single-channel segmentation mask images, and forming the training samples from the obtained 24422 four-channel RGBA images and 24422 single-channel segmentation mask images;
deleting images of no tumor region in [ (59+15) x 155] four-channel RGBA images and [ (59+15) x 155] single-channel segmentation mask images, which are obtained by splicing standardized three-dimensional MR images corresponding to four-mode three-dimensional MR images, namely a FLAIR mode three-dimensional MR image, a T1 mode three-dimensional MR image, a T1ce mode three-dimensional MR image and a T2 mode three-dimensional MR image of each patient in verification data, obtaining 4794 four-channel RGBA images and 4794 single-channel segmentation mask images, and adopting the obtained 4794 four-channel RGBA images and 4794 single-channel segmentation mask images to form a verification sample;
and step 3: adding a multi-scale feature fusion Module (MSFF), a fusion feature additive attention module (FFAA) and an encoder-decoder additive attention splicing module (E-DAAC) to an existing U-Net network consisting of an encoder module, a decoder module and a bridging module to obtain a multi-scale feature fusion and additive attention segmentation network, wherein the multi-scale feature fusion and additive attention segmentation network comprises the encoder module, the decoder module, the bridging module, the multi-scale feature fusion Module (MSFF), the fusion feature additive attention module (FFAA) and the encoder-decoder additive attention splicing module (E-DAAC);
the encoder module is used for extracting feature information of the brain tumor in the four-channel RGBA image input into it and generating a feature map output with the size of 15 × 15 × 256, which is called the encoding feature map; the encoder module comprises a 1st convolution block, a 1st sampling layer, a 2nd convolution block, a 2nd sampling layer, a 3rd convolution block, a 3rd sampling layer, a 4th convolution block and a 4th sampling layer arranged in sequence, wherein the 1st convolution block is used for accessing an image with the size of 240 × 240 × 4 and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the image to obtain a feature map output with the size of 240 × 240 × 32, the first convolution processing and the second convolution processing each being realized by 32 convolution kernels with a step size of 1 and a size of 3 × 3; the 1st sampling layer is used for accessing the feature map with the size of 240 × 240 × 32 output by the 1st convolution block and performing a max pooling operation with a step size of 2 on it to obtain a feature map output with the size of 120 × 120 × 32; the 2nd convolution block is used for accessing the feature map with the size of 120 × 120 × 32 output by the 1st sampling layer and processing it in the same manner as the 1st convolution block, with the first and second convolution processing each realized by 64 convolution kernels with a step size of 1 and a size of 3 × 3, to obtain a feature map output with the size of 120 × 120 × 64; the 2nd sampling layer is used for accessing the feature map with the size of 120 × 120 × 64 output by the 2nd convolution block and performing a max pooling operation with a step size of 2 on it to obtain a feature map output with the size of 60 × 60 × 64; the 3rd convolution block is used for accessing the feature map with the size of 60 × 60 × 64 output by the 2nd sampling layer and processing it in the same manner, with the first and second convolution processing each realized by 128 convolution kernels with a step size of 1 and a size of 3 × 3, to obtain a feature map output with the size of 60 × 60 × 128; the 3rd sampling layer is used for accessing the feature map with the size of 60 × 60 × 128 output by the 3rd convolution block and performing a max pooling operation with a step size of 2 on it to obtain a feature map output with the size of 30 × 30 × 128; the 4th convolution block is used for accessing the feature map with the size of 30 × 30 × 128 output by the 3rd sampling layer and sequentially performing first Batch Normalization, first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on it to obtain a feature map output with the size of 30 × 30 × 256; the 4th sampling layer is used for accessing the feature map with the size of 30 × 30 × 256 output by the 4th convolution block and performing a max pooling operation with a step size of 2 on it to obtain a feature map output with the size of 15 × 15 × 256, which is the encoding feature map;
the bridge module is used for accessing the encoding feature map output by the encoder module and processing it to obtain a feature map output with the size of 15 × 15 × 512;
the multi-scale feature fusion module comprises a 1st up-sampling layer, a 2nd up-sampling layer, a 3rd up-sampling layer, a 4th up-sampling layer, a feature splicing layer and a convolution block arranged in sequence, wherein the L-th up-sampling layer of the multi-scale feature fusion module is used for accessing the feature map output by the L-th convolution block of the encoder module, whose size is (240/2^(L-1)) × (240/2^(L-1)) × (2^(L-1) × 32), L = 1, 2, 3, 4, and performing bilinear interpolation up-sampling on the accessed feature map to obtain a feature map with the size of 240 × 240 × (2^(L-1) × 32); the feature splicing layer of the multi-scale feature fusion module is used for accessing the feature maps output by the 1st to 4th up-sampling layers of the multi-scale feature fusion module and splicing them according to channels to obtain a feature map with the size of 240 × 240 × 480; the convolution block of the multi-scale feature fusion module is used for accessing the feature map output by the feature splicing layer and performing a convolution operation on it with 32 convolution kernels with a step size of 1 and a size of 3 × 3 to obtain a feature map output with the size of 240 × 240 × 32;
the fusion feature additive attention module comprises a 1st down-sampling layer, a 1st additive attention block, a 2nd down-sampling layer, a 2nd additive attention block, a 3rd down-sampling layer, a 3rd additive attention block, a 4th down-sampling layer and a 4th additive attention block arranged in sequence, wherein the L-th down-sampling layer of the fusion feature additive attention module is used for accessing the feature map with the size of 240 × 240 × 32 output by the convolution block of the multi-scale feature fusion module and performing a max pooling operation with a step size of 2^L on it to obtain a down-sampled fused feature map; the L-th additive attention block of the fusion feature additive attention module is used for accessing the feature map output by the L-th down-sampling layer of the fusion feature additive attention module and the feature map output by the L-th convolution block of the encoder module, and processes the two accessed feature maps as follows: firstly, a convolution operation with 2^(L-1) × 32 convolution kernels with a step size of 1 and a size of 1 × 1 is applied to each of the two feature maps, giving two feature maps of equal size; the two feature maps are then added element by element, and the result is sequentially subjected to a Mish function activation, a convolution operation with 2^(L-1) × 32 convolution kernels with a step size of 1 and a size of 1 × 1, and a Sigmoid function activation to obtain an attention map; finally, the attention map is multiplied element by element with the encoder feature map accessed by the block, and the result is output as the feature map of the L-th additive attention block;
the encoder-decoder additive attention splicing module comprises a 1st encoder-decoder additive attention splicing block, a 2nd encoder-decoder additive attention splicing block, a 3rd encoder-decoder additive attention splicing block and a 4th encoder-decoder additive attention splicing block arranged in sequence;
the 1st encoder-decoder additive attention splicing block is used for accessing the feature map output by the 4th additive attention block of the fusion feature additive attention module and the feature map with the size of 15 × 15 × 512 output by the bridge module, and processes the two accessed feature maps as follows: firstly, a convolution operation with 2^3 × 32 convolution kernels with a step size of 1 and a size of 1 × 1 is applied to each of the two feature maps, giving two feature maps of equal size; the two feature maps are then added element by element, and the result is sequentially subjected to a Mish function activation, a convolution operation with 2^3 × 32 convolution kernels with a step size of 1 and a size of 1 × 1, and a Sigmoid function activation to obtain an attention map; the attention map is multiplied element by element with one of the two accessed feature maps, and the resulting feature map is finally spliced according to channels with the other accessed feature map to form the output of the 1st encoder-decoder additive attention splicing block;
the 2nd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 3rd additive attention block of the fusion feature additive attention module and the feature map output by the 2nd up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^2 × 32 convolution kernels;
the 3rd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 2nd additive attention block of the fusion feature additive attention module and the feature map output by the 3rd up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^1 × 32 convolution kernels;
the 4th encoder-decoder additive attention splicing block is used for accessing the feature map output by the 1st additive attention block of the fusion feature additive attention module and the feature map output by the 4th up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^0 × 32 convolution kernels;
the decoder module comprises a 1st convolution block, a 1st up-sampling layer, a 2nd convolution block, a 2nd up-sampling layer, a 3rd convolution block, a 3rd up-sampling layer, a 4th convolution block and an output convolution block arranged in sequence; the L-th convolution block of the decoder module is used for accessing the feature map output by the L-th encoder-decoder additive attention splicing block of the encoder-decoder additive attention splicing module and processing it as follows: the input feature map is sequentially subjected to batch normalization, Mish function activation and a convolution operation with 2^(5-L-1) × 32 convolution kernels with a size of 3 × 3 and a step size of 1; the L-th up-sampling layer of the decoder module is used for accessing the feature map output by the L-th convolution block of the decoder module and performing a bilinear interpolation operation on it to obtain an up-sampled feature map; the output convolution block of the decoder module is used for accessing the feature map output by the 4th convolution block of the decoder module and performing a convolution operation on it with 4 convolution kernels with a step size of 1 and a size of 1 × 1 to obtain a feature map output with the size of 240 × 240 × 4;
step 4: training the multi-scale feature fusion and additive attention segmentation network, the specific process comprising the following steps:
(1) initializing the multi-scale feature fusion and additive attention segmentation network by adopting the he_normal parameter initialization method;
(2) randomly dividing the training samples obtained in step 2 into a number of batches, each batch containing batchsize training samples: if the number of training samples is evenly divisible by batchsize, the training samples are divided into (number of training samples / batchsize) batches; if not, the remainder is discarded and ⌊number of training samples / batchsize⌋ batches are obtained, where batchsize is 16 and ⌊·⌋ denotes rounding down to an integer;
(3) taking one batch, and performing image enhancement processing in which the four-channel RGBA images in the batch are randomly flipped in the left-right direction and the up-down direction, each with a probability of 50%;
(4) taking the RGBA images of all four channels after image enhancement processing in the selected batch as input, and inputting the input into the multi-scale feature fusion and additive attention segmentation network to obtain the output result of the multi-scale feature fusion and additive attention segmentation network;
(5) according to the output result of the multi-scale feature fusion and additive attention segmentation network and the single-channel segmentation mask images corresponding to the four-channel RGBA images in the selected batch, computing the segmentation loss value of each image-enhanced four-channel RGBA image in the selected batch and taking the average as the final loss value, wherein the segmentation loss value CE of each four-channel RGBA image is computed as:

CE = -(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} y_{n,c} · log(p_{n,c})

wherein N represents the total number of pixels in the four-channel RGBA image; C represents the number of classes to which each pixel can belong and takes the value 4, namely the four classes whole tumor, tumor core, enhanced tumor and background; y_{n,c} takes the value 1 when the true class of the n-th pixel in the single-channel segmentation mask image corresponding to the four-channel RGBA image is c and 0 otherwise; and p_{n,c} represents the probability with which the multi-scale feature fusion and additive attention segmentation network predicts the n-th pixel of the input four-channel RGBA image to be of class c;
(6) according to the segmentation loss value obtained by calculation in the step (5), using an ADAM optimizer with a learning rate of 1e-4 to train parameters of the multi-scale feature fusion and additive attention segmentation network;
(7) repeating the steps (3) to (6) until all the training data of the batch train the multi-scale feature fusion and additive attention segmentation network for one time, then sequentially inputting the verification samples into the multi-scale feature fusion and additive attention segmentation network at the moment, obtaining the segmentation loss value of each four-channel RGBA image in the verification samples by adopting the same method in the step (5), and calculating and obtaining the average segmentation loss value of all the verification samples;
(8) repeating the steps (2) - (7) until the loss of the multi-scale feature fusion and additive attention segmentation network on the verification sample is converged, and finally obtaining the trained multi-scale feature fusion and additive attention segmentation network;
step 5: processing the brain tumor MR image to be segmented into four-channel RGBA images according to the methods in step 1 and step 2, and inputting them into the trained multi-scale feature fusion and additive attention segmentation network, which outputs the segmentation prediction result.
2. The method of claim 1, wherein the bridge module comprises a convolution block, and the convolution block is configured to access a feature map with a size of 15 × 15 × 256, and sequentially perform a first Batch Normalization (BN), a first Mish function activation, a first convolution processing, a second Batch Normalization, a second Mish function activation, and a second convolution processing on the feature map to obtain a feature map with a size of 15 × 15 × 512, and output the feature map to the multi-scale feature fusion module, wherein the first convolution processing and the second convolution processing are respectively implemented by 512 convolution kernels with a step size of 1 and a size of 3 × 3.
CN202210017015.3A 2022-01-07 2022-01-07 Brain tumor MR image segmentation method Pending CN114519719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210017015.3A CN114519719A (en) 2022-01-07 2022-01-07 Brain tumor MR image segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210017015.3A CN114519719A (en) 2022-01-07 2022-01-07 Brain tumor MR image segmentation method

Publications (1)

Publication Number Publication Date
CN114519719A true CN114519719A (en) 2022-05-20

Family

ID=81597227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210017015.3A Pending CN114519719A (en) 2022-01-07 2022-01-07 Brain tumor MR image segmentation method

Country Status (1)

Country Link
CN (1) CN114519719A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972760A (en) * 2022-06-17 2022-08-30 湘潭大学 Ionization map automatic tracing method based on multi-scale attention enhancement U-Net
CN114972760B (en) * 2022-06-17 2024-04-16 湘潭大学 Ionization diagram automatic tracing method based on multi-scale attention-enhancing U-Net
CN116563265A (en) * 2023-05-23 2023-08-08 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN116563265B (en) * 2023-05-23 2024-03-01 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN117495882A (en) * 2023-12-28 2024-02-02 无锡学院 Liver tumor CT image segmentation method based on AGCH-Net and multi-scale fusion

Similar Documents

Publication Publication Date Title
CN109035263B (en) Automatic brain tumor image segmentation method based on convolutional neural network
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
CN114519719A (en) Brain tumor MR image segmentation method
Aghalari et al. Brain tumor image segmentation via asymmetric/symmetric UNet based on two-pathway-residual blocks
CN107016395B (en) Identification system for sparsely expressed primary brain lymphomas and glioblastomas
CN108109140A (en) Low Grade Gliomas citric dehydrogenase non-destructive prediction method and system based on deep learning
CN111260705B (en) Prostate MR image multi-task registration method based on deep convolutional neural network
CN112365980B (en) Brain tumor multi-target auxiliary diagnosis and prospective treatment evolution visualization method and system
CN112767417B (en) Multi-modal image segmentation method based on cascaded U-Net network
CN113496495B (en) Medical image segmentation model building method capable of realizing missing input and segmentation method
JP7427080B2 (en) Weakly supervised multitask learning for cell detection and segmentation
CN112862805B (en) Automatic auditory neuroma image segmentation method and system
CN114693933A (en) Medical image segmentation device based on generation of confrontation network and multi-scale feature fusion
CN115311193A (en) Abnormal brain image segmentation method and system based on double attention mechanism
CN113487560A (en) Brain tumor segmentation method and device based on spatial feature attention mechanism
Nie et al. Semantic-guided encoder feature learning for blurry boundary delineation
CN116486156A (en) Full-view digital slice image classification method integrating multi-scale feature context
CN116228731A (en) Multi-contrast learning coronary artery high-risk plaque detection method, system and terminal
CN115984296A (en) Medical image segmentation method and system applying multi-attention mechanism
CN113379770B (en) Construction method of nasopharyngeal carcinoma MR image segmentation network, image segmentation method and device
CN112686912B (en) Acute stroke lesion segmentation method based on gradual learning and mixed samples
Zhang et al. Research on brain glioma segmentation algorithm
CN110276414B (en) Image feature extraction method and expression method based on dictionary learning and sparse representation
Peng et al. 2D brain MRI image synthesis based on lightweight denoising diffusion probabilistic model
CN113205454A (en) Segmentation model establishing and segmenting method and device based on multi-scale feature extraction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination