CN114519719A - Brain tumor MR image segmentation method - Google Patents

Brain tumor MR image segmentation method

Info

Publication number
CN114519719A
Authority
CN
China
Prior art keywords
size
image
dimensional
convolution
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210017015.3A
Other languages
Chinese (zh)
Inventor
孙家阔
郭立君
张�荣
高琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo University
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University filed Critical Ningbo University
Priority to CN202210017015.3A
Publication of CN114519719A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • G16H30/20ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10088Magnetic resonance imaging [MRI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30016Brain
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a brain tumor MR image segmentation method. A multi-scale feature fusion and additive attention segmentation network is constructed by adding a multi-scale feature fusion module, a fused feature additive attention module and an encoder-decoder additive attention splicing module to a U-Net network. The multi-scale feature fusion module fuses the encoder feature maps of different scales and receptive fields into a fused feature map; the fused feature additive attention module introduces an additive attention mechanism between the fused feature map and the feature map of each encoder scale; and the encoder-decoder additive attention splicing module applies an additive attention mechanism between the encoder-scale features guided by the fused feature map and the decoder feature maps, reducing the semantic feature gap that arises between the encoder module and the decoder module at skip connections because of their different network depths. The advantage is higher segmentation accuracy.

Description

Brain tumor MR image segmentation method
Technical Field
The invention relates to an MR image segmentation method, in particular to a brain tumor MR image segmentation method.
Background
Gliomas are the most common primary central nervous system brain tumors, accounting for about 27% of all central nervous system tumors and about 80% of malignant central nervous system tumors. Magnetic Resonance Imaging (MRI) is non-invasive, harmless to the human body and produces clear images, and is therefore widely used in clinical brain tumor diagnosis. To better assist doctors in diagnosing and treating patients, the size, position and shape of the brain tumor must be known accurately. However, manually segmenting a brain tumor from MR images costs a physician a great deal of time and effort, and errors are easily introduced because the appearance of brain tumors is highly complex. Automatic segmentation of brain tumor MR images can therefore greatly improve the diagnostic efficiency of doctors. Nevertheless, accurate automatic segmentation remains a difficult task because the size, shape and location of brain tumors vary greatly between patients, and the boundary between normal soft tissue and diseased tissue in the same patient's brain is blurred.
In the field of brain tumor MR image segmentation, U-Net is the most widely applied basic segmentation network. In U-Net and its enhanced networks, the encoder progressively extracts features of the input image through a series of convolution and downsampling operations. As the convolution and downsampling proceed, the size of the feature maps at each level of the encoder gradually decreases while the receptive field gradually increases, and as the network is trained, the features at each level of the encoder learn the critical discriminative feature information at the current scale. The skip connections between encoder features and decoder features in U-Net introduce shallow feature information and further improve feature utilization. However, in U-Net and its enhanced networks the skip connection only splices features of the same scale by channel; this ignores the guiding effect that the complementary information between multi-scale features with different receptive fields in the encoder can have on the current-scale feature. In addition, the encoder features lie at relatively shallow positions in the overall network structure while the decoder features lie at relatively deep positions, so a semantic feature gap exists between the encoder features and the decoder features when they are spliced directly through a skip connection. Therefore, the accuracy of existing U-Net and its enhanced networks in segmenting brain tumor MR images still needs to be improved.
Disclosure of Invention
The invention aims to provide a brain tumor MR image segmentation method with high segmentation accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows: a brain tumor MR image segmentation method, comprising the steps of:
step 1: dividing the data in the BraTS2020 training set, wherein the BraTS2020 training set comprises data of 293 High-grade Glioma (HGG) patients and 74 Low-grade Glioma (LGG) patients, 367 patients in total; the data of each patient comprise 5 three-dimensional images, namely a three-dimensional MR image of the FLAIR modality, a three-dimensional MR image of the T1 modality, a three-dimensional MR image of the T1ce modality, a three-dimensional MR image of the T2 modality and a three-dimensional Ground Truth (GT) segmentation mask image; each of the 5 three-dimensional images is 240pt in length, 240pt in width and 155 in channel number, namely of size 240pt × 240pt × 155; the GT segmentation mask image labels the Whole Tumor region (WT), the Tumor Core region (TC), the Enhancing Tumor region (ET) and all other regions, with the tumor core region labelled with tag 1, the whole tumor region labelled with tag 2, the enhancing tumor region labelled with tag 4 and all other regions labelled with tag 0; the specific division process is as follows: taking the data of 234 high-grade glioma patients and 59 low-grade glioma patients in the BraTS2020 training set as training data, and taking the data of the remaining 59 high-grade glioma patients and 15 low-grade glioma patients in the BraTS2020 training set as verification data;
step 2: carrying out data preprocessing, wherein the specific processing process is as follows:
step 2.1, standardizing the three-dimensional MR images of the four modalities of each patient in the training data and the verification data, namely the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality, in the Z-Score manner, namely standardizing the voxel values of the three-dimensional MR image of each modality to zero mean and unit standard deviation, specifically as follows: firstly, determining the number and positions of voxels whose value is greater than zero in the three-dimensional MR image of each modality of each patient; then calculating the mean and standard deviation of the region with voxel values greater than zero in the three-dimensional MR image of each modality; finally, taking the three-dimensional MR image of each modality as an original three-dimensional MR image and applying the Z-Score standardization operation of formula (1) to the region with voxel values greater than zero, obtaining the standardized three-dimensional MR image of each modality, in which the voxel values follow a standard normal distribution:
v' = (v − μ) / σ    (1)
in the formula (1), v represents a voxel value in the original three-dimensional MR image, μ represents a mean value of a region of which the voxel value is greater than zero in the original three-dimensional MR image, σ represents a standard deviation of the region of which the voxel value is greater than zero in the original three-dimensional MR image, and v' represents a voxel value in a normalized three-dimensional MR image obtained by normalizing the original three-dimensional MR image by Z-Score;
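For illustration, the per-modality Z-Score operation of formula (1), restricted to the region with voxel values greater than zero, can be sketched as follows (a minimal NumPy sketch; the function name and array handling are assumptions made for illustration and are not part of the patent text):

```python
import numpy as np

def z_score_nonzero(volume: np.ndarray) -> np.ndarray:
    """Standardize one modality volume according to formula (1),
    using only the region where the voxel value is greater than zero."""
    normalized = volume.astype(np.float32).copy()
    mask = volume > 0                      # brain region, background excluded
    mu = normalized[mask].mean()           # mean of the non-zero region
    sigma = normalized[mask].std()         # standard deviation of the non-zero region
    normalized[mask] = (normalized[mask] - mu) / sigma
    return normalized
```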
step 2.2, stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality of each patient in the training data and the verification data, respectively, in the following manner:
the images of the same channel in the FLAIR-modality standardized three-dimensional MR image, the T1-modality standardized three-dimensional MR image, the T1ce-modality standardized three-dimensional MR image and the T2-modality standardized three-dimensional MR image of the same patient are combined in sequence into a four-channel RGBA image of size 240pt × 240pt × 4; at the same time, the image of the same channel in the patient's three-dimensional Ground Truth segmentation mask image is saved as the single-channel segmentation mask image corresponding to that four-channel RGBA image, of size 240pt × 240pt × 1, and this segmentation mask image serves as the label of the four-channel RGBA image; at this point, the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities, namely the FLAIR-modality, T1-modality, T1ce-modality and T2-modality three-dimensional MR images, of each patient in the training data yield, after stitching, [(234+59) × 155] four-channel RGBA images and [(234+59) × 155] single-channel segmentation mask images, and the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data yield, after stitching, [(59+15) × 155] four-channel RGBA images and [(59+15) × 155] single-channel segmentation mask images;
step 2.3, deleting the images that contain no tumor region from the [(234+59) × 155] four-channel RGBA images and the [(234+59) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities, namely the FLAIR-modality, T1-modality, T1ce-modality and T2-modality three-dimensional MR images, of each patient in the training data, obtaining 24422 four-channel RGBA images and 24422 single-channel segmentation mask images, and using the obtained 24422 four-channel RGBA images and 24422 single-channel segmentation mask images to form the training samples;
likewise deleting the images that contain no tumor region from the [(59+15) × 155] four-channel RGBA images and the [(59+15) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data, obtaining 4794 four-channel RGBA images and 4794 single-channel segmentation mask images, and using the obtained 4794 four-channel RGBA images and 4794 single-channel segmentation mask images to form the verification samples;
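A minimal sketch of the stitching and slice-filtering of steps 2.2 and 2.3 might look as follows, assuming each standardized modality volume and the GT mask are available as 240 × 240 × 155 NumPy arrays (the function name and data layout are illustrative assumptions):

```python
import numpy as np

def slice_patient(flair, t1, t1ce, t2, gt):
    """Split four standardized 240x240x155 modality volumes and the GT mask
    volume into per-channel four-channel images and single-channel mask images,
    keeping only the slices that contain a tumor region (labels 1, 2 or 4)."""
    images, masks = [], []
    for k in range(flair.shape[-1]):                          # 155 channels (slices)
        rgba = np.stack([flair[..., k], t1[..., k],
                         t1ce[..., k], t2[..., k]], axis=-1)  # 240 x 240 x 4
        mask = gt[..., k][..., np.newaxis]                    # 240 x 240 x 1
        if (mask > 0).any():                                  # discard tumor-free slices
            images.append(rgba)
            masks.append(mask)
    return images, masks
```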
and step 3: adding a multi-scale feature fusion module (MSFF), a fused feature additive attention module (FFAA) and an encoder-decoder additive attention splicing module (E-DAAC) to an existing U-Net network consisting of an encoder module, a decoder module and a bridging module to obtain a multi-scale feature fusion and additive attention segmentation network, wherein the multi-scale feature fusion and additive attention segmentation network comprises the encoder module, the decoder module, the bridging module, the multi-scale feature fusion module (MSFF), the fused feature additive attention module (FFAA) and the encoder-decoder additive attention splicing module (E-DAAC);
the encoder module is used for extracting feature information of a brain tumor in the four-channel RGBA image input into the encoder module, generating a feature map output with the size of 15pt multiplied by 256, and calling the feature map as an encoding feature map; the encoder module comprises a 1 st convolution block, a 1 st sampling layer, a 2 nd convolution block, a 2 nd sampling layer, a 3 rd convolution block, a 3 rd sampling layer, a 4 th convolution block and a 4 th sampling layer which are sequentially arranged, wherein the 1 st convolution block is used for accessing an image with the size of 240pt multiplied by 4, and sequentially carrying out first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the image to obtain characteristic diagram output with the size of 240pt multiplied by 32, wherein the first convolution processing and the second convolution processing are respectively realized by 32 convolution kernels with the step length of 1 and the size of 3 multiplied by 3; the 1 st sampling layer is used for accessing a feature map with the size of 240pt × 240pt × 32 output by the 1 st convolution block, and performing maximum pooling operation with the step length of 2 on the feature map to obtain feature map output with the size of 120 × 120 × 32; the 2 nd convolution block is used for accessing a feature map with the size of 120pt × 120pt × 32 output by the 1 st sampling layer, and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the feature map to obtain feature map output with the size of 120pt × 120pt × 64, wherein the first convolution processing and the second convolution processing are respectively realized by 64 convolution kernels with the step size of 1 and the size of 3 × 3; the 2 nd sampling layer is used for accessing a feature map with the size of 120pt × 120pt × 64 output by the 2 nd convolution block, and performing maximum pooling operation with the step size of 2 on the feature map to obtain feature map output with the size of 60pt × 60pt × 64; the 3 rd convolution block is used for accessing a characteristic diagram with the size of 60pt × 60pt × 64 output by the 2 nd sampling layer, and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the characteristic diagram to obtain characteristic diagram output with the size of 60pt × 60pt × 128, wherein the first convolution processing and the second convolution processing are respectively realized by 128 convolution kernels with the step size of 1 and the size of 3 × 3; the 3 rd sampling layer is used for accessing a feature map with the size of 60pt × 60pt × 128 output by the 3 rd convolution block, and performing maximum pooling operation with the step size of 2 on the feature map to obtain feature map output with the size of 30pt × 30pt × 128; the 4 th convolution block is used for accessing a feature map with the size of 30pt × 30pt × 128 output by the 3 rd sampling layer, and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function 
activation and second convolution processing on the feature map to obtain feature map output with the size of 30pt × 30pt × 256, wherein the first convolution processing and the second convolution processing are respectively realized by 256 convolution kernels with the step length of 1 and the size of 3 × 3; the 4th sampling layer is used for accessing the feature map with the size of 30pt × 30pt × 256 output by the 4th convolution block, and performing maximum pooling operation with the step length of 2 on the feature map to obtain feature map output with the size of 15pt × 15pt × 256, and this feature map is the coding feature map;
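As an illustration of the encoder convolution blocks described above, a minimal PyTorch sketch of one block (batch normalization, Mish activation and a 3 × 3 convolution, applied twice) could look as follows; the class name, the use of torch.nn.Mish and the padding that keeps the spatial size unchanged are assumptions made for this sketch:

```python
import torch
import torch.nn as nn

class EncoderConvBlock(nn.Module):
    """One encoder convolution block: (BN -> Mish -> 3x3 Conv) applied twice,
    matching the 1st-4th convolution blocks (out_channels = 32, 64, 128, 256)."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Mish(),
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.Mish(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```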
the bridge module is used for accessing the coding characteristic diagram output by the coder module and processing the coding characteristic diagram to obtain a characteristic diagram output with the size of 15pt multiplied by 512;
the multi-scale feature fusion module comprises a 1st up-sampling layer, a 2nd up-sampling layer, a 3rd up-sampling layer, a 4th up-sampling layer, a feature splicing layer and a convolution block which are sequentially arranged; the L-th up-sampling layer of the multi-scale feature fusion module is used for accessing the feature map output by the L-th convolution block of the encoder module, whose size is (240/2^(L-1))pt × (240/2^(L-1))pt × (2^(L-1) × 32), L = 1, 2, 3, 4, and performing bilinear interpolation up-sampling on the accessed feature map to obtain a feature map of size 240pt × 240pt × (2^(L-1) × 32); the feature splicing layer of the multi-scale feature fusion module is used for accessing the feature maps output by the 1st to 4th up-sampling layers of the multi-scale feature fusion module and splicing them by channel to obtain a feature map of size 240pt × 240pt × (32 + 64 + 128 + 256); the convolution block of the multi-scale feature fusion module is used for accessing the feature map output by the feature splicing layer and performing a convolution operation on it with 32 convolution kernels of step length 1 and size 3 × 3 to obtain feature map output of size 240pt × 240pt × 32;
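A minimal PyTorch sketch of the multi-scale feature fusion module described above might look as follows (class and argument names are illustrative assumptions; bilinear up-sampling, channel-wise splicing and the final 3 × 3 convolution follow the description):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatureFusion(nn.Module):
    """Bilinearly upsample the four encoder feature maps to 240x240, splice
    them by channel (32+64+128+256 = 480 channels) and fuse them back to 32
    channels with a 3x3 convolution."""
    def __init__(self, encoder_channels=(32, 64, 128, 256), fused_channels: int = 32):
        super().__init__()
        self.fuse = nn.Conv2d(sum(encoder_channels), fused_channels,
                              kernel_size=3, stride=1, padding=1)

    def forward(self, encoder_features):
        # encoder_features: outputs of the 1st-4th encoder convolution blocks
        target_size = encoder_features[0].shape[-2:]          # 240 x 240
        upsampled = [F.interpolate(f, size=target_size, mode="bilinear",
                                   align_corners=False) for f in encoder_features]
        fused = torch.cat(upsampled, dim=1)                   # channel-wise splice
        return self.fuse(fused)                               # 240 x 240 x 32
```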
the fused feature additive attention module comprises a 1st down-sampling layer, a 1st additive attention block, a 2nd down-sampling layer, a 2nd additive attention block, a 3rd down-sampling layer, a 3rd additive attention block, a 4th down-sampling layer and a 4th additive attention block which are sequentially arranged; the L-th down-sampling layer of the fused feature additive attention module is used for accessing the feature map of size 240pt × 240pt × 32 output by the convolution block of the multi-scale feature fusion module and performing a maximum pooling operation with step length 2^L on the accessed feature map to obtain a down-sampled fused feature map; the L-th additive attention block of the fused feature additive attention module is used for accessing the feature map output by the L-th down-sampling layer of the fused feature additive attention module and the feature map output by the L-th convolution block of the encoder module, and processing the two feature maps in the following manner: firstly, a convolution operation with 2^(L-1) × 32 convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element; the result is then passed sequentially through Mish function activation, a convolution operation with 2^(L-1) × 32 convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; finally, the attention feature map and the feature map output by the L-th convolution block of the encoder module are multiplied element by element to obtain the feature map output of the L-th additive attention block;
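The additive attention block described above can be sketched roughly as follows in PyTorch; the class name is an assumption, and the sketch assumes that the final Sigmoid attention map gates the encoder feature map by element-wise multiplication, as described above:

```python
import torch
import torch.nn as nn

class AdditiveAttentionBlock(nn.Module):
    """One additive attention block of the fused-feature additive attention module:
    1x1 convolutions on the (pooled) fused feature map and on the encoder feature
    map, element-wise addition, Mish, another 1x1 convolution, Sigmoid, and element-
    wise multiplication of the attention map with the encoder feature map."""
    def __init__(self, fused_channels: int, encoder_channels: int, inter_channels: int):
        super().__init__()
        # inter_channels corresponds to 2^(L-1) x 32 for the L-th block
        self.project_fused = nn.Conv2d(fused_channels, inter_channels, kernel_size=1)
        self.project_encoder = nn.Conv2d(encoder_channels, inter_channels, kernel_size=1)
        self.attention = nn.Sequential(
            nn.Mish(),
            nn.Conv2d(inter_channels, inter_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, fused: torch.Tensor, encoder: torch.Tensor) -> torch.Tensor:
        # assumes fused and encoder have matching spatial size for the addition
        gate = self.attention(self.project_fused(fused) + self.project_encoder(encoder))
        return encoder * gate            # attention-weighted encoder feature map
```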
the encoder-decoder additive attention splicing module comprises a 1st encoder-decoder additive attention splicing block, a 2nd encoder-decoder additive attention splicing block, a 3rd encoder-decoder additive attention splicing block and a 4th encoder-decoder additive attention splicing block which are sequentially arranged;
the 1st encoder-decoder additive attention splicing block is used for accessing the feature map output by the 4th additive attention block of the fused feature additive attention module and the feature map of size 15pt × 15pt × 512 output by the bridging module, and processing the two feature maps in the following manner: firstly, a convolution operation with (2^3 × 32) convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element, and the result is passed sequentially through Mish function activation, a convolution operation with (2^3 × 32) convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; the attention feature map and the feature map output by the 4th additive attention block of the fused feature additive attention module are then multiplied element by element; finally, the feature map obtained by this multiplication and the feature map output by the bridging module are spliced by channel to obtain the feature map output of the 1st encoder-decoder additive attention splicing block;
the 2nd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 3rd additive attention block of the fused feature additive attention module and the feature map output by the 2nd up-sampling layer of the decoder module, and processing the two feature maps in the following manner: firstly, a convolution operation with (2^2 × 32) convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element, and the result is passed sequentially through Mish function activation, a convolution operation with (2^2 × 32) convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; the attention feature map and the feature map output by the 3rd additive attention block of the fused feature additive attention module are then multiplied element by element; finally, the feature map obtained by this multiplication and the feature map output by the 2nd up-sampling layer of the decoder module are spliced by channel to obtain the feature map output of the 2nd encoder-decoder additive attention splicing block;
the 3rd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 2nd additive attention block of the fused feature additive attention module and the feature map output by the 3rd up-sampling layer of the decoder module, and processing the two feature maps in the following manner: firstly, a convolution operation with (2^1 × 32) convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element, and the result is passed sequentially through Mish function activation, a convolution operation with (2^1 × 32) convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; the attention feature map and the feature map output by the 2nd additive attention block of the fused feature additive attention module are then multiplied element by element; finally, the feature map obtained by this multiplication and the feature map output by the 3rd up-sampling layer of the decoder module are spliced by channel to obtain the feature map output of the 3rd encoder-decoder additive attention splicing block;
the 4th encoder-decoder additive attention splicing block is used for accessing the feature map output by the 1st additive attention block of the fused feature additive attention module and the feature map output by the 4th up-sampling layer of the decoder module, and processing the two feature maps in the following manner: firstly, a convolution operation with (2^0 × 32) convolution kernels of step length 1 and size 1 × 1 is applied to each of the two feature maps separately; the two feature maps obtained at this point are then added element by element, and the result is passed sequentially through Mish function activation, a convolution operation with (2^0 × 32) convolution kernels of step length 1 and size 1 × 1, and Sigmoid function activation to obtain an attention feature map; the attention feature map and the feature map output by the 1st additive attention block of the fused feature additive attention module are then multiplied element by element; finally, the feature map obtained by this multiplication and the feature map output by the 4th up-sampling layer of the decoder module are spliced by channel to obtain the feature map output of the 4th encoder-decoder additive attention splicing block;
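A rough PyTorch sketch of one encoder-decoder additive attention splicing block is given below; it reuses the same additive attention computation and then splices by channel. The class name is an assumption, and the sketch assumes that the attention map gates the skip-branch feature map (the fused-feature-guided encoder feature) and that the gated result is spliced with the decoder-side feature map:

```python
import torch
import torch.nn as nn

class EncoderDecoderAttentionConcat(nn.Module):
    """One encoder-decoder additive attention splicing block: additive attention
    between the skip-branch (FFAA output) feature map and the decoder-side feature
    map, followed by channel-wise splicing of the gated skip feature with the
    decoder-side feature."""
    def __init__(self, skip_channels: int, decoder_channels: int, inter_channels: int):
        super().__init__()
        self.project_skip = nn.Conv2d(skip_channels, inter_channels, kernel_size=1)
        self.project_decoder = nn.Conv2d(decoder_channels, inter_channels, kernel_size=1)
        self.attention = nn.Sequential(
            nn.Mish(),
            nn.Conv2d(inter_channels, inter_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, skip: torch.Tensor, decoder: torch.Tensor) -> torch.Tensor:
        # assumes skip and decoder have matching spatial size, and that
        # skip_channels == inter_channels so the gate can multiply the skip map
        gate = self.attention(self.project_skip(skip) + self.project_decoder(decoder))
        return torch.cat([skip * gate, decoder], dim=1)   # channel-wise splice
```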
the decoder module comprises a 1st convolution block, a 1st up-sampling layer, a 2nd convolution block, a 2nd up-sampling layer, a 3rd convolution block, a 3rd up-sampling layer, a 4th convolution block and an output convolution block which are sequentially arranged; the L-th convolution block of the decoder module is configured to access the feature map output by the L-th encoder-decoder additive attention splicing block of the encoder-decoder additive attention splicing module and to process the feature map input to it in the following manner: batch normalization and Mish function activation are carried out sequentially, followed by a convolution operation with 2^(5-L-1) × 32 convolution kernels of step length 1 and size 3 × 3, and the resulting feature map is output; the L-th up-sampling layer of the decoder module is used for accessing the feature map output by the L-th convolution block of the decoder module and performing bilinear interpolation up-sampling on the feature map input to it; the output convolution block of the decoder module is used for accessing the feature map output by the 4th convolution block of the decoder module and performing a convolution operation with 4 convolution kernels of step length 1 and size 1 × 1 to obtain feature map output of size 240pt × 240pt × 4;
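For completeness, one decoder convolution block (batch normalization, Mish activation, then a 3 × 3 convolution with 2^(5-L-1) × 32 kernels) could be sketched as follows in PyTorch; the class name and the padding choice are assumptions:

```python
import torch
import torch.nn as nn

class DecoderConvBlock(nn.Module):
    """One decoder convolution block: BN -> Mish -> 3x3 convolution, applied to the
    output of the corresponding encoder-decoder additive attention splicing block."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Mish(),
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```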
and step 4: training the multi-scale feature fusion and additive attention segmentation network, wherein the specific process comprises the following steps:
(1) initializing the multi-scale feature fusion and additive attention segmentation network with the he_normal parameter initialization method;
(2) randomly dividing the training samples obtained in step 2 into a plurality of batches, each batch containing batchsize training samples, wherein the batchsize is 16: if the number of training samples is evenly divisible by the batchsize, the training samples are divided into (number of training samples / batchsize) batches; if it is not evenly divisible, the remainder is discarded and ⌊number of training samples / batchsize⌋ batches are obtained, where ⌊·⌋ denotes rounding down to an integer;
(3) taking one batch and applying image enhancement processing to all the four-channel RGBA images in the batch by randomly flipping them in the left-right direction and in the up-down direction, each with a probability of 50%;
(4) taking the RGBA images of all four channels after image enhancement processing in the selected batch as input, and inputting the input into the multi-scale feature fusion and additive attention segmentation network to obtain the output result of the multi-scale feature fusion and additive attention segmentation network;
(5) according to the output result of the multi-scale feature fusion and additive attention segmentation network and the single-channel segmentation mask images corresponding to all the four-channel RGBA images in the selected batch, computing the segmentation loss value of each four-channel RGBA image after image enhancement in the selected batch, and taking the average of these segmentation loss values as the final loss value, wherein the segmentation loss value CE of each four-channel RGBA image is computed as:
CE = −(1/N) × Σ_{n=1}^{N} Σ_{c=1}^{C} y_n^c · log(p_n^c)
where N represents the total number of pixels in the four-channel RGBA image; C represents the number of categories to which each pixel can be assigned, which takes the value 4, namely the four categories of whole tumor, tumor core, enhancing tumor and background; y_n^c indicates whether the n-th pixel in the single-channel segmentation mask image corresponding to the four-channel RGBA image belongs to the true category c; and p_n^c represents the probability that the n-th pixel of the input four-channel RGBA image is predicted as category c by the multi-scale feature fusion and additive attention segmentation network;
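The per-image segmentation loss of step (5) corresponds to a pixel-wise cross-entropy; a minimal PyTorch sketch is given below, assuming the GT labels 0/1/2/4 have already been remapped to consecutive class indices 0-3 (the function name and tensor layout are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def segmentation_loss(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Pixel-wise cross-entropy between the network output and the single-channel
    segmentation mask, averaged over all pixels.
    logits: (B, 4, 240, 240) raw network output for the four categories
    mask:   (B, 240, 240) per-pixel class index (0 background, 1 tumor core,
            2 whole tumor, 3 enhancing tumor after remapping label 4 -> 3)."""
    return F.cross_entropy(logits, mask.long())
```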
(6) training parameters of the multi-scale feature fusion and additive attention segmentation network by using an ADAM optimizer with a learning rate of 1e-4 according to the segmentation loss value obtained by calculation in the step (5);
(7) repeating steps (3) to (6) until all batches of the training data have trained the multi-scale feature fusion and additive attention segmentation network once, then sequentially inputting the verification samples into the multi-scale feature fusion and additive attention segmentation network obtained at this point, obtaining the segmentation loss value of each four-channel RGBA image in the verification samples with the same method as in step (5), and computing the average segmentation loss value over all the verification samples;
(8) repeating the steps (2) - (7) until the loss of the multi-scale feature fusion and additive attention segmentation network on the verification sample is converged, and finally obtaining the trained multi-scale feature fusion and additive attention segmentation network;
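Steps (2) to (8) amount to a fairly standard training loop; a condensed sketch under the stated settings (batch size 16 with the remainder dropped, Adam with learning rate 1e-4, validation after every epoch) might look as follows. The dataset objects, the fixed epoch count and the assumption that the random flips are applied inside the training dataset are illustrative; the patent trains until the validation loss converges:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train(model, train_dataset, val_dataset, epochs: int = 100, device: str = "cuda"):
    """Condensed training loop for step 4: batches of 16 (remainder dropped),
    Adam with learning rate 1e-4, average validation loss after each epoch."""
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, drop_last=True)
    val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.to(device)
    for epoch in range(epochs):
        model.train()
        for images, masks in train_loader:          # masks: class indices 0-3 per pixel
            images, masks = images.to(device), masks.to(device)
            loss = F.cross_entropy(model(images), masks.long())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(F.cross_entropy(model(x.to(device)), y.to(device).long()).item()
                           for x, y in val_loader) / max(len(val_loader), 1)
        print(f"epoch {epoch + 1}: average validation loss {val_loss:.4f}")
```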
and step 5: processing the brain tumor MR image to be segmented into four-channel RGBA images according to the methods of step 1 and step 2, and inputting them into the trained multi-scale feature fusion and additive attention segmentation network, which outputs the segmentation prediction result.
The bridge module comprises a convolution block, wherein the convolution block is used for accessing a characteristic diagram with the size of 15 multiplied by 256, and sequentially carrying out first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the characteristic diagram to obtain the characteristic diagram with the size of 15 multiplied by 512 and output the characteristic diagram to the multi-scale characteristic fusion module, wherein the first convolution processing and the second convolution processing are respectively realized by 512 convolution kernels with the step length of 1 and the size of 3 multiplied by 3.
Compared with the prior art, the invention has the following advantages. The U-Net network is improved by adding a multi-scale feature fusion module, a fused feature additive attention module and an encoder-decoder additive attention splicing module to construct a multi-scale feature fusion and additive attention segmentation network. Through the multi-scale feature fusion module, the feature maps of different scales and receptive fields in the encoder module can be fused into a fused feature map that contains both detail feature information and high-level semantic feature information. Through the fused feature additive attention module, an additive attention mechanism is introduced between the fused feature map and the feature map of each scale of the encoder module, and the fused feature map is used to guide the feature map of each encoder scale, so that the encoder-scale feature maps guided by the fused feature map become more discriminative. Through the encoder-decoder additive attention splicing module, an additive attention mechanism is applied between the encoder-scale feature maps guided by the fused feature map and the feature maps of the decoder module, so that the semantic feature gap produced between the encoder module and the decoder module at the skip connections because of their different network depths is reduced; that is, the important feature information in the skip-connection features is learned adaptively through the additive attention mechanism, which narrows the semantic feature gap between the encoder feature maps and the decoder feature maps, and the segmentation accuracy is therefore higher.
Detailed Description
The present invention will be described in further detail with reference to examples.
Example (b): a brain tumor MR image segmentation method, comprising the steps of:
step 1: the data in the BraTS2020 training set are divided; the BraTS2020 training set contains data of 293 High-grade Glioma (HGG) patients and 74 Low-grade Glioma (LGG) patients, 367 patients in total; the data of each patient include 5 three-dimensional images, namely a three-dimensional MR image of the FLAIR modality, a three-dimensional MR image of the T1 modality, a three-dimensional MR image of the T1ce modality, a three-dimensional MR image of the T2 modality and a three-dimensional Ground Truth (GT) segmentation mask image; each of the 5 three-dimensional images is 240pt long, 240pt wide and has 155 channels, namely is of size 240pt × 240pt × 155; the GT segmentation mask image labels the Whole Tumor region (WT), the Tumor Core region (TC), the Enhancing Tumor region (ET) and all other regions, with the tumor core region labelled with tag 1, the whole tumor region labelled with tag 2, the enhancing tumor region labelled with tag 4 and all other regions labelled with tag 0; the specific division process is as follows: taking the data of 234 high-grade glioma patients and 59 low-grade glioma patients in the BraTS2020 training set as training data, and taking the data of the remaining 59 high-grade glioma patients and 15 low-grade glioma patients in the BraTS2020 training set as verification data;
step 2: carrying out data preprocessing, wherein the specific processing process is as follows:
step 2.1, standardizing the three-dimensional MR images of the four modalities of each patient in the training data and the verification data, namely the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality, in the Z-Score manner, namely standardizing the voxel values of the three-dimensional MR image of each modality to zero mean and unit standard deviation, specifically as follows: firstly, determining the number and positions of voxels whose value is greater than zero in the three-dimensional MR image of each modality of each patient; then calculating the mean and standard deviation of the region with voxel values greater than zero in the three-dimensional MR image of each modality; finally, taking the three-dimensional MR image of each modality as an original three-dimensional MR image and applying the Z-Score standardization operation of formula (1) to the region with voxel values greater than zero, obtaining the standardized three-dimensional MR image of each modality, in which the voxel values follow a standard normal distribution:
v' = (v − μ) / σ    (1)
in the formula (1), v represents a voxel value in the original three-dimensional MR image, μ represents a mean value of a region of which the voxel value is greater than zero in the original three-dimensional MR image, σ represents a standard deviation of the region of which the voxel value is greater than zero in the original three-dimensional MR image, and v' represents a voxel value in a normalized three-dimensional MR image obtained by normalizing the original three-dimensional MR image by Z-Score;
step 2.2, stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality of each patient in the training data and the verification data, respectively, in the following manner:
the images of the same channel in the FLAIR-modality standardized three-dimensional MR image, the T1-modality standardized three-dimensional MR image, the T1ce-modality standardized three-dimensional MR image and the T2-modality standardized three-dimensional MR image of the same patient are combined in sequence into a four-channel RGBA image of size 240pt × 240pt × 4; at the same time, the image of the same channel in the patient's three-dimensional Ground Truth segmentation mask image is saved as the single-channel segmentation mask image corresponding to that four-channel RGBA image, of size 240pt × 240pt × 1, and this segmentation mask image serves as the label of the four-channel RGBA image; at this point, the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities, namely the FLAIR-modality, T1-modality, T1ce-modality and T2-modality three-dimensional MR images, of each patient in the training data yield, after stitching, [(234+59) × 155] four-channel RGBA images and [(234+59) × 155] single-channel segmentation mask images, and the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data yield, after stitching, [(59+15) × 155] four-channel RGBA images and [(59+15) × 155] single-channel segmentation mask images;
step 2.3, deleting the images that contain no tumor region from the [(234+59) × 155] four-channel RGBA images and the [(234+59) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities, namely the FLAIR-modality, T1-modality, T1ce-modality and T2-modality three-dimensional MR images, of each patient in the training data, obtaining 24422 four-channel RGBA images and 24422 single-channel segmentation mask images, and using the obtained 24422 four-channel RGBA images and 24422 single-channel segmentation mask images to form the training samples;
likewise deleting the images that contain no tumor region from the [(59+15) × 155] four-channel RGBA images and the [(59+15) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data, obtaining 4794 four-channel RGBA images and 4794 single-channel segmentation mask images, and using the obtained 4794 four-channel RGBA images and 4794 single-channel segmentation mask images to form the verification samples;
and step 3: adding a multi-scale feature fusion module (MSFF), a fused feature additive attention module (FFAA) and an encoder-decoder additive attention splicing module (E-DAAC) to an existing U-Net network consisting of an encoder module, a decoder module and a bridging module to obtain a multi-scale feature fusion and additive attention segmentation network, wherein the multi-scale feature fusion and additive attention segmentation network comprises the encoder module, the decoder module, the bridging module, the multi-scale feature fusion module (MSFF), the fused feature additive attention module (FFAA) and the encoder-decoder additive attention splicing module (E-DAAC);
the encoder module is used for extracting feature information of a brain tumor in the four-channel RGBA image input into the encoder module, generating a feature map output with the size of 15pt multiplied by 256, and calling the feature map as an encoding feature map; the encoder module comprises a 1 st convolution block, a 1 st sampling layer, a 2 nd convolution block, a 2 nd sampling layer, a 3 rd convolution block, a 3 rd sampling layer, a 4 th convolution block and a 4 th sampling layer which are sequentially arranged according to the sequence, wherein the 1 st convolution block is used for accessing an image with the size of 240pt multiplied by 4, and sequentially carrying out first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the image to obtain a characteristic diagram output with the size of 240pt multiplied by 32, wherein the first convolution processing and the second convolution processing are respectively realized by 32 convolution kernels with the step length of 1 and the size of 3 multiplied by 3; the 1 st sampling layer is used for accessing a feature map with the size of 240pt multiplied by 32 output by the 1 st convolution block, and performing maximum pooling operation with the step length of 2 on the feature map to obtain feature map output with the size of 120 multiplied by 32; the 2 nd convolution block is used for accessing a feature map with the size of 120pt × 120pt × 32 output by the 1 st sampling layer, and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the feature map to obtain feature map output with the size of 120pt × 120pt × 64, wherein the first convolution processing and the second convolution processing are respectively realized by 64 convolution kernels with the step length of 1 and the size of 3 × 3; the 2 nd sampling layer is used for accessing a feature map with the size of 120pt multiplied by 64 output by the 2 nd convolution block, and performing maximum pooling operation with the step size of 2 on the feature map to obtain feature map output with the size of 60pt multiplied by 64; the 3 rd convolution block is used for accessing a characteristic diagram which is output by the 2 nd sampling layer and has the size of 60pt multiplied by 64, and sequentially carrying out first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the characteristic diagram to obtain characteristic diagram output with the size of 60pt multiplied by 128, wherein the first convolution processing and the second convolution processing are respectively realized by adopting 128 convolution kernels with the step length of 1 and the size of 3 multiplied by 3; the 3 rd sampling layer is used for accessing a feature map with the size of 60pt multiplied by 128 output by the 3 rd convolution block, and performing maximum pooling operation with the step size of 2 on the feature map to obtain feature map output with the size of 30pt multiplied by 128; the 4 th convolution block is used for accessing a feature map with the size of 30pt × 30pt × 128 output by the 3 rd sampling layer, and sequentially performing first Batch Normalization (BN), first Mish 
function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the feature map to obtain feature map output with the size of 30pt × 30pt × 256, wherein the first convolution processing and the second convolution processing are respectively realized by 256 convolution kernels with the step length of 1 and the size of 3 × 3; the 4th sampling layer is used for accessing the feature map with the size of 30pt × 30pt × 256 output by the 4th convolution block, and performing maximum pooling operation with the step length of 2 on the feature map to obtain feature map output with the size of 15pt × 15pt × 256, and this feature map is the coding feature map;
the bridging module is used for accessing the coding characteristic graph output by the coder module and processing the coding characteristic graph to obtain characteristic graph output with the size of 15pt multiplied by 512;
the multi-scale feature fusion module comprises a 1st up-sampling layer, a 2nd up-sampling layer, a 3rd up-sampling layer, a 4th up-sampling layer, a feature splicing layer and a convolution block which are sequentially arranged; the L-th up-sampling layer of the multi-scale feature fusion module is used for accessing the feature map output by the L-th convolution block of the encoder module, whose size is (240/2^(L-1))pt × (240/2^(L-1))pt × (2^(L-1) × 32), L = 1, 2, 3, 4, and performing bilinear interpolation up-sampling on the accessed feature map to obtain a feature map of size 240pt × 240pt × (2^(L-1) × 32); the feature splicing layer of the multi-scale feature fusion module is used for accessing the feature maps output by the 1st to 4th up-sampling layers of the multi-scale feature fusion module and splicing them by channel to obtain a feature map of size 240pt × 240pt × (32 + 64 + 128 + 256); the convolution block of the multi-scale feature fusion module is used for accessing the feature map output by the feature splicing layer and performing a convolution operation on it with 32 convolution kernels of step length 1 and size 3 × 3 to obtain feature map output of size 240pt × 240pt × 32;
the fusion feature additive attention module comprises a 1st down-sampling layer, a 1st additive attention block, a 2nd down-sampling layer, a 2nd additive attention block, a 3rd down-sampling layer, a 3rd additive attention block, a 4th down-sampling layer and a 4th additive attention block arranged in sequence, wherein the L-th down-sampling layer of the fusion feature additive attention module is used for accessing the feature map with the size of 240 × 240 × 32 output by the convolution block of the multi-scale feature fusion module and performing a max pooling operation with a step size of 2^L on it to obtain a down-sampled fused feature map; the L-th additive attention block of the fusion feature additive attention module is used for accessing the feature map output by the L-th down-sampling layer of the fusion feature additive attention module and the feature map output by the L-th convolution block of the encoder module, and processes the two accessed feature maps as follows: firstly, a convolution operation with 2^(L-1) × 32 convolution kernels with a step size of 1 and a size of 1 × 1 is applied to each of the two feature maps, giving two feature maps of equal size; the two feature maps are then added element by element, and the result is sequentially subjected to a Mish function activation, a convolution operation with 2^(L-1) × 32 convolution kernels with a step size of 1 and a size of 1 × 1, and a Sigmoid function activation to obtain an attention map; finally, the attention map is multiplied element by element with the encoder feature map accessed by the block, and the result is output as the feature map of the L-th additive attention block;
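A minimal sketch of one such additive attention block is given below, assuming PyTorch. In this sketch the attention map rescales the encoder feature map, which follows the reading of the text above, and the example sizes (120 × 120 inputs, 2^(2-1) × 32 = 64 projection kernels) correspond to L = 2; both the gating target and the matching spatial sizes are assumptions.

```python
import torch
import torch.nn as nn

class AdditiveAttentionBlock(nn.Module):
    """1x1 convs on both inputs, element-wise sum, then Mish -> 1x1 conv -> Sigmoid
    to form an attention map that rescales the encoder feature map element by element."""
    def __init__(self, fused_ch, enc_ch, mid_ch):
        super().__init__()
        self.proj_fused = nn.Conv2d(fused_ch, mid_ch, kernel_size=1, stride=1)
        self.proj_enc = nn.Conv2d(enc_ch, mid_ch, kernel_size=1, stride=1)
        self.gate = nn.Sequential(nn.Mish(), nn.Conv2d(mid_ch, mid_ch, 1, 1), nn.Sigmoid())

    def forward(self, fused, enc):
        att = self.gate(self.proj_fused(fused) + self.proj_enc(enc))  # attention map in (0, 1)
        return enc * att                                              # gated encoder features

# L = 2: down-sampled fused map at 120x120x32, encoder block 2 output at 120x120x64
blk = AdditiveAttentionBlock(fused_ch=32, enc_ch=64, mid_ch=64)
print(blk(torch.randn(1, 32, 120, 120), torch.randn(1, 64, 120, 120)).shape)
```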
the encoder-decoder additive attention splicing module comprises a 1st encoder-decoder additive attention splicing block, a 2nd encoder-decoder additive attention splicing block, a 3rd encoder-decoder additive attention splicing block and a 4th encoder-decoder additive attention splicing block arranged in sequence;
the 1st encoder-decoder additive attention splicing block is used for accessing the feature map output by the 4th additive attention block of the fusion feature additive attention module and the feature map with the size of 15 × 15 × 512 output by the bridging module, and processes the two accessed feature maps as follows: firstly, a convolution operation with 2^3 × 32 convolution kernels with a step size of 1 and a size of 1 × 1 is applied to each of the two feature maps, giving two feature maps of equal size; the two feature maps are then added element by element, and the result is sequentially subjected to a Mish function activation, a convolution operation with 2^3 × 32 convolution kernels with a step size of 1 and a size of 1 × 1, and a Sigmoid function activation to obtain an attention map; the attention map is multiplied element by element with one of the two accessed feature maps, and the resulting feature map is finally spliced according to channels with the other accessed feature map to form the output of the 1st encoder-decoder additive attention splicing block;
the 2nd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 3rd additive attention block of the fusion feature additive attention module and the feature map output by the 2nd up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^2 × 32 convolution kernels;
the 3rd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 2nd additive attention block of the fusion feature additive attention module and the feature map output by the 3rd up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^1 × 32 convolution kernels;
the 4th encoder-decoder additive attention splicing block is used for accessing the feature map output by the 1st additive attention block of the fusion feature additive attention module and the feature map output by the 4th up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^0 × 32 convolution kernels;
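The four splicing blocks share one structure and differ only in the number of 1 × 1 convolution kernels; a minimal PyTorch sketch of one such block follows. Which of the two accessed feature maps is gated and which is concatenated unchanged is not legible from the formula images, so the attention-gate-style choice below (gating the skip feature, concatenating the decoder feature) and the example channel and spatial sizes are assumptions.

```python
import torch
import torch.nn as nn

class EncoderDecoderAttentionConcat(nn.Module):
    """Project both inputs with 1x1 convs, sum, then Mish -> 1x1 conv -> Sigmoid to form an
    attention map; the gated skip feature is concatenated with the decoder feature."""
    def __init__(self, skip_ch, dec_ch, mid_ch):
        super().__init__()
        self.proj_skip = nn.Conv2d(skip_ch, mid_ch, kernel_size=1, stride=1)
        self.proj_dec = nn.Conv2d(dec_ch, mid_ch, kernel_size=1, stride=1)
        self.gate = nn.Sequential(nn.Mish(), nn.Conv2d(mid_ch, mid_ch, 1, 1), nn.Sigmoid())

    def forward(self, skip, dec):
        att = self.gate(self.proj_skip(skip) + self.proj_dec(dec))  # shared-resolution attention map
        return torch.cat([skip * att, dec], dim=1)                  # channel-wise splicing

# 1st splicing block: 2**3 * 32 = 256 projection kernels, bridge feature with 512 channels
blk = EncoderDecoderAttentionConcat(skip_ch=256, dec_ch=512, mid_ch=256)
print(blk(torch.randn(1, 256, 15, 15), torch.randn(1, 512, 15, 15)).shape)  # (1, 768, 15, 15)
```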
the decoder module comprises a 1st convolution block, a 1st up-sampling layer, a 2nd convolution block, a 2nd up-sampling layer, a 3rd convolution block, a 3rd up-sampling layer, a 4th convolution block and an output convolution block arranged in sequence; the L-th convolution block of the decoder module is used for accessing the feature map output by the L-th encoder-decoder additive attention splicing block of the encoder-decoder additive attention splicing module and processing it as follows: the input feature map is sequentially subjected to batch normalization, Mish function activation and a convolution operation with 2^(5-L-1) × 32 convolution kernels with a size of 3 × 3 and a step size of 1; the L-th up-sampling layer of the decoder module is used for accessing the feature map output by the L-th convolution block of the decoder module and performing a bilinear interpolation operation on it to obtain an up-sampled feature map; the output convolution block of the decoder module is used for accessing the feature map output by the 4th convolution block of the decoder module and performing a convolution operation on it with 4 convolution kernels with a step size of 1 and a size of 1 × 1 to obtain a feature map output with the size of 240 × 240 × 4;
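For completeness, a sketch of one decoder stage (batch normalization, Mish activation, 3 × 3 convolution, bilinear upsampling by a factor of 2) under the same PyTorch assumption; the input channel count used in the example is illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """One decoder stage: BN -> Mish -> 3x3 conv on the spliced feature map,
    followed by bilinear upsampling by a factor of 2."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(nn.BatchNorm2d(in_ch), nn.Mish(),
                                  nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1))

    def forward(self, x):
        x = self.conv(x)
        return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

# 1st decoder stage: 2**(5-1-1) * 32 = 256 kernels; an input of 768 channels is assumed
stage = DecoderBlock(in_ch=768, out_ch=256)
print(stage(torch.randn(1, 768, 15, 15)).shape)   # (1, 256, 30, 30)
```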
step 4: training the multi-scale feature fusion and additive attention segmentation network, the specific process comprising the following steps:
(1) initializing the multi-scale feature fusion and additive attention segmentation network by adopting the he_normal parameter initialization method;
(2) randomly dividing the training samples obtained in step 2 into a number of batches, each batch containing batchsize training samples: if the number of training samples is evenly divisible by batchsize, the training samples are divided into (number of training samples / batchsize) batches; if not, the remainder is discarded and ⌊number of training samples / batchsize⌋ batches are obtained, where batchsize is 16 and ⌊·⌋ denotes rounding down to an integer;
(3) taking one batch, and performing image enhancement processing in which the four-channel RGBA images in the batch are randomly flipped in the left-right direction and the up-down direction, each with a probability of 50%;
(4) taking the RGBA images of all four channels after image enhancement processing in the selected batch as input, and inputting the input into a multi-scale feature fusion and additive attention segmentation network to obtain output results of the multi-scale feature fusion and additive attention segmentation network;
(5) according to the output result of the multi-scale feature fusion and additive attention segmentation network and the single-channel segmentation mask images corresponding to the four-channel RGBA images in the selected batch, computing the segmentation loss value of each image-enhanced four-channel RGBA image in the selected batch and taking the average as the final loss value, wherein the segmentation loss value CE of each four-channel RGBA image is computed as:

CE = -(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} y_{n,c} · log(p_{n,c})

wherein N represents the total number of pixels in the four-channel RGBA image; C represents the number of classes to which each pixel can belong and takes the value 4, namely the four classes whole tumor, tumor core, enhanced tumor and background; y_{n,c} takes the value 1 when the true class of the n-th pixel in the single-channel segmentation mask image corresponding to the four-channel RGBA image is c and 0 otherwise; and p_{n,c} represents the probability with which the multi-scale feature fusion and additive attention segmentation network predicts the n-th pixel of the input four-channel RGBA image to be of class c (an illustrative code sketch of steps (3) to (6) is given after this list);
(6) training parameters of the multi-scale feature fusion and additive attention segmentation network by using an ADAM optimizer with a learning rate of 1e-4 according to the segmentation loss value obtained by calculation in the step (5);
(7) repeating the steps (3) to (6) until all the training data of the batch train the multi-scale feature fusion and the additive attention segmentation network for one time, then sequentially inputting the verification samples into the multi-scale feature fusion and the additive attention segmentation network, obtaining the segmentation loss value of each four-channel RGBA image in the verification samples by adopting the same method in the step (5), and calculating and obtaining the average segmentation loss value of all the verification samples;
(8) repeating the steps (2) - (7) until the loss of the multi-scale feature fusion and the additive attention segmentation network on the verification sample is converged, and finally obtaining the trained multi-scale feature fusion and additive attention segmentation network;
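An illustrative training-loop sketch for steps (3) to (6) above, assuming PyTorch; the flips are applied at batch granularity here for brevity, and the he_normal initialization of step (1) corresponds to Kaiming-normal initialization in this framework.

```python
import torch
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer, device="cuda"):
    """One pass over the training batches: random left-right/up-down flips with probability 0.5,
    forward pass, pixel-wise cross-entropy over the 4 classes, ADAM step."""
    model.train()
    for images, masks in loader:                      # images: (B, 4, 240, 240), masks: (B, 240, 240)
        if torch.rand(1).item() < 0.5:                # random left-right flip
            images, masks = images.flip(-1), masks.flip(-1)
        if torch.rand(1).item() < 0.5:                # random up-down flip
            images, masks = images.flip(-2), masks.flip(-2)
        images, masks = images.to(device), masks.to(device)
        logits = model(images)                        # (B, 4, 240, 240)
        loss = F.cross_entropy(logits, masks.long())  # averaged over pixels and batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# optimizer as described in step (6): ADAM with a learning rate of 1e-4, batch size 16 in the loader
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```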
step 5: processing the brain tumor MR image to be segmented into four-channel RGBA images according to the methods in step 1 and step 2, and inputting them into the trained multi-scale feature fusion and additive attention segmentation network, which outputs the segmentation prediction result.
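A slice-wise inference sketch matching step 5, again assuming PyTorch and that the preprocessed volume is provided as a stack of 155 four-channel slices; the function name and tensor layout are illustrative.

```python
import torch

@torch.no_grad()
def segment_volume(model, volume_4ch, device="cuda"):
    """Slice-wise inference: volume_4ch is a (155, 4, 240, 240) tensor of preprocessed,
    channel-stacked FLAIR/T1/T1ce/T2 slices; returns a (155, 240, 240) label map."""
    model.eval()
    preds = []
    for sl in volume_4ch:                              # one 4-channel slice at a time
        logits = model(sl.unsqueeze(0).to(device))     # (1, 4, 240, 240)
        preds.append(logits.argmax(dim=1).squeeze(0).cpu())
    return torch.stack(preds)
```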
In this embodiment, the bridge module includes a convolution block, where the convolution block is used to access a feature map with a size of 15 × 15 × 256 and sequentially perform first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the feature map, so as to obtain a feature map with a size of 15 × 15 × 512, which is output to the multi-scale feature fusion module; the first convolution processing and the second convolution processing are each implemented by 512 convolution kernels with a step size of 1 and a size of 3 × 3.
To verify the superiority of the present invention, the performance of the multi-scale feature fusion and additive attention segmentation network of the present invention was evaluated. On the BraTS2020 validation dataset, the Dice coefficient and the 95% Hausdorff distance (HD95) were used as evaluation indices for the WT, TC and ET regions; the output results of the multi-scale feature fusion and additive attention segmentation network and the real labels were input into the evaluation code to obtain the quantitative evaluation results. The ablation experiment results are shown in Table 1:
TABLE 1 ablation test results
In Table 1, group a denotes the baseline U-Net network, i.e. the existing U-Net network, and group b denotes the baseline U-Net network with the proposed multi-scale feature fusion module MSFF added. Comparing the segmentation results of groups a and b shows that introducing the MSFF module improves the segmentation performance of the network: the average Dice for ET, WT and TC increases by 2.23%, 2.13% and 0.97%, respectively. Group c adds the proposed fusion feature additive attention module FFAA on top of the group b experiment; comparing groups b and c shows that the average Dice of the network on all three regions is further improved after the additive attention operation is added, which indicates that the feature maps of the individual encoder scales become more discriminative after being guided by the additive attention of the global fused feature map. Comparing group c with group a, the average Dice values for ET, WT and TC improve by 3.77%, 2.71% and 2.42%, respectively, which results from the joint action of multi-scale feature fusion and additive attention. Group d denotes the baseline U-Net network with the encoder-decoder additive attention splicing module E-DAAC added; comparing groups a and d shows that the average Dice for ET, WT and TC increases by 2.94%, 2.21% and 2.25%, respectively, after E-DAAC is added to the baseline U-Net network. This shows that introducing an additive attention mechanism between the feature maps of the encoder module and the decoder module can reduce the semantic gap caused by the different network depths at which the two sets of feature maps are produced. Comparing the results of groups c and d further shows that introducing the MSFF and FFAA modules improves the segmentation performance more than introducing the E-DAAC module. Group e adds the Attention Gate module of Attention U-Net to the network of the group c experiment; comparing groups e and c shows that adding the Attention Gate module increases the network parameters by about 2M while improving the segmentation performance only slightly. Group f denotes the baseline U-Net network with all three proposed modules added simultaneously, i.e. the multi-scale feature fusion and additive attention segmentation network of the present invention. Comparing the five groups of experiments a, b, c, d and f, the multi-scale feature fusion and additive attention segmentation network achieves the best segmentation performance on all three tumor regions; compared with the baseline U-Net network, it improves the average Dice for ET, WT and TC by 6.23%, 3.53% and 5.93%, respectively. Furthermore, comparing groups e and f, the network of group f has about 1.6M fewer parameters than that of group e, yet its segmentation results are clearly higher.
This shows that the encoder-decoder additive attention splicing module E-DAAC proposed in the multi-scale feature fusion and additive attention segmentation network of the present invention uses fewer network parameters and improves the segmentation results more than the Attention Gate module. These results demonstrate the effectiveness of the proposed MSFF, FFAA and E-DAAC modules for the brain tumor MR image segmentation problem when used to improve the baseline U-Net network.
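The quantitative values in Table 1 and Table 2 are obtained from the evaluation code mentioned above; the snippet below is only an illustrative re-implementation of the Dice coefficient for a single binary region (HD95 is omitted).

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2 * |P ∩ G| / (|P| + |G|) for one binary region (e.g. WT, TC or ET)."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, target).sum() / denom

# example: Dice of the whole-tumor region (labels 1, 2 and 4 merged)
pred_labels = np.random.choice([0, 1, 2, 4], size=(240, 240))
gt_labels = np.random.choice([0, 1, 2, 4], size=(240, 240))
print(dice_coefficient(np.isin(pred_labels, [1, 2, 4]), np.isin(gt_labels, [1, 2, 4])))
```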
The method of the present invention, which uses the multi-scale feature fusion and additive attention segmentation network, is further compared with existing segmentation methods based on several other networks, and the results of the comparison experiments are shown in Table 2:
table 2 comparative experimental results
Note: the bold numbers indicate the optimal values of the column.
Analysis of Table 2 shows that the present invention achieves remarkable segmentation performance for ET, WT and TC, and that its segmentation results for ET, WT and TC are superior in all indices to those of segmentation methods based on the other existing networks. This demonstrates the superiority of the multi-scale feature fusion and additive attention segmentation network and of the method of the present invention.

Claims (2)

1. A brain tumor MR image segmentation method is characterized by comprising the following steps:
step 1: dividing the data in the BraTS2020 training set, wherein the BraTS2020 training set comprises data of 293 High-grade Glioma (HGG) patients and 74 Low-grade Glioma (LGG) patients, 367 patients in all; the data of each patient comprises 5 three-dimensional images, namely a three-dimensional MR image of the FLAIR modality, a three-dimensional MR image of the T1 modality, a three-dimensional MR image of the T1ce modality, a three-dimensional MR image of the T2 modality and a three-dimensional Ground Truth (GT) segmentation mask image, each 240 in length, 240 in width and 155 in channel number, i.e. of size 240 × 240 × 155; the GT segmentation mask image comprises a Whole Tumor region (Whole Tumor, WT), a Tumor Core region (Tumor Core, TC), an Enhanced Tumor region (Enhanced Tumor, ET) and all other regions, wherein the tumor core region is labeled with tag 1, the whole tumor region is labeled with tag 2, the enhanced tumor region is labeled with tag 4, and all other regions are labeled with tag 0; the specific division process is as follows: taking the data of 234 high-grade glioma patients and 59 low-grade glioma patients in the BraTS2020 training set as training data, and taking the data of the remaining 59 high-grade glioma patients and 15 low-grade glioma patients in the BraTS2020 training set as verification data;
step 2: carrying out data preprocessing, wherein the specific processing process is as follows:
step 2.1, standardizing the three-dimensional MR images of the four modalities, namely the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the three-dimensional MR image of the T2 modality of each patient in the training data and the verification data respectively in a Z-Score mode, namely standardizing voxel values of the three-dimensional MR images of the four modalities into a zero mean value and a unit standard deviation, and specifically comprises the following steps: firstly, determining the number and the positions of pixel points with the voxel value larger than zero in the three-dimensional MR image of each modality of each patient, then calculating the mean value and the standard deviation of a region with the voxel value larger than zero in the three-dimensional MR image of each modality, finally, respectively taking the three-dimensional MR image of each modality as an original three-dimensional MR image, respectively carrying out Z-Score standardization operation on the region with the voxel value larger than zero in the three-dimensional MR image of each modality by adopting a formula (1) to obtain a standardized three-dimensional MR image of each modality, wherein the voxel value in the standardized three-dimensional MR image of each modality conforms to standard normal distribution:
v' = (v - μ) / σ        (1)
in the formula (1), v represents a voxel value in the original three-dimensional MR image, mu represents a mean value of a region of which the voxel value is greater than zero in the original three-dimensional MR image, sigma represents a standard deviation of the region of which the voxel value is greater than zero in the original three-dimensional MR image, and v' represents a voxel value in a normalized three-dimensional MR image obtained after the original three-dimensional MR image is normalized by Z-Score;
step 2.2, respectively carrying out stitching processing on the three-dimensional MR image of the FLAIR modality, the three-dimensional MR image of the T1 modality, the three-dimensional MR image of the T1ce modality and the standardized three-dimensional MR image corresponding to the three-dimensional MR image of the T2 modality of each patient in the training data and the verification data according to the following modes:
sequentially combining the images of the same channel in the standardized three-dimensional MR image of the FLAIR modality, the standardized three-dimensional MR image of the T1 modality, the standardized three-dimensional MR image of the T1ce modality and the standardized three-dimensional MR image of the T2 modality of the same patient into a four-channel RGBA image with the size of 240 × 240 × 4, and meanwhile saving the image of the same channel in the three-dimensional Ground Truth segmentation mask image of the patient as the single-channel segmentation mask image, with the size of 240 × 240 × 1, corresponding to that four-channel RGBA image, the single-channel segmentation mask image serving as the label of the four-channel RGBA image; in this way, stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the training data yields [(234+59) × 155] four-channel RGBA images and [(234+59) × 155] single-channel segmentation mask images in total, and stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the verification data yields [(59+15) × 155] four-channel RGBA images and [(59+15) × 155] single-channel segmentation mask images;
step 2.3, deleting the images containing no tumor region from the [(234+59) × 155] four-channel RGBA images and the [(234+59) × 155] single-channel segmentation mask images obtained by stitching the standardized three-dimensional MR images corresponding to the three-dimensional MR images of the four modalities of each patient in the training data, obtaining 24422 four-channel RGBA images and 24422 single-channel segmentation mask images, and forming the training samples from the obtained 24422 four-channel RGBA images and 24422 single-channel segmentation mask images;
deleting images of no tumor region in [ (59+15) x 155] four-channel RGBA images and [ (59+15) x 155] single-channel segmentation mask images, which are obtained by splicing standardized three-dimensional MR images corresponding to four-mode three-dimensional MR images, namely a FLAIR mode three-dimensional MR image, a T1 mode three-dimensional MR image, a T1ce mode three-dimensional MR image and a T2 mode three-dimensional MR image of each patient in verification data, obtaining 4794 four-channel RGBA images and 4794 single-channel segmentation mask images, and adopting the obtained 4794 four-channel RGBA images and 4794 single-channel segmentation mask images to form a verification sample;
and step 3: adding a multi-scale feature fusion Module (MSFF), a fusion feature additive attention module (FFAA) and an encoder-decoder additive attention splicing module (E-DAAC) to an existing U-Net network consisting of an encoder module, a decoder module and a bridging module to obtain a multi-scale feature fusion and additive attention segmentation network, wherein the multi-scale feature fusion and additive attention segmentation network comprises the encoder module, the decoder module, the bridging module, the multi-scale feature fusion Module (MSFF), the fusion feature additive attention module (FFAA) and the encoder-decoder additive attention splicing module (E-DAAC);
the encoder module is used for extracting feature information of the brain tumor in the four-channel RGBA image input into it and generating a feature map output with the size of 15 × 15 × 256, which is called the encoding feature map; the encoder module comprises a 1st convolution block, a 1st sampling layer, a 2nd convolution block, a 2nd sampling layer, a 3rd convolution block, a 3rd sampling layer, a 4th convolution block and a 4th sampling layer arranged in sequence, wherein the 1st convolution block is used for accessing an image with the size of 240 × 240 × 4 and sequentially performing first Batch Normalization (BN), first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on the image to obtain a feature map output with the size of 240 × 240 × 32, the first convolution processing and the second convolution processing each being realized by 32 convolution kernels with a step size of 1 and a size of 3 × 3; the 1st sampling layer is used for accessing the feature map with the size of 240 × 240 × 32 output by the 1st convolution block and performing a max pooling operation with a step size of 2 on it to obtain a feature map output with the size of 120 × 120 × 32; the 2nd convolution block is used for accessing the feature map with the size of 120 × 120 × 32 output by the 1st sampling layer and processing it in the same manner as the 1st convolution block, with the first and second convolution processing each realized by 64 convolution kernels with a step size of 1 and a size of 3 × 3, to obtain a feature map output with the size of 120 × 120 × 64; the 2nd sampling layer is used for accessing the feature map with the size of 120 × 120 × 64 output by the 2nd convolution block and performing a max pooling operation with a step size of 2 on it to obtain a feature map output with the size of 60 × 60 × 64; the 3rd convolution block is used for accessing the feature map with the size of 60 × 60 × 64 output by the 2nd sampling layer and processing it in the same manner, with the first and second convolution processing each realized by 128 convolution kernels with a step size of 1 and a size of 3 × 3, to obtain a feature map output with the size of 60 × 60 × 128; the 3rd sampling layer is used for accessing the feature map with the size of 60 × 60 × 128 output by the 3rd convolution block and performing a max pooling operation with a step size of 2 on it to obtain a feature map output with the size of 30 × 30 × 128; the 4th convolution block is used for accessing the feature map with the size of 30 × 30 × 128 output by the 3rd sampling layer and sequentially performing first Batch Normalization, first Mish function activation, first convolution processing, second Batch Normalization, second Mish function activation and second convolution processing on it to obtain a feature map output with the size of 30 × 30 × 256; the 4th sampling layer is used for accessing the feature map with the size of 30 × 30 × 256 output by the 4th convolution block and performing a max pooling operation with a step size of 2 on it to obtain a feature map output with the size of 15 × 15 × 256, which is the encoding feature map;
the bridge module is used for accessing the encoding feature map output by the encoder module and processing it to obtain a feature map output with the size of 15 × 15 × 512;
the multi-scale feature fusion module comprises a 1st up-sampling layer, a 2nd up-sampling layer, a 3rd up-sampling layer, a 4th up-sampling layer, a feature splicing layer and a convolution block arranged in sequence, wherein the L-th up-sampling layer of the multi-scale feature fusion module is used for accessing the feature map output by the L-th convolution block of the encoder module, whose size is (240/2^(L-1)) × (240/2^(L-1)) × (2^(L-1) × 32), L = 1, 2, 3, 4, and performing bilinear interpolation up-sampling on the accessed feature map to obtain a feature map with the size of 240 × 240 × (2^(L-1) × 32); the feature splicing layer of the multi-scale feature fusion module is used for accessing the feature maps output by the 1st to 4th up-sampling layers of the multi-scale feature fusion module and splicing them according to channels to obtain a feature map with the size of 240 × 240 × 480; the convolution block of the multi-scale feature fusion module is used for accessing the feature map output by the feature splicing layer and performing a convolution operation on it with 32 convolution kernels with a step size of 1 and a size of 3 × 3 to obtain a feature map output with the size of 240 × 240 × 32;
the fusion feature additive attention module comprises a 1st down-sampling layer, a 1st additive attention block, a 2nd down-sampling layer, a 2nd additive attention block, a 3rd down-sampling layer, a 3rd additive attention block, a 4th down-sampling layer and a 4th additive attention block arranged in sequence, wherein the L-th down-sampling layer of the fusion feature additive attention module is used for accessing the feature map with the size of 240 × 240 × 32 output by the convolution block of the multi-scale feature fusion module and performing a max pooling operation with a step size of 2^L on it to obtain a down-sampled fused feature map; the L-th additive attention block of the fusion feature additive attention module is used for accessing the feature map output by the L-th down-sampling layer of the fusion feature additive attention module and the feature map output by the L-th convolution block of the encoder module, and processes the two accessed feature maps as follows: firstly, a convolution operation with 2^(L-1) × 32 convolution kernels with a step size of 1 and a size of 1 × 1 is applied to each of the two feature maps, giving two feature maps of equal size; the two feature maps are then added element by element, and the result is sequentially subjected to a Mish function activation, a convolution operation with 2^(L-1) × 32 convolution kernels with a step size of 1 and a size of 1 × 1, and a Sigmoid function activation to obtain an attention map; finally, the attention map is multiplied element by element with the encoder feature map accessed by the block, and the result is output as the feature map of the L-th additive attention block;
the encoder-decoder additive attention splicing module comprises a 1st encoder-decoder additive attention splicing block, a 2nd encoder-decoder additive attention splicing block, a 3rd encoder-decoder additive attention splicing block and a 4th encoder-decoder additive attention splicing block arranged in sequence;
the 1st encoder-decoder additive attention splicing block is used for accessing the feature map output by the 4th additive attention block of the fusion feature additive attention module and the feature map with the size of 15 × 15 × 512 output by the bridge module, and processes the two accessed feature maps as follows: firstly, a convolution operation with 2^3 × 32 convolution kernels with a step size of 1 and a size of 1 × 1 is applied to each of the two feature maps, giving two feature maps of equal size; the two feature maps are then added element by element, and the result is sequentially subjected to a Mish function activation, a convolution operation with 2^3 × 32 convolution kernels with a step size of 1 and a size of 1 × 1, and a Sigmoid function activation to obtain an attention map; the attention map is multiplied element by element with one of the two accessed feature maps, and the resulting feature map is finally spliced according to channels with the other accessed feature map to form the output of the 1st encoder-decoder additive attention splicing block;
the 2nd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 3rd additive attention block of the fusion feature additive attention module and the feature map output by the 2nd up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^2 × 32 convolution kernels;
the 3rd encoder-decoder additive attention splicing block is used for accessing the feature map output by the 2nd additive attention block of the fusion feature additive attention module and the feature map output by the 3rd up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^1 × 32 convolution kernels;
the 4th encoder-decoder additive attention splicing block is used for accessing the feature map output by the 1st additive attention block of the fusion feature additive attention module and the feature map output by the 4th up-sampling layer of the decoder module, and processes them in the same manner as the 1st encoder-decoder additive attention splicing block, except that the two 1 × 1 convolution operations each use 2^0 × 32 convolution kernels;
the decoder module comprises a 1st convolution block, a 1st up-sampling layer, a 2nd convolution block, a 2nd up-sampling layer, a 3rd convolution block, a 3rd up-sampling layer, a 4th convolution block and an output convolution block arranged in sequence; the L-th convolution block of the decoder module is used for accessing the feature map output by the L-th encoder-decoder additive attention splicing block of the encoder-decoder additive attention splicing module and processing it as follows: the input feature map is sequentially subjected to batch normalization, Mish function activation and a convolution operation with 2^(5-L-1) × 32 convolution kernels with a size of 3 × 3 and a step size of 1; the L-th up-sampling layer of the decoder module is used for accessing the feature map output by the L-th convolution block of the decoder module and performing a bilinear interpolation operation on it to obtain an up-sampled feature map; the output convolution block of the decoder module is used for accessing the feature map output by the 4th convolution block of the decoder module and performing a convolution operation on it with 4 convolution kernels with a step size of 1 and a size of 1 × 1 to obtain a feature map output with the size of 240 × 240 × 4;
step 4: training the multi-scale feature fusion and additive attention segmentation network, the specific process comprising the following steps:
(1) initializing the multi-scale feature fusion and additive attention segmentation network by adopting the he_normal parameter initialization method;
(2) randomly dividing the training samples obtained in step 2 into a number of batches, each batch containing batchsize training samples: if the number of training samples is evenly divisible by batchsize, the training samples are divided into (number of training samples / batchsize) batches; if not, the remainder is discarded and ⌊number of training samples / batchsize⌋ batches are obtained, where batchsize is 16 and ⌊·⌋ denotes rounding down to an integer;
(3) taking one batch, and performing image enhancement processing in which the four-channel RGBA images in the batch are randomly flipped in the left-right direction and the up-down direction, each with a probability of 50%;
(4) taking the RGBA images of all four channels after image enhancement processing in the selected batch as input, and inputting the input into the multi-scale feature fusion and additive attention segmentation network to obtain the output result of the multi-scale feature fusion and additive attention segmentation network;
(5) according to the output result of the multi-scale feature fusion and additive attention segmentation network and the single-channel segmentation mask images corresponding to the four-channel RGBA images in the selected batch, computing the segmentation loss value of each image-enhanced four-channel RGBA image in the selected batch and taking the average as the final loss value, wherein the segmentation loss value CE of each four-channel RGBA image is computed as:

CE = -(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} y_{n,c} · log(p_{n,c})

wherein N represents the total number of pixels in the four-channel RGBA image; C represents the number of classes to which each pixel can belong and takes the value 4, namely the four classes whole tumor, tumor core, enhanced tumor and background; y_{n,c} takes the value 1 when the true class of the n-th pixel in the single-channel segmentation mask image corresponding to the four-channel RGBA image is c and 0 otherwise; and p_{n,c} represents the probability with which the multi-scale feature fusion and additive attention segmentation network predicts the n-th pixel of the input four-channel RGBA image to be of class c;
(6) according to the segmentation loss value obtained by calculation in the step (5), using an ADAM optimizer with a learning rate of 1e-4 to train parameters of the multi-scale feature fusion and additive attention segmentation network;
(7) repeating the steps (3) to (6) until all the training data of the batch train the multi-scale feature fusion and additive attention segmentation network for one time, then sequentially inputting the verification samples into the multi-scale feature fusion and additive attention segmentation network at the moment, obtaining the segmentation loss value of each four-channel RGBA image in the verification samples by adopting the same method in the step (5), and calculating and obtaining the average segmentation loss value of all the verification samples;
(8) repeating the steps (2) - (7) until the loss of the multi-scale feature fusion and additive attention segmentation network on the verification sample is converged, and finally obtaining the trained multi-scale feature fusion and additive attention segmentation network;
step 5: processing the brain tumor MR image to be segmented into four-channel RGBA images according to the methods in step 1 and step 2, and inputting them into the trained multi-scale feature fusion and additive attention segmentation network, which outputs the segmentation prediction result.
2. The method of claim 1, wherein the bridge module comprises a convolution block, and the convolution block is configured to access a feature map with a size of 15 × 15 × 256, and sequentially perform a first Batch Normalization (BN), a first Mish function activation, a first convolution processing, a second Batch Normalization, a second Mish function activation, and a second convolution processing on the feature map to obtain a feature map with a size of 15 × 15 × 512, and output the feature map to the multi-scale feature fusion module, wherein the first convolution processing and the second convolution processing are respectively implemented by 512 convolution kernels with a step size of 1 and a size of 3 × 3.
CN202210017015.3A 2022-01-07 2022-01-07 Brain tumor MR image segmentation method Pending CN114519719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210017015.3A CN114519719A (en) 2022-01-07 2022-01-07 Brain tumor MR image segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210017015.3A CN114519719A (en) 2022-01-07 2022-01-07 Brain tumor MR image segmentation method

Publications (1)

Publication Number Publication Date
CN114519719A true CN114519719A (en) 2022-05-20

Family

ID=81597227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210017015.3A Pending CN114519719A (en) 2022-01-07 2022-01-07 Brain tumor MR image segmentation method

Country Status (1)

Country Link
CN (1) CN114519719A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972760A (en) * 2022-06-17 2022-08-30 湘潭大学 Ionization map automatic tracing method based on multi-scale attention enhancement U-Net
CN114972760B (en) * 2022-06-17 2024-04-16 湘潭大学 Ionization diagram automatic tracing method based on multi-scale attention-enhancing U-Net
CN116563265A (en) * 2023-05-23 2023-08-08 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN116563265B (en) * 2023-05-23 2024-03-01 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN117495882A (en) * 2023-12-28 2024-02-02 无锡学院 Liver tumor CT image segmentation method based on AGCH-Net and multi-scale fusion

Similar Documents

Publication Publication Date Title
CN109035263B (en) Automatic brain tumor image segmentation method based on convolutional neural network
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
CN114519719A (en) Brain tumor MR image segmentation method
Aghalari et al. Brain tumor image segmentation via asymmetric/symmetric UNet based on two-pathway-residual blocks
CN107016395B (en) Identification system for sparsely expressed primary brain lymphomas and glioblastomas
CN108109140A (en) Low Grade Gliomas citric dehydrogenase non-destructive prediction method and system based on deep learning
CN111260705B (en) Prostate MR image multi-task registration method based on deep convolutional neural network
CN112365980B (en) Brain tumor multi-target auxiliary diagnosis and prospective treatment evolution visualization method and system
CN112767417B (en) Multi-modal image segmentation method based on cascaded U-Net network
CN113496495B (en) Medical image segmentation model building method capable of realizing missing input and segmentation method
JP7427080B2 (en) Weakly supervised multitask learning for cell detection and segmentation
CN112862805B (en) Automatic auditory neuroma image segmentation method and system
CN114693933A (en) Medical image segmentation device based on generation of confrontation network and multi-scale feature fusion
CN115311193A (en) Abnormal brain image segmentation method and system based on double attention mechanism
CN113487560A (en) Brain tumor segmentation method and device based on spatial feature attention mechanism
Nie et al. Semantic-guided encoder feature learning for blurry boundary delineation
CN116486156A (en) Full-view digital slice image classification method integrating multi-scale feature context
CN116228731A (en) Multi-contrast learning coronary artery high-risk plaque detection method, system and terminal
CN115984296A (en) Medical image segmentation method and system applying multi-attention mechanism
CN113379770B (en) Construction method of nasopharyngeal carcinoma MR image segmentation network, image segmentation method and device
CN112686912B (en) Acute stroke lesion segmentation method based on gradual learning and mixed samples
Zhang et al. Research on brain glioma segmentation algorithm
CN110276414B (en) Image feature extraction method and expression method based on dictionary learning and sparse representation
Peng et al. 2D brain MRI image synthesis based on lightweight denoising diffusion probabilistic model
CN113205454A (en) Segmentation model establishing and segmenting method and device based on multi-scale feature extraction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination