Disclosure of Invention
The invention aims to provide a three-dimensional semantic segmentation network structure with multi-scale feature extraction and a multi-path attention mechanism. The network extracts features of hippocampal regions of different sizes through multi-scale residual modules, and gathers and weights information across different dimensions through the fusion of multi-path attention modules, so that features relevant to the hippocampus are highlighted and irrelevant features are suppressed; the three-dimensional automatic segmentation method further improves segmentation accuracy by combining the classifier output of an ensemble learning path with the output of the decoder path.
The technical scheme of the invention comprises the following steps:
step 1: acquiring MRI image data from a database and preprocessing the data;
step 2: designing a three-dimensional hippocampus semantic segmentation network structure based on a multi-scale feature multi-path attention fusion mechanism;
step 3: dividing the data into training, validation and test sets;
step 4: training the model and saving the model file with the best performance;
step 5: testing the model and evaluating the segmentation results.
Further, the step 1 comprises:
A. acquiring image data with hippocampus annotations from the ADNI database (https://ida.loni.usc.edu/) and merging the left and right hippocampus labels;
B. cropping the hippocampal region of the images and labels in the data set: slices containing the hippocampus are selected in the transverse, sagittal and coronal directions to crop the image and label data and remove background information from the data;
C. standardizing the cropped image data.
Further, the three-dimensional hippocampus semantic segmentation network based on the fusion of multi-scale features and a multi-path attention mechanism in step 2 is organized in units of the functional modules and layers in the network and comprises, in the order in which the different functions are realized:
A. an input layer comprising a data input layer;
B. the network comprises nine multi-scale residual modules, each comprising three-dimensional convolutional layers with a kernel size of 1, one three-dimensional convolutional layer with a kernel size of 3, one three-dimensional convolutional layer with a kernel size of 5, five batch normalization layers, four ReLU activation layers and one additive fusion layer;
C. the network comprises four pooling layers, each comprising a three-dimensional max pooling layer with a pooling kernel size of 2;
D. the network comprises four dimension adjusting modules, each comprising a three-dimensional convolutional layer with a kernel size of 1, a batch normalization layer and a ReLU activation layer;
E. the network comprises four channel attention modules, each comprising a three-dimensional global average pooling layer, a three-dimensional global max pooling layer, four batch normalization layers, four fully connected layers, two ReLU activation layers, an additive fusion layer, a Sigmoid activation layer, a reshape layer and an element-wise multiplication layer;
F. the network comprises four spatial attention modules, each comprising a three-dimensional convolutional layer with a kernel size of 2, three three-dimensional convolutional layers with a kernel size of 1, a three-dimensional deconvolution layer with a kernel size of 3, an additive fusion layer, a ReLU activation layer, a Sigmoid activation layer, an upsampling layer, an element-wise multiplication layer and a batch normalization layer;
G. the ensemble learning branch structure comprises four three-dimensional convolutional layers with a kernel size of 1, three upsampling layers and three element-wise addition fusion layers;
H. the network comprises four skip connection structures, each comprising an upsampling layer and a channel concatenation fusion layer;
I. the output layer comprises a three-dimensional convolutional layer with a kernel size of 1.
Further, the step 3 comprises:
A. dividing the acquired ADNI data set into three groups according to disease state: Alzheimer's disease, mild cognitive impairment and normal controls;
B. randomly selecting equal amounts of data from the three groups as the training data set; in the data preprocessing of step 1, a random number is set as the parameter for randomly shifting the cropping slices so that, while ensuring that no part of the hippocampus is omitted, the original data can be cropped multiple times, expanding the training data set and acting as data augmentation;
C. selecting equal amounts of data from the remainder of the three groups as the validation set;
D. using the data remaining in the three groups as the test set.
Further, the step 4 comprises:
A. inputting the training set and the validation set into the network for training;
B. a callback function is used to schedule the learning rate, which is decayed during training according to the decrease of the validation-set loss;
C. early stopping is set in the callback function, so that training stops according to the decrease of the validation-set loss and the model with the lowest validation-set loss is saved;
D. a two-class Dice coefficient is used as the evaluation function of the model, defined as Dice = 2|G ∩ P| / (|G| + |P|), where G represents the label pixel values and P represents the predicted pixel values;
E. the trained model achieves a Dice coefficient of 0.8379 on the validation set.
Further, the step 5 comprises:
A. performing a three-dimensional hippocampus segmentation test on the model with the data in the test set; the Dice coefficient on the test set is 0.8269.
The invention has the beneficial effects that:
(1) the multi-scale residual module performs multi-scale feature extraction and fusion on the complexly shaped hippocampal region, improving segmentation precision;
(2) the invention combines multiple attention modules to perform attention-weighted aggregation of multi-dimensional information, screening and weighting the features with higher activation values from the channel dimension to the spatial dimension, which improves the network's ability to identify and segment the hippocampal target;
(3) the invention combines a branch-path classifier with the output of the decoder-path classifier, improving the classification and segmentation capability of the whole network through ensemble learning.
Detailed Description
The invention automatically processes brain magnetic resonance images and realizes three-dimensional automatic segmentation of the hippocampus. To address the variable position and complex shape of the hippocampus, a semantic segmentation network is constructed from more effective image feature processing modules, improving the segmentation accuracy of the trained model and providing more reliable information support for the diagnosis of Alzheimer's disease.
As shown in fig. 1, the three-dimensional hippocampus semantic segmentation method based on a multi-scale feature multi-path attention fusion mechanism includes the following five steps:
1. acquiring hippocampus image and label data from the ADNI database and preprocessing them;
2. designing a three-dimensional hippocampus semantic segmentation network structure based on multi-scale feature extraction and a multi-path attention fusion mechanism;
3. dividing the data into training, validation and test sets;
4. training the model and saving the model file with the best performance;
5. testing the model and evaluating the segmentation results.
Further, the step 1 comprises:
1) acquiring the brain magnetic resonance images and label data from the ADNI data set, and merging the left and right hippocampus labels;
2) cropping the hippocampal region from the original images and labels, selecting slices containing the hippocampus in the transverse, sagittal and coronal directions to crop the image and label data;
3) standardizing the cropped data set.
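The following is a minimal sketch of this preprocessing, assuming the volumes have already been loaded as NumPy arrays (for example with nibabel); the crop size and random shift range are illustrative values, not the patent's settings:

```python
import numpy as np

def merge_left_right(label):
    # Merge the left and right hippocampus label values into one binary mask.
    return (label > 0).astype(np.uint8)

def crop_hippocampus(image, label, size=(64, 64, 64), max_shift=4, rng=None):
    # Crop a box around the hippocampus along the transverse, sagittal and
    # coronal axes, with a random shift of the slice window for augmentation.
    rng = rng if rng is not None else np.random.default_rng()
    coords = np.argwhere(label > 0)                # hippocampal voxel coordinates
    center = (coords.min(axis=0) + coords.max(axis=0)) // 2
    shift = rng.integers(-max_shift, max_shift + 1, size=3)
    start = np.maximum(center + shift - np.array(size) // 2, 0)
    end = np.minimum(start + size, image.shape)
    start = end - size                             # keep the window inside the volume
    sl = tuple(slice(s, e) for s, e in zip(start, end))
    return image[sl], label[sl]

def standardize(image):
    # Z-score standardization of the cropped volume.
    return (image - image.mean()) / (image.std() + 1e-8)
```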
Further, the network structure in step 2 includes, in layer units and in order from input to output, the following structures:
1) constructing an input layer, which comprises an Input layer feeding the data set into the network; the data format is a five-dimensional structure, the dimensions being the voxel block, pixel length, pixel width, pixel height and image channel;
2) constructing an encoder: the output of the input layer is fed into the first layer of the encoder, whose structure consists of four multi-scale residual modules alternating with four pooling layers, connected end to end, followed by a final multi-scale residual module;
3) constructing the multi-scale residual modules, each comprising a conv3_3 layer, a conv5_5 layer, an up layer, a shortcut layer and a res_path layer;
4) the conv3_3 layer is formed by stacking two Conv3D layers, a BatchNormalization layer and an Activation layer, applying three-dimensional convolution with a kernel size of 3, batch normalization and ReLU activation to the input of the multi-scale residual module;
5) the conv5_5 layer is formed by stacking two Conv3D layers, a BatchNormalization layer and an Activation layer, applying three-dimensional convolution with a kernel size of 5, batch normalization and ReLU activation to the input of the multi-scale residual module;
6) the up layer consists of a concatenate layer, which splices and fuses the outputs of the conv3_3 and conv5_5 layers along the channel dimension;
7) the shortcut layer is formed by stacking a Conv3D layer and a BatchNormalization layer, applying three-dimensional convolution with a kernel size of 1 and batch normalization to the input of the multi-scale residual module;
8) the res_path layer consists of an Add layer, which additively fuses the outputs of the shortcut and up layers;
9) building the pooling layers, combining a pool_64 layer, a pool_32 layer, a pool_16 layer and a pool_8 layer; each consists of a MaxPooling3D layer and applies three-dimensional max pooling with a pooling kernel size of 2 to the output of the preceding multi-scale residual module (a code sketch of the module and the encoder follows below);
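A minimal Keras sketch of the multi-scale residual module and the encoder, assuming a TensorFlow/Keras backend; the filter counts and the exact depth of the conv3_3 and conv5_5 paths are one illustrative reading of the description above, not confirmed settings:

```python
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel_size):
    # Conv3D + BatchNormalization + ReLU Activation building block.
    x = layers.Conv3D(filters, kernel_size, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation('relu')(x)

def multiscale_res_block(inputs, filters):
    # conv3_3 / conv5_5: parallel kernel-3 and kernel-5 feature extraction.
    conv3_3 = conv_bn_relu(conv_bn_relu(inputs, filters // 2, 3), filters // 2, 3)
    conv5_5 = conv_bn_relu(conv_bn_relu(inputs, filters // 2, 5), filters // 2, 5)
    # up: channel-dimension concatenation of the two scales.
    up = layers.concatenate([conv3_3, conv5_5])
    # shortcut: kernel-1 convolution and batch normalization of the input.
    shortcut = layers.Conv3D(filters, 1, padding='same')(inputs)
    shortcut = layers.BatchNormalization()(shortcut)
    # res_path: additive fusion of the shortcut and the multi-scale features.
    return layers.add([shortcut, up])

def encoder(inputs, base_filters=16):
    # Four (module, pooling) stages followed by a final bottom module.
    skips, x = [], inputs
    for level in range(4):
        x = multiscale_res_block(x, base_filters * 2 ** level)
        skips.append(x)
        x = layers.MaxPooling3D(pool_size=2)(x)  # pool_64 ... pool_8
    x = multiscale_res_block(x, base_filters * 16)
    return x, skips
```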
10) constructing a decoder: the output of each encoder level is fed into the corresponding decoder level; the decoder structure combines four dimension adjusting modules, four channel attention modules, four spatial attention modules, four skip connection structures and four multi-scale residual modules;
11) constructing the dimension adjusting modules, each comprising an x layer;
12) the x layer is formed by stacking a Conv3D layer, a BatchNormalization layer and an Activation layer, applying three-dimensional convolution with a kernel size of 1, batch normalization and ReLU activation to the input of the dimension adjusting module;
13) constructing the channel attention modules, each comprising an x_s_avg layer, an x_e_avg layer, an x_s_max layer, an x_e_max layer, an x_e layer and a result layer; the structure is shown in FIG. 3, where C, L, W, H and r denote the channels of the feature map, the spatial voxel length, width and height, and the dimension compression ratio;
14) the x_s_avg layer is formed by stacking a GlobalAveragePooling3D layer and a BatchNormalization layer, applying three-dimensional global average pooling and batch normalization to the input of the channel attention module;
15) the x_e_avg layer is formed by stacking a Dense layer, an Activation layer, a BatchNormalization layer and a Dense layer, performing dimensionality scaling on the output of the x_s_avg layer;
16) the x_s_max layer is formed by stacking a GlobalMaxPooling3D layer and a BatchNormalization layer, applying three-dimensional global max pooling and batch normalization to the input of the channel attention module;
17) the x_e_max layer is formed by stacking a Dense layer, an Activation layer, a BatchNormalization layer and a Dense layer, performing dimensionality scaling on the output of the x_s_max layer;
18) the x_e layer is formed by stacking an add layer, an Activation layer and a reshape layer, performing additive fusion, sigmoid activation and reshaping on the outputs of the x_e_avg and x_e_max layers;
19) the result layer consists of a multiply layer, performing element-wise multiplication of the input of the channel attention module and the output of the x_e layer (a code sketch of this module follows below);
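A minimal Keras sketch of the channel attention module, under the same TensorFlow/Keras assumption; the compression ratio r = 8 is an illustrative value:

```python
from tensorflow.keras import layers

def channel_attention(inputs, channels, r=8):
    # x_s_avg / x_s_max: global average and max pooling over the voxel grid,
    # each followed by batch normalization.
    x_s_avg = layers.BatchNormalization()(layers.GlobalAveragePooling3D()(inputs))
    x_s_max = layers.BatchNormalization()(layers.GlobalMaxPooling3D()(inputs))

    def scale_dims(x):
        # x_e_avg / x_e_max: Dense (C -> C/r), ReLU, BatchNormalization,
        # Dense (C/r -> C), i.e. the dimensionality scaling described above.
        x = layers.Dense(channels // r)(x)
        x = layers.Activation('relu')(x)
        x = layers.BatchNormalization()(x)
        return layers.Dense(channels)(x)

    x_e_avg, x_e_max = scale_dims(x_s_avg), scale_dims(x_s_max)
    # x_e: additive fusion, sigmoid activation and reshape for broadcasting.
    x_e = layers.add([x_e_avg, x_e_max])
    x_e = layers.Activation('sigmoid')(x_e)
    x_e = layers.Reshape((1, 1, 1, channels))(x_e)
    # result: per-channel reweighting of the module input.
    return layers.multiply([inputs, x_e])
```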
20) constructing the spatial attention modules, each comprising a theta_x layer, a phi_g layer, an upsample_g layer, a concat_xg layer, an act_xg layer, a psi layer, a sigmoid_xg layer, an upsample_psi layer, a y layer, a result layer and a result_bn layer;
21) the theta_x layer consists of a Conv3D layer, applying convolution with a kernel size of 2 and a stride of 2 to the input of the spatial attention module;
22) the phi_g layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the input of the spatial attention module;
23) the upsample_g layer consists of a Conv3DTranspose layer, applying three-dimensional deconvolution with a kernel size of 3 to the output of the phi_g layer;
24) the concat_xg layer consists of an add layer, additively fusing the outputs of the upsample_g and theta_x layers;
25) the act_xg layer consists of an Activation layer, applying ReLU activation to the output of the concat_xg layer;
26) the psi layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the act_xg layer;
27) the sigmoid_xg layer consists of an Activation layer, applying sigmoid activation to the output of the psi layer;
28) the upsample_psi layer consists of an UpSampling3D layer, upsampling the output of the sigmoid_xg layer;
29) the y layer consists of a multiply layer, performing element-wise multiplication of the input of the spatial attention module and the output of the upsample_psi layer;
30) the result layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the y layer;
31) the result_bn layer consists of a BatchNormalization layer, applying batch normalization to the output of the result layer (a code sketch of this module follows below);
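A minimal Keras sketch of the spatial attention module, read as an attention gate with two inputs: the skip feature map x and a coarser gating signal g. Treating x as having twice the spatial resolution of g is an assumption, as is the intermediate channel count:

```python
from tensorflow.keras import layers

def spatial_attention(x, g, inter_channels):
    # theta_x: kernel-2, stride-2 convolution brings x down to the grid of g.
    theta_x = layers.Conv3D(inter_channels, 2, strides=2, padding='same')(x)
    # phi_g / upsample_g: kernel-1 convolution of g, then kernel-3 deconvolution.
    phi_g = layers.Conv3D(inter_channels, 1, padding='same')(g)
    upsample_g = layers.Conv3DTranspose(inter_channels, 3, padding='same')(phi_g)
    # concat_xg / act_xg: additive fusion followed by ReLU activation.
    act_xg = layers.Activation('relu')(layers.add([upsample_g, theta_x]))
    # psi / sigmoid_xg: collapse to a single attention map in [0, 1].
    sigmoid_xg = layers.Activation('sigmoid')(layers.Conv3D(1, 1, padding='same')(act_xg))
    # upsample_psi / y: resample the map to the size of x and gate x with it.
    upsample_psi = layers.UpSampling3D(size=2)(sigmoid_xg)
    y = layers.multiply([x, upsample_psi])
    # result / result_bn: kernel-1 convolution and batch normalization.
    result = layers.Conv3D(x.shape[-1], 1, padding='same')(y)
    return layers.BatchNormalization()(result)
```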
32) constructing the skip connection structures, combining an up_16 layer, an up_32 layer, an up_64 layer and an up_128 layer, each formed by stacking an UpSampling3D layer and a concatenate layer, performing upsampling, splicing and fusing operations on the feature maps in the decoder;
33) the multi-scale residual modules of the decoder have the same structure as those of the encoder;
34) constructing an ensemble learning branch: the output of each decoder level is fed into the corresponding layer of the ensemble learning branch structure, which combines an up_conv_16_11 layer, an up_conv_32_11 layer, an up_16_11 layer, an add_01 layer, an up_conv_64_11 layer, an up_add_01 layer, an add_02 layer, an up_conv_128_11 layer, an up_add_02 layer and an add_03 layer;
35) the up_conv_16_11 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the corresponding multi-scale residual module in the decoder;
36) the up_conv_32_11 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the corresponding multi-scale residual module in the decoder;
37) the up_16_11 layer consists of an UpSampling3D layer, upsampling the output of the up_conv_16_11 layer;
38) the add_01 layer consists of an add layer, additively fusing the outputs of the up_16_11 and up_conv_32_11 layers;
39) the up_conv_64_11 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the corresponding multi-scale residual module in the decoder;
40) the up_add_01 layer consists of an UpSampling3D layer, upsampling the output of the add_01 layer;
41) the add_02 layer consists of an add layer, additively fusing the outputs of the up_add_01 and up_conv_64_11 layers;
42) the up_conv_128_11 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the corresponding multi-scale residual module in the decoder;
43) the up_add_02 layer consists of an UpSampling3D layer, upsampling the output of the add_02 layer;
44) the add_03 layer consists of an add layer, additively fusing the outputs of the up_add_02 and up_conv_128_11 layers (a code sketch of this branch follows below);
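A minimal Keras sketch of the ensemble learning branch; d16 through d128 stand for the decoder outputs from the coarsest to the finest level (hypothetical names), out_channels = 1 assumes binary hippocampus segmentation, and the upsampling factor of 2 is assumed from the pooling sizes:

```python
from tensorflow.keras import layers

def ensemble_branch(d16, d32, d64, d128, out_channels=1):
    # Kernel-1 convolutions compress each decoder level to the same channels.
    up_conv_16_11 = layers.Conv3D(out_channels, 1, padding='same')(d16)
    up_conv_32_11 = layers.Conv3D(out_channels, 1, padding='same')(d32)
    up_16_11 = layers.UpSampling3D(size=2)(up_conv_16_11)
    add_01 = layers.add([up_16_11, up_conv_32_11])

    up_conv_64_11 = layers.Conv3D(out_channels, 1, padding='same')(d64)
    up_add_01 = layers.UpSampling3D(size=2)(add_01)
    add_02 = layers.add([up_add_01, up_conv_64_11])

    up_conv_128_11 = layers.Conv3D(out_channels, 1, padding='same')(d128)
    up_add_02 = layers.UpSampling3D(size=2)(add_02)
    add_03 = layers.add([up_add_02, up_conv_128_11])
    return add_03
```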
45) constructing an output layer: the output of the last layer of the ensemble learning branch structure is fed into the output layer, which comprises a conv10 layer;
46) the conv10 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 and sigmoid activation to the output of the ensemble learning branch structure and outputting the segmentation result;
47) designing a two-class Dice coefficient as the evaluation function of the model, computing the Dice coefficient between the output result and the label;
48) the Dice coefficient is defined as Dice = 2|G ∩ P| / (|G| + |P|), where G represents the label pixel values and P represents the predicted pixel values; its value lies in the closed interval from 0 to 1, where 1 means complete overlap and 0 means no overlap at all;
49) designing a Dice loss function as the loss function of the model;
50) the Dice loss function is defined as Loss = 1 - Dice (a code sketch of the metric and loss follows below).
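A minimal Keras sketch of the metric and loss defined above; the small smoothing term is an added assumption to avoid division by zero on empty masks:

```python
import tensorflow.keras.backend as K

def dice_coefficient(g, p, smooth=1e-6):
    # Dice = 2|G ∩ P| / (|G| + |P|): 1 means complete overlap, 0 none.
    g_f, p_f = K.flatten(g), K.flatten(p)
    intersection = K.sum(g_f * p_f)
    return (2.0 * intersection + smooth) / (K.sum(g_f) + K.sum(p_f) + smooth)

def dice_loss(g, p):
    # Loss = 1 - Dice, as defined above.
    return 1.0 - dice_coefficient(g, p)
```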
Further, the step 3 comprises:
1) dividing the data in the acquired ADNI data set into three groups according to disease state: Alzheimer's disease, mild cognitive impairment and normal controls;
2) randomly selecting equal amounts of data from the three groups as the training data set; a random number is set as the parameter for randomly shifting the cropping slices so that, while ensuring that no part of the hippocampus is missed, the original data can be cropped multiple times to expand the training data set;
3) selecting equal amounts of data from the remainder of the three groups as the validation set;
4) using the data remaining in the three groups as the test set.
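A minimal sketch of the disease-state stratified split; subjects is a hypothetical list of (subject_id, group) pairs and the 70/15/15 proportions are illustrative, since the patent does not state the exact ratios:

```python
import random

def stratified_split(subjects, train=0.7, val=0.15, seed=42):
    rng = random.Random(seed)
    splits = {'train': [], 'val': [], 'test': []}
    for group in ('AD', 'MCI', 'CN'):            # equal handling of the three groups
        members = [s for s, g in subjects if g == group]
        rng.shuffle(members)
        n_tr, n_va = int(len(members) * train), int(len(members) * val)
        splits['train'] += members[:n_tr]
        splits['val'] += members[n_tr:n_tr + n_va]
        splits['test'] += members[n_tr + n_va:]  # remaining data as the test set
    return splits
```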
Further, the step 4 comprises:
1) feeding the images and labels of the training set and the validation set into the network for offline training;
2) callback functions are used for learning rate decay and early stopping, so that the model decays its learning rate and stops training according to the decrease of the validation-set loss, and the model with the lowest validation-set loss is saved.
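A minimal Keras training sketch wiring up the standard callbacks that implement this behavior (ReduceLROnPlateau, EarlyStopping and ModelCheckpoint); dice_loss and dice_coefficient are the functions sketched earlier, and the patience values, batch size and epoch count are illustrative assumptions:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

def train(model, x_train, y_train, x_val, y_val):
    # Learning-rate decay, early stopping and best-model saving, all
    # monitoring the validation-set loss.
    callbacks = [
        ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5),
        EarlyStopping(monitor='val_loss', patience=15),
        ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
    ]
    model.compile(optimizer='adam', loss=dice_loss, metrics=[dice_coefficient])
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              batch_size=2, epochs=200, callbacks=callbacks)
```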
Further, the step 5 comprises:
1) the trained model achieves a Dice coefficient of 0.8379 on the validation set;
2) performing a three-dimensional hippocampus segmentation test on the model with the data in the test set, the Dice coefficient obtained on the test set being 0.8269;
3) the segmentation results of the model on the test set are shown in fig. 6.
In summary, the experimental results of the invention show that designing a semantic segmentation network structure around the structural characteristics of the hippocampus improves the network's use of feature information in multiple dimensions, thereby improving its dense pixel prediction capability and its hippocampus segmentation performance.