Disclosure of Invention
The invention aims to provide a three-dimensional semantic segmentation network structure with multi-scale feature extraction and a multi-path attention mechanism. The network extracts features of hippocampal regions of different sizes through multi-scale residual modules, and gathers and weights information across different dimensions through the fusion of multi-path attention modules, so that features relevant to the hippocampus are highlighted and irrelevant features are suppressed; the three-dimensional automatic segmentation method further improves segmentation accuracy by combining the classifier output of an ensemble learning path with the output of the decoder path.
The technical scheme of the invention comprises the following steps:
step 1: acquiring MRI image data from a database and preprocessing the data;
step 2: designing a three-dimensional hippocampus semantic segmentation network structure based on a multi-scale feature multi-path attention fusion mechanism;
step 3: dividing the data into training, validation and test sets;
step 4: training the model and saving the model file with the best performance;
step 5: testing the model and evaluating the segmentation results.
Further, the step 1 comprises:
A. acquiring image data with hippocampus annotations from the ADNI database (https://ida.loni.usc.edu/) and merging the left and right hippocampus labels;
B. cropping the hippocampal region of the images and labels in the data set: slices containing the hippocampus are selected in the transverse, sagittal and coronal directions to crop the image and label data and remove background information from the data;
C. standardizing the cropped image data.
Further, the three-dimensional hippocampus semantic segmentation network based on the fusion of multi-scale features and a multi-path attention mechanism in step 2 is organized in units of the functional modules and layers in the network and comprises, in the order in which the different functions are realized:
A. an input layer comprising a data input layer;
B. the network comprises nine multi-scale residual modules, each comprising three-dimensional convolutional layers with a kernel size of 1, one three-dimensional convolutional layer with a kernel size of 3, one three-dimensional convolutional layer with a kernel size of 5, five batch normalization layers, four ReLU activation layers and one additive fusion layer;
C. the network comprises four pooling layers, each comprising a three-dimensional max pooling layer with a pooling kernel size of 2;
D. the network comprises four dimension adjusting modules, each comprising a three-dimensional convolutional layer with a kernel size of 1, a batch normalization layer and a ReLU activation layer;
E. the network comprises four channel attention modules, each comprising a three-dimensional global average pooling layer, a three-dimensional global max pooling layer, four batch normalization layers, four fully connected layers, two ReLU activation layers, an additive fusion layer, a Sigmoid activation layer, a reshape layer and an element-wise multiplication layer;
F. the network comprises four spatial attention modules, each comprising a three-dimensional convolutional layer with a kernel size of 2, three three-dimensional convolutional layers with a kernel size of 1, a three-dimensional deconvolution layer with a kernel size of 3, an additive fusion layer, a ReLU activation layer, a Sigmoid activation layer, an upsampling layer, an element-wise multiplication layer and a batch normalization layer;
G. the ensemble learning branch structure comprises four three-dimensional convolutional layers with a kernel size of 1, three upsampling layers and three element-wise addition fusion layers;
H. the network comprises four skip connection structures, each comprising an upsampling layer and a channel concatenation fusion layer;
I. the output layer comprises a three-dimensional convolutional layer with a kernel size of 1.
Further, the step 3 comprises:
A. dividing the acquired ADNI data set into three groups according to disease state: Alzheimer's disease, mild cognitive impairment and normal controls;
B. randomly selecting equal amounts of data from the three groups as the training data set; in the data preprocessing of step 1, a random number is set as the parameter for randomly shifting the cropping slices so that, while ensuring that no part of the hippocampus is omitted, the original data can be cropped multiple times, expanding the training data set and acting as data augmentation;
C. selecting equal amounts of data from the remainder of the three groups as the validation set;
D. using the data remaining in the three groups as the test set.
Further, the step 4 comprises:
A. inputting the training set and the validation set into the network for training;
B. a callback function is used to schedule the learning rate, which is decayed during training according to the decrease of the validation-set loss;
C. early stopping is set in the callback function, so that training stops according to the decrease of the validation-set loss and the model with the lowest validation-set loss is saved;
D. a two-class Dice coefficient is used as the evaluation function of the model, defined as Dice = 2|G ∩ P| / (|G| + |P|), where G represents the label pixel values and P represents the predicted pixel values;
E. the trained model achieves a Dice coefficient of 0.8379 on the validation set.
Further, the step 5 comprises:
A. performing a three-dimensional hippocampus segmentation test on the model with the data in the test set; the Dice coefficient on the test set is 0.8269.
The invention has the beneficial effects that:
(1) the multi-scale residual module performs multi-scale feature extraction and fusion on the complexly shaped hippocampal region, improving segmentation precision;
(2) the invention combines multiple attention modules to perform attention-weighted aggregation of multi-dimensional information, screening and weighting the features with higher activation values from the channel dimension to the spatial dimension, which improves the network's ability to identify and segment the hippocampal target;
(3) the invention combines a branch-path classifier with the output of the decoder-path classifier, improving the classification and segmentation capability of the whole network through ensemble learning.
Detailed Description
The invention automatically processes brain magnetic resonance images and realizes three-dimensional automatic segmentation of the hippocampus. To address the variable position and complex shape of the hippocampus, a semantic segmentation network is constructed from more effective image feature processing modules, improving the segmentation accuracy of the trained model and providing more reliable information support for the diagnosis of Alzheimer's disease.
As shown in fig. 1, the three-dimensional hippocampus semantic segmentation method based on a multi-scale feature multi-path attention fusion mechanism includes the following five steps:
1. acquiring hippocampus image and label data from the ADNI database and preprocessing them;
2. designing a three-dimensional hippocampus semantic segmentation network structure based on multi-scale feature extraction and a multi-path attention fusion mechanism;
3. dividing the data into training, validation and test sets;
4. training the model and saving the model file with the best performance;
5. testing the model and evaluating the segmentation results.
Further, the step 1 comprises:
1) acquiring the brain magnetic resonance images and label data from the ADNI data set, and merging the left and right hippocampus labels;
2) cropping the hippocampal region from the original images and labels, selecting slices containing the hippocampus in the transverse, sagittal and coronal directions to crop the image and label data;
3) standardizing the cropped data set.
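The following is a minimal sketch of this preprocessing, assuming the volumes have already been loaded as NumPy arrays (for example with nibabel); the crop size and random shift range are illustrative values, not the patent's settings:

```python
import numpy as np

def merge_left_right(label):
    # Merge the left and right hippocampus label values into one binary mask.
    return (label > 0).astype(np.uint8)

def crop_hippocampus(image, label, size=(64, 64, 64), max_shift=4, rng=None):
    # Crop a box around the hippocampus along the transverse, sagittal and
    # coronal axes, with a random shift of the slice window for augmentation.
    rng = rng if rng is not None else np.random.default_rng()
    coords = np.argwhere(label > 0)                # hippocampal voxel coordinates
    center = (coords.min(axis=0) + coords.max(axis=0)) // 2
    shift = rng.integers(-max_shift, max_shift + 1, size=3)
    start = np.maximum(center + shift - np.array(size) // 2, 0)
    end = np.minimum(start + size, image.shape)
    start = end - size                             # keep the window inside the volume
    sl = tuple(slice(s, e) for s, e in zip(start, end))
    return image[sl], label[sl]

def standardize(image):
    # Z-score standardization of the cropped volume.
    return (image - image.mean()) / (image.std() + 1e-8)
```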
Further, the network structure in step 2 includes, in layer units and in order from input to output, the following structures:
1) constructing an input layer, which comprises an Input layer feeding the data set into the network; the data format is a five-dimensional structure, the dimensions being the voxel block, pixel length, pixel width, pixel height and image channel;
2) constructing an encoder: the output of the input layer is fed into the first layer of the encoder, whose structure consists of four multi-scale residual modules alternating with four pooling layers, connected end to end, followed by a final multi-scale residual module;
3) constructing the multi-scale residual modules, each comprising a conv3_3 layer, a conv5_5 layer, an up layer, a shortcut layer and a res_path layer;
4) the conv3_3 layer is formed by stacking two Conv3D layers, a BatchNormalization layer and an Activation layer, applying three-dimensional convolution with a kernel size of 3, batch normalization and ReLU activation to the input of the multi-scale residual module;
5) the conv5_5 layer is formed by stacking two Conv3D layers, a BatchNormalization layer and an Activation layer, applying three-dimensional convolution with a kernel size of 5, batch normalization and ReLU activation to the input of the multi-scale residual module;
6) the up layer consists of a concatenate layer, which splices and fuses the outputs of the conv3_3 and conv5_5 layers along the channel dimension;
7) the shortcut layer is formed by stacking a Conv3D layer and a BatchNormalization layer, applying three-dimensional convolution with a kernel size of 1 and batch normalization to the input of the multi-scale residual module;
8) the res_path layer consists of an Add layer, which additively fuses the outputs of the shortcut and up layers;
9) building the pooling layers, combining a pool_64 layer, a pool_32 layer, a pool_16 layer and a pool_8 layer; each consists of a MaxPooling3D layer and applies three-dimensional max pooling with a pooling kernel size of 2 to the output of the preceding multi-scale residual module (a code sketch of the module and the encoder follows below);
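A minimal Keras sketch of the multi-scale residual module and the encoder, assuming a TensorFlow/Keras backend; the filter counts and the exact depth of the conv3_3 and conv5_5 paths are one illustrative reading of the description above, not confirmed settings:

```python
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel_size):
    # Conv3D + BatchNormalization + ReLU Activation building block.
    x = layers.Conv3D(filters, kernel_size, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation('relu')(x)

def multiscale_res_block(inputs, filters):
    # conv3_3 / conv5_5: parallel kernel-3 and kernel-5 feature extraction.
    conv3_3 = conv_bn_relu(conv_bn_relu(inputs, filters // 2, 3), filters // 2, 3)
    conv5_5 = conv_bn_relu(conv_bn_relu(inputs, filters // 2, 5), filters // 2, 5)
    # up: channel-dimension concatenation of the two scales.
    up = layers.concatenate([conv3_3, conv5_5])
    # shortcut: kernel-1 convolution and batch normalization of the input.
    shortcut = layers.Conv3D(filters, 1, padding='same')(inputs)
    shortcut = layers.BatchNormalization()(shortcut)
    # res_path: additive fusion of the shortcut and the multi-scale features.
    return layers.add([shortcut, up])

def encoder(inputs, base_filters=16):
    # Four (module, pooling) stages followed by a final bottom module.
    skips, x = [], inputs
    for level in range(4):
        x = multiscale_res_block(x, base_filters * 2 ** level)
        skips.append(x)
        x = layers.MaxPooling3D(pool_size=2)(x)  # pool_64 ... pool_8
    x = multiscale_res_block(x, base_filters * 16)
    return x, skips
```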
10) constructing a decoder: the output of each encoder level is fed into the corresponding decoder level; the decoder structure combines four dimension adjusting modules, four channel attention modules, four spatial attention modules, four skip connection structures and four multi-scale residual modules;
11) constructing the dimension adjusting modules, each comprising an x layer;
12) the x layer is formed by stacking a Conv3D layer, a BatchNormalization layer and an Activation layer, applying three-dimensional convolution with a kernel size of 1, batch normalization and ReLU activation to the input of the dimension adjusting module;
13) constructing the channel attention modules, each comprising an x_s_avg layer, an x_e_avg layer, an x_s_max layer, an x_e_max layer, an x_e layer and a result layer; the structure is shown in FIG. 3, where C, L, W, H and r denote the channels of the feature map, the spatial voxel length, width and height, and the dimension compression ratio;
14) the x_s_avg layer is formed by stacking a GlobalAveragePooling3D layer and a BatchNormalization layer, applying three-dimensional global average pooling and batch normalization to the input of the channel attention module;
15) the x_e_avg layer is formed by stacking a Dense layer, an Activation layer, a BatchNormalization layer and a Dense layer, performing dimensionality scaling on the output of the x_s_avg layer;
16) the x_s_max layer is formed by stacking a GlobalMaxPooling3D layer and a BatchNormalization layer, applying three-dimensional global max pooling and batch normalization to the input of the channel attention module;
17) the x_e_max layer is formed by stacking a Dense layer, an Activation layer, a BatchNormalization layer and a Dense layer, performing dimensionality scaling on the output of the x_s_max layer;
18) the x_e layer is formed by stacking an add layer, an Activation layer and a reshape layer, performing additive fusion, sigmoid activation and reshaping on the outputs of the x_e_avg and x_e_max layers;
19) the result layer consists of a multiply layer, performing element-wise multiplication of the input of the channel attention module and the output of the x_e layer (a code sketch of this module follows below);
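A minimal Keras sketch of the channel attention module, under the same TensorFlow/Keras assumption; the compression ratio r = 8 is an illustrative value:

```python
from tensorflow.keras import layers

def channel_attention(inputs, channels, r=8):
    # x_s_avg / x_s_max: global average and max pooling over the voxel grid,
    # each followed by batch normalization.
    x_s_avg = layers.BatchNormalization()(layers.GlobalAveragePooling3D()(inputs))
    x_s_max = layers.BatchNormalization()(layers.GlobalMaxPooling3D()(inputs))

    def scale_dims(x):
        # x_e_avg / x_e_max: Dense (C -> C/r), ReLU, BatchNormalization,
        # Dense (C/r -> C), i.e. the dimensionality scaling described above.
        x = layers.Dense(channels // r)(x)
        x = layers.Activation('relu')(x)
        x = layers.BatchNormalization()(x)
        return layers.Dense(channels)(x)

    x_e_avg, x_e_max = scale_dims(x_s_avg), scale_dims(x_s_max)
    # x_e: additive fusion, sigmoid activation and reshape for broadcasting.
    x_e = layers.add([x_e_avg, x_e_max])
    x_e = layers.Activation('sigmoid')(x_e)
    x_e = layers.Reshape((1, 1, 1, channels))(x_e)
    # result: per-channel reweighting of the module input.
    return layers.multiply([inputs, x_e])
```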
20) constructing the spatial attention modules, each comprising a theta_x layer, a phi_g layer, an upsample_g layer, a concat_xg layer, an act_xg layer, a psi layer, a sigmoid_xg layer, an upsample_psi layer, a y layer, a result layer and a result_bn layer;
21) the theta_x layer consists of a Conv3D layer, applying convolution with a kernel size of 2 and a stride of 2 to the input of the spatial attention module;
22) the phi_g layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the input of the spatial attention module;
23) the upsample_g layer consists of a Conv3DTranspose layer, applying three-dimensional deconvolution with a kernel size of 3 to the output of the phi_g layer;
24) the concat_xg layer consists of an add layer, additively fusing the outputs of the upsample_g and theta_x layers;
25) the act_xg layer consists of an Activation layer, applying ReLU activation to the output of the concat_xg layer;
26) the psi layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the act_xg layer;
27) the sigmoid_xg layer consists of an Activation layer, applying sigmoid activation to the output of the psi layer;
28) the upsample_psi layer consists of an UpSampling3D layer, upsampling the output of the sigmoid_xg layer;
29) the y layer consists of a multiply layer, performing element-wise multiplication of the input of the spatial attention module and the output of the upsample_psi layer;
30) the result layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the y layer;
31) the result_bn layer consists of a BatchNormalization layer, applying batch normalization to the output of the result layer (a code sketch of this module follows below);
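A minimal Keras sketch of the spatial attention module, read as an attention gate with two inputs: the skip feature map x and a coarser gating signal g. Treating x as having twice the spatial resolution of g is an assumption, as is the intermediate channel count:

```python
from tensorflow.keras import layers

def spatial_attention(x, g, inter_channels):
    # theta_x: kernel-2, stride-2 convolution brings x down to the grid of g.
    theta_x = layers.Conv3D(inter_channels, 2, strides=2, padding='same')(x)
    # phi_g / upsample_g: kernel-1 convolution of g, then kernel-3 deconvolution.
    phi_g = layers.Conv3D(inter_channels, 1, padding='same')(g)
    upsample_g = layers.Conv3DTranspose(inter_channels, 3, padding='same')(phi_g)
    # concat_xg / act_xg: additive fusion followed by ReLU activation.
    act_xg = layers.Activation('relu')(layers.add([upsample_g, theta_x]))
    # psi / sigmoid_xg: collapse to a single attention map in [0, 1].
    sigmoid_xg = layers.Activation('sigmoid')(layers.Conv3D(1, 1, padding='same')(act_xg))
    # upsample_psi / y: resample the map to the size of x and gate x with it.
    upsample_psi = layers.UpSampling3D(size=2)(sigmoid_xg)
    y = layers.multiply([x, upsample_psi])
    # result / result_bn: kernel-1 convolution and batch normalization.
    result = layers.Conv3D(x.shape[-1], 1, padding='same')(y)
    return layers.BatchNormalization()(result)
```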
32) constructing the skip connection structures, combining an up_16 layer, an up_32 layer, an up_64 layer and an up_128 layer, each formed by stacking an UpSampling3D layer and a concatenate layer, performing upsampling, splicing and fusing operations on the feature maps in the decoder;
33) the multi-scale residual modules of the decoder have the same structure as those of the encoder;
34) constructing an ensemble learning branch: the output of each decoder level is fed into the corresponding layer of the ensemble learning branch structure, which combines an up_conv_16_11 layer, an up_conv_32_11 layer, an up_16_11 layer, an add_01 layer, an up_conv_64_11 layer, an up_add_01 layer, an add_02 layer, an up_conv_128_11 layer, an up_add_02 layer and an add_03 layer;
35) the up_conv_16_11 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the corresponding multi-scale residual module in the decoder;
36) the up_conv_32_11 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the corresponding multi-scale residual module in the decoder;
37) the up_16_11 layer consists of an UpSampling3D layer, upsampling the output of the up_conv_16_11 layer;
38) the add_01 layer consists of an add layer, additively fusing the outputs of the up_16_11 and up_conv_32_11 layers;
39) the up_conv_64_11 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the corresponding multi-scale residual module in the decoder;
40) the up_add_01 layer consists of an UpSampling3D layer, upsampling the output of the add_01 layer;
41) the add_02 layer consists of an add layer, additively fusing the outputs of the up_add_01 and up_conv_64_11 layers;
42) the up_conv_128_11 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 to the output of the corresponding multi-scale residual module in the decoder;
43) the up_add_02 layer consists of an UpSampling3D layer, upsampling the output of the add_02 layer;
44) the add_03 layer consists of an add layer, additively fusing the outputs of the up_add_02 and up_conv_128_11 layers (a code sketch of this branch follows below);
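A minimal Keras sketch of the ensemble learning branch; d16 through d128 stand for the decoder outputs from the coarsest to the finest level (hypothetical names), out_channels = 1 assumes binary hippocampus segmentation, and the upsampling factor of 2 is assumed from the pooling sizes:

```python
from tensorflow.keras import layers

def ensemble_branch(d16, d32, d64, d128, out_channels=1):
    # Kernel-1 convolutions compress each decoder level to the same channels.
    up_conv_16_11 = layers.Conv3D(out_channels, 1, padding='same')(d16)
    up_conv_32_11 = layers.Conv3D(out_channels, 1, padding='same')(d32)
    up_16_11 = layers.UpSampling3D(size=2)(up_conv_16_11)
    add_01 = layers.add([up_16_11, up_conv_32_11])

    up_conv_64_11 = layers.Conv3D(out_channels, 1, padding='same')(d64)
    up_add_01 = layers.UpSampling3D(size=2)(add_01)
    add_02 = layers.add([up_add_01, up_conv_64_11])

    up_conv_128_11 = layers.Conv3D(out_channels, 1, padding='same')(d128)
    up_add_02 = layers.UpSampling3D(size=2)(add_02)
    add_03 = layers.add([up_add_02, up_conv_128_11])
    return add_03
```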
45) constructing an output layer: the output of the last layer of the ensemble learning branch structure is fed into the output layer, which comprises a conv10 layer;
46) the conv10 layer consists of a Conv3D layer, applying three-dimensional convolution with a kernel size of 1 and sigmoid activation to the output of the ensemble learning branch structure and outputting the segmentation result;
47) designing a two-class Dice coefficient as the evaluation function of the model, computing the Dice coefficient between the output result and the label;
48) the Dice coefficient is defined as Dice = 2|G ∩ P| / (|G| + |P|), where G represents the label pixel values and P represents the predicted pixel values; its value lies in the closed interval from 0 to 1, where 1 means complete overlap and 0 means no overlap at all;
49) designing a Dice loss function as the loss function of the model;
50) the Dice loss function is defined as Loss = 1 - Dice (a code sketch of the metric and loss follows below).
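A minimal Keras sketch of the metric and loss defined above; the small smoothing term is an added assumption to avoid division by zero on empty masks:

```python
import tensorflow.keras.backend as K

def dice_coefficient(g, p, smooth=1e-6):
    # Dice = 2|G ∩ P| / (|G| + |P|): 1 means complete overlap, 0 none.
    g_f, p_f = K.flatten(g), K.flatten(p)
    intersection = K.sum(g_f * p_f)
    return (2.0 * intersection + smooth) / (K.sum(g_f) + K.sum(p_f) + smooth)

def dice_loss(g, p):
    # Loss = 1 - Dice, as defined above.
    return 1.0 - dice_coefficient(g, p)
```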
Further, the step 3 comprises:
1) dividing the data in the acquired ADNI data set into three groups according to disease state: Alzheimer's disease, mild cognitive impairment and normal controls;
2) randomly selecting equal amounts of data from the three groups as the training data set; a random number is set as the parameter for randomly shifting the cropping slices so that, while ensuring that no part of the hippocampus is missed, the original data can be cropped multiple times to expand the training data set;
3) selecting equal amounts of data from the remainder of the three groups as the validation set;
4) using the data remaining in the three groups as the test set.
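A minimal sketch of the disease-state stratified split; subjects is a hypothetical list of (subject_id, group) pairs and the 70/15/15 proportions are illustrative, since the patent does not state the exact ratios:

```python
import random

def stratified_split(subjects, train=0.7, val=0.15, seed=42):
    rng = random.Random(seed)
    splits = {'train': [], 'val': [], 'test': []}
    for group in ('AD', 'MCI', 'CN'):            # equal handling of the three groups
        members = [s for s, g in subjects if g == group]
        rng.shuffle(members)
        n_tr, n_va = int(len(members) * train), int(len(members) * val)
        splits['train'] += members[:n_tr]
        splits['val'] += members[n_tr:n_tr + n_va]
        splits['test'] += members[n_tr + n_va:]  # remaining data as the test set
    return splits
```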
Further, the step 4 comprises:
1) feeding the images and labels of the training set and the validation set into the network for offline training;
2) callback functions are used for learning rate decay and early stopping, so that the model decays its learning rate and stops training according to the decrease of the validation-set loss, and the model with the lowest validation-set loss is saved.
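A minimal Keras training sketch wiring up the standard callbacks that implement this behavior (ReduceLROnPlateau, EarlyStopping and ModelCheckpoint); dice_loss and dice_coefficient are the functions sketched earlier, and the patience values, batch size and epoch count are illustrative assumptions:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

def train(model, x_train, y_train, x_val, y_val):
    # Learning-rate decay, early stopping and best-model saving, all
    # monitoring the validation-set loss.
    callbacks = [
        ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5),
        EarlyStopping(monitor='val_loss', patience=15),
        ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
    ]
    model.compile(optimizer='adam', loss=dice_loss, metrics=[dice_coefficient])
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              batch_size=2, epochs=200, callbacks=callbacks)
```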
Further, the step 5 comprises:
1) the trained model achieves a Dice coefficient of 0.8379 on the validation set;
2) performing a three-dimensional hippocampus segmentation test on the model with the data in the test set, the Dice coefficient obtained on the test set being 0.8269;
3) the segmentation results of the model on the test set are shown in fig. 6.
In summary, the experimental results of the invention show that designing a semantic segmentation network structure around the structural characteristics of the hippocampus improves the network's use of feature information in multiple dimensions, thereby improving its dense pixel prediction capability and its hippocampus segmentation performance.