CN115147606B - Medical image segmentation method, medical image segmentation device, computer equipment and storage medium - Google Patents

Medical image segmentation method, medical image segmentation device, computer equipment and storage medium

Info

Publication number
CN115147606B
Authority
CN
China
Prior art keywords
feature
module
feature map
aggregation
map
Prior art date
Legal status
Active
Application number
CN202210914575.9A
Other languages
Chinese (zh)
Other versions
CN115147606A (en)
Inventor
梅立锋
吕孟叶
Current Assignee
Shenzhen Technology University
Original Assignee
Shenzhen Technology University
Priority date
Filing date
Publication date
Application filed by Shenzhen Technology University filed Critical Shenzhen Technology University
Priority to CN202210914575.9A
Publication of CN115147606A
Application granted granted Critical
Publication of CN115147606B

Classifications

    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a medical image segmentation method, a medical image segmentation device, computer equipment and a storage medium. The method comprises the following steps: inputting a three-dimensional medical image into a segmentation model, the segmentation model comprising an encoder and a decoder; performing feature extraction on the three-dimensional medical image through each feature extraction module in the encoder to obtain feature maps of different sizes; performing up-sampling processing and aggregation processing on the feature maps of corresponding sizes through the feature fusion modules in the decoder to obtain a first aggregation feature map; classifying each pixel in the first aggregation feature map to obtain a classification result for each pixel; and segmenting the three-dimensional medical image according to the pixels whose classification result is the target class. By adopting the method, the accuracy and efficiency of medical image segmentation can be improved.

Description

Medical image segmentation method, medical image segmentation device, computer equipment and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a method, an apparatus, a computer device, and a storage medium for segmenting medical images.
Background
With the development of medicine, higher requirements are placed on segmenting lesions in magnetic resonance images (Magnetic Resonance Imaging, MRI). In clinical practice, segmentation of medical images is generally performed manually by doctors according to their own experience and requires abundant clinical knowledge, so the following problems inevitably exist: (1) Because of the large amount of medical image data, manual segmentation by doctors is inefficient. (2) The results of manual segmentation are mostly not reproducible; the results of multiple segmentations by the same doctor may differ, and the segmentation may be inaccurate.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a medical image segmentation method, apparatus, computer device, and computer-readable storage medium that can improve the segmentation accuracy and efficiency of medical images.
In a first aspect, the present application provides a method of segmentation of a medical image. The method comprises the following steps:
inputting the three-dimensional medical image into the segmentation model, the segmentation model comprising an encoder and a decoder;
performing feature extraction on the three-dimensional medical image through each feature extraction module in the encoder to obtain feature maps of different sizes;
performing up-sampling processing and aggregation processing on the feature maps of corresponding sizes through a feature fusion module in the decoder to obtain a first aggregation feature map;
classifying each pixel in the first aggregation feature map to obtain a classification result of each pixel;
and segmenting the three-dimensional medical image according to the pixels whose classification result is the target class.
In one embodiment, the feature extraction modules in the encoder include a first feature extraction module, a second feature extraction module, a third feature extraction module, and a fourth feature extraction module; the step of performing feature extraction on the three-dimensional medical image through each feature extraction module in the encoder to obtain feature maps of different sizes comprises the following steps:
extracting a feature map of a first size from the three-dimensional medical image by the first feature extraction module;
after three-dimensional medical image fusion is performed on the feature map of the first size based on the first feature extraction module, extracting a feature map of a second size from the fused feature map through the second feature extraction module;
after three-dimensional medical image fusion is performed on the feature map of the second size based on the second feature extraction module, extracting a feature map of a third size from the fused feature map through the third feature extraction module;
and after three-dimensional medical image fusion is performed on the feature map of the third size based on the third feature extraction module, extracting a feature map of a fourth size from the fused feature map through the fourth feature extraction module.
In one embodiment, before the extracting, by the first feature extracting module, the feature map of the first size from the three-dimensional medical image, the method includes:
performing image embedding processing on the three-dimensional medical image through an image block embedding layer of the encoder to obtain an embedding diagram;
The extracting, by the first feature extraction module, a feature map of a first size from the three-dimensional medical image includes:
Performing dynamic position embedding on the embedded graph through a dynamic position embedding layer in the first feature extraction module to obtain a first dynamic feature graph;
performing local aggregation processing on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a second aggregation feature map;
And inputting the second aggregate feature map into a feedforward network layer in the first feature extraction module to perform image block representation to obtain a feature map of a first size.
In one embodiment, the feature fusion module in the decoder includes a first feature fusion module, a second feature fusion module, a third feature fusion module, and a fourth feature fusion module; the up-sampling processing and the aggregation processing of the feature graphs with corresponding sizes by the feature fusion module in the decoder comprise the following steps:
after global aggregation processing is carried out on the feature images with the fourth size through the first feature fusion module, up-sampling processing is carried out on the feature images after global aggregation through the second feature fusion module, and a first unfolding feature image is obtained;
After global aggregation processing is carried out on the first unfolding feature map based on the second feature fusion module, up-sampling processing is carried out on the feature map after global aggregation through the third feature fusion module, and a second unfolding feature map is obtained;
after the second unfolding feature map is subjected to local aggregation based on the third feature fusion module, the feature map subjected to local aggregation is subjected to upsampling through the fourth feature fusion module, so that a third unfolding feature map is obtained;
and carrying out local aggregation processing on the third unfolding feature map based on the fourth feature fusion module.
In one embodiment, the second feature fusion module includes a second aggregation module; the global aggregation processing of the first unfolding feature map based on the second feature fusion module comprises the following steps:
Performing position coding and normalization processing on the feature map of the third size through the second aggregation module, and performing position coding and normalization processing on the first unfolding feature map to obtain a preprocessing feature map of the third size and a preprocessed first unfolding feature map;
Inputting the preprocessing feature map with the third size into a first full-connection layer in the second aggregation module to obtain a query feature map;
inputting the preprocessed first unfolding feature map into a second full-connection layer in the second aggregation module to obtain a key feature map and a value feature map;
and carrying out global aggregation processing on the query feature map, the key feature map and the value feature map.
In one embodiment, the third feature fusion module includes a third aggregation module; the fourth feature fusion module comprises a fourth aggregation module; the local aggregation processing of the second unfolding feature map based on the third feature fusion module comprises the following steps:
Carrying out local aggregation processing on the feature map of the second size and the second unfolding feature map through the third aggregation module;
The local aggregation processing of the third unfolding feature map based on the fourth feature fusion module comprises the following steps:
And carrying out local aggregation processing on the feature map of the first size and the third unfolding feature map through the fourth aggregation module.
In a second aspect, the application further provides a medical image segmentation device. The device comprises:
The input module is used for inputting the three-dimensional medical image into the segmentation model; the segmentation model comprises an encoder and a decoder;
the feature extraction module is used for extracting features of the three-dimensional medical image through each feature extraction module in the encoder to obtain feature images with different sizes;
The up-sampling and aggregation module is used for carrying out up-sampling processing and aggregation processing on the feature images with corresponding sizes through the feature fusion module in the decoder to obtain a first aggregation feature image;
the classification module is used for classifying each pixel in the first aggregate feature map to obtain a classification result of each pixel;
And the segmentation module is used for segmenting the three-dimensional medical image according to the pixels of which the classification result is the target class.
In one embodiment, the feature extraction module in the encoder includes a first feature extraction module, a second feature extraction module, a third feature extraction module, and a fourth feature extraction module; the feature extraction module is further used for extracting a feature map of a first size from the three-dimensional medical image through the first feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the first size based on the first feature extraction module, the feature images with the second size are extracted from the fused feature images through the second feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the second size based on the second feature extraction module, the feature images with the third size are extracted from the fused feature images through the third feature extraction module; and after the three-dimensional medical image fusion is carried out on the feature map with the third size based on the third feature extraction module, extracting a feature map with a fourth size from the fused feature map through the fourth feature extraction module.
In one embodiment, before the feature extraction module extracts the feature map of the first size from the three-dimensional medical image, the feature extraction module is further configured to perform image embedding processing on the three-dimensional medical image through an image block embedding layer of the encoder to obtain an embedding map; the feature extraction module is further used for carrying out dynamic position embedding on the embedded graph through a dynamic position embedding layer in the first feature extraction module to obtain a first dynamic feature graph; carrying out local aggregation treatment on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a second aggregation feature map; and inputting the second aggregate feature map into a feedforward network layer in the first feature extraction module to perform image block representation to obtain a feature map of a first size.
In one embodiment, the feature fusion module in the decoder includes a first feature fusion module, a second feature fusion module, a third feature fusion module, and a fourth feature fusion module; the up-sampling and aggregation module is further configured to perform global aggregation processing on the feature map of the fourth size through the first feature fusion module, and perform up-sampling processing on the feature map after global aggregation through the second feature fusion module, so as to obtain a first unfolding feature map; after global aggregation processing is carried out on the first unfolding feature map based on the second feature fusion module, up-sampling processing is carried out on the feature map after global aggregation through the third feature fusion module, and a second unfolding feature map is obtained; after the second unfolding feature map is subjected to local aggregation based on the third feature fusion module, the feature map subjected to local aggregation is subjected to upsampling through the fourth feature fusion module, so that a third unfolding feature map is obtained; and carrying out local aggregation processing on the third unfolding feature map based on the fourth feature fusion module.
In one embodiment, the second feature fusion module includes a second aggregation module; the up-sampling and aggregation module is further configured to perform a position coding process and a normalization process on the feature map of the third size through the second aggregation module, and perform a position coding process and a normalization process on the first expansion feature map, so as to obtain a preprocessed feature map of the third size and a preprocessed first expansion feature map; inputting the preprocessing feature map with the third size into a first full-connection layer in the second aggregation module to obtain a query feature map; inputting the preprocessed first unfolding feature map into a second full-connection layer in the second aggregation module to obtain a key feature map and a value feature map; and carrying out global aggregation processing on the query feature map, the key feature map and the value feature map.
In one embodiment, the third feature fusion module includes a third aggregation module; the fourth feature fusion module comprises a fourth aggregation module; the up-sampling and aggregation module is further configured to locally aggregate, by using the third aggregation module, the feature map of the second size and the second expansion feature map; the up-sampling and aggregation module is further configured to locally aggregate the feature map of the first size and the third expansion feature map through the fourth aggregation module.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the above method when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the above method.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the above method.
According to the above medical image segmentation method, apparatus, computer device and storage medium, a three-dimensional medical image is input into a segmentation model; the segmentation model includes an encoder and a decoder; feature extraction is performed on the three-dimensional medical image through each feature extraction module in the encoder to obtain feature maps of different sizes; up-sampling processing and aggregation processing are performed on the feature maps of corresponding sizes through the feature fusion modules in the decoder to obtain a first aggregation feature map; each pixel in the first aggregation feature map is classified to obtain a classification result for each pixel; and the three-dimensional medical image is segmented according to the pixels whose classification result is the target class. The medical image is automatically segmented by the designed segmentation model, improving the accuracy and efficiency of image segmentation.
Drawings
FIG. 1 is an application environment diagram of a segmentation method of a medical image according to one embodiment;
FIG. 2 is a flow chart of a method of segmentation of a medical image according to one embodiment;
FIG. 3 is a schematic diagram of a segmentation model in one embodiment;
FIG. 4 is a visual result of a predicted tumor segmentation in one embodiment;
FIG. 5 is a flow chart of a feature extraction step in one embodiment;
FIG. 6 is a block diagram of a device for segmenting a medical image according to one embodiment;
Fig. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The medical image segmentation method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The present application may be executed by the terminal 102 or the server 104, and this embodiment is described by taking the execution of the terminal 102 as an example.
The terminal 102 inputs the three-dimensional medical image to the segmentation model; the segmentation model includes an encoder and a decoder; the terminal 102 performs feature extraction on the three-dimensional medical image through each feature extraction module in the encoder to obtain feature images with different sizes; the terminal 102 performs up-sampling processing and aggregation processing on the feature graphs with corresponding sizes through a feature fusion module in the decoder to obtain a first aggregation feature graph; the terminal 102 classifies each pixel in the first aggregate feature map to obtain a classification result of each pixel; the terminal 102 segments the three-dimensional medical image according to the pixels of the classification result as the target class.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, a method for segmenting a medical image is provided, and the method is applied to the terminal 102 in fig. 1 for illustration, and includes the following steps:
s202, inputting a three-dimensional medical image into a segmentation model; the segmentation model includes an encoder and a decoder.
Wherein the three-dimensional medical image may refer to a multi-modal magnetic resonance image. The segmentation model may refer to a model used for segmenting the medical image. FIG. 3 is a schematic diagram of a segmentation model in one embodiment; as shown, the encoder may be the left half of the segmentation model and the decoder may be the right half of the segmentation model. The encoder may include an image block embedding layer in the segmentation model and the feature extraction modules. The feature extraction modules comprise a first feature extraction module, a second feature extraction module, a third feature extraction module and a fourth feature extraction module. The first feature extraction module comprises two local correlation aggregation modules and an image block fusion module; the second feature extraction module comprises two local correlation aggregation modules and an image block fusion module; the third feature extraction module comprises two global correlation aggregation modules and an image block fusion module; the fourth feature extraction module comprises a global correlation aggregation module. The decoder may include the feature fusion modules, image block unfolding modules, and a classification layer in the segmentation model. The feature fusion modules comprise a first feature fusion module, a second feature fusion module, a third feature fusion module and a fourth feature fusion module. The first feature fusion module comprises a global correlation aggregation module; the second feature fusion module comprises two global correlation aggregation modules and an image block unfolding module; the third feature fusion module comprises two local correlation aggregation modules and an image block unfolding module; the fourth feature fusion module comprises two local correlation aggregation modules and an image block unfolding module.
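For orientation only, the following is a minimal PyTorch sketch of the module composition described above. The class names, channel widths and kernel sizes are assumptions taken from the example embodiment later in the description, and the correlation aggregation blocks are replaced by identity stand-ins; it is a sketch of the overall structure, not the patented implementation itself.

```python
import torch
import torch.nn as nn

class ConcatFuse(nn.Module):
    """Placeholder skip fusion: channel-wise concatenation followed by a 1x1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv3d(channels * 2, channels, kernel_size=1)
    def forward(self, dec, skip):
        return self.proj(torch.cat([dec, skip], dim=1))

class SegmentationModelSkeleton(nn.Module):
    """Module composition only; the correlation aggregation blocks are stand-ins (nn.Identity)."""
    def __init__(self, in_channels=4, num_classes=3, dims=(64, 128, 320, 512)):
        super().__init__()
        # Encoder: image block embedding + four feature extraction modules
        self.patch_embed = nn.Conv3d(in_channels, dims[0], kernel_size=4, stride=4)
        self.enc1, self.enc2 = nn.Identity(), nn.Identity()   # stand-ins for 2x local correlation aggregation
        self.enc3, self.enc4 = nn.Identity(), nn.Identity()   # stand-ins for global correlation aggregation
        self.merge1 = nn.Conv3d(dims[0], dims[1], kernel_size=2, stride=2)  # image block fusion
        self.merge2 = nn.Conv3d(dims[1], dims[2], kernel_size=2, stride=2)
        self.merge3 = nn.Conv3d(dims[2], dims[3], kernel_size=2, stride=2)
        # Decoder: feature fusion modules with image block unfolding (upsampling)
        self.expand1 = nn.ConvTranspose3d(dims[3], dims[2], kernel_size=2, stride=2)
        self.fuse2 = ConcatFuse(dims[2])   # stand-in for the cross-attention aggregation with F3
        self.expand2 = nn.ConvTranspose3d(dims[2], dims[1], kernel_size=2, stride=2)
        self.fuse3 = ConcatFuse(dims[1])   # stand-in for local aggregation with F2
        self.expand3 = nn.ConvTranspose3d(dims[1], dims[0], kernel_size=2, stride=2)
        self.fuse4 = ConcatFuse(dims[0])   # stand-in for local aggregation with F1
        self.expand4 = nn.ConvTranspose3d(dims[0], dims[0], kernel_size=4, stride=4)
        self.head = nn.Conv3d(dims[0], num_classes, kernel_size=1)  # per-pixel classification layer

    def forward(self, x):                                   # x: [B, 4, 128, 128, 128]
        f1 = self.enc1(self.patch_embed(x))                 # [B,  64, 32, 32, 32]
        f2 = self.enc2(self.merge1(f1))                     # [B, 128, 16, 16, 16]
        f3 = self.enc3(self.merge2(f2))                     # [B, 320,  8,  8,  8]
        f4 = self.enc4(self.merge3(f3))                     # [B, 512,  4,  4,  4]
        d3 = self.fuse2(self.expand1(f4), f3)               # [B, 320,  8,  8,  8]
        d2 = self.fuse3(self.expand2(d3), f2)               # [B, 128, 16, 16, 16]
        d1 = self.fuse4(self.expand3(d2), f1)               # [B,  64, 32, 32, 32]
        return self.head(self.expand4(d1))                  # [B,   3, 128, 128, 128]
```

For example, `SegmentationModelSkeleton()(torch.randn(1, 4, 128, 128, 128))` returns a tensor of shape [1, 3, 128, 128, 128], matching the mask size described in the detailed embodiment.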
Specifically, the terminal may input the three-dimensional medical image uploaded by the user to the segmentation model in response to a segmentation instruction of the medical image; the terminal may also input a three-dimensional medical image stored by itself to the segmentation model.
S204, extracting the characteristics of the three-dimensional medical image through each characteristic extraction module in the encoder to obtain characteristic diagrams with different sizes.
The feature maps of different sizes may include a feature map of a first size, a feature map of a second size, a feature map of a third size, and a feature map of a fourth size.
Specifically, the terminal performs feature extraction on the three-dimensional medical image through the first feature extraction module in the encoder to obtain a feature map of a first size, performs feature extraction on the feature map of the first size through the second feature extraction module in the encoder to obtain a feature map of a second size, performs feature extraction on the feature map of the second size through the third feature extraction module in the encoder to obtain a feature map of a third size, and performs feature extraction on the feature map of the third size through the fourth feature extraction module in the encoder to obtain a feature map of a fourth size.
S206, carrying out up-sampling processing and aggregation processing on the feature graphs with corresponding sizes through a feature fusion module in the decoder to obtain a first aggregation feature graph.
The first aggregate feature map may refer to a feature map obtained by performing upsampling and aggregation processing on a feature map with a corresponding size by using a feature fusion module.
Specifically, after global aggregation processing is performed on the feature images of the fourth size by the terminal through a global correlation aggregation module of the first feature fusion module, up-sampling processing is performed on the feature images after global aggregation through an image block unfolding module in the second feature fusion module, so as to obtain a first unfolding feature image; after global aggregation processing is carried out on the first unfolding feature map based on the global correlation aggregation module in the second feature fusion module, the image block unfolding module in the third feature fusion module carries out up-sampling processing on the feature map after global aggregation, and a second unfolding feature map is obtained; after the second unfolding feature map is subjected to local aggregation processing based on the local correlation aggregation module in the third feature fusion module, the image block unfolding module in the fourth feature fusion module is used for carrying out upsampling processing on the feature map subjected to local aggregation to obtain a third unfolding feature map; and carrying out local aggregation processing on the third unfolding feature map based on a local correlation aggregation module in the fourth feature fusion module.
The first expansion feature map may refer to a feature map obtained by performing up-sampling processing on a corresponding feature map by an image block expansion module in the second feature fusion module. The second unfolding feature map may refer to a feature map obtained by performing upsampling processing on the corresponding feature map by using an image block unfolding module in the third feature fusion module. The third expansion feature map may refer to a feature map obtained by up-sampling the corresponding feature map by an image block expansion module in the fourth feature fusion module.
In one embodiment, performing upsampling processing on the globally aggregated feature map by using the image block expansion module in the second feature fusion module to obtain the first expanded feature map includes performing upsampling processing on the globally aggregated feature map by using the image block expansion module in the second feature fusion module according to the first transposed convolution to obtain the first expanded feature map.
Wherein the first transposed convolution, the second transposed convolution, and the third transposed convolution are different transposed convolutions.
In one embodiment, the second feature fusion module comprises a second aggregation module; the second aggregation module may be a global correlation aggregation module, and performing global aggregation processing on the first expansion feature map based on the second feature fusion module includes: the terminal performs position coding and normalization processing on the feature map of the third size through the second aggregation module, and performs position coding and normalization processing on the first unfolding feature map to obtain a preprocessing feature map of the third size and a preprocessed first unfolding feature map; inputting the preprocessing feature map with the third size into a first full-connection layer in a second aggregation module to obtain a query feature map; inputting the preprocessed first unfolding feature map into a second full-connection layer in a second aggregation module to obtain a key feature map and a value feature map; and carrying out global aggregation processing on the query feature map, the key feature map and the value feature map.
The preprocessing feature map of the third size may refer to a feature map of the third size after performing position encoding and normalization processing. The preprocessed first expansion feature map may refer to a feature map after the first expansion feature map is subjected to the position coding and normalization processing. The first full-link layer may refer to a full-link layer for processing the pre-processed feature map of the third size. The query feature map may refer to a feature map output after the preprocessing feature map of the third size is processed by the first full-connection layer. The second full connection layer may refer to a full connection layer for processing the preprocessed first unfolded feature map. The key feature map and the value feature map may refer to feature maps output after the preprocessed first expansion feature map is subjected to the second full-connection processing.
For example, as shown in fig. 3, the feature map of the third size (F3) has a size of [320,8,8,8], and the first expansion feature map has a size of [320,8,8,8]. Both are input into a special global correlation aggregation module (the second aggregation module). In this module, F3 and the first expansion feature map are first each subjected to dynamic position coding X = DPE(X_in) + X_in, then each flattened into an image block sequence and transformed into a [512,320] array, and each regularized by layer normalization. They are then input into a cross self-attention mechanism (Cross-Attention): the [512,320] array corresponding to F3 passes through a fully connected layer of width 320 to obtain a query feature map of size [512,320]; the [512,320] array corresponding to the first expansion feature map passes through a fully connected layer of width 640 to obtain a [512,640] array, which is split into two [512,320] arrays serving as the key feature map (key, K) and the value feature map (value, V). Global aggregation processing is then performed on the query feature map, the key feature map and the value feature map, so that global correlation aggregation of F3 and the first expansion feature map is achieved through the cross-attention scheme.
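As an illustration of the cross-attention aggregation described in this example, the following sketch shows how a query could be produced from F3 and a key/value pair from the first expansion feature map. It is a single-head simplification, the dynamic position coding step is omitted, and all module and parameter names are assumptions rather than the patented implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionAggregation(nn.Module):
    """Hypothetical single-head sketch: query from the skip feature (F3),
    key/value from the upsampled decoder feature (first expansion feature map)."""
    def __init__(self, dim=320):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.to_q = nn.Linear(dim, dim)        # first fully connected layer  -> query
        self.to_kv = nn.Linear(dim, dim * 2)   # second fully connected layer -> key and value
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, f3, expanded):
        # f3, expanded: [B, C, D, H, W], e.g. [B, 320, 8, 8, 8]
        b, c, d, h, w = f3.shape
        q_in = f3.flatten(2).transpose(1, 2)         # [B, 512, 320] when D = H = W = 8
        kv_in = expanded.flatten(2).transpose(1, 2)  # [B, 512, 320]
        q = self.to_q(self.norm_q(q_in))
        k, v = self.to_kv(self.norm_kv(kv_in)).chunk(2, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        out = self.proj(attn @ v)                    # globally aggregated features
        return out.transpose(1, 2).reshape(b, c, d, h, w)
```

In this sketch the second fully connected layer has an output width of 2 x 320 = 640 and is split into the key and value maps, mirroring the [512,640] array described above.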
In one embodiment, performing upsampling processing on the globally aggregated feature map by using the image block expansion module in the third feature fusion module to obtain the second expanded feature map includes performing upsampling processing on the globally aggregated feature map by using the image block expansion module in the third feature fusion module according to the second transpose convolution to obtain the second expanded feature map.
In one embodiment, performing upsampling processing on the partially aggregated feature map by using the image block expansion module in the fourth feature fusion module to obtain a third expanded feature map includes performing upsampling processing on the partially aggregated feature map by using the image block expansion module in the fourth feature fusion module according to third transpose convolution to obtain the third expanded feature map.
In one embodiment, the third feature fusion module comprises a third aggregation module; the third aggregation module may be a local correlation aggregation module; the fourth feature fusion module comprises a fourth aggregation module; the fourth aggregation module may be a local correlation aggregation module; the local aggregation processing of the second unfolding feature map based on the third feature fusion module comprises the following steps: the terminal carries out local aggregation processing on the feature map of the second size and the second unfolding feature map through a third aggregation module; the local aggregation processing of the third unfolding feature map based on the fourth feature fusion module comprises the following steps: and the terminal carries out local aggregation processing on the feature map of the first size and the third unfolding feature map through a fourth aggregation module.
In one embodiment, the local aggregation processing of the second-size feature map and the second expansion feature map by the third aggregation module includes normalizing the second-size feature map by the third aggregation module to obtain a normalized second-size feature map, connecting the normalized second-size feature map and the second expansion feature map according to a first preset channel to obtain a first connection feature map, and performing local aggregation processing on the first connection feature map.
The preset channels are predetermined channel dimensions along which feature maps are concatenated; the first preset channel and the second preset channel are different preset channels. The first connection feature map may refer to the feature map obtained by connecting the normalized feature map of the second size with the second expansion feature map.
In one embodiment, the locally aggregating the feature map of the first size and the third unfolded feature map by the fourth aggregation module includes normalizing the feature map of the first size by the fourth aggregation module to obtain a normalized feature map of the first size, connecting the normalized feature map of the first size and the third unfolded feature map according to a second preset channel to obtain a second connection feature map, and locally aggregating the second connection feature map.
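A minimal sketch of the skip-connection fusion used in these two embodiments (layer-normalize the encoder feature, concatenate along the channel dimension, then fuse): the 3x3x3 convolution, normalization choice and GeLU activation follow the example given later in the description, but the concrete layers are assumptions, not the patented module.

```python
import torch
import torch.nn as nn

class LocalFusion(nn.Module):
    """Hypothetical fusion of an encoder skip feature with an upsampled decoder feature:
    normalize the skip feature, concatenate along channels, then conv + norm + GeLU.
    channels=128 corresponds to the F2 stage; the F1 stage would use channels=64."""
    def __init__(self, channels=128):
        super().__init__()
        self.norm_skip = nn.GroupNorm(1, channels)   # LayerNorm-like channel normalization (assumed stand-in)
        self.fuse = nn.Conv3d(channels * 2, channels, kernel_size=3, padding=1)
        self.norm_out = nn.GroupNorm(1, channels)
        self.act = nn.GELU()

    def forward(self, skip, upsampled):
        # skip: normalized feature map of the corresponding size; upsampled: expansion feature map
        x = torch.cat([self.norm_skip(skip), upsampled], dim=1)  # connection feature map
        return self.act(self.norm_out(self.fuse(x)))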
In one embodiment, after S206, the terminal performs an upsampling process on the first aggregate feature map by using the image block unfolding module to obtain a fourth unfolding feature map for classifying pixels.
The fourth expansion feature map may refer to a feature map output after the last image block expansion module in the decoder of the segmentation model performs processing.
S208, classifying each pixel in the first aggregation feature map to obtain a classification result of each pixel.
The classification result may refer to the result indicating whether a pixel in the three-dimensional medical image belongs to the target class.
Specifically, the terminal may determine a classification category first, and classify each pixel in the first aggregate feature map according to the classification category, so as to obtain a classification result of each pixel.
The classification categories include non-target categories and target categories, and a classification category may refer to a preset pixel category. For example, when the non-target class is non-enhancing tumor, the target class is enhancing tumor; when the non-target class is non-tumor core, the target class is tumor core; when the non-target class is non-intact tumor, the target class is intact tumor.
In one embodiment, the terminal classifies each pixel in the fourth expansion feature map to obtain a classification result of each pixel.
And S210, dividing the three-dimensional medical image according to the pixels of which the classification result is the target class.
Specifically, the terminal judges whether the classification result of the pixels corresponding to the three-dimensional medical image is a target class or not in sequence, and after all the pixels are judged, the three-dimensional medical image is segmented according to the range formed by the pixels with the classification result being the target class, so that the segmented three-dimensional medical image is obtained.
In the above medical image segmentation method, a three-dimensional medical image is input into a segmentation model; the segmentation model includes an encoder and a decoder; feature extraction is performed on the three-dimensional medical image through each feature extraction module in the encoder to obtain feature maps of different sizes; up-sampling processing and aggregation processing are performed on the feature maps of corresponding sizes through the feature fusion modules in the decoder to obtain a first aggregation feature map; each pixel in the first aggregation feature map is classified to obtain a classification result for each pixel; and the three-dimensional medical image is segmented according to the pixels whose classification result is the target class. The medical image is automatically segmented by the designed segmentation model, improving the accuracy and efficiency of image segmentation.
In one embodiment, as shown in fig. 5, the feature extraction step includes:
s502, extracting a feature map of a first size from the three-dimensional medical image through a first feature extraction module.
In one embodiment, before S502, the terminal may perform image embedding processing on the three-dimensional medical image through an image block embedding layer of the encoder to obtain an embedding map.
Wherein, the image block embedding layer can be used for carrying out image embedding processing on the medical image. The embedding map may refer to an image after the image embedding process is performed.
Specifically, the terminal performs dynamic position embedding on the embedded map through a dynamic position embedding layer in the first feature extraction module to obtain a first dynamic feature map; performs local aggregation processing on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a second aggregation feature map; and inputs the second aggregation feature map into a feedforward network layer in the first feature extraction module for image block characterization to obtain a feature map of a first size.
The dynamic position embedding layer can be used for performing dynamic position embedding on the image; dynamic position embedding can be realized through zero-padded depthwise convolution position coding and is applicable to any resolution. The first dynamic feature map may refer to the feature map output after the dynamic position is embedded. The first dynamic feature map, the second dynamic feature map, and the third dynamic feature map are different dynamic feature maps. The multi-head correlation aggregation layer can realize context coding and correlation learning of the image; multi-head refers to dividing the image blocks into a plurality of groups, with each head separately processing the information of one group of channels. The second aggregation feature map may refer to the feature map output after the first dynamic feature map is subjected to local aggregation processing. The feedforward network layer may be used to further characterize the image.
S504, after the three-dimensional medical image fusion is carried out on the feature images with the first size based on the first feature extraction module, the feature images with the second size are extracted from the fused feature images through the second feature extraction module.
In one embodiment, performing three-dimensional medical image fusion on the feature map of the first size based on the first feature extraction module includes performing downsampling on the feature map of the first size based on an image block fusion module in the first feature extraction module to obtain a first fused feature map.
The first fused feature map may refer to a feature map after the feature map of the first size is subjected to downsampling.
In one embodiment, extracting, by the second feature extraction module, the feature map of the second size from the fused feature map includes the terminal performing dynamic position embedding on the embedded map through a dynamic position embedding layer in the second feature extraction module to obtain a second dynamic feature map; performing local aggregation processing on the second dynamic feature map through a multi-head correlation aggregation layer in the second feature extraction module to obtain a third aggregation feature map; and inputting the third aggregation feature map into a feedforward network layer in the second feature extraction module for image block characterization to obtain a feature map of a second size.
The third aggregation feature map may refer to the feature map output after the second dynamic feature map is subjected to local aggregation processing.
S506, after the feature images of the second size are fused based on the second feature extraction module, the feature images of the third size are extracted from the fused feature images through the third feature extraction module.
In one embodiment, performing three-dimensional medical image fusion on the feature map of the second size based on the second feature extraction module includes performing downsampling on the feature map of the second size based on the image block fusion module in the second feature extraction module to obtain a second fused feature map.
The second fused feature map may refer to the feature map obtained after the feature map of the second size is subjected to the downsampling process.
In one embodiment, extracting, by the third feature extraction module, the feature map of the third size from the fused feature map includes the terminal performing dynamic position embedding on the embedded map by a dynamic position embedding layer in the third feature extraction module to obtain a third dynamic feature map; global aggregation processing is carried out on the third dynamic feature map through a multi-head correlation aggregation layer in the third feature extraction module, and a fourth aggregation feature map is obtained; and inputting the fourth aggregate feature map into a feedforward network layer in a third feature extraction module to perform image block characterization to obtain a feature map of a third size.
The fourth aggregation feature map may refer to the feature map output after the third dynamic feature map is subjected to global aggregation processing.
S508, after the three-dimensional medical image fusion is carried out on the feature images with the third size based on the third feature extraction module, the feature images with the fourth size are extracted from the fused feature images through the fourth feature extraction module.
In one embodiment, performing three-dimensional medical image fusion on the feature map of the third size based on the third feature extraction module includes performing downsampling on the feature map of the third size based on the image block fusion module in the third feature extraction module to obtain a third fused feature map.
The third fused feature map may refer to the feature map obtained after the feature map of the third size is subjected to the downsampling process.
In one embodiment, extracting, by the fourth feature extraction module, the feature map of the fourth size from the fused feature map includes the terminal performing dynamic position embedding on the embedded map through a dynamic position embedding layer in the fourth feature extraction module to obtain a fourth dynamic feature map; performing global aggregation processing on the fourth dynamic feature map through a multi-head correlation aggregation layer in the fourth feature extraction module to obtain a fifth aggregation feature map; and inputting the fifth aggregation feature map into a feedforward network layer in the fourth feature extraction module for image block characterization to obtain a feature map of a fourth size.
The fifth aggregation feature map may refer to the feature map output after the fourth dynamic feature map is subjected to global aggregation processing.
In this embodiment, the first feature extraction module extracts the feature map of the first size from the three-dimensional medical image; after three-dimensional medical image fusion is performed on the feature map of the first size based on the first feature extraction module, the second feature extraction module extracts the feature map of the second size from the fused feature map; after three-dimensional medical image fusion is performed on the feature map of the second size based on the second feature extraction module, the third feature extraction module extracts the feature map of the third size from the fused feature map; and after three-dimensional medical image fusion is performed on the feature map of the third size based on the third feature extraction module, the fourth feature extraction module extracts the feature map of the fourth size from the fused feature map. In this way, feature maps of different sizes can be accurately extracted, laying a good foundation for the subsequent decoding steps.
As an example, the present embodiment is described as follows.
In some embodiments, the implementation of the 3D image block embedding layer is as follows: a three-dimensional image x is taken as input, and a learnable image block embedding function f(·) is designed for the model, so that taking x as input yields the image feature f(x); f(·) is a three-dimensional convolution operation with kernel size k×k×k, stride s and padding p. The number of channels of the image feature f(x) is the embedding dimension Dim, and its depth, height and width are denoted d, h and w respectively. f(x) is then flattened into a (d·h·w)×Dim image block sequence and normalized by layer normalization (Layer Normalization, LN) before being input into the subsequent module.
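A minimal sketch of such a 3D image block embedding layer, assuming a Conv3d for f(·) followed by flattening and LN; the parameter defaults follow the example embodiment below and are assumptions.

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Hypothetical 3D image block embedding: a k x k x k convolution f(.) with stride s and
    padding p, followed by flattening into a (d*h*w) x Dim sequence and layer normalization."""
    def __init__(self, in_channels=4, dim=64, k=4, s=4, p=0):
        super().__init__()
        self.proj = nn.Conv3d(in_channels, dim, kernel_size=k, stride=s, padding=p)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: [B, C, D, H, W]
        fx = self.proj(x)                       # [B, Dim, d, h, w]
        seq = fx.flatten(2).transpose(1, 2)     # [B, d*h*w, Dim] image block sequence
        return self.norm(seq), fx.shape[2:]     # normalized sequence plus its spatial size
```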
The correlation aggregation module consists of three key components: dynamic position embedding (Dynamic Position Embedding, DPE), multi-head correlation aggregation (MHRA), and a feed-forward network (Feed-Forward Network, FFN). For an input X_in ∈ R^(C×D×H×W), dynamic position embedding is first introduced to dynamically integrate position information into the image block sequence obtained above; dynamic position embedding generally uses zero-padded depthwise convolution position coding, which is applicable to arbitrary input resolutions and can make full use of the feature sequence for better visual recognition. Its expression is: X = DPE(X_in) + X_in. Context coding and correlation learning of the image blocks are then achieved using multi-head correlation aggregation, where multi-head refers to dividing the image blocks into groups, with each head processing the information of one group of channels separately; local correlation aggregation is usually achieved using a large convolution kernel, and global correlation aggregation is usually achieved using a self-attention mechanism. Its expression is: Y = MHRA(Norm(X)) + X. Finally, similar to a Transformer structure, a feed-forward network is added to further characterize the image blocks, expressed as: Z = FFN(Norm(Y)) + Y.
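The following sketch shows one possible realization of the X = DPE(X_in) + X_in, Y = MHRA(Norm(X)) + X, Z = FFN(Norm(Y)) + Y structure. The depthwise-convolution DPE, the grouped-convolution local MHRA, the self-attention global MHRA and the GroupNorm stand-in for LN are all assumptions chosen to match the description, not the patented code.

```python
import torch
import torch.nn as nn

class CorrelationAggregationBlock(nn.Module):
    """Hypothetical correlation aggregation block: DPE + MHRA + FFN with residual connections."""
    def __init__(self, dim, heads=8, local=True):
        super().__init__()
        self.dpe = nn.Conv3d(dim, dim, kernel_size=3, padding=1, groups=dim)  # zero-padded depthwise conv
        self.norm1 = nn.GroupNorm(1, dim)    # LayerNorm-like normalization (assumed stand-in)
        if local:
            # local correlation aggregation via a large grouped convolution ("multi-head" = groups)
            self.mhra = nn.Conv3d(dim, dim, kernel_size=5, padding=2, groups=heads)
        else:
            self.mhra = None
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)   # global aggregation
        self.norm2 = nn.GroupNorm(1, dim)
        self.ffn = nn.Sequential(nn.Conv3d(dim, dim * 4, 1), nn.GELU(), nn.Conv3d(dim * 4, dim, 1))

    def forward(self, x):                       # x: [B, C, D, H, W]
        x = self.dpe(x) + x                     # X = DPE(X_in) + X_in
        y = self.norm1(x)
        if self.mhra is not None:
            y = self.mhra(y)                    # local correlation aggregation
        else:
            b, c, d, h, w = y.shape
            seq = y.flatten(2).transpose(1, 2)
            seq, _ = self.attn(seq, seq, seq)   # global correlation aggregation (self-attention)
            y = seq.transpose(1, 2).reshape(b, c, d, h, w)
        x = y + x                               # Y = MHRA(Norm(X)) + X
        return self.ffn(self.norm2(x)) + x      # Z = FFN(Norm(Y)) + Y
```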
The image block fusion and expansion module can be realized by using a convolution layer or a full connection layer, and the function of the image block fusion and expansion module is dimension increase or dimension reduction of image characteristics.
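Under the stated assumption that fusion and expansion can be realized with convolution layers, a minimal sketch might look as follows; the kernel and stride defaults are taken from the example embodiment below.

```python
import torch.nn as nn

# Hypothetical realizations of the image block fusion (downsampling) and
# expansion (upsampling) modules using 3D (transposed) convolutions plus normalization.
def patch_fusion(in_dim, out_dim, k=2, s=2):
    return nn.Sequential(nn.Conv3d(in_dim, out_dim, kernel_size=k, stride=s),
                         nn.GroupNorm(1, out_dim))

def patch_expansion(in_dim, out_dim, k=2, s=2):
    return nn.Sequential(nn.ConvTranspose3d(in_dim, out_dim, kernel_size=k, stride=s),
                         nn.GroupNorm(1, out_dim))
```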
Because of memory limitations, the images are typically cropped to a fixed size, such as 128×128×128, during model training. In the model inference stage, a sliding-window strategy with 50% overlap is adopted: windows of the same size as the training input are slid over the image in turn to select image regions, and the predictions in overlapping regions are weighted and averaged with Gaussian importance, so that the central voxels carry higher weight.
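One way to realize this 50%-overlap sliding-window inference with Gaussian-weighted averaging is MONAI's sliding_window_inference utility; using it here is an assumption for illustration, and the variables `volume` and `model` are placeholders.

```python
import torch
from monai.inferers import sliding_window_inference

# Hypothetical inference call: 128^3 windows, 50% overlap, Gaussian importance weighting.
with torch.no_grad():
    prediction = sliding_window_inference(
        inputs=volume,              # [B, 4, D, H, W] multi-modal MRI volume (assumed variable)
        roi_size=(128, 128, 128),   # window size equal to the training input size
        sw_batch_size=1,
        predictor=model,            # the trained segmentation model (assumed variable)
        overlap=0.5,                # 50% window overlap
        mode="gaussian",            # Gaussian-weighted averaging of overlapping predictions
    )
```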
A multi-modality MRI image of size 128×128×128×4 is input into the segmentation model. The 3D image block embedding module uses a kernel size of 4×4×4, a stride of 4×4×4 and padding of 0; the depth convolution with a channel number of 64 performs dynamic position embedding on the input image, and the data is normalized by a regularizing LN layer immediately following the depth convolution, which converts it into data with a mean of 0 and a variance of 1 to speed up the training of the network.
The 3D image block embedding module outputs a 32×32×32×64 feature map, which is then input into two local correlation aggregation modules; local correlation aggregation is carried out on the features, and the output feature map is marked as F1. F1 is first input into a 3D image block fusion module for feature downsampling, yielding a 16×16×16×128 feature map; each 3D image block fusion module uses a convolution with kernel size and stride of 2×2×2, and the downsampling convolution is followed by a regularizing LN layer. The next step is similar: the feature map obtained by downsampling is input into two local correlation aggregation modules to obtain a feature map marked as F2, and feature downsampling is carried out on F2 in a 3D image block fusion module to obtain an 8×8×8×320 feature map. This feature map is then input into two global correlation aggregation modules for global feature modeling, and the output feature map is marked as F3. F3 is subjected to feature downsampling to obtain a 4×4×4×512 feature map, which is input into two global correlation aggregation modules; the output result is input into a 3D image block unfolding module for feature upsampling. This module uses a transposed convolution with kernel size 2×2×2, stride 2×2×2, padding 0 and a channel number of 320 to achieve upsampling, and a regularizing LN layer immediately following the transposed convolution normalizes the data.
The resulting 8×8×8×320 feature map is input, together with the LN-normalized F3, into two special global correlation aggregation modules; these modules do not use a self-attention mechanism for global correlation aggregation, but instead exploit the particularity of the Transformer structure to aggregate the skip-connection features in a cross-attention manner. The output result is then subjected to feature upsampling in a 3D image block expansion module, and the resolution of the feature map becomes 16×16×16×128 through the transposed convolution operation. The normalized F2 and the output of the 3D image block expansion module are connected (Concatenate) along the channel dimension to obtain a feature map of size 16×16×16×256, which is then input into a convolution with kernel size 3×3×3, stride 1×1×1, padding 1×1×1 and a channel number of 128; this achieves feature fusion while reducing the channel dimension. To make the convolutionally extracted features more effective, a GeLU nonlinear activation function is added after the LN layer to improve network performance.
The subsequent step is similar to the previous one. The last 3D image block expansion module uses a transposed convolution with kernel size 4×4×4, stride 4×4×4, padding 0 and a channel number of 64 to achieve upsampling and restore the image to 128×128×128×64. After LN normalization, a convolution with kernel size 1×1×1, stride 1×1×1 and a channel number of 3 (3 segmentation categories) generates the segmentation mask, and the output image size is 128×128×128×3.
In the output 128×128×128×3 image, 3 is the number of tumor categories; that is, each tumor category has a corresponding array (image matrix) of size [128,128,128] whose values are only 0 or 1. Each value in this array corresponds to a pixel of the image: if the pixel is located in a region belonging to this category of tumor, the value is 1; if not, it is 0.
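A sketch of how the 128×128×128×3 output could be turned into the per-class binary arrays described here; the sigmoid plus 0.5 threshold is an assumption, chosen because the three tumor regions may overlap.

```python
import torch

def masks_from_logits(logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical post-processing: logits of shape [B, 3, 128, 128, 128] -> one binary
    [128, 128, 128] array per tumor category (sigmoid + 0.5 threshold are assumptions)."""
    probs = torch.sigmoid(logits)
    return (probs > 0.5).to(torch.uint8)   # values are only 0 or 1, one channel per category

# Example: masks = masks_from_logits(model_output); masks[0, 0] is the binary map of the first category.
```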
The segmentation model may be trained using an adaptive gradient method (AdamW) to optimize the model parameters until convergence, based on a cross-entropy loss and a set-similarity (Dice) loss function.
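A minimal training-step sketch under the stated choices (AdamW, cross-entropy plus Dice loss); the soft-Dice formulation, the use of per-class binary cross-entropy for overlapping tumor regions, and the hyper-parameters are all assumptions.

```python
import torch
import torch.nn as nn

def dice_loss(probs, target, eps=1e-5):
    # Soft Dice averaged over classes; this particular formulation is an assumption.
    dims = (0, 2, 3, 4)
    inter = (probs * target).sum(dims)
    union = probs.sum(dims) + target.sum(dims)
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def train_one_epoch(model, loader, optimizer, device="cuda"):
    # Per-class binary cross-entropy is used here because the three tumor regions may overlap.
    bce = nn.BCEWithLogitsLoss()
    model.train()
    for images, labels in loader:      # images: [B, 4, 128, 128, 128], labels: [B, 3, 128, 128, 128]
        images, labels = images.to(device), labels.to(device).float()
        logits = model(images)
        loss = bce(logits, labels) + dice_loss(torch.sigmoid(logits), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Example setup (learning rate and weight decay are assumptions):
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
```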
FIG. 4 shows the visual results of a predicted tumor segmentation in one embodiment. As shown, T2 Image in the first column on the left refers to the unsegmented medical image; Ground Truth in the second column refers to the manually segmented medical image; Proposed in the third column refers to the medical image segmented by the segmentation model of the present application, with a Dice of 90.45; and UNet in the fourth column refers to the medical image segmented by a UNet-based segmentation model, with a Dice of 84.72. It can be seen from the figure that the segmentation accuracy of the segmentation model of the present application is higher than that of the UNet-based segmentation model.
It should be understood that, although the steps in the flowcharts related to the above embodiments are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a medical image segmentation apparatus for implementing the above-mentioned medical image segmentation method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so for the specific limitations in the one or more medical image segmentation apparatus embodiments provided below, reference may be made to the limitations of the medical image segmentation method above, which are not repeated here.
In one embodiment, as shown in fig. 6, there is provided a medical image segmentation apparatus including: an input module 602, a feature extraction module 604, an up-sampling and aggregation module 606, a classification module 608, and a segmentation module 610, wherein:
an input module 602 for inputting a three-dimensional medical image to the segmentation model; the segmentation model includes an encoder and a decoder;
The feature extraction module 604 is configured to perform feature extraction on the three-dimensional medical image through each feature extraction module in the encoder, so as to obtain feature maps of different sizes;
The up-sampling and aggregation module 606 is configured to perform up-sampling processing and aggregation processing on feature maps of corresponding sizes through a feature fusion module in the decoder, so as to obtain a first aggregate feature map;
the classification module 608 is configured to classify each pixel in the first aggregate feature map to obtain a classification result of each pixel;
The segmentation module 610 is configured to segment the three-dimensional medical image according to the pixels whose classification result is the target class.
In one embodiment, the feature extraction module in the encoder includes a first feature extraction module, a second feature extraction module, a third feature extraction module, and a fourth feature extraction module; the feature extraction module 604 is further configured to extract, by the first feature extraction module, a feature map of a first size from the three-dimensional medical image; after the three-dimensional medical image fusion is carried out on the feature images with the first size based on the first feature extraction module, the feature images with the second size are extracted from the fused feature images through the second feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the second size based on the second feature extraction module, the feature images with the third size are extracted from the fused feature images through the third feature extraction module; and after the three-dimensional medical image fusion is carried out on the feature images with the third size based on the third feature extraction module, extracting the feature images with the fourth size from the fused feature images through the fourth feature extraction module.
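The four-stage encoder wiring described above may be sketched schematically as follows; module names such as Encoder, stages and mergers are illustrative only, and in the claimed structure the fusion/merging layers belong to the first three feature extraction modules:

```python
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, stages, mergers):
        super().__init__()
        self.stages = nn.ModuleList(stages)     # four feature extraction modules
        self.mergers = nn.ModuleList(mergers)   # three image block fusion modules

    def forward(self, x):
        features = []
        for i, stage in enumerate(self.stages):
            x = stage(x)                        # feature map of the (i+1)-th size
            features.append(x)
            if i < len(self.mergers):
                x = self.mergers[i](x)          # fuse/down-sample for the next stage
        return features                         # [first, second, third, fourth size]
```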
In one embodiment, before extracting the feature map of the first size from the three-dimensional medical image by the first feature extraction module, the feature extraction module 604 is further configured to perform image embedding processing on the three-dimensional medical image by an image block embedding layer of the encoder to obtain an embedding map; the feature extraction module 604 is further configured to perform dynamic position embedding on the embedding map through a dynamic position embedding layer in the first feature extraction module, so as to obtain a first dynamic feature map; local aggregation processing is performed on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a second aggregation feature map; and the second aggregation feature map is input into a feedforward network layer in the first feature extraction module for image block characterization to obtain the feature map of the first size.
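A hedged sketch of one such local correlation aggregation block is given below: the dynamic position embedding is realized as a depth-wise 3D convolution with a residual connection, the multi-head correlation aggregation layer as a large depth-wise convolution kernel (the local variant described in this application), and the feed-forward network as a two-layer point-wise MLP. Kernel sizes, the BatchNorm choice and the expansion ratio are assumptions of this sketch:

```python
import torch.nn as nn

class LocalAggregationBlock(nn.Module):
    def __init__(self, dim, dpe_kernel=3, agg_kernel=5, mlp_ratio=4):
        super().__init__()
        # dynamic position embedding layer (depth-wise conv, residual added below)
        self.dpe = nn.Conv3d(dim, dim, dpe_kernel, padding=dpe_kernel // 2, groups=dim)
        self.norm1 = nn.BatchNorm3d(dim)
        # multi-head correlation aggregation layer, local variant: large depth-wise kernel
        self.agg = nn.Sequential(
            nn.Conv3d(dim, dim, kernel_size=1),
            nn.Conv3d(dim, dim, agg_kernel, padding=agg_kernel // 2, groups=dim),
            nn.Conv3d(dim, dim, kernel_size=1),
        )
        self.norm2 = nn.BatchNorm3d(dim)
        # feed-forward network layer for image block characterization
        self.ffn = nn.Sequential(
            nn.Conv3d(dim, dim * mlp_ratio, kernel_size=1),
            nn.GELU(),
            nn.Conv3d(dim * mlp_ratio, dim, kernel_size=1),
        )

    def forward(self, x):                 # x: (N, C, D, H, W) embedding map
        x = x + self.dpe(x)               # first dynamic feature map
        x = x + self.agg(self.norm1(x))   # second aggregation feature map
        x = x + self.ffn(self.norm2(x))   # feature map of the first size
        return x
```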
In one embodiment, the feature fusion module in the decoder includes a first feature fusion module, a second feature fusion module, a third feature fusion module, and a fourth feature fusion module; the up-sampling and aggregation module 606 is further configured to perform global aggregation processing on the feature map of the fourth size through the first feature fusion module, and perform up-sampling processing on the feature map after global aggregation through the second feature fusion module, so as to obtain a first unfolding feature map; after global aggregation processing is carried out on the first unfolding feature map based on the second feature fusion module, up-sampling processing is carried out on the feature map after global aggregation through the third feature fusion module, and a second unfolding feature map is obtained; after the second unfolding feature map is subjected to local aggregation processing based on the third feature fusion module, the feature map subjected to local aggregation is subjected to up-sampling processing through the fourth feature fusion module, so as to obtain a third unfolding feature map; and local aggregation processing is carried out on the third unfolding feature map based on the fourth feature fusion module.
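The decoder-side wiring of the four feature fusion modules can be summarized by the following schematic sketch; the constructor arguments agg1 through agg4 and expand2 through expand4 are illustrative placeholders for the aggregation and image block expansion sub-modules:

```python
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, agg1, agg2, agg3, agg4, expand2, expand3, expand4):
        super().__init__()
        self.agg1, self.agg2, self.agg3, self.agg4 = agg1, agg2, agg3, agg4
        self.expand2, self.expand3, self.expand4 = expand2, expand3, expand4

    def forward(self, f1, f2, f3, f4):
        x = self.agg1(f4)            # module 1: global aggregation of the fourth-size map
        x = self.expand2(x)          # module 2: up-sampling -> first unfolding feature map
        x = self.agg2(x, f3)         # module 2: global aggregation with skip feature F3
        x = self.expand3(x)          # module 3: up-sampling -> second unfolding feature map
        x = self.agg3(x, f2)         # module 3: local aggregation with skip feature F2
        x = self.expand4(x)          # module 4: up-sampling -> third unfolding feature map
        return self.agg4(x, f1)      # module 4: local aggregation -> first aggregate feature map
```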
In one embodiment, the second feature fusion module comprises a second aggregation module; the up-sampling and aggregation module 606 is further configured to perform a position encoding process and a normalization process on the feature map of the third size through the second aggregation module, and perform a position encoding process and a normalization process on the first unfolding feature map, so as to obtain a preprocessed feature map of the third size and a preprocessed first unfolding feature map; inputting the preprocessed feature map of the third size into a first full-connection layer in the second aggregation module to obtain a query feature map; inputting the preprocessed first unfolding feature map into a second full-connection layer in the second aggregation module to obtain a key feature map and a value feature map; and carrying out global aggregation processing on the query feature map, the key feature map and the value feature map.
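A hedged, single-head sketch of the cross-attention used in the second aggregation module (queries from the third-size skip feature, keys and values from the first unfolding feature map) is shown below; flattening the feature maps into (N, L, C) token sequences and the single-head simplification are assumptions of this illustration:

```python
import torch
import torch.nn as nn

class CrossAttentionAggregation(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm_skip = nn.LayerNorm(dim)   # normalization of the third-size feature
        self.norm_up = nn.LayerNorm(dim)     # normalization of the first unfolding feature
        self.to_q = nn.Linear(dim, dim)      # first full-connection layer -> query
        self.to_kv = nn.Linear(dim, dim * 2) # second full-connection layer -> key, value

    def forward(self, skip_tokens, up_tokens):
        # skip_tokens / up_tokens: (N, L, C), assumed already position-encoded
        q = self.to_q(self.norm_skip(skip_tokens))                    # query feature map
        k, v = self.to_kv(self.norm_up(up_tokens)).chunk(2, dim=-1)   # key / value feature maps
        attn = torch.softmax(q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5), dim=-1)
        return attn @ v                                               # global aggregation result
```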
In one embodiment, the third feature fusion module includes a third aggregation module, and the fourth feature fusion module includes a fourth aggregation module; the up-sampling and aggregation module 606 is further configured to locally aggregate the feature map of the second size and the second unfolding feature map through the third aggregation module; the up-sampling and aggregation module 606 is further configured to locally aggregate the feature map of the first size and the third unfolding feature map through the fourth aggregation module.
A three-dimensional medical image is input into the segmentation model, where the segmentation model includes an encoder and a decoder; feature extraction is performed on the three-dimensional medical image through each feature extraction module in the encoder to obtain feature maps of different sizes; up-sampling processing and aggregation processing are performed on the feature maps of corresponding sizes through the feature fusion modules in the decoder to obtain a first aggregate feature map; each pixel in the first aggregate feature map is classified to obtain a classification result of each pixel; and the three-dimensional medical image is segmented according to the pixels whose classification result is the target class. The medical image is automatically segmented by the designed segmentation model, which improves the accuracy and efficiency of image segmentation.
The respective modules in the above-described medical image segmentation apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal or a server; this embodiment takes the computer device as a terminal, whose internal structure diagram may be as shown in FIG. 7. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a medical image segmentation method. The display unit of the computer device is used for forming a visually perceptible picture, and may be a display screen, a projection device or a virtual reality imaging device, where the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse or the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program, and the processor implementing the steps in the above method embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps in the above method embodiments.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps in the above method embodiments.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the methods in the above embodiments may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium, and the computer program, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM may take various forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. A method of segmenting a medical image, the method comprising:
Inputting the three-dimensional medical image into the segmentation model; the segmentation model comprises an encoder and a decoder; the feature extraction module in the encoder comprises a first feature extraction module, a second feature extraction module, a third feature extraction module and a fourth feature extraction module, wherein the first feature extraction module comprises a local correlation aggregation module and an image block fusion module, the second feature extraction module comprises a local correlation aggregation module and an image block fusion module, the third feature extraction module comprises a global correlation aggregation module and an image block fusion module, the fourth feature extraction module comprises a global correlation aggregation module, and the correlation aggregation module comprises a dynamic position embedding layer, a multi-head correlation aggregation layer and a feedforward network layer; the feature fusion module in the decoder comprises a first feature fusion module, a second feature fusion module, a third feature fusion module and a fourth feature fusion module, wherein the second feature fusion module comprises a second aggregation module; the multi-head correlation aggregation layer is realized by using a large convolution kernel when local correlation aggregation is carried out, and is realized by using a self-attention mechanism when global correlation aggregation is carried out;
Extracting a feature map of a first size from the three-dimensional medical image by the first feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the first size based on the first feature extraction module, the feature images with the second size are extracted from the fused feature images through the second feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the second size based on the second feature extraction module, the feature images with the third size are extracted from the fused feature images through the third feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the third size based on the third feature extraction module, a feature image with a fourth size is extracted from the fused feature images through the fourth feature extraction module;
After global aggregation processing is carried out on the feature images with the fourth size through the first feature fusion module, up-sampling processing is carried out on the feature images after global aggregation through the second feature fusion module, and a first unfolding feature image is obtained; after global aggregation processing is carried out on the first unfolding feature map based on the second feature fusion module, up-sampling processing is carried out on the feature map after global aggregation through the third feature fusion module, and a second unfolding feature map is obtained; after the second unfolding feature map is subjected to local aggregation based on the third feature fusion module, the feature map subjected to local aggregation is subjected to upsampling through the fourth feature fusion module, so that a third unfolding feature map is obtained; carrying out local aggregation treatment on the third unfolding feature map based on the fourth feature fusion module to obtain a first aggregation feature map;
classifying each pixel in the first aggregate feature map to obtain a classification result of each pixel;
Dividing the three-dimensional medical image according to the pixels of which the classification result is the target class;
the global aggregation processing of the first unfolding feature map based on the second feature fusion module comprises the following steps: performing position coding and normalization processing on the feature map of the third size through the second aggregation module, and performing position coding and normalization processing on the first unfolding feature map to obtain a preprocessing feature map of the third size and a preprocessed first unfolding feature map; inputting the preprocessing feature map with the third size into a first full-connection layer in the second aggregation module to obtain a query feature map; inputting the preprocessed first unfolding feature map into a second full-connection layer in the second aggregation module to obtain a key feature map and a value feature map; and carrying out global aggregation processing on the query feature map, the key feature map and the value feature map.
2. The method of claim 1, wherein the performing, by the second feature fusion module, the upsampling of the globally aggregated feature map to obtain a first unfolded feature map comprises:
And performing up-sampling processing on the globally aggregated feature map according to a first transpose convolution by an image block expansion module in the second feature fusion module to obtain a first unfolding feature map.
3. The method of claim 1, wherein prior to extracting a feature map of a first size from the three-dimensional medical image by the first feature extraction module, the method comprises:
performing image embedding processing on the three-dimensional medical image through an image block embedding layer of the encoder to obtain an embedding diagram;
The extracting, by the first feature extraction module, a feature map of a first size from the three-dimensional medical image includes:
Performing dynamic position embedding on the embedded graph through a dynamic position embedding layer in the first feature extraction module to obtain a first dynamic feature graph;
carrying out local aggregation treatment on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a second aggregation feature map;
And inputting the second aggregate feature map into a feedforward network layer in the first feature extraction module to perform image block representation to obtain a feature map of a first size.
4. The method of claim 1, wherein the performing, by the third feature fusion module, upsampling the global aggregated feature map to obtain a second unfolded feature map includes:
And performing up-sampling processing on the globally aggregated feature map according to the second transpose convolution by an image block expansion module in the third feature fusion module to obtain a second expansion feature map.
5. The method of claim 1, wherein the performing, by the fourth feature fusion module, upsampling the locally aggregated feature map to obtain a third unfolded feature map includes:
and performing up-sampling processing on the partially aggregated feature map according to third transpose convolution by an image block expansion module in the fourth feature fusion module to obtain a third expansion feature map.
6. The method of claim 1, wherein the third feature fusion module comprises a third aggregation module; the fourth feature fusion module comprises a fourth aggregation module; the local aggregation processing of the second unfolding feature map based on the third feature fusion module comprises the following steps:
Carrying out local aggregation processing on the feature map of the second size and the second unfolding feature map through the third aggregation module;
The local aggregation processing of the third unfolding feature map based on the fourth feature fusion module comprises the following steps:
And carrying out local aggregation processing on the feature map of the first size and the third unfolding feature map through the fourth aggregation module.
7. A medical image segmentation apparatus, the apparatus comprising:
The input module is used for inputting the three-dimensional medical image into the segmentation model; the segmentation model comprises an encoder and a decoder; the feature extraction module in the encoder comprises a first feature extraction module, a second feature extraction module, a third feature extraction module and a fourth feature extraction module, wherein the first feature extraction module comprises a local correlation aggregation module and an image block fusion module, the second feature extraction module comprises a local correlation aggregation module and an image block fusion module, the third feature extraction module comprises a global correlation aggregation module and an image block fusion module, the fourth feature extraction module comprises a global correlation aggregation module, and the correlation aggregation module comprises a dynamic position embedding layer, a multi-head correlation aggregation layer and a feedforward network layer; the feature fusion module in the decoder comprises a first feature fusion module, a second feature fusion module, a third feature fusion module and a fourth feature fusion module, wherein the second feature fusion module comprises a second aggregation module; the multi-head correlation aggregation layer is realized by using a large convolution kernel when local correlation aggregation is carried out, and is realized by using a self-attention mechanism when global correlation aggregation is carried out;
The feature extraction module is used for extracting a feature map with a first size from the three-dimensional medical image through the first feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the first size based on the first feature extraction module, the feature images with the second size are extracted from the fused feature images through the second feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the second size based on the second feature extraction module, the feature images with the third size are extracted from the fused feature images through the third feature extraction module; after the three-dimensional medical image fusion is carried out on the feature images with the third size based on the third feature extraction module, a feature image with a fourth size is extracted from the fused feature images through the fourth feature extraction module;
The up-sampling and aggregation module is used for performing global aggregation processing on the feature images with the fourth size through the first feature fusion module, and performing up-sampling processing on the feature images after global aggregation through the second feature fusion module to obtain a first unfolding feature image; after global aggregation processing is carried out on the first unfolding feature map based on the second feature fusion module, up-sampling processing is carried out on the feature map after global aggregation through the third feature fusion module, and a second unfolding feature map is obtained; after the second unfolding feature map is subjected to local aggregation based on the third feature fusion module, the feature map subjected to local aggregation is subjected to upsampling through the fourth feature fusion module, so that a third unfolding feature map is obtained; carrying out local aggregation treatment on the third unfolding feature map based on the fourth feature fusion module to obtain a first aggregation feature map;
the classification module is used for classifying each pixel in the first aggregate feature map to obtain a classification result of each pixel;
the segmentation module is used for segmenting the three-dimensional medical image according to the pixels of which the classification result is the target class;
the global aggregation processing of the first unfolding feature map based on the second feature fusion module comprises the following steps: performing position coding and normalization processing on the feature map of the third size through the second aggregation module, and performing position coding and normalization processing on the first unfolding feature map to obtain a preprocessing feature map of the third size and a preprocessed first unfolding feature map; inputting the preprocessing feature map with the third size into a first full-connection layer in the second aggregation module to obtain a query feature map; inputting the preprocessed first unfolding feature map into a second full-connection layer in the second aggregation module to obtain a key feature map and a value feature map; and carrying out global aggregation processing on the query feature map, the key feature map and the value feature map.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202210914575.9A 2022-08-01 2022-08-01 Medical image segmentation method, medical image segmentation device, computer equipment and storage medium Active CN115147606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210914575.9A CN115147606B (en) 2022-08-01 2022-08-01 Medical image segmentation method, medical image segmentation device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115147606A CN115147606A (en) 2022-10-04
CN115147606B true CN115147606B (en) 2024-05-14

Family

ID=83414930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210914575.9A Active CN115147606B (en) 2022-08-01 2022-08-01 Medical image segmentation method, medical image segmentation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115147606B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129107A (en) * 2022-11-17 2023-05-16 华中科技大学 Three-dimensional medical image segmentation method and system based on long-short-term memory self-attention model
CN117952969B (en) * 2024-03-26 2024-06-21 济南大学 Endometrial cancer analysis method and system based on selective attention

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369532A (en) * 2020-03-05 2020-07-03 北京深睿博联科技有限责任公司 Method and device for processing mammary gland X-ray image
CN113421276A (en) * 2021-07-02 2021-09-21 深圳大学 Image processing method, device and storage medium
CN113688813A (en) * 2021-10-27 2021-11-23 长沙理工大学 Multi-scale feature fusion remote sensing image segmentation method, device, equipment and storage
CN114119638A (en) * 2021-12-02 2022-03-01 上海理工大学 Medical image segmentation method integrating multi-scale features and attention mechanism
CN114463341A (en) * 2022-01-11 2022-05-10 武汉大学 Medical image segmentation method based on long and short distance features
CN114581662A (en) * 2022-02-17 2022-06-03 华南理工大学 Method, system, device and storage medium for segmenting brain tumor image
CN114638842A (en) * 2022-03-15 2022-06-17 桂林电子科技大学 Medical image segmentation method based on MLP

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Hu Cao et al. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv:2105.05537v1. 2021, main text, Sections 3-4. *
Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation; Hu Cao et al.; arXiv:2105.05537v1; 20210512; main text, Sections 3-4 *
UniFormer: Unifying Convolution and Self-attention for Visual Recognition; Kunchang Li et al.; arXiv:2201.09450v2; 20220219; main text, Sections 3-5 *

Also Published As

Publication number Publication date
CN115147606A (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN115147606B (en) Medical image segmentation method, medical image segmentation device, computer equipment and storage medium
CN110838125B (en) Target detection method, device, equipment and storage medium for medical image
CN110706214B (en) Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error
CN115272250B (en) Method, apparatus, computer device and storage medium for determining focus position
CN111476310B (en) Image classification method, device and equipment
Zhu et al. A multilayer backpropagation saliency detection algorithm and its applications
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN116740527A (en) Remote sensing image change detection method combining U-shaped network and self-attention mechanism
Wang et al. Urban building extraction from high-resolution remote sensing imagery based on multi-scale recurrent conditional generative adversarial network
Qin et al. Depth estimation by parameter transfer with a lightweight model for single still images
CN113111885B (en) Dynamic resolution instance segmentation method and computer readable storage medium
CN116740069B (en) Surface defect detection method based on multi-scale significant information and bidirectional feature fusion
CN111274936B (en) Multispectral image ground object classification method, system, medium and terminal
CN116310308A (en) Image segmentation method, device, computer equipment and storage medium
CN117058554A (en) Power equipment target detection method, model training method and device
CN116269312A (en) Individual brain map drawing method and device based on brain map fusion model
Jian et al. Robust seed selection of foreground and background priors based on directional blocks for saliency-detection system
CN116229130A (en) Type identification method and device for blurred image, computer equipment and storage medium
CN112785601B (en) Image segmentation method, system, medium and electronic terminal
Tong et al. Lightweight refined networks for single image super-resolution
CN116612287B (en) Image recognition method, device, computer equipment and storage medium
Zhang et al. MF-Dfnet: a deep learning method for pixel-wise classification of very high-resolution remote sensing images
CN116486090B (en) Lung cancer spine metastasis image processing method, device, equipment and storage medium
CN114463187B (en) Image semantic segmentation method and system based on aggregation edge features
US20240161490A1 (en) Multi-dimension unified swin transformer for lesion segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant