CN117058160A - Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network - Google Patents

Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network

Info

Publication number
CN117058160A
CN117058160A (application number CN202311313587.7A)
Authority
CN
China
Prior art keywords
module
downsampling
feature
dimensional medical
medical image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311313587.7A
Other languages
Chinese (zh)
Other versions
CN117058160B (en)
Inventor
刘敏
陈坤隆
刘庆浩
张哲
王耀南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202311313587.7A
Publication of CN117058160A
Application granted
Publication of CN117058160B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06T 7/0012: Biomedical image inspection
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Arrangements using neural networks
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10004: Still image; Photographic image
    • G06T 2207/10012: Stereo images
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20084: Artificial neural networks [ANN]
    • Y02T 10/40: Engine management systems (under Y02T: climate change mitigation technologies related to transportation)

Abstract

The invention discloses a three-dimensional medical image segmentation method and system based on an adaptive feature fusion network. The adaptive feature fusion network is first constructed and comprises an encoder, a decoder and long skip connections, with a DAM module arranged on each long skip connection. A three-dimensional medical image is input into several encoding blocks in the encoder for stage-by-stage downsampling, and several downsampled feature maps of different sizes are correspondingly output. The downsampled feature maps are input to the corresponding DAM modules for processing, which output the corresponding fused skip-connection information. Several decoding blocks in the decoder respectively receive and aggregate the aggregated feature map output by the adjacent lower-layer decoding block and the corresponding fused skip-connection information, and the aggregated feature map output by the topmost decoding block serves as the target image segmented from the three-dimensional medical image. The method improves the degree of fusion of the different-layer information collected by the encoder and adapts to multiple segmentation targets of different scales.

Description

Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network
Technical Field
The invention relates to the technical field of computer vision, and in particular to a three-dimensional medical image segmentation method and system based on an adaptive feature fusion network.
Background
Currently, deep-learning medical image segmentation algorithms typically employ a U-shaped network structure comprising an encoder, a decoder and long skip connections. Adaptive feature fusion is a key property of the U-shaped network: at each stage of the decoder, feature maps from the encoder and decoder are fused through skip connections. This fusion is adaptive in that it allows the network to dynamically select which features should be fused to produce the best segmentation result, so the network can flexibly adapt to different images and segmentation tasks; the U-shaped architecture can therefore also be interpreted as an adaptive feature fusion network. By the composition of the codec, such networks can be classified into convolution-based deep learning segmentation algorithms, Transformer-based deep learning segmentation algorithms, and hybrid structures combining convolution and Transformer.
Since FCN introduced convolution to the image segmentation field, convolution-based deep learning segmentation methods have been widely applied to natural and medical image segmentation. In medical image segmentation, a U-shaped network based on the encoder-decoder architecture was proposed for cell segmentation: the codec structure captures context information and localizes the segmentation target, and deep networks with this structure achieve good medical image segmentation results. UNet++ then uses dense connections as skip connections between encoder and decoder to aggregate features from different layers. The nnU-Net framework designs a fully automatic pipeline for preprocessing, training, inference and post-processing, obtaining competitive results on multiple public datasets. UNeXt, a convolution- and MLP-based network, proposes a tokenized MLP block to reduce network parameters so that the network can run on mobile devices. However, owing to the limitations of convolution itself, a sufficiently large receptive field cannot be obtained, which makes it difficult for the network to capture global information and thereby limits medical image segmentation accuracy.
Transformer-based deep learning networks have also been applied to medical image segmentation. The Vision Transformer divides images into position-embedded patches to construct a sequence of tokens, combining self-attention and multi-layer perceptrons to relate distant features and thereby obtain a global feature representation. Swin-UNet is a Transformer-based codec structure that uses hierarchical Swin Transformers as the encoder to extract context features and a symmetric Swin-Transformer-based decoder, achieving good results in multi-organ and cardiac segmentation tasks. To combine the advantages of convolution and Transformer, nnFormer interleaves convolution and Transformer blocks, using local and global information to construct a feature pyramid and obtain a larger receptive field for multi-organ and cardiac segmentation. Although these algorithms alleviate the problem of the small convolutional receptive field to some extent, they still suffer from excessive parameter counts.
In addition to improving the codec of the adaptive feature fusion network, researchers have also combined attention mechanisms with it: CE-Net provides a context extractor on top of the codec structure to obtain higher-level semantic feature maps and achieves better results in different medical image segmentation tasks; CPFNet adds two pyramid modules, SAPF and GPG, to fuse multi-scale context information on a UNet basis and produces convincing results on four datasets. Although these networks perform well, the skip connections between the codecs cannot fully compensate for the loss of fine information caused by downsampling, which reduces segmentation accuracy at target boundaries. Meanwhile, owing to the fixed size and shape of the receptive field, these networks cannot fully adapt to segmentation targets of different sizes.
Disclosure of Invention
Aiming at the problems that existing medical image segmentation networks lose fine information in the downsampling operation and, limited by the size and shape of the receptive field, cannot fully adapt to segmentation targets of different sizes, the invention provides a three-dimensional medical image segmentation method and system based on an adaptive feature fusion network.
The invention provides a three-dimensional medical image segmentation method based on a self-adaptive feature fusion network, which comprises the following steps:
S1, constructing an adaptive feature fusion network, wherein the adaptive feature fusion network comprises an encoder, a decoder and long skip connections, a DAM module is arranged on each long skip connection, the encoder comprises several layers of sequentially connected encoding blocks, the decoder comprises the same number of sequentially connected decoding blocks, the bottommost encoding block is directly connected with the bottommost decoding block, and the encoding blocks are also respectively connected with the non-bottom decoding blocks through the long skip connections provided with the DAM modules;
S2, acquiring a three-dimensional medical image, inputting it into the encoder, encoding it stage by stage with the several layers of encoding blocks, and correspondingly outputting several downsampled feature maps of different sizes and refined feature maps;
S3, directly inputting the bottommost refined feature map and the bottommost downsampled feature map output by the bottommost encoding block into the bottommost decoding block for processing to obtain the bottommost aggregated feature map, inputting the several downsampled feature maps of different sizes into the corresponding DAM modules through the long skip connections for processing, and outputting the corresponding skip-connection information;
S4, the other decoding blocks respectively receiving the aggregated feature map output by the adjacent lower-layer decoding block, and receiving and processing the skip-connection information output by the DAM module of the corresponding layer, to obtain the aggregated feature map of the corresponding layer;
S5, taking the aggregated feature map output by the topmost decoding block as the target image segmented from the three-dimensional medical image.
Preferably, each encoding block in S1 comprises a downsampling module and a downsampling FAM module connected in sequence, and each decoding block comprises an upsampling module and an upsampling FAM module connected in sequence.
Preferably, the downsampling module and the upsampling module each comprise a 3D convolution layer, a batch normalization layer and a PReLU activation function layer connected in sequence.
Preferably, the downsampling FAM module comprises a downsampling channel attention module and a downsampling spatial attention module connected in sequence; the downsampling channel attention module acquires and processes the downsampled feature map output by the corresponding downsampling module to obtain a downsampled channel attention feature map, and the downsampling spatial attention module receives and processes the downsampled channel attention feature map to obtain a downsampled spatial attention feature map.
Preferably, the downsampling channel attention module and the downsampling spatial attention module each comprise an average pooling layer, an ordinary convolution layer and a Fourier convolution layer connected in sequence.
Preferably, the upsampling FAM module comprises an upsampling channel attention module and an upsampling spatial attention module connected in sequence, having the same structures as the downsampling channel attention module and the downsampling spatial attention module, respectively.
Preferably, the DAM module comprises a deformable convolution network and a gating mechanism connected in sequence, and in S3, inputting the several downsampled feature maps of different sizes into the corresponding DAM modules through the long skip connections and outputting the corresponding skip-connection information specifically comprises:
S31, inputting the several downsampled feature maps of different sizes into the corresponding DAM modules through the long skip connections;
S32, the DAM module receiving the downsampled feature maps of different sizes and performing trilinear interpolation and concatenation to obtain a concatenated feature map;
S33, the deformable convolution network receiving and processing the concatenated feature map to obtain a deformably convolved feature map adapted to various target shapes and positions;
S34, selectively activating or suppressing the information in the deformably convolved feature map through the gating mechanism to obtain its key information, which serves as the skip-connection information.
Preferably, S4 specifically comprises:
S41, the other decoding blocks respectively receiving the aggregated feature map output by the adjacent lower-layer decoding block and the skip-connection information output by the DAM module of the corresponding layer;
S42, the upsampling module in each of the other decoding blocks receiving and processing the corresponding aggregated feature map to obtain an upsampled feature map of the corresponding size;
S43, the upsampling FAM module in each of the other decoding blocks processing the received skip-connection information through the sequentially connected upsampling channel attention module and upsampling spatial attention module to obtain an upsampled spatial attention feature map;
S44, adding the upsampled feature map output by the upsampling module and the upsampled spatial attention feature map output by the upsampling FAM module element-wise, thereby obtaining the aggregated feature map of the corresponding layer.
The invention further provides a three-dimensional medical image segmentation system based on the adaptive feature fusion network, which adopts the above three-dimensional medical image segmentation method to segment a target image from a three-dimensional medical image to be detected. The segmentation system comprises an image acquisition module, a computer system and the adaptive feature fusion network; the image acquisition module is connected with the computer system, and the adaptive feature fusion network is arranged in the computer system, wherein:
the image acquisition module is used for acquiring three-dimensional medical images in a target scene in real time and sending them to the computer system;
the adaptive feature fusion network in the computer system processes the three-dimensional medical image with the above segmentation method and segments the target image from it.
In the three-dimensional medical image segmentation method and system based on the adaptive feature fusion network, the network is first constructed, comprising an encoder, a decoder and long skip connections. A three-dimensional medical image is then acquired and input into the encoder, where several layers of encoding blocks downsample it stage by stage and correspondingly output several downsampled feature maps of different scales together with refined feature maps. The downsampled and refined feature maps output by the bottommost encoding block are input directly into the bottommost decoding block of the decoder for upsampling, yielding the bottommost aggregated feature map; the downsampled feature maps output by the other encoding blocks are respectively input into the corresponding DAM modules, fused through a gating mechanism, and output as the corresponding fused skip-connection information. The other decoding blocks then respectively receive the aggregated feature map output by the adjacent lower-layer decoding block and aggregate it with the corresponding fused skip-connection information to obtain the aggregated feature map of the corresponding layer; the aggregated feature map output by the topmost decoding block is the target image segmented from the three-dimensional medical image. The method introduces the DAM and FAM modules into a U-shaped network composed of residual modules to form the adaptive feature fusion network. The DAM module, with its deformable receptive field and gated fusion, improves the degree of fusion of the different-layer information collected by the encoder: its deformable convolution layer lets the receptive field adapt to multiple segmentation objects of different sizes and positions, and its gating mechanism fuses the multi-layer information while retaining the most critical information. The FAM module adopts sequentially connected channel attention and spatial attention modules, which adaptively and selectively emphasize different channels and spatial positions and improve network performance; its Fourier convolution layer performs convolution via the fast Fourier transform, extracting low-frequency information in the frequency domain to complement the time-domain information, with lower algorithmic complexity than ordinary convolution.
Drawings
FIG. 1 is a flow chart of a three-dimensional medical image segmentation method based on an adaptive feature fusion network in an embodiment of the invention;
FIG. 2 is a schematic diagram of a network architecture of an adaptive feature fusion network in accordance with an embodiment of the present invention;
fig. 3 is a schematic diagram of a network structure of a FAM module in an embodiment of the invention;
FIG. 4 is a schematic diagram of a network architecture of a DAM module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a system architecture of a three-dimensional medical image segmentation system based on an adaptive feature fusion network according to an embodiment of the present invention;
FIG. 6 is a comparison of the target images segmented from a three-dimensional medical image using different image processing methods according to an embodiment of the present invention.
Description of the reference numerals
1. An image acquisition module; 2. a computer system; 3. an adaptive feature fusion network.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings.
A three-dimensional medical image segmentation method based on an adaptive feature fusion network specifically comprises the following steps:
S1, constructing an adaptive feature fusion network, wherein the adaptive feature fusion network comprises an encoder, a decoder and long skip connections, a DAM module is arranged on each long skip connection, the encoder comprises several layers of sequentially connected encoding blocks, the decoder comprises the same number of sequentially connected decoding blocks, the bottommost encoding block is directly connected with the bottommost decoding block, and the encoding blocks are also respectively connected with the non-bottom decoding blocks through the long skip connections provided with the DAM modules;
S2, acquiring a three-dimensional medical image, inputting it into the encoder, encoding it stage by stage with the several layers of encoding blocks, and correspondingly outputting several downsampled feature maps of different sizes and refined feature maps;
S3, directly inputting the bottommost refined feature map and the bottommost downsampled feature map output by the bottommost encoding block into the bottommost decoding block for processing to obtain the bottommost aggregated feature map, inputting the several downsampled feature maps of different sizes into the corresponding DAM modules through the long skip connections for processing, and outputting the corresponding skip-connection information;
S4, the other decoding blocks respectively receiving the aggregated feature map output by the adjacent lower-layer decoding block, and receiving and processing the skip-connection information output by the DAM module of the corresponding layer, to obtain the aggregated feature map of the corresponding layer;
S5, taking the aggregated feature map output by the topmost decoding block as the target image segmented from the three-dimensional medical image.
Specifically, referring to fig. 1 and fig. 2, fig. 1 is a flowchart of a three-dimensional medical image segmentation method based on an adaptive feature fusion network according to an embodiment of the present invention, and fig. 2 is a network structure schematic diagram of the adaptive feature fusion network according to an embodiment of the present invention.
Construct an adaptive feature fusion network comprising an encoder and a decoder connected through long skip connections, with a DAM (Deformable Aggregation Module) arranged in each long skip connection. In this embodiment, the encoder comprises four sequentially connected encoding blocks and the decoder comprises four sequentially connected decoding blocks; the bottommost encoding block is directly connected with the bottommost decoding block, and each of the other encoding blocks is connected with the corresponding non-bottom decoding block through a long skip connection provided with a DAM module. A three-dimensional medical image is acquired, and the four encoding blocks sequentially encode it layer by layer, correspondingly yielding four downsampled feature maps of different sizes and four refined feature maps of different sizes. The bottommost downsampled feature map and bottommost refined feature map output by the bottommost encoding block are input directly into the bottommost decoding block for processing to obtain the bottommost aggregated feature map. The downsampled feature maps of different sizes output by the encoding blocks are input into the corresponding DAM modules, where the information is fused through a gating mechanism; this reduces the redundant information that the long skip connections would otherwise introduce into the decoder and retains the most critical information, producing skip-connection information that helps the decoder recover finer details of the segmentation target. The skip-connection information is input into the decoding block of the corresponding layer, which receives and processes it together with the aggregated feature map output by the adjacent lower-layer decoding block to obtain the aggregated feature map of the corresponding layer; this map is input into the decoding block of the layer above, and by the same processing the topmost decoding block finally outputs the target image segmented from the three-dimensional medical image.
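To make the data flow concrete, the following is a minimal PyTorch sketch of the four-level layout just described. The block interfaces (each encoding block returning both a downsampled map and a refined map; each DAM consuming the downsampled maps of its own level and all deeper levels) follow the description above, while the class name, constructor arguments and the factoring into submodules are illustrative assumptions rather than the patent's implementation.

```python
import torch.nn as nn

class AdaptiveFeatureFusionNet(nn.Module):
    def __init__(self, enc_blocks, dec_blocks, dams):
        super().__init__()
        self.enc = nn.ModuleList(enc_blocks)   # 4 encoding blocks, top to bottom
        self.dec = nn.ModuleList(dec_blocks)   # 4 decoding blocks, top to bottom
        self.dams = nn.ModuleList(dams)        # one DAM per non-bottom skip level

    def forward(self, x):
        downs, refined = [], x
        for enc in self.enc:                   # stage-by-stage downsampling
            down, refined = enc(refined)       # each block yields both maps
            downs.append(down)
        # the bottom decoder consumes the bottom maps directly (no DAM)
        agg = self.dec[-1](refined, downs[-1])
        # upper decoders consume DAM-fused skip-connection information
        for i in range(len(self.dec) - 2, -1, -1):
            skip = self.dams[i](downs[i:])     # maps of this level and deeper
            agg = self.dec[i](agg, skip)
        return agg                             # topmost output = target image
```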
In one embodiment, each of the plurality of encoding blocks in S1 includes a downsampling module and a downsampling FAM module that are sequentially connected, and each of the plurality of decoding blocks includes an upsampling module and an upsampling FAM module that are sequentially connected.
Specifically, each encoding block comprises a downsampling module and a downsampling FAM (Fourier Attention Module) connected to it, the output of the downsampling module and the output of the downsampling FAM module being added to form a residual structure. The first-layer (i.e., topmost) encoding block in fig. 2 comprises a first downsampling module and a first downsampling FAM module, and the fourth-layer (i.e., bottommost) encoding block comprises a fourth downsampling module and a fourth downsampling FAM module. Each decoding block likewise comprises an upsampling module and an upsampling FAM module connected to it, the output of the upsampling module being added to the output of the upsampling FAM module to form a residual structure. The first-layer (i.e., topmost) decoding block in fig. 2 comprises a first upsampling module and a first upsampling FAM module, and the fourth-layer (i.e., bottommost) decoding block comprises a fourth upsampling module and a fourth upsampling FAM module. The residual structure in every encoding and decoding block further alleviates the vanishing-gradient problem and accelerates convergence of the adaptive feature fusion network.
Taking the second-layer decoding block as an example (it comprises a second upsampling module and a second upsampling FAM module), the processing is as follows: the second upsampling module receives the third-layer aggregated feature map output by the adjacent lower decoding block (the third-layer decoding block) and upsamples it to obtain an upsampled feature map; the second upsampling FAM module processes the received skip-connection information, and its output is added to the corresponding upsampled feature map, connecting the shallow features at the corresponding encoder position with the deep decoder features to yield the second-layer aggregated feature map. On the encoder side, the downsampling module in each encoding block convolves the input volume to obtain a downsampled feature map, the downsampling FAM module processes this map, and the result is added to the corresponding downsampled feature map to obtain the refined feature map output by that encoding block.
In one embodiment, the downsampling module and the upsampling module each include a 3D convolution layer, a batch normalization layer and a PReLU activation function layer connected in sequence.
Specifically, each downsampling or upsampling module comprises a sequentially connected 3D convolution layer, batch normalization layer and PReLU activation function layer. The 3D convolution layer convolves three-dimensional data, moving the filter in all three directions (depth, height and width) and extracting features through element-wise multiplication and addition. The batch normalization layer normalizes each batch of data, which alleviates vanishing and exploding gradients, accelerates convergence and improves the generalization ability of the network. The PReLU activation function introduces a learnable slope for negative inputs, providing more flexible nonlinearity than the traditional ReLU and effectively mitigating the vanishing-gradient problem.
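As a concrete illustration, a minimal sketch of this unit in PyTorch follows; the kernel size, stride and padding are assumptions, since the text does not specify them.

```python
import torch.nn as nn

def conv3d_bn_prelu(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """3D convolution -> batch normalization -> PReLU, as described above."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm3d(out_ch),  # normalizes each batch; stabilizes training
        nn.PReLU(out_ch),        # learnable negative slope per channel
    )
```

With stride=2 the unit halves the spatial resolution and can serve as a downsampling module; an upsampling module would pair the same layers with interpolation or a transposed convolution.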
In one embodiment, the downsampling FAM module comprises a downsampling channel attention module and a downsampling spatial attention module connected in sequence; the downsampling channel attention module acquires and processes the downsampled feature map output by the corresponding downsampling module to obtain a downsampled channel attention feature map, and the downsampling spatial attention module receives and processes the downsampled channel attention feature map to obtain a downsampled spatial attention feature map.
In one embodiment, the downsampling channel attention module and the downsampling spatial attention module each comprise an average pooling layer, an ordinary convolution layer and a Fourier convolution layer connected in sequence.
In one embodiment, the upsampling FAM module comprises an upsampling channel attention module and an upsampling spatial attention module connected in sequence, having the same structures as the downsampling channel attention module and the downsampling spatial attention module, respectively.
Specifically, referring to fig. 3, fig. 3 is a schematic diagram of a network structure of a FAM module in an embodiment of the invention.
The FAM module in fig. 3 includes a channel attention module and a spatial attention module connected in sequence, and the channel attention module and the spatial attention module each include an average pooling layer, an ordinary convolution layer and a Fourier convolution layer connected in sequence.
When the FAM module serves as a downsampling FAM module in an encoding block, the channel attention module and the spatial attention module in fig. 3 act as the downsampling channel attention module and downsampling spatial attention module. The input to the downsampling FAM module in each encoding block is the downsampled feature map of the corresponding size. The downsampling channel attention module applies average pooling and convolution to this map and outputs downsampled channel attention weights, which are multiplied element-wise with the corresponding positions of the downsampled feature map (a channel weighting operation) to obtain the downsampled channel attention feature map. The downsampling spatial attention module then receives this map, applies average pooling and convolution, and outputs downsampled spatial attention weights, which are multiplied element-wise with the corresponding positions of the downsampled channel attention feature map (a spatial weighting operation) to obtain the downsampled spatial attention feature map. Adding the downsampled spatial attention feature map to the corresponding downsampled feature map yields the refined feature map output by that layer.
When the FAM module serves as an upsampling FAM module in a decoding block, the channel attention module and the spatial attention module in fig. 3 act as the upsampling channel attention module and upsampling spatial attention module. For every decoding block other than the bottommost one, the inputs to the upsampling FAM module are the skip-connection information and the upsampled feature map of the corresponding size. The upsampling FAM module processes these in the same way as described above and outputs the corresponding upsampled spatial attention feature map, which is not repeated here. Adding the upsampled spatial attention feature map to the corresponding upsampled feature map yields the aggregated feature map output by that layer.
Adopting the channel and spatial attention mechanisms to select the important channel attention and spatial attention feature maps is more conducive to fusing the skip-connection information with the directly input downsampled information, so that the decoder outputs finer segmentation results.
The channel attention module dynamically weights the feature response of each channel, letting the model automatically decide which channels matter more for the current task and thereby improving the expressive power of the features. The spatial attention module dynamically weights the features at different locations, which helps the model focus on different areas of the input data and improves the network's ability to locate and identify object boundaries. This adaptive attention mechanism helps the network better handle features of different scales, abstraction levels and locations, improving the model's generalization.
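To make the FAM structure concrete, here is a minimal PyTorch sketch under stated assumptions: realizing the Fourier convolution as a pointwise convolution over the real-FFT spectrum, the sigmoid gating, and the reductions that turn the branch outputs into per-channel and per-position weights are all illustrative choices, since fig. 3 is not reproduced in this text.

```python
import torch
import torch.nn as nn

class FourierConv3d(nn.Module):
    """Convolution applied in the frequency domain via the FFT (an assumed
    realization of the 'Fourier convolution layer' described above)."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv = nn.Conv3d(2 * ch, 2 * ch, kernel_size=1)  # real+imag parts

    def forward(self, x):
        f = torch.fft.rfftn(x, dim=(-3, -2, -1))           # to frequency domain
        f = self.conv(torch.cat([f.real, f.imag], dim=1))  # mix spectra
        real, imag = f.chunk(2, dim=1)
        return torch.fft.irfftn(torch.complex(real, imag),
                                s=x.shape[-3:], dim=(-3, -2, -1))

class AttentionBranch(nn.Module):
    """Average pooling -> ordinary convolution -> Fourier convolution,
    the shared layer sequence of both FAM attention modules."""
    def __init__(self, ch: int):
        super().__init__()
        self.pool = nn.AvgPool3d(kernel_size=3, stride=1, padding=1)
        self.conv = nn.Conv3d(ch, ch, kernel_size=3, padding=1)
        self.fconv = FourierConv3d(ch)

    def forward(self, x):
        return torch.sigmoid(self.fconv(self.conv(self.pool(x))))

class FAM(nn.Module):
    """Channel attention followed by spatial attention, each weighting the
    input element-wise; the residual addition happens in the enclosing block."""
    def __init__(self, ch: int):
        super().__init__()
        self.channel_branch = AttentionBranch(ch)
        self.spatial_branch = AttentionBranch(ch)

    def forward(self, x):
        # per-channel weights: reduce the branch output over space
        x = x * self.channel_branch(x).mean(dim=(-3, -2, -1), keepdim=True)
        # per-position weights: reduce the branch output over channels
        return x * self.spatial_branch(x).mean(dim=1, keepdim=True)
```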
In one embodiment, the DAM module comprises a deformable convolution network and a gating mechanism connected in sequence, and in S3, inputting the several downsampled feature maps of different sizes into the corresponding DAM modules through the long skip connections and outputting the corresponding skip-connection information specifically comprises:
S31, inputting the several downsampled feature maps of different sizes into the corresponding DAM modules through the long skip connections;
S32, the DAM module receiving the downsampled feature maps of different sizes and performing trilinear interpolation and concatenation to obtain a concatenated feature map;
S33, the deformable convolution network receiving and processing the concatenated feature map to obtain a deformably convolved feature map adapted to various target shapes and positions;
S34, selectively activating or suppressing the information in the deformably convolved feature map through the gating mechanism to obtain its key information, which serves as the skip-connection information.
Specifically, referring to fig. 4, fig. 4 is a schematic diagram of a network structure of a DAM module according to an embodiment of the present invention.
Each DAM module comprises a deformable convolution network and a gating mechanism. The deformable convolution network lets the receptive field adapt to multiple segmentation objects of different sizes and positions; to obtain a larger receptive field, it uses a convolution kernel of size 13x13, which better extracts the complete features of the object of interest and avoids introducing redundant background information while the decoder recovers finer details. Taking the second DAM module of fig. 2 as an example: it receives the downsampled feature maps of different sizes output by the three downsampling modules connected to it (the second, third and fourth downsampling modules of fig. 2), interpolates these three maps to the same size by trilinear interpolation, and concatenates them to obtain a concatenated feature map. The deformable convolution network receives and processes the concatenated feature map so that the receptive field adapts to the multiple objects of different sizes and positions within it, better capturing target details and local structure, and yields a deformably convolved feature map adapted to various target shapes and positions. The gating mechanism (ReLU + Sigmoid) then selectively activates or suppresses the data in this feature map to retain the key information as the corresponding skip-connection information.
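A minimal PyTorch sketch of this flow follows. A 3D deformable convolution is not a stock PyTorch operator, so a plain large-kernel Conv3d stands in for the 13x13 deformable convolution here, and the ReLU + Sigmoid gate is realized with pointwise convolutions; these substitutions, like the channel counts, are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DAM(nn.Module):
    """Trilinear resize + concatenation of multi-scale encoder maps,
    a large-kernel conv standing in for the deformable convolution,
    and a ReLU+Sigmoid gate that keeps only the key information."""
    def __init__(self, in_chs: list[int], out_ch: int):
        super().__init__()
        self.conv = nn.Conv3d(sum(in_chs), out_ch, kernel_size=13, padding=6)
        self.gate = nn.Sequential(
            nn.Conv3d(out_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # resize every map to the spatial size of the shallowest (largest) map
        size = feats[0].shape[-3:]
        ups = [F.interpolate(f, size=size, mode='trilinear', align_corners=False)
               for f in feats]
        y = self.conv(torch.cat(ups, dim=1))  # deformable-conv stand-in
        return y * self.gate(y)               # selective activation/suppression
```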
In one embodiment, S4 specifically includes:
S41, the other decoding blocks respectively receiving the aggregated feature map output by the adjacent lower-layer decoding block and the skip-connection information output by the DAM module of the corresponding layer;
S42, the upsampling module in each of the other decoding blocks receiving and processing the corresponding aggregated feature map to obtain an upsampled feature map of the corresponding size;
S43, the upsampling FAM module in each of the other decoding blocks processing the received skip-connection information through the sequentially connected upsampling channel attention module and upsampling spatial attention module to obtain an upsampled spatial attention feature map;
S44, adding the upsampled feature map output by the upsampling module and the upsampled spatial attention feature map output by the upsampling FAM module element-wise, thereby obtaining the aggregated feature map of the corresponding layer.
Specifically, referring to fig. 2: each encoding block comprises a downsampling module and a downsampling FAM module connected to it, whose outputs are added to form a residual structure; each decoding block comprises an upsampling module and an upsampling FAM module connected to it, whose outputs are likewise added to form a residual structure. The bottommost encoding block is directly connected to the bottommost decoding block, while the encoding and decoding blocks of the other layers are connected through long skip connections provided with DAM modules. Taking the decoder of fig. 2 as an example, the multi-layer decoding blocks process their inputs as follows:
The bottommost decoding block directly receives the downsampled feature map and the refined feature map output by the bottommost encoding block. Its upsampling module (the fourth upsampling module in fig. 2) upsamples the bottommost refined feature map to obtain the bottommost upsampled feature map, and its FAM module (the fourth upsampling FAM module in fig. 2) processes the received bottommost downsampled feature map; combining the two yields the bottommost aggregated feature map. Each of the other decoding blocks receives the aggregated feature map output by the adjacent lower decoding block, which its upsampling module processes into an upsampled feature map of the corresponding size, while its upsampling FAM module receives the corresponding fused skip-connection information and passes it through the sequentially connected upsampling channel attention module and upsampling spatial attention module to obtain the upsampled channel attention feature map and then the upsampled spatial attention feature map. The upsampled feature map output by the upsampling module and the upsampled spatial attention feature map output by the FAM module are added element-wise through the residual structure, giving the aggregated feature map of that layer. The aggregated feature map obtained by each decoding block is input into the adjacent upper decoding block and processed in the same way, and the aggregated feature map output by the topmost decoding block serves as the target image segmented from the three-dimensional medical image.
In one embodiment, a three-dimensional medical image segmentation system based on the adaptive feature fusion network comprises an image acquisition module, a computer system and the adaptive feature fusion network; the image acquisition module is connected to the computer system, and the adaptive feature fusion network is disposed in the computer system, wherein:
the image acquisition module is used for acquiring three-dimensional medical images in a target scene in real time and sending the three-dimensional medical images to the computer system;
the adaptive feature fusion network in the computer system adopts the three-dimensional medical image segmentation method based on the adaptive feature fusion network to segment the target image from the three-dimensional medical image to be detected.
Specifically, referring to fig. 5, fig. 5 is a schematic system structure diagram of a three-dimensional medical image segmentation system based on an adaptive feature fusion network according to an embodiment of the present invention.
The three-dimensional medical image segmentation system based on the adaptive feature fusion network shown in fig. 5 comprises an image acquisition module 1, a computer system 2 and an adaptive feature fusion network 3. The image acquisition module 1 is connected with the computer system 2 and inputs the acquired three-dimensional medical image to be detected into the computer system 2; the adaptive feature fusion network 3 is arranged in the computer system 2, receives and processes the three-dimensional medical image to be detected, and outputs the target image segmented from it.
For specific limitations of the three-dimensional medical image segmentation system based on the adaptive feature fusion network, reference may be made to the above limitations of the corresponding segmentation method, which are not repeated here.
Further, to verify the effectiveness of the proposed method and system, fig. 6 compares the target images segmented from a three-dimensional medical image by other methods and by the three-dimensional medical image segmentation method based on the adaptive feature fusion network proposed by the invention.
The "original picture" in fig. 6 is listed as a three-dimensional medical image to be detected, the medical image is suspected to contain a target image, the "label" in fig. 6 is listed as a true value label of a target object, the "our method" in fig. 6 is listed as an image of the target object segmented from the three-dimensional medical image to be detected by adopting the three-dimensional medical image segmentation method based on the adaptive feature fusion network proposed by the present invention, and the "nnFormer", "UNETR", "CPFNet", "transutet", "attrenet", "une++" in fig. 6 respectively represent the columns of the image of the target object segmented from the three-dimensional medical image to be detected by adopting the corresponding method, for example, "nnFormer" represents the column of the image of the target object segmented from the three-dimensional medical image to be detected by adopting the nnFormer method.
Further, the images of the target object segmented from the three-dimensional medical image by the other methods and by the proposed three-dimensional medical image segmentation method based on the adaptive feature fusion network are compared using the Dice similarity coefficient; the comparison data are shown in Table 1.
TABLE 1 Comparison of Dice similarity coefficients
Comparing the target images segmented from the three-dimensional medical image to be detected by the different methods shows that the proposed three-dimensional medical image segmentation method based on the adaptive feature fusion network (the "our method" entries in Table 1) segments the target image more effectively.
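For reference, the Dice similarity coefficient reported in Table 1 is DSC = 2|A∩B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal implementation might look as follows; the smoothing term eps is a common convention, not taken from the patent.

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """Dice similarity coefficient for binary masks:
    DSC = 2*|pred & target| / (|pred| + |target|)."""
    pred = pred.float().flatten()
    target = target.float().flatten()
    intersection = (pred * target).sum()
    return (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
```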
In the three-dimensional medical image segmentation method and system based on the adaptive feature fusion network, the network is first constructed, comprising an encoder, a decoder and long skip connections. A three-dimensional medical image is then acquired and input into the encoder, where several layers of encoding blocks downsample it stage by stage and correspondingly output several downsampled feature maps of different scales together with refined feature maps. The downsampled and refined feature maps output by the bottommost encoding block are input directly into the bottommost decoding block for upsampling, yielding the bottommost upsampled and aggregated feature maps; the downsampled feature maps output by the other encoding blocks are input into the corresponding DAM modules, fused through a gating mechanism, and output as the corresponding fused skip-connection information. The other decoding blocks then respectively receive the aggregated feature map output by the adjacent lower-layer decoding block and aggregate it with the corresponding fused skip-connection information to obtain the aggregated feature map of the corresponding layer; the aggregated feature map output by the topmost decoding block is the target image segmented from the three-dimensional medical image. By introducing the DAM and FAM modules into a U-shaped network composed of residual modules, an adaptive feature fusion network is formed: the DAM module comprises a deformable convolution layer and a gating mechanism, the deformable convolution layer letting the receptive field adapt to multiple segmentation objects of different sizes and positions and the gating mechanism fusing multi-level information while retaining the most critical information; the FAM module adopts sequentially connected channel attention and spatial attention modules, adaptively and selectively emphasizing different channels and spatial positions to improve network performance, and its Fourier convolution layer performs convolution via the fast Fourier transform, extracting low-frequency information in the frequency domain to complement the time-domain information, with lower algorithmic complexity than ordinary convolution.
The three-dimensional medical image segmentation method and system based on the adaptive feature fusion network provided by the invention have been described in detail above. The principles and embodiments of the invention are explained herein with specific examples, which are intended only to help understand the core concept of the invention. It should be noted that those skilled in the art can make various modifications and adaptations to the invention without departing from its principles, and such modifications and adaptations fall within the scope of the appended claims.

Claims (9)

1. A three-dimensional medical image segmentation method based on an adaptive feature fusion network, characterized by comprising the following steps:
S1, constructing an adaptive feature fusion network, wherein the adaptive feature fusion network comprises an encoder, a decoder and long skip connections, a DAM module is arranged on each long skip connection, the encoder comprises several layers of sequentially connected encoding blocks, the decoder comprises the same number of sequentially connected decoding blocks, the bottommost encoding block among the several layers of encoding blocks is directly connected with the bottommost decoding block among the several layers of decoding blocks, and the encoding blocks are further respectively connected with the other decoding blocks through the long skip connections provided with the DAM modules;
S2, acquiring a three-dimensional medical image, inputting it into the encoder, encoding it stage by stage with the several layers of encoding blocks, and correspondingly outputting several downsampled feature maps of different sizes and refined feature maps;
S3, directly inputting the bottommost refined feature map and the bottommost downsampled feature map output by the bottommost encoding block into the bottommost decoding block for processing to obtain the bottommost aggregated feature map, inputting the several downsampled feature maps of different sizes into the corresponding DAM modules through the long skip connections for processing, and outputting the corresponding skip-connection information;
S4, the other decoding blocks among the several layers of decoding blocks respectively receiving the aggregated feature map output by the adjacent lower-layer decoding block, and receiving and processing the skip-connection information output by the DAM module of the corresponding layer, to obtain the aggregated feature map of the corresponding layer;
S5, taking the aggregated feature map output by the topmost decoding block among the several layers of decoding blocks as the target image segmented from the three-dimensional medical image.
2. The three-dimensional medical image segmentation method based on the adaptive feature fusion network according to claim 1, wherein each of the several layers of the coding blocks in S1 includes a downsampling module and a downsampling FAM module that are sequentially connected, and each of the several layers of the decoding blocks includes an upsampling module and an upsampling FAM module that are sequentially connected.
3. The three-dimensional medical image segmentation method based on the adaptive feature fusion network according to claim 2, wherein the downsampling module and the upsampling module each comprise a 3D convolution layer, a batch normalization layer and a pralu activation function layer, which are sequentially connected.
4. The three-dimensional medical image segmentation method based on the self-adaptive feature fusion network according to claim 3, wherein the downsampling FAM module comprises a downsampling channel attention module and a downsampling spatial attention module connected in sequence; the downsampling channel attention module receives and processes the downsampled feature map output by the corresponding downsampling module to obtain a downsampling channel attention feature map, and the downsampling spatial attention module receives and processes the downsampling channel attention feature map to obtain a downsampling spatial attention feature map.
5. The three-dimensional medical image segmentation method based on the self-adaptive feature fusion network according to claim 4, wherein the downsampling channel attention module and the downsampling spatial attention module each comprise an average pooling layer, an ordinary convolution layer and a Fourier convolution layer connected in sequence.
6. The three-dimensional medical image segmentation method based on the self-adaptive feature fusion network according to claim 5, wherein the upsampling FAM module comprises an upsampling channel attention module and an upsampling spatial attention module connected in sequence, the upsampling channel attention module and the upsampling spatial attention module having the same structures as the downsampling channel attention module and the downsampling spatial attention module, respectively.
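Claims 4 to 6 fix only the layer order inside each attention module (average pooling, ordinary convolution, Fourier convolution); how the Fourier convolution is realized is not spelled out. The sketch below is one hedged reading in PyTorch: it approximates the Fourier convolution as a pointwise convolution applied in the real-FFT domain and uses the module's output as a multiplicative attention map; all kernel sizes are assumptions.

    import torch
    import torch.nn as nn

    class FourierConv3d(nn.Module):
        # assumed realization of the "Fourier convolution layer": a 1x1x1
        # convolution over the stacked real/imaginary parts of the spectrum
        def __init__(self, ch):
            super().__init__()
            self.conv = nn.Conv3d(2 * ch, 2 * ch, kernel_size=1)
        def forward(self, x):
            spec = torch.fft.rfftn(x, dim=(-3, -2, -1))
            z = self.conv(torch.cat([spec.real, spec.imag], dim=1))
            re, im = torch.chunk(z, 2, dim=1)
            return torch.fft.irfftn(torch.complex(re, im),
                                    s=x.shape[-3:], dim=(-3, -2, -1))

    class FAMAttention(nn.Module):
        # claim 5's layer order: average pooling -> ordinary convolution ->
        # Fourier convolution; the result re-weights the input feature map
        def __init__(self, ch):
            super().__init__()
            self.pool = nn.AvgPool3d(kernel_size=3, stride=1, padding=1)
            self.conv = nn.Conv3d(ch, ch, kernel_size=3, padding=1)
            self.fourier = FourierConv3d(ch)
        def forward(self, x):
            return x * torch.sigmoid(self.fourier(self.conv(self.pool(x))))

    class FAM(nn.Module):
        # claims 4 and 6: channel attention followed by spatial attention
        def __init__(self, ch):
            super().__init__()
            self.channel_att = FAMAttention(ch)
            self.spatial_att = FAMAttention(ch)
        def forward(self, x):
            return self.spatial_att(self.channel_att(x))

In this reading the two branches share a skeleton, since claim 5 assigns both the same layer sequence; FAM(32)(torch.randn(1, 32, 8, 16, 16)) preserves the input shape, which is what allows the element-wise aggregation of S44 later on.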
7. The three-dimensional medical image segmentation method based on the self-adaptive feature fusion network according to claim 6, wherein the DAM module comprises a deformable convolutional network and a gating mechanism connected in sequence, and in S3, inputting the several downsampled feature maps of different sizes, through the long skip connections, into their corresponding DAM modules for processing and outputting the corresponding skip connection information specifically comprises:
S31, inputting the several downsampled feature maps of different sizes into their corresponding DAM modules through the long skip connections;
S32, the DAM module receives the downsampled feature maps of different sizes, interpolates them by trilinear interpolation and concatenates them to obtain a concatenated feature map;
S33, the deformable convolutional network receives and processes the concatenated feature map to obtain a deformably convolved feature map that adapts to diverse target shapes and positions;
and S34, the gating mechanism selectively activates or suppresses the information in the deformably convolved feature map to obtain its key information, which serves as the skip connection information.
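A hedged sketch of the S31 to S34 data flow follows. Torchvision ships only a 2D deformable convolution, so a plain Conv3d stands in for the 3D deformable convolutional network of S33 and is labelled as such; the trilinear interpolation, concatenation and sigmoid gating follow the claimed steps, with channel counts and target size left as caller-supplied assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DAM(nn.Module):
        def __init__(self, in_chs, out_ch):
            super().__init__()
            # stand-in for the deformable convolutional network of S33
            self.deform_stub = nn.Conv3d(sum(in_chs), out_ch, 3, padding=1)
            # gating mechanism of S34
            self.gate = nn.Sequential(nn.Conv3d(out_ch, out_ch, 1), nn.Sigmoid())

        def forward(self, feats, size):
            # S32: trilinear interpolation to a common size, then concatenation
            up = [F.interpolate(f, size=size, mode='trilinear',
                                align_corners=False) for f in feats]
            y = self.deform_stub(torch.cat(up, dim=1))  # S33 (stand-in)
            return y * self.gate(y)                     # S34: gated key info

For example, DAM(in_chs=(16, 32), out_ch=32) applied to feature maps of shapes (1, 16, 32, 64, 64) and (1, 32, 16, 32, 32) with size=(16, 32, 32) yields skip connection information of shape (1, 32, 16, 32, 32).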
8. The three-dimensional medical image segmentation method based on the self-adaptive feature fusion network according to claim 7, wherein S4 specifically comprises:
S41, each of the other decoding blocks among the several layers of decoding blocks receives the aggregated feature map output by its adjacent lower-layer decoding block and the skip connection information output by its corresponding DAM module;
S42, the upsampling module in each of the other decoding blocks receives and processes the corresponding aggregated feature map to obtain an upsampled feature map of the corresponding size;
S43, the upsampling FAM module in each of the other decoding blocks processes the received skip connection information through the sequentially connected upsampling channel attention module and upsampling spatial attention module to obtain an upsampling spatial attention feature map;
and S44, adding, element by element, the upsampled feature map output by the upsampling module and the upsampling spatial attention feature map output by the upsampling FAM module, thereby obtaining the aggregated feature map of the corresponding layer.
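The only constraint S44 places on its two inputs is shape agreement, since the aggregation is a plain element-wise addition; a minimal worked example (all sizes illustrative):

    import torch

    upsampled = torch.randn(1, 32, 16, 32, 32)  # S42: upsampling module output
    skip_attn = torch.randn(1, 32, 16, 32, 32)  # S43: upsampling FAM output
    aggregated = upsampled + skip_attn          # S44: element-wise addition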
9. A three-dimensional medical image segmentation system based on a self-adaptive feature fusion network, which segments a target image from a three-dimensional medical image to be detected by means of the three-dimensional medical image segmentation method based on the self-adaptive feature fusion network according to any one of claims 1 to 8, wherein the segmentation system comprises an image acquisition module, a computer system and the self-adaptive feature fusion network, the image acquisition module being connected with the computer system and the self-adaptive feature fusion network being deployed in the computer system, wherein:
the image acquisition module is used for acquiring three-dimensional medical images of a target scene in real time and sending them to the computer system; and
the self-adaptive feature fusion network in the computer system processes the three-dimensional medical image using the method according to any one of claims 1 to 8 and segments the target image from the three-dimensional medical image.
CN202311313587.7A 2023-10-11 2023-10-11 Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network Active CN117058160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311313587.7A CN117058160B (en) 2023-10-11 2023-10-11 Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network

Publications (2)

Publication Number Publication Date
CN117058160A true CN117058160A (en) 2023-11-14
CN117058160B CN117058160B (en) 2024-01-16

Family

ID=88664802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311313587.7A Active CN117058160B (en) 2023-10-11 2023-10-11 Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network

Country Status (1)

Country Link
CN (1) CN117058160B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685813A (en) * 2018-12-27 2019-04-26 江西理工大学 A kind of U-shaped Segmentation Method of Retinal Blood Vessels of adaptive scale information
CN112767406A (en) * 2021-02-02 2021-05-07 苏州大学 Deep convolution neural network suitable for corneal ulcer segmentation of fluorescence staining slit lamp image
CN113724262A (en) * 2021-08-12 2021-11-30 苏州大学 CNV segmentation method in retina OCT image
CN114419060A (en) * 2021-12-01 2022-04-29 华南理工大学 Skin mirror image segmentation method and system
US20230274531A1 (en) * 2022-02-09 2023-08-31 Soochow University Global and local feature reconstruction network-based medical image segmentation method
CN115457043A (en) * 2022-03-23 2022-12-09 苏州迭代智能医疗科技有限公司 Image segmentation network based on overlapped self-attention deformer framework U-shaped network
CN115661144A (en) * 2022-12-15 2023-01-31 湖南工商大学 Self-adaptive medical image segmentation method based on deformable U-Net
CN116433914A (en) * 2023-04-28 2023-07-14 齐鲁工业大学(山东省科学院) Two-dimensional medical image segmentation method and system
CN116309648A (en) * 2023-05-14 2023-06-23 哈尔滨理工大学 Medical image segmentation model construction method based on multi-attention fusion

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HAO SUN ET AL.: "Deformable Attention U-Shaped Network with Progressively Supervised Learning for Subarachnoid Hemorrhage Image Segmentation", 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 1734-1739 *
NARINDER SINGH PUNN ET AL.: "RCA-IUnet: a residual cross-spatial attention-guided inception U-Net model for tumor segmentation in breast ultrasound imaging", Machine Vision and Applications, pages 1-10 *
SHUNJIE DONG ET AL.: "DeU-Net: Deformable U-Net for 3D Cardiac MRI Video Segmentation", arXiv.org, pages 1-10 *
YUE ZHANG ET AL.: "Multi-Modal Tumor Segmentation With Deformable Aggregation and Uncertain Region Inpainting", IEEE Transactions on Medical Imaging, vol. 42, pages 3091-3103, XP011950323, DOI: 10.1109/TMI.2023.3275592 *
LI FUHAO: "Segmentation of Nasal Cavity and Paranasal Sinus Tumor Images Based on Convolutional Neural Networks", China Masters' Theses Full-text Database, Medicine and Health Sciences *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474044A (en) * 2023-12-28 2024-01-30 感跃医疗科技(成都)有限公司 Tooth CBCT image segmentation network based on Flowformer and gated attention

Also Published As

Publication number Publication date
CN117058160B (en) 2024-01-16

Similar Documents

Publication Publication Date Title
CN111275637B (en) Attention model-based non-uniform motion blurred image self-adaptive restoration method
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN111754446A (en) Image fusion method, system and storage medium based on generation countermeasure network
CN113870422B (en) Point cloud reconstruction method, device, equipment and medium
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN117058160B (en) Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network
CN113392711B (en) Smoke semantic segmentation method and system based on high-level semantics and noise suppression
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
CN111899203B (en) Real image generation method based on label graph under unsupervised training and storage medium
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN110738663A (en) Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method
CN111951164B (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN112767466A (en) Light field depth estimation method based on multi-mode information
CN115620010A (en) Semantic segmentation method for RGB-T bimodal feature fusion
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN113538246A (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
CN113393457A (en) Anchor-frame-free target detection method combining residual dense block and position attention
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
CN115578262A (en) Polarization image super-resolution reconstruction method based on AFAN model
CN114022356A (en) River course flow water level remote sensing image super-resolution method and system based on wavelet domain
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
CN112418229A (en) Unmanned ship marine scene image real-time segmentation method based on deep learning
CN116311052A (en) Crowd counting method and device, electronic equipment and storage medium
CN116485654A (en) Lightweight single-image super-resolution reconstruction method combining convolutional neural network and transducer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant