CN114529562A - Medical image segmentation method based on auxiliary learning task and re-segmentation constraint - Google Patents

Medical image segmentation method based on auxiliary learning task and re-segmentation constraint

Info

Publication number
CN114529562A
Authority
CN
China
Prior art keywords
layer
convolution
segmentation
block
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210162154.5A
Other languages
Chinese (zh)
Inventor
屈磊
周文琼
吴军
欧阳磊
陶在洋
尚宏伟
赵婧雨
洪思成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University
Priority to CN202210162154.5A
Publication of CN114529562A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10088Magnetic resonance imaging [MRI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30016Brain
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Abstract

The invention relates to a medical image segmentation method based on an auxiliary learning task and re-segmentation constraint, which comprises the following steps in sequence: (1) preprocessing three-dimensional human brain nuclear magnetic resonance data to obtain a training set and a test set; (2) constructing a segmentation network based on an auxiliary learning task and re-segmentation constraint; (3) inputting the training set into the segmentation network for training to obtain a trained segmentation network; (4) inputting the test set into the trained segmentation network and outputting the segmentation result. By introducing an additional image reconstruction task branch, the method helps the segmentation network learn complementary medical image features, which helps the model better understand the internal structure of medical images; the reconstruction result is fed into the segmentation network again, the resulting re-segmentation result is compared with the ground-truth segmentation map, and an additional supervision signal is provided at the semantic level for training the segmentation network, further improving the accuracy of the image segmentation results.

Description

Medical image segmentation method based on auxiliary learning task and re-segmentation constraint
Technical Field
The invention relates to the technical field of medical image segmentation, in particular to a medical image segmentation method based on an auxiliary learning task and re-segmentation constraint.
Background
In recent years, with the rapid development of artificial intelligence technology, computer vision has achieved very high recognition performance on natural images and has also received wide attention in the field of medical image segmentation. Generally, the purpose of segmenting a medical image is to make a human tissue structure or a pathological structure clearer and more intuitive, or to model the relevant tissues from the segmentation result for subsequent computer-aided diagnosis. However, medical image data differ somewhat from natural images: in addition to two-dimensional data, image data based on MRI or CT are generally three-dimensional volumes covering the entire scanned organ. In terms of image content, the boundaries of objects in natural images are relatively distinct, whereas medical images depict human tissue structures acquired by specialized imaging instruments, in which tissue edge contours may be insufficiently clear and intensity variations may be complex.
Currently, with the rapid iteration of deep learning algorithms, researchers have made a series of improvements to natural image segmentation models and applied them to the field of medical image segmentation; compared with traditional medical image segmentation methods, the segmentation accuracy is significantly improved. As a result, traditional medical image segmentation approaches are gradually being replaced by deep learning approaches. Deep learning does not require features to be extracted manually as in traditional methods and avoids the variability introduced by hand-crafted prior knowledge, and it has therefore shown excellent performance in the field of medical image segmentation. Against the background of growing demand for intelligent medical tasks, existing deep-learning-based medical image segmentation methods usually require training on large-scale labeled data, while medical image datasets are much smaller in scale than general-purpose datasets, so existing medical image segmentation models often struggle to fully extract discriminative features for segmentation. These limitations mean that existing deep-learning-based medical image segmentation work still has room for improvement in segmentation accuracy.
Disclosure of Invention
The invention aims to provide a medical image segmentation method based on an auxiliary learning task and re-segmentation constraint, which improves the segmentation accuracy of the main segmentation task by constructing an auxiliary image reconstruction task, and further constrains the network during the model training stage by segmenting the reconstructed image a second time, so as to further improve the accuracy of the segmentation results.
In order to achieve the purpose, the invention adopts the following technical scheme: a medical image segmentation method based on auxiliary learning task and re-segmentation constraint comprises the following steps:
(1) preprocessing three-dimensional human brain nuclear magnetic resonance data to obtain a training set and a test set;
(2) constructing a segmentation network based on an auxiliary learning task and re-segmentation constraints;
(3) inputting the training set into a segmentation network for training to obtain a trained segmentation network;
(4) and inputting the test set into the trained segmentation network, and outputting the segmentation network to obtain a segmentation result.
The step (1) specifically comprises the following steps:
(2a) the three-dimensional human brain nuclear magnetic resonance data comprise four modalities: T1, T1c, T2 and FLAIR; the three-dimensional human brain nuclear magnetic resonance data of the four modalities are combined, the original size of each modality being 240 × 240 × 155, to generate four-channel three-dimensional data of size 4 × 240 × 240 × 155, where 4 represents the number of modalities, 155 represents the number of two-dimensional slices contained in each three-dimensional human brain nuclear magnetic resonance volume, and 240 × 240 represents the height and width of the image, respectively;
(2b) converting the merged three-dimensional human brain image data from nii format to numpy format;
(2c) carrying out normalization processing on the converted data by adopting a zero-mean normalization method;
(2d) randomly dividing the normalized images into a training set and a test set at a ratio of 7:3;
(2e) randomly cropping the training set, resulting in training set data of size 4 × 128 × 128 × 128.
In step (2), the segmentation network comprises a first coding module, a second coding module, a first decoding module, a second decoding module and a third decoding module;
the first coding module and the second coding module each consist of four convolution blocks and three maximum pooling downsampling layers, the four convolution blocks being a first convolution block, a second convolution block, a third convolution block and a fourth convolution block; the first convolution block comprises a first convolution layer, a first batch normalization layer, a first rectified linear unit (ReLU) activation layer, a second convolution layer, a second batch normalization layer and a second ReLU activation layer; the second convolution block comprises a third convolution layer, a first batch normalization layer, a first ReLU activation layer, a fourth convolution layer, a second batch normalization layer and a second ReLU activation layer; the third convolution block comprises a fifth convolution layer, a first batch normalization layer, a first ReLU activation layer, a sixth convolution layer, a second batch normalization layer and a second ReLU activation layer; the fourth convolution block comprises a seventh convolution layer, a first batch normalization layer, a first ReLU activation layer, an eighth convolution layer, a second batch normalization layer and a second ReLU activation layer; the three maximum pooling downsampling layers are a first maximum pooling downsampling layer, a second maximum pooling downsampling layer and a third maximum pooling downsampling layer;
the first decoding module, the second decoding module and the third decoding module each consist of three deconvolution blocks and three upsampling layers, the three deconvolution blocks being a first deconvolution block, a second deconvolution block and a third deconvolution block; the first deconvolution block comprises a ninth convolution layer, a third batch normalization layer, a third ReLU activation layer, a tenth convolution layer, a fourth batch normalization layer and a fourth ReLU activation layer; the second deconvolution block comprises an eleventh convolution layer, a third batch normalization layer, a third ReLU activation layer, a twelfth convolution layer, a fourth batch normalization layer and a fourth ReLU activation layer; the third deconvolution block comprises a thirteenth convolution layer, a third batch normalization layer, a third ReLU activation layer, a fourteenth convolution layer, a fourth batch normalization layer, a fourth ReLU activation layer and a fifteenth convolution layer; the three upsampling layers are a first upsampling layer, a second upsampling layer and a third upsampling layer.
The step (3) specifically comprises the following steps:
(3a) the training set is input into the first coding module sequentially in batches, and the first coding module encodes the input data to obtain a first feature map;
(3b) inputting the first feature map into the first decoding module and the second decoding module in parallel to carry out forward propagation of the segmentation network, wherein the first decoding module outputs a reconstruction result and the second decoding module outputs a segmentation result;
(3c) inputting the reconstruction result into the second coding module to obtain a second feature map;
(3d) inputting the second feature map into the third decoding module to carry out forward propagation of the network and obtain a re-segmentation result;
(3e) comparing the segmentation result with the corresponding ground-truth segmentation map and calculating the segmentation loss with the Dice loss function; comparing the re-segmentation result with the corresponding ground-truth segmentation map and calculating the re-segmentation loss with the Dice loss function, wherein the Dice loss function is calculated as:

L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)

where X is the ground-truth segmentation map, and Y is the segmentation result when calculating the segmentation loss and the re-segmentation result when calculating the re-segmentation loss; comparing the reconstruction result obtained in step (3b) with the training set data to be segmented that was input into the segmentation network, and calculating the reconstruction loss with a cross-entropy loss function;
(3f) weighting and summing the segmentation loss, the re-segmentation loss and the reconstruction loss obtained in the step (3e) to obtain a total loss result, and performing back propagation to train the segmentation network by using a gradient descent algorithm;
(3g) and obtaining the trained segmentation network after the number of training iterations of the segmentation network reaches the preset number of training iterations.
The convolution kernel size of the first convolution layer is 3 × 3 × 3, and the number of convolution kernels is 32; the convolution kernel size of the second convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the third convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fourth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the fifth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the sixth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the seventh convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the eighth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 512;
the first maximum pooling downsampling layer, the second maximum pooling downsampling layer and the third maximum pooling downsampling layer all have a size of 2 × 2 × 2;
the convolution kernel size of the ninth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the tenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the eleventh convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the twelfth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the thirteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fourteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fifteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 4.
The first convolution block of the first coding module serves as the input of the segmentation network. Within the first coding module, the output of the first convolution block is fed to the first maximum pooling downsampling layer, whose output is fed to the second convolution block; the output of the second convolution block is fed to the second maximum pooling downsampling layer, whose output is fed to the third convolution block; the output of the third convolution block is fed to the third maximum pooling downsampling layer, whose output is fed to the fourth convolution block. The output of the fourth convolution block of the first coding module is fed in parallel to the first upsampling layer of the first decoding module and the first upsampling layer of the second decoding module. The output of the first upsampling layer of the first decoding module is concatenated with the output of the third convolution block of the first coding module to obtain a first concatenation result, and the output of the first upsampling layer of the second decoding module is concatenated with the output of the third convolution block of the first coding module to obtain a second concatenation result; the first concatenation result is fed to the first deconvolution block of the first decoding module, and the second concatenation result is fed to the first deconvolution block of the second decoding module. The output of the first deconvolution block of the first decoding module is fed to the second upsampling layer of the first decoding module, and the output of the first deconvolution block of the second decoding module is fed to the second upsampling layer of the second decoding module. The output of the second upsampling layer of the first decoding module is concatenated with the output of the second convolution block of the first coding module to obtain a third concatenation result, and the output of the second upsampling layer of the second decoding module is concatenated with the output of the second convolution block of the first coding module to obtain a fourth concatenation result; the third concatenation result is fed to the second deconvolution block of the first decoding module, and the fourth concatenation result is fed to the second deconvolution block of the second decoding module. The output of the second deconvolution block of the first decoding module is fed to the third upsampling layer of the first decoding module, and the output of the second deconvolution block of the second decoding module is fed to the third upsampling layer of the second decoding module. The output of the third upsampling layer of the first decoding module is concatenated with the output of the first convolution block of the first coding module to obtain a fifth concatenation result, and the output of the third upsampling layer of the second decoding module is concatenated with the output of the first convolution block of the first coding module to obtain a sixth concatenation result; the fifth concatenation result is fed to the third deconvolution block of the first decoding module, and the sixth concatenation result is fed to the third deconvolution block of the second decoding module. The first decoding module outputs the reconstruction result, and the second decoding module outputs the segmentation result.
The reconstruction result is fed to the first convolution block of the second coding module. Within the second coding module, the output of the first convolution block is fed to the first maximum pooling downsampling layer, whose output is fed to the second convolution block; the output of the second convolution block is fed to the second maximum pooling downsampling layer, whose output is fed to the third convolution block; the output of the third convolution block is fed to the third maximum pooling downsampling layer, whose output is fed to the fourth convolution block. The output of the fourth convolution block of the second coding module is fed to the first upsampling layer of the third decoding module. The output of the first upsampling layer of the third decoding module is concatenated with the output of the third convolution block of the second coding module, and the concatenation result is fed to the first deconvolution block of the third decoding module. The output of the first deconvolution block of the third decoding module is fed to the second upsampling layer of the third decoding module, whose output is concatenated with the output of the second convolution block of the second coding module; this concatenation result is fed to the second deconvolution block of the third decoding module. The output of the second deconvolution block of the third decoding module is fed to the third upsampling layer of the third decoding module, whose output is concatenated with the output of the first convolution block of the second coding module; this concatenation result is fed to the third deconvolution block of the third decoding module, which outputs the re-segmentation result.
According to the above technical scheme, the beneficial effects of the invention are as follows: first, by introducing an additional image reconstruction task branch, the method helps the segmentation network learn complementary medical image features, thereby helping the model better understand the internal structure of medical images; second, the reconstruction result is fed into the segmentation network again, the resulting re-segmentation result is compared with the ground-truth segmentation map, and an additional supervision signal is provided at the semantic level for training the segmentation network, thereby further improving the accuracy of the image segmentation results.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
fig. 2 is a schematic structural diagram of a segmentation network according to the present invention.
Detailed Description
As shown in fig. 1, a medical image segmentation method based on an auxiliary learning task and re-segmentation constraint includes the following steps:
(1) preprocessing three-dimensional human brain nuclear magnetic resonance data to obtain a training set and a test set;
(2) constructing a segmentation network based on an auxiliary learning task and re-segmentation constraints;
(3) inputting the training set into a segmentation network for training to obtain a trained segmentation network;
(4) and inputting the test set into the trained segmentation network, and outputting the segmentation network to obtain a segmentation result.
The step (1) specifically comprises the following steps:
(2a) the three-dimensional human brain nuclear magnetic resonance data comprise four modalities: T1, T1c, T2 and FLAIR; the three-dimensional human brain nuclear magnetic resonance data of the four modalities are combined, the original size of each modality being 240 × 240 × 155, to generate four-channel three-dimensional data of size 4 × 240 × 240 × 155, where 4 represents the number of modalities, 155 represents the number of two-dimensional slices contained in each three-dimensional human brain nuclear magnetic resonance volume, and 240 × 240 represents the height and width of the image, respectively;
(2b) converting the merged three-dimensional human brain image data from nii format to numpy format;
(2c) carrying out normalization processing on the converted data by adopting a zero-mean normalization method;
(2d) randomly dividing the normalized images into a training set and a test set at a ratio of 7:3;
(2e) randomly cropping the training set, resulting in training set data of size 4 × 128 × 128 × 128.
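The preprocessing steps (2a)-(2e) above can be illustrated with the following minimal sketch. The nibabel and numpy libraries, the BraTS-style file suffixes and directory layout, the per-modality non-zero-voxel statistics, and the cubic 128-voxel crop are assumptions rather than details given in this description.

```python
# Illustrative sketch of steps (2a)-(2e); not the patent's implementation.
import glob
import os

import nibabel as nib
import numpy as np


def load_case(case_dir):
    """(2a)+(2b): stack the four modalities into a 4 x 240 x 240 x 155 numpy array."""
    suffixes = ["t1", "t1ce", "t2", "flair"]  # assumed BraTS file suffixes
    volumes = []
    for s in suffixes:
        path = glob.glob(os.path.join(case_dir, f"*_{s}.nii.gz"))[0]
        volumes.append(nib.load(path).get_fdata().astype(np.float32))
    return np.stack(volumes, axis=0)


def z_score_normalize(volume):
    """(2c): zero-mean normalization, here applied per modality over non-zero voxels."""
    out = np.zeros_like(volume)
    for c in range(volume.shape[0]):
        channel = volume[c]
        mask = channel > 0
        mean, std = channel[mask].mean(), channel[mask].std() + 1e-8
        out[c][mask] = (channel[mask] - mean) / std
    return out


def random_crop(volume, label, size=(128, 128, 128)):
    """(2e): random crop of image and label to 4 x 128 x 128 x 128."""
    _, h, w, d = volume.shape
    y = np.random.randint(0, h - size[0] + 1)
    x = np.random.randint(0, w - size[1] + 1)
    z = np.random.randint(0, d - size[2] + 1)
    sl = (slice(y, y + size[0]), slice(x, x + size[1]), slice(z, z + size[2]))
    return volume[(slice(None),) + sl], label[sl]


# (2d): random 7:3 split of the cases into a training set and a test set
cases = sorted(glob.glob("BraTS2018/*/"))  # assumed dataset location
np.random.shuffle(cases)
split = int(0.7 * len(cases))
train_cases, test_cases = cases[:split], cases[split:]
```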
In step (2), as shown in fig. 2, the segmentation network comprises a first coding module, a second coding module, a first decoding module, a second decoding module and a third decoding module;
the first coding module and the second coding module each consist of four convolution blocks and three maximum pooling downsampling layers, the four convolution blocks being a first convolution block, a second convolution block, a third convolution block and a fourth convolution block; the first convolution block comprises a first convolution layer, a first batch normalization layer, a first rectified linear unit (ReLU) activation layer, a second convolution layer, a second batch normalization layer and a second ReLU activation layer; the second convolution block comprises a third convolution layer, a first batch normalization layer, a first ReLU activation layer, a fourth convolution layer, a second batch normalization layer and a second ReLU activation layer; the third convolution block comprises a fifth convolution layer, a first batch normalization layer, a first ReLU activation layer, a sixth convolution layer, a second batch normalization layer and a second ReLU activation layer; the fourth convolution block comprises a seventh convolution layer, a first batch normalization layer, a first ReLU activation layer, an eighth convolution layer, a second batch normalization layer and a second ReLU activation layer; the three maximum pooling downsampling layers are a first maximum pooling downsampling layer, a second maximum pooling downsampling layer and a third maximum pooling downsampling layer;
the first decoding module, the second decoding module and the third decoding module each consist of three deconvolution blocks and three upsampling layers, the three deconvolution blocks being a first deconvolution block, a second deconvolution block and a third deconvolution block; the first deconvolution block comprises a ninth convolution layer, a third batch normalization layer, a third ReLU activation layer, a tenth convolution layer, a fourth batch normalization layer and a fourth ReLU activation layer; the second deconvolution block comprises an eleventh convolution layer, a third batch normalization layer, a third ReLU activation layer, a twelfth convolution layer, a fourth batch normalization layer and a fourth ReLU activation layer; the third deconvolution block comprises a thirteenth convolution layer, a third batch normalization layer, a third ReLU activation layer, a fourteenth convolution layer, a fourth batch normalization layer, a fourth ReLU activation layer and a fifteenth convolution layer; the three upsampling layers are a first upsampling layer, a second upsampling layer and a third upsampling layer.
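The coding modules described above can be sketched in PyTorch as follows. Only a minimal sketch: the class names, the padding choice, and the grouping into a reusable ConvBlock are illustrative assumptions; the layer types, the 3 × 3 × 3 kernels, the 2 × 2 × 2 max pooling and the channel widths (32/64, 64/128, 128/256, 256/512) follow the description.

```python
# Hedged PyTorch sketch of one coding module (encoder); not the patent's code.
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Two (Conv3d -> BatchNorm3d -> ReLU) stages, as in each convolution block."""

    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, mid_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(mid_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class Encoder(nn.Module):
    """Four convolution blocks separated by three 2x2x2 max-pooling layers."""

    def __init__(self, in_ch=4):
        super().__init__()
        self.block1 = ConvBlock(in_ch, 32, 64)
        self.block2 = ConvBlock(64, 64, 128)
        self.block3 = ConvBlock(128, 128, 256)
        self.block4 = ConvBlock(256, 256, 512)
        self.pool = nn.MaxPool3d(kernel_size=2)

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(self.pool(f1))
        f3 = self.block3(self.pool(f2))
        f4 = self.block4(self.pool(f3))
        return f1, f2, f3, f4  # skip features plus the bottleneck feature map
```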
The step (3) specifically comprises the following steps:
(3a) the training set is input into the first coding module sequentially in batches, and the first coding module encodes the input data to obtain a first feature map;
(3b) inputting the first feature map into the first decoding module and the second decoding module in parallel to carry out forward propagation of the segmentation network, wherein the first decoding module outputs a reconstruction result and the second decoding module outputs a segmentation result;
(3c) inputting the reconstruction result into the second coding module to obtain a second feature map;
(3d) inputting the second feature map into the third decoding module to carry out forward propagation of the network and obtain a re-segmentation result;
(3e) comparing the segmentation result with the corresponding ground-truth segmentation map and calculating the segmentation loss with the Dice loss function; comparing the re-segmentation result with the corresponding ground-truth segmentation map and calculating the re-segmentation loss with the Dice loss function, wherein the Dice loss function is calculated as:

L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)

where X is the ground-truth segmentation map, and Y is the segmentation result when calculating the segmentation loss and the re-segmentation result when calculating the re-segmentation loss; comparing the reconstruction result obtained in step (3b) with the training set data to be segmented that was input into the segmentation network, and calculating the reconstruction loss with a cross-entropy loss function;
(3f) weighting and summing the segmentation loss, the re-segmentation loss and the reconstruction loss obtained in the step (3e) to obtain a total loss result, and performing back propagation to train the segmentation network by using a gradient descent algorithm;
(3g) and obtaining the trained segmentation network after the number of training iterations of the segmentation network reaches the preset number of training iterations.
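A hedged sketch of one training iteration covering steps (3a)-(3g) is given below. The Dice smoothing term, the loss weights, the optimizer, and the reading of the cross-entropy reconstruction loss (computed here against the input rescaled to [0, 1]) are assumptions, not details specified in the description; the module arguments follow the Encoder/Decoder sketches in this section.

```python
# Hedged sketch of one training step; the loss weighting and reconstruction
# target are plausible interpretations, not the patent's stated settings.
import torch
import torch.nn.functional as F


def dice_loss(logits, target_onehot, eps=1e-5):
    """Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|), with a small smoothing term."""
    probs = torch.softmax(logits, dim=1)
    intersection = (probs * target_onehot).sum()
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + target_onehot.sum() + eps)


def training_step(encoder1, decoder_rec, decoder_seg, encoder2, decoder_reseg,
                  images, labels_onehot, optimizer,
                  w_seg=1.0, w_reseg=1.0, w_rec=1.0):
    optimizer.zero_grad()
    feats1 = encoder1(images)                     # (3a) first feature map(s)
    reconstruction = decoder_rec(feats1)          # (3b) auxiliary reconstruction branch
    segmentation = decoder_seg(feats1)            # (3b) segmentation branch
    feats2 = encoder2(reconstruction)             # (3c) second feature map(s)
    resegmentation = decoder_reseg(feats2)        # (3d) re-segmentation result

    loss_seg = dice_loss(segmentation, labels_onehot)      # (3e) segmentation loss
    loss_reseg = dice_loss(resegmentation, labels_onehot)  # (3e) re-segmentation loss
    # (3e) reconstruction loss: a cross-entropy loss against the network input;
    # min-max rescaling to [0, 1] is one plausible interpretation, assumed here.
    target = (images - images.amin()) / (images.amax() - images.amin() + 1e-8)
    loss_rec = F.binary_cross_entropy_with_logits(reconstruction, target)

    total = w_seg * loss_seg + w_reseg * loss_reseg + w_rec * loss_rec  # (3f) weighted sum
    total.backward()                                                    # (3f) back-propagation
    optimizer.step()
    return total.item()
```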
The convolution kernel size of the first convolution layer is 3 × 3 × 3, and the number of convolution kernels is 32; the convolution kernel size of the second convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the third convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fourth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the fifth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the sixth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the seventh convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the eighth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 512;
the first maximum pooling downsampling layer, the second maximum pooling downsampling layer and the third maximum pooling downsampling layer all have a size of 2 × 2 × 2;
the convolution kernel size of the ninth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the tenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the eleventh convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the twelfth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the thirteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fourteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fifteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 4.
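A companion sketch of one decoding module, reusing the ConvBlock class from the encoder sketch above, is shown below. The trilinear upsampling mode and the class names are assumptions; the channel widths follow the ninth to fifteenth convolution layers listed above, with the fifteenth 3 × 3 × 3 convolution represented by a separate 4-channel output head.

```python
# Hedged sketch of one decoding module; ConvBlock is defined in the encoder
# sketch above. Upsampling mode and naming are assumptions.
import torch
import torch.nn as nn


class Decoder(nn.Module):
    """Three deconvolution blocks interleaved with three upsampling layers."""

    def __init__(self, out_ch=4):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False)
        # channel counts after concatenation with the skip feature maps
        self.deconv1 = ConvBlock(512 + 256, 256, 256)   # ninth, tenth convolution layers
        self.deconv2 = ConvBlock(256 + 128, 128, 128)   # eleventh, twelfth convolution layers
        self.deconv3 = ConvBlock(128 + 64, 64, 64)      # thirteenth, fourteenth convolution layers
        self.head = nn.Conv3d(64, out_ch, kernel_size=3, padding=1)  # fifteenth convolution layer

    def forward(self, feats):
        f1, f2, f3, f4 = feats
        x = self.deconv1(torch.cat([self.up(f4), f3], dim=1))
        x = self.deconv2(torch.cat([self.up(x), f2], dim=1))
        x = self.deconv3(torch.cat([self.up(x), f1], dim=1))
        return self.head(x)
```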
The first convolution block of the first coding module serves as the input of the segmentation network. Within the first coding module, the output of the first convolution block is fed to the first maximum pooling downsampling layer, whose output is fed to the second convolution block; the output of the second convolution block is fed to the second maximum pooling downsampling layer, whose output is fed to the third convolution block; the output of the third convolution block is fed to the third maximum pooling downsampling layer, whose output is fed to the fourth convolution block. The output of the fourth convolution block of the first coding module is fed in parallel to the first upsampling layer of the first decoding module and the first upsampling layer of the second decoding module. The output of the first upsampling layer of the first decoding module is concatenated with the output of the third convolution block of the first coding module to obtain a first concatenation result, and the output of the first upsampling layer of the second decoding module is concatenated with the output of the third convolution block of the first coding module to obtain a second concatenation result; the first concatenation result is fed to the first deconvolution block of the first decoding module, and the second concatenation result is fed to the first deconvolution block of the second decoding module. The output of the first deconvolution block of the first decoding module is fed to the second upsampling layer of the first decoding module, and the output of the first deconvolution block of the second decoding module is fed to the second upsampling layer of the second decoding module. The output of the second upsampling layer of the first decoding module is concatenated with the output of the second convolution block of the first coding module to obtain a third concatenation result, and the output of the second upsampling layer of the second decoding module is concatenated with the output of the second convolution block of the first coding module to obtain a fourth concatenation result; the third concatenation result is fed to the second deconvolution block of the first decoding module, and the fourth concatenation result is fed to the second deconvolution block of the second decoding module. The output of the second deconvolution block of the first decoding module is fed to the third upsampling layer of the first decoding module, and the output of the second deconvolution block of the second decoding module is fed to the third upsampling layer of the second decoding module. The output of the third upsampling layer of the first decoding module is concatenated with the output of the first convolution block of the first coding module to obtain a fifth concatenation result, and the output of the third upsampling layer of the second decoding module is concatenated with the output of the first convolution block of the first coding module to obtain a sixth concatenation result; the fifth concatenation result is fed to the third deconvolution block of the first decoding module, and the sixth concatenation result is fed to the third deconvolution block of the second decoding module. The first decoding module outputs the reconstruction result, and the second decoding module outputs the segmentation result.
The reconstruction result is fed to the first convolution block of the second coding module. Within the second coding module, the output of the first convolution block is fed to the first maximum pooling downsampling layer, whose output is fed to the second convolution block; the output of the second convolution block is fed to the second maximum pooling downsampling layer, whose output is fed to the third convolution block; the output of the third convolution block is fed to the third maximum pooling downsampling layer, whose output is fed to the fourth convolution block. The output of the fourth convolution block of the second coding module is fed to the first upsampling layer of the third decoding module. The output of the first upsampling layer of the third decoding module is concatenated with the output of the third convolution block of the second coding module, and the concatenation result is fed to the first deconvolution block of the third decoding module. The output of the first deconvolution block of the third decoding module is fed to the second upsampling layer of the third decoding module, whose output is concatenated with the output of the second convolution block of the second coding module; this concatenation result is fed to the second deconvolution block of the third decoding module. The output of the second deconvolution block of the third decoding module is fed to the third upsampling layer of the third decoding module, whose output is concatenated with the output of the first convolution block of the second coding module; this concatenation result is fed to the third deconvolution block of the third decoding module, which outputs the re-segmentation result.
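The overall connection pattern described above (one shared coding module feeding two parallel decoding modules, with the reconstruction result passed through a second coding module and a third decoding module) can be composed from the Encoder and Decoder sketches given earlier; the class and attribute names below are illustrative assumptions.

```python
# End-to-end wiring sketch built from the Encoder/Decoder sketches above.
import torch.nn as nn


class AuxReSegNet(nn.Module):
    def __init__(self, in_ch=4, n_classes=4):
        super().__init__()
        self.encoder1 = Encoder(in_ch)
        self.decoder_rec = Decoder(out_ch=in_ch)       # first decoding module: reconstruction
        self.decoder_seg = Decoder(out_ch=n_classes)   # second decoding module: segmentation
        self.encoder2 = Encoder(in_ch)
        self.decoder_reseg = Decoder(out_ch=n_classes) # third decoding module: re-segmentation

    def forward(self, x):
        feats1 = self.encoder1(x)
        reconstruction = self.decoder_rec(feats1)
        segmentation = self.decoder_seg(feats1)
        feats2 = self.encoder2(reconstruction)
        resegmentation = self.decoder_reseg(feats2)
        return reconstruction, segmentation, resegmentation
```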
Example one
The present invention uses the 285 cases of 3D MRI data provided by the Brain Tumor Segmentation (BraTS) 2018 challenge for the study of medical image segmentation. The dataset consists of four MR sequences, each patient having a 3D brain tumor image of 240 × 240 × 155 voxels. The tumor segmentation labels are background (label 0), necrotic and non-enhancing tumor (label 1), peritumoral edema (label 2) and GD-enhancing tumor (label 4). The dataset is randomly divided into a training set and a test set at a ratio of 7:3, and the effectiveness of the segmentation algorithm is evaluated by the segmentation accuracy on the test set. Segmentation accuracy is measured by the Dice score, where ET, WT and TC refer to the enhancing tumor region (label 4), the whole tumor (labels 1, 2 and 4) and the tumor core (labels 1 and 4), respectively. After the image reconstruction task is added, the multi-task learning model promotes feature sharing among the tasks and improves the overall learning performance of the network, so that the segmentation performance on the WT, ET and TC regions is improved by 1.06%, 0.11% and 0.17%, respectively. When the reconstruction result is fed back into the segmentation branch, the brain tumor segmentation results of the model are further improved by 1.44%, 0.58% and 1.89%, respectively. This shows that constraining the two segmentation results to be sufficiently similar during training generates an additional supervision signal at the semantic level that guides the model to learn more segmentation-relevant feature information, thereby further improving the segmentation performance of the network.
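The ET/WT/TC Dice evaluation described in this example can be sketched as follows, using the label convention quoted above (0 background, 1 necrosis/non-enhancing tumor, 2 edema, 4 GD-enhancing tumor). The function names and the smoothing constant are illustrative assumptions.

```python
# Sketch of per-region Dice score evaluation on BraTS-style label volumes.
import numpy as np


def region_masks(label_volume):
    """Group labels into the evaluated regions: ET, TC and WT."""
    return {
        "ET": np.isin(label_volume, [4]),        # enhancing tumor
        "TC": np.isin(label_volume, [1, 4]),     # tumor core
        "WT": np.isin(label_volume, [1, 2, 4]),  # whole tumor
    }


def dice_score(pred_mask, gt_mask, eps=1e-8):
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    return (2.0 * intersection + eps) / (pred_mask.sum() + gt_mask.sum() + eps)


def evaluate_case(pred_labels, gt_labels):
    pred_regions, gt_regions = region_masks(pred_labels), region_masks(gt_labels)
    return {k: dice_score(pred_regions[k], gt_regions[k]) for k in pred_regions}
```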
Table 1 shows the effect of the image reconstruction task branch on brain tumor segmentation performance:
TABLE 1
Table 2 shows the effect of the present invention on brain tumor segmentation performance:
TABLE 2
The comparative experimental analysis further demonstrates the high segmentation accuracy of the invention.
In conclusion, by introducing an additional image reconstruction task branch, the method helps the segmentation network learn complementary medical image features, which in turn helps the model better understand the internal structure of medical images; by feeding the reconstruction result into the segmentation network again and comparing the resulting re-segmentation result with the ground-truth segmentation map, an additional supervision signal is provided at the semantic level for training the segmentation network, further improving the accuracy of the image segmentation results.

Claims (6)

1. A medical image segmentation method based on auxiliary learning task and re-segmentation constraint is characterized in that: the method comprises the following steps in sequence:
(1) preprocessing three-dimensional human brain nuclear magnetic resonance data to obtain a training set and a test set;
(2) constructing a segmentation network based on an auxiliary learning task and re-segmentation constraints;
(3) inputting the training set into a segmentation network for training to obtain a trained segmentation network;
(4) and inputting the test set into the trained segmentation network, and outputting the segmentation network to obtain a segmentation result.
2. The medical image segmentation method based on the auxiliary learning task and the re-segmentation constraint of claim 1, wherein: the step (1) specifically comprises the following steps:
(2a) the three-dimensional human brain nuclear magnetic resonance data comprise four modalities: T1, T1c, T2 and FLAIR; the three-dimensional human brain nuclear magnetic resonance data of the four modalities are combined, the original size of each modality being 240 × 240 × 155, to generate four-channel three-dimensional data of size 4 × 240 × 240 × 155, where 4 represents the number of modalities, 155 represents the number of two-dimensional slices contained in each three-dimensional human brain nuclear magnetic resonance volume, and 240 × 240 represents the height and width of the image, respectively;
(2b) converting the merged three-dimensional human brain image data from nii format to numpy format;
(2c) carrying out normalization processing on the converted data by adopting a zero-mean normalization method;
(2d) randomly dividing the normalized images into a training set and a test set at a ratio of 7:3;
(2e) randomly cropping the training set, resulting in training set data of size 4 × 128 × 128 × 128.
3. The medical image segmentation method based on the auxiliary learning task and the re-segmentation constraint of claim 1, wherein: in step (2), the segmentation network comprises a first coding module, a second coding module, a first decoding module, a second decoding module and a third decoding module;
the first coding module and the second coding module each consist of four convolution blocks and three maximum pooling downsampling layers, the four convolution blocks being a first convolution block, a second convolution block, a third convolution block and a fourth convolution block; the first convolution block comprises a first convolution layer, a first batch normalization layer, a first rectified linear unit (ReLU) activation layer, a second convolution layer, a second batch normalization layer and a second ReLU activation layer; the second convolution block comprises a third convolution layer, a first batch normalization layer, a first ReLU activation layer, a fourth convolution layer, a second batch normalization layer and a second ReLU activation layer; the third convolution block comprises a fifth convolution layer, a first batch normalization layer, a first ReLU activation layer, a sixth convolution layer, a second batch normalization layer and a second ReLU activation layer; the fourth convolution block comprises a seventh convolution layer, a first batch normalization layer, a first ReLU activation layer, an eighth convolution layer, a second batch normalization layer and a second ReLU activation layer; the three maximum pooling downsampling layers are a first maximum pooling downsampling layer, a second maximum pooling downsampling layer and a third maximum pooling downsampling layer;
the first decoding module, the second decoding module and the third decoding module each consist of three deconvolution blocks and three upsampling layers, the three deconvolution blocks being a first deconvolution block, a second deconvolution block and a third deconvolution block; the first deconvolution block comprises a ninth convolution layer, a third batch normalization layer, a third ReLU activation layer, a tenth convolution layer, a fourth batch normalization layer and a fourth ReLU activation layer; the second deconvolution block comprises an eleventh convolution layer, a third batch normalization layer, a third ReLU activation layer, a twelfth convolution layer, a fourth batch normalization layer and a fourth ReLU activation layer; the third deconvolution block comprises a thirteenth convolution layer, a third batch normalization layer, a third ReLU activation layer, a fourteenth convolution layer, a fourth batch normalization layer, a fourth ReLU activation layer and a fifteenth convolution layer; the three upsampling layers are a first upsampling layer, a second upsampling layer and a third upsampling layer.
4. The medical image segmentation method based on the auxiliary learning task and the re-segmentation constraint of claim 1, wherein: the step (3) specifically comprises the following steps:
(3a) the training set is input into the first coding module sequentially in batches, and the first coding module encodes the input data to obtain a first feature map;
(3b) inputting the first feature map into the first decoding module and the second decoding module in parallel to carry out forward propagation of the segmentation network, wherein the first decoding module outputs a reconstruction result and the second decoding module outputs a segmentation result;
(3c) inputting the reconstruction result into the second coding module to obtain a second feature map;
(3d) inputting the second feature map into the third decoding module to carry out forward propagation of the network and obtain a re-segmentation result;
(3e) comparing the segmentation result with the corresponding ground-truth segmentation map and calculating the segmentation loss with the Dice loss function; comparing the re-segmentation result with the corresponding ground-truth segmentation map and calculating the re-segmentation loss with the Dice loss function, wherein the Dice loss function is calculated as:

L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)

where X is the ground-truth segmentation map, and Y is the segmentation result when calculating the segmentation loss and the re-segmentation result when calculating the re-segmentation loss; comparing the reconstruction result obtained in step (3b) with the training set data to be segmented that was input into the segmentation network, and calculating the reconstruction loss with a cross-entropy loss function;
(3f) weighting and summing the segmentation loss, the re-segmentation loss and the reconstruction loss obtained in the step (3e) to obtain a total loss result, and performing back propagation to train the segmentation network by using a gradient descent algorithm;
(3g) and obtaining the trained segmentation network after the number of training iterations of the segmentation network reaches the preset number of training iterations.
5. The medical image segmentation method based on the auxiliary learning task and the re-segmentation constraint of claim 3, wherein: the convolution kernel size of the first convolution layer is 3 × 3 × 3, and the number of convolution kernels is 32; the convolution kernel size of the second convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the third convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fourth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the fifth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the sixth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the seventh convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the eighth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 512;
the first maximum pooling downsampling layer, the second maximum pooling downsampling layer and the third maximum pooling downsampling layer all have a size of 2 × 2 × 2;
the convolution kernel size of the ninth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the tenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 256; the convolution kernel size of the eleventh convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the twelfth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 128; the convolution kernel size of the thirteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fourteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 64; the convolution kernel size of the fifteenth convolution layer is 3 × 3 × 3, and the number of convolution kernels is 4.
6. The medical image segmentation method based on the auxiliary learning task and the re-segmentation constraint of claim 3, wherein: the first convolution block of the first coding module serves as the input of the segmentation network; within the first coding module, the output of the first convolution block is fed to the first maximum pooling downsampling layer, whose output is fed to the second convolution block; the output of the second convolution block is fed to the second maximum pooling downsampling layer, whose output is fed to the third convolution block; the output of the third convolution block is fed to the third maximum pooling downsampling layer, whose output is fed to the fourth convolution block; the output of the fourth convolution block of the first coding module is fed in parallel to the first upsampling layer of the first decoding module and the first upsampling layer of the second decoding module; the output of the first upsampling layer of the first decoding module is concatenated with the output of the third convolution block of the first coding module to obtain a first concatenation result, and the output of the first upsampling layer of the second decoding module is concatenated with the output of the third convolution block of the first coding module to obtain a second concatenation result; the first concatenation result is fed to the first deconvolution block of the first decoding module, and the second concatenation result is fed to the first deconvolution block of the second decoding module; the output of the first deconvolution block of the first decoding module is fed to the second upsampling layer of the first decoding module, and the output of the first deconvolution block of the second decoding module is fed to the second upsampling layer of the second decoding module; the output of the second upsampling layer of the first decoding module is concatenated with the output of the second convolution block of the first coding module to obtain a third concatenation result, and the output of the second upsampling layer of the second decoding module is concatenated with the output of the second convolution block of the first coding module to obtain a fourth concatenation result; the third concatenation result is fed to the second deconvolution block of the first decoding module, and the fourth concatenation result is fed to the second deconvolution block of the second decoding module; the output of the second deconvolution block of the first decoding module is fed to the third upsampling layer of the first decoding module, and the output of the second deconvolution block of the second decoding module is fed to the third upsampling layer of the second decoding module; the output of the third upsampling layer of the first decoding module is concatenated with the output of the first convolution block of the first coding module to obtain a fifth concatenation result, and the output of the third upsampling layer of the second decoding module is concatenated with the output of the first convolution block of the first coding module to obtain a sixth concatenation result; the fifth concatenation result is fed to the third deconvolution block of the first decoding module, and the sixth concatenation result is fed to the third deconvolution block of the second decoding module; the first decoding module outputs the reconstruction result, and the second decoding module outputs the segmentation result; the reconstruction result is fed to the first convolution block of the second coding module; within the second coding module, the output of the first convolution block is fed to the first maximum pooling downsampling layer, whose output is fed to the second convolution block; the output of the second convolution block is fed to the second maximum pooling downsampling layer, whose output is fed to the third convolution block; the output of the third convolution block is fed to the third maximum pooling downsampling layer, whose output is fed to the fourth convolution block; the output of the fourth convolution block of the second coding module is fed to the first upsampling layer of the third decoding module; the output of the first upsampling layer of the third decoding module is concatenated with the output of the third convolution block of the second coding module, and the concatenation result is fed to the first deconvolution block of the third decoding module; the output of the first deconvolution block of the third decoding module is fed to the second upsampling layer of the third decoding module, whose output is concatenated with the output of the second convolution block of the second coding module, and this concatenation result is fed to the second deconvolution block of the third decoding module; the output of the second deconvolution block of the third decoding module is fed to the third upsampling layer of the third decoding module, whose output is concatenated with the output of the first convolution block of the second coding module, and this concatenation result is fed to the third deconvolution block of the third decoding module, which outputs the re-segmentation result.
CN202210162154.5A 2022-02-22 2022-02-22 Medical image segmentation method based on auxiliary learning task and re-segmentation constraint Pending CN114529562A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210162154.5A CN114529562A (en) 2022-02-22 2022-02-22 Medical image segmentation method based on auxiliary learning task and re-segmentation constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210162154.5A CN114529562A (en) 2022-02-22 2022-02-22 Medical image segmentation method based on auxiliary learning task and re-segmentation constraint

Publications (1)

Publication Number Publication Date
CN114529562A true CN114529562A (en) 2022-05-24

Family

ID=81625095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210162154.5A Pending CN114529562A (en) 2022-02-22 2022-02-22 Medical image segmentation method based on auxiliary learning task and re-segmentation constraint

Country Status (1)

Country Link
CN (1) CN114529562A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115648A (en) * 2022-06-20 2022-09-27 北京理工大学 Brain tissue segmentation method combining UNet and volume rendering prior knowledge
CN116823842A (en) * 2023-06-25 2023-09-29 山东省人工智能研究院 Vessel segmentation method of double decoder network fused with geodesic model
CN116823842B (en) * 2023-06-25 2024-02-02 山东省人工智能研究院 Vessel segmentation method of double decoder network fused with geodesic model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination