CN114998307A - Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network - Google Patents

Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network Download PDF

Info

Publication number
CN114998307A
Authority
CN
China
Prior art keywords
network
resolution
segmentation
pooling
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210796459.1A
Other languages
Chinese (zh)
Inventor
文静
尹浩
王翊
张毅
杨维斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210796459.1A
Publication of CN114998307A
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 7/0012 - Biomedical image inspection
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10004 - Still image; Photographic image
    • G06T 2207/10012 - Stereo images
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30004 - Biomedical image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Molecular Biology (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Epidemiology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of medical image segmentation and particularly discloses a two-stage full-3D abdominal organ segmentation method and system based on a dual-resolution network. With this technical scheme, full-3D abdominal organ segmentation is realized using a two-stage method.

Description

Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network
Technical Field
The invention belongs to the technical field of medical image segmentation, and relates to a two-stage full-3D abdominal organ segmentation method and system based on a dual-resolution network.
Background
With the rapid development of modern society and the accelerating pace of life, health problems have become increasingly important. At the same time, modern medical technology is advancing rapidly, and auxiliary diagnosis using medical image processing technology has become a very important tool in medical institutions.
In the traditional approach, the gold standard for medical image segmentation is produced manually by doctors with many years of medical experience, which is time-consuming and labor-intensive: the doctor must examine a large number of slice images one by one and delineate the volume image. In addition, manual annotation suffers from low repeatability and strong subjectivity, and the segmentations produced by different doctors differ owing to differences in their clinical practice.
Since the beginning of the 21st century, image segmentation technology based on deep learning has been widely applied in the medical field. However, abdominal medical images suffer from severe inter-class imbalance and small differences between organs; some organs that are difficult to segment are small in shape yet highly deformable, vary greatly between cases, and have unclear boundaries, so segmentation models usually perform poorly on them. For example, the accuracy for the pancreas has remained below 90%, while that of the liver, spleen, left kidney and right kidney near the pancreas can reach roughly 33% to 95%. Meanwhile, a single set of 3D abdominal medical image data carries a large amount of information and a large number of bytes, so the training process of an image segmentation model is very long, which greatly reduces the efficiency and timeliness of intelligent abdominal diagnosis. Therefore, a high-efficiency, high-precision automatic segmentation method is urgently needed.
In recent years, deep learning methods have been introduced into the intelligent automatic diagnosis of various medical images, and deep-learning-based methods have become the mainstream research direction by virtue of their strong feature expression capability. Many existing medical image detection methods are based on the Unet (semantic segmentation) model and improve upon it, for example by introducing Dense and Res block structures or feature-pyramid modules for multi-scale feature extraction and fusion.
Although these methods and strategies effectively improve the segmentation effect, the following problems still exist: 1) the improvement effect is not obvious, and the performance is hardly improved obviously in practical application; 2) the quantity of parameters of the network is large, a large number of data sets are needed for training, but the quantity of available medical data is small, and meanwhile, the problem of overfitting also exists; 3) after the model is downsampled for many times, the detail information of the image is seriously lost, and the influence on the precision is large; 4) the use of too many activation and normalization layers by the model may have adverse effects, such as loss of feature details.
Furthermore, through further analysis of 3D abdominal organ segmentation, we found the following additional problems: 1) sample collection is difficult and sample quality is uneven, with some poor-quality samples, so high generalization ability is required of the network; 2) the detection of small organs presents certain difficulties; 3) the required prediction precision is high; 4) single-stage inference with sliding windows is costly and loses three-dimensional context.
Disclosure of Invention
The invention aims to provide a two-stage full-3D abdominal organ segmentation method and a two-stage full-3D abdominal organ segmentation system based on a double-resolution network, which are used for realizing the segmentation of a full-3D abdominal organ and shortening the segmentation task period.
In order to achieve the purpose, the basic scheme of the invention is as follows: a two-stage full-3D abdominal organ segmentation method based on a depth dual-resolution network comprises the following steps:
acquiring a data set and an original image, and preprocessing the data set and the original image;
carrying out random data enhancement on the preprocessed data set and the original image;
training a rough segmentation network and a fine segmentation network according to the data set;
zooming the original image after data enhancement to a preset size, inputting the image into a rough segmentation network, and performing abdominal organ segmentation;
and obtaining an ROI image according to the ROI obtained by rough segmentation, zooming the ROI image to a preset size, and inputting the ROI image into a fine segmentation network to obtain a segmentation result.
The working principle and the beneficial effects of the basic scheme are as follows: corresponding data are obtained and preprocessed and random data enhancement is carried out, so that the generalization capability of the model can be improved, and the subsequent use of the data is facilitated. By using a two-stage method from coarse to fine, the problems that the full-3D abdomen segmentation task period is long and the training is time-consuming are solved, and the abdominal organs can be segmented quickly and accurately.
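For illustration, a minimal sketch of this coarse-to-fine inference flow is given below. It is written in Python/PyTorch; the function and network names (coarse_net, fine_net), the 160³/192³ sizes and the ROI margin are assumptions taken from or added to the embodiments described later, not a definitive implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def two_stage_segment(volume, coarse_net, fine_net,
                      coarse_size=(160, 160, 160), fine_size=(192, 192, 192), margin=8):
    """Coarse-to-fine inference sketch; volume is a (1, 1, D, H, W) float tensor."""
    # Stage 1: coarse segmentation on a downscaled copy of the whole volume.
    small = F.interpolate(volume, size=coarse_size, mode="trilinear", align_corners=False)
    coarse = coarse_net(small).argmax(dim=1, keepdim=True).float()
    coarse = F.interpolate(coarse, size=volume.shape[2:], mode="nearest")

    # Stage 2: crop the ROI around the coarse foreground, rescale, and refine.
    nz = (coarse[0, 0] > 0).nonzero()                      # assumes some foreground was found
    lo = (nz.min(dim=0).values - margin).clamp(min=0).tolist()
    hi = (nz.max(dim=0).values + margin + 1).tolist()
    roi = volume[:, :, lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    roi = F.interpolate(roi, size=fine_size, mode="trilinear", align_corners=False)
    labels = fine_net(roi).argmax(dim=1)                   # fine labels inside the ROI
    return labels, (lo, hi)                                # the ROI box allows mapping back
```

The returned labels would still have to be resized back to the original ROI shape and pasted into the full volume; that bookkeeping is omitted here.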
Further, the method for preprocessing the data set and the original image is as follows:
removing data in which the number of z-axis spacing layers amounts to at least 5% of the total number of layers, and removing data whose z-direction inter-slice spacing exceeds 3, to obtain an experimental data set;
dividing an experimental data set into a training set and a testing set according to the proportion of 8:2, adjusting an image to a preset size before training a rough segmentation network and a fine segmentation network, and carrying out standardization and normalization treatment:
x* = (x − μ) / σ
where μ and σ are respectively the mean and variance of the original image data, and x and x* are respectively the pixel value of the original image and the preprocessed pixel value.
The method is simple to operate and beneficial to use, screens data and ensures that the data obtained by the network model is displayed smoothly in three dimensions.
Further, the method for enhancing random data of the data set and the original image is as follows:
carrying out space geometric transformation and pixel transformation on the data set and the original image, and presetting a probability value corresponding to the transformation;
the number of training samples is increased through space geometric transformation, pixel masking points with different sizes filled with different values are generated through pixel transformation, and then the pixel masking points are mixed with original images to disturb partial characteristics of the original images.
The image is enhanced by using a random enhancement mode, so that the regularization effect can be brought, the robustness of the model is enhanced, and the sensitivity of decision-making on model parameters is reduced.
Further, the method for training the rough segmentation network is as follows:
reducing the image of the data set to a preset size, and inputting the image into a rough segmentation network, wherein the rough segmentation network is a 3DUnet network;
and cutting and adjusting according to the ROI area of the rough segmentation network to obtain an ROI image with a required size and inputting the ROI image into the fine segmentation network.
The image is reduced to a preset size, the training speed can be accelerated, the obtained ROI image is input into the fine segmentation network through the coarse segmentation network, and the fine segmentation network can be conveniently and accurately segmented.
Further, the method for training the fine segmentation network is as follows:
establishing a deep dual-resolution branch network: the encoder performs down-sampling and feature extraction on the image through an improved 3D residual convolution module and a down-sampling module; the i-th high-resolution feature map X_Hi and low-resolution feature map X_Li are:
X_Hi = R(F_H(X_H(i−1)) + T_L→H(F_L(X_L(i−1))))
X_Li = R(F_L(X_L(i−1)) + T_H→L(F_H(X_H(i−1))))
where F_H and F_L are the corresponding high-resolution and low-resolution residual basic block sequences, T_L→H and T_H→L represent the low-to-high and high-to-low conversion functions, respectively, and R represents the Relu activation function;
dual resolution branch feature fusion: performing double-branch feature extraction at the third stage of the encoder part, and continuously performing down-sampling on the low-resolution branch to acquire more deep features and semantic information; the high-resolution branches are used for extracting features and keeping the size and the channel number of the feature map unchanged, and two branches are used for carrying out multi-time bilateral feature fusion at different stages to fully fuse spatial information and semantic information;
capturing anisotropy and context information present in the abdominal scene using an anisotropic pyramid pooling module: before point-by-point summation of the double resolution branches, inputting the low resolution branches into an anisotropic pyramid pooling module, wherein the anisotropic pyramid pooling module comprises anisotropic strip pooling and standard space pooling, and the anisotropic strip pooling captures anisotropy and context information existing in an abdominal scene so as to capture the spatial relationship among multiple organs, and the standard space pooling fuses the multi-scale features;
combining the output of the anisotropic strip pooling and the standard space pooling, restoring the semantic feature information extracted by the encoder to the original image size through continuous upsampling, and completing the classification task of the corresponding pixel points to obtain the final output.
In a fine-segmented network, loss of feature details is avoided with fewer activated and normalized residual modules. By using the dual-resolution branch and the cross fusion method, more detail information is prevented from being lost in the down-sampling process, the low-resolution branch supplements detail information, and the high-resolution branch supplements semantic information. And an anisotropic pyramid pooling module is used in a low-resolution branch, so that more spatial information can be captured and multi-scale feature fusion can be realized.
Further, the two branches perform bilateral feature fusion several times at different stages, fusing spatial information and semantic information; for the fusion from high resolution to low resolution, the high-resolution feature map is down-sampled by a 3 × 3 × 3 convolution sequence with stride 2 before point-by-point summation;
low resolution to high resolution fusion, low resolution feature mapping is first compressed by a 1 × 1 × 1 convolution, and then upsampled by trilinear interpolation.
The method is simple to operate and beneficial to use, and loss of image detail information caused by continuous downsampling is avoided.
Further, an anisotropic pyramid pooling module captures anisotropy and context information existing in an abdominal scene, and an input feature map is respectively sent to anisotropic strip pooling and standard space pooling after passing through two 1 × 1 × 1 convolution modules;
the anisotropic strip pooling uses pooling kernels of 1 × N × N, N × 1 × N and N × N × 1, followed by 3 × 1 × 1, 1 × 3 × 1 and 1 × 1 × 3 inter-slice convolutions and up-sampling; the results are finally added together and fed into a convolution module, which captures the anisotropy and context information existing in the abdominal scene and thereby the spatial relationship among multiple organs;
the standard spatial pooling adopts two average poolings with strides of 2 × 2 and 4 × 4, respectively, and realizes the fusion of multi-scale features through the same inter-slice convolution and up-sampling as the strip pooling, finally fusing with the residual branch;
the outputs of the anisotropic strip pooling and the standard spatial pooling are combined, passed through a 1 × 1 × 1 convolution module, added to the input features, and passed through a Relu activation function to obtain the final output.
Simple structure and convenient operation.
Further, the method also comprises a loss function and a deep supervision strategy:
the Dice Loss is used as a Loss function of the network, and the computation of the Dice Loss is as follows:
DiceLoss(y, p) = 1 − 2·Σ_i y_i·p_i / (Σ_i y_i + Σ_i p_i)
the mixing loss function is then:
Loss(y,p)=DiceLoss(y,p)
wherein y represents a label of an original image, and p represents a prediction result of the network model;
and (3) adding auxiliary loss on the third stage of the high-resolution branch by adopting a deep supervision strategy, wherein the overall loss function of the network is as follows:
Loss_total = Loss_main(y, p) + λ1·Loss_aux1(y, p)
where Loss_main and Loss_aux1 represent the main loss and the auxiliary loss of the third stage, respectively, and λ1 is the loss weight.
And a deep supervision strategy is used, so that gradient explosion and gradient disappearance are avoided, and the problem of slow convergence is solved.
Further, the method for evaluating the segmentation precision of the fine segmentation network comprises the following steps:
optimizing the fine segmentation network by adopting an Adam optimizer, setting Dropout rate to be 0.2, setting learning rate to be 0.01, setting batch to be 1 and setting the maximum iteration number to be 200 epochs;
saving a weight file generated in the round of the result with the highest average value of DSC and NSC on the verification set, and terminating the training of the network when the maximum iteration number is reached;
where
DSC(G, S) = 2|G ∩ S| / (|G| + |S|)
NSC(G, S) = (|∂G ∩ B_∂S^(τ)| + |∂S ∩ B_∂G^(τ)|) / (|∂G| + |∂S|)
G is the label of the original image and S is the prediction result of the network model; |G| and |S| represent the numbers of segmented voxels; B_∂G^(τ) and B_∂S^(τ) respectively represent the border regions of the label surface ∂G and of the segmentation surface ∂S at tolerance τ; |∂G ∩ B_∂S^(τ)| represents the overlap of the label boundary with the border region of the predicted surface, and |∂S ∩ B_∂G^(τ)| represents the overlap of the predicted boundary with the border region of the label surface;
the accuracy of the network segmentation was assessed on the test set using DSC and NSC indices.
Evaluating the segmentation precision of the network helps to analyze whether the segmentation results output by the network are accurate and also facilitates subsequent optimization of the network.
the invention also provides a two-stage full-3D abdominal organ segmentation system based on the depth dual-resolution network, which comprises a data acquisition unit and a processing unit, wherein the data acquisition unit is used for acquiring data sets and original images, the output end of the data acquisition unit is connected with the input end of the processing unit, and the processing unit executes the method of the invention to complete full-3D abdominal organ segmentation.
The two-stage full-3D abdominal organ segmentation method based on the depth dual-resolution network is utilized to realize the segmentation of full-3D abdominal organs, and the method is simple to operate and beneficial to use.
Drawings
FIG. 1 is a schematic flow chart of a two-stage full 3D abdominal organ segmentation method based on a deep dual resolution network according to the present invention;
FIG. 2 is a schematic diagram of a deep dual resolution branch network structure of the two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network according to the present invention;
FIG. 3 is a schematic structural diagram of a 3D residual convolution module of the two-stage full 3D abdominal organ segmentation method based on a depth dual resolution network according to the present invention;
FIG. 4 is a schematic diagram of a dual-branch feature fusion structure of the two-stage full 3D abdominal organ segmentation method based on a deep dual resolution network according to the present invention;
FIG. 5 is a schematic structural diagram of an anisotropic pyramid pooling module of the two-stage full 3D abdominal organ segmentation method based on a deep dual resolution network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
The invention discloses a two-stage full-3D abdominal organ segmentation method based on a depth dual-resolution network, which realizes the rapid and accurate segmentation of full-3D abdominal organs. As shown in fig. 1, the method comprises the steps of:
acquiring a data set and an original image, and preprocessing the data set and the original image;
carrying out random data enhancement on the preprocessed data set and the original image;
training a rough segmentation network and a fine segmentation network according to the data set;
zooming the original image after data enhancement to a preset size, inputting the image into a rough segmentation network, and performing abdominal organ segmentation;
and obtaining an ROI (region of interest) image according to the ROI area obtained by the rough segmentation, zooming the ROI image into a preset size, and inputting the preset size into a fine segmentation network to obtain a segmentation result.
In a preferred embodiment of the present invention, the method for preprocessing the data set and the original image is as follows:
the total number of the self-built data sets is more than 1000, and in order to ensure that data obtained by the network model is displayed smoothly in three dimensions, part of bad data needs to be screened. Removing data with the spacing (spacing) layer number of all z-axis directions being at least 5% of all layer numbers, and removing data with the z-direction image layer distance being higher than 3 to obtain an experimental data set;
dividing an experimental data set into a training set and a testing set according to the proportion of 8:2, adjusting (resize) an image to a preset size before training a rough segmentation network and a fine segmentation network, and carrying out standardization and normalization treatment:
x* = (x − μ) / σ
where μ and σ are the mean and variance of the original image data, and x and x* are the original pixel value and the preprocessed pixel value, respectively. Before training the coarse segmentation network model, the images are resized to 160 × 160 × 160; before training the fine segmentation model, the cropped ROI region is scaled to 192 × 192 × 192 and used as the input data of the fine segmentation network.
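A minimal sketch of this preprocessing step (z-score normalization followed by resizing) is given below; it assumes NumPy/SciPy, and the choice of zoom order (linear resampling) is an assumption not specified by the embodiment.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume, target_shape):
    """Z-score normalize a volume and resize it to target_shape."""
    volume = volume.astype(np.float32)
    mu, sigma = volume.mean(), volume.std()
    normalized = (volume - mu) / (sigma + 1e-8)        # x* = (x - mu) / sigma
    factors = [t / s for t, s in zip(target_shape, normalized.shape)]
    return zoom(normalized, factors, order=1)          # linear resampling to the preset size

# coarse-stage input: preprocess(ct_volume, (160, 160, 160))
# fine-stage input:   preprocess(roi_volume, (192, 192, 192))
```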
In a preferred embodiment of the present invention, the method for enhancing random data of a data set and an original image is as follows:
carrying out spatial geometric transformation and pixel transformation on the data set and the original images, with a preset probability value for each transformation; that is, random data enhancement is applied to the preprocessed data and a probability value is set for each enhancement method, for example a flip probability of 0.5, so that one or more augmentations are randomly applied to the data;
image rotation, deformation, mirror image and the like are used in the geometric transformation, the number of training samples is increased by carrying out spatial geometric transformation on the data set, the generalization capability of the model is improved, meanwhile, the model can learn the characteristics of different samples more easily, the translation invariance of the model is enhanced, the target position of the enhanced image can be captured, and overfitting is avoided; methods such as contrast enhancement, Gaussian blur and brightness enhancement are used in the pixel transformation, and pixel masking points of different sizes filled with different values are generated through the transformation and then mixed with the original image to disturb some characteristics of the original image. These transformations may bring about a regularization effect, enhancing the robustness of the model, and reducing the sensitivity of the decision to the model parameters.
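The sketch below illustrates such probabilistic augmentation; apart from the flip probability of 0.5 stated above, the probabilities, rotation range, jitter magnitudes and mask sizes are illustrative assumptions. In practice the same spatial transforms must also be applied to the label volume, while the pixel-level transforms are applied to the image only.

```python
import numpy as np
from scipy.ndimage import rotate, gaussian_filter

def random_augment(volume, rng):
    """Apply each transform with a preset probability (spatial + pixel-level)."""
    if rng.random() < 0.5:                                   # mirror / flip, p = 0.5
        volume = np.flip(volume, axis=int(rng.integers(0, 3))).copy()
    if rng.random() < 0.3:                                   # small random in-plane rotation (assumed)
        volume = rotate(volume, angle=float(rng.uniform(-10, 10)),
                        axes=(1, 2), reshape=False, order=1)
    if rng.random() < 0.3:                                   # brightness / contrast jitter (assumed)
        volume = volume * rng.uniform(0.9, 1.1) + rng.uniform(-0.1, 0.1)
    if rng.random() < 0.2:                                   # Gaussian blur (assumed sigma)
        volume = gaussian_filter(volume, sigma=float(rng.uniform(0.5, 1.0)))
    if rng.random() < 0.2:                                   # cuboid pixel mask filled with a constant
        d, h, w = volume.shape
        sz = [int(rng.integers(2, max(3, s // 8))) for s in (d, h, w)]
        z, y, x = (int(rng.integers(0, max(1, s - k))) for s, k in zip((d, h, w), sz))
        volume[z:z + sz[0], y:y + sz[1], x:x + sz[2]] = rng.uniform(volume.min(), volume.max())
    return volume

# usage: augmented = random_augment(volume, np.random.default_rng(0))
```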
In a preferred embodiment of the present invention, the method for training the coarse segmentation network comprises the following steps:
reducing the image of the data set to a preset size (such as [128 × 128 × 128] or [160 × 160 × 160]), so as to accelerate the training speed, and inputting a rough segmentation network, wherein the rough segmentation network is a 3DUnet network, and the model is simple, is used for learning the global information of the image, and is beneficial to the reasoning of fine segmentation to find a proper ROI (region of interest);
and (4) performing cropping (crop) and adjusting (resize) according to the ROI region of the rough segmentation network to obtain an ROI image with a required size (such as [192 × 192 × 192], wherein the actual size can be determined according to a specific task of a data set), and inputting the ROI image into the fine segmentation network.
In a preferred embodiment of the present invention, the method for training the fine segmentation network comprises the following steps:
the precise segmentation network needs to divide the image more accurately, a method based on a depth dual-resolution branch network (DRUnnet) is used in the stage, a classic Encoder-Decoder framework is adopted in the whole framework, during training, Ground truth label data is used for cutting an ROI (region of interest) region of an original image, and then the ROI region is zoomed to [192,192,192] to be sent into a model, so that the precise segmentation model can learn local context information more easily and precisely.
As shown in fig. 2, a deep dual-resolution branch network is built. The encoder realizes feature extraction by stacking feature extraction modules and independent down-sampling. The convolution block adopts an improved 3D residual convolution module (as shown in FIG. 3), composed of two identical improved residual modules stacked; the 3D residual convolution module uses normalization (Instance Norm) and the Relu activation function only once in the whole residual structure, each convolution kernel has size 3 × 3 × 3 with stride 1, and the size and channel number of the feature map are not changed. The down-sampling module consists of a convolution block with normalization and stride 2, which halves the feature map and doubles the number of channels relative to the input. The feature map first passes through the down-sampling module and then through the feature extraction module: the input feature map passes through a convolution block composed of a stride-1 convolution and Instance Norm, then through a convolution block composed of a stride-1 convolution and Relu, and is finally added to the initial input as the input of the next residual module. The i-th high-resolution feature map X_Hi and low-resolution feature map X_Li are:
X_Hi = R(F_H(X_H(i−1)) + T_L→H(F_L(X_L(i−1))))
X_Li = R(F_L(X_L(i−1)) + T_H→L(F_H(X_H(i−1))))
where F_H and F_L are the corresponding high-resolution and low-resolution residual basic block sequences, T_L→H and T_H→L represent the low-to-high and high-to-low conversion functions, respectively, and R represents the Relu function;
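A PyTorch-style sketch of the improved 3D residual convolution module and the independent down-sampling module described above is given below; the class names are hypothetical, and in each encoder stage two such residual modules would be stacked, as stated in the text.

```python
import torch.nn as nn

class Residual3D(nn.Module):
    """Improved 3D residual block: InstanceNorm and Relu are each used only once."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.norm = nn.InstanceNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.norm(self.conv1(x))        # stride-1 convolution + Instance Norm
        out = self.relu(self.conv2(out))      # stride-1 convolution + Relu
        return out + x                        # added to the initial input; size and channels unchanged

class DownSample3D(nn.Module):
    """Stride-2 convolution block that halves the feature map and doubles the channels."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, in_channels * 2, kernel_size=3, stride=2, padding=1)
        self.norm = nn.InstanceNorm3d(in_channels * 2)

    def forward(self, x):
        return self.norm(self.conv(x))
```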
dual-resolution branch feature fusion: as shown in fig. 4, in order to avoid the loss of image detail information caused by continuous down-sampling, dual-branch feature extraction is performed from the third stage of the encoder part. The low-resolution branch continues down-sampling through the improved 3D residual convolution module and the independent down-sampling module, with the channel numbers set to 16, 32, 64, 128 and 256, respectively, to obtain deeper features and semantic information; the high-resolution branch extracts features through convolution modules with kernel size 3 × 3 × 3 and stride 1, keeping the size and channel number of the feature map unchanged. The two branches perform bilateral feature fusion several times at different stages, fully fusing spatial and semantic information: for the fusion from high resolution to low resolution, the high-resolution feature map is down-sampled by a 3 × 3 × 3 convolution sequence with stride 2 before point-by-point summation; for the fusion from low resolution to high resolution, the low-resolution feature map is first compressed by a 1 × 1 × 1 convolution and then up-sampled by trilinear interpolation;
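A sketch of one bilateral fusion step under this description is shown below; it assumes the two branches differ by exactly one scale (so a single stride-2 convolution suffices for the high-to-low path) and the module name is hypothetical.

```python
import torch.nn as nn
import torch.nn.functional as F

class BilateralFusion(nn.Module):
    """Fuse high- and low-resolution branches by point-by-point summation."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        # high -> low: 3x3x3 convolution with stride 2 for down-sampling
        self.high_to_low = nn.Conv3d(high_ch, low_ch, kernel_size=3, stride=2, padding=1)
        # low -> high: 1x1x1 convolution to compress channels, then trilinear up-sampling
        self.low_to_high = nn.Conv3d(low_ch, high_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x_high, x_low):
        low_up = F.interpolate(self.low_to_high(x_low), size=x_high.shape[2:],
                               mode="trilinear", align_corners=False)
        new_high = self.relu(x_high + low_up)                  # detail branch gains semantic information
        new_low = self.relu(x_low + self.high_to_low(x_high))  # semantic branch gains detail information
        return new_high, new_low
```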
capturing anisotropy and context information present in the abdominal scene using an anisotropic pyramid pooling module: as shown in FIG. 5, ConINReLU denotes a module composed of convolution, normalization and an activation function; Avg-pool denotes average pooling; ConINUpsample denotes a module composed of convolution, normalization and up-sampling; 1 × 1, 1 × 12, etc. denote the sizes of the convolution or pooling kernels; Relu denotes the activation function; ConIN denotes a module composed of convolution and normalization. Before the point-by-point summation of the dual-resolution branches, the low-resolution branch is input into the anisotropic pyramid pooling module, which comprises anisotropic strip pooling and standard spatial pooling: the anisotropic strip pooling captures the anisotropy and context information existing in the abdominal scene, thereby capturing the spatial relationship among multiple organs, while the standard spatial pooling fuses multi-scale features;
Finally, the low-resolution branch passes through the anisotropic pyramid pooling module, where the outputs of the anisotropic strip pooling and the standard spatial pooling are combined, and is then summed point by point with the high-resolution branch and sent to the Decoder module. The decoder module separates a standard 3D convolution with kernel size 3 × 3 × 3 into a 3 × 3 × 1 in-slice convolution and a 1 × 1 × 3 inter-slice convolution, and realizes up-sampling of the feature maps using trilinear interpolation; the semantic feature information extracted by the encoder is restored to the original image size through successive up-sampling, and the classification task of the corresponding voxels is completed to obtain the final output.
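A sketch of such a decoder block, following the decomposition described above, could look as follows (the block name and the ordering of normalization and activation are assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F

class SeparableDecoderBlock(nn.Module):
    """3x3x1 in-slice conv + 1x1x3 inter-slice conv, then 2x trilinear up-sampling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.in_slice = nn.Conv3d(in_ch, out_ch, kernel_size=(3, 3, 1), padding=(1, 1, 0))
        self.inter_slice = nn.Conv3d(out_ch, out_ch, kernel_size=(1, 1, 3), padding=(0, 0, 1))
        self.norm = nn.InstanceNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.norm(self.inter_slice(self.in_slice(x))))
        return F.interpolate(x, scale_factor=2.0, mode="trilinear", align_corners=False)
```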
More preferably, the anisotropic pyramid pooling module captures anisotropy and context information existing in the abdominal scene, and the input feature map is respectively sent to anisotropic strip pooling and standard space pooling after passing through two 1 × 1 × 1 convolution modules;
the anisotropic strip pooling uses pooling kernels of 1 × N × N, N × 1 × N and N × N × 1, followed by 3 × 1 × 1, 1 × 3 × 1 and 1 × 1 × 3 inter-slice convolutions and up-sampling; the results are finally added together and fed into a convolution module, which captures the anisotropy and context information existing in the abdominal scene and thereby the spatial relationship among multiple organs;
the standard spatial pooling adopts two average poolings with strides of 2 × 2 and 4 × 4, respectively, and realizes the fusion of multi-scale features through the same inter-slice convolution and up-sampling as the strip pooling, finally fusing with the residual branch;
the outputs of the anisotropic strip pooling and the standard spatial pooling are combined, passed through a 1 × 1 × 1 convolution module, added to the input features, and passed through a Relu activation function to obtain the final output.
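A condensed sketch of the anisotropic pyramid pooling module as described above is given below. How the strip-pooled descriptors are broadcast back to the full feature map and exactly how the branches are normalized and fused are assumptions where the text leaves room for interpretation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_in(in_ch, out_ch, k, p=0):
    """Convolution followed by Instance Norm (the ConIN block of FIG. 5)."""
    return nn.Sequential(nn.Conv3d(in_ch, out_ch, kernel_size=k, padding=p),
                         nn.InstanceNorm3d(out_ch))

class AnisotropicPyramidPooling(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pre_strip = conv_in(channels, channels, 1)
        self.pre_std = conv_in(channels, channels, 1)
        self.conv_d = conv_in(channels, channels, (3, 1, 1), (1, 0, 0))  # after 1 x N x N pooling
        self.conv_h = conv_in(channels, channels, (1, 3, 1), (0, 1, 0))  # after N x 1 x N pooling
        self.conv_w = conv_in(channels, channels, (1, 1, 3), (0, 0, 1))  # after N x N x 1 pooling
        self.strip_out = conv_in(channels, channels, 3, 1)
        self.pool2 = nn.AvgPool3d(kernel_size=2, stride=2)
        self.pool4 = nn.AvgPool3d(kernel_size=4, stride=4)
        self.conv2 = conv_in(channels, channels, 3, 1)
        self.conv4 = conv_in(channels, channels, 3, 1)
        self.fuse = conv_in(2 * channels, channels, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        d, h, w = x.shape[2:]
        xs = self.pre_strip(x)
        # anisotropic strip pooling: average over two axes, keep the third, then an anisotropic conv
        sd = self.conv_d(xs.mean(dim=(3, 4), keepdim=True))
        sh = self.conv_h(xs.mean(dim=(2, 4), keepdim=True))
        sw = self.conv_w(xs.mean(dim=(2, 3), keepdim=True))
        strips = self.strip_out(sd.expand_as(xs) + sh.expand_as(xs) + sw.expand_as(xs))
        # standard spatial pooling at strides 2 and 4, up-sampled back and fused with the residual branch
        xp = self.pre_std(x)
        p2 = F.interpolate(self.conv2(self.pool2(xp)), size=(d, h, w),
                           mode="trilinear", align_corners=False)
        p4 = F.interpolate(self.conv4(self.pool4(xp)), size=(d, h, w),
                           mode="trilinear", align_corners=False)
        std = xp + p2 + p4
        out = self.fuse(torch.cat([strips, std], dim=1))   # combine the two pooling paths, 1x1x1 conv
        return self.relu(out + x)                          # add the input features, then Relu
```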
In a preferred embodiment of the present invention, the two-stage full 3D abdominal organ segmentation method further includes a loss function and a deep supervision strategy:
the Dice Loss is used as a Loss function of the network, and the computation of the Dice Loss is as follows:
DiceLoss(y, p) = 1 − 2·Σ_i y_i·p_i / (Σ_i y_i + Σ_i p_i)
the mixing loss function is then:
Loss(y,p)=DiceLoss(y,p)
wherein y represents a label of an original image, and p represents a prediction result of the network model;
in order to accelerate the convergence of the network, a deep supervision strategy is adopted, and auxiliary loss is added to the third stage of the high-resolution branch, so that the overall loss function of the network is as follows:
Loss_total = Loss_main(y, p) + λ1·Loss_aux1(y, p)
where Loss_main and Loss_aux1 represent the main loss and the auxiliary loss of the third stage, respectively; λ1 is the loss weight and can be set to 0.5.
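A short sketch of this loss with deep supervision is given below; the smoothing constant eps is an added numerical-stability assumption, and y is assumed to be one-hot encoded.

```python
import torch

def dice_loss(y, p, eps=1e-6):
    """Soft Dice loss; y is the one-hot label, p the predicted probability map (B, C, D, H, W)."""
    dims = tuple(range(2, y.dim()))                      # sum over the spatial dimensions
    intersection = (y * p).sum(dim=dims)
    union = y.sum(dim=dims) + p.sum(dim=dims)
    return (1.0 - (2.0 * intersection + eps) / (union + eps)).mean()

def total_loss(y, p_main, p_aux, lam=0.5):
    """Deep supervision: main loss plus the weighted auxiliary loss from the third stage."""
    return dice_loss(y, p_main) + lam * dice_loss(y, p_aux)
```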
In a preferred embodiment of the present invention, the two-stage full 3D abdominal organ segmentation method further includes a method for evaluating the precision of the fine segmentation network segmentation:
if set x ∈ R d×w×h And y ∈ R d×w×h Respectively representing an original image and a corresponding manual labeling result thereof, and d, w and h respectively represent the height, length and width of the image; p ═ f (θ, x) denotes the segmentation model, where θ denotes the network parameters, p denotes the probability map of the prediction results, and M is an abbreviation for the model.
The predicted result is obtained from the probability map as:
ŷ = argmax(p)
randomly selecting 200 sets as training sets and 40 sets as testing sets;
optimizing the fine segmentation network by adopting an Adam optimizer (Adam is a first-order optimization algorithm capable of replacing the traditional stochastic gradient descent process and can iteratively update the weight of the neural network based on training data), wherein the Dropout rate is set to be 0.2, the learning rate is set to be 0.01, the batch is set to be 1, and the maximum iteration number is 200 epochs (1 epoch refers to one time of training with all samples in the training set);
saving the weight file generated in the round with the highest average value of DSC (Dice Similarity Coefficient, a set similarity measure index) and NSC (Normalized Surface Dice) on the verification set, and terminating the training of the network when the maximum iteration number is reached;
where
DSC(G, S) = 2|G ∩ S| / (|G| + |S|)
NSC(G, S) = (|∂G ∩ B_∂S^(τ)| + |∂S ∩ B_∂G^(τ)|) / (|∂G| + |∂S|)
G is the label of the original image and S is the prediction result of the network model; |G| and |S| represent the numbers of segmented voxels; B_∂G^(τ) and B_∂S^(τ) respectively represent the border regions of the label surface ∂G and of the segmentation surface ∂S at tolerance τ; |∂G ∩ B_∂S^(τ)| represents the overlap of the label boundary with the border region of the predicted surface, and |∂S ∩ B_∂G^(τ)| represents the overlap of the predicted boundary with the border region of the label surface. These quantities are obtained by data statistics: those involving G are computed from the label, and those involving S from the prediction results.
The accuracy of the network segmentation was assessed on the test set using DSC and NSC indices.
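For reference, a NumPy/SciPy sketch of these two indices on binary masks is given below; it is a voxel-based approximation of the surface computation (boundaries taken as the mask minus its erosion), assumes non-empty masks, and expresses the tolerance τ in the units of the supplied voxel spacing.

```python
import numpy as np
from scipy import ndimage

def dsc(g, s):
    """Dice similarity coefficient between label mask g and prediction mask s."""
    g, s = g.astype(bool), s.astype(bool)
    denom = g.sum() + s.sum()
    return 2.0 * np.logical_and(g, s).sum() / denom if denom > 0 else 1.0

def boundary(mask):
    """Boundary voxels of a binary mask (the mask minus its erosion)."""
    return np.logical_and(mask, np.logical_not(ndimage.binary_erosion(mask)))

def nsc(g, s, tau=1.0, spacing=(1.0, 1.0, 1.0)):
    """Normalized surface Dice at tolerance tau."""
    dg, ds = boundary(g.astype(bool)), boundary(s.astype(bool))
    # distance of every voxel to the nearest boundary voxel of each mask
    dist_to_dg = ndimage.distance_transform_edt(np.logical_not(dg), sampling=spacing)
    dist_to_ds = ndimage.distance_transform_edt(np.logical_not(ds), sampling=spacing)
    overlap_g = np.logical_and(dg, dist_to_ds <= tau).sum()   # label boundary inside the prediction band
    overlap_s = np.logical_and(ds, dist_to_dg <= tau).sum()   # predicted boundary inside the label band
    denom = dg.sum() + ds.sum()
    return (overlap_g + overlap_s) / denom if denom > 0 else 1.0
```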
The invention also provides a two-stage full-3D abdominal organ segmentation system based on the depth dual-resolution network, which comprises a data acquisition unit and a processing unit, wherein the data acquisition unit is used for acquiring data sets and original images, the output end of the data acquisition unit is electrically connected with the input end of the processing unit, and the processing unit executes the method of the invention to complete full-3D abdominal organ segmentation.
This scheme uses a two-stage, coarse-to-fine method, which solves the problems of long task cycles and time-consuming training in full 3D abdominal segmentation. A random data enhancement method tailored to the characteristics of the data improves the generalization ability of the model. In the fine segmentation network, the convolution modules are residual modules with fewer activation and normalization layers, which avoids the loss of feature details. Meanwhile, to avoid losing more detail information during down-sampling, a dual-resolution branch and cross-fusion method is used: the low-resolution branch is supplemented with detail information and the high-resolution branch is supplemented with semantic information. An anisotropic pyramid pooling module is used in the low-resolution branch, which can capture more spatial information and realize multi-scale feature fusion. Finally, as the depth of the network increases, deep supervision is applied to the decoder part to address gradient vanishing, gradient explosion and slow convergence.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A two-stage full-3D abdominal organ segmentation method based on a depth dual-resolution network is characterized by comprising the following steps:
acquiring a data set and an original image, and preprocessing the data set and the original image;
carrying out random data enhancement on the preprocessed data set and the original image;
training a rough segmentation network and a fine segmentation network according to the data set;
zooming the original image after data enhancement to a preset size, inputting the image into a rough segmentation network, and performing abdominal organ segmentation;
and obtaining an ROI area image according to the ROI area obtained by rough segmentation, zooming the ROI area image into a preset size, and inputting the ROI area image into a fine segmentation network to obtain a segmentation result.
2. The two-stage full-3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, wherein the method for preprocessing the data set and the original image is as follows:
removing data with the number of spacing layers in all z-axis directions being at least 5% of all layers, and removing data with the z-direction image interlayer spacing being higher than 3 to obtain an experimental data set;
dividing an experimental data set into a training set and a testing set according to the proportion of 8:2, adjusting an image to a preset size before training a rough segmentation network and a fine segmentation network, and carrying out standardization and normalization treatment:
x* = (x − μ) / σ
where μ and σ are respectively the mean and variance of the original image data, and x and x* are respectively the pixel value of the original image and the preprocessed pixel value.
3. The two-stage full-3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, wherein the method for performing random data enhancement on the data set and the original image is as follows:
carrying out space geometric transformation and pixel transformation on the data set and the original image, and presetting a probability value corresponding to the transformation;
the number of training samples is increased through space geometric transformation, pixel masking points with different sizes filled with different numerical values are generated through pixel transformation and then are mixed with the original image, and therefore partial characteristics of the original image are disturbed.
4. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, wherein the method for training the coarse segmentation network is as follows:
reducing the image of the data set to a preset size, and inputting the image into a rough segmentation network, wherein the rough segmentation network is a 3DUnet network;
and cutting and adjusting according to the ROI area of the rough segmentation network to obtain an ROI image with a required size and inputting the ROI image into the fine segmentation network.
5. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, wherein the method for training the fine segmentation network is as follows:
establishing a deep dual-resolution branch network: the encoder performs down-sampling and feature extraction on the image through an improved 3D residual convolution module and a down-sampling module; the i-th high-resolution feature map X_Hi and low-resolution feature map X_Li are:
X_Hi = R(F_H(X_H(i−1)) + T_L→H(F_L(X_L(i−1))))
X_Li = R(F_L(X_L(i−1)) + T_H→L(F_H(X_H(i−1))))
where F_H and F_L are the corresponding high-resolution and low-resolution residual basic block sequences, T_L→H and T_H→L represent the low-to-high and high-to-low conversion functions, respectively, and R represents the Relu activation function;
dual resolution branch feature fusion: performing double-branch feature extraction at the third stage of the encoder part, and continuously performing down-sampling on a low-resolution branch to acquire more deep features and semantic information; the high-resolution branches are used for extracting features and keeping the size and the channel number of the feature graph unchanged, and two branches are used for carrying out multi-time bilateral feature fusion at different stages to fully fuse spatial information and semantic information;
capturing anisotropy and context information present in the abdominal scene using an anisotropic pyramid pooling module: before point-by-point summation of the double resolution branches, inputting the low resolution branches into an anisotropic pyramid pooling module, wherein the anisotropic pyramid pooling module comprises anisotropic strip pooling and standard space pooling, and the anisotropic strip pooling captures anisotropy and context information existing in an abdominal scene so as to capture the spatial relationship among multiple organs, and the standard space pooling fuses the multi-scale features;
combining the output of the anisotropic strip pooling and the standard space pooling, restoring the semantic feature information extracted by the encoder to the original image size through continuous upsampling, and completing the classification task of the corresponding pixel points to obtain the final output.
6. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 5, wherein the two branches perform bilateral feature fusion several times at different stages, fusing spatial information and semantic information; for the fusion from high resolution to low resolution, the high-resolution feature map is down-sampled by a 3 × 3 × 3 convolution sequence with stride 2 before point-by-point summation;
low resolution to high resolution fusion, low resolution feature mapping is first compressed by a 1 × 1 × 1 convolution, and then upsampled by trilinear interpolation.
7. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 5, wherein the anisotropic pyramid pooling module captures the anisotropy and context information existing in the abdominal scene, and the input feature map is respectively fed into the anisotropic strip pooling and the standard spatial pooling after passing through two 1 × 1 × 1 convolution modules;
the anisotropic strip pooling uses pooling kernels of 1 × N × N, N × 1 × N and N × N × 1, followed by 3 × 1 × 1, 1 × 3 × 1 and 1 × 1 × 3 inter-slice convolutions and up-sampling; the results are finally added together and fed into a convolution module, which captures the anisotropy and context information existing in the abdominal scene and thereby the spatial relationship among multiple organs;
the standard spatial pooling adopts two average poolings with strides of 2 × 2 and 4 × 4, respectively, and realizes the fusion of multi-scale features through the same inter-slice convolution and up-sampling as the strip pooling, finally fusing with the residual branch;
the outputs of the anisotropic strip pooling and the standard spatial pooling are combined, passed through a 1 × 1 × 1 convolution module, added to the input features, and passed through a Relu function to obtain the final output.
8. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network of claim 5, further comprising a loss function and a deep supervision strategy:
the Dice Loss is used as a Loss function of the network, and the computation of the Dice Loss is as follows:
DiceLoss(y, p) = 1 − 2·Σ_i y_i·p_i / (Σ_i y_i + Σ_i p_i)
the mixing loss function is then:
Loss(y,p)=DiceLoss(y,p)
wherein y represents a label of an original image, and p represents a prediction result of the network model;
and (3) adding auxiliary loss on the third stage of the high-resolution branch by adopting a deep supervision strategy, wherein the overall loss function of the network is as follows:
Loss_total = Loss_main(y, p) + λ1·Loss_aux1(y, p)
where Loss_main and Loss_aux1 represent the main loss and the auxiliary loss of the third stage, respectively; λ1 is the loss weight.
9. The two-stage full-3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, further comprising an evaluation method for the segmentation accuracy of the fine segmentation network:
optimizing the fine segmentation network by adopting an Adam optimizer, setting Dropout rate to be 0.2, setting learning rate to be 0.01, setting batch to be 1 and setting the maximum iteration number to be 200 epochs;
saving a weight file generated in the turn of the result with the highest average value of the DSC and the NSC on the verification set, and terminating the training of the network when the maximum iteration times is reached;
where
DSC(G, S) = 2|G ∩ S| / (|G| + |S|)
NSC(G, S) = (|∂G ∩ B_∂S^(τ)| + |∂S ∩ B_∂G^(τ)|) / (|∂G| + |∂S|)
G is the label of the original image and S is the prediction result of the network model; |G| and |S| represent the numbers of segmented voxels; B_∂G^(τ) and B_∂S^(τ) respectively represent the border regions of the label surface ∂G and of the segmentation surface ∂S at tolerance τ; |∂G ∩ B_∂S^(τ)| represents the overlap of the label boundary with the border region of the predicted surface, and |∂S ∩ B_∂G^(τ)| represents the overlap of the predicted boundary with the border region of the label surface;
the accuracy of the network segmentation was assessed on the test set using DSC and NSC indices.
10. A two-stage full 3D abdominal organ segmentation system based on a depth dual resolution network, comprising a data acquisition unit for acquiring data sets and original images, and a processing unit, the output of the data acquisition unit being connected to the input of the processing unit, wherein the processing unit performs the method according to any one of claims 1 to 9 to complete full 3D abdominal organ segmentation.
CN202210796459.1A 2022-07-06 2022-07-06 Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network Pending CN114998307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210796459.1A CN114998307A (en) 2022-07-06 2022-07-06 Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210796459.1A CN114998307A (en) 2022-07-06 2022-07-06 Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network

Publications (1)

Publication Number Publication Date
CN114998307A true CN114998307A (en) 2022-09-02

Family

ID=83019033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210796459.1A Pending CN114998307A (en) 2022-07-06 2022-07-06 Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network

Country Status (1)

Country Link
CN (1) CN114998307A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914948A (en) * 2020-08-20 2020-11-10 上海海事大学 Ocean current machine blade attachment self-adaptive identification method based on rough and fine semantic segmentation network
CN115409990A (en) * 2022-09-28 2022-11-29 北京医准智能科技有限公司 Medical image segmentation method, device, equipment and storage medium
CN117058727A (en) * 2023-07-18 2023-11-14 广州脉泽科技有限公司 Image enhancement-based hand vein image recognition method and device
CN117058727B (en) * 2023-07-18 2024-04-02 广州脉泽科技有限公司 Image enhancement-based hand vein image recognition method and device
CN116934738A (en) * 2023-08-14 2023-10-24 威朋(苏州)医疗器械有限公司 Organ and nodule joint segmentation method and system based on ultrasonic image
CN116934738B (en) * 2023-08-14 2024-03-22 威朋(苏州)医疗器械有限公司 Organ and nodule joint segmentation method and system based on ultrasonic image
CN117576076A (en) * 2023-12-14 2024-02-20 湖州宇泛智能科技有限公司 Bare soil detection method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN114998307A (en) Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network
CN111145170B (en) Medical image segmentation method based on deep learning
CN107220980B (en) A kind of MRI image brain tumor automatic division method based on full convolutional network
CN109685768B (en) Pulmonary nodule automatic detection method and system based on pulmonary CT sequence
CN107492071A (en) Medical image processing method and equipment
CN111354002A (en) Kidney and kidney tumor segmentation method based on deep neural network
CN113223072B (en) Spine Cobb angle measurement method and system
CN111951288B (en) Skin cancer lesion segmentation method based on deep learning
CN110232653A (en) The quick light-duty intensive residual error network of super-resolution rebuilding
CN111754520B (en) Deep learning-based cerebral hematoma segmentation method and system
CN113052210A (en) Fast low-illumination target detection method based on convolutional neural network
CN111402254B (en) CT image lung nodule high-performance automatic detection method and device
CN110930378B (en) Emphysema image processing method and system based on low data demand
CN112446892A (en) Cell nucleus segmentation method based on attention learning
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
CN115909006B (en) Mammary tissue image classification method and system based on convolution transducer
CN112348839A (en) Image segmentation method and system based on deep learning
CN115546466A (en) Weak supervision image target positioning method based on multi-scale significant feature fusion
CN114565601A (en) Improved liver CT image segmentation algorithm based on DeepLabV3+
CN111027440A (en) Crowd abnormal behavior detection device and method based on neural network
CN114612481A (en) Cell nucleus segmentation method based on region enhancement
CN117392153A (en) Pancreas segmentation method based on local compensation and multi-scale adaptive deformation
CN111223113A (en) Nuclear magnetic resonance hippocampus segmentation algorithm based on dual dense context-aware network
CN116542988A (en) Nodule segmentation method, nodule segmentation device, electronic equipment and storage medium
CN115294151A (en) Lung CT interested region automatic detection method based on multitask convolution model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination