CN114998307A - Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network - Google Patents

Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network Download PDF

Info

Publication number
CN114998307A
Authority
CN
China
Prior art keywords
network
resolution
segmentation
pooling
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210796459.1A
Other languages
Chinese (zh)
Inventor
文静
尹浩
王翊
张毅
杨维斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210796459.1A
Publication of CN114998307A
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 7/0012 - Biomedical image inspection
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10004 - Still image; Photographic image
    • G06T 2207/10012 - Stereo images
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30004 - Biomedical image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Molecular Biology (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Epidemiology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of medical image segmentation and particularly discloses a two-stage full-3D abdominal organ segmentation method and system based on a dual-resolution network. With this technical scheme, full-3D abdominal organ segmentation is realized using a two-stage method.

Description

Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network
Technical Field
The invention belongs to the technical field of medical image segmentation, and relates to a two-stage full-3D abdominal organ segmentation method and system based on a dual-resolution network.
Background
With the rapid development of modern society and the accelerating pace of life, health problems have become increasingly important. At the same time, modern medical technology is advancing rapidly, and auxiliary diagnosis using medical image processing technology has become a very important tool in medical institutions.
In the traditional approach, the gold standard for medical image segmentation is produced manually by doctors with many years of medical experience, which is time-consuming and labor-intensive: the doctor must examine a large number of slice images one by one and delineate the volume image. In addition, manual annotation suffers from low repeatability and strong subjectivity, and the segmentations produced by different doctors differ owing to differences in their clinical practice.
Since the beginning of the 21st century, image segmentation technology based on deep learning has been widely applied in the medical field. However, abdominal medical images suffer from severe inter-class imbalance and small differences between organs; some organs that are difficult to segment are small in shape yet highly deformable, vary greatly between cases, and have unclear boundaries, so segmentation models usually perform poorly on them. For example, the accuracy for the pancreas has remained below 90%, while that of the liver, spleen, left kidney and right kidney near the pancreas can reach roughly 33% to 95%. Meanwhile, a single set of 3D abdominal medical image data carries a large amount of information and a large number of bytes, so the training process of an image segmentation model is very long, which greatly reduces the efficiency and timeliness of intelligent abdominal diagnosis. Therefore, a high-efficiency, high-precision automatic segmentation method is urgently needed.
In recent years, deep learning methods have been introduced into the intelligent automatic diagnosis of various medical images, and deep-learning-based methods have become the mainstream research direction by virtue of their strong feature expression capability. Many existing medical image detection methods are based on the Unet (semantic segmentation) model and improve upon it, for example by introducing Dense and Res block structures or feature-pyramid modules for multi-scale feature extraction and fusion.
Although these methods and strategies effectively improve the segmentation effect, the following problems still exist: 1) the improvement effect is not obvious, and the performance is hardly improved obviously in practical application; 2) the quantity of parameters of the network is large, a large number of data sets are needed for training, but the quantity of available medical data is small, and meanwhile, the problem of overfitting also exists; 3) after the model is downsampled for many times, the detail information of the image is seriously lost, and the influence on the precision is large; 4) the use of too many activation and normalization layers by the model may have adverse effects, such as loss of feature details.
Furthermore, through further analysis of 3D abdominal organ segmentation, we found the following additional problems: 1) sample collection is difficult and sample quality is uneven, with some poor-quality samples, so high generalization ability is required of the network; 2) the detection of small organs presents certain difficulties; 3) the required prediction precision is high; 4) single-stage inference with sliding windows is costly and loses three-dimensional context.
Disclosure of Invention
The invention aims to provide a two-stage full-3D abdominal organ segmentation method and a two-stage full-3D abdominal organ segmentation system based on a double-resolution network, which are used for realizing the segmentation of a full-3D abdominal organ and shortening the segmentation task period.
In order to achieve the purpose, the basic scheme of the invention is as follows: a two-stage full-3D abdominal organ segmentation method based on a depth dual-resolution network comprises the following steps:
acquiring a data set and an original image, and preprocessing the data set and the original image;
carrying out random data enhancement on the preprocessed data set and the original image;
training a rough segmentation network and a fine segmentation network according to the data set;
zooming the original image after data enhancement to a preset size, inputting the image into a rough segmentation network, and performing abdominal organ segmentation;
and obtaining an ROI image according to the ROI obtained by rough segmentation, zooming the ROI image to a preset size, and inputting the ROI image into a fine segmentation network to obtain a segmentation result.
The working principle and the beneficial effects of the basic scheme are as follows: corresponding data are obtained and preprocessed and random data enhancement is carried out, so that the generalization capability of the model can be improved, and the subsequent use of the data is facilitated. By using a two-stage method from coarse to fine, the problems that the full-3D abdomen segmentation task period is long and the training is time-consuming are solved, and the abdominal organs can be segmented quickly and accurately.
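For illustration, a minimal sketch of this coarse-to-fine inference flow is given below. It is written in Python/PyTorch; the function and network names (coarse_net, fine_net), the 160³/192³ sizes and the ROI margin are assumptions taken from or added to the embodiments described later, not a definitive implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def two_stage_segment(volume, coarse_net, fine_net,
                      coarse_size=(160, 160, 160), fine_size=(192, 192, 192), margin=8):
    """Coarse-to-fine inference sketch; volume is a (1, 1, D, H, W) float tensor."""
    # Stage 1: coarse segmentation on a downscaled copy of the whole volume.
    small = F.interpolate(volume, size=coarse_size, mode="trilinear", align_corners=False)
    coarse = coarse_net(small).argmax(dim=1, keepdim=True).float()
    coarse = F.interpolate(coarse, size=volume.shape[2:], mode="nearest")

    # Stage 2: crop the ROI around the coarse foreground, rescale, and refine.
    nz = (coarse[0, 0] > 0).nonzero()                      # assumes some foreground was found
    lo = (nz.min(dim=0).values - margin).clamp(min=0).tolist()
    hi = (nz.max(dim=0).values + margin + 1).tolist()
    roi = volume[:, :, lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    roi = F.interpolate(roi, size=fine_size, mode="trilinear", align_corners=False)
    labels = fine_net(roi).argmax(dim=1)                   # fine labels inside the ROI
    return labels, (lo, hi)                                # the ROI box allows mapping back
```

The returned labels would still have to be resized back to the original ROI shape and pasted into the full volume; that bookkeeping is omitted here.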
Further, the method for preprocessing the data set and the original image is as follows:
removing data in which the number of z-axis spacing layers amounts to at least 5% of the total number of layers, and removing data whose z-direction inter-slice spacing exceeds 3, to obtain an experimental data set;
dividing an experimental data set into a training set and a testing set according to the proportion of 8:2, adjusting an image to a preset size before training a rough segmentation network and a fine segmentation network, and carrying out standardization and normalization treatment:
x* = (x − μ) / σ
where μ and σ are respectively the mean and variance of the original image data, and x and x* are respectively the pixel value of the original image and the preprocessed pixel value.
The method is simple to operate and beneficial to use, screens data and ensures that the data obtained by the network model is displayed smoothly in three dimensions.
Further, the method for enhancing random data of the data set and the original image is as follows:
carrying out space geometric transformation and pixel transformation on the data set and the original image, and presetting a probability value corresponding to the transformation;
the number of training samples is increased through space geometric transformation, pixel masking points with different sizes filled with different values are generated through pixel transformation, and then the pixel masking points are mixed with original images to disturb partial characteristics of the original images.
The image is enhanced by using a random enhancement mode, so that the regularization effect can be brought, the robustness of the model is enhanced, and the sensitivity of decision-making on model parameters is reduced.
Further, the method for training the rough segmentation network is as follows:
reducing the image of the data set to a preset size, and inputting the image into a rough segmentation network, wherein the rough segmentation network is a 3DUnet network;
and cutting and adjusting according to the ROI area of the rough segmentation network to obtain an ROI image with a required size and inputting the ROI image into the fine segmentation network.
The image is reduced to a preset size, the training speed can be accelerated, the obtained ROI image is input into the fine segmentation network through the coarse segmentation network, and the fine segmentation network can be conveniently and accurately segmented.
Further, the method for training the fine segmentation network is as follows:
establishing a deep dual-resolution branch network: the encoder performs down-sampling and feature extraction on the image through an improved 3D residual convolution module and a down-sampling module; the i-th high-resolution feature map X_Hi and low-resolution feature map X_Li are:
X_Hi = R(F_H(X_H(i−1)) + T_L→H(F_L(X_L(i−1))))
X_Li = R(F_L(X_L(i−1)) + T_H→L(F_H(X_H(i−1))))
where F_H and F_L are the corresponding high-resolution and low-resolution residual basic block sequences, T_L→H and T_H→L represent the low-to-high and high-to-low conversion functions, respectively, and R represents the Relu activation function;
dual resolution branch feature fusion: performing double-branch feature extraction at the third stage of the encoder part, and continuously performing down-sampling on the low-resolution branch to acquire more deep features and semantic information; the high-resolution branches are used for extracting features and keeping the size and the channel number of the feature map unchanged, and two branches are used for carrying out multi-time bilateral feature fusion at different stages to fully fuse spatial information and semantic information;
capturing anisotropy and context information present in the abdominal scene using an anisotropic pyramid pooling module: before point-by-point summation of the double resolution branches, inputting the low resolution branches into an anisotropic pyramid pooling module, wherein the anisotropic pyramid pooling module comprises anisotropic strip pooling and standard space pooling, and the anisotropic strip pooling captures anisotropy and context information existing in an abdominal scene so as to capture the spatial relationship among multiple organs, and the standard space pooling fuses the multi-scale features;
combining the output of the anisotropic strip pooling and the standard space pooling, restoring the semantic feature information extracted by the encoder to the original image size through continuous upsampling, and completing the classification task of the corresponding pixel points to obtain the final output.
In a fine-segmented network, loss of feature details is avoided with fewer activated and normalized residual modules. By using the dual-resolution branch and the cross fusion method, more detail information is prevented from being lost in the down-sampling process, the low-resolution branch supplements detail information, and the high-resolution branch supplements semantic information. And an anisotropic pyramid pooling module is used in a low-resolution branch, so that more spatial information can be captured and multi-scale feature fusion can be realized.
Further, the two branches perform bilateral feature fusion several times at different stages, fusing spatial information and semantic information; for the fusion from high resolution to low resolution, the high-resolution feature map is down-sampled by a 3 × 3 × 3 convolution sequence with stride 2 before point-by-point summation;
low resolution to high resolution fusion, low resolution feature mapping is first compressed by a 1 × 1 × 1 convolution, and then upsampled by trilinear interpolation.
The method is simple to operate and beneficial to use, and loss of image detail information caused by continuous downsampling is avoided.
Further, an anisotropic pyramid pooling module captures anisotropy and context information existing in an abdominal scene, and an input feature map is respectively sent to anisotropic strip pooling and standard space pooling after passing through two 1 × 1 × 1 convolution modules;
the anisotropic strip pooling uses pooling kernels of 1 × N × N, N × 1 × N and N × N × 1, followed by 3 × 1 × 1, 1 × 3 × 1 and 1 × 1 × 3 inter-slice convolutions and up-sampling; the results are finally added together and fed into a convolution module, which captures the anisotropy and context information existing in the abdominal scene and thereby the spatial relationship among multiple organs;
the standard spatial pooling adopts two average poolings with strides of 2 × 2 and 4 × 4, respectively, and realizes the fusion of multi-scale features through the same inter-slice convolution and up-sampling as the strip pooling, finally fusing with the residual branch;
the outputs of the anisotropic strip pooling and the standard spatial pooling are combined, passed through a 1 × 1 × 1 convolution module, added to the input features, and passed through a Relu activation function to obtain the final output.
Simple structure and convenient operation.
Further, the method also comprises a loss function and a deep supervision strategy:
the Dice Loss is used as a Loss function of the network, and the computation of the Dice Loss is as follows:
DiceLoss(y, p) = 1 − 2·Σ_i y_i·p_i / (Σ_i y_i + Σ_i p_i)
the mixing loss function is then:
Loss(y,p)=DiceLoss(y,p)
wherein y represents a label of an original image, and p represents a prediction result of the network model;
and (3) adding auxiliary loss on the third stage of the high-resolution branch by adopting a deep supervision strategy, wherein the overall loss function of the network is as follows:
Loss_total = Loss_main(y, p) + λ1·Loss_aux1(y, p)
where Loss_main and Loss_aux1 represent the main loss and the auxiliary loss of the third stage, respectively, and λ1 is the loss weight.
And a deep supervision strategy is used, so that gradient explosion and gradient disappearance are avoided, and the problem of slow convergence is solved.
Further, the method for evaluating the segmentation precision of the fine segmentation network comprises the following steps:
optimizing the fine segmentation network by adopting an Adam optimizer, setting Dropout rate to be 0.2, setting learning rate to be 0.01, setting batch to be 1 and setting the maximum iteration number to be 200 epochs;
saving a weight file generated in the round of the result with the highest average value of DSC and NSC on the verification set, and terminating the training of the network when the maximum iteration number is reached;
where
DSC(G, S) = 2|G ∩ S| / (|G| + |S|)
NSC(G, S) = (|∂G ∩ B_∂S^(τ)| + |∂S ∩ B_∂G^(τ)|) / (|∂G| + |∂S|)
G is the label of the original image and S is the prediction result of the network model; |G| and |S| represent the numbers of segmented voxels; B_∂G^(τ) and B_∂S^(τ) respectively represent the border regions of the label surface ∂G and of the segmentation surface ∂S at tolerance τ; |∂G ∩ B_∂S^(τ)| represents the overlap of the label boundary with the border region of the predicted surface, and |∂S ∩ B_∂G^(τ)| represents the overlap of the predicted boundary with the border region of the label surface;
the accuracy of the network segmentation was assessed on the test set using DSC and NSC indices.
Evaluating the segmentation precision of the network helps to analyze whether the segmentation results output by the network are accurate and also facilitates subsequent optimization of the network.
the invention also provides a two-stage full-3D abdominal organ segmentation system based on the depth dual-resolution network, which comprises a data acquisition unit and a processing unit, wherein the data acquisition unit is used for acquiring data sets and original images, the output end of the data acquisition unit is connected with the input end of the processing unit, and the processing unit executes the method of the invention to complete full-3D abdominal organ segmentation.
The two-stage full-3D abdominal organ segmentation method based on the depth dual-resolution network is utilized to realize the segmentation of full-3D abdominal organs, and the method is simple to operate and beneficial to use.
Drawings
FIG. 1 is a schematic flow chart of a two-stage full 3D abdominal organ segmentation method based on a deep dual resolution network according to the present invention;
FIG. 2 is a schematic diagram of a deep dual resolution branch network structure of the two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network according to the present invention;
FIG. 3 is a schematic structural diagram of a 3D residual convolution module of the two-stage full 3D abdominal organ segmentation method based on a depth dual resolution network according to the present invention;
FIG. 4 is a schematic diagram of a dual-branch feature fusion structure of the two-stage full 3D abdominal organ segmentation method based on a deep dual resolution network according to the present invention;
FIG. 5 is a schematic structural diagram of an anisotropic pyramid pooling module of the two-stage full 3D abdominal organ segmentation method based on a deep dual resolution network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
The invention discloses a two-stage full-3D abdominal organ segmentation method based on a depth dual-resolution network, which realizes the rapid and accurate segmentation of full-3D abdominal organs. As shown in fig. 1, the method comprises the steps of:
acquiring a data set and an original image, and preprocessing the data set and the original image;
carrying out random data enhancement on the preprocessed data set and the original image;
training a rough segmentation network and a fine segmentation network according to the data set;
zooming the original image after data enhancement to a preset size, inputting the image into a rough segmentation network, and performing abdominal organ segmentation;
and obtaining an ROI (region of interest) image according to the ROI area obtained by the rough segmentation, zooming the ROI image into a preset size, and inputting the preset size into a fine segmentation network to obtain a segmentation result.
In a preferred embodiment of the present invention, the method for preprocessing the data set and the original image is as follows:
the total number of the self-built data sets is more than 1000, and in order to ensure that data obtained by the network model is displayed smoothly in three dimensions, part of bad data needs to be screened. Removing data with the spacing (spacing) layer number of all z-axis directions being at least 5% of all layer numbers, and removing data with the z-direction image layer distance being higher than 3 to obtain an experimental data set;
dividing an experimental data set into a training set and a testing set according to the proportion of 8:2, adjusting (resize) an image to a preset size before training a rough segmentation network and a fine segmentation network, and carrying out standardization and normalization treatment:
x* = (x − μ) / σ
where μ and σ are the mean and variance of the original image data, and x and x* are the original pixel value and the preprocessed pixel value, respectively. Before training the coarse segmentation network model, the images are resized to 160 × 160 × 160; before training the fine segmentation model, the cropped ROI region is scaled to 192 × 192 × 192 and used as the input data of the fine segmentation network.
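A minimal sketch of this preprocessing step (z-score normalization followed by resizing) is given below; it assumes NumPy/SciPy, and the choice of zoom order (linear resampling) is an assumption not specified by the embodiment.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume, target_shape):
    """Z-score normalize a volume and resize it to target_shape."""
    volume = volume.astype(np.float32)
    mu, sigma = volume.mean(), volume.std()
    normalized = (volume - mu) / (sigma + 1e-8)        # x* = (x - mu) / sigma
    factors = [t / s for t, s in zip(target_shape, normalized.shape)]
    return zoom(normalized, factors, order=1)          # linear resampling to the preset size

# coarse-stage input: preprocess(ct_volume, (160, 160, 160))
# fine-stage input:   preprocess(roi_volume, (192, 192, 192))
```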
In a preferred embodiment of the present invention, the method for enhancing random data of a data set and an original image is as follows:
carrying out spatial geometric transformation and pixel transformation on the data set and the original images, with a preset probability value for each transformation; that is, random data enhancement is applied to the preprocessed data and a probability value is set for each enhancement method, for example a flip probability of 0.5, so that one or more augmentations are randomly applied to the data;
image rotation, deformation, mirror image and the like are used in the geometric transformation, the number of training samples is increased by carrying out spatial geometric transformation on the data set, the generalization capability of the model is improved, meanwhile, the model can learn the characteristics of different samples more easily, the translation invariance of the model is enhanced, the target position of the enhanced image can be captured, and overfitting is avoided; methods such as contrast enhancement, Gaussian blur and brightness enhancement are used in the pixel transformation, and pixel masking points of different sizes filled with different values are generated through the transformation and then mixed with the original image to disturb some characteristics of the original image. These transformations may bring about a regularization effect, enhancing the robustness of the model, and reducing the sensitivity of the decision to the model parameters.
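The sketch below illustrates such probabilistic augmentation; apart from the flip probability of 0.5 stated above, the probabilities, rotation range, jitter magnitudes and mask sizes are illustrative assumptions. In practice the same spatial transforms must also be applied to the label volume, while the pixel-level transforms are applied to the image only.

```python
import numpy as np
from scipy.ndimage import rotate, gaussian_filter

def random_augment(volume, rng):
    """Apply each transform with a preset probability (spatial + pixel-level)."""
    if rng.random() < 0.5:                                   # mirror / flip, p = 0.5
        volume = np.flip(volume, axis=int(rng.integers(0, 3))).copy()
    if rng.random() < 0.3:                                   # small random in-plane rotation (assumed)
        volume = rotate(volume, angle=float(rng.uniform(-10, 10)),
                        axes=(1, 2), reshape=False, order=1)
    if rng.random() < 0.3:                                   # brightness / contrast jitter (assumed)
        volume = volume * rng.uniform(0.9, 1.1) + rng.uniform(-0.1, 0.1)
    if rng.random() < 0.2:                                   # Gaussian blur (assumed sigma)
        volume = gaussian_filter(volume, sigma=float(rng.uniform(0.5, 1.0)))
    if rng.random() < 0.2:                                   # cuboid pixel mask filled with a constant
        d, h, w = volume.shape
        sz = [int(rng.integers(2, max(3, s // 8))) for s in (d, h, w)]
        z, y, x = (int(rng.integers(0, max(1, s - k))) for s, k in zip((d, h, w), sz))
        volume[z:z + sz[0], y:y + sz[1], x:x + sz[2]] = rng.uniform(volume.min(), volume.max())
    return volume

# usage: augmented = random_augment(volume, np.random.default_rng(0))
```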
In a preferred embodiment of the present invention, the method for training the coarse segmentation network comprises the following steps:
reducing the image of the data set to a preset size (such as [128 × 128 × 128] or [160 × 160 × 160]), so as to accelerate the training speed, and inputting a rough segmentation network, wherein the rough segmentation network is a 3DUnet network, and the model is simple, is used for learning the global information of the image, and is beneficial to the reasoning of fine segmentation to find a proper ROI (region of interest);
and (4) performing cropping (crop) and adjusting (resize) according to the ROI region of the rough segmentation network to obtain an ROI image with a required size (such as [192 × 192 × 192], wherein the actual size can be determined according to a specific task of a data set), and inputting the ROI image into the fine segmentation network.
In a preferred embodiment of the present invention, the method for training the fine segmentation network comprises the following steps:
the precise segmentation network needs to divide the image more accurately, a method based on a depth dual-resolution branch network (DRUnnet) is used in the stage, a classic Encoder-Decoder framework is adopted in the whole framework, during training, Ground truth label data is used for cutting an ROI (region of interest) region of an original image, and then the ROI region is zoomed to [192,192,192] to be sent into a model, so that the precise segmentation model can learn local context information more easily and precisely.
As shown in fig. 2, a deep dual-resolution branch network is built. The encoder realizes feature extraction by stacking feature extraction modules and independent down-sampling. The convolution block adopts an improved 3D residual convolution module (as shown in FIG. 3), composed of two identical improved residual modules stacked; the 3D residual convolution module uses normalization (Instance Norm) and the Relu activation function only once in the whole residual structure, each convolution kernel has size 3 × 3 × 3 with stride 1, and the size and channel number of the feature map are not changed. The down-sampling module consists of a convolution block with normalization and stride 2, which halves the feature map and doubles the number of channels relative to the input. The feature map first passes through the down-sampling module and then through the feature extraction module: the input feature map passes through a convolution block composed of a stride-1 convolution and Instance Norm, then through a convolution block composed of a stride-1 convolution and Relu, and is finally added to the initial input as the input of the next residual module. The i-th high-resolution feature map X_Hi and low-resolution feature map X_Li are:
X_Hi = R(F_H(X_H(i−1)) + T_L→H(F_L(X_L(i−1))))
X_Li = R(F_L(X_L(i−1)) + T_H→L(F_H(X_H(i−1))))
where F_H and F_L are the corresponding high-resolution and low-resolution residual basic block sequences, T_L→H and T_H→L represent the low-to-high and high-to-low conversion functions, respectively, and R represents the Relu function;
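A PyTorch-style sketch of the improved 3D residual convolution module and the independent down-sampling module described above is given below; the class names are hypothetical, and in each encoder stage two such residual modules would be stacked, as stated in the text.

```python
import torch.nn as nn

class Residual3D(nn.Module):
    """Improved 3D residual block: InstanceNorm and Relu are each used only once."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.norm = nn.InstanceNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.norm(self.conv1(x))        # stride-1 convolution + Instance Norm
        out = self.relu(self.conv2(out))      # stride-1 convolution + Relu
        return out + x                        # added to the initial input; size and channels unchanged

class DownSample3D(nn.Module):
    """Stride-2 convolution block that halves the feature map and doubles the channels."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, in_channels * 2, kernel_size=3, stride=2, padding=1)
        self.norm = nn.InstanceNorm3d(in_channels * 2)

    def forward(self, x):
        return self.norm(self.conv(x))
```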
dual-resolution branch feature fusion: as shown in fig. 4, in order to avoid the loss of image detail information caused by continuous down-sampling, dual-branch feature extraction is performed from the third stage of the encoder part. The low-resolution branch continues down-sampling through the improved 3D residual convolution module and the independent down-sampling module, with the channel numbers set to 16, 32, 64, 128 and 256, respectively, to obtain deeper features and semantic information; the high-resolution branch extracts features through convolution modules with kernel size 3 × 3 × 3 and stride 1, keeping the size and channel number of the feature map unchanged. The two branches perform bilateral feature fusion several times at different stages, fully fusing spatial and semantic information: for the fusion from high resolution to low resolution, the high-resolution feature map is down-sampled by a 3 × 3 × 3 convolution sequence with stride 2 before point-by-point summation; for the fusion from low resolution to high resolution, the low-resolution feature map is first compressed by a 1 × 1 × 1 convolution and then up-sampled by trilinear interpolation;
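A sketch of one bilateral fusion step under this description is shown below; it assumes the two branches differ by exactly one scale (so a single stride-2 convolution suffices for the high-to-low path) and the module name is hypothetical.

```python
import torch.nn as nn
import torch.nn.functional as F

class BilateralFusion(nn.Module):
    """Fuse high- and low-resolution branches by point-by-point summation."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        # high -> low: 3x3x3 convolution with stride 2 for down-sampling
        self.high_to_low = nn.Conv3d(high_ch, low_ch, kernel_size=3, stride=2, padding=1)
        # low -> high: 1x1x1 convolution to compress channels, then trilinear up-sampling
        self.low_to_high = nn.Conv3d(low_ch, high_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x_high, x_low):
        low_up = F.interpolate(self.low_to_high(x_low), size=x_high.shape[2:],
                               mode="trilinear", align_corners=False)
        new_high = self.relu(x_high + low_up)                  # detail branch gains semantic information
        new_low = self.relu(x_low + self.high_to_low(x_high))  # semantic branch gains detail information
        return new_high, new_low
```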
capturing anisotropy and context information present in the abdominal scene using an anisotropic pyramid pooling module: as shown in FIG. 5, ConINReLU denotes a module composed of convolution, normalization and an activation function; Avg-pool denotes average pooling; ConINUpsample denotes a module composed of convolution, normalization and up-sampling; 1 × 1, 1 × 12, etc. denote the sizes of the convolution or pooling kernels; Relu denotes the activation function; ConIN denotes a module composed of convolution and normalization. Before the point-by-point summation of the dual-resolution branches, the low-resolution branch is input into the anisotropic pyramid pooling module, which comprises anisotropic strip pooling and standard spatial pooling: the anisotropic strip pooling captures the anisotropy and context information existing in the abdominal scene, thereby capturing the spatial relationship among multiple organs, while the standard spatial pooling fuses multi-scale features;
Finally, the low-resolution branch passes through the anisotropic pyramid pooling module, where the outputs of the anisotropic strip pooling and the standard spatial pooling are combined, and is then summed point by point with the high-resolution branch and sent to the Decoder module. The decoder module separates a standard 3D convolution with kernel size 3 × 3 × 3 into a 3 × 3 × 1 in-slice convolution and a 1 × 1 × 3 inter-slice convolution, and realizes up-sampling of the feature maps using trilinear interpolation; the semantic feature information extracted by the encoder is restored to the original image size through successive up-sampling, and the classification task of the corresponding voxels is completed to obtain the final output.
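A sketch of such a decoder block, following the decomposition described above, could look as follows (the block name and the ordering of normalization and activation are assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F

class SeparableDecoderBlock(nn.Module):
    """3x3x1 in-slice conv + 1x1x3 inter-slice conv, then 2x trilinear up-sampling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.in_slice = nn.Conv3d(in_ch, out_ch, kernel_size=(3, 3, 1), padding=(1, 1, 0))
        self.inter_slice = nn.Conv3d(out_ch, out_ch, kernel_size=(1, 1, 3), padding=(0, 0, 1))
        self.norm = nn.InstanceNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.norm(self.inter_slice(self.in_slice(x))))
        return F.interpolate(x, scale_factor=2.0, mode="trilinear", align_corners=False)
```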
More preferably, the anisotropic pyramid pooling module captures anisotropy and context information existing in the abdominal scene, and the input feature map is respectively sent to anisotropic strip pooling and standard space pooling after passing through two 1 × 1 × 1 convolution modules;
the anisotropic strip pooling uses pooling kernels of 1 × N × N, N × 1 × N and N × N × 1, followed by 3 × 1 × 1, 1 × 3 × 1 and 1 × 1 × 3 inter-slice convolutions and up-sampling; the results are finally added together and fed into a convolution module, which captures the anisotropy and context information existing in the abdominal scene and thereby the spatial relationship among multiple organs;
the standard spatial pooling adopts two average poolings with strides of 2 × 2 and 4 × 4, respectively, and realizes the fusion of multi-scale features through the same inter-slice convolution and up-sampling as the strip pooling, finally fusing with the residual branch;
the outputs of the anisotropic strip pooling and the standard spatial pooling are combined, passed through a 1 × 1 × 1 convolution module, added to the input features, and passed through a Relu activation function to obtain the final output.
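A condensed sketch of the anisotropic pyramid pooling module as described above is given below. How the strip-pooled descriptors are broadcast back to the full feature map and exactly how the branches are normalized and fused are assumptions where the text leaves room for interpretation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_in(in_ch, out_ch, k, p=0):
    """Convolution followed by Instance Norm (the ConIN block of FIG. 5)."""
    return nn.Sequential(nn.Conv3d(in_ch, out_ch, kernel_size=k, padding=p),
                         nn.InstanceNorm3d(out_ch))

class AnisotropicPyramidPooling(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pre_strip = conv_in(channels, channels, 1)
        self.pre_std = conv_in(channels, channels, 1)
        self.conv_d = conv_in(channels, channels, (3, 1, 1), (1, 0, 0))  # after 1 x N x N pooling
        self.conv_h = conv_in(channels, channels, (1, 3, 1), (0, 1, 0))  # after N x 1 x N pooling
        self.conv_w = conv_in(channels, channels, (1, 1, 3), (0, 0, 1))  # after N x N x 1 pooling
        self.strip_out = conv_in(channels, channels, 3, 1)
        self.pool2 = nn.AvgPool3d(kernel_size=2, stride=2)
        self.pool4 = nn.AvgPool3d(kernel_size=4, stride=4)
        self.conv2 = conv_in(channels, channels, 3, 1)
        self.conv4 = conv_in(channels, channels, 3, 1)
        self.fuse = conv_in(2 * channels, channels, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        d, h, w = x.shape[2:]
        xs = self.pre_strip(x)
        # anisotropic strip pooling: average over two axes, keep the third, then an anisotropic conv
        sd = self.conv_d(xs.mean(dim=(3, 4), keepdim=True))
        sh = self.conv_h(xs.mean(dim=(2, 4), keepdim=True))
        sw = self.conv_w(xs.mean(dim=(2, 3), keepdim=True))
        strips = self.strip_out(sd.expand_as(xs) + sh.expand_as(xs) + sw.expand_as(xs))
        # standard spatial pooling at strides 2 and 4, up-sampled back and fused with the residual branch
        xp = self.pre_std(x)
        p2 = F.interpolate(self.conv2(self.pool2(xp)), size=(d, h, w),
                           mode="trilinear", align_corners=False)
        p4 = F.interpolate(self.conv4(self.pool4(xp)), size=(d, h, w),
                           mode="trilinear", align_corners=False)
        std = xp + p2 + p4
        out = self.fuse(torch.cat([strips, std], dim=1))   # combine the two pooling paths, 1x1x1 conv
        return self.relu(out + x)                          # add the input features, then Relu
```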
In a preferred embodiment of the present invention, the two-stage full 3D abdominal organ segmentation method further includes a loss function and a deep supervision strategy:
the Dice Loss is used as a Loss function of the network, and the computation of the Dice Loss is as follows:
DiceLoss(y, p) = 1 − 2·Σ_i y_i·p_i / (Σ_i y_i + Σ_i p_i)
the mixing loss function is then:
Loss(y,p)=DiceLoss(y,p)
wherein y represents a label of an original image, and p represents a prediction result of the network model;
in order to accelerate the convergence of the network, a deep supervision strategy is adopted, and auxiliary loss is added to the third stage of the high-resolution branch, so that the overall loss function of the network is as follows:
Loss_total = Loss_main(y, p) + λ1·Loss_aux1(y, p)
where Loss_main and Loss_aux1 represent the main loss and the auxiliary loss of the third stage, respectively; λ1 is the loss weight and can be set to 0.5.
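A short sketch of this loss with deep supervision is given below; the smoothing constant eps is an added numerical-stability assumption, and y is assumed to be one-hot encoded.

```python
import torch

def dice_loss(y, p, eps=1e-6):
    """Soft Dice loss; y is the one-hot label, p the predicted probability map (B, C, D, H, W)."""
    dims = tuple(range(2, y.dim()))                      # sum over the spatial dimensions
    intersection = (y * p).sum(dim=dims)
    union = y.sum(dim=dims) + p.sum(dim=dims)
    return (1.0 - (2.0 * intersection + eps) / (union + eps)).mean()

def total_loss(y, p_main, p_aux, lam=0.5):
    """Deep supervision: main loss plus the weighted auxiliary loss from the third stage."""
    return dice_loss(y, p_main) + lam * dice_loss(y, p_aux)
```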
In a preferred embodiment of the present invention, the two-stage full 3D abdominal organ segmentation method further includes a method for evaluating the precision of the fine segmentation network segmentation:
if set x ∈ R d×w×h And y ∈ R d×w×h Respectively representing an original image and a corresponding manual labeling result thereof, and d, w and h respectively represent the height, length and width of the image; p ═ f (θ, x) denotes the segmentation model, where θ denotes the network parameters, p denotes the probability map of the prediction results, and M is an abbreviation for the model.
The predicted result is obtained from the probability map as:
ŷ = argmax(p)
randomly selecting 200 sets as training sets and 40 sets as testing sets;
optimizing the fine segmentation network by adopting an Adam optimizer (Adam is a first-order optimization algorithm capable of replacing the traditional stochastic gradient descent process and can iteratively update the weight of the neural network based on training data), wherein the Dropout rate is set to be 0.2, the learning rate is set to be 0.01, the batch is set to be 1, and the maximum iteration number is 200 epochs (1 epoch refers to one time of training with all samples in the training set);
saving the weight file generated in the round with the highest average value of DSC (Dice Similarity Coefficient, a set similarity measure index) and NSC (Normalized Surface Dice) on the verification set, and terminating the training of the network when the maximum iteration number is reached;
where
DSC(G, S) = 2|G ∩ S| / (|G| + |S|)
NSC(G, S) = (|∂G ∩ B_∂S^(τ)| + |∂S ∩ B_∂G^(τ)|) / (|∂G| + |∂S|)
G is the label of the original image and S is the prediction result of the network model; |G| and |S| represent the numbers of segmented voxels; B_∂G^(τ) and B_∂S^(τ) respectively represent the border regions of the label surface ∂G and of the segmentation surface ∂S at tolerance τ; |∂G ∩ B_∂S^(τ)| represents the overlap of the label boundary with the border region of the predicted surface, and |∂S ∩ B_∂G^(τ)| represents the overlap of the predicted boundary with the border region of the label surface. These quantities are obtained by data statistics: those involving G are computed from the label, and those involving S from the prediction results.
The accuracy of the network segmentation was assessed on the test set using DSC and NSC indices.
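For reference, a NumPy/SciPy sketch of these two indices on binary masks is given below; it is a voxel-based approximation of the surface computation (boundaries taken as the mask minus its erosion), assumes non-empty masks, and expresses the tolerance τ in the units of the supplied voxel spacing.

```python
import numpy as np
from scipy import ndimage

def dsc(g, s):
    """Dice similarity coefficient between label mask g and prediction mask s."""
    g, s = g.astype(bool), s.astype(bool)
    denom = g.sum() + s.sum()
    return 2.0 * np.logical_and(g, s).sum() / denom if denom > 0 else 1.0

def boundary(mask):
    """Boundary voxels of a binary mask (the mask minus its erosion)."""
    return np.logical_and(mask, np.logical_not(ndimage.binary_erosion(mask)))

def nsc(g, s, tau=1.0, spacing=(1.0, 1.0, 1.0)):
    """Normalized surface Dice at tolerance tau."""
    dg, ds = boundary(g.astype(bool)), boundary(s.astype(bool))
    # distance of every voxel to the nearest boundary voxel of each mask
    dist_to_dg = ndimage.distance_transform_edt(np.logical_not(dg), sampling=spacing)
    dist_to_ds = ndimage.distance_transform_edt(np.logical_not(ds), sampling=spacing)
    overlap_g = np.logical_and(dg, dist_to_ds <= tau).sum()   # label boundary inside the prediction band
    overlap_s = np.logical_and(ds, dist_to_dg <= tau).sum()   # predicted boundary inside the label band
    denom = dg.sum() + ds.sum()
    return (overlap_g + overlap_s) / denom if denom > 0 else 1.0
```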
The invention also provides a two-stage full-3D abdominal organ segmentation system based on the depth dual-resolution network, which comprises a data acquisition unit and a processing unit, wherein the data acquisition unit is used for acquiring data sets and original images, the output end of the data acquisition unit is electrically connected with the input end of the processing unit, and the processing unit executes the method of the invention to complete full-3D abdominal organ segmentation.
This scheme uses a two-stage, coarse-to-fine method, which solves the problems of long task cycles and time-consuming training in full 3D abdominal segmentation. A random data enhancement method tailored to the characteristics of the data improves the generalization ability of the model. In the fine segmentation network, the convolution modules are residual modules with fewer activation and normalization layers, which avoids the loss of feature details. Meanwhile, to avoid losing more detail information during down-sampling, a dual-resolution branch and cross-fusion method is used: the low-resolution branch is supplemented with detail information and the high-resolution branch is supplemented with semantic information. An anisotropic pyramid pooling module is used in the low-resolution branch, which can capture more spatial information and realize multi-scale feature fusion. Finally, as the depth of the network increases, deep supervision is applied to the decoder part to address gradient vanishing, gradient explosion and slow convergence.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A two-stage full-3D abdominal organ segmentation method based on a depth dual-resolution network is characterized by comprising the following steps:
acquiring a data set and an original image, and preprocessing the data set and the original image;
carrying out random data enhancement on the preprocessed data set and the original image;
training a rough segmentation network and a fine segmentation network according to the data set;
zooming the original image after data enhancement to a preset size, inputting the image into a rough segmentation network, and performing abdominal organ segmentation;
and obtaining an ROI area image according to the ROI area obtained by rough segmentation, zooming the ROI area image into a preset size, and inputting the ROI area image into a fine segmentation network to obtain a segmentation result.
2. The two-stage full-3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, wherein the method for preprocessing the data set and the original image is as follows:
removing data with the number of spacing layers in all z-axis directions being at least 5% of all layers, and removing data with the z-direction image interlayer spacing being higher than 3 to obtain an experimental data set;
dividing an experimental data set into a training set and a testing set according to the proportion of 8:2, adjusting an image to a preset size before training a rough segmentation network and a fine segmentation network, and carrying out standardization and normalization treatment:
x* = (x − μ) / σ
where μ and σ are respectively the mean and variance of the original image data, and x and x* are respectively the pixel value of the original image and the preprocessed pixel value.
3. The two-stage full-3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, wherein the method for performing random data enhancement on the data set and the original image is as follows:
carrying out space geometric transformation and pixel transformation on the data set and the original image, and presetting a probability value corresponding to the transformation;
the number of training samples is increased through space geometric transformation, pixel masking points with different sizes filled with different numerical values are generated through pixel transformation and then are mixed with the original image, and therefore partial characteristics of the original image are disturbed.
4. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, wherein the method for training the coarse segmentation network is as follows:
reducing the image of the data set to a preset size, and inputting the image into a rough segmentation network, wherein the rough segmentation network is a 3DUnet network;
and cutting and adjusting according to the ROI area of the rough segmentation network to obtain an ROI image with a required size and inputting the ROI image into the fine segmentation network.
5. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, wherein the method for training the fine segmentation network is as follows:
establishing a deep dual-resolution branch network: the encoder performs down-sampling and feature extraction on the image through an improved 3D residual convolution module and a down-sampling module; the i-th high-resolution feature map X_Hi and low-resolution feature map X_Li are:
X_Hi = R(F_H(X_H(i−1)) + T_L→H(F_L(X_L(i−1))))
X_Li = R(F_L(X_L(i−1)) + T_H→L(F_H(X_H(i−1))))
where F_H and F_L are the corresponding high-resolution and low-resolution residual basic block sequences, T_L→H and T_H→L represent the low-to-high and high-to-low conversion functions, respectively, and R represents the Relu activation function;
dual resolution branch feature fusion: performing double-branch feature extraction at the third stage of the encoder part, and continuously performing down-sampling on a low-resolution branch to acquire more deep features and semantic information; the high-resolution branches are used for extracting features and keeping the size and the channel number of the feature graph unchanged, and two branches are used for carrying out multi-time bilateral feature fusion at different stages to fully fuse spatial information and semantic information;
capturing anisotropy and context information present in the abdominal scene using an anisotropic pyramid pooling module: before point-by-point summation of the double resolution branches, inputting the low resolution branches into an anisotropic pyramid pooling module, wherein the anisotropic pyramid pooling module comprises anisotropic strip pooling and standard space pooling, and the anisotropic strip pooling captures anisotropy and context information existing in an abdominal scene so as to capture the spatial relationship among multiple organs, and the standard space pooling fuses the multi-scale features;
combining the output of the anisotropic strip pooling and the standard space pooling, restoring the semantic feature information extracted by the encoder to the original image size through continuous upsampling, and completing the classification task of the corresponding pixel points to obtain the final output.
6. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 5, wherein the two branches perform bilateral feature fusion several times at different stages, fusing spatial information and semantic information; for the fusion from high resolution to low resolution, the high-resolution feature map is down-sampled by a 3 × 3 × 3 convolution sequence with stride 2 before point-by-point summation;
low resolution to high resolution fusion, low resolution feature mapping is first compressed by a 1 × 1 × 1 convolution, and then upsampled by trilinear interpolation.
7. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 5, wherein the anisotropic pyramid pooling module captures the anisotropy and context information existing in the abdominal scene, and the input feature map is respectively fed into the anisotropic strip pooling and the standard spatial pooling after passing through two 1 × 1 × 1 convolution modules;
the anisotropic strip pooling uses pooling kernels of 1 × N × N, N × 1 × N and N × N × 1, followed by 3 × 1 × 1, 1 × 3 × 1 and 1 × 1 × 3 inter-slice convolutions and up-sampling; the results are finally added together and fed into a convolution module, which captures the anisotropy and context information existing in the abdominal scene and thereby the spatial relationship among multiple organs;
the standard spatial pooling adopts two average poolings with strides of 2 × 2 and 4 × 4, respectively, and realizes the fusion of multi-scale features through the same inter-slice convolution and up-sampling as the strip pooling, finally fusing with the residual branch;
the outputs of the anisotropic strip pooling and the standard spatial pooling are combined, passed through a 1 × 1 × 1 convolution module, added to the input features, and passed through a Relu function to obtain the final output.
8. The two-stage full 3D abdominal organ segmentation method based on the deep dual resolution network of claim 5, further comprising a loss function and a deep supervision strategy:
the Dice Loss is used as a Loss function of the network, and the computation of the Dice Loss is as follows:
DiceLoss(y, p) = 1 − 2·Σ_i y_i·p_i / (Σ_i y_i + Σ_i p_i)
the mixing loss function is then:
Loss(y,p)=DiceLoss(y,p)
wherein y represents a label of an original image, and p represents a prediction result of the network model;
and (3) adding auxiliary loss on the third stage of the high-resolution branch by adopting a deep supervision strategy, wherein the overall loss function of the network is as follows:
Loss_total = Loss_main(y, p) + λ1·Loss_aux1(y, p)
where Loss_main and Loss_aux1 represent the main loss and the auxiliary loss of the third stage, respectively; λ1 is the loss weight.
9. The two-stage full-3D abdominal organ segmentation method based on the deep dual resolution network as claimed in claim 1, further comprising an evaluation method for the segmentation accuracy of the fine segmentation network:
optimizing the fine segmentation network by adopting an Adam optimizer, setting Dropout rate to be 0.2, setting learning rate to be 0.01, setting batch to be 1 and setting the maximum iteration number to be 200 epochs;
saving a weight file generated in the turn of the result with the highest average value of the DSC and the NSC on the verification set, and terminating the training of the network when the maximum iteration times is reached;
where
DSC(G, S) = 2|G ∩ S| / (|G| + |S|)
NSC(G, S) = (|∂G ∩ B_∂S^(τ)| + |∂S ∩ B_∂G^(τ)|) / (|∂G| + |∂S|)
G is the label of the original image and S is the prediction result of the network model; |G| and |S| represent the numbers of segmented voxels; B_∂G^(τ) and B_∂S^(τ) respectively represent the border regions of the label surface ∂G and of the segmentation surface ∂S at tolerance τ; |∂G ∩ B_∂S^(τ)| represents the overlap of the label boundary with the border region of the predicted surface, and |∂S ∩ B_∂G^(τ)| represents the overlap of the predicted boundary with the border region of the label surface;
the accuracy of the network segmentation was assessed on the test set using DSC and NSC indices.
10. A two-stage full 3D abdominal organ segmentation system based on a depth dual resolution network, comprising a data acquisition unit for acquiring data sets and original images, and a processing unit, the output of the data acquisition unit being connected to the input of the processing unit, wherein the processing unit performs the method according to any one of claims 1 to 9 to complete full 3D abdominal organ segmentation.
CN202210796459.1A 2022-07-06 2022-07-06 Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network Pending CN114998307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210796459.1A CN114998307A (en) 2022-07-06 2022-07-06 Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210796459.1A CN114998307A (en) 2022-07-06 2022-07-06 Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network

Publications (1)

Publication Number Publication Date
CN114998307A true CN114998307A (en) 2022-09-02

Family

ID=83019033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210796459.1A Pending CN114998307A (en) 2022-07-06 2022-07-06 Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network

Country Status (1)

Country Link
CN (1) CN114998307A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914948A (en) * 2020-08-20 2020-11-10 上海海事大学 Ocean current machine blade attachment self-adaptive identification method based on rough and fine semantic segmentation network
CN115409990A (en) * 2022-09-28 2022-11-29 北京医准智能科技有限公司 Medical image segmentation method, device, equipment and storage medium
CN117058727A (en) * 2023-07-18 2023-11-14 广州脉泽科技有限公司 Image enhancement-based hand vein image recognition method and device
CN117058727B (en) * 2023-07-18 2024-04-02 广州脉泽科技有限公司 Image enhancement-based hand vein image recognition method and device
CN116934738A (en) * 2023-08-14 2023-10-24 威朋(苏州)医疗器械有限公司 Organ and nodule joint segmentation method and system based on ultrasonic image
CN116934738B (en) * 2023-08-14 2024-03-22 威朋(苏州)医疗器械有限公司 Organ and nodule joint segmentation method and system based on ultrasonic image
CN117576076A (en) * 2023-12-14 2024-02-20 湖州宇泛智能科技有限公司 Bare soil detection method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN114998307A (en) Two-stage full-3D abdominal organ segmentation method and system based on dual-resolution network
CN111145170B (en) Medical image segmentation method based on deep learning
CN107220980B (en) A kind of MRI image brain tumor automatic division method based on full convolutional network
CN109685768B (en) Pulmonary nodule automatic detection method and system based on pulmonary CT sequence
CN107492071A (en) Medical image processing method and equipment
CN111354002A (en) Kidney and kidney tumor segmentation method based on deep neural network
CN113223072B (en) Spine Cobb angle measurement method and system
CN111951288B (en) Skin cancer lesion segmentation method based on deep learning
CN110232653A (en) The quick light-duty intensive residual error network of super-resolution rebuilding
CN111754520B (en) Deep learning-based cerebral hematoma segmentation method and system
CN113052210A (en) Fast low-illumination target detection method based on convolutional neural network
CN111402254B (en) CT image lung nodule high-performance automatic detection method and device
CN110930378B (en) Emphysema image processing method and system based on low data demand
CN112446892A (en) Cell nucleus segmentation method based on attention learning
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
CN115909006B (en) Mammary tissue image classification method and system based on convolution transducer
CN112348839A (en) Image segmentation method and system based on deep learning
CN115546466A (en) Weak supervision image target positioning method based on multi-scale significant feature fusion
CN114565601A (en) Improved liver CT image segmentation algorithm based on DeepLabV3+
CN111027440A (en) Crowd abnormal behavior detection device and method based on neural network
CN114612481A (en) Cell nucleus segmentation method based on region enhancement
CN117392153A (en) Pancreas segmentation method based on local compensation and multi-scale adaptive deformation
CN111223113A (en) Nuclear magnetic resonance hippocampus segmentation algorithm based on dual dense context-aware network
CN116542988A (en) Nodule segmentation method, nodule segmentation device, electronic equipment and storage medium
CN115294151A (en) Lung CT interested region automatic detection method based on multitask convolution model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination