WO2020078269A1 - Semantic segmentation method, apparatus, terminal and storage medium for three-dimensional images
- Publication number: WO2020078269A1 (PCT/CN2019/110562)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dimensional
- image
- map
- distribution
- terminal
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10088—Magnetic resonance imaging [MRI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10116—X-ray image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30048—Heart; Cardiac
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30056—Liver; Hepatic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30061—Lung
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30092—Stomach; Gastric
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Definitions
- the present application relates to the field of deep learning, in particular to a method, device, terminal and storage medium for semantic segmentation of three-dimensional images.
- a deep learning model is used to semantically segment the medical image, thereby obtaining the image regions where the human organs/tissues in the medical image are located.
- a scene parsing network (Pyramid Scene Parsing Network, Pspnet) is used to perform semantic segmentation on two-dimensional medical images.
- Pspnet is a semantic segmentation technology based on deep learning.
- Pspnet uses convolution kernels of several different sizes to collect feature maps at multiple scales, and the output feature maps are finally enlarged by interpolation to obtain the semantic segmentation result.
- Pspnet is a semantic segmentation technology for two-dimensional natural images, and does not support semantic segmentation for three-dimensional medical images.
- a method for semantic segmentation of a three-dimensional image includes:
- the terminal obtains the three-dimensional image
- the terminal slices the three-dimensional image according to the three-directional plane where the three-dimensional coordinate axis is located to obtain a two-dimensional slice image on the x-axis, a two-dimensional slice image on the y-axis, and a two-dimensional slice image on the z-axis;
- the terminal invokes the first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain a distribution probability map of the target object in the x-axis direction;
- the terminal invokes a second segmentation model to semantically segment the y-axis two-dimensional slice image to obtain a distribution probability map of the target object in the y-axis direction;
- the terminal invokes a third segmentation model to semantically segment the z-axis two-dimensional slice image to obtain a distribution probability map of the target object in the z-axis direction;
- the terminal invokes an adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis direction surface, the y-axis direction surface, and the z-axis direction surface, to obtain a three-dimensional distribution binary map of the target object.
- At least one of the first segmentation model, the second segmentation model, and the third segmentation model includes: a deep network encoding unit and a skip transfer decoding unit, where the deep network encoding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers;
- the deep network coding unit is used for the terminal to perform down-sampling feature extraction on the two-dimensional slice image through the n-layer convolutional layer to obtain a down-sampled first intermediate feature map;
- the skip transfer decoding unit is used by the terminal to upsample the first intermediate feature map and the second intermediate feature map through the m-layer deconvolution layer to obtain the upsampled distribution probability map;
- the second intermediate feature map includes a feature map output by the i-th convolution layer in the n-layer convolution layer, where i is an integer less than or equal to n.
- the terminal invokes the adaptive fusion model to combine three distribution probability maps corresponding to the x-axis direction surface, the y-axis direction surface, and the z-axis direction surface to obtain a three-dimensional Distribution characteristic map;
- the terminal performs three-dimensional fusion convolution on the three-dimensional distribution feature map to obtain a three-dimensional segmentation probability map
- the terminal calculates a three-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the three-dimensional segmentation probability map.
- the three-dimensional image is a three-dimensional medical image.
- the terminal performs filtering processing on noise pixels in the three-dimensional distribution binary map based on clinical prior knowledge
- the clinical prior knowledge is the knowledge obtained by statistically calculating the distribution position of the target object in the three-dimensional medical image.
- the terminal filters out the first noise pixel that exceeds the target value range in the three-dimensional distribution binary map
- the target value range is a coordinate value range in which the target object may appear, obtained based on the first clinical prior knowledge.
- the terminal filters out second noise pixels beyond the three-dimensional ellipsoid model in the three-dimensional distribution binary map
- the three-dimensional ellipsoid model is an ellipsoid model corresponding to the target object obtained according to the second clinical prior knowledge.
- when the aspect ratio of the two-dimensional slice image exceeds a preset ratio range, the terminal performs scan-frame segmentation on the two-dimensional slice image according to the square frame formed by the short side length of the two-dimensional slice image, to obtain several two-dimensional slice images to be processed.
- a method for semantic segmentation of a two-dimensional image includes:
- the terminal obtains the two-dimensional image
- the terminal invokes a segmentation model to semantically segment the two-dimensional image to obtain a distribution probability map of the target object;
- the terminal calculates a two-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the distribution probability map;
- the segmentation model includes: a deep network coding unit and a skip transfer decoding unit, where the deep network coding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers;
- the deep network coding unit is used for the terminal to perform down-sampling feature extraction on the two-dimensional image through the n-layer convolutional layer to obtain a down-sampled third intermediate feature map;
- the skip transfer decoding unit is used by the terminal to up-sample the third intermediate feature map and the fourth intermediate feature map through the m-layer deconvolution layer to obtain the up-sampled distribution probability map;
- the fourth intermediate feature map includes a feature map output by the i-th convolution layer in the n-layer convolution layer, where i is an integer less than or equal to n.
- the two-dimensional image is a two-dimensional medical image.
- the terminal performs filtering processing on noise pixels in the two-dimensional distribution binary map based on clinical prior knowledge
- the clinical prior knowledge is the knowledge obtained by statistically calculating the distribution position of the target object in the two-dimensional medical image.
- the terminal filters out the third noise pixel point that exceeds the target value range in the two-dimensional distribution binary map
- the target value range is a coordinate value range in which the target object may appear, obtained based on the third clinical prior knowledge.
- when the aspect ratio of the two-dimensional image exceeds a preset ratio range, the terminal performs scan-frame segmentation on the two-dimensional image according to the square frame formed by the short side length of the two-dimensional image, to obtain several two-dimensional images to be processed.
- a semantic segmentation device for three-dimensional images including:
- a first acquisition module configured to acquire the three-dimensional image
- a slicing module configured to slice the three-dimensional image according to the three directions of the three-dimensional coordinate axis to obtain a two-dimensional slice image on the x axis, a two-dimensional slice image on the y axis, and a two-dimensional slice image on the z axis;
- the first segmentation module is used to call the first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain the distribution probability map of the target object in the x-axis direction;
- the first segmentation module is used to invoke a second segmentation model to semantically segment the y-axis two-dimensional slice image to obtain a distribution probability map of the target object in the y-axis direction;
- the first segmentation module is used to invoke a third segmentation model to semantically segment the z-axis two-dimensional slice image to obtain a distribution probability map of the target object in the z-axis direction;
- the fusion module is used to call an adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis direction surface, the y-axis direction surface, and the z-axis direction surface, to obtain a three-dimensional distribution binary map of the target object.
- At least one of the first segmentation model, the second segmentation model, and the third segmentation model includes: a deep network encoding unit and a skip transfer decoding unit, where the deep network encoding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers;
- the deep network coding unit is configured to perform down-sampling feature extraction on the two-dimensional slice image through the n-layer convolutional layer to obtain a down-sampled first intermediate feature map;
- the skip transfer decoding unit is configured to perform upsampling processing on the first intermediate feature map and the second intermediate feature map through the m-layer deconvolution layer to obtain the up-sampled distribution probability map;
- the second intermediate feature map includes a feature map output by the i-th convolution layer in the n-layer convolution layer, where i is an integer less than or equal to n.
- the fusion module includes:
- a combination unit configured to call the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis direction surface, the y-axis direction surface, and the z-axis direction surface to obtain a three-dimensional distribution feature map;
- a fusion unit configured to perform three-dimensional fusion convolution on the three-dimensional distribution feature map to obtain a three-dimensional segmentation probability map
- the calculation unit is configured to calculate a three-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the three-dimensional segmentation probability map.
- the three-dimensional image is a three-dimensional medical image; the device further includes:
- a first filtering module configured to filter out noise pixels in the three-dimensional distribution binary map based on clinical prior knowledge
- the clinical prior knowledge is the knowledge obtained by statistically calculating the distribution position of the target object in the three-dimensional medical image.
- the first filtering module is configured to filter out the first noise pixel points that exceed the target value range in the three-dimensional distribution binary map;
- the target value range is a coordinate value range in which the target object may appear, obtained based on the first clinical prior knowledge.
- the first filtering module is configured to filter out second noise pixels beyond the three-dimensional ellipsoid model in the three-dimensional distribution binary map;
- the three-dimensional ellipsoid model is an ellipsoid model corresponding to the target object obtained according to the second clinical prior knowledge.
- the device further includes:
- the first scanning module is used to perform scan-frame segmentation on the two-dimensional slice image according to the square frame formed by the short side length of the two-dimensional slice image when the aspect ratio of the two-dimensional slice image exceeds a preset ratio range, to obtain several two-dimensional slice images to be processed.
- a two-dimensional image semantic segmentation device including:
- a second acquisition module configured to acquire the two-dimensional image
- the second segmentation module is used to call a segmentation model to semantically segment the two-dimensional image to obtain a distribution probability map of the target object;
- a calculation module configured to calculate a two-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the distribution probability map;
- the segmentation model includes: a deep network coding unit and a skip transfer decoding unit, where the deep network coding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers;
- the deep network coding unit is configured to perform down-sampling feature extraction on the two-dimensional image through the n-layer convolutional layer to obtain a down-sampled third intermediate feature map;
- the skip transfer decoding unit is configured to perform up-sampling processing on the third intermediate feature map and the fourth intermediate feature map through the m-layer deconvolution layer to obtain the up-sampled distribution probability map;
- the fourth intermediate feature map includes a feature map output by the i-th convolution layer in the n-layer convolution layer, where i is an integer less than or equal to n.
- the two-dimensional image is a two-dimensional medical image; the device further includes:
- a second filtering module configured to filter out noise pixels in the two-dimensional distribution binary map based on clinical prior knowledge
- the clinical prior knowledge is the knowledge obtained by statistically calculating the distribution position of the target object in the two-dimensional medical image.
- the second filtering module is configured to filter out the third noise pixel points that exceed the target value range in the two-dimensional distribution binary map
- the target value range is a coordinate value range in which the target object may appear, obtained based on the third clinical prior knowledge.
- the device further includes:
- a second scanning module configured to perform scan-frame segmentation on the two-dimensional image according to the square frame formed by the short side length of the two-dimensional image when the length-to-width ratio of the two-dimensional image exceeds a preset ratio range, to obtain several two-dimensional images to be processed.
- a terminal includes a processor and a memory, and the memory stores computer-readable instructions.
- when the computer-readable instructions are executed by the processor, the processor is caused to execute the method described in the above embodiments.
- one or more non-volatile storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to execute the method described in the above embodiments.
- FIG. 1 is a schematic structural diagram of a Pspnet network model provided in the related art
- FIG. 2 is a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application.
- FIG. 3 is a flowchart of a method for semantic segmentation of a three-dimensional image provided by an exemplary embodiment of the present application
- FIG. 4 is a schematic structural diagram of semantic segmentation of a three-dimensional medical image provided by an exemplary embodiment of the present application.
- FIG. 5 is a flowchart of a method for semantic segmentation of a three-dimensional image provided by another exemplary embodiment of the present application.
- FIG. 6 is a schematic diagram of the shape change of the target object when the two-dimensional slice image is converted directly according to its original size, provided by another exemplary embodiment of the present application;
- FIG. 7 is a schematic diagram showing that the shape of the target object does not change when the size of the two-dimensional slice image is changed according to another exemplary embodiment of the present application;
- FIG. 8 is a schematic structural diagram of a first segmentation model provided by another exemplary embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a first module in a ResNet101 network model provided by another exemplary embodiment of the present application.
- FIG. 10 is a schematic structural diagram of an adaptive fusion model provided by another exemplary embodiment of the present application.
- FIG. 11 is a flowchart of a convolutional network model training method provided by an exemplary embodiment of the present application.
- FIG. 12 is a flowchart of a method for semantic segmentation of a two-dimensional image provided by another exemplary embodiment of the present application.
- FIG. 13 is a schematic diagram of an apparatus for semantic segmentation of a three-dimensional image provided by an exemplary embodiment of the present application.
- FIG. 14 is a schematic diagram of an apparatus for semantic segmentation of a three-dimensional image provided by another exemplary embodiment of the present application.
- FIG. 15 is a schematic diagram of an apparatus for a fusion module provided by another exemplary embodiment of the present application.
- FIG. 16 is a schematic diagram of an apparatus for semantic segmentation of a two-dimensional image provided by an exemplary embodiment of the present application.
- FIG. 17 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- FIG. 18 is an internal structure diagram of a terminal provided by an exemplary embodiment of the present application.
- Semantic segmentation refers to dividing an image into several non-overlapping areas based on image characteristics such as grayscale, color, texture, and shape, such that these characteristics appear similar within the same area but show obvious differences between different areas.
- Three-dimensional image: an image obtained by adding a spatial dimension (such as a depth dimension) or a time dimension to a two-dimensional image.
- For example, a three-dimensional medical image can be regarded as a three-dimensional image with an added depth dimension, and a video can be regarded as a three-dimensional image with an added time dimension.
- Target object: an object that belongs to the foreground area in semantic segmentation.
- the target object may be a target organ.
- the target organ refers to an internal organ or tissue of the human body and/or of an animal, such as the heart, lung, liver, spleen, or stomach; for two-dimensional medical images, the target object may likewise be a target organ.
- the embodiments of the present application mainly illustrate that the target object is a human organ in a three-dimensional medical image.
- the related art uses a convolutional network model to semantically segment the medical image: the medical image is input into the constructed convolutional network model,
- which extracts the features of the human organs/tissues and classifies them to obtain the specific regions of the human organs/tissues in the medical image.
- the medical image after semantic segmentation distinguishes the human organ/tissue area from the background area, so that a doctor can then make a clinical diagnosis.
- the "medical image" here may include an X-ray image obtained by irradiating the human body with X-rays, a CT image obtained by computed tomography (CT), an MRI image obtained by magnetic resonance imaging (MRI), and the like.
- the medical image collected by the medical image collection device may be a 2D medical image or a 3D medical image.
- Pspnet is used to semantically segment 2D medical images.
- Pspnet uses a variety of convolution kernels of different sizes to convolve the input medical image and extract its features, forming feature maps of various sizes; finally, the output feature maps are enlarged by interpolation to obtain the semantic segmentation result.
- the medical image 101 is input into the Pspnet network model, and the features of the medical image 101 are extracted to obtain a first feature map 102 of the same size as the medical image 101.
- the Pspnet network model uses four convolution kernels of different scales to perform convolution calculations on the simplified first feature map 102, obtaining four sub-feature maps whose sizes correspond to the convolution kernels; the sizes of the four sub-feature maps are not the same.
- through upsampling, the four sub-feature maps of different sizes are enlarged by interpolation to the size of the medical image 101, and the four enlarged sub-feature maps are connected to the first feature map 102, thereby obtaining the second feature map 103.
- a final probability map 104 is obtained after semantic segmentation is performed on the second feature map 103 through convolution.
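- As a non-authoritative illustration of the pyramid pooling idea described above, the following PyTorch sketch implements a PSPNet-style pooling module. The pooling scales (1, 2, 3, 6) and channel counts are assumptions taken from the public PSPNet design, not values specified by this document.

```python
# A minimal sketch of a PSPNet-style pyramid pooling module (assumed scales).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_channels, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        out_channels = in_channels // len(pool_sizes)
        # One pooling stage per scale, each followed by a 1x1 conv to reduce channels.
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),
                nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            )
            for size in pool_sizes
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        # Enlarge each pooled map back to the input size by interpolation and
        # concatenate with the original feature map (the "second feature map" above).
        pooled = [
            F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
            for stage in self.stages
        ]
        return torch.cat([x] + pooled, dim=1)
```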
- Pspnet only supports semantic segmentation of 2D medical images, not 3D medical images.
- when the medical image is a 3D medical image with high definition and detection accuracy, such as a CT image or an MRI image, forcibly using Pspnet to perform semantic segmentation on the 3D medical image is prone to a "fracture phenomenon", and the edge fitting after image segmentation does not meet the requirements.
- Pspnet does not support the processing of 3D medical images.
- Embodiments of the present application provide a method, device, terminal, and storage medium for semantic segmentation of a three-dimensional image, which can be used to solve the problems in the related technologies described above.
- This method can achieve semantic segmentation of three-dimensional images.
- the three-dimensional image is a three-dimensional medical image or video.
- the three-dimensional image is a three-dimensional medical image.
- FIG. 2 shows a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application.
- FIG. 2 includes: a medical image acquisition device 100 and a computer device 200.
- the medical image acquisition device 100 is used to acquire medical images of human organs / tissues, and the medical images include two-dimensional medical images and three-dimensional medical images.
- the medical image collection device 100 is also used to send the collected medical images to the computer device 200.
- the computer device 200 is used to receive medical images and perform semantic segmentation on the medical images.
- the medical image acquisition device 100 may be a device independent of the computer device 200, or may be combined into the computer device 200 as a whole device.
- the computer device 200 includes a central processing unit (CPU) 210 and a memory 220.
- the CPU 210 is used to call a neural network model that implements semantic segmentation.
- the memory 220 is used to store a neural network model that implements semantic segmentation.
- the neural network model includes a first segmentation model 221, a second segmentation model 222, a third segmentation model 223, and an adaptive fusion model 224.
- the first segmentation model 221, the second segmentation model 222, and the third segmentation model 223 are two-dimensional models for semantic segmentation based on convolutional neural networks.
- the adaptive fusion model 224 is a three-dimensional model used to obtain the three-dimensional semantic segmentation results by adaptively fusing the semantic segmentation results of the three two-dimensional semantic segmentation models.
- the first segmentation model 221 is used to perform two-dimensional semantic segmentation on the x-axis two-dimensional slice image to obtain a distribution probability map of the target organ in the x-axis direction.
- the second segmentation model 222 is used to perform two-dimensional semantic segmentation on the y-axis two-dimensional slice image to obtain a distribution probability map of the target organ in the y-axis direction.
- the third segmentation model 223 is used to perform two-dimensional semantic segmentation on the z-axis two-dimensional slice image to obtain a distribution probability map of the target organ in the z-axis direction.
- the adaptive fusion model 224 is used to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis direction surface, the y-axis direction surface, and the z-axis direction surface to obtain a three-dimensional distribution binary map of the target object.
- the three-dimensional image is first sliced according to the three-directional plane where the three-dimensional coordinate axis is located, and then the two-dimensional slice image of the three-directional plane is semantically segmented through three segmentation models.
- the distribution probability maps in three directions are obtained, and then the three distribution probability maps are three-dimensionally fused through an adaptive fusion model to obtain the final three-dimensional distribution binary map of the target object.
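- The following sketch summarizes the pipeline just described, using plain numpy indexing for the slicing; `model_x`, `model_y`, `model_z`, and `fusion_model` are hypothetical callables standing in for the three segmentation models and the adaptive fusion model, and the axis conventions are illustrative.

```python
# A high-level sketch of the slice -> segment -> fuse pipeline (names are placeholders).
import numpy as np

def segment_volume(volume, model_x, model_y, model_z, fusion_model):
    # Slice the 3D volume along the three coordinate planes.
    slices_x = [volume[i, :, :] for i in range(volume.shape[0])]
    slices_y = [volume[:, j, :] for j in range(volume.shape[1])]
    slices_z = [volume[:, :, k] for k in range(volume.shape[2])]

    # Each 2D model yields a per-slice probability map; stacking the maps along
    # the sliced axis restores a probability volume aligned with the input.
    prob_x = np.stack([model_x(s) for s in slices_x], axis=0)
    prob_y = np.stack([model_y(s) for s in slices_y], axis=1)
    prob_z = np.stack([model_z(s) for s in slices_z], axis=2)

    # The adaptive fusion model combines the three probability volumes
    # into the three-dimensional distribution binary map.
    return fusion_model(prob_x, prob_y, prob_z)
```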
- FIG. 3 shows a flowchart of a method for semantic segmentation of a three-dimensional image provided by an exemplary embodiment of the present application.
- the method can be applied to the implementation environment shown in FIG. 2.
- the method includes:
- Step 301 the terminal obtains a three-dimensional image.
- the terminal collects the three-dimensional image through the image acquisition device.
- Step 302 The terminal slices the three-dimensional image according to the three-directional plane where the three-dimensional coordinate axis is located, to obtain a two-dimensional slice image on the x axis, a two-dimensional slice image on the y axis, and a two-dimensional slice image on the z axis.
- the terminal slices the three-dimensional image according to the three-directional plane where the three-dimensional coordinate axis is located, thereby obtaining a two-dimensional slice image on the x-axis, a two-dimensional slice image on the y-axis, and a two-dimensional slice image on the z-axis.
- the x-axis direction plane refers to the plane where the x-axis and z-axis are located
- the y-axis direction plane refers to the plane where the y-axis and z-axis are located
- the z-axis direction plane refers to the plane where the x-axis and y-axis are located.
- Step 303 The terminal invokes the first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain a distribution probability map of the target object in the x-axis direction.
- the CPU calls the first segmentation model stored in the memory to semantically segment the x-axis two-dimensional slice image.
- the first segmentation model completes the semantic segmentation of the target object in the x-axis two-dimensional slice image based on the grayscale, color, texture, and shape of the target object in the x-axis two-dimensional slice image, thereby outputting the distribution probability map of the target object in the x-axis direction plane.
- Step 304 The terminal invokes the second segmentation model to semantically segment the y-axis two-dimensional slice image to obtain a distribution probability map of the target object in the y-axis direction plane.
- the CPU calls the second segmentation model stored in the memory to semantically segment the y-axis two-dimensional slice image.
- the second segmentation model completes the semantic segmentation of the target object in the y-axis two-dimensional slice image based on the grayscale, color, texture, and shape of the target object in the y-axis two-dimensional slice image, thereby outputting the distribution probability map of the target object in the y-axis direction plane.
- Step 305 The terminal invokes the third segmentation model to perform semantic segmentation on the z-axis two-dimensional slice image to obtain a distribution probability map of the target object in the z-axis direction.
- the CPU calls the third segmentation model stored in the memory to perform semantic segmentation on the z-axis two-dimensional slice image.
- the third segmentation model completes the semantic segmentation of the target object in the z-axis two-dimensional slice image based on the grayscale, color, texture, and shape of the target object in the z-axis two-dimensional slice image, thereby outputting the distribution probability map of the target object in the z-axis direction plane.
- Step 306 The terminal invokes an adaptive fusion model to perform three-dimensional fusion on three distribution probability maps corresponding to the x-axis direction surface, the y-axis direction surface, and the z-axis direction surface, to obtain a three-dimensional distribution binary map of the target object.
- the CPU calls the adaptive fusion model stored in the memory to adaptively fuse the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis. Because the adaptive fusion model fuses three two-dimensional distribution probability maps from different dimensions, it can suppress much of the background noise and segment the edge of the target object smoothly and accurately, finally obtaining the three-dimensional distribution binary map of the target object.
- with reference to FIG. 4, the computer device slices the input three-dimensional medical image 401 along the x-axis direction plane to obtain an x-axis two-dimensional slice image 402, along the y-axis direction plane to obtain a y-axis two-dimensional slice image 403, and along the z-axis direction plane to obtain a z-axis two-dimensional slice image 404; two-dimensional semantic segmentation is then performed on the slice images to obtain the two-dimensional distribution probability maps of the target object in the three direction planes.
- in summary, the three-dimensional image is sliced into two-dimensional slice images corresponding to the three direction planes, and the three segmentation models corresponding to the three direction planes produce the two-dimensional distribution probability maps of the three direction planes, so that the terminal realizes two-dimensional semantic segmentation of the three-dimensional medical image.
- the three distribution probability maps are then fused three-dimensionally through an adaptive fusion model to obtain the three-dimensional distribution binary map of the target object. This solves the problem that the Pspnet network model in the related art is only suitable for semantic segmentation of 2D natural images and does not support semantic segmentation of 3D medical images, achieving the effect of using three 2D segmentation models and an adaptive fusion model to realize semantic segmentation of 3D medical images. Moreover, because the adaptive fusion model fuses the two-dimensional distribution probability maps in three dimensions, background noise is effectively suppressed during 3D fusion, and the edge of the target object is segmented smoothly and accurately.
- FIG. 5 shows a flowchart of a method for semantic segmentation of a three-dimensional image provided by another exemplary embodiment of the present application. This method can be applied to the implementation environment shown in FIG. 2.
- the three-dimensional image is a three-dimensional medical image.
- the following takes the case where the target object is a target organ as an example.
- the method includes:
- Step 501 The terminal obtains a three-dimensional medical image.
- the computer device collects a three-dimensional medical image through a medical image acquisition device, and the three-dimensional medical image includes a three-dimensional target organ and a background area other than the target organ.
- Step 502 The terminal slices the three-dimensional medical image according to the three-directional plane where the three-dimensional coordinate axis is located, to obtain a two-dimensional slice image on the x axis, a two-dimensional slice image on the y axis, and a two-dimensional slice image on the z axis.
- the computer device slices the three-dimensional medical image according to the three direction planes of the three-dimensional coordinate axes to obtain the two-dimensional slice image of the x-axis, the two-dimensional slice image of the y-axis, and the two-dimensional slice image of the z-axis.
- since the distribution position of each target organ in the three-dimensional medical image is relatively fixed, the computer device also reads pre-stored first clinical prior knowledge, which indicates the target value range of the candidate appearance positions of the target organ in each two-dimensional slice image.
- for example, the abscissa range in which the target organ may appear in the x-axis two-dimensional slice image is [a1, a2], and the ordinate range in which it may appear in the y-axis two-dimensional slice image is [b1, b2].
- the target value range is used for performing first noise filtering in the post-processing process.
- since the external shape of each target organ approximates an ellipsoid, the computer device also reads pre-stored second clinical prior knowledge, which indicates the 3D ellipsoid model of the target organ.
- using the second clinical prior knowledge, the computer device calculates the longest and shortest possible axes of the target organ in the three directions x, y, and z, thereby pre-establishing a three-dimensional ellipsoid model of the target organ.
- the three-dimensional ellipsoid model indicates the candidate appearance position of the target organ in the three-dimensional medical image, and the three-dimensional ellipsoid model is used to perform second noise filtering in the post-processing process.
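- A minimal sketch of the two prior-knowledge noise filters described above, assuming the binary map is a numpy volume and that the target value range, ellipsoid center, and ellipsoid axes have already been estimated statistically; all names and shapes here are illustrative.

```python
# Post-processing sketch: box filter (first noise filter) + ellipsoid filter
# (second noise filter) applied to a 3D distribution binary map.
import numpy as np

def filter_binary_map(binary_map, bounds, center, radii):
    """binary_map: H x W x D array of {0, 1};
    bounds: ((x0, x1), (y0, y1), (z0, z1)); center/radii: ellipsoid parameters."""
    xs, ys, zs = np.indices(binary_map.shape)

    # First noise filter: zero out voxels outside the target value range.
    (x0, x1), (y0, y1), (z0, z1) = bounds
    in_box = ((xs >= x0) & (xs <= x1) & (ys >= y0) & (ys <= y1)
              & (zs >= z0) & (zs <= z1))

    # Second noise filter: zero out voxels outside the 3D ellipsoid model,
    # ((x-cx)/rx)^2 + ((y-cy)/ry)^2 + ((z-cz)/rz)^2 <= 1.
    cx, cy, cz = center
    rx, ry, rz = radii
    in_ellipsoid = (((xs - cx) / rx) ** 2 + ((ys - cy) / ry) ** 2
                    + ((zs - cz) / rz) ** 2) <= 1.0

    return binary_map * (in_box & in_ellipsoid)
```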
- Step 503 when the length-width ratio of the two-dimensional slice image exceeds the preset ratio range, the terminal performs scan-frame segmentation on the two-dimensional slice image according to the square frame formed by the short side length of the two-dimensional slice image, obtaining several two-dimensional slice images to be processed.
- the computer device can also use the following image preprocessing method to process the two-dimensional slice image.
- when the aspect ratio of the obtained two-dimensional slice image is within the preset ratio range, the computer device converts the size of the two-dimensional slice image to the input size expected by the segmentation model.
- the preset ratio range can be [1/3, 3].
- when the aspect ratio of the obtained two-dimensional slice image exceeds the preset ratio range, that is, when it falls outside [1/3, 3], the two-dimensional slice image is considered too narrow and long. If the computer device directly converts the two-dimensional slice image 601 from its original size to the input size 602 (the pixel size that matches the segmentation model), the target organ in the two-dimensional slice image 601 will be squeezed into strips, resulting in inaccurate final prediction results.
- when training the segmentation model, the computer device crops the two-dimensional slice image 701 obtained from the sample image according to the square frame formed by the short side length, obtaining the middle two-dimensional slice image 702 to be processed.
- the computer device converts the size of the two-dimensional slice image 702 to be processed into the input size 703 of the segmentation model for training.
- during prediction, the computer device performs scan-frame segmentation on the two-dimensional slice image 704 obtained from the three-dimensional medical image according to the square frame formed by the short side length, obtaining several two-dimensional slice images 705 to be processed (for example, 3 images in FIG. 7).
- the computer device converts the size of several two-dimensional slice images 705 to be processed into the input size 703 of the segmentation model, and then inputs them to the segmentation model for prediction, respectively.
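- A sketch of the square-frame scan, assuming square windows whose side equals the short side of the image are slid along the long side until it is fully covered; the stride policy is an assumption, since only the window size is specified above.

```python
# Square-frame scan sketch: crop square windows along the long side of a
# narrow, elongated 2D slice image.
import numpy as np

def square_scan(image):
    h, w = image.shape[:2]
    side = min(h, w)                      # window side = short side length
    long_len = max(h, w)
    # Slide the window so the crops cover the long side end to end
    # (the last window is anchored to the far edge).
    starts = list(range(0, long_len - side, side)) + [long_len - side]
    crops = []
    for s in starts:
        crop = image[s:s + side, :] if h > w else image[:, s:s + side]
        crops.append(crop)
    return crops  # each crop is then resized to the model input size
```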
- Step 504 The terminal invokes the first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain a distribution probability map of the target organ in the x-axis direction.
- the computer device calls the first segmentation model stored in the memory to semantically segment the x-axis two-dimensional slice image.
- the first segmentation model completes the semantic segmentation of the x-axis two-dimensional slice image according to the distribution position, size, and shape of the target organ in the three-dimensional medical image, thereby outputting the distribution probability map of the target organ in the x-axis direction plane.
- the first segmentation model includes a deep network encoding unit and a skip transfer decoding unit; the deep network encoding unit includes n convolutional layers, and the skip transfer decoding unit includes m deconvolution layers, where n and m are both positive integers.
- the deep network coding unit is used to perform down-sampling feature extraction on a two-dimensional slice image through n convolutional layers to obtain a down-sampled first intermediate feature map.
- the skip transfer decoding unit is used for upsampling the first intermediate feature map and the second intermediate feature map through the m-layer deconvolution layer to obtain an upsampled distribution probability map.
- the second intermediate feature map includes a feature map output by the i-th convolution layer in the n-layer convolution layer, where i is an integer less than or equal to n.
- the deep network coding unit is a neural network model constructed based on the residual network model, or the deep network coding unit is a neural network model constructed based on other classification models, which is not limited in this embodiment.
- the computer device inputs the obtained x-axis two-dimensional slice image 801 into the deep network coding unit 802 constructed based on the ResNet101 model.
- the deep network coding unit 802 includes 5 convolutional layers.
- the five convolutional layers are Conv1, Conv2_x, Conv3_x, Conv4_x, and Conv5_x; the size and number of convolution kernels of each convolutional layer and the stride of each convolution kernel are shown in Table 1.
- the x in the table represents the number of a sub-convolutional layer within the convolutional layer of that level.
- the Conv1 layer of the deep network coding unit 802 includes 64 7x7 convolution kernels, and the convolution step size is 2 each time.
- Conv2_x includes 1 sub-convolutional layer and 3 first modules in cascade.
- the first sub-convolutional layer includes a 3x3 convolution kernel with a stride of 2, and max pooling is performed after the convolution of the first sub-convolutional layer.
- the three first modules located after the first sub-convolutional layer are identical.
- the first module includes three sub-convolution layers.
- the first sub-convolution layer 901 includes 64 1 ⁇ 1 convolution kernels
- the second sub-convolution layer 902 includes 64 3 ⁇ 3 convolution kernels.
- the third sub-convolution layer 903 includes 256 1x1 convolution kernels, and each sub-convolution layer is followed by an activation (relu) layer and a batch normalization (BN) layer (not shown in the figure).
- the first module is also used to map the pixels corresponding to the feature map output by the first sub-convolutional layer of the previous layer to the feature map output by the third sub-convolutional layer 903 through skip connections.
- the relu layer is used to convert the linear data obtained after convolution into non-linear data, thereby enhancing the expressive ability of the ResNet101 model.
- the BN layer is used to speed up the convergence speed of the ResNet101 model and alleviate the gradient dispersion problem of the ResNet101 model in the deep layer, thereby making the ResNet101 model more stable and easy to train.
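- A minimal PyTorch sketch of the bottleneck first module described above (1x1, 3x3, and 1x1 convolutions, each followed by a BN layer and relu, plus a skip connection). It assumes the block input already has the output channel count; the real ResNet101 applies a projection on the first block of each stage.

```python
# Bottleneck block sketch matching the first module: 64 1x1, 64 3x3, 256 1x1
# kernels, with BN + relu after each sub-layer and a skip connection.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, channels=256, mid_channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=1, bias=False),   # 64 1x1 kernels
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                      padding=1, bias=False),                               # 64 3x3 kernels
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, kernel_size=1, bias=False),   # 256 1x1 kernels
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Skip connection: add the block input to the third sub-layer's output.
        return self.relu(self.body(x) + x)
```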
- Conv3_x includes 4 cascaded second modules, which are identical; the structure of the second module is the same as that of the first module and can be understood by referring to it.
- the second module includes three sub-convolution layers: the fourth sub-convolution layer includes 128 1x1 convolution kernels with a stride of 2, and the fifth sub-convolution layer includes 128 3x3 convolution kernels.
- the sixth sub-convolution layer includes 512 1x1 convolution kernels, and each sub-convolution layer is connected to a relu layer and a BN layer.
- the second module is also used to map the pixels of the feature map output by the previous module onto the feature map output by the sixth sub-convolution layer through a skip connection, and to activate the result through the relu layer to obtain the input feature map of the next module.
- Conv4_x includes 23 cascaded third modules, which are identical.
- the structure of the third module is the same as that of the first module, and the third module can be understood by referring to the structure of the first module.
- the third module includes three sub-convolution layers. The seventh sub-convolution layer includes 256 1x1 convolution kernels with a stride of 1; to preserve the area of the feature map output after each seventh sub-convolution layer (also known as the receptive field), the dilation is set to 2.
- the eighth sub-convolution layer includes 256 3x3 convolution kernels, and the ninth sub-convolution layer includes 1024 1x1 convolution kernels; each sub-convolution layer is followed by a relu layer and a BN layer.
- the third module is also used to map the pixels of the feature map output by the previous module onto the feature map output by the ninth sub-convolution layer through a skip connection, and to activate the result through the relu layer to obtain the input feature map of the next module.
- dilated convolution, also called atrous or expansion convolution, is a convolution method in which holes are injected between the elements of the convolution kernel.
- dilated convolution introduces a hyper-parameter called the "dilation rate", which defines the spacing between the values sampled by the convolution kernel when it processes data.
- in this way, the receptive field can be expanded to achieve more accurate target detection.
- the receptive field is the size of the region of the original image onto which a pixel of a feature map output by a hidden layer of the neural network is mapped. The larger the receptive field of a pixel, the larger the range of the original image it covers, which means it may contain more global, higher-level features.
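- A small example of dilated convolution in PyTorch, illustrating the receptive-field effect described above; the channel counts are arbitrary.

```python
# 3x3 kernel with dilation 2: padding 2 keeps the spatial size unchanged,
# while the effective receptive field of this single layer grows from 3x3 to 5x5.
import torch
import torch.nn as nn

dilated = nn.Conv2d(256, 256, kernel_size=3, dilation=2, padding=2, bias=False)
x = torch.randn(1, 256, 64, 64)
print(dilated(x).shape)  # torch.Size([1, 256, 64, 64]) -- resolution preserved
```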
- Conv5_x includes three cascaded fourth modules, which are identical; the structure of the fourth module is the same as that of the first module and can be understood by referring to it.
- the fourth module includes three sub-convolution layers: the tenth sub-convolution layer includes 512 1x1 convolution kernels, the eleventh sub-convolution layer includes 512 3x3 convolution kernels, and the twelfth sub-convolution layer includes 2048 1x1 convolution kernels; each sub-convolution layer is followed by a relu layer and a BN layer.
- the fourth module is also used to map the pixels of the feature map output by the previous module onto the feature map output by the twelfth sub-convolution layer through a skip connection, and to activate the result through the relu layer to obtain the input feature map of the next module.
- features are extracted from the x-axis two-dimensional slice image 801 through the five convolutional layers of the deep network encoding unit 802 to obtain a first intermediate feature map (1), which corresponds to the x-axis direction plane.
- the first intermediate feature map (1) is a feature map after 8 times downsampling.
- pooling is used for downsampling after Conv5_3, with five downsampling kernel sizes: 1, 9, 19, 37, and 74.
- the computer device then inputs the first intermediate feature map (1) to the skip transfer decoding unit 803, which includes 2 deconvolution layers.
- the computer device decodes the first intermediate feature map (1) step by step through the deconvolution layers; decoding is performed twice, with an upsampling factor of two each time.
- decoding the first intermediate feature map (1) refers to performing a skip connection and upsampling process on the first intermediate feature map (1) and a feature map output by a predetermined layer of the deep network encoding unit 802.
- the first deconvolution layer connects the first intermediate feature map (1) with the second intermediate feature map (2) output by the Conv3_x convolutional layer of the deep network encoding unit 802 and performs 2x upsampling, obtaining the 2x-upsampled first intermediate feature map (1'). The upsampled feature map (1') is then skip-connected with the second intermediate feature map (2') output by the Conv1 convolutional layer of the deep network encoding unit 802 and upsampled by 2x again, yielding a feature map upsampled 4x in total, from which the final distribution probability map is obtained.
- the first intermediate feature map and the second intermediate feature map joined by a skip connection have the same size.
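- A hedged sketch of the skip transfer decoding unit: two 2x deconvolution steps, each concatenating the current map with an encoder feature map. The channel counts follow the ResNet101 stage widths mentioned above, and the interpolation used to align skip-map sizes is an assumption.

```python
# Skip-transfer decoder sketch: concatenate, deconvolve 2x, repeat.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipDecoder(nn.Module):
    def __init__(self, deep_ch=2048, mid_ch=512, shallow_ch=64, num_classes=2):
        super().__init__()
        # First deconvolution: deep map (1) concatenated with the Conv3_x skip map (2).
        self.up1 = nn.ConvTranspose2d(deep_ch + mid_ch, 256, kernel_size=2, stride=2)
        # Second deconvolution: upsampled map (1') concatenated with the Conv1 skip map (2').
        self.up2 = nn.ConvTranspose2d(256 + shallow_ch, num_classes, kernel_size=2, stride=2)

    def forward(self, deep, skip_mid, skip_shallow):
        x = self.up1(torch.cat([deep, skip_mid], dim=1))      # 2x upsampling
        # Align the shallow skip map with the current resolution (assumption).
        skip_shallow = F.interpolate(skip_shallow, size=x.shape[2:],
                                     mode="bilinear", align_corners=False)
        x = self.up2(torch.cat([x, skip_shallow], dim=1))     # 2x upsampling again
        return x  # per-pixel class scores, to be interpolated to the input size
```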
- the computer device obtains the distribution probability map 804 of the target organ in the x-axis direction plane through the first segmentation model.
- the distribution probability map 804 indicates the probability that each pixel on the two-dimensional slice image belongs to the foreground area and / or belongs to the background area.
- the foreground area refers to the area where the target organ is located
- the background area refers to the area where the non-target organ is located.
- Step 505 The terminal invokes the second segmentation model to semantically segment the y-axis two-dimensional slice image to obtain a distribution probability map of the target organ in the y-axis direction.
- the second segmentation model and the first segmentation model have the same structure, but there are differences only in the sample images used in the training process. Therefore, the process of performing semantic segmentation on the y-axis two-dimensional slice image by using the second segmentation model can refer to the description of step 504, and will not be described again.
- Step 506 The terminal invokes the third segmentation model to semantically segment the z-axis two-dimensional slice image to obtain a distribution probability map of the target organ in the z-axis direction.
- the third segmentation model and the first segmentation model have the same structure and differ only in the sample images used during training. Therefore, the process of performing semantic segmentation on the z-axis two-dimensional slice image by using the third segmentation model can refer to the description of step 504 and will not be repeated.
- Step 507 The terminal invokes the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution feature map.
- the computer device calls the adaptive fusion model stored in the memory to combine the three distribution probability maps corresponding to the three directional planes into a three-dimensional distribution feature map.
- the computer device performs three-dimensional fusion on the distribution probability map 1001 of the target organ on the x-axis directional plane, the distribution probability map 1002 on the y-axis directional plane, and the distribution probability map 1003 on the z-axis directional plane, obtaining the three-dimensional distribution feature map 1004.
- the distribution probability maps 1001-1003 of the three directional planes have the same size as the three-dimensional medical image, each carrying the probabilities for its own directional plane.
- the three-dimensional distribution feature map 1004 includes the target-organ probabilities corresponding to each of the three directional planes, and its size is the same as that of the three-dimensional medical image.
- Step 508 The terminal performs three-dimensional fusion and convolution on the three-dimensional distribution feature map to obtain a three-dimensional segmentation probability map.
- the computer device invokes the adaptive fusion model stored in the memory to apply three convolutional layers of three-dimensional fusion convolution to the obtained three-dimensional distribution feature map 1004, obtaining a three-dimensional segmentation probability map 1005.
- the three-dimensional segmentation probability map 1005 indicates the probability that each pixel in the three-dimensional medical image belongs to the foreground area and/or the background area; the foreground area refers to the area where the target organ is located, and the background area refers to the area where it is not.
- in FIG. 10, H*W*D*C represents the size of the image and the corresponding probability.
- the adaptive fusion model includes three shallow 3D convolutional layers: the first 3D convolutional layer includes 64 3*3*3 3D convolution kernels with a convolution stride of 1,
- the second 3D convolutional layer includes 64 3*3*3 3D convolution kernels with a convolution stride of 1,
- and the third 3D convolutional layer includes one 3*3*3 3D convolution kernel with a convolution stride of 1.
- the size of the three-dimensional segmentation probability map 1005 is the same as the size of the three-dimensional medical image.
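- as an illustration, a minimal sketch of such an adaptive fusion model follows; the three 3D convolutional layers (64, 64, and 1 kernels of size 3*3*3, stride 1) follow the description above, while the channel-wise stacking of the three probability maps, padding=1 to preserve the size, the ReLU activations, and the final sigmoid are assumptions:

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Stacks the three per-plane probability maps into one volume and
    applies three shallow 3D convolutions to produce the 3D segmentation
    probability map."""

    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 1, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, prob_x, prob_y, prob_z):
        # each input: (N, D, H, W) foreground probabilities from one directional plane
        feature_map = torch.stack([prob_x, prob_y, prob_z], dim=1)  # 3D distribution feature map
        return torch.sigmoid(self.fuse(feature_map))  # 3D segmentation probability map
```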
- Step 509 The terminal calculates a three-dimensional distribution binary map of the target organ according to the maximum probability category of each pixel in the three-dimensional segmentation probability map.
- the adaptive fusion model determines the category to which each pixel in the image belongs according to the maximum probability category of each pixel in the three-dimensional segmentation probability map.
- the categories include foreground pixels, which belong to the target organ, and background pixels, which do not.
- the three-dimensional segmentation probability map 1005 includes a first probability that each pixel belongs to the foreground pixel and a second probability that belongs to the background pixel.
- the maximum probability category is the category corresponding to the larger of the first probability and the second probability.
- for example, if the probability of a pixel belonging to a foreground pixel is 80% and the probability of belonging to a background pixel is 20%, then the maximum probability category of the pixel is the foreground pixel.
- the three-dimensional distribution binary map uses 1 to represent foreground pixels and 0 to represent background pixels.
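- as an illustration, the maximum-probability rule above reduces to a per-pixel comparison (a sketch; array-based inputs are an assumption):

```python
import numpy as np

def binary_map(prob_foreground, prob_background):
    # 1 (foreground) where the foreground probability is the larger one, else 0
    return (prob_foreground > prob_background).astype(np.uint8)
```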
- Step 510 The terminal performs filtering processing on the noise pixel points in the three-dimensional distribution binary map based on clinical prior knowledge.
- the computer device can also use clinical prior knowledge to filter out the noise pixels in the three-dimensional distribution binary map.
- the computer device filters out the first noise pixel points that exceed the target value range in the three-dimensional distribution binary map.
- the target value range is the coordinate value range in which the target organ may appear, derived from the first clinical prior knowledge.
- the target value range is a three-dimensional cube frame area.
- the first clinical prior knowledge can be constructed based on multiple sample images.
- the computer device filters out the second noise pixel points beyond the three-dimensional ellipsoid model in the three-dimensional distribution binary map.
- the three-dimensional ellipsoid model is the ellipsoid model corresponding to the target organ obtained from the second clinical prior knowledge.
- the second clinical prior knowledge can be constructed based on multiple sample images. Since the shapes of most organs are close to ellipsoids, the terminal can compute in advance the longest and shortest axes of the target organ on the two-dimensional slice images in the x-axis, y-axis, and z-axis directions, thereby constructing the three-dimensional ellipsoid model of the target organ. According to the constructed model, noise pixels beyond the three-dimensional ellipsoid are filtered out of the candidate pixels.
- the method for filtering noise pixels by the computer device may use at least one of the above two filtering methods.
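- as an illustration, a minimal sketch of the two filters follows; the voxel indexing order and the box/ellipsoid parameters are assumptions (in practice they would come from the first and second clinical prior knowledge):

```python
import numpy as np

def filter_noise_voxels(binary_map, box_min, box_max, center, radii):
    """Clears voxels outside the statistical bounding box (first filter)
    or outside the statistical ellipsoid (second filter)."""
    x, y, z = np.indices(binary_map.shape)  # axis order is an assumption

    # first filter: keep only voxels inside the target coordinate range (a 3D cube)
    in_box = ((x >= box_min[0]) & (x <= box_max[0]) &
              (y >= box_min[1]) & (y <= box_max[1]) &
              (z >= box_min[2]) & (z <= box_max[2]))

    # second filter: keep only voxels inside the ellipsoid
    # (x-cx)^2/rx^2 + (y-cy)^2/ry^2 + (z-cz)^2/rz^2 <= 1
    in_ellipsoid = (((x - center[0]) / radii[0]) ** 2 +
                    ((y - center[1]) / radii[1]) ** 2 +
                    ((z - center[2]) / radii[2]) ** 2) <= 1.0

    return binary_map * (in_box & in_ellipsoid)
```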
- in summary, in the method provided by this embodiment, two-dimensional slice images corresponding to the three directional planes are obtained, and the three segmentation models corresponding to those planes produce the two-dimensional distribution probability maps of the three planes, realizing two-dimensional semantic segmentation of the three-dimensional medical image by the terminal.
- the three distribution probability maps are then fused three-dimensionally through the adaptive fusion model to obtain the three-dimensional distribution binary map of the target object, which solves the problem in the related art that the Pspnet network model is only suitable for semantic segmentation of 2D natural images and does not support semantic segmentation of 3D medical images, and achieves semantic segmentation of 3D medical images with three 2D segmentation models and one adaptive fusion model.
- because the adaptive fusion model fuses two-dimensional distribution probability maps of three different dimensions, background noise is effectively suppressed during 3D fusion, and the edges of the target object are segmented smoothly and accurately.
- by filtering noise pixels with clinical prior knowledge, the terminal obtains the pixels belonging to the target organ, providing strong noise reduction capability and a good edge segmentation effect.
- the size of the two-dimensional slice image is converted from the original size to the input size, avoiding the errors that may occur when the original size is used directly, so that the target organ can be segmented accurately when the three-dimensional medical image is semantically segmented.
- the first segmentation model, the second segmentation model, the third segmentation model, and the adaptive fusion model all belong to the convolutional network model.
- before being used, the convolutional network models need to be trained by a computer device.
- as shown in FIG. 11, the training method of the three two-dimensional segmentation models includes, but is not limited to, the following steps:
- Step 1101 the terminal acquires at least one set of sample images.
- the computer device acquires at least one group of sample images through the medical image collection device.
- the number of sample images in each group is not limited, and can be set according to the needs of the trainers.
- the sample image may include an image with a sample organ and an image without a sample organ. For a sample image in which a sample organ exists, the pixel point to which the sample organ belongs is marked in the sample image.
- the sample image may be a two-dimensional slice image on the x-axis direction, and the pixel points to which the sample organ belongs are marked on the two-dimensional slice image on the x-axis direction.
- the sample image may be a two-dimensional slice image on the y-axis direction, and the pixel points to which the sample organ belongs are marked on the two-dimensional slice image on the y-axis direction.
- the sample image may be a two-dimensional slice image on the z-axis direction, and the pixel points to which the sample organ belongs are marked on the two-dimensional slice image on the z-axis direction.
- Step 1102 The terminal obtains the calibration result of the sample organ in the sample image to obtain a sample image data set composed of the sample images and the calibration results of the corresponding sample organs.
- the calibration result includes the distribution position of the sample organ in the sample image.
- a training person or computer device sets a calibration result in the sample image, and the calibration result includes the pixel points to which the sample organ belongs.
- the calibration result is used to indicate at least one of the distribution position of the sample organ in the sample image, the size of the sample organ, and the shape of the ellipsoid corresponding to the sample organ.
- the area where the sample organ is located and the background area other than the sample organ are marked in the image with the sample organ; the area where the sample organ does not exist is marked in the image without the sample organ.
- the sample image data set is used for comparison with the training result corresponding to the sample image.
- step 1103 the terminal inputs the sample image into the original segmentation model to obtain the training result.
- the computer device inputs the calibrated sample images into the original segmentation model, identifies the sample image and the sample organs in the sample image through the original segmentation model, and outputs the recognition results as training results.
- the original segmentation model is a model constructed based on the ResNet model, as shown in FIG. 8.
- the initial weight of the segmentation model can be set by the trainer according to the experience value, or randomly set by the computer equipment.
- the weights of the deep network coding part in the segmentation model are initialized using ResNet parameters trained on the ImageNet dataset, while the weights of the skip transfer decoding part are initialized with a Gaussian distribution with a mean of 0 and a variance of 2 divided by the number of inputs.
- step 1104 the terminal compares the training result with the calibration result of the sample organ according to each sample image data set to obtain a calculated loss, and the calculated loss is used to indicate an error between the training result and the calibration result of the sample organ.
- the computer device compares the obtained training result with the sample image data set corresponding to the same set of sample images, and calculates the error between the training result and the calibration result.
- the error is computed by a weighted loss function.
- the calculated loss is used to indicate the error between the training result and the calibration result of the sample organ.
- the weighted loss function uses a cross-entropy loss function; in the weighted loss formula of the cross-entropy loss function:
- p represents the probability that the pixel belongs to the target pixel corresponding to the target organ;
- y represents the category, that is, y is 0 or 1;
- w_fg represents the weight of the foreground category;
- w_bg represents the weight of the background category;
- t_i represents the i-th sample image;
- n_i represents the number of pixels of the full image in the i-th sample image;
- N is the number of sample images of a batch input size (batch size), and the weight values come from statistics of the foreground-to-background ratio of the sample images.
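- the formula is not reproduced in the text; a standard per-pixel weighted cross-entropy consistent with the definitions above is (an assumed form, not necessarily the patent's exact expression):

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{n_i}\sum_{j \in t_i}\Big[w_{fg}\,y_j\log p_j + w_{bg}\,(1-y_j)\log(1-p_j)\Big]$$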
- Step 1105 The terminal uses the error back propagation algorithm to train and obtain a segmentation model according to the respective calculated losses of at least one sample image data set.
- the terminal uses the error back propagation algorithm to reset the weights according to the respective calculated losses of at least one set of sample image data sets, until the weighted loss obtained with the reset weights meets a preset threshold, or until the number of training iterations reaches a preset number. For example, if the required number of training iterations is 20,000, the terminal can stop training once it is reached, completing the training of the segmentation model for two-dimensional semantic segmentation.
- the error back-propagation algorithm may use stochastic gradient descent (SGD) to solve for the convolution template parameters w and bias parameters b of the segmentation model, and the training iteration parameters may be selected by cross-validation.
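- as an illustration, a minimal sketch of one SGD training iteration with error back propagation follows; the optimizer settings and the function signature are assumptions:

```python
import torch

def train_step(model, optimizer, images, labels, loss_fn):
    # one iteration of error back propagation: forward pass, weighted loss
    # against the calibration result, backward pass, weight update
    optimizer.zero_grad()
    training_result = model(images)
    loss = loss_fn(training_result, labels)
    loss.backward()   # back-propagate the calculated loss
    optimizer.step()  # update the convolution weights w and biases b
    return loss.item()

# example: optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is an assumption
```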
- a two-dimensional distribution probability map is obtained from the trained segmentation model for the two-dimensional slice images of each three-dimensional sample image, and the two-dimensional distribution probability maps together with the calibrated three-dimensional binary map are used as another sample image group.
- the sample image group is used to train the adaptive fusion model, and the training process of the adaptive fusion model is the same as or similar to the above method, which will not be repeated in this application.
- the weighted loss is calculated from the probability that each pixel in the feature map belongs to a target pixel, i.e., a pixel corresponding to the target organ.
- the training process of the adaptive fusion model is the same as the training process of the three segmentation models.
- the training process of the adaptive fusion model can be implemented with reference to the steps shown in FIG. 11, except that after the adaptive fusion model obtains the training result, the Dice loss function is used as the loss function to calculate the error between the training result of the adaptive fusion model and its calibration result.
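- as an illustration, a minimal sketch of the Dice loss follows; the smoothing constant is an assumption:

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    # 1 - Dice coefficient between the predicted probability volume and
    # the binary calibration volume
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```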
- the semantic segmentation method of the three-dimensional image provided in this application can also be applied to the semantic segmentation method of the two-dimensional image.
- FIG. 12 shows a flowchart of a method for semantic segmentation of a two-dimensional image provided by another exemplary embodiment of the present application. This method can be applied to the implementation environment shown in FIG. 2.
- in this embodiment, the two-dimensional image is a two-dimensional medical image and the target object is a target organ, by way of illustration.
- the method includes:
- Step 1201 the terminal obtains a two-dimensional medical image.
- the computer device collects a two-dimensional medical image through the medical image acquisition device.
- the two-dimensional medical image includes a two-dimensional target organ and a background area other than the target organ.
- after acquiring the two-dimensional medical image, the computer device analyzes it. In some embodiments, since the distribution position of each kind of target organ in a two-dimensional medical image is relatively fixed, the computer device also reads pre-stored third clinical prior knowledge, which indicates the target value range of the candidate positions at which the target organ may appear in each two-dimensional medical image. For example, the abscissa range of target organ A in the x-axis two-dimensional medical image is [a1, a2], and the ordinate range in the y-axis two-dimensional medical image is [b1, b2]. The target value range is used for third noise filtering in post-processing.
- Step 1202 when the aspect ratio of the 2D medical image exceeds the preset ratio range, the terminal divides the 2D medical image into scan frames according to the square frame formed by its short side length, obtaining several 2D medical images to be processed.
- the computer device can also use the following image preprocessing method to process the two-dimensional medical image.
- when the aspect ratio of the obtained two-dimensional medical image is within the preset ratio range, the computer device converts the size of the two-dimensional medical image into an input size that conforms to the segmentation model.
- the preset ratio range can be [1/3, 3].
- when the aspect ratio of the obtained two-dimensional medical image exceeds the preset ratio range, that is, falls outside [1/3, 3], the two-dimensional medical image is considered too narrow and long. If the computer device directly converted the two-dimensional medical image from its original size to the input size (the pixel size required by the segmentation model), the target organ in the image would be squeezed into a bar shape, making the final prediction inaccurate.
- in this case, the computer device divides the two-dimensional medical image into scan frames according to the square frame formed by its short side length, obtaining several two-dimensional medical images to be processed. After that, the computer device converts the sizes of the several two-dimensional medical images to be processed into the input size of the segmentation model and then inputs them to the segmentation model for prediction, respectively.
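- as an illustration, a minimal sketch of the scan-frame splitting follows; the choice of evenly spaced, possibly overlapping frames is an assumption, since the text only requires square frames with the short side length:

```python
import numpy as np

def split_into_square_frames(image):
    """Slides a square window whose side equals the image's short side
    along the long side, producing several square sub-images."""
    h, w = image.shape[:2]
    side = min(h, w)
    long_len = max(h, w)
    # enough frames to cover the long side; the last frame ends flush
    n_frames = int(np.ceil(long_len / side))
    starts = np.linspace(0, long_len - side, n_frames).astype(int)
    if h >= w:
        return [image[s:s + side, :] for s in starts]
    return [image[:, s:s + side] for s in starts]
```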
- Step 1203 The terminal invokes the segmentation model to semantically segment the two-dimensional medical image to obtain a distribution probability map of the target organ.
- the structure of the segmentation model is the same as that of the first segmentation model, so the structure of the segmentation model can refer to the model structure shown in FIG. 8.
- the segmentation model includes: a deep network coding unit and a skip transfer decoding unit, the deep network coding unit includes an n-layer convolutional layer, the skip transfer decoding unit includes an m-layer deconvolution layer, and n and m are both positive integers.
- the deep network coding unit is used for the terminal to perform down-sampling feature extraction on the two-dimensional image through n convolutional layers to obtain the down-sampled third intermediate feature map.
- the skip transfer decoding unit is used by the terminal to perform upsampling processing on the third intermediate feature map and the fourth intermediate feature map through the m-layer deconvolution layer to obtain a distribution probability map after upsampling.
- the fourth intermediate feature map includes a feature map output by the i-th convolution layer in the n-layer convolution layer, where i is an integer less than or equal to n.
- the segmentation model and the first segmentation model have the same structure, and only the sample images used in the training process differ. Therefore, the process of semantic segmentation of the two-dimensional medical image by using the segmentation model can refer to the description of step 504, and will not be repeated here.
- Step 1204 the terminal calculates a two-dimensional distribution binary map of the target organ according to the maximum probability category of each pixel in the distribution probability map.
- the segmentation model determines the category to which each pixel in the image belongs based on the maximum probability category of each pixel in the distribution probability map; the categories include foreground pixels, which belong to the target organ, and background pixels, which do not.
- the distribution probability map includes a third probability that each pixel belongs to the foreground pixel and a fourth probability that belongs to the background pixel.
- the maximum probability category is the category corresponding to the larger of the third probability and the fourth probability. For example, if the probability of a pixel belonging to a foreground pixel is 80% and the probability of belonging to a background pixel is 20%, then the maximum probability category of the pixel is the foreground pixel.
- the two-dimensional distribution binary map uses 1 to represent foreground pixels and 0 to represent background pixels.
- Step 1205 The terminal performs filtering processing on the noise pixel points in the two-dimensional distribution binary map based on clinical prior knowledge.
- the computer device can also use clinical prior knowledge to filter out noise pixels in the two-dimensional distribution binary map.
- the computer device filters out the third noise pixel points beyond the target value range in the two-dimensional distribution binary map.
- the target value range is the coordinate value range in which the target organ may appear, derived from the third clinical prior knowledge.
- the target value range is a two-dimensional plane frame area.
- the third clinical prior knowledge can be constructed based on multiple sample images.
- in summary, in the method provided by this embodiment, the obtained two-dimensional image is semantically segmented by the segmentation model to obtain the distribution probability map of the target organ; by determining the maximum probability category of each pixel in the distribution probability map, the two-dimensional distribution binary map of the target organ is obtained; and the noise pixels of the obtained two-dimensional distribution binary map are filtered according to the third clinical prior knowledge, achieving semantic segmentation of the two-dimensional image.
- the filtering of noise pixels makes the segmentation boundary of the semantically segmented image clear, and the edges are handled well.
- the semantic segmentation method of the three-dimensional image provided by the present application is not only applicable to the semantic segmentation of the three-dimensional image, but also to the semantic segmentation of the two-dimensional image, and the segmentation effect is relatively good.
- steps in the embodiments of the present application are not necessarily executed in the order indicated by the step numbers. Unless clearly stated in this article, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least a part of the steps in each embodiment may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed and completed at the same time, but may be executed at different times. The order is not necessarily sequential, but may be executed in turn or alternately with at least a part of other steps or sub-steps or stages of other steps.
- a terminal is further provided.
- the terminal includes a semantic segmentation device for three-dimensional images and a semantic segmentation device for two-dimensional images.
- the semantic segmentation device for three-dimensional images and the semantic segmentation device for two-dimensional graphics include various modules. Each module can be implemented in whole or in part by software, hardware, or a combination thereof.
- FIG. 13 shows a schematic diagram of an apparatus for semantic segmentation of a three-dimensional image provided by an exemplary embodiment of the present application.
- the apparatus includes:
- the first acquisition module 1310 is used to acquire a three-dimensional image.
- the slicing module 1320 is configured to slice the three-dimensional image according to the three directional planes where the three-dimensional coordinate axis is located to obtain a two-dimensional slice image on the x axis, a two-dimensional slice image on the y axis, and a two-dimensional slice image on the z axis.
- the first segmentation module 1330 is used to call the first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain the distribution probability map of the target object in the x-axis direction; call the second segmentation model to segment the y-axis two-dimensional slice The image is semantically segmented to obtain the distribution probability map of the target object in the y-axis direction; the third segmentation model is called to semantically segment the z-axis two-dimensional slice image to obtain the distribution probability map of the target object in the z-axis direction.
- the fusion module 1340 is used to call the adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution binary map of the target object.
- in another exemplary embodiment, as shown in FIG. 14, the apparatus includes:
- the first acquisition module 1410 is used to acquire a three-dimensional image.
- the slicing module 1420 is configured to slice the three-dimensional image according to the three directions of the three-dimensional coordinate axis to obtain a two-dimensional slice image on the x-axis, a two-dimensional slice image on the y-axis, and a two-dimensional slice image on the z-axis.
- the first scanning module 1430 is used to divide the two-dimensional slice image into scan frames according to the square frame formed by its short side length when the aspect ratio of the two-dimensional slice image exceeds the preset ratio range, obtaining several two-dimensional slice images to be processed.
- the first segmentation module 1440 is used to call the first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain the distribution probability map of the target object in the x-axis direction; call the second segmentation model to slice the two-dimensional y-axis The image is semantically segmented to obtain the distribution probability map of the target object in the y-axis direction; the third segmentation model is called to semantically segment the z-axis two-dimensional slice image to obtain the distribution probability map of the target object in the z-axis direction.
- At least one of the first segmentation model, the second segmentation model, and the third segmentation model includes: a deep network coding unit and a skip transfer decoding unit; the deep network coding unit includes an n-layer convolutional layer, the skip transfer decoding unit includes an m-layer deconvolution layer, and n and m are both positive integers.
- the deep network coding unit is configured to perform down-sampling feature extraction on the two-dimensional slice image through n convolutional layers to obtain the down-sampled first intermediate feature map.
- the skip transfer decoding unit is used for upsampling the first intermediate feature map and the second intermediate feature map through the m-layer deconvolution layer to obtain an upsampled distribution probability map.
- the second intermediate feature map includes a feature map output by the i-th convolution layer in the n-layer convolution layer, where i is an integer less than or equal to n.
- the fusion module 1450 is used to call the adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution binary map of the target object.
- the fusion module 1450 includes:
- the combining unit 1451 is configured to call the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution feature map.
- the fusion unit 1452 is used to perform three-dimensional fusion and convolution on the three-dimensional distribution feature map to obtain a three-dimensional segmentation probability map.
- the calculation unit 1453 is configured to calculate a three-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the three-dimensional segmentation probability map.
- in some embodiments, the three-dimensional image is a three-dimensional medical image; the apparatus further includes:
- the first filtering module 1460 is used to filter the noise pixels in the three-dimensional distribution binary map based on clinical prior knowledge
- the clinical prior knowledge is the knowledge obtained by statistically calculating the distribution position of the target object in the three-dimensional medical image.
- the first filtering module 1460 is configured to filter the first noise pixel points that exceed the target value range in the three-dimensional distribution binary map.
- the target value range is the coordinate value range in which the target object may appear, derived from the first clinical prior knowledge.
- the first filtering module 1460 is configured to filter out the second noise pixels beyond the three-dimensional ellipsoid model in the three-dimensional distribution binary map.
- the three-dimensional ellipsoid model is an ellipsoid model corresponding to the target object obtained according to the second clinical prior knowledge.
- the first acquisition module 1410 is also used to implement any other implicit or disclosed functions related to the acquisition step in the above method embodiments; the slicing module 1420 is also used to implement any other implicit or disclosed functions related to the slicing step in the above method embodiments; the first scanning module 1430 is also used to implement any other implicit or disclosed functions related to the scanning step in the above method embodiments; the first segmentation module 1440 is also used to implement any other implicit or disclosed functions related to the segmentation step; the fusion module 1450 is also used to implement any other implicit or disclosed functions related to the fusion step in the above method embodiments; the first filtering module 1460 is also used to implement any other implicit or disclosed functions related to the filtering step in the above method embodiments.
- the semantic segmentation device for three-dimensional images provided in the above embodiments is illustrated only by the division of the above functional modules as an example.
- in practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
- the semantic segmentation device of the three-dimensional image provided in the above embodiment and the method embodiment of the semantic segmentation method of the three-dimensional image belong to the same concept. For the specific implementation process, refer to the method embodiment, and details are not described here.
- as shown in FIG. 16, the apparatus for semantic segmentation of a two-dimensional image includes:
- the second acquisition module 1610 is used to acquire a two-dimensional image.
- the second scanning module 1620 is used to divide the two-dimensional image into scan frames according to the square frame formed by its short side length when the aspect ratio of the two-dimensional image exceeds the preset ratio range, obtaining several two-dimensional images to be processed.
- the second segmentation module 1630 is used to call the segmentation model to semantically segment the two-dimensional image to obtain the distribution probability map of the target object;
- the segmentation model includes: a deep network coding unit and a skip transfer decoding unit, the deep network coding unit includes an n-layer convolutional layer, the skip transfer decoding unit includes an m-layer deconvolution layer, and n and m are both positive integers.
- the deep network coding unit is configured to perform down-sampling feature extraction on the two-dimensional image through n convolutional layers to obtain a down-sampled third intermediate feature map.
- the skip transfer decoding unit is used for up-sampling the third intermediate feature map and the fourth intermediate feature map through the m-layer deconvolution layer to obtain the up-sampled distribution probability map.
- the fourth intermediate feature map includes a feature map output by the i-th convolution layer in the n-layer convolution layer, where i is an integer less than or equal to n.
- the calculation module 1640 is used to calculate a two-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the distribution probability map.
- in some embodiments, the two-dimensional image is a two-dimensional medical image; the apparatus further includes:
- the second filtering module 1650 is used to filter out the noise pixels in the two-dimensional distribution binary map based on clinical prior knowledge
- the clinical prior knowledge is the knowledge obtained by statistically calculating the distribution position of the target object in the two-dimensional medical image.
- the second filtering module 1650 is configured to filter out the third noise pixel points that exceed the target value range in the two-dimensional distribution binary map.
- the target value range is the coordinate value range in which the target object may appear, derived from the third clinical prior knowledge.
- the second acquisition module 1610 is also used to implement any other implicit or disclosed acquisition step related functions in the above method embodiments;
- the second scanning module 1620 is also used to implement any other implicit or disclosed functions related to the scanning step in the above method embodiments;
- the second segmentation module 1630 is also used to implement any other implicit or disclosed functions related to the segmentation step in the above method embodiments;
- the calculation module 1640 is also used to implement any other implicit or disclosed functions related to the calculation step in the above method embodiments;
- the second filtering module 1650 is also used to implement any other functions related to the filtering step that are implicit or disclosed in the above method embodiments.
- the semantic segmentation device for two-dimensional images provided in the above embodiments is illustrated only by the division of the above functional modules as an example.
- in practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
- the semantic segmentation device for two-dimensional images provided by the above embodiments and the method embodiments of the semantic segmentation method for two-dimensional images belong to the same concept. For the specific implementation process, refer to the method embodiments; details are not described here.
- FIG. 17 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
- the computer device is used to implement the semantic segmentation method of the three-dimensional image and the semantic segmentation method of the two-dimensional image provided in the above embodiments. Specifically:
- the computer device 1700 includes a central processing unit (CPU) 1701, a system memory 1704 including a random access memory (RAM) 1702 and a read-only memory (ROM) 1703, and a system bus 1705 connecting the system memory 1704 and the central processing unit 1701.
- the computer device 1700 also includes a basic input/output system (I/O system) 1706 that helps transfer information between the various devices in the computer, and a mass storage device 1707 for storing an operating system 1713, application programs 1714, and other program modules 1715.
- the basic input / output system 1706 includes a display 1708 for displaying information and an input device 1709 for a user to input information, such as a mouse and a keyboard.
- the display 1708 and the input device 1709 are both connected to the central processing unit 1701 through an input and output controller 1710 connected to the system bus 1705.
- the basic input / output system 1706 may also include an input output controller 1710 for receiving and processing input from a number of other devices such as a keyboard, mouse, or electronic stylus.
- the input output controller 1710 also provides output to a display screen, printer, or other type of output device.
- the mass storage device 1707 is connected to the central processing unit 1701 through a mass storage controller (not shown) connected to the system bus 1705.
- the mass storage device 1707 and its associated computer-readable medium provide non-volatile storage for the computer device 1700. That is, the mass storage device 1707 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
- Computer-readable media may include computer storage media and communication media.
- Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory, or other solid-state storage technologies, CD-ROM, DVD, or other optical storage, tape cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices.
- RAM random access memory
- ROM read-only memory
- EPROM Erasable programmable read-only memory
- EEPROM electrically erasable programmable read-only memory
- flash memory or other solid-state storage technologies
- CD-ROM, DVD or other optical storage
- tape cassettes magnetic tape
- magnetic disk storage or other magnetic storage devices.
- computer storage medium is not limited to the above types.
- the above-mentioned system memory 1704 and mass storage device 1707 may be collectively referred to as a memory.
- the computer device 1700 may also run through a remote computer connected to a network, such as the Internet. That is, the computer device 1700 can be connected to the network 1712 through the network interface unit 1711 connected to the system bus 1705, or the network interface unit 1711 can be used to connect to other types of networks or remote computer systems (not shown).
- the memory also includes one or more programs that are stored in the memory and configured to be executed by one or more processors.
- One or more of the above programs contains instructions for performing the following operations:
- the memory of the server further includes instructions for performing the following operations:
- FIG. 18 shows an internal structure diagram of the terminal in one embodiment.
- the terminal includes a processor, memory, network interface, display screen, and input device connected through a system bus.
- the processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system and computer programs.
- the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
- the memory includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium of the terminal stores an operating system and computer readable instructions.
- when the computer readable instructions are executed, the processor can realize the semantic segmentation method of the three-dimensional image and the semantic segmentation method of the two-dimensional image.
- the internal memory may also store computer readable instructions.
- when these computer readable instructions are executed by the processor, they may cause the processor to execute the semantic segmentation method of the three-dimensional image and the semantic segmentation method of the two-dimensional image.
- the network interface of the computer device is used to communicate with external terminals through a network connection.
- the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen.
- the input device of the computer device may be a touch layer covering the display screen, or a button, trackball, or touchpad provided on the computer device housing, or an external keyboard, touchpad, or mouse.
- FIG. 18 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the terminal to which the solution of the present application is applied.
- a specific terminal may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
- the semantic segmentation device for three-dimensional images and the semantic segmentation device for two-dimensional images provided in this application may be implemented in the form of computer-readable instructions, which may run on a terminal as shown in FIG. 18.
- the memory of the terminal may store the various program modules constituting the semantic segmentation device of the three-dimensional image and the semantic segmentation device of the two-dimensional image, such as the first acquisition module 1410, the slicing module 1420, the first scanning module 1430, the first segmentation module 1440, and the fusion module 1450.
- the computer-readable instructions formed by the various program modules cause the processor to execute the steps in the three-dimensional image semantic segmentation method and the two-dimensional image semantic segmentation method described in the present specification.
- Embodiments of the present application provide a computer-readable storage medium storing computer-readable instructions that are loaded and executed by a processor to implement the operations in the semantic segmentation method of the three-dimensional image and the semantic segmentation method of the two-dimensional image of the above embodiments.
- Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory can include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
Abstract
This application discloses a semantic segmentation method, apparatus, terminal, and storage medium for three-dimensional images, belonging to the field of deep learning. The method includes: acquiring a three-dimensional image; slicing the three-dimensional image along the three directional planes of the three-dimensional coordinate axes to obtain an x-axis two-dimensional slice image, a y-axis two-dimensional slice image, and a z-axis two-dimensional slice image; invoking a first segmentation model, a second segmentation model, and a third segmentation model to semantically segment the x-axis, y-axis, and z-axis two-dimensional slice images, obtaining distribution probability maps of the target object on the three directional planes; and invoking an adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution binary map of the target object.
Description
This application claims priority to Chinese Patent Application No. 201811204375.4, filed with the Chinese Patent Office on October 16, 2018 and entitled "Semantic segmentation method, apparatus, device and storage medium for three-dimensional images", the entire contents of which are incorporated herein by reference.

This application relates to the field of deep learning, and in particular to a semantic segmentation method, apparatus, terminal, and storage medium for three-dimensional images.

In medical images, changes in the shape or volume of human organs/tissues provide important clues for clinical diagnosis. Applying a deep learning model to semantically segment a medical image yields the image region in which a human organ/tissue is located.

In the related art, the Pyramid Scene Parsing Network (Pspnet) is used to semantically segment two-dimensional medical images. Pspnet is a deep-learning-based semantic segmentation technique. It applies convolution kernels of several different sizes to collect feature maps at multiple scales, and finally upsamples the output feature maps by interpolation to obtain the semantic segmentation result.

However, Pspnet is a semantic segmentation technique for two-dimensional natural images and does not support semantic segmentation of three-dimensional medical images.
Summary of the Invention
Various embodiments of this application provide a semantic segmentation method, apparatus, terminal, and storage medium for three-dimensional images. The technical solutions are as follows:

According to one aspect of the embodiments of the present disclosure, a semantic segmentation method for three-dimensional images is provided, the method including:

a terminal acquiring the three-dimensional image;

the terminal slicing the three-dimensional image along the three directional planes of the three-dimensional coordinate axes to obtain an x-axis two-dimensional slice image, a y-axis two-dimensional slice image, and a z-axis two-dimensional slice image;

the terminal invoking a first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain a distribution probability map of a target object on the x-axis directional plane;

the terminal invoking a second segmentation model to semantically segment the y-axis two-dimensional slice image to obtain a distribution probability map of the target object on the y-axis directional plane;

the terminal invoking a third segmentation model to semantically segment the z-axis two-dimensional slice image to obtain a distribution probability map of the target object on the z-axis directional plane;

the terminal invoking an adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution binary map of the target object.

In some embodiments, at least one of the first segmentation model, the second segmentation model, and the third segmentation model includes a deep network coding unit and a skip transfer decoding unit; the deep network coding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers;

the deep network coding unit is used by the terminal to perform downsampling feature extraction on the two-dimensional slice image through the n convolutional layers to obtain a downsampled first intermediate feature map;

the skip transfer decoding unit is used by the terminal to perform upsampling on the first intermediate feature map and a second intermediate feature map through the m deconvolution layers to obtain the upsampled distribution probability map;

where the second intermediate feature map includes a feature map output by the i-th of the n convolutional layers, i being an integer less than or equal to n.

In some embodiments, the terminal invokes the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution feature map;

the terminal performs three-dimensional fusion convolution on the three-dimensional distribution feature map to obtain a three-dimensional segmentation probability map;

the terminal calculates the three-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the three-dimensional segmentation probability map.

In some embodiments, the three-dimensional image is a three-dimensional medical image.

The terminal filters out noise pixels in the three-dimensional distribution binary map based on clinical prior knowledge;

the clinical prior knowledge is knowledge obtained by statistics on the distribution positions of the target object in the three-dimensional medical image.

In some embodiments, the terminal filters out, from the three-dimensional distribution binary map, first noise pixels that fall outside a target value range;

where the target value range is the coordinate value range in which the target object may appear, obtained from first clinical prior knowledge.

In some embodiments, the terminal filters out, from the three-dimensional distribution binary map, second noise pixels that fall outside a three-dimensional ellipsoid model;

where the three-dimensional ellipsoid model is the ellipsoid model corresponding to the target object obtained from second clinical prior knowledge.

In some embodiments, when the aspect ratio of the two-dimensional slice image exceeds a preset ratio range, the terminal divides the two-dimensional slice image into scan frames according to the square frame formed by the short side length of the image, obtaining several two-dimensional slice images to be processed.
According to another aspect of the embodiments of the present disclosure, a semantic segmentation method for two-dimensional images is provided, the method including:

a terminal acquiring the two-dimensional image;

the terminal invoking a segmentation model to semantically segment the two-dimensional image to obtain a distribution probability map of a target object;

the terminal calculating a two-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the distribution probability map;

where the segmentation model includes a deep network coding unit and a skip transfer decoding unit; the deep network coding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers;

the deep network coding unit is used by the terminal to perform downsampling feature extraction on the two-dimensional image through the n convolutional layers to obtain a downsampled third intermediate feature map;

the skip transfer decoding unit is used by the terminal to perform upsampling on the third intermediate feature map and a fourth intermediate feature map through the m deconvolution layers to obtain the upsampled distribution probability map;

where the fourth intermediate feature map includes a feature map output by the i-th of the n convolutional layers, i being an integer less than or equal to n.

In some embodiments, the two-dimensional image is a two-dimensional medical image.

The terminal filters out noise pixels in the two-dimensional distribution binary map based on clinical prior knowledge;

the clinical prior knowledge is knowledge obtained by statistics on the distribution positions of the target object in the two-dimensional medical image.

In some embodiments, the terminal filters out, from the two-dimensional distribution binary map, third noise pixels that fall outside a target value range;

where the target value range is the coordinate value range in which the target object may appear, obtained from third clinical prior knowledge.

In some embodiments, when the aspect ratio of the two-dimensional image exceeds a preset ratio range, the terminal divides the two-dimensional image into scan frames according to the square frame formed by the short side length of the image, obtaining several two-dimensional images to be processed.
According to another aspect of the embodiments of the present disclosure, a semantic segmentation apparatus for three-dimensional images is provided, the apparatus including:

a first acquisition module, configured to acquire the three-dimensional image;

a slicing module, configured to slice the three-dimensional image along the three directional planes of the three-dimensional coordinate axes to obtain an x-axis two-dimensional slice image, a y-axis two-dimensional slice image, and a z-axis two-dimensional slice image;

a first segmentation module, configured to invoke a first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain a distribution probability map of a target object on the x-axis directional plane;

the first segmentation module being further configured to invoke a second segmentation model to semantically segment the y-axis two-dimensional slice image to obtain a distribution probability map of the target object on the y-axis directional plane;

the first segmentation module being further configured to invoke a third segmentation model to semantically segment the z-axis two-dimensional slice image to obtain a distribution probability map of the target object on the z-axis directional plane;

a fusion module, configured to invoke an adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution binary map of the target object.

In some embodiments, at least one of the first segmentation model, the second segmentation model, and the third segmentation model includes a deep network coding unit and a skip transfer decoding unit; the deep network coding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers;

the deep network coding unit is configured to perform downsampling feature extraction on the two-dimensional slice image through the n convolutional layers to obtain a downsampled first intermediate feature map;

the skip transfer decoding unit is configured to perform upsampling on the first intermediate feature map and a second intermediate feature map through the m deconvolution layers to obtain the upsampled distribution probability map;

where the second intermediate feature map includes a feature map output by the i-th of the n convolutional layers, i being an integer less than or equal to n.

In some embodiments, the fusion module includes:

a combining unit, configured to invoke the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution feature map;

a fusion unit, configured to perform three-dimensional fusion convolution on the three-dimensional distribution feature map to obtain a three-dimensional segmentation probability map;

a calculation unit, configured to calculate the three-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the three-dimensional segmentation probability map.

In some embodiments, the three-dimensional image is a three-dimensional medical image, and the apparatus further includes:

a first filtering module, configured to filter out noise pixels in the three-dimensional distribution binary map based on clinical prior knowledge;

the clinical prior knowledge is knowledge obtained by statistics on the distribution positions of the target object in the three-dimensional medical image.

In some embodiments, the first filtering module is configured to filter out, from the three-dimensional distribution binary map, first noise pixels that fall outside a target value range;

where the target value range is the coordinate value range in which the target object may appear, obtained from first clinical prior knowledge.

In some embodiments, the first filtering module is configured to filter out, from the three-dimensional distribution binary map, second noise pixels that fall outside a three-dimensional ellipsoid model;

where the three-dimensional ellipsoid model is the ellipsoid model corresponding to the target object obtained from second clinical prior knowledge.

In some embodiments, the apparatus further includes:

a first scanning module, configured to, when the aspect ratio of the two-dimensional slice image exceeds a preset ratio range, divide the two-dimensional slice image into scan frames according to the square frame formed by the short side length of the image, obtaining several two-dimensional slice images to be processed.
According to another aspect of the embodiments of the present disclosure, a semantic segmentation apparatus for two-dimensional images is provided, the apparatus including:

a second acquisition module, configured to acquire the two-dimensional image;

a second segmentation module, configured to invoke a segmentation model to semantically segment the two-dimensional image to obtain a distribution probability map of a target object;

a calculation module, configured to calculate a two-dimensional distribution binary map of the target object according to the maximum probability category of each pixel in the distribution probability map;

where the segmentation model includes a deep network coding unit and a skip transfer decoding unit; the deep network coding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers;

the deep network coding unit is configured to perform downsampling feature extraction on the two-dimensional image through the n convolutional layers to obtain a downsampled third intermediate feature map;

the skip transfer decoding unit is configured to perform upsampling on the third intermediate feature map and a fourth intermediate feature map through the m deconvolution layers to obtain the upsampled distribution probability map;

where the fourth intermediate feature map includes a feature map output by the i-th of the n convolutional layers, i being an integer less than or equal to n.

In some embodiments, the two-dimensional image is a two-dimensional medical image, and the apparatus further includes:

a second filtering module, configured to filter out noise pixels in the two-dimensional distribution binary map based on clinical prior knowledge;

the clinical prior knowledge is knowledge obtained by statistics on the distribution positions of the target object in the two-dimensional medical image.

In some embodiments, the second filtering module is configured to filter out, from the two-dimensional distribution binary map, third noise pixels that fall outside a target value range;

where the target value range is the coordinate value range in which the target object may appear, obtained from third clinical prior knowledge.

In some embodiments, the apparatus further includes:

a second scanning module, configured to, when the aspect ratio of the two-dimensional image exceeds a preset ratio range, divide the two-dimensional image into scan frames according to the square frame formed by the short side length of the image, obtaining several two-dimensional images to be processed.

According to another aspect of the embodiments of the present disclosure, a terminal is provided; the terminal includes a processor and a memory, the memory storing computer readable instructions which, when executed by the processor, cause the processor to perform the method described in the above embodiments.

According to another aspect of the embodiments of the present disclosure, one or more non-volatile storage media storing computer readable instructions are provided; when executed by one or more processors, the computer readable instructions cause the one or more processors to perform the method described in the above embodiments.
The details of one or more embodiments of this application are set forth in the drawings and the description below. Other features, objects, and advantages of this application will become apparent from the specification, the drawings, and the claims.

To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the drawings required for describing the embodiments. Obviously, the drawings in the following description are only some embodiments of this application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of the Pspnet network model provided in the related art;

FIG. 2 is a schematic diagram of an implementation environment provided by an exemplary embodiment of this application;

FIG. 3 is a flowchart of a semantic segmentation method for three-dimensional images provided by an exemplary embodiment of this application;

FIG. 4 is a schematic structural diagram of semantic segmentation of a three-dimensional medical image provided by an exemplary embodiment of this application;

FIG. 5 is a flowchart of a semantic segmentation method for three-dimensional images provided by another exemplary embodiment of this application;

FIG. 6 is a schematic diagram of the shape change of a target object when the size of the two-dimensional slice image is not changed, provided by another exemplary embodiment of this application;

FIG. 7 is a schematic diagram of the unchanged shape of a target object when the size of the two-dimensional slice image is changed, provided by another exemplary embodiment of this application;

FIG. 8 is a schematic structural diagram of the first segmentation model provided by another exemplary embodiment of this application;

FIG. 9 is a schematic structural diagram of the first module in the ResNet101 network model provided by another exemplary embodiment of this application;

FIG. 10 is a schematic structural diagram of the adaptive fusion model provided by another exemplary embodiment of this application;

FIG. 11 is a flowchart of a convolutional network model training method provided by an exemplary embodiment of this application;

FIG. 12 is a flowchart of a semantic segmentation method for two-dimensional images provided by another exemplary embodiment of this application;

FIG. 13 is a schematic diagram of a semantic segmentation apparatus for three-dimensional images provided by an exemplary embodiment of this application;

FIG. 14 is a schematic diagram of a semantic segmentation apparatus for three-dimensional images provided by another exemplary embodiment of this application;

FIG. 15 is a schematic diagram of the fusion module apparatus provided by another exemplary embodiment of this application;

FIG. 16 is a schematic diagram of a semantic segmentation apparatus for two-dimensional images provided by an exemplary embodiment of this application;

FIG. 17 is a schematic structural diagram of a computer device provided by an embodiment of this application;

FIG. 18 is an internal structure diagram of a terminal provided by an exemplary embodiment of this application.
To make the objectives, technical solutions, and advantages of this application clearer, the following further describes this application in detail with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.

First, several terms involved in the embodiments of this application are introduced:

Semantic segmentation: dividing an image into several mutually non-overlapping regions according to features such as grayscale, color, texture, and shape, such that these features are similar within the same region and clearly different between different regions.

Three-dimensional image: an image with one more spatial dimension (for example, a depth dimension) or time dimension than a two-dimensional image. For example, a three-dimensional medical image can be regarded as a three-dimensional image with an added depth dimension, and a video can be regarded as a three-dimensional image with an added time dimension.

Target object: the object belonging to the foreground region in semantic segmentation. For a three-dimensional medical image, the target object may be a target organ, where a target organ is an internal organ or tissue of a human body and/or an animal, such as the heart, lung, liver, spleen, or stomach; for a two-dimensional medical image, the target object may likewise be a target organ. The embodiments of this application are mainly illustrated with the target object being a human organ in a three-dimensional medical image.
In medical images, changes in the shape or volume of human organs/tissues provide important clues for clinical diagnosis. To avoid the misjudgments that may arise from manual analysis, the related art uses a convolutional network model to semantically segment a medical image: the medical image is input into the convolutional network model, the constructed model extracts the features of the corresponding human organs/tissues in the medical image and classifies them, thereby obtaining the specific region of the human organs/tissues in the medical image. The semantically segmented medical image separates the organ/tissue region from the background region, after which a doctor makes the clinical diagnosis. The "medical image" here may include an X-Ray image obtained by irradiating the human body with X-rays, a CT image obtained by computerized tomography (CT), an MRI image obtained by magnetic resonance imaging (MRI), and so on. A medical image collected by a medical image acquisition device may be a 2D medical image or a 3D medical image.

In an illustrative related art, Pspnet is used to semantically segment 2D medical images. Pspnet convolves the input medical image with convolution kernels of several different sizes to extract features of the medical image and form feature maps of several different sizes, and finally upsamples the output feature maps by interpolation to obtain the semantic segmentation result.

Illustratively, as shown in FIG. 1, the medical image 101 is input into the Pspnet network model, and its features are extracted to obtain a first feature map 102 of the same size as the medical image 101. Next, the Pspnet network model convolves the simplified first feature map 102 with convolution kernels of four different scales, obtaining four sub-feature maps corresponding to the kernel sizes, the four sub-feature maps differing in size. Then, the four sub-feature maps of different sizes are enlarged by interpolation (upsampling) to the size of the medical image 101, and the four enlarged sub-feature maps are concatenated with the first feature map 102 to obtain a second feature map 103. Finally, the second feature map 103 is semantically segmented by convolution to obtain the final probability map 104.

However, Pspnet only supports semantic segmentation of 2D medical images, not of 3D medical images. When the medical image is a 3D medical image with high resolution and detection accuracy, such as a CT or MRI image, forcibly applying Pspnet to it easily produces "faulting" (slice-to-slice discontinuities), and the edge fitting after segmentation does not meet requirements. Pspnet likewise does not support processing of 3D medical images.
The embodiments of this application provide a semantic segmentation method, apparatus, terminal, and storage medium for three-dimensional images, which can be used to solve the problems in the above related art. The method enables semantic segmentation of three-dimensional images. Typically, the three-dimensional image is a three-dimensional medical image or a video. The embodiments of this application are illustrated with the three-dimensional image being a three-dimensional medical image.

FIG. 2 shows a schematic diagram of an implementation environment provided by an exemplary embodiment of this application, which includes a medical image acquisition device 100 and a computer device 200.

The medical image acquisition device 100 is used to collect medical images of human organs/tissues, including two-dimensional medical images and three-dimensional medical images, and to send the collected medical images to the computer device 200. The computer device 200 is used to receive the medical images and perform semantic segmentation on them.

In some embodiments, the medical image acquisition device 100 may be a device independent of the computer device 200, or may be combined with the computer device 200 into a single device.

The computer device 200 includes a central processing unit (CPU) 210 and a memory 220.

The CPU 210 is used to invoke the neural network models that implement semantic segmentation. The memory 220 is used to store these models, which include a first segmentation model 221, a second segmentation model 222, a third segmentation model 223, and an adaptive fusion model 224. In some embodiments, the first segmentation model 221, the second segmentation model 222, and the third segmentation model 223 are two-dimensional models that perform semantic segmentation based on convolutional neural networks. The adaptive fusion model 224 is a three-dimensional model that adaptively fuses the semantic segmentation results of the three two-dimensional semantic segmentation models to obtain a three-dimensional semantic segmentation result.

The first segmentation model 221 is used to perform two-dimensional semantic segmentation on the x-axis two-dimensional slice image to obtain a distribution probability map of the target organ on the x-axis directional plane.

The second segmentation model 222 is used to perform two-dimensional semantic segmentation on the y-axis two-dimensional slice image to obtain a distribution probability map of the target organ on the y-axis directional plane.

The third segmentation model 223 is used to perform two-dimensional semantic segmentation on the z-axis two-dimensional slice image to obtain a distribution probability map of the target organ on the z-axis directional plane.

The adaptive fusion model 224 is used to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain the three-dimensional distribution binary map of the target object.

In some embodiments of this application, the three-dimensional image is first sliced along the three directional planes of the three-dimensional coordinate axes; then the two-dimensional slice images of the three planes are semantically segmented by three segmentation models to obtain distribution probability maps for the three planes; finally, the adaptive fusion model fuses the three distribution probability maps in three dimensions to obtain the final three-dimensional distribution binary map corresponding to the target object.
FIG. 3 shows a flowchart of a semantic segmentation method for three-dimensional images provided by an exemplary embodiment of this application. The method can be applied to the implementation environment shown in FIG. 2 and includes:

Step 301: the terminal acquires a three-dimensional image.

In some embodiments, the terminal collects the three-dimensional image through an image acquisition device.

Step 302: the terminal slices the three-dimensional image along the three directional planes of the three-dimensional coordinate axes to obtain an x-axis two-dimensional slice image, a y-axis two-dimensional slice image, and a z-axis two-dimensional slice image.

After acquiring the three-dimensional image, the terminal slices it along the three directional planes of the three-dimensional coordinate axes, thereby obtaining the x-axis, y-axis, and z-axis two-dimensional slice images.

Here, the x-axis directional plane is the plane containing the x and z axes, the y-axis directional plane is the plane containing the y and z axes, and the z-axis directional plane is the plane containing the x and y axes.
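As an illustration, a minimal NumPy sketch of this slicing follows; the array axis order (X, Y, Z) and its mapping to the directional planes defined above are assumptions:

```python
import numpy as np

def slice_three_planes(volume):
    """Takes the three orthogonal stacks of 2D slices from a 3D volume
    of shape (X, Y, Z)."""
    slices_x = [volume[i, :, :] for i in range(volume.shape[0])]
    slices_y = [volume[:, j, :] for j in range(volume.shape[1])]
    slices_z = [volume[:, :, k] for k in range(volume.shape[2])]
    return slices_x, slices_y, slices_z
```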
Step 303: the terminal invokes the first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain a distribution probability map of the target object on the x-axis directional plane.

The CPU invokes the first segmentation model stored in the memory to semantically segment the x-axis two-dimensional slice image. Based on features such as the grayscale, color, texture, and shape of the target object in the x-axis two-dimensional slice image, the first segmentation model completes the semantic segmentation of the x-axis slice and outputs the distribution probability map of the target object on the x-axis directional plane.

Step 304: the terminal invokes the second segmentation model to semantically segment the y-axis two-dimensional slice image to obtain a distribution probability map of the target object on the y-axis directional plane.

The CPU invokes the second segmentation model stored in the memory to semantically segment the y-axis two-dimensional slice image. Based on features such as the grayscale, color, texture, and shape of the target object in the y-axis two-dimensional slice image, the second segmentation model completes the semantic segmentation of the y-axis slice and outputs the distribution probability map of the target object on the y-axis directional plane.

Step 305: the terminal invokes the third segmentation model to semantically segment the z-axis two-dimensional slice image to obtain a distribution probability map of the target object on the z-axis directional plane.

The CPU invokes the third segmentation model stored in the memory to semantically segment the z-axis two-dimensional slice image. Based on features such as the grayscale, color, texture, and shape of the target object in the z-axis two-dimensional slice image, the third segmentation model completes the semantic segmentation of the z-axis slice and outputs the distribution probability map of the target object on the z-axis directional plane.

Step 306: the terminal invokes the adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain the three-dimensional distribution binary map of the target object.

The CPU invokes the adaptive fusion model stored in the memory to adaptively fuse the three distribution probability maps obtained for the x, y, and z axes. Because the adaptive fusion model fuses two-dimensional distribution probability maps from three different dimensions, it can suppress many background noise points and segment the edges of the target object smoothly and accurately, finally obtaining the three-dimensional distribution binary map of the target object.

Taking a three-dimensional medical image as an example, referring to FIG. 4, the computing device segments the input three-dimensional medical image 401 on the x-axis directional plane to obtain the x-axis two-dimensional slice images 402, on the y-axis directional plane to obtain the y-axis two-dimensional slice images 403, and on the z-axis directional plane to obtain the z-axis two-dimensional slice images 404. It then performs two-dimensional semantic segmentation on the three sets of slices to obtain the two-dimensional distribution probability maps 405-407 of the target object on the three directional planes, after which the adaptive fusion model fuses the three maps 405-407 in three dimensions to obtain the three-dimensional distribution binary map 408 (3D Mask) of the target object.

In summary, in the method provided by this embodiment, the obtained three-dimensional image is sliced along the three directional planes of the three-dimensional coordinate axes to obtain the two-dimensional slice images corresponding to the three planes, and the three segmentation models corresponding to the three planes produce the corresponding two-dimensional distribution probability maps, realizing two-dimensional semantic segmentation of the three-dimensional medical image by the terminal. The adaptive fusion model then fuses the three distribution probability maps in three dimensions to obtain the three-dimensional distribution binary map of the target object. This solves the problem in the related art that the Pspnet network model is only suitable for semantic segmentation of 2D natural images and does not support semantic segmentation of 3D medical images, achieving semantic segmentation of 3D medical images with three 2D segmentation models and one adaptive fusion model; and because the adaptive fusion model fuses two-dimensional distribution probability maps of three different dimensions, background noise is effectively suppressed during 3D fusion, and the edges of the target object are segmented smoothly and accurately.
FIG. 5 shows a flowchart of a semantic segmentation method for three-dimensional images provided by another exemplary embodiment of this application. The method can be applied to the implementation environment shown in FIG. 2; in this embodiment, the three-dimensional image is a three-dimensional medical image and the target object is a target organ, by way of illustration. The method includes:

Step 501: the terminal acquires a three-dimensional medical image.

The computer device collects a three-dimensional medical image through the medical image acquisition device; the image includes a three-dimensional target organ and a background region other than the target organ.

Step 502: the terminal slices the three-dimensional medical image along the three directional planes of the three-dimensional coordinate axes to obtain an x-axis two-dimensional slice image, a y-axis two-dimensional slice image, and a z-axis two-dimensional slice image.

After acquiring the three-dimensional medical image, the computer device therefore slices it along the three directional planes of the three-dimensional coordinate axes, obtaining the x-axis, y-axis, and z-axis two-dimensional slice images.

In some embodiments, since the distribution position of each kind of target organ in a three-dimensional medical image is relatively fixed, the computer device also reads pre-stored first clinical prior knowledge, which indicates the target value range of the candidate positions at which the target organ may appear in each two-dimensional slice image. For example, the candidate abscissa range of target organ A in the x-axis two-dimensional slice image is [a1, a2], and the candidate ordinate range in the y-axis two-dimensional slice image is [b1, b2]. The target value range is used for first noise filtering in post-processing.

In some embodiments, since the external shape of each kind of target organ is roughly ellipsoidal, the computer device also reads pre-stored second clinical prior knowledge, which indicates a 3D ellipsoid model of the target organ. Illustratively, the computer device uses the second clinical prior knowledge to obtain statistics on the possible longest and shortest axes of the target organ on the x, y, and z directional planes, thereby constructing the three-dimensional ellipsoid model of the target organ in advance. The model indicates the candidate positions of the target organ in the three-dimensional medical image and is used for second noise filtering in post-processing.

Step 503: when the aspect ratio of the two-dimensional slice image exceeds the preset ratio range, the terminal divides the two-dimensional slice image into scan frames according to the square frame formed by its short side length, obtaining several two-dimensional slice images to be processed.

Since the input size of the segmentation models corresponding to the three coordinate axes is usually square, and in some cases the two-dimensional slice images are too narrow and long, directly converting such a slice to a square size would severely deform the target organ and cause the semantic segmentation to fail. The computer device may therefore also preprocess the two-dimensional slice images as follows.

In some embodiments, when the aspect ratio of an obtained two-dimensional slice image is within the preset ratio range, the computer device converts its size to the input size required by the segmentation model. The preset ratio range may be [1/3, 3].

In some embodiments, as shown in FIG. 6, when the aspect ratio of an obtained two-dimensional slice image exceeds the preset ratio range, i.e., falls outside [1/3, 3], the image is considered too narrow and long. If the computer device directly converted the two-dimensional slice image 601 from its original size to the input size 602 (the pixel size required by the segmentation model), the target organ in the slice 601 would be squeezed into a bar shape, making the final prediction inaccurate.

In this case, as shown in FIG. 7, when training the segmentation model, the computer device cuts the two-dimensional slice image 701 obtained from the sample image according to the square frame formed by its short side length, obtaining the intermediate two-dimensional slice image 702 to be processed, and converts it to the input size 703 of the segmentation model for training.

During testing or prediction, the computer device divides the two-dimensional slice image 704 obtained from the three-dimensional medical image into scan frames according to the square frame formed by its short side length, obtaining several two-dimensional slice images 705 to be processed (three in FIG. 7). The computer device then converts the sizes of these images to the input size 703 of the segmentation model and inputs them to the model for prediction respectively.
Step 504: the terminal invokes the first segmentation model to semantically segment the x-axis two-dimensional slice image to obtain a distribution probability map of the target organ on the x-axis directional plane.

The computer device invokes the first segmentation model stored in the memory to semantically segment the x-axis two-dimensional slice image. Based on features such as the distribution position, size, and shape of the target organ in the three-dimensional medical image, the first segmentation model completes the semantic segmentation of the x-axis slice and outputs the distribution probability map of the target organ on the x-axis directional plane.

In some embodiments, the first segmentation model includes a deep network coding unit and a skip transfer decoding unit; the deep network coding unit includes n convolutional layers, the skip transfer decoding unit includes m deconvolution layers, and n and m are both positive integers.

The deep network coding unit is used to perform downsampling feature extraction on the two-dimensional slice image through the n convolutional layers to obtain a downsampled first intermediate feature map. The skip transfer decoding unit is used to perform upsampling on the first intermediate feature map and a second intermediate feature map through the m deconvolution layers to obtain the upsampled distribution probability map, where the second intermediate feature map includes a feature map output by the i-th of the n convolutional layers, i being an integer less than or equal to n.

In some embodiments, the deep network coding unit is a neural network model constructed based on a residual network model, or a neural network model constructed based on another classification model, which is not limited in this embodiment.

Illustratively, as shown in FIG. 8, the computer device inputs the obtained x-axis two-dimensional slice image 801 into the deep network coding unit 802 constructed based on the ResNet101 model. The unit 802 includes five convolutional layers: Conv1, Conv2_x, Conv3_x, Conv4_x, and Conv5_x. The kernel size and number of kernels of each convolutional layer, and the stride of each convolution, are shown in Table 1, where x denotes the index of a sub-convolutional layer belonging to that convolutional layer.
Table 1

Layer   | Convolution kernels (size, number)                                          | Stride
Conv1   | 7x7, 64                                                                     | 2
Conv2_x | one 3x3 sub-convolutional layer (with max pooling), then 3 x [1x1, 64; 3x3, 64; 1x1, 256] | 2
Conv3_x | 4 x [1x1, 128; 3x3, 128; 1x1, 512]                                          | 2
Conv4_x | 23 x [1x1, 256; 3x3, 256; 1x1, 1024], dilation 2                            | 1
Conv5_x | 3 x [1x1, 512; 3x3, 512; 1x1, 2048]                                         | 1
As shown in Table 1, the Conv1 layer of the deep network coding unit 802 includes 64 7x7 convolution kernels with a stride of 2 per convolution. Conv2_x includes one cascaded sub-convolutional layer and three first modules; the first sub-convolutional layer includes 3x3 convolution kernels with a stride of 2, followed by one max pooling operation. The three first modules (blocks) after the first sub-convolutional layer are identical. As shown in FIG. 9, the first module includes three sub-convolutional layers: the first sub-convolutional layer 901 includes 64 1x1 kernels, the second sub-convolutional layer 902 includes 64 3x3 kernels, and the third sub-convolutional layer 903 includes 256 1x1 kernels; each sub-convolutional layer is followed by an activation (relu) layer and a batch normalization (BN) layer (not shown in the figure). In addition, the first module maps the pixels of the feature map output by the first sub-convolutional layer of the previous layer to the feature map output by the third sub-convolutional layer 903 through a skip connection, and activates the result through a relu layer, obtaining the input feature map of the next module. The relu layer converts the linear data obtained after convolution into non-linear data, improving the expressive power of the ResNet101 model. The BN layer speeds up the convergence of the ResNet101 model and alleviates the gradient vanishing problem in its deep layers, making the model more stable and easier to train.
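As an illustration, a minimal PyTorch-style sketch of such a first module (bottleneck block) follows; the 1x1 projection on the skip path and the exact placement of BN before the final relu are assumptions consistent with, but not verbatim from, the description above:

```python
import torch
import torch.nn as nn

class FirstModule(nn.Module):
    """1x1 (64), 3x3 (64), 1x1 (256) sub-convolutional layers with BN and
    relu, plus a skip connection added to the block output and activated
    by a final relu."""

    def __init__(self, in_channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 256, kernel_size=1),
            nn.BatchNorm2d(256),
        )
        self.skip = nn.Conv2d(in_channels, 256, kernel_size=1)  # channel matching (assumed)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))
```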
Conv3_x includes four cascaded second modules, which are identical. The second module has the same structure as the first module and can be understood by reference to it. A second module includes three sub-convolutional layers: the fourth sub-convolutional layer includes 128 1x1 kernels with a stride of 2, the fifth includes 128 3x3 kernels, and the sixth includes 512 1x1 kernels; each sub-convolutional layer is followed by a relu layer and a BN layer. In addition, the second module maps the pixels of the feature map output by the previous module, through a skip connection, into the feature map output by the sixth sub-convolutional layer, and activates the result through a relu layer to obtain the input feature map of the next module.

Conv4_x includes 23 cascaded third modules, which are identical. The third module has the same structure as the first module and can be understood by reference to it. A third module includes three sub-convolutional layers: the seventh sub-convolutional layer includes 256 1x1 kernels with a stride of 1 and, to keep the area (also called the receptive field) of the feature maps output after each seventh sub-convolutional layer from shrinking, a dilation of 2; the eighth includes 256 3x3 kernels; and the ninth includes 1024 1x1 kernels; each sub-convolutional layer is followed by a relu layer and a BN layer. In addition, the third module maps the pixels of the feature map output by the previous module, through a skip connection, into the feature map output by the ninth sub-convolutional layer, and activates the result through a relu layer to obtain the input feature map of the next module.
Here, dilated convolution, also called atrous convolution, is a convolution scheme that injects holes between kernel elements. Compared with ordinary convolution, dilated convolution introduces a hyperparameter called the dilation rate, which defines the spacing between kernel values when processing data. Dilated convolution keeps the spatial scale of the image features unchanged, avoiding the information loss caused by reducing the pixels of the feature maps, while also enlarging the receptive field for more precise target detection. The receptive field is the region of the original image onto which a pixel of a feature map output by a hidden layer of the neural network maps; the larger a pixel's receptive field on the original image, the larger the range of the original image it maps to, and the more global, semantically higher-level the features it may carry.
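A short PyTorch illustration of the point above: with dilation 2 and matching padding, a 3x3 convolution keeps the feature-map size while covering a 5x5 receptive field. The channel count is arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 32, 32)
plain   = nn.Conv2d(256, 256, kernel_size=3, padding=1)              # 3x3 receptive field
dilated = nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2)  # 5x5 receptive field
# Both preserve the 32x32 spatial size; only the receptive field differs.
print(plain(x).shape, dilated(x).shape)  # torch.Size([1, 256, 32, 32]) twice
```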
Conv5_x includes three cascaded fourth modules, which are identical. The fourth module has the same structure as the first module and can be understood by reference to it. A fourth module includes three sub-convolutional layers: the tenth sub-convolutional layer includes 512 1x1 kernels, the eleventh includes 512 3x3 kernels, and the twelfth includes 2048 1x1 kernels; each sub-convolutional layer is followed by a relu layer and a BN layer. In addition, the fourth module maps the pixels of the feature map output by the previous module, through a skip connection, into the feature map output by the twelfth sub-convolutional layer, and activates the result through a relu layer to obtain the input feature map of the next module.

After the x-axis two-dimensional slice image 801 passes through the five convolutional layers of the deep network encoding part 802 for feature extraction, a first intermediate feature map (1) corresponding to the x-axis directional plane is obtained. Illustratively, this first intermediate feature map (1) is an 8x-downsampled feature map. In some embodiments, pooling is used for downsampling after Conv5_3; since slicing a 3D volume easily produces large differences in scale distribution, multi-scale information needs to be incorporated, and the downsampling kernel sizes are set to five values: 1/9/19/37/74.
Illustratively, the computer device then feeds the first intermediate feature map (1) into the skip-transfer decoding part 803, which includes two deconvolutional layers. The computer device decodes the first intermediate feature map (1) step by step through the deconvolutional layers, decoding twice with a factor of 2 each time. Decoding the first intermediate feature map (1) means skip-connecting it with feature maps output by predetermined layers of the deep network encoding part 802 and performing upsampling. The first deconvolutional layer skip-connects the first intermediate feature map (1) with the second intermediate feature map (2) output by the Conv3_x convolutional layer of the encoding part 802 and performs 2x upsampling, obtaining the 2x-upsampled first intermediate feature map (1'); the upsampled feature map (1') is then skip-connected with the second intermediate feature map (2') output by the Conv1 convolutional layer of the encoding part 802 and upsampled 2x again, obtaining the 4x-upsampled feature map, from which the final distribution probability map is obtained. In some embodiments, the skip-connected first and second intermediate feature maps have the same size.
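The decoding scheme can be sketched as follows in PyTorch, assuming an encoder that exposes the 8x-downsampled map plus the Conv3_x- and Conv1-level skip features; the channel counts, the element-wise addition as the skip connection, and the softmax head are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipTransferDecoder(nn.Module):
    """Sketch of the skip-transfer decoding part: two deconvolutional
    layers, each 2x, each combined with an encoder skip feature."""
    def __init__(self, deep_ch=2048, skip4x_ch=512, skip2x_ch=64, num_classes=2):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(deep_ch, skip4x_ch, kernel_size=2, stride=2)
        self.up2 = nn.ConvTranspose2d(skip4x_ch, skip2x_ch, kernel_size=2, stride=2)
        self.head = nn.Conv2d(skip2x_ch, num_classes, kernel_size=1)

    def forward(self, deep, skip4x, skip2x):
        x = self.up1(deep) + skip4x        # skip connection after the first 2x upsampling
        x = self.up2(F.relu(x)) + skip2x   # skip connection after the second 2x upsampling
        return torch.softmax(self.head(F.relu(x)), dim=1)  # distribution probability map

dec = SkipTransferDecoder()
deep = torch.randn(1, 2048, 16, 16)   # 8x-downsampled encoder output
s4   = torch.randn(1, 512, 32, 32)    # Conv3_x-level skip feature
s2   = torch.randn(1, 64, 64, 64)     # Conv1-level skip feature
print(dec(deep, s4, s2).shape)        # torch.Size([1, 2, 64, 64])
```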
Through the first segmentation model, the computer device obtains the distribution probability map 804 of the target organ on the x-axis directional plane. The map 804 indicates, for each pixel of the two-dimensional slice image, the probability of belonging to the foreground region and/or the probability of belonging to the background region. The foreground region is where the target organ is located; the background region is where it is not.
Step 505: The terminal invokes the second segmentation model to perform semantic segmentation on the y-axis two-dimensional slice images, obtaining the distribution probability map of the target organ on the y-axis directional plane.

In some embodiments, the second segmentation model has the same structure as the first; they differ only in the sample images used during training. The process of semantically segmenting the y-axis slice images with the second segmentation model can therefore refer to the description of step 504 and is not repeated here.
Step 506: The terminal invokes the third segmentation model to perform semantic segmentation on the z-axis two-dimensional slice images, obtaining the distribution probability map of the target organ on the z-axis directional plane.

In some embodiments, the third segmentation model has the same structure as the first; they differ only in the sample images used during training. The process of semantically segmenting the z-axis slice images with the third segmentation model can therefore refer to the description of step 504 and is not repeated here.
Step 507: The terminal invokes the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes, obtaining a three-dimensional distribution feature map.

The computer device invokes the adaptive fusion model stored in the memory to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes, obtaining the three-dimensional distribution feature map.

As shown in FIG. 10, the computer device fuses in three dimensions the target organ's distribution probability map 1001 on the x-axis directional plane, 1002 on the y-axis directional plane, and 1003 on the z-axis directional plane, obtaining the three-dimensional distribution feature map 1004. The three distribution probability maps 1001-1003 have the same size as the three-dimensional medical image and carry the probabilities of their respective directional planes. The three-dimensional distribution feature map 1004 contains, for each of the three directional planes, the probabilities corresponding to the target organ, and its size equals that of the three-dimensional medical image.
Step 508: The terminal performs three-dimensional fusion convolution on the three-dimensional distribution feature map, obtaining a three-dimensional segmentation probability map.

The computer device invokes the adaptive fusion model stored in the memory to perform three-dimensional fusion convolution on the obtained three-dimensional distribution feature map 1004 through three convolutional layers, obtaining the three-dimensional segmentation probability map 1005, which indicates, for each pixel of the three-dimensional medical image, the probability of belonging to the foreground region and/or the probability of belonging to the background region. The foreground region is where the target organ is located; the background region is where it is not. In FIG. 10, H*W*D*C denotes the image size and the corresponding probabilities.
In some embodiments, the adaptive fusion model includes three shallow 3D convolutional layers: the first 3D convolutional layer includes 64 3*3*3 3D kernels with a stride of 1, the second includes 64 3*3*3 3D kernels with a stride of 1, and the third includes one 3*3*3 3D kernel with a stride of 1.
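A minimal PyTorch sketch of this three-layer fusion head; the padding of 1 (to preserve the volume size), the ReLU activations, and the sigmoid output are assumptions beyond what the text specifies.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Three shallow 3D convolutional layers as described above:
    64 and 64 kernels of size 3x3x3 followed by a single 3x3x3 kernel,
    all with stride 1."""
    def __init__(self, in_ch=3):  # 3 input channels: x-, y-, z-plane probability maps
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Conv3d(64, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Conv3d(64, 1, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, feat):                  # feat: (N, 3, D, H, W) stacked probability maps
        return torch.sigmoid(self.net(feat))  # 3D segmentation probability map

fusion = AdaptiveFusion()
stacked = torch.rand(1, 3, 32, 64, 64)
print(fusion(stacked).shape)  # torch.Size([1, 1, 32, 64, 64])
```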
In some embodiments, the three-dimensional segmentation probability map 1005 has the same size as the three-dimensional medical image.
Step 509: The terminal computes the target organ's 3D distribution binary map from the maximum-probability class of each pixel in the three-dimensional segmentation probability map.

In some embodiments, the adaptive fusion model determines, from the maximum-probability class of each pixel in the three-dimensional segmentation probability map, the class of each pixel in the image; the classes are foreground pixels belonging to the target organ and background pixels not belonging to it.

In some embodiments, the three-dimensional segmentation probability map 1005 contains, for each pixel, a first probability of being a foreground pixel and a second probability of being a background pixel; the maximum-probability class is the class corresponding to the larger of the first and second probabilities. For example, if a pixel's probability of being a foreground pixel is 80% and of being a background pixel is 20%, its maximum-probability class is foreground. In some embodiments, the 3D distribution binary map uses 1 to represent foreground pixels and 0 to represent background pixels.
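In NumPy, this binarization by maximum-probability class reduces to a comparison, equivalent to an argmax over the two classes; the variable names are illustrative.

```python
import numpy as np

fg_prob = np.random.rand(32, 64, 64)  # per-voxel probability of being foreground (target organ)
bg_prob = 1.0 - fg_prob               # per-voxel probability of being background
binary_mask = (fg_prob > bg_prob).astype(np.uint8)  # 1 = foreground, 0 = background
```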
Step 510: The terminal filters noise pixels out of the 3D distribution binary map based on clinical prior knowledge.

Because each kind of target organ occupies a relatively fixed position in three-dimensional medical images, the computer device can also use clinical prior knowledge to filter noise pixels out of the 3D distribution binary map.

First, the computer device filters out of the 3D distribution binary map the first noise pixels that fall outside the target value range.

Here, the target value range is the coordinate value range where the target organ may appear, obtained from the first clinical prior knowledge. In some embodiments, the target value range is a three-dimensional box region. The first clinical prior knowledge may be built from multiple sample images.
Second, the computer device filters out of the 3D distribution binary map the second noise pixels that fall outside the three-dimensional ellipsoid model.

Here, the three-dimensional ellipsoid model is the ellipsoid model of the target organ obtained from the second clinical prior knowledge, which may be built from multiple sample images. Because the shapes of most organs tend toward ellipsoids, the terminal can collect statistics in advance of the target organ's longest and shortest axes on the two-dimensional slice images of the x-axis, y-axis, and z-axis directional planes, thereby building the target organ's three-dimensional ellipsoid model. Based on the built ellipsoid model, the noise pixels outside the ellipsoid are filtered out of the candidate pixels.
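Both filters can be sketched together in NumPy; the voxel-grid convention, the box and ellipsoid parameters, and the function name are illustrative assumptions, not values from the patent.

```python
import numpy as np

def filter_with_priors(mask, box, center, radii):
    """Clear voxels outside the prior bounding box or outside the prior
    ellipsoid. `box` is ((z0, z1), (y0, y1), (x0, x1)); `center` and
    `radii` define the ellipsoid on the same voxel grid."""
    z, y, x = np.indices(mask.shape)
    (z0, z1), (y0, y1), (x0, x1) = box
    in_box = (z >= z0) & (z < z1) & (y >= y0) & (y < y1) & (x >= x0) & (x < x1)
    # Ellipsoid test: normalized squared distance from the center <= 1.
    d = ((z - center[0]) / radii[0]) ** 2 \
      + ((y - center[1]) / radii[1]) ** 2 \
      + ((x - center[2]) / radii[2]) ** 2
    return mask * (in_box & (d <= 1.0))

mask = np.ones((32, 64, 64), dtype=np.uint8)
clean = filter_with_priors(mask, ((4, 28), (8, 56), (8, 56)), (16, 32, 32), (12, 24, 24))
```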
In some embodiments, the computer device may filter noise pixels using at least one of the two filtering schemes above.
In summary, in the method provided by this embodiment, the obtained three-dimensional image is sliced along the three directional planes of the three-dimensional coordinate axes to obtain two-dimensional slice images for the three planes; three segmentation models, one per plane, then produce the corresponding two-dimensional distribution probability maps, enabling the terminal to perform two-dimensional semantic segmentation of the three-dimensional medical image. The adaptive fusion model then fuses the three distribution probability maps in three dimensions to obtain the target object's 3D distribution binary map. This resolves the problem in the related art that the Pspnet network model only supports semantic segmentation of 2D natural images and cannot semantically segment 3D medical images, achieving semantic segmentation of 3D medical images with three 2D segmentation models and one adaptive fusion model. Moreover, because the adaptive fusion model fuses two-dimensional distribution probability maps from three different dimensions, background noise is effectively suppressed during the three-dimensional fusion, yielding smooth and accurate segmentation of the target object's edges.
In the method provided by this embodiment, by filtering noise pixels with clinical prior knowledge, the terminal obtains the pixels belonging to the target organ, giving strong denoising ability and good edge segmentation.

In the method provided by this embodiment, converting the two-dimensional slice images from their original size to the input size avoids the errors that using the original size might produce, so that the target organ can be segmented accurately during semantic segmentation of the three-dimensional medical image. In practical applications, this enables automated, shape-related lesion assessment for multiple organs/tissues, serving the purpose of auxiliary diagnosis.
In some embodiments, the first, second, and third segmentation models and the adaptive fusion model are all convolutional network models. Before invoking these convolutional network models, the computer device needs to train them. As shown in FIG. 11, the training method for the three two-dimensional segmentation models includes, but is not limited to, the following steps:
Step 1101: The terminal acquires at least one group of sample images.

The computer device collects at least one group of sample images through a medical imaging acquisition device; the number of images per group is not limited and may be set according to the trainer's needs. The sample images may include images with a sample organ and images without one. For a sample image containing a sample organ, the pixels belonging to the sample organ are annotated.

For the first segmentation model, the sample images may be two-dimensional slice images on the x-axis directional plane, annotated with the pixels belonging to the sample organ.

For the second segmentation model, the sample images may be two-dimensional slice images on the y-axis directional plane, annotated with the pixels belonging to the sample organ.

For the third segmentation model, the sample images may be two-dimensional slice images on the z-axis directional plane, annotated with the pixels belonging to the sample organ.
Step 1102: The terminal acquires the annotation results of the sample organs in the sample images, obtaining sample image data groups composed of the sample images and their corresponding sample organs; the annotation results include the distribution positions of the sample organs in the sample images.

After acquiring the sample images, the trainer or the computer device sets annotation results for them. An annotation result includes the pixels belonging to the sample organ and indicates at least one of the sample organ's distribution position in the sample image, its size, and its corresponding ellipsoid shape.

For example, in an image containing a sample organ, the region where the sample organ is located and the background region other than the sample organ are annotated; in an image without a sample organ, the absence of a sample organ region is annotated.

The sample image data group is used for comparison with the training result corresponding to the sample images.
Step 1103: The terminal inputs the sample images into the original segmentation model, obtaining a training result.

The computer device inputs the same group of annotated sample images into the original segmentation model, which identifies the sample images and the sample organs in them and outputs the identification result as the training result.

In some embodiments, the original segmentation model is built on the ResNet model, as shown in FIG. 8. The initial weights of the segmentation model may be set by the trainer from experience, or set randomly by the computer device. In a possible embodiment, the weights of the deep network encoding part are initialized with ResNet parameters trained on the ImageNet dataset, while the weights of the skip-transfer decoding part are initialized with values drawn from a Gaussian distribution with mean 0 and variance 2 divided by the number of inputs.
Step 1104: For each sample image data group, the terminal compares the training result with the sample organ's annotation result, obtaining a computed loss that indicates the error between the training result and the sample organ's annotation result.
The computer device compares the obtained training result with the sample image data group of the same group of sample images and calculates the error between the training result and the annotation result. In some embodiments, this error is a weighted loss function; the computed loss indicates the error between the training result and the sample organ's annotation result. The weighted loss function adopts a cross-entropy loss (cross entropy loss) function, whose weighted form is

$$L = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{n_i}\sum_{j=1}^{n_i}\Big[\,w_{fg}\,y_j\log p_j + w_{bg}\,(1-y_j)\log(1-p_j)\,\Big]$$

where $p$ denotes the probability that a pixel belongs to the target pixels of the target organ; $y$ denotes the class, i.e., $y$ is 0 or 1; $w_{fg}$ denotes the weight of the foreground class and $w_{bg}$ the weight of the background class; $t_i$ denotes the number of foreground pixels in the i-th sample image; $n_i$ denotes the number of pixels of the whole i-th sample image; and $N$ is the number of sample images in one batch (Batchsize). The weighting values come from the statistics of the foreground-to-background proportion $t_i/n_i$ in the sample images.
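For illustration, a minimal sketch of such a weighted cross-entropy in PyTorch; the helper name weighted_bce and the particular weight choice (w_fg = 1 - t/n, w_bg = t/n, so the rarer foreground class is weighted more heavily) are assumptions about how the foreground/background statistics enter the weights, not values prescribed by the patent.

```python
import torch

def weighted_bce(p, y, w_fg, w_bg, eps=1e-7):
    """Weighted binary cross-entropy: p is the predicted foreground
    probability per pixel, y the 0/1 label, w_fg/w_bg the class weights."""
    p = p.clamp(eps, 1.0 - eps)  # avoid log(0)
    loss = -(w_fg * y * torch.log(p) + w_bg * (1 - y) * torch.log(1 - p))
    return loss.mean()

p = torch.rand(4, 1, 128, 128)                      # predicted probabilities
y = (torch.rand(4, 1, 128, 128) > 0.9).float()      # sparse foreground labels
t, n = y.sum(), y.numel()                           # foreground count t, total pixels n
print(weighted_bce(p, y, w_fg=1 - t / n, w_bg=t / n))
```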
Step 1105: The terminal trains the segmentation model with the error backpropagation algorithm according to the computed losses corresponding to the at least one sample image data group.

According to the computed losses corresponding to the at least one sample image data group, the terminal uses the error backpropagation algorithm to reset the weights, until the weighted loss obtained with the reset weights satisfies a preset threshold, or the number of training iterations reaches a preset count. For example, once the required 20,000 training iterations are reached, the terminal may stop training, at which point the segmentation model for two-dimensional semantic segmentation is trained. In some embodiments, the error backpropagation algorithm may use the gradient descent method of SGD (Stochastic Gradient Descent); the convolution template parameters w and bias parameters b of the segmentation model are solved by SGD, and the training iteration parameters may be chosen by cross-validation.
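A toy sketch of such an SGD training loop under the stopping criteria above; the model, data, and hyperparameters are placeholders standing in for the real segmentation network and samples, not the patent's configuration.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 2, kernel_size=3, padding=1)       # stand-in for the segmentation model
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()
for step in range(100):                                  # real training would run e.g. 20000 steps
    x = torch.randn(4, 1, 64, 64)                        # stand-in batch of slice images
    y = torch.randint(0, 2, (4, 64, 64))                 # stand-in per-pixel labels
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                                      # error backpropagation
    opt.step()                                           # weight update by SGD
```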
After the segmentation models corresponding to the three coordinate axes are trained, for each three-dimensional sample image, two-dimensional distribution probability maps are obtained from its two-dimensional slice images with the trained segmentation models; these two-dimensional distribution probability maps and the annotated three-dimensional binary map form another group of sample images. The adaptive fusion model is trained with this sample image group; its training process is the same as or similar to the method above and is not repeated in this application.

In some embodiments, the weighted loss is computed from the probability of each pixel of the feature map being a target pixel, the target pixel being the pixel corresponding to each feature of the target organ.
It should be noted that the training process of the adaptive fusion model is the same as that of the three segmentation models and can be implemented by referring to the steps shown in FIG. 11. After the adaptive fusion model produces a training result, a Dice loss (Dice Loss) function is used as the loss function; the Dice loss function is used to calculate the error between the adaptive fusion model's training result and its annotation result.
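As a reference, a minimal soft Dice loss sketch consistent with this description; the smoothing constant eps is an assumption commonly added for numerical stability.

```python
import torch

def dice_loss(pred, target, eps=1.0):
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), computed on soft predictions."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = torch.rand(1, 1, 32, 64, 64)                      # fused 3D probability map
target = (torch.rand(1, 1, 32, 64, 64) > 0.5).float()    # annotated 3D binary map
print(dice_loss(pred, target))
```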
The semantic segmentation method for three-dimensional images provided by this application can also be applied as a semantic segmentation method for two-dimensional images.
FIG. 12 shows a flowchart of a semantic segmentation method for two-dimensional images provided by another exemplary embodiment of this application. The method can be applied in the implementation environment shown in FIG. 2. In this embodiment, the two-dimensional image is a two-dimensional medical image and the target object is a target organ, used as an example. The method includes:

Step 1201: The terminal acquires a two-dimensional medical image.

The computer device acquires the two-dimensional medical image through a medical imaging acquisition device. The two-dimensional medical image includes the target organ and a background region other than the target organ.

After acquiring the two-dimensional medical image, the computer device analyzes it. In some embodiments, because each kind of target organ occupies a relatively fixed position in two-dimensional medical images, the computer device also reads pre-stored third clinical prior knowledge, which indicates the target value range of candidate positions where the target organ may appear in each two-dimensional medical image. For example, the candidate horizontal-coordinate range where target organ A may appear along the x-axis of the two-dimensional medical image is [a1, a2], and its candidate vertical-coordinate range along the y-axis is [b1, b2]. This target value range is used for third noise filtering during post-processing.
Step 1202: When the aspect ratio of the two-dimensional medical image exceeds the preset ratio range, the terminal performs scanning-box segmentation on the image using square boxes whose side equals the image's short edge, obtaining several to-be-processed two-dimensional medical images.

Because the input size of the segmentation model is usually square, and in some implementations the two-dimensional medical image is too elongated, directly converting the elongated image to a square size would severely deform the target organ and cause the semantic segmentation to fail. The computer device can therefore preprocess the two-dimensional medical image as follows.

In some embodiments, when the aspect ratio of the obtained two-dimensional medical image is within the preset ratio range, the computer device converts the image's size to the input size of the segmentation model. The preset ratio range may be [1/3, 3].

In some embodiments, as shown in FIG. 6, when the aspect ratio of the obtained two-dimensional medical image exceeds the preset ratio range, that is, exceeds [1/3, 3], the image is deemed too elongated. If the computer device directly converted the image from its original size to the input size (the pixel size required by the segmentation model), the target organ in the image would be squeezed into a strip, making the final prediction inaccurate.

In this case, as shown in FIG. 7, the computer device performs scanning-box segmentation on the two-dimensional medical image using square boxes whose side equals its short edge, obtaining several to-be-processed two-dimensional medical images. The computer device then converts their sizes to the segmentation model's input size and feeds them to the segmentation model separately for prediction.
Step 1203: The terminal invokes the segmentation model to perform semantic segmentation on the two-dimensional medical image, obtaining the distribution probability map of the target organ.

The structure of this segmentation model is the same as that of the first segmentation model, so the model structure shown in FIG. 8 can be referenced. The segmentation model includes a deep network encoding part and a skip-transfer decoding part; the encoding part includes n convolutional layers and the decoding part includes m deconvolutional layers, where n and m are both positive integers.

The deep network encoding part is configured for the terminal to perform downsampling feature extraction on the two-dimensional image through the n convolutional layers, obtaining a downsampled third intermediate feature map.

The skip-transfer decoding part is configured for the terminal to perform upsampling on the third intermediate feature map and a fourth intermediate feature map through the m deconvolutional layers, obtaining the upsampled distribution probability map.

The fourth intermediate feature map includes the feature map output by the i-th of the n convolutional layers, where i is an integer less than or equal to n.

In some embodiments, this segmentation model has the same structure as the first segmentation model; they differ only in the sample images used during training. The process of semantically segmenting the two-dimensional medical image with this segmentation model can therefore refer to the description of step 504 and is not repeated here.
Step 1204: The terminal computes the target organ's two-dimensional distribution binary map from the maximum-probability class of each pixel in the distribution probability map.

In some embodiments, the segmentation model determines, from the maximum-probability class of each pixel in the distribution probability map, the class of each pixel in the image; the classes are foreground pixels belonging to the target organ and background pixels not belonging to it.

In some embodiments, the distribution probability map contains, for each pixel, a third probability of being a foreground pixel and a fourth probability of being a background pixel; the maximum-probability class is the class corresponding to the larger of the third and fourth probabilities. For example, if a pixel's probability of being a foreground pixel is 80% and of being a background pixel is 20%, its maximum-probability class is foreground. In some embodiments, the two-dimensional distribution binary map uses 1 to represent foreground pixels and 0 to represent background pixels.
Step 1205: The terminal filters noise pixels out of the two-dimensional distribution binary map based on clinical prior knowledge.

Because each kind of target organ occupies a relatively fixed position in two-dimensional medical images, the computer device can also use clinical prior knowledge to filter noise pixels out of the two-dimensional distribution binary map.

The computer device filters out of the two-dimensional distribution binary map the third noise pixels that fall outside the target value range.

Here, the target value range is the coordinate value range where the target organ may appear, obtained from the third clinical prior knowledge. In some embodiments, the target value range is a two-dimensional planar box region. The third clinical prior knowledge may be built from multiple sample images.
In summary, in the method provided by this embodiment, the segmentation model performs semantic segmentation on the obtained two-dimensional image to produce the target organ's distribution probability map; the target organ's two-dimensional distribution binary map is obtained by determining the maximum-probability class of each pixel in that map; and noise pixels are filtered out of the resulting binary map based on the third clinical prior knowledge. This achieves semantic segmentation of two-dimensional images, and the noise filtering gives the segmented image clear segmentation boundaries and well-handled edges. It also demonstrates that the semantic segmentation method for three-dimensional images provided by this application applies not only to three-dimensional images but also to two-dimensional images, with rather good segmentation results.
It should be understood that the steps in the embodiments of this application are not necessarily executed sequentially in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages, which are not necessarily executed and completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential either, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, a terminal is also provided. The terminal includes a semantic segmentation apparatus for three-dimensional images and a semantic segmentation apparatus for two-dimensional images, each of which includes various modules; each module may be implemented wholly or partly by software, hardware, or a combination thereof.

The following are apparatus embodiments of this application, which can be used to execute the method embodiments of this application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of this application.
FIG. 13 shows a schematic diagram of a semantic segmentation apparatus for three-dimensional images provided by an exemplary embodiment of this application. The apparatus includes:

a first acquisition module 1310, configured to acquire a three-dimensional image;

a slicing module 1320, configured to slice the three-dimensional image along the three directional planes of the three-dimensional coordinate axes, obtaining x-axis, y-axis, and z-axis two-dimensional slice images;

a first segmentation module 1330, configured to invoke the first segmentation model to perform semantic segmentation on the x-axis two-dimensional slice images, obtaining the distribution probability map of the target object on the x-axis directional plane; invoke the second segmentation model to perform semantic segmentation on the y-axis two-dimensional slice images, obtaining the distribution probability map of the target object on the y-axis directional plane; and invoke the third segmentation model to perform semantic segmentation on the z-axis two-dimensional slice images, obtaining the distribution probability map of the target object on the z-axis directional plane; and

a fusion module 1340, configured to invoke the adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes, obtaining the 3D distribution binary map of the target object.
FIG. 14 shows a schematic diagram of a semantic segmentation apparatus for three-dimensional images provided by another exemplary embodiment of this application. The apparatus includes:

a first acquisition module 1410, configured to acquire a three-dimensional image;

a slicing module 1420, configured to slice the three-dimensional image along the three directional planes of the three-dimensional coordinate axes, obtaining x-axis, y-axis, and z-axis two-dimensional slice images;

a first scanning module 1430, configured to, when the aspect ratio of a two-dimensional slice image exceeds the preset ratio range, perform scanning-box segmentation on the slice image using square boxes whose side equals the slice image's short edge, obtaining several to-be-processed two-dimensional slice images; and

a first segmentation module 1440, configured to invoke the first segmentation model to perform semantic segmentation on the x-axis two-dimensional slice images, obtaining the distribution probability map of the target object on the x-axis directional plane; invoke the second segmentation model to perform semantic segmentation on the y-axis two-dimensional slice images, obtaining the distribution probability map of the target object on the y-axis directional plane; and invoke the third segmentation model to perform semantic segmentation on the z-axis two-dimensional slice images, obtaining the distribution probability map of the target object on the z-axis directional plane.
In some embodiments, at least one of the first, second, and third segmentation models includes a deep network encoding part and a skip-transfer decoding part; the encoding part includes n convolutional layers and the decoding part includes m deconvolutional layers, where n and m are both positive integers.

The deep network encoding part is configured to perform downsampling feature extraction on a two-dimensional slice image through the n convolutional layers, obtaining a downsampled first intermediate feature map.

The skip-transfer decoding part is configured to perform upsampling on the first intermediate feature map and a second intermediate feature map through the m deconvolutional layers, obtaining the upsampled distribution probability map.

The second intermediate feature map includes the feature map output by the i-th of the n convolutional layers, where i is an integer less than or equal to n.

The apparatus further includes a fusion module 1450, configured to invoke the adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes, obtaining the 3D distribution binary map of the target object.
In some embodiments, as shown in FIG. 15, the fusion module 1450 includes:

a combination unit 1451, configured to invoke the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes, obtaining a three-dimensional distribution feature map;

a fusion unit 1452, configured to perform three-dimensional fusion convolution on the three-dimensional distribution feature map, obtaining a three-dimensional segmentation probability map; and

a computing unit 1453, configured to compute the target object's 3D distribution binary map from the maximum-probability class of each pixel in the three-dimensional segmentation probability map.
In some embodiments, the three-dimensional image is a three-dimensional medical image, and the apparatus further includes:

a first filtering module 1460, configured to filter noise pixels out of the 3D distribution binary map based on clinical prior knowledge,

the clinical prior knowledge being knowledge obtained from statistics of the distribution positions of the target object in the three-dimensional medical image.

In some embodiments, the first filtering module 1460 is configured to filter out of the 3D distribution binary map the first noise pixels that fall outside the target value range,

where the target value range is the coordinate value range where the target object may appear, obtained from the first clinical prior knowledge.

In some embodiments, the first filtering module 1460 is configured to filter out of the 3D distribution binary map the second noise pixels that fall outside the three-dimensional ellipsoid model,

where the three-dimensional ellipsoid model is the ellipsoid model corresponding to the target object, obtained from the second clinical prior knowledge.
Relevant details can be found in the method embodiments shown in FIG. 3 and FIG. 5. The first acquisition module 1410 is also used to implement any other implicit or disclosed functions related to the acquisition steps in the above method embodiments; the slicing module 1420, any other implicit or disclosed functions related to the slicing steps; the first scanning module 1430, the scanning steps; the first segmentation module 1440, the segmentation steps; the fusion module 1450, the fusion steps; and the first filtering module 1460, the filtering steps.

It should be noted that the semantic segmentation apparatus for three-dimensional images provided by the above embodiments is illustrated only by the division of the functional modules above; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments above and the method embodiments of the semantic segmentation method for three-dimensional images belong to the same concept; see the method embodiments for the specific implementation process, which is not repeated here.
FIG. 16 shows a schematic diagram of a semantic segmentation apparatus for two-dimensional images provided by an exemplary embodiment of this application. The apparatus includes:

a second acquisition module 1610, configured to acquire a two-dimensional image;

a second scanning module 1620, configured to, when the aspect ratio of the two-dimensional image exceeds the preset ratio range, perform scanning-box segmentation on the image using square boxes whose side equals the image's short edge, obtaining several to-be-processed two-dimensional images;

a second segmentation module 1630, configured to invoke the segmentation model to perform semantic segmentation on the two-dimensional image, obtaining the distribution probability map of the target object;

where the segmentation model includes a deep network encoding part and a skip-transfer decoding part; the encoding part includes n convolutional layers and the decoding part includes m deconvolutional layers, n and m both being positive integers;

the deep network encoding part is configured to perform downsampling feature extraction on the two-dimensional image through the n convolutional layers, obtaining a downsampled third intermediate feature map;

the skip-transfer decoding part is configured to perform upsampling on the third intermediate feature map and a fourth intermediate feature map through the m deconvolutional layers, obtaining the upsampled distribution probability map;

and the fourth intermediate feature map includes the feature map output by the i-th of the n convolutional layers, i being an integer less than or equal to n; and

a computing module 1640, configured to compute the target object's two-dimensional distribution binary map from the maximum-probability class of each pixel in the distribution probability map.
In some embodiments, the two-dimensional image is a two-dimensional medical image, and the apparatus further includes:

a second filtering module 1650, configured to filter noise pixels out of the two-dimensional distribution binary map based on clinical prior knowledge,

the clinical prior knowledge being knowledge obtained from statistics of the distribution positions of the target object in the two-dimensional medical image.

In some embodiments, the second filtering module 1650 is configured to filter out of the two-dimensional distribution binary map the third noise pixels that fall outside the target value range,

where the target value range is the coordinate value range where the target object may appear, obtained from the third clinical prior knowledge.
Relevant details can be found in the method embodiment shown in FIG. 12. The second acquisition module 1610 is also used to implement any other implicit or disclosed functions related to the acquisition steps in the above method embodiments; the second scanning module 1620, any other implicit or disclosed functions related to the scanning steps; the second segmentation module 1630, the segmentation steps; the computing module 1640, the computing steps; and the second filtering module 1650, the filtering steps.

It should be noted that the semantic segmentation apparatus for two-dimensional images provided by the above embodiments is illustrated only by the division of the functional modules above; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments above and the method embodiments of the semantic segmentation method for two-dimensional images belong to the same concept; see the method embodiments for the specific implementation process, which is not repeated here.
FIG. 17 shows a schematic structural diagram of a computer device provided by one embodiment of this application. The computer device is used to implement the semantic segmentation methods for three-dimensional images and for two-dimensional images provided in the above embodiments. Specifically:

The computer device 1700 includes a central processing unit (CPU) 1701, a system memory 1704 including a random access memory (RAM) 1702 and a read-only memory (ROM) 1703, and a system bus 1705 connecting the system memory 1704 and the central processing unit 1701. The computer device 1700 further includes a basic input/output system (I/O system) 1706 that helps transfer information between devices within the computer, and a mass storage device 1707 for storing an operating system 1713, application programs 1714, and other program modules 1715.

The basic input/output system 1706 includes a display 1708 for displaying information and an input device 1709, such as a mouse or keyboard, for user input. Both the display 1708 and the input device 1709 are connected to the central processing unit 1701 through an input/output controller 1710 connected to the system bus 1705. The basic input/output system 1706 may further include the input/output controller 1710 for receiving and processing input from multiple other devices such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 1710 also provides output to a display screen, printer, or other type of output device.
The mass storage device 1707 is connected to the central processing unit 1701 through a mass storage controller (not shown) connected to the system bus 1705. The mass storage device 1707 and its associated computer-readable media provide non-volatile storage for the computer device 1700. That is, the mass storage device 1707 may include computer-readable media (not shown) such as a hard disk or a CD-ROM drive.

Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art know that the computer storage media are not limited to the above. The system memory 1704 and the mass storage device 1707 above may be collectively referred to as the memory.
According to various embodiments of this application, the computer device 1700 may also operate through a remote computer connected to a network such as the Internet. That is, the computer device 1700 may connect to the network 1712 through the network interface unit 1711 connected to the system bus 1705; alternatively, the network interface unit 1711 may be used to connect to other types of networks or remote computer systems (not shown).

The memory further includes one or more programs, which are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the following operations:

acquiring a three-dimensional image; slicing the three-dimensional image along the three directional planes of the three-dimensional coordinate axes to obtain x-axis, y-axis, and z-axis two-dimensional slice images; invoking a first segmentation model to perform semantic segmentation on the x-axis two-dimensional slice images to obtain the distribution probability map of the target object on the x-axis directional plane; invoking a second segmentation model to perform semantic segmentation on the y-axis two-dimensional slice images to obtain the distribution probability map of the target object on the y-axis directional plane; invoking a third segmentation model to perform semantic segmentation on the z-axis two-dimensional slice images to obtain the distribution probability map of the target object on the z-axis directional plane; and invoking an adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain the 3D distribution binary map of the target object.

Assuming the above is a first possible implementation, in a second possible implementation provided on the basis of the first, the memory of the server further contains instructions for performing the following operations:

acquiring a two-dimensional image; invoking a segmentation model to perform semantic segmentation on the two-dimensional image to obtain the distribution probability map of the target object; and computing the target object's two-dimensional distribution binary map from the maximum-probability class of each pixel in the distribution probability map.
FIG. 18 shows an internal structure diagram of the terminal in one embodiment. As shown in FIG. 18, the terminal includes a processor, a memory, a network interface, a display screen, and an input apparatus connected through a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the terminal stores an operating system and may also store computer-readable instructions which, when executed by the processor, cause the processor to implement the semantic segmentation methods for three-dimensional images and for two-dimensional images. The internal memory provides an environment for running the operating system and the computer programs in the non-volatile storage medium, and may also store computer-readable instructions which, when executed by the processor, cause the processor to execute those methods. The network interface of the computer device is used to communicate with external terminals through a network connection. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input apparatus of the computer device may be a touch layer covering the display screen, a key, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art can understand that the structure shown in FIG. 18 is only a block diagram of part of the structure related to the solution of this application and does not limit the terminal to which the solution of this application is applied; a specific terminal may include more or fewer components than shown, combine certain components, or have a different component arrangement.

In one embodiment, the semantic segmentation apparatus for three-dimensional images and the semantic segmentation apparatus for two-dimensional images provided by this application may be implemented in the form of computer-readable instructions that can run on the terminal shown in FIG. 18. The terminal's memory may store the program modules composing these apparatuses, such as the first acquisition module 1410, the slicing module 1420, the first scanning module 1430, the first segmentation module 1440, and the fusion module 1450. The computer-readable instructions composed of these program modules cause the processor to execute the steps of the semantic segmentation methods for three-dimensional images and for two-dimensional images of the embodiments of this application described in this specification.
An embodiment of this application provides a computer-readable storage medium storing computer-readable instructions which are loaded and executed by a processor to implement the operations of the semantic segmentation methods for three-dimensional images and for two-dimensional images of the above embodiments.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by computer programs instructing relevant hardware. The programs may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided by this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM). It should be understood that "multiple" herein means two or more. "And/or" describes an association relationship of associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
The serial numbers of the above embodiments of this application are for description only and do not represent the superiority or inferiority of the embodiments.

Other embodiments of this application will readily occur to those skilled in the art after considering the specification and practicing the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common general knowledge or customary technical means in the technical field not disclosed by this application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this application indicated by the following claims.

It should be understood that this application is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.
Claims (24)
- A semantic segmentation method for three-dimensional images, the method comprising: acquiring, by a terminal, the three-dimensional image; slicing, by the terminal, the three-dimensional image along three directional planes of three-dimensional coordinate axes to obtain an x-axis two-dimensional slice image, a y-axis two-dimensional slice image, and a z-axis two-dimensional slice image; invoking, by the terminal, a first segmentation model to perform semantic segmentation on the x-axis two-dimensional slice image to obtain a distribution probability map of a target object on an x-axis directional plane; invoking, by the terminal, a second segmentation model to perform semantic segmentation on the y-axis two-dimensional slice image to obtain a distribution probability map of the target object on a y-axis directional plane; invoking, by the terminal, a third segmentation model to perform semantic segmentation on the z-axis two-dimensional slice image to obtain a distribution probability map of the target object on a z-axis directional plane; and invoking, by the terminal, an adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a 3D distribution binary map of the target object.
- The method according to claim 1, wherein at least one of the first segmentation model, the second segmentation model, and the third segmentation model comprises a deep network encoding part and a skip-transfer decoding part, the deep network encoding part comprising n convolutional layers and the skip-transfer decoding part comprising m deconvolutional layers, n and m both being positive integers; the deep network encoding part is configured for the terminal to perform downsampling feature extraction on the two-dimensional slice image through the n convolutional layers to obtain a downsampled first intermediate feature map; the skip-transfer decoding part is configured for the terminal to perform upsampling on the first intermediate feature map and a second intermediate feature map through the m deconvolutional layers to obtain the upsampled distribution probability map; and the second intermediate feature map comprises a feature map output by an i-th convolutional layer of the n convolutional layers, i being an integer less than or equal to n.
- The method according to claim 1, wherein invoking, by the terminal, the adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain the 3D distribution binary map of the target object comprises: invoking, by the terminal, the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution feature map; performing, by the terminal, three-dimensional fusion convolution on the three-dimensional distribution feature map to obtain a three-dimensional segmentation probability map; and computing, by the terminal, the 3D distribution binary map of the target object from a maximum-probability class of each pixel in the three-dimensional segmentation probability map.
- The method according to claim 3, wherein the three-dimensional image is a three-dimensional medical image, and after computing, by the terminal, the 3D distribution binary map of the target object from the maximum-probability class of each pixel in the three-dimensional segmentation probability map, the method further comprises: filtering, by the terminal, noise pixels out of the 3D distribution binary map based on clinical prior knowledge, the clinical prior knowledge being knowledge obtained from statistics of distribution positions of the target object in the three-dimensional medical image.
- The method according to claim 4, wherein filtering, by the terminal, noise pixels out of the 3D distribution binary map based on clinical prior knowledge comprises: filtering, by the terminal, out of the 3D distribution binary map first noise pixels that fall outside a target value range, the target value range being a coordinate value range where the target object may appear, obtained from first clinical prior knowledge.
- The method according to claim 4, wherein filtering, by the terminal, noise pixels out of the 3D distribution binary map based on clinical prior knowledge comprises: filtering, by the terminal, out of the 3D distribution binary map second noise pixels that fall outside a three-dimensional ellipsoid model, the three-dimensional ellipsoid model being an ellipsoid model corresponding to the target object, obtained from second clinical prior knowledge.
- The method according to any one of claims 1 to 6, wherein the method further comprises: when an aspect ratio of a two-dimensional slice image exceeds a preset ratio range, performing, by the terminal, scanning-box segmentation on the two-dimensional slice image using square boxes whose side equals the slice image's short edge, obtaining several to-be-processed two-dimensional slice images.
- A semantic segmentation method for two-dimensional images, the method comprising: acquiring, by a terminal, the two-dimensional image; invoking, by the terminal, a segmentation model to perform semantic segmentation on the two-dimensional image to obtain a distribution probability map of a target object; and computing, by the terminal, a two-dimensional distribution binary map of the target object from a maximum-probability class of each pixel in the distribution probability map; wherein the segmentation model comprises a deep network encoding part and a skip-transfer decoding part, the deep network encoding part comprising n convolutional layers and the skip-transfer decoding part comprising m deconvolutional layers, n and m both being positive integers; the deep network encoding part is configured for the terminal to perform downsampling feature extraction on the two-dimensional image through the n convolutional layers to obtain a downsampled third intermediate feature map; the skip-transfer decoding part is configured for the terminal to perform upsampling on the third intermediate feature map and a fourth intermediate feature map through the m deconvolutional layers to obtain the upsampled distribution probability map; and the fourth intermediate feature map comprises a feature map output by an i-th convolutional layer of the n convolutional layers, i being an integer less than or equal to n.
- The method according to claim 8, wherein after computing, by the terminal, the two-dimensional distribution binary map of the target object from the maximum-probability class of each pixel in the distribution probability map, the method further comprises: filtering, by the terminal, noise pixels out of the two-dimensional distribution binary map based on clinical prior knowledge, the clinical prior knowledge being knowledge obtained from statistics of distribution positions of the target object in the two-dimensional medical image.
- The method according to claim 9, wherein filtering, by the terminal, noise pixels out of the two-dimensional distribution binary map based on clinical prior knowledge comprises: filtering, by the terminal, out of the two-dimensional distribution binary map third noise pixels that fall outside a target value range, the target value range being a coordinate value range where the target object may appear, obtained from third clinical prior knowledge.
- The method according to any one of claims 8 to 10, wherein the method further comprises: when an aspect ratio of the two-dimensional image exceeds a preset ratio range, performing, by the terminal, scanning-box segmentation on the two-dimensional image using square boxes whose side equals the image's short edge, obtaining several to-be-processed two-dimensional images.
- A semantic segmentation apparatus for three-dimensional images, wherein the apparatus comprises: a first acquisition module, configured to acquire the three-dimensional image; a slicing module, configured to slice the three-dimensional image along three directional planes of three-dimensional coordinate axes to obtain an x-axis two-dimensional slice image, a y-axis two-dimensional slice image, and a z-axis two-dimensional slice image; a first segmentation module, configured to invoke a first segmentation model to perform semantic segmentation on the x-axis two-dimensional slice image to obtain a distribution probability map of a target object on an x-axis directional plane; the first segmentation module being configured to invoke a second segmentation model to perform semantic segmentation on the y-axis two-dimensional slice image to obtain a distribution probability map of the target object on a y-axis directional plane; the first segmentation module being configured to invoke a third segmentation model to perform semantic segmentation on the z-axis two-dimensional slice image to obtain a distribution probability map of the target object on a z-axis directional plane; and a fusion module, configured to invoke an adaptive fusion model to perform three-dimensional fusion on the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a 3D distribution binary map of the target object.
- The apparatus according to claim 12, wherein at least one of the first segmentation model, the second segmentation model, and the third segmentation model comprises a deep network encoding part and a skip-transfer decoding part, the deep network encoding part comprising n convolutional layers and the skip-transfer decoding part comprising m deconvolutional layers, n and m both being positive integers; the deep network encoding part is configured to perform downsampling feature extraction on the two-dimensional slice image through the n convolutional layers to obtain a downsampled first intermediate feature map; the skip-transfer decoding part is configured to perform upsampling on the first intermediate feature map and a second intermediate feature map through the m deconvolutional layers to obtain the upsampled distribution probability map; and the second intermediate feature map comprises a feature map output by an i-th convolutional layer of the n convolutional layers, i being an integer less than or equal to n.
- The apparatus according to claim 12, wherein the fusion module comprises: a combination unit, configured to invoke the adaptive fusion model to combine the three distribution probability maps corresponding to the x-axis, y-axis, and z-axis directional planes to obtain a three-dimensional distribution feature map; a fusion unit, configured to perform three-dimensional fusion convolution on the three-dimensional distribution feature map to obtain a three-dimensional segmentation probability map; and a computing unit, configured to compute the 3D distribution binary map of the target object from a maximum-probability class of each pixel in the three-dimensional segmentation probability map.
- The apparatus according to claim 14, wherein the three-dimensional image is a three-dimensional medical image, and the apparatus further comprises: a first filtering module, configured to filter noise pixels out of the 3D distribution binary map based on clinical prior knowledge, the clinical prior knowledge being knowledge obtained from statistics of distribution positions of the target object in the three-dimensional medical image.
- The apparatus according to claim 15, wherein the first filtering module is further configured to filter out of the 3D distribution binary map first noise pixels that fall outside a target value range, the target value range being a coordinate value range where the target object may appear, obtained from first clinical prior knowledge.
- The apparatus according to claim 15, wherein the first filtering module is further configured to filter out of the 3D distribution binary map second noise pixels that fall outside a three-dimensional ellipsoid model, the three-dimensional ellipsoid model being an ellipsoid model corresponding to the target object, obtained from second clinical prior knowledge.
- The apparatus according to any one of claims 12 to 17, wherein the apparatus further comprises: a first scanning module, configured to, when an aspect ratio of a two-dimensional slice image exceeds a preset ratio range, perform scanning-box segmentation on the two-dimensional slice image using square boxes whose side equals the slice image's short edge, obtaining several to-be-processed two-dimensional slice images.
- A semantic segmentation apparatus for two-dimensional images, wherein the apparatus comprises: a second acquisition module, configured to acquire the two-dimensional image; a second segmentation module, configured to invoke a segmentation model to perform semantic segmentation on the two-dimensional image to obtain a distribution probability map of a target object; and a computing module, configured to compute a two-dimensional distribution binary map of the target object from a maximum-probability class of each pixel in the distribution probability map; wherein the segmentation model comprises a deep network encoding part and a skip-transfer decoding part, the deep network encoding part comprising n convolutional layers and the skip-transfer decoding part comprising m deconvolutional layers, n and m both being positive integers; the deep network encoding part is configured to perform downsampling feature extraction on the two-dimensional image through the n convolutional layers to obtain a downsampled third intermediate feature map; the skip-transfer decoding part is configured to perform upsampling on the third intermediate feature map and a fourth intermediate feature map through the m deconvolutional layers to obtain the upsampled distribution probability map; and the fourth intermediate feature map comprises a feature map output by an i-th convolutional layer of the n convolutional layers, i being an integer less than or equal to n.
- The apparatus according to claim 19, wherein the apparatus further comprises: a second filtering module, configured to filter noise pixels out of the two-dimensional distribution binary map based on clinical prior knowledge, the clinical prior knowledge being knowledge obtained from statistics of distribution positions of the target object in the two-dimensional medical image.
- The apparatus according to claim 20, wherein the second filtering module is further configured to filter out of the two-dimensional distribution binary map third noise pixels that fall outside a target value range, the target value range being a coordinate value range where the target object may appear, obtained from third clinical prior knowledge.
- The apparatus according to any one of claims 19 to 21, wherein the apparatus further comprises: a second scanning module, configured to, when an aspect ratio of the two-dimensional image exceeds a preset ratio range, perform scanning-box segmentation on the two-dimensional image using square boxes whose side equals the image's short edge, obtaining several to-be-processed two-dimensional images.
- A terminal, comprising a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 1 to 11.
- One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 11.