US20200058126A1 - Image segmentation and object detection using fully convolutional neural network - Google Patents
- Publication number
- US20200058126A1 (U.S. application Ser. No. 16/380,670)
- Authority
- US
- United States
- Prior art keywords
- feature map
- generate
- map
- training image
- convoluted
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06T7/11—Region-based segmentation
- G06T7/0012—Biomedical image inspection
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06K9/6232
- G06K9/6257
- G06K9/6262
- G06K2209/05
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06V10/764—Image or video recognition or understanding using classification, e.g. of video objects
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V2201/03—Recognition of patterns in medical or anatomical images
- G06V2201/031—Recognition of patterns in medical or anatomical images of internal organs
- G16H30/20—ICT specially adapted for handling medical images, e.g. DICOM, HL7 or PACS
- G16H30/40—ICT specially adapted for processing medical images, e.g. editing
- G16H50/20—ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
- G06T2207/10088—Magnetic resonance imaging [MRI]
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30081—Prostate
- G06T2207/30096—Tumor; Lesion
Definitions
- This disclosure relates generally to computer segmentation and object detection in digital images and particularly to segmentation of multi-dimensional and multi-channel medical images and to detection of lesions based on convolutional neural networks.
- a digital image may contain one or more regions of interest (ROIs).
- image data contained within the one or more ROIs of a digital image may need to be retained for further processing and for information extraction by computers. Efficient and accurate identification of these ROIs thus constitutes a critical step in image processing applications, including but not limited to applications that handle high-volume and/or real-time digital images.
- Each ROI of a digital image may contain pixels forming patches with drastic variation in texture and pattern, making accurate and efficient identification of the boundary between these ROIs and the rest of the digital image a challenging task for a computer.
- an entire ROI or a subsection of an ROI may need to be further identified and classified.
- an ROI in a medical image may correspond to a particular organ of a human body and the organ region of the image may need to be further processed to identify, e.g., lesions within the organ, and to determine the nature of the identified lesions.
- This disclosure is directed to an enhanced convolutional neural network including a contraction neural network and an expansion neural network.
- These neural networks are connected in tandem and are enhanced using a coarse-to-fine architecture and densely connected convolutional module to extract auto-context features for more accurate and more efficient segmentation and object detection in digital images.
- the present disclosure describes a method for image segmentation.
- the method includes receiving, by a computer comprising a memory storing instructions and a processor in communication with the memory, a set of training images labeled with a corresponding set of ground truth segmentation masks.
- the method includes establishing, by the computer, a fully convolutional neural network comprising a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem.
- the method includes iteratively training, by the computer, the fully convolutional neural network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by down-sampling, by the computer, a training image of the set of training images through the multi-layer contraction convolutional neural network to generate an intermediate feature map, wherein a resolution of the intermediate feature map is lower than a resolution of the training image; up-sampling, by the computer, the intermediate feature map through the multi-layer expansion convolutional neural network to generate a first feature map and a second feature map, wherein a resolution of the second feature map is larger than the resolution of the first feature map; generating, by the computer based on the first feature map and the second feature map, a predictive segmentation mask for the training image; generating, by the computer based on a loss function, an end loss based on a difference between the predictive segmentation mask and a ground truth segmentation mask corresponding to the training image; back-propagating, by the computer, the end loss through the fully convolutional neural network; and minimizing, by the computer, the end loss.
- the present disclosure also describes a computer image segmentation system for digital images.
- the computer image segmentation system for digital images includes a communication interface circuitry; a database; a predictive model repository; and a processing circuitry in communication with the database and the predictive model repository.
- the processing circuitry is configured to receive a set of training images labeled with a corresponding set of ground truth segmentation masks.
- the processing circuitry is configured to establish a fully convolutional neural network comprising a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem.
- the processing circuitry is configured to iteratively train the fully convolutional neural network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by configuring the processing circuitry to: down-sample a training image of the set of training images through the multi-layer contraction convolutional neural network to generate an intermediate feature map, wherein a resolution of the intermediate feature map is lower than a resolution of the training image; up-sample the intermediate feature map through the multi-layer expansion convolutional neural network to generate a first feature map and a second feature map, wherein a resolution of the second feature map is larger than the resolution of the first feature map; generate, based on the first feature map and the second feature map, a predictive segmentation mask for the training image; generate, based on a loss function, an end loss based on a difference between the predictive segmentation mask and a ground truth segmentation mask corresponding to the training image; back-propagate the end loss through the fully convolutional neural network; and minimize the end loss.
- the present disclosure also describes a non-transitory computer readable storage medium storing instructions.
- the instructions, when executed by a processor, cause the processor to receive a set of training images labeled with a corresponding set of ground truth segmentation masks.
- the instructions, when executed by a processor, cause the processor to establish a fully convolutional neural network comprising a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem.
- the instructions, when executed by a processor, cause the processor to iteratively train the fully convolutional neural network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by causing the processor to: down-sample a training image of the set of training images through the multi-layer contraction convolutional neural network to generate an intermediate feature map, wherein a resolution of the intermediate feature map is lower than a resolution of the training image; up-sample the intermediate feature map through the multi-layer expansion convolutional neural network to generate a first feature map and a second feature map, wherein a resolution of the second feature map is larger than the resolution of the first feature map; generate, based on the first feature map and the second feature map, a predictive segmentation mask for the training image; generate, based on a loss function, an end loss based on a difference between the predictive segmentation mask and a ground truth segmentation mask corresponding to the training image; back-propagate the end loss through the fully convolutional neural network; and minimize the end loss.
- FIG. 1 illustrates a general data/logic flow of various fully convolutional neural networks (FCNs) for implementing image segmentation and object detection.
- FIG. 2 illustrates an exemplary general implementation and data/logic flows of the fully convolutional neural network of FIG. 1 .
- FIG. 3 illustrates an exemplary implementation and data/logic flows of the fully convolutional neural network of FIG. 1 .
- FIG. 4 illustrates another exemplary implementation and data/logic flows of the fully convolutional neural network of FIG. 1 .
- FIG. 5 illustrates an exemplary implementation and data/logic flows of the fully convolutional neural network enhanced by a coarse-to-fine architecture.
- FIG. 6A illustrates an exemplary implementation and data/logic flows of the fully convolutional neural network having a coarse-to-fine architecture with auxiliary segmentation masks.
- FIG. 6B illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6A that combines the auxiliary segmentation masks by concatenation.
- FIG. 6C illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6B with further improvement.
- FIG. 6D illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6A that combines the auxiliary segmentation masks by summation.
- FIG. 6E illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6D with further improvement.
- FIG. 6F illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6B as stage I, further including a dense-convolution (DenseConv) module in a stage II.
- FIG. 6G illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6F with further improvement.
- FIG. 7 illustrates an exemplary implementation and data/logic flows of the DenseConv module of FIG. 6F .
- FIG. 8 illustrates a flow diagram of a method for training the exemplary implementation in FIG. 6F or 6G .
- FIG. 9 illustrates an exemplary implementation and data/logic flows of combining various fully convolutional neural networks.
- FIG. 10 shows an exemplary computer platform for segmenting digital images.
- FIG. 11 illustrates a computer system that may be used to implement various computing components and functionalities of the computer platform of FIG. 10 .
- a digital image may contain one or more regions of interest (ROIs).
- ROI may include a particular type of object.
- image data within the ROIs contains useful or relevant information.
- recognition of ROIs in a digital image and identification of boundaries for these ROIs using computer vision often constitute a critical first step before further image processing is performed.
- a digital image may contain multiple ROIs of a same type or may contain ROIs of different types.
- a digital image may contain only human faces or may contain both human faces and other objects of interest. Identification of ROIs in a digital image is often alternatively referred to as image segmentation.
- a digital image may be alternatively referred to as an “image”.
- An image may be a two-dimensional (2D) image.
- a 2D image includes pixels having two-dimensional coordinates, which may be denoted along an x-axis and a y-axis.
- the two-dimensional coordinates of the pixels may correspond to a spatial 2D surface.
- the spatial 2D surface may be a planar surface or a curved surface projected from a three-dimensional object.
- An image may have multiple channels.
- the multiple channels may be different chromatic channels, for example and not limited to, red-green-blue (RGB) color channels.
- the multiple channels may be different modality channels for a same object, representing images of the same object taken under different imaging conditions.
- different modalities may correspond to different combinations of focus, aperture, exposure parameters, and the like.
- different modality channels may include but are not limited to T2-weighted imaging (T2 W), diffusion weighted imaging (DWI), apparent diffusion coefficient (ADC) and K-trans channels.
- An image may be a three-dimensional (3D) image.
- a 3D image includes pixels having three-dimensional coordinates, which may be denoted along an x-axis, a y-axis, and a z-axis.
- the three-dimensional coordinates of the pixels may correspond to a spatial 3D space.
- MRI images in each modality channel may be three dimensional, including a plurality of slices of 2D images.
- a 3D image may also have multiple channels, effectively forming a four-dimensional (4D) image.
- the 4D image including multiple channels may be referred to as pseudo 4D image.
- a 4D image includes pixels having four-dimensional coordinates, which may be denoted along an x-axis, a y-axis, a z-axis, and a channel-number.
- ROIs for an image may be represented by a digital mask containing the same number of pixels as the digital image or a down-sized number of pixels from the digital image.
- a digital mask may be alternatively referred to as a mask or a segmentation mask.
- Each pixel of the mask may contain a value used to denote whether a particular corresponding pixel of the digital image is among any ROI, and if it is, which type of ROI among multiple types of ROIs it falls in. For example, if there is only a single type of ROI, a binary mask is sufficient to represent all ROIs.
- each pixel of the ROI mask may be either zero or one, representing whether the pixel is or is not among the ROIs.
- each pixel may be at one of a number of values each corresponding to one type of ROIs.
- a multi-value mask may be decomposed into a combination of the more fundamental binary masks each for one type of ROI.
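As an illustration of the mask representations described above, the short sketch below (Python/NumPy, not part of the patent) decomposes a multi-value mask into one binary mask per ROI type; the function name and the toy 4×4 mask are made up for the example.

```python
import numpy as np

def split_multivalue_mask(mask: np.ndarray, num_roi_types: int) -> np.ndarray:
    """Decompose a multi-value segmentation mask into per-type binary masks.

    `mask` holds 0 for background and 1..num_roi_types for each ROI type;
    the result has shape (num_roi_types, *mask.shape) with 0/1 entries.
    """
    return np.stack([(mask == t).astype(np.uint8)
                     for t in range(1, num_roi_types + 1)], axis=0)

# Example: a 4x4 mask with two ROI types.
mask = np.array([[0, 1, 1, 0],
                 [0, 1, 2, 2],
                 [0, 0, 2, 2],
                 [0, 0, 0, 0]])
binary_masks = split_multivalue_mask(mask, num_roi_types=2)
print(binary_masks.shape)  # (2, 4, 4): one binary mask per ROI type
```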
- for a 2D image, its mask may correspondingly be 2D, including pixels having two-dimensional coordinates.
- for a multi-channel image, its mask may nevertheless be a single combined mask, wherein the single mask corresponds to all the channels in the image.
- alternatively, the mask of a multi-channel image may be a multi-channel mask, wherein each of the multiple channels of the mask corresponds to one or more channels of the multi-channel image.
- for a 3D image, its mask may correspondingly be a 3D mask, including pixels having three-dimensional coordinates along an x-axis, a y-axis, and a z-axis.
- for a multi-channel 3D image, its mask may be a three-dimensional mask having either a single channel or multiple channels, similar to the 2D mask described above.
- ROI masks are particularly useful for further processing of the digital image.
- an ROI mask can be used as a filter to determine a subset of image data that are among particular types of ROIs and that need to be further analyzed and processed. Image data outside of these particular types of ROIs may be removed from further analysis. Reducing the amount of data that needs to be further processed may be advantageous in situations where processing speed is essential and memory space is limited. As such, automatic identification of ROIs in a digital image presents a technological problem to be overcome before further processing can be performed on the data which form the digital image.
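The following sketch (an illustrative assumption, not taken from the disclosure) shows the filtering idea: a binary ROI mask either zeroes out image data outside the ROIs or extracts only the ROI voxels so that less data needs to be passed to further analysis.

```python
import numpy as np

def apply_roi_mask(image: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """Zero out everything outside the ROI so that downstream processing
    (e.g., lesion detection) only sees voxels inside the segmented region."""
    return image * (roi_mask > 0)

def extract_roi_voxels(image: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """Alternatively, keep only the ROI voxels as a flat array to reduce
    the amount of data passed to further analysis."""
    return image[roi_mask > 0]

image = np.random.rand(128, 128, 24)      # stand-in single-channel 3D image
roi_mask = np.zeros((128, 128, 24), dtype=np.uint8)
roi_mask[40:90, 40:90, 8:16] = 1          # hypothetical organ ROI
masked = apply_roi_mask(image, roi_mask)
roi_only = extract_roi_voxels(image, roi_mask)
```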
- ROI identification and ROI mask generation, or image segmentation may be implemented in various applications, including but not limited to face identification and recognition, object identification and recognition, satellite map processing, and general computer vision and image processing.
- ROI identification and segmentation may be implemented in medical image processing.
- medical images may include but are not limited to Computed Tomography (CT) images, Magnetic Resonance Imaging (MRI) images, ultrasound images, X-Ray images, and the like.
- a single or a group of images may first be analyzed and segmented into ROIs and non-ROIs.
- One or more ROI masks, alternatively referred to as segmentation masks, may be generated.
- An ROI in a medical image may be specified at various levels depending on the applications.
- an ROI may be an entire organ.
- a corresponding binary ROI mask may be used to distinguish the location of the organ tissues from the regions outside of the ROI that are not part of the organ.
- an ROI may represent a lesion in an organ or tissue of one or more particular types in the organ. These different levels of ROIs may be hierarchical. For example, a lesion may be part of an organ.
- the present disclosure may be particularly applied to images of various types of human tissues or organs to perform ROI identification, ROI mask generation, and image segmentation, for example, including but not limited to, brain segmentation, pancreas segmentation, lung segmentation, or prostate segmentation.
- MR images of the prostate from one or more patients may be processed using computer aided diagnosis (CAD).
- Prostate segmentation for marking the boundary of the organ of prostate is usually the first step in a prostate MR image processing and plays an important role in computer aided diagnosis of prostate diseases.
- One key to prostate segmentation is to accurately determine the boundary of the prostate tissues, either normal or pathological. Because images of normal prostate tissues may vary in texture and an abnormal prostate tissue may additionally contain patches of distinct or varying texture and patterns, identification of prostate tissues using computer vision may be particularly challenging. Misidentifying a pathological portion of the prostate tissue as not being part of the prostate and masking it out from subsequent CAD analysis may lead to unacceptable false diagnostic negatives. Accordingly, the need to accurately and reliably identify an ROI in a digital image such as a medical image of a prostate or other organs is critical to proper medical diagnosis.
- Segmentation of images may be performed by a computer using a model developed using deep neural network-based machine learning algorithms.
- a segmentation model may be based on Fully Convolutional Network (FCN) or Deep Convolutional Neural Networks (DCNN).
- Model parameters may be trained using labeled images.
- the image labels in this case may contain ground truth masks, which may be produced by human experts or via other independent processes.
- the FCN may contain only multiple convolution layers.
- digital images of lungs may be processed by such neural networks for lung segmentation and computer aided diagnosis.
- the model learns various features and patterns of lung tissues using the labeled images. These features and patterns include both global and local features and patterns, as represented by various convolution layers of the FCN.
- a trained FCN model may process an input digital image with unknown mask and output a predicted segmentation mask. It is critical to architect the FCN to facilitate efficient and accurate learning of image features of relevance to a particular type of images.
- a system 100 for CAD using a fully convolutional network may include two stages: tissue segmentation and lesion detection.
- a first stage 120 performs tissue (or organ) segmentation from images 110 to generate tissue segmentation mask 140 .
- a second stage 160 performs lesion detection from the tissue segmentation mask 140 and the images 110 to generate the lesion mask 180 .
- the lesion mask may be non-binary. For example, within the organ region of the image 110 as represented by the segmentation mask 140 , there may be multiple lesion regions of different pathological types.
- the lesion mask 180 thus may correspondingly be non-binary as discussed above, and contains both spatial information (as to where the lesions are) and information as to the type of each lesion.
- the system may generate a diagnosis with a certain probability for a certain disease.
- a lesion region may be portions of the organ containing cancer, tumor, or the like.
- the system may only include the stage of tissue segmentation 120 .
- the system may only include the stage of lesion detection 160 .
- an FCN with a slight variation may be used in each of the first stage 120 and the second stage 160 .
- the first stage 120 and the second stage 160 may use the same type of FCN, or may use different types of FCN.
- the FCNs for stage 120 and 160 may be separately or jointly trained.
- the tissue or organ may be a prostate.
- CAD of prostate may be implemented in two steps.
- the first step may include determining a segmentation boundary of prostate tissue by processing input MR prostate images, producing an ROI mask for the prostate; and the second step includes detecting a diseased portion of the prostate tissue, e.g., a prostate tumor or cancer, by processing the ROI masked MR images of the prostate tissue.
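A minimal sketch of this two-stage flow is shown below, assuming a PyTorch implementation in which the stage-two network receives the stage-one mask concatenated with the original image; the function and argument names are hypothetical, and concatenation is only one possible way of combining the mask 140 with the images 110.

```python
import torch

def cad_pipeline(image: torch.Tensor,
                 segmentation_fcn: torch.nn.Module,
                 lesion_fcn: torch.nn.Module) -> tuple[torch.Tensor, torch.Tensor]:
    """Two-stage flow: stage one predicts a tissue/organ segmentation mask,
    stage two detects lesions from the mask together with the original image.

    `image` is (batch, channels, D, H, W); both FCNs are assumed to output
    voxel-wise probabilities of the same spatial size as their input.
    """
    seg_prob = segmentation_fcn(image)               # stage 120: tissue segmentation
    seg_mask = (seg_prob > 0.5).float()              # binarize the probability map
    lesion_in = torch.cat([image, seg_mask], dim=1)  # feed mask + image to stage 160
    lesion_prob = lesion_fcn(lesion_in)              # stage 160: lesion detection
    lesion_mask = (lesion_prob > 0.5).float()
    return seg_mask, lesion_mask
```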
- MR images of prostate from one or more patients may be used in computer aided diagnosis (CAD).
- volumetric prostate MR images are acquired with multiple channels from multiple modalities including, e.g., T2 W, DWI, ADC and K-trans, where ADC maps may be calculated from DWI, and K-trans images may be obtained using dynamic contrast enhanced (DCE) MR perfusion.
- the FCN may be adapted to the form of the input images (either two-dimensional images or three-dimensional images, either multichannel or single-channel images).
- the FCN may be adapted to include features of an appropriate number of dimensions for processing single-channel 2D or 3D images or multi-channel 2D or 3D images.
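For instance, the multi-modality input described above could be assembled as follows (illustrative NumPy sketch; the random arrays stand in for co-registered T2W, DWI, ADC and K-trans volumes):

```python
import numpy as np

# Stand-ins for co-registered volumes of the same study, one per modality
# channel (T2W, DWI, ADC, K-trans); each is 128 x 128 x 24.
shape = (128, 128, 24)
t2w, dwi, adc, ktrans = (np.random.rand(*shape).astype(np.float32) for _ in range(4))

# Stack the modalities along a leading channel axis -> (4, 128, 128, 24),
# i.e., a multi-channel 3D (pseudo-4D) input for a multi-channel FCN.
multi_channel_volume = np.stack([t2w, dwi, adc, ktrans], axis=0)
print(multi_channel_volume.shape)  # (4, 128, 128, 24)
```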
- the images 110 may be one or more 2D images each with a single channel (SC).
- the first stage 120 may use two-dimensional single-channel FCN (2D, SC) 122 to perform tissue segmentation to obtain the segmentation mask 140 of an input image 110 .
- the segmentation mask 140 may be a single 2D mask (or a mask with single channel).
- the second stage 160 may use two-dimensional single-channel FCN (2D, SC) 162 to perform lesion detection.
- the FCN (2D, SC) 162 operates on the single channel segmentation mask 140 and the single channel image 110 (via arrow 190 ) to generate the lesion mask 180 .
- the lesion mask 180 may be a single-channel 2D mask.
- the images 110 may be one or more 2D images each with multiple channels (MC).
- the multiple channels may be different chromatic channels (e.g., red, blue, and green colors) or other different imaging modality channels (e.g., T2 W, DWI, ADC and K-trans for MR imaging).
- the first stage 120 may use two-dimensional single-channel FCN (2D, SC) 122 to perform tissue segmentation of multiple channels of an input image 110 to obtain multi-channel segmentation mask 140 .
- because the two-dimensional single-channel FCN (2D, SC) 122 is used in the tissue segmentation, each channel of the multi-channel image 110 may be regarded as independent from other channels, and thus each channel may be processed individually.
- the segmentation mask 140 may be a 2D mask with multiple channels.
- the second stage 160 may use two dimensional multi-channel FCN (2D, MC) 164 to perform lesion detection.
- the FCN (2D, MC) 164 may operate simultaneously on the multi-channel segmentation mask 140 and the multi-channel image 110 to generate the lesion mask 180 .
- the two-dimensional multi-channel FCN (2D, MC) 164 may process the multi-channel mask 140 and the multi-channel image 110 in a combined manner to generate a single-channel lesion mask 180 .
- the images 110 may be one or more 2D images with multiple channels.
- the first stage 120 may use two-dimensional multi-channel FCN (2D, MC) 124 to perform tissue segmentation of a multi-channel image 110 in a combined manner to obtain a single-channel segmentation mask 140 .
- the second stage 160 may use two-dimensional single-channel FCN (2D, SC) 162 to perform lesion detection in the single channel mask 140 .
- the two-dimensional single-channel FCN (2D, SC) 162 operates on the segmentation mask 140 and the multi-channel image 110 to generate a multi-channel lesion mask 180 .
- the images 110 may be one or more 2D images with multiple channels.
- the first stage 120 may use two-dimensional multi-channel FCN (2D, MC) 124 to perform tissue segmentation on the multi-channel image 110 to obtain a single-channel segmentation mask 140 .
- the second stage 160 may use two-dimensional multi-channel FCN (2D, MC) 164 to perform lesion detection.
- the two-dimensional multi-channel FCN (2D, MC) 164 operates on the single-channel segmentation mask 140 and the multi-channel images 110 to generate a single-channel lesion mask 180 .
- the images 110 may be one or more 3D images with a single channel.
- the first stage 120 may use three-dimensional single-channel FCN (3D, SC) 126 to perform tissue segmentation of a single channel image 110 to obtain a three dimensional single-channel segmentation mask 140 .
- the second stage 160 may use three-dimensional single-channel FCN (3D, SC) 166 to perform lesion detection.
- the FCN (3D, SC) 166 operates on the single-channel segmentation mask 140 and the single-channel image 110 to generate a three-dimensional single-channel lesion mask 180 .
- the images 110 may be one or more 3D images with multiple channels.
- the multiple channels may be different chromatic channels (e.g., red, blue, and green colors) or other different modality channels (e.g., T2 W, DWI, ADC and K-trans for MR imaging).
- the first stage 120 may use three-dimensional single-channel FCN (3D, SC) 126 to perform tissue segmentation to obtain three-dimensional multi-channel segmentation mask 140 .
- each channel may be regarded as independent from other channels, and thus each channel is processed individually by the FCN (3D, SC) 126 .
- the second stage 160 may use three-dimensional single-channel FCN (3D, SC) 166 to perform lesion detection.
- the FCN (3D, SC) 166 operates on the multi-channel segmentation mask 140 and the multi-channel images 110 to generate the lesion mask 180 . Since the single-channel FCN (3D, SC) 166 is used in the lesion detection, each channel of the multiple channels may be processed independently. Therefore, the lesion mask 180 may be a 3D multi-channel mask.
- the images 110 may be one or more 3D images with multiple channels.
- the first stage 120 may use three-dimensional single-channel FCN (3D, SC) 126 to perform tissue segmentation to obtain the segmentation mask 140 . Since FCN (3D, SC) 126 is used in the tissue segmentation, each channel may be regarded as independent from other channels, and thus each channel may be processed individually. Therefore, the segmentation mask 140 may be three-dimensional multi-channel mask.
- the second stage 160 may use three-dimensional multi-channel FCN (3D, MC) 168 to perform lesion detection.
- the FCN (3D, MC) 168 operates on the multi-channel segmentation mask 140 and the multi-channel image 110 to generate the lesion mask 180 . Since the multi-channel FCN (3D, MC) 168 is used in the lesion detection, the multiple channels may be processed in a combined/aggregated manner. Therefore, the lesion mask 180 may be a three-dimensional single-channel mask.
- the images 110 may be one or more 3D images with multiple channels.
- the first stage 120 may use three-dimensional multi-channel FCN (3D, MC) 128 to perform tissue segmentation of the 3D multi-channel image 110 in an aggregated manner to obtain a three-dimensional single-channel segmentation mask 140 .
- the second stage 160 may use three-dimensional single-channel FCN (3D, SC) 166 to perform lesion detection.
- the FCN (3D, SC) 166 operates on the single-channel segmentation mask 140 and the multi-channel images 110 to generate three-dimensional multi-channel lesion mask 180 .
- the images 110 may be one or more 3D images with multiple channels.
- the first stage 120 may use three-dimensional multi-channel FCN (3D, MC) 128 to perform tissue segmentation of the multi-channel image 110 in an aggregated manner to obtain three-dimensional single-channel segmentation mask 140 .
- the second stage 160 may use three-dimensional multi-channel FCN (3D, MC) 168 to perform lesion detection.
- the FCN (3D, MC) 168 operates on the single-channel segmentation mask 140 and the multi-channel image 110 to generate three-dimensional single-channel lesion mask 180 in an aggregated manner.
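The difference between the single-channel and multi-channel FCN variants enumerated above can be summarized by the following hypothetical sketch (PyTorch): a single-channel FCN is applied to each channel independently and yields a multi-channel mask, whereas a multi-channel FCN consumes all channels jointly and yields a single-channel mask.

```python
import torch

def run_single_channel_fcn(fcn_sc: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Apply a single-channel FCN to each channel independently.

    `image` is (batch, channels, D, H, W); each channel is processed on its own,
    so the output keeps one predicted mask per input channel.
    """
    per_channel = [fcn_sc(image[:, c:c + 1]) for c in range(image.shape[1])]
    return torch.cat(per_channel, dim=1)   # multi-channel mask

def run_multi_channel_fcn(fcn_mc: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Apply a multi-channel FCN to all channels jointly, producing a single-channel mask."""
    return fcn_mc(image)
```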
- a 2D mask may be sufficient to serve as the mask for the 3D image having different values along z-axis.
- a first stage 120 may use a modified FCN (3D, SC) to perform tissue segmentation to obtain the segmentation mask 140 , where the segmentation mask 140 may be a 2D mask.
- Segmentation of images and/or target object (e.g., lesion) detection may be performed by a computer using a model developed using deep neural network-based machine learning algorithms.
- a model may be based on Fully Convolutional Network (FCN), Deep Convolutional Neural Networks (DCNN), U-net, or V-Net.
- Model parameters may be trained using labeled images.
- the image labels in this case may contain ground truth masks, which may be produced by human experts or via other independent processes.
- the FCN may contain multiple convolution layers.
- An exemplary embodiment involves processing digital images of lung tissues for computer aided diagnosis.
- the model learns various features and patterns of, e.g., lung tissues or prostate tissues, using the labeled images. These features and patterns include both global and local features and patterns, as represented by various convolution layers of the FCN.
- the knowledge obtained during the training process is embodied in the set of model parameters representing the trained FCN model.
- a trained FCN model may process an input digital image with unknown mask and output a predicted segmentation mask.
- the present disclosure describes an FCN with different variations.
- the FCN is capable of performing tissue segmentation or lesion detection on 2D or 3D images with single or multiple channels.
- 3D images are used as an example in the embodiment below. 2D images can be processed similarly, as 2D images may be regarded as 3D images with one z-slice, i.e., a single z-axis value.
- FCN model 200 An exemplary FCN model for predicting a segmentation/lesion mask for an input digital image is shown as FCN model 200 in FIG. 2 .
- the FCN model 200 may be configured to process input 2D/3D images 210 .
- the input image may be 3D and may be of an exemplary size 128×128×24, i.e., 3D images having a size of 128 along x-axis, a size of 128 along y-axis, and a size of 24 along z-axis.
- a down-sampling path 220 may comprise a contraction neural network which gradually reduces the resolution of the 2D/3D images to generate feature maps with smaller resolution and larger depth.
- the output 251 from the down-sampling path 220 may be feature maps of 16×16×3 with a depth of 128, i.e., the feature maps having a size of 16 along x-axis, a size of 16 along y-axis, a size of 3 along z-axis, and a depth of 128 (exemplary only, corresponding to the number of features).
- the down-sampling path 220 is contracting because it processes an input image into feature maps whose resolutions are progressively reduced through one or more layers of the down-sampling path 220 .
- the term “contraction path” may be alternatively referred to as “down-sampling path”.
- the FCN model 200 may include an up-sampling path 260 , which may be an expansion path to generate high-resolution feature maps for voxel-level prediction.
- the output 251 from the down-sampling path 220 may serve as an input to the up-sampling path 260 .
- An output 271 from the up-sampling path 260 may be feature maps of 128×128×24 with a depth of 16, i.e., the feature maps having a size of 128 along x-axis, a size of 128 along y-axis, a size of 24 along z-axis, and a depth of 16.
- the up-sampling path 260 processes these feature maps in one or more layers and in an opposite direction to that of the down-sampling path 220 and eventually generates a segmentation mask 290 with a resolution similar or equal to that of the input image.
- the term “expansion path” may be alternatively referred to as “up-sampling path”.
- the FCN model 200 may include a convolution step 280 , which is performed on the output of the up-sampling stage with highest resolution to generate map 281 .
- the convolution operation kernel in step 280 may be 1×1×1 for 3D images or 1×1 for 2D images.
- the FCN model 200 may include a rectifier, e.g., a sigmoid step 280 , which takes map 281 as input to generate a voxel-wise binary classification probability prediction mask 290 .
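A minimal sketch of this prediction head, assuming a PyTorch implementation and the exemplary depth of 16 from output 271, might look like the following (class and variable names are hypothetical):

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """1x1x1 convolution collapsing the feature depth to one map (step 280),
    followed by a sigmoid that turns it into voxel-wise probabilities (mask 290)."""
    def __init__(self, in_features: int = 16):
        super().__init__()
        self.final_conv = nn.Conv3d(in_features, 1, kernel_size=1)

    def forward(self, feature_maps: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.final_conv(feature_maps))

# Feature maps 271: depth 16, size 128 x 128 x 24 (PyTorch layout: N, C, D, H, W).
features = torch.randn(1, 16, 24, 128, 128)
mask_prob = PredictionHead(16)(features)   # -> (1, 1, 24, 128, 128), values in (0, 1)
```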
- the down-sampling path 220 and up-sampling path 260 may include any number of convolution layers, pooling layers, or de-convolutional layers.
- the size and number of features in each convolutional or de-convolutional layer may be predetermined and are not limited in this disclosure.
- the FCN model 200 may optionally include one or more steps 252 connecting the feature maps within the down-sampling path 220 with the feature maps within the up-sampling path 260 .
- in steps 252 , the output of the de-convolution layers may be combined with the corresponding feature maps generated in the down-sampling path 220 with matching resolution (e.g., by concatenation, as shown below in connection with FIG. 4 ).
- Steps 252 may provide complementary high-resolution information into the up-sampling path 260 to enhance the final prediction mask, since the de-convolution layers only take coarse features from low-resolution layers as input.
- Feature maps at different convolution layer and with different level of resolution from the contraction CNN may be input into the expansion CNN as shown by the arrow 252 .
- the model parameters for the FCN include features or kernels used in various convolutional layers for generating the feature maps, their weight and bias, and other parameters.
- a set of features and other parameters may be learned such that patterns and textures in an input image with unknown label may be identified.
- this learning process may be challenging due to lack of a large number of samples of certain important texture or patterns in training images relative to other texture or patterns.
- disease image patches are usually much fewer than other normal image patches and yet it is extremely critical that these disease image patches are correctly identified by the FCN model and segmented as part of the lung.
- the large number of parameters in a typical multilayer FCN tends to cause over-fitting of the network even after data augmentation.
- the model parameters and the training process are preferably designed to reduce overfitting and promote identification of features that are critical but scarce in the labeled training images.
- an input image may then be processed through the down-sampling/contraction path 220 and up-sampling/expansion path 260 to generate a predicted segmentation mask.
- the predicted segmentation mask may be used as a filter for subsequent processing of the input image.
- the training process for the FCN 200 may involve forward-propagating each of a set of training images through the down-sampling/contraction path 220 and up-sampling/expansion path 260 .
- the set of training images are each associated with a label, e.g., a ground truth mask 291 .
- the training parameters such as all the convolutional features or kernels, various weights, and bias may be, e.g., randomly initialized.
- the output segmentation mask as a result of the forward-propagation may be compared with the ground truth mask 291 of the input image.
- a loss function 295 may be determined.
- the loss function 295 may include a softmax cross-entropy loss.
- the loss function 295 may include dice coefficient (DC) loss function. Other types of loss function are also contemplated. Then a back-propagation through the expansion path 260 and then the contraction path 220 may be performed based on, e.g., stochastic gradient descent, and aimed at minimizing the loss function 295 . By iterating the forward-propagation and back-propagation for the same input images, and for the entire training image set, the training parameters may converge to provide acceptable errors between the predicted masks and ground truth masks for all or most of the input images.
- the converged training parameters may form a final predictive model that may be further verified using test images and used to predict segmentation masks for images that the network has never seen before.
- the model is preferably trained to promote errors on the over-inclusive side to reduce or prevent false negatives in later stages of CAD based on a predicted mask.
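The training procedure described above might be sketched as follows (illustrative PyTorch code, not the patented implementation); it assumes a data loader yielding image/ground-truth-mask pairs and uses a soft dice loss, with cross-entropy as the noted alternative:

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft dice-coefficient loss between predicted probabilities and the ground truth mask."""
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def train(fcn: torch.nn.Module, loader, epochs: int = 10, lr: float = 1e-3) -> None:
    """Iterate forward-propagation, loss evaluation against the ground truth mask,
    back-propagation through the expansion and contraction paths, and a
    stochastic-gradient-descent update aimed at minimizing the loss."""
    optimizer = torch.optim.SGD(fcn.parameters(), lr=lr)
    for _ in range(epochs):
        for image, gt_mask in loader:             # labeled training images
            pred_mask = fcn(image)                # forward pass -> predicted segmentation mask
            loss = dice_loss(pred_mask, gt_mask)  # cross-entropy is another option for loss 295
            optimizer.zero_grad()
            loss.backward()                       # back-propagate the end loss
            optimizer.step()                      # adjust the training parameters
```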
- FIG. 3 describes one specific example of FCN. The working of the exemplary implementation of FIG. 3 is described below. More detailed description is included in U.S. application Ser. No. 15/943,392, filed on Apr. 2, 2018 by the same Applicant as the present application, which is incorporated herein by reference in its entirety.
- 2D/3D images may be fed into the FCN.
- the 2D/3D images may be MR image patches with size 128×128×24.
- the 2D/3D images 311 may be fed into step 312 , which includes one or more convolutional layers, for example and not limited to, two convolutional layers.
- a fixed kernel size may be adopted, and the number of filters may increase or stay the same from layer to layer.
- a kernel size of 3×3×3 voxels may be used in the convolution sub-step in step 312 .
- Each convolutional sub-step is followed by a batch normalization (BN) sub-step and a rectified-linear unit (ReLU) sub-step.
- Number of features at each convolution layer may be predetermined.
- the number of features for step 312 may be 16.
- the output of convolution between the input and features may be processed by a ReLU to generate stacks of feature maps.
- the ReLUs may be of any mathematical form.
- the ReLUs may include but are not limited to noisy ReLUs, leaky ReLUs, and exponential linear units.
- the number of pixels in a resulting feature map may be reduced.
- for a 100-pixel by 100-pixel input, after convolution with a 5-pixel by 5-pixel kernel sliding through the input with a stride of 1, the resulting feature map may be 96 pixels by 96 pixels.
- to avoid this reduction, the input image or a feature map may be, for example, zero-padded around the edges to 104 pixels by 104 pixels such that the output feature map retains the 100-pixel by 100-pixel resolution.
- the examples below may use the zero-padding method, so that the resolution does not change before and after the convolution function.
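The arithmetic behind these numbers follows the usual output-size formula for a convolution, out = floor((in + 2·padding − kernel) / stride) + 1, as the small helper below illustrates (hypothetical code, not part of the disclosure):

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution: floor((in + 2*padding - kernel) / stride) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1

print(conv_output_size(100, kernel=5))             # 96: no padding shrinks the map
print(conv_output_size(100, kernel=5, padding=2))  # 100: zero padding (100 -> 104) preserves resolution
print(conv_output_size(100, kernel=3, padding=1))  # 100: same idea for the 3x3x3 kernels used here
```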
- the feature maps from step 312 may be max-pooled spatially to generate the input to a next layer below.
- the max-pooling may be performed using any suitable basis, e.g., 2×2, resulting in down-sampling of an input by a factor of 2 in each of the two spatial dimensions.
- spatial pooling other than max-pooling may be used.
- different layers of 314 , 324 , and 334 may be pooled using a same basis or different basis, and using a same or different pooling method.
- after max-pooling by a factor of 2, the feature maps 321 have a size of 64×64×12. Since the number of features used in step 312 is 16, the depth of the feature maps 321 may be 16.
- the feature maps 321 may be fed into the next convolution layer 322 with another kernel and number of features.
- the number of features in step 322 may be 32.
- the output from step 324 may be feature maps 331 having a size of 32×32×6. Since the number of features used in step 322 is 32, the depth of the feature maps 331 may be 32.
- the feature maps 331 may be fed into the next convolution layer 332 with another kernel and number of features.
- the number of features in step 332 may be 64.
- the output from step 334 may be feature maps 341 having a size of 16×16×3. Since the number of features used in step 332 is 64, the depth of the feature maps 341 may be 64.
- the feature maps 341 may be fed into the next convolution layer 342 with another kernel and number of features.
- the number of features in step 342 may be 128.
- the output from step 342 may be feature maps 351 having a size of 16×16×3. There is no size reduction between feature maps 341 and feature maps 351 because there is no max-pooling step in between. Since the number of features used in step 342 is 128, the depth of the feature maps 351 may be 128.
- the feature maps 351 of the final layer 342 of the down-sampling path may be input into an up-sampling path and processed upward through the expansion path.
- the expansion path may include one or more de-convolution layers, corresponding to convolution layers on the contraction path, respectively.
- the up-sampling path may involve increasing the number of pixels for feature maps in each spatial dimension, by a factor of, e.g., 2, but reducing the number of feature maps, i.e., reducing the depth of the feature maps. This reduction may be by a factor of, e.g., 2 (for example, the previous layer has a depth of 128, and the reduced depth is 64), or the reduction may not be by an integer factor.
- the feature maps 351 may be fed into a de-convolution operation 364 with 2×2×2 trainable kernels to increase/expand the size of the input feature maps by a factor of 2. The output of the de-convolution operation 364 may be feature maps 361 having a size of 32×32×6. Since the number of features used in the de-convolution operation 364 is 64, the depth of the feature maps 361 is 64.
- the feature maps 361 may be fed into a step 362 .
- the step 362 may include one or more convolution layers.
- the step 362 includes two convolution layers.
- Each of the convolution layers includes a convolution sub-step followed by batch normalization (BN) and rectified-linear unit (ReLU) sub-steps.
- connections 352 a , 352 b , and 352 c in FIG. 3 show that, at each expansion layer of 362 , 372 , and 382 , the feature maps from the down-sampling path may be concatenated with the feature maps in the up-sampling path, respectively.
- the feature maps in step 332 of the down-sampling path have a size of 32×32×6 and a depth of 64.
- the feature maps 361 of the up-sampling path have a size of 32×32×6 and a depth of 64.
- These two sets of feature maps may be concatenated together to form new feature maps having a size of 32×32×6 and a depth of 128.
- the new feature maps may be fed as the input into the step 362 .
- the connection 352 c may provide complementary high-resolution information, since the de-convolution layers only take coarse features from low-resolution layers as input.
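The concatenation described above is a simple channel-wise (depth-wise) join; a toy PyTorch check of the shape bookkeeping (64 + 64 = 128 feature maps at 32×32×6) might look like this:

```python
import torch

# Feature maps 361 from the up-sampling path and the matching-resolution maps
# from step 332 of the down-sampling path: both 32 x 32 x 6 with a depth of 64
# (PyTorch layout: N, C, D, H, W).
up_features = torch.randn(1, 64, 6, 32, 32)
skip_features = torch.randn(1, 64, 6, 32, 32)

# Concatenate along the channel (depth) axis -> depth 128, same spatial size;
# this combined tensor is the input to step 362.
combined = torch.cat([skip_features, up_features], dim=1)
print(combined.shape)  # torch.Size([1, 128, 6, 32, 32])
```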
- the output feature maps of step 362 may be fed into de-convolution layer 374 .
- the de-convolution layer may have a trainable kernel of 2×2×2 to increase/expand the size of the input feature maps by a factor of 2.
- the output of the de-convolution operation 374 may be feature maps 371 having a size of 64×64×12. Since the number of features used in the de-convolution operation 374 is 32, the depth of the feature maps 371 is 32.
- the feature maps from step 322 in the down-sampling path may be concatenated with the feature maps 371 .
- the feature maps in step 322 of the down-sampling path have a size of 64×64×12 and a depth of 32.
- the feature maps 371 of the up-sampling path have a size of 64×64×12 and a depth of 32.
- These two sets of feature maps may be concatenated together to form new feature maps having a size of 64×64×12 and a depth of 64.
- the new feature maps may be fed as the input into the step 372 .
- the output feature maps of step 372 may be fed into de-convolution layer 384 .
- the de-convolution layer may have a trainable kernel of 2×2×2 to increase/expand the size of the input feature maps by a factor of 2.
- the output of the de-convolution operation 384 may be feature maps 381 having a size of 128×128×24. Since the number of features used in the de-convolution operation 384 is 16, the depth of the feature maps 381 is 16.
- the feature maps from step 312 in the down-sampling path may be concatenated with the feature maps 381 .
- the feature maps in step 312 of the down-sampling path have a size of 128×128×24 and a depth of 16.
- the feature maps 381 of the up-sampling path have a size of 128×128×24 and a depth of 16.
- These two sets of feature maps may be concatenated together to form new feature maps having a size of 128×128×24 and a depth of 32.
- the new feature maps may be fed as the input into the step 382 .
- the output feature maps from step 382 may be fed into a convolution step 390 , which is performed on the feature maps with the highest resolution to generate feature maps 391 .
- the feature maps 391 may have a size of 128×128×24 and a depth of one.
- the convolution operation kernel in step 390 may be 1×1×1 for 3D images or 1×1 for 2D images.
- the feature maps 391 may be fed into a sigmoid step 392 , which takes feature maps 391 as input to generate voxel-wise binary classification probabilities, which may be further determined to be the predicted segmentation mask for the tissue or lesion.
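Putting the walkthrough of FIG. 3 together, a compact PyTorch sketch of such an encoder-decoder FCN is given below. It is an approximation under stated assumptions: 3×3×3 convolutions with zero padding, BN + ReLU, 2×2×2 max-pooling, 2×2×2 transposed convolutions, skip concatenations, and a 1×1×1 convolution plus sigmoid head; the class names, (N, C, D, H, W) layout, and other details are illustrative rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3x3 convolutions, each followed by batch normalization and ReLU."""
    layers = []
    for i in range(2):
        layers += [nn.Conv3d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.BatchNorm3d(out_ch),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class FCN3D(nn.Module):
    """Contraction path (steps 312/322/332/342) and expansion path (364/362, 374/372,
    384/382) with skip concatenations (352a-c) and a 1x1x1 conv + sigmoid head (390/392)."""
    def __init__(self, in_channels: int = 1, features=(16, 32, 64, 128)):
        super().__init__()
        f1, f2, f3, f4 = features
        self.enc1 = conv_block(in_channels, f1)   # 128x128x24, depth 16
        self.enc2 = conv_block(f1, f2)            # 64x64x12,  depth 32
        self.enc3 = conv_block(f2, f3)            # 32x32x6,   depth 64
        self.bottom = conv_block(f3, f4)          # 16x16x3,   depth 128
        self.pool = nn.MaxPool3d(2)
        self.up3 = nn.ConvTranspose3d(f4, f3, kernel_size=2, stride=2)  # step 364
        self.dec3 = conv_block(f3 + f3, f3)                              # step 362
        self.up2 = nn.ConvTranspose3d(f3, f2, kernel_size=2, stride=2)  # step 374
        self.dec2 = conv_block(f2 + f2, f2)                              # step 372
        self.up1 = nn.ConvTranspose3d(f2, f1, kernel_size=2, stride=2)  # step 384
        self.dec1 = conv_block(f1 + f1, f1)                              # step 382
        self.head = nn.Conv3d(f1, 1, kernel_size=1)                      # step 390

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottom(self.pool(e3))                        # feature maps 351
        d3 = self.dec3(torch.cat([e3, self.up3(b)], dim=1))   # concatenation via 352c
        d2 = self.dec2(torch.cat([e2, self.up2(d3)], dim=1))  # concatenation via 352b
        d1 = self.dec1(torch.cat([e1, self.up1(d2)], dim=1))  # concatenation via 352a
        return torch.sigmoid(self.head(d1))                   # voxel-wise probabilities (392)

# Shape check with a 128x128x24 single-channel patch (N, C, D, H, W).
x = torch.randn(1, 1, 24, 128, 128)
print(FCN3D()(x).shape)  # torch.Size([1, 1, 24, 128, 128])
```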
- FIG. 4 describes another embodiment of FCN for segmentation of 2D/3D images with either single or multiple channels.
- 3D images with multiple channels may be taken as an example.
- the FCN model in this disclosure is not limited to 3D images with multiple channels (FCN (3D, MC)).
- the FCN model in this embodiment may be applied to 2D images with multiple channels (FCN(2D, MC)), 3D images with single channel (FCN(3D, SC)), or 2D images with single channel (FCN(2D, SC)), as shown in FIG. 1 .
- the input 2D/3D images with multiple channels 401 may include 3D images having a size of 128×128×24 with three channels.
- the three channels may be multi-parametric modalities in MR imaging including T2W, T1 and DWI with the highest b-value. Or in some other situations, the three channels may include red, green, and blue chromatic channels.
- each of the three channels may be processed independently.
- the three channels may be processed collectively, and as such, the first convolutional layer operating on the input images 401 has 3×3×3×3 convolutional kernels.
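- A minimal sketch of processing the three channels collectively, assuming a PyTorch 3D convolution whose kernels span all three input channels (one way to realize the 3×3×3×3 kernel mentioned above) and 32 output features as in layer 412; the padding of 1 is an assumption to preserve the spatial size.

```python
import torch
import torch.nn as nn

# Hedged sketch of the first convolutional layer for a multi-channel 3D input:
# the three MR modalities (e.g., T2W, T1, high-b-value DWI) are stacked as channels.
first_conv = nn.Conv3d(in_channels=3, out_channels=32, kernel_size=3, padding=1)

multi_channel_volume = torch.randn(1, 3, 24, 128, 128)  # 128x128x24 with three channels
features = first_conv(multi_channel_volume)
print(features.shape)  # torch.Size([1, 32, 24, 128, 128])
```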
- the FCN model in FIG. 4 may include a down-sampling/contracting path and a corresponding up-sampling/expansion path.
- the down-sampling/contracting path may include one or more convolutional layers 412 .
- the convolutional layer 412 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 412 may be 32, so that the feature maps of the convolution layer 412 may have a size of 128×128×24 and a depth of 32.
- the FCN model in FIG. 4 may include convolution layers 414 with strides greater than 1 for gradually reducing the resolution of feature maps and increasing the receptive field to incorporate more spatial information.
- the convolution layer 414 may have a stride of 2 along x-axis, y-axis, and z-axis, so that the resolution along x-axis, y-axis, and z-axis is reduced by a factor of 2.
- the number of features in the convolution layer 414 may be 64, so that the output feature maps 421 of the convolution layer 414 may have a size of 64×64×12 and a depth of 64.
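- A minimal sketch of the strided convolution used for down-sampling in layer 414, assuming a kernel size of 3 with padding 1 (the text above only specifies the stride of 2 and the 64 features):

```python
import torch
import torch.nn as nn

# Hedged sketch of convolution layer 414: a stride of 2 along all three axes halves
# the resolution (instead of using a pooling layer) while the 64 features set the depth.
down_414 = nn.Conv3d(in_channels=32, out_channels=64,
                     kernel_size=3, stride=2, padding=1)

maps_from_412 = torch.randn(1, 32, 24, 128, 128)
feature_maps_421 = down_414(maps_from_412)
print(feature_maps_421.shape)  # torch.Size([1, 64, 12, 64, 64]) -> 64x64x12, depth 64
```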
- the feature map may be fed into one or more convolution layers 422 .
- Each of the convolutional layers 422 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 422 may be 64, so that the output feature maps 423 of the convolution layer 422 may have a size of 64×64×12 and a depth of 64.
- input feature maps may be directly added to output feature maps, which enables the stacked convolutional layers to learn a residual function.
- the feature maps 421 and 423 may be added together.
- the output feature maps 425 of the operator ⊕ may be an element-by-element summation of feature maps 421 and 423.
- the feature maps 425 may have a size of 64×64×12 and a depth of 64.
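- A minimal sketch of this residual connection, assuming two stacked conv-BN-ReLU layers for the layers 422 (the text allows one or more) and a kernel size of 3 with padding 1:

```python
import torch
import torch.nn as nn

# Hedged sketch of the residual unit formed by the layers 422 and the summation operator:
# the input feature maps 421 are added element-by-element to the output maps 423, so the
# stacked convolutional layers only need to learn a residual function.
conv_422 = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=3, padding=1), nn.BatchNorm3d(64), nn.ReLU(inplace=True),
    nn.Conv3d(64, 64, kernel_size=3, padding=1), nn.BatchNorm3d(64), nn.ReLU(inplace=True),
)

feature_maps_421 = torch.randn(1, 64, 12, 64, 64)
feature_maps_423 = conv_422(feature_maps_421)
feature_maps_425 = feature_maps_421 + feature_maps_423   # element-by-element summation
print(feature_maps_425.shape)  # torch.Size([1, 64, 12, 64, 64])
```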
- the feature maps 425 may be fed into a convolution layer 424 with a stride of 2 along x-axis, y-axis, and z-axis, so that the resolution along x-axis, y-axis, and z-axis is reduced by a factor of 2.
- the number of features in the convolution layer 424 may be 128, so that the output feature maps 431 of the convolution layer 424 may have a size of 32×32×6 and a depth of 128.
- the feature map 431 may be fed into one or more convolution layers 432 .
- Each of the convolutional layers 432 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 432 may be 128, so that the output feature maps 433 of the convolution layers 432 may have a size of 32×32×6 and a depth of 128.
- input feature maps may be directly added to output feature maps, which enables the stacked convolutional layers to learn a residual function.
- the feature maps 431 and 433 may be added together.
- the output feature maps 435 of the operator ⊕ may be an element-by-element summation of feature maps 431 and 433.
- the feature maps 435 may have a size of 32×32×6 and a depth of 128.
- the feature maps 435 may be fed into a convolution layer 434 with a stride of 2 along x-axis, y-axis, and z-axis, so that the resolution along x-axis, y-axis, and z-axis is reduced by a factor of 2.
- the number of features in the convolution layer 434 may be 256, so that the output feature maps 441 of the convolution layer 434 may have a size of 16×16×3 and a depth of 256.
- the feature map 441 may be fed into one or more convolution layers 442 .
- Each of the convolutional layers 442 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 442 may be 256, so that the output feature maps 443 of the convolution layers 442 may have a size of 16×16×3 and a depth of 256.
- input feature maps may be directly added to output feature maps, which enables the stacked convolutional layers to learn a residual function.
- the feature maps 441 and 443 may be added together.
- the output feature maps 445 of the operator ⊕ may be an element-by-element summation of feature maps 441 and 443.
- the feature maps 445 may have a size of 16×16×3 and a depth of 256.
- the feature maps 445 may be fed into a convolution layer 444 with a stride of 2 along x-axis and y-axis, and with a stride of 1 along z-axis, so that the resolution along x-axis and y-axis is reduced by a factor of 2 and the resolution along z-axis remains the same.
- the number of features in the convolution layer 444 may be 512, so that the output feature maps 451 of the convolution layer 444 may have a size of 8×8×3 and a depth of 512.
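- A minimal sketch of the anisotropic stride used in layer 444, assuming NCDHW layout with the z-axis as the depth-wise spatial dimension and a kernel size of 3 with padding 1 (both assumptions); the stride tuple (1, 2, 2) leaves the z resolution unchanged while halving x and y.

```python
import torch
import torch.nn as nn

# Hedged sketch of convolution layer 444: stride 2 along the x- and y-axes and
# stride 1 along the z-axis, because the z extent (3 slices) is already small.
down_444 = nn.Conv3d(in_channels=256, out_channels=512,
                     kernel_size=3, stride=(1, 2, 2), padding=1)  # (z, y, x) order

feature_maps_445 = torch.randn(1, 256, 3, 16, 16)
feature_maps_451 = down_444(feature_maps_445)
print(feature_maps_451.shape)  # torch.Size([1, 512, 3, 8, 8]) -> 8x8x3, depth 512
```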
- the feature map 451 may be fed into one or more convolution layers 452 .
- Each of the convolutional layers 452 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 452 may be 512, so that the output feature maps 453 of the convolution layers 452 may have a size of 8×8×3 and a depth of 512.
- input feature maps may be directly added to output feature maps, which enables the stacked convolutional layers to learn a residual function.
- the feature maps 451 and 453 may be added together.
- the output feature maps 455 of the operator ⊕ may be an element-by-element summation of feature maps 451 and 453.
- the feature maps 455 may have a size of 8×8×3 and a depth of 512.
- the FCN model in FIG. 4 may include a corresponding up-sampling/expansion path to increase the resolution of feature maps generated from the down-sampling/contracting path.
- the feature maps generated in the contracting path may be concatenated with the output of de-convolutional layers to incorporate high-resolution information.
- the up-sampling/expansion path may include a de-convolution layer 464 .
- the de-convolution layer 464 may de-convolute the input feature maps 455 to increase their resolution by a factor of 2 along x-axis and y-axis.
- the output feature maps 461 of the de-convolution layer 464 may have a size of 16×16×3.
- the number of features in the de-convolution layer 464 may be 256, so that the output feature maps 461 of the de-convolution layer 464 may have a depth of 256.
- the feature maps 443 generated from the convolution layer 442 may be concatenated with the feature maps 461 generated from the de-convolution layer 464 .
- the feature maps 443 may have a size of 16×16×3 and a depth of 256.
- the feature maps 461 may have a size of 16×16×3 and a depth of 256.
- the concatenated feature maps 463 may correspondingly have a size of 16×16×3 and a depth of 512.
- the feature map 463 may be fed into one or more convolution layers 462 .
- Each of the convolutional layers 462 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 462 may be 256, so that the output feature maps 465 of the convolution layers 462 may have a size of 16×16×3 and a depth of 256.
- the feature maps 461 and 465 may be added together.
- the output feature maps 467 of the operator ⊕ 466 may be an element-by-element summation of feature maps 461 and 465.
- the feature maps 467 may have a size of 16×16×3 and a depth of 256.
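- A minimal sketch of one expansion unit of FIG. 4 (de-convolution 464, concatenation with the skip features 443, convolution 462, and the element-by-element sum at 466); the kernel sizes and the use of a single conv-BN-ReLU layer for 462 are assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch of the expansion unit: de-convolve, concatenate with the skip features
# from the contracting path, convolve back down to 256 features, and add the
# de-convolved maps 461 as a residual.
deconv_464 = nn.ConvTranspose3d(512, 256, kernel_size=(1, 2, 2), stride=(1, 2, 2))
conv_462 = nn.Sequential(
    nn.Conv3d(512, 256, kernel_size=3, padding=1), nn.BatchNorm3d(256), nn.ReLU(inplace=True),
)

feature_maps_455 = torch.randn(1, 512, 3, 8, 8)    # bottom of the contracting path
feature_maps_443 = torch.randn(1, 256, 3, 16, 16)  # skip features from layer 442

feature_maps_461 = deconv_464(feature_maps_455)                            # 16x16x3, depth 256
feature_maps_463 = torch.cat([feature_maps_443, feature_maps_461], dim=1)  # depth 512
feature_maps_465 = conv_462(feature_maps_463)                              # depth 256
feature_maps_467 = feature_maps_461 + feature_maps_465                     # element-by-element sum
print(feature_maps_467.shape)  # torch.Size([1, 256, 3, 16, 16])
```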
- the feature maps 467 may be fed into a de-convolution layer 474 .
- the de-convolution layer 474 may de-convolute the input feature maps 467 to increase their resolution by a factor of 2 along x-axis, y-axis and z-axis.
- the output feature maps 471 of the de-convolution layer 474 may have a size of 32×32×6.
- the number of features in the de-convolution layer 474 may be 128, so that the output feature maps 471 of the de-convolution layer 474 may have a depth of 128.
- the feature maps 433 generated from the convolution layer 432 may be concatenated with the feature maps 471 generated from the de-convolution layer 474 .
- the feature maps 433 may have a size of 32×32×6 and a depth of 128.
- the feature maps 471 may have a size of 32×32×6 and a depth of 128.
- the concatenated feature maps 473 may correspondingly have a size of 32×32×6 and a depth of 256.
- the feature map 473 may be fed into one or more convolution layers 472 .
- Each of the convolutional layers 472 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 472 may be 128, so that the output feature maps 475 of the convolution layers 472 may have a size of 32×32×6 and a depth of 128.
- the feature maps 471 and 475 may be added together.
- the output feature maps 477 of the operator ⊕ 476 may be an element-by-element summation of feature maps 471 and 475.
- the feature maps 477 may have a size of 32×32×6 and a depth of 128.
- the feature maps 477 may be fed into a de-convolution layer 484 .
- the de-convolution layer 484 may de-convolute the input feature maps 477 to increase their resolution by a factor of 2 along x-axis, y-axis and z-axis.
- the output feature maps 481 of the de-convolution layer 484 may have a size of 64×64×12.
- the number of features in the de-convolution layer 484 may be 64, so that the output feature maps 481 of the de-convolution layer 484 may have a depth of 64.
- the feature maps 423 generated from the convolution layer 422 may be concatenated with the feature maps 481 generated from the de-convolution layer 484 .
- the feature maps 423 may have a size of 64×64×12 and a depth of 64.
- the feature maps 481 may have a size of 64×64×12 and a depth of 64.
- the concatenated feature maps 483 may correspondingly have a size of 64×64×12 and a depth of 128.
- the feature map 483 may be fed into one or more convolution layers 482 .
- Each of the convolutional layers 482 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 482 may be 64, so that the output feature maps 485 of the convolution layers 482 may have a size of 64×64×12 and a depth of 64.
- the feature maps 481 and 485 may be added together.
- the output feature maps 487 of the operator ⊕ 486 may be an element-by-element summation of feature maps 481 and 485.
- the feature maps 487 may have a size of 64×64×12 and a depth of 64.
- the feature maps 487 may be fed into a de-convolution layer 494 .
- the de-convolution layer 494 may de-convolute the input feature maps 487 to increase their resolution by a factor of 2 along x-axis, y-axis and z-axis.
- the output feature maps 491 of the de-convolution layer 494 may have a size of 128×128×24.
- the number of features in the de-convolution layer 494 may be 32, so that the output feature maps 491 of the de-convolution layer 494 may have a depth of 32.
- the feature maps 413 generated from the convolution layer 412 may be concatenated with the feature maps 491 generated from the de-convolution layer 494 .
- the feature maps 413 may have a size of 128×128×24 and a depth of 32.
- the feature maps 491 may have a size of 128×128×24 and a depth of 32.
- the concatenated feature maps 493 may correspondingly have a size of 128×128×24 and a depth of 64.
- the feature map 493 may be fed into one or more convolution layers 492 .
- Each of the convolutional layers 492 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features.
- the number of features in the convolution layer 492 may be 32, so that the output feature maps 495 of the convolution layers 492 may have a size of 128×128×24 and a depth of 32.
- the feature maps 491 and 495 may be added together.
- the output feature maps 497 of the operator ⊕ 496 may be an element-by-element summation of feature maps 491 and 495.
- the feature maps 497 may have a size of 128×128×24 and a depth of 32.
- the feature maps 497 may be fed into a convolutional layer 498 to generate voxel-wise binary classification probabilities.
- the convolutional layer 498 may include a 1×1×1 kernel and a sigmoid activation function.
- Dice loss may be adopted as the objective function, and the dice loss may be expressed as
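- The exact expression does not survive in the text above; a commonly used soft Dice formulation consistent with the description (offered here only as an illustrative assumption) is

  \[ L_{\text{Dice}} = 1 - \frac{2\sum_{i} p_i\, g_i + \epsilon}{\sum_{i} p_i + \sum_{i} g_i + \epsilon}, \]

  where \(p_i\) denotes the predicted voxel-wise probability, \(g_i\) the corresponding ground truth label, the sums run over all voxels, and \(\epsilon\) is a small smoothing constant.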
- Post-processing steps may then be applied to refine the initial segmentation generated by the FCN model.
- a 3D Gaussian filter may be used to smooth the predicted probability maps and a connected component analysis may be used to remove small isolated components.
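- A minimal sketch of such post-processing using SciPy, where the Gaussian sigma, the 0.5 threshold, and the choice to keep only the largest connected component are assumptions (the text above only calls for smoothing and removal of small isolated components):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

# Hedged sketch of the post-processing: smooth the predicted probability map with a
# 3D Gaussian filter, threshold it, and keep only the largest connected component.
def refine_mask(prob_map: np.ndarray, sigma: float = 1.0, threshold: float = 0.5) -> np.ndarray:
    smoothed = gaussian_filter(prob_map, sigma=sigma)    # 3D Gaussian smoothing
    binary = smoothed > threshold                        # voxel-wise thresholding
    labeled, num_components = label(binary)              # connected component analysis
    if num_components == 0:
        return binary.astype(np.uint8)
    sizes = np.bincount(labeled.ravel())
    sizes[0] = 0                                         # ignore the background label
    return (labeled == sizes.argmax()).astype(np.uint8)  # drop small isolated components

refined = refine_mask(np.random.rand(24, 128, 128))
```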
- the present disclosure describes an FCN with a coarse-to-fine architecture, which takes advantage of residual learning and deep supervision in order to improve the segmentation performance and training efficiency.
- the FCN model includes a down-sampling/contraction path 504 and an up-sampling/expansion path 508 , similar to the model described in FIG. 3 .
- Auxiliary convolutional layers 522 , 532 , 542 , and 552 may be connected to feature maps 521 , 531 , 541 , and 551 with progressive resolution in the expansion path 508 , in order to generate single feature maps which are then up-sampled and fed into a sigmoid function to obtain auxiliary predictions 559 .
- the auxiliary predictions 559 may be used to further determine the mask for the organ or lesion.
- the output segmentation mask generated from the auxiliary predictions 559 may be compared with the ground truth mask of the input images.
- a loss function may be determined.
- the loss function may include a softmax cross-entropy loss.
- the loss function may include a dice coefficient (DC) loss function.
- the input 2D/3D images 501 in FIG. 5 may be, for example and not limited to, 3D images having a size of 128×128×24.
- the feature maps 521 may have a size of 16×16×3 and a depth of 128;
- the feature maps 531 may have a size of 32×32×6 and a depth of 64;
- the feature maps 541 may have a size of 64×64×12 and a depth of 32;
- the feature maps 551 may have a size of 128×128×24 and a depth of 16.
- the feature maps 521 may be fed into an auxiliary convolutional layer 522 .
- the auxiliary convolutional layer 522 may have a kernel size of 1×1×1 for 3D images or 1×1 for 2D images.
- the output feature maps 523 may have a size of 16×16×3 and a depth of one.
- the output feature maps 523 may be fed into a de-convolutional layer 524 to increase the resolution, so as to generate feature maps 527 having a size of 32×32×6.
- the feature maps 531 may be fed into an auxiliary convolutional layer 532 .
- the auxiliary convolutional layer 532 may have a kernel size of 1×1×1 for 3D images or 1×1 for 2D images.
- the output feature maps 533 may have a size of 32×32×6 and a depth of one.
- the output feature maps 533 may be added onto the feature maps 527 in an element-by-element manner to generate feature maps 535.
- the feature maps 535 may have a size of 32×32×6 and a depth of one.
- the feature maps 535 may be fed into a de-convolutional layer 536 to increase the resolution, so as to generate feature maps 537 having a size of 64×64×12.
- the feature maps 541 may be fed into an auxiliary convolutional layer 542 .
- the auxiliary convolutional layer 542 may have a kernel size of 1×1×1 for 3D images or 1×1 for 2D images.
- the output feature maps 543 may have a size of 64×64×12 and a depth of one.
- the output feature maps 543 may be added onto the feature maps 537 element-by-element to generate feature maps 545.
- the feature maps 545 may have a size of 64×64×12 and a depth of one.
- the feature maps 545 may be fed into a de-convolutional layer 546 to increase the resolution, so as to generate feature maps 547 having a size of 128×128×24.
- the feature maps 551 may be fed into an auxiliary convolutional layer 552 .
- the auxiliary convolutional layer 552 may have a kernel size of 1×1×1 for 3D images or 1×1 for 2D images.
- the output feature maps 553 may have a size of 128×128×24 and a depth of one.
- the output feature maps 553 may be added onto the feature maps 547 in an element-by-element manner to generate feature maps 555.
- the feature maps 555 may have a size of 128×128×24 and a depth of one.
- the feature maps 555 may be fed into a sigmoid function, which takes feature maps 555 as input to generate auxiliary prediction 4 (559), which includes voxel-wise binary classification probabilities and may be further processed to be the mask for the tissue or lesion.
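- A minimal sketch of this coarse-to-fine auxiliary path (auxiliary layers 522/532/542/552 and de-convolutions 524/536/546), assuming 2×2×2 transposed convolutions for the up-sampling steps; the module and tensor names are illustrative.

```python
import torch
import torch.nn as nn

# Hedged sketch of the coarse-to-fine auxiliary path of FIG. 5: each expansion-path
# feature map is reduced to depth one by a 1x1x1 auxiliary convolution, up-sampled by
# a factor of 2, and added element-by-element to the next finer auxiliary map.
aux_522 = nn.Conv3d(128, 1, kernel_size=1)   # on feature maps 521 (16x16x3, depth 128)
aux_532 = nn.Conv3d(64, 1, kernel_size=1)    # on feature maps 531 (32x32x6, depth 64)
aux_542 = nn.Conv3d(32, 1, kernel_size=1)    # on feature maps 541 (64x64x12, depth 32)
aux_552 = nn.Conv3d(16, 1, kernel_size=1)    # on feature maps 551 (128x128x24, depth 16)
up_524 = nn.ConvTranspose3d(1, 1, kernel_size=2, stride=2)
up_536 = nn.ConvTranspose3d(1, 1, kernel_size=2, stride=2)
up_546 = nn.ConvTranspose3d(1, 1, kernel_size=2, stride=2)

f521 = torch.randn(1, 128, 3, 16, 16)
f531 = torch.randn(1, 64, 6, 32, 32)
f541 = torch.randn(1, 32, 12, 64, 64)
f551 = torch.randn(1, 16, 24, 128, 128)

maps_535 = aux_532(f531) + up_524(aux_522(f521))   # 32x32x6, depth one
maps_545 = aux_542(f541) + up_536(maps_535)        # 64x64x12, depth one
maps_555 = aux_552(f551) + up_546(maps_545)        # 128x128x24, depth one
auxiliary_4 = torch.sigmoid(maps_555)              # auxiliary prediction 559
```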
- Another enhanced embodiment of an FCN model 600 is shown in FIG. 6A.
- the feature maps with reduced resolutions in the up-sampling/expansion path may be fed into auxiliary convolutional layers, up-sampled to an original resolution, and then fed into a sigmoid function to generate their corresponding auxiliary predictions.
- the one or more auxiliary predictions may be combined together to generate the mask for the tissue or lesion.
- the feature maps 521 may be fed into an auxiliary convolutional layer 522 to generate feature maps 523 having a size of 16×16×3 and a depth of one.
- the output feature maps 523 may be fed into one or more de-convolutional layers to generate feature maps 528 and expand the resolution from 16×16×3 to 128×128×24.
- the one or more de-convolutional layers with respect to feature map 521 may include three de-convolutional layers and each of the three de-convolutional layers may expand the resolution by a factor of 2 in x-axis, y-axis and z-axis.
- the one or more de-convolutional layers with respect to feature map 521 may include one de-convolutional layer, which may expand the resolution by a factor of 8 in x-axis, y-axis and z-axis.
- the feature maps 528 may recover the full resolution and may be then fed into a sigmoid function to obtain auxiliary 1 ( 529 ).
- the feature maps 535 at the second lowest resolution layer may be fed into one or more de-convolutional layers to generate feature maps 538 and expand the resolution from 32×32×6 to 128×128×24 (full resolution).
- the one or more de-convolutional layers here may include two de-convolutional layers and each of the two de-convolutional layers may expand the resolution by a factor of 2 in x-axis, y-axis and z-axis.
- the one or more de-convolutional layers may include one de-convolutional layer, which may expand the resolution by a factor of 4 in x-axis, y-axis and z-axis.
- the feature maps 538 may be fed into a sigmoid function to obtain auxiliary 2 ( 539 ).
- the feature maps 545 at the second highest resolution layer may be fed into a de-convolutional layer to generate feature maps 548 and expand the resolution from 64×64×12 to 128×128×24 (full resolution).
- the de-convolutional layer may expand the resolution by a factor of 2 in x-axis, y-axis and z-axis.
- the feature maps 548 may be fed into a sigmoid function to obtain auxiliary 3 ( 549 ).
- the one or more auxiliary predictions 529 , 539 , 549 , and 559 in FIG. 6A may be combined together to generate the mask for the tissue or lesion. The manner of combination will be discussed in more detail below.
- the auxiliary predictions 529 , 539 , 549 , and 559 may be concatenated together in step 620 to generate a concatenated auxiliary prediction 621 .
- the concatenated auxiliary prediction 621 may have a size of 128×128×24 and a depth of four.
- the concatenated auxiliary prediction 621 may be fed into a convolutional layer with kernel size of 1×1×1 to generate auxiliary prediction 623.
- the auxiliary prediction 623 may have a size of 128×128×24 and a depth of one.
- the auxiliary prediction 623 may be fed into a sigmoid function to obtain final prediction 629 .
- the final prediction 629 may be used to further determine the mask for the tissue or lesion.
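- A minimal sketch of the concatenation-based fusion of FIG. 6B (and, with one extra input channel, the auto-context variant described below); the tensors standing in for the auxiliary predictions are illustrative.

```python
import torch
import torch.nn as nn

# Hedged sketch of step 620 followed by the 1x1x1 fusion convolution and sigmoid:
# the four full-resolution auxiliary predictions are concatenated along the depth
# dimension and fused into a single final prediction map.
fuse = nn.Conv3d(in_channels=4, out_channels=1, kernel_size=1)

aux_529, aux_539, aux_549, aux_559 = (torch.rand(1, 1, 24, 128, 128) for _ in range(4))
concatenated_621 = torch.cat([aux_529, aux_539, aux_549, aux_559], dim=1)  # depth four
final_prediction_629 = torch.sigmoid(fuse(concatenated_621))               # depth one

# For the auto-context variant, the input image 501 would be concatenated as an
# additional channel and in_channels above would become 5.
```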
- the output segmentation mask generated from the prediction 629 may be compared with the ground truth mask of the input images.
- a loss function may be determined.
- the loss function may include a softmax cross-entropy loss.
- the loss function may include a dice coefficient (DC) loss function.
- the implementation of FIG. 6B may be further augmented as shown in FIG. 6C, where an auto-context strategy may be further used in the FCN model.
- the input 2D/3D images 501 in addition to the auxiliary predictions 529 , 539 , 549 , and 559 may be concatenated together in step 620 to generate a combined auxiliary prediction 621 .
- the concatenated auxiliary prediction 621 may have a size of 128×128×24 and a depth of five (compared with the depth of 4 for the implementation of FIG. 6B).
- the concatenated auxiliary prediction 621 may be fed into a convolutional layer with kernel size of 1×1×1 to generate auxiliary prediction 623.
- the auxiliary prediction 623 may have a size of 128×128×24 and a depth of one.
- the auxiliary prediction 623 may be fed into a sigmoid function to obtain final prediction 629 .
- the final prediction 629 may be used to further determine the mask for the tissue or lesion.
- Another implementation for combining auxiliary predictions 529, 539, 549, and 559 is shown in FIG. 6D, where an element-by-element summation rather than concatenation is used.
- the auxiliary predictions 529 , 539 , 549 , and 559 may be added together in step 640 element-by-element to generate a combined auxiliary prediction 641 .
- the combined auxiliary prediction 641 may have a size of 128×128×24 and a depth of one.
- the combined auxiliary prediction 641 may be fed into a sigmoid function 642 to obtain final prediction 649 .
- the final prediction 649 may be used to further determine the mask for the tissue or lesion.
- the implementation of FIG. 6D may be further augmented as shown in FIG. 6E.
- the input 2D/3D images 501 in addition to the auxiliary predictions 529 , 539 , 549 , and 559 may be added together in step 640 in an element-by-element manner to generate a combined auxiliary prediction 641 .
- the combined auxiliary prediction 641 may have a size of 128×128×24 and a depth of one.
- the combined auxiliary prediction 641 may be fed into a sigmoid function 642 to obtain final prediction 649 .
- the final prediction 649 may be used to further determine the mask for the tissue or lesion of the input image.
- Another implementation of the FCN based on FIG. 6A is shown in FIG. 6F, where the combination of the auxiliary prediction masks 529, 539, 549, and 559 may be further processed by a densely connected convolutional (DenseConv) network to extract auto-context features prior to obtaining the final prediction mask 669.
- the auxiliary predictions 529 , 539 , 549 , and 559 may be concatenated together in step 660 to generate a concatenated auxiliary prediction, also referred to as auto-context input 661 .
- the concatenated auxiliary prediction 661 may have a size of 128×128×24 and a depth of four.
- the auto-context input 661 may be fed into a DenseConv module 662 to generate a prediction map 663 .
- the prediction map 663 may have a depth of one or more.
- the prediction map 663 may be fed into a convolutional layer 664 to generate a prediction map 665 .
- the prediction map 665 may have a size of 128×128×24 and a depth of one.
- the prediction map 665 may be optionally added together with the feature maps 555 in an element-by-element manner to generate a prediction map 667.
- the prediction map 667 may have a size of 128×128×24 and a depth of one.
- the prediction map 667 may be fed into a sigmoid function 668 to obtain final prediction 669 .
- the final prediction 669 may be used to further determine the mask for the tissue or lesion.
- the output segmentation mask generated from the prediction 669 may be compared with the ground truth mask of the input images.
- a loss function may be determined.
- the loss function may include a softmax cross-entropy loss.
- the loss function may include a dice coefficient (DC) loss function.
- the implementation of FIG. 6F may be further augmented as shown in FIG. 6G.
- the input 2D/3D images 501 in addition to the auxiliary predictions 529 , 539 , 549 , and 559 may be concatenated together in step 660 to generate the auto-context input 661 .
- the concatenated auxiliary input at 661 may have a size of 128×128×24 and a depth of five rather than the depth of 4 in FIG. 6F.
- the auto-context input 661 may be fed into a DenseConv module 662 to generate a prediction map 663 .
- the prediction map 663 may have a depth of one or more.
- the prediction map 663 may be fed into a convolutional layer 664 to generate a prediction map 665 .
- the prediction map 665 may have a size of 128×128×24 and a depth of one.
- the prediction map 665 in FIG. 6G may be added together with the feature maps 555 in an element-by-element manner to generate a prediction map 667.
- the prediction map 667 may have a size of 128×128×24 and a depth of one.
- the prediction map 667 may be fed into a sigmoid function 668 to obtain final prediction 669 .
- the final prediction 669 may be used to further determine the mask for the tissue or lesion.
- the output segmentation mask generated from the prediction 669 may be compared with the ground truth mask of the input images.
- a loss function may be determined.
- the loss function may include a softmax cross-entropy loss.
- the loss function may include a dice coefficient (DC) loss function.
- the DenseConv module 662 above in FIG. 6F and FIG. 6G may include one or more convolutional layers.
- One embodiment is shown in FIG. 7 for a DenseConv module 700 including six convolutional layers 710 , 720 , 730 , 740 , 750 , and 760 .
- Each of the six convolutional layers may include a convolutional layer with an exemplary kernel size of 3×3×3 and 16 feature maps.
- Each of the six convolution layers may further include a batch normalization (BN) layer and a ReLU layer.
- An auto-context input 701 may be fed into the convolutional layer 710 to generate feature maps 713 .
- the auto-context input 701 may have a size of 128×128×24 and a depth of four.
- the feature maps 713 may have a size of 128×128×24 and a depth of 16.
- the input of the current convolutional layer may be concatenated with the output of the current convolutional layer to generate the input for the next convolutional layer.
- the auto-context input 701 may be concatenated with the output 713 of the convolutional layer 710 to generate the feature maps 715 .
- the feature maps 715 may be the input for the convolutional layer 720 .
- the feature maps 715 may be fed into the convolutional layer 720 to generate the feature maps 723 .
- the feature maps 723 may be concatenated with the feature maps 715 to generate the feature maps 725 .
- the feature maps 725 may be the input for the convolutional layer 730 .
- the feature maps 725 may be fed into the convolutional layer 730 to generate the feature maps 733 .
- the feature maps 733 may be concatenated with the feature maps 725 to generate the feature maps 735 .
- the feature maps 735 may be the input for the convolutional layer 740 .
- the feature maps 735 may be fed into the convolutional layer 740 to generate the feature maps 743 .
- the feature maps 743 may be concatenated with the feature maps 735 to generate the feature maps 745 .
- the feature maps 745 may be the input for the convolutional layer 750 .
- the feature maps 745 may be fed into the convolutional layer 750 to generate the feature maps 753 .
- the feature maps 753 may be concatenated with the feature maps 745 to generate the feature maps 755 .
- the feature maps 755 may be the input for the convolutional layer 760 .
- the feature maps 755 may be fed into the convolutional layer 760 to generate the feature maps, which serve as the output 791 of the DenseConv module 700 .
- the six-convolutional layer DenseConv implementation above is merely exemplary.
- the DenseConv module 700 may include any number of convolutional layers, and each of the convolutional layers may use any kernel size and/or include any number of features.
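- A minimal sketch of the DenseConv module 700 and the stage-2 head of FIG. 6F built around it, assuming the last layer's output serves directly as the module output 791 and is then reduced to depth one by the convolutional layer 664; the class and variable names are illustrative.

```python
import torch
import torch.nn as nn

# Hedged sketch of DenseConv module 700: six conv-BN-ReLU layers with 3x3x3 kernels
# and 16 feature maps each, where each layer's input is concatenated with its output
# to form the input of the next layer (dense connectivity).
class DenseConvModule(nn.Module):
    def __init__(self, in_channels: int = 4, growth: int = 16, num_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv3d(channels, growth, kernel_size=3, padding=1),
                nn.BatchNorm3d(growth),
                nn.ReLU(inplace=True),
            ))
            channels += growth          # the next layer sees input and output concatenated
        self.out_channels = growth      # the last layer's output (791) is not concatenated

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            out = layer(x)
            x = out if i == len(self.layers) - 1 else torch.cat([x, out], dim=1)
        return x

# Stage-2 head of FIG. 6F: DenseConv -> 1x1x1 convolution 664 -> optional residual
# addition with the coarse-to-fine output 555 -> sigmoid to obtain prediction 669.
dense_662 = DenseConvModule(in_channels=4)       # auto-context input 661 has depth four
conv_664 = nn.Conv3d(dense_662.out_channels, 1, kernel_size=1)

auto_context_661 = torch.rand(1, 4, 24, 128, 128)
maps_555 = torch.rand(1, 1, 24, 128, 128)
prediction_667 = conv_664(dense_662(auto_context_661)) + maps_555
final_prediction_669 = torch.sigmoid(prediction_667)
```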
- the output segmentation mask generated in forward-propagation from an FCN model may be compared with the ground truth mask of the input images.
- a loss function may be determined.
- the loss function may include a softmax cross-entropy loss.
- the loss function may include a dice coefficient (DC) loss function.
- the converged training parameters may form a final predictive model that may be further verified using test images and used to predict segmentation or lesion masks for images that the network has never seen before.
- the FCN model is preferably trained to promote errors on the over-inclusive side to reduce or prevent false negatives in later stages of CAD based on a predicted mask.
- the training process may include three steps as shown in FIG. 8 .
- Step 810 may include training Stage 1 (of the FCN model, e.g., FIGS. 6F and 6G ) for generating prediction masks based on one or more Auxiliary outputs in Stage 1 by comparing the prediction masks with the ground truth masks.
- the one or more Auxiliary outputs in Stage 1 may include one or more of Auxiliary 1-4 (529, 539, 549, and 559).
- Step 820 may include fixing the training parameters of Stage 1 and training Stage 2 (of the FCN model, e.g., FIGS. 6F and 6G) by generating prediction masks based on the output of Stage 2 and comparing the prediction masks with the ground truth masks.
- Stage 2 may include a DenseConv module, and the output of Stage 2 may include Prediction 669 .
- Step 830 may include fine tuning and training of Stage 1 and Stage 2 jointly by using the model parameters obtained in steps 810 and 820 as initial values and further performing forward and back-propagation training processes based on the output of Stage 2 .
- Stage 2 may include a DenseConv module, and the output of Stage 2 may include Prediction 669 .
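- A minimal sketch of steps 820 and 830 of this schedule, assuming the model exposes stage1 and stage2 sub-modules, a data loader yielding (image, ground truth mask) pairs, and a dice_loss function; all of these names are hypothetical.

```python
import torch

# Hedged sketch of the training schedule of FIG. 8 (after Stage 1 has been trained in
# step 810 against the auxiliary outputs).
def train_step_820(model, loader, optimizer, dice_loss):
    # Step 820: fix the Stage 1 parameters and train Stage 2 against its output.
    for p in model.stage1.parameters():
        p.requires_grad = False
    for image, ground_truth in loader:
        optimizer.zero_grad()
        prediction = model(image)                       # output of Stage 2 (Prediction 669)
        dice_loss(prediction, ground_truth).backward()
        optimizer.step()                                # only Stage 2 parameters receive gradients

def train_step_830(model, loader, optimizer, dice_loss):
    # Step 830: unfreeze everything and fine-tune Stage 1 and Stage 2 jointly, starting
    # from the parameters obtained in steps 810 and 820.
    for p in model.parameters():
        p.requires_grad = True
    for image, ground_truth in loader:
        optimizer.zero_grad()
        dice_loss(model(image), ground_truth).backward()
        optimizer.step()
```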
- FIG. 9 shows one embodiment of a system 900 including two different FCN models: a first FCN 920 and a second FCN 930 .
- the first FCN 920 and the second FCN 930 may be different in terms of their architecture, for example, the first FCN 920 may be an FCN model similar to the model in FIG. 3, and the second FCN 930 may be an FCN model similar to the model in FIG. 6F.
- the first FCN 920 and the second FCN 930 may be different in terms of how the multiple channels are processed, for example, the first FCN 920 may be an FCN model processing each channel individually and independently (e.g., FCN(2D/3D, SC), as described above), and the second FCN 930 may be an FCN model processing all multiple channels together in an aggregated manner (e.g., FCN(2D/3D, MC), as described above).
- a first prediction 921 may be generated from the first FCN 920 and a second prediction 931 may be generated from the second FCN 930.
- the first prediction 921 and the second prediction 931 may be fed into a parameterized comparator 950 to determine which prediction of 921 and 931 to select, so as to generate a final prediction 990.
- the parameters for the comparator may be trained.
- the selection by the comparator 950 may be performed at an individual pixel/voxel level, i.e., the comparator 950 may select, for each individual pixel/voxel, which probability for that pixel/voxel, out of the first prediction 921 and the second prediction 931, to use as the final prediction for the corresponding pixel/voxel.
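- The disclosure does not fix a particular form for the trainable comparator; one hedged sketch is a per-voxel soft gate learned by a 1×1×1 convolution over the two stacked predictions (the gating formulation and all names here are assumptions, not the patent's prescribed realization).

```python
import torch
import torch.nn as nn

# Hedged sketch of the parameterized comparator 950: a trainable 1x1x1 convolution
# produces a per-voxel weight that (softly) selects between the first and second
# FCN predictions to form the final prediction 990.
class VoxelComparator(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate = nn.Conv3d(in_channels=2, out_channels=1, kernel_size=1)

    def forward(self, prediction_921, prediction_931):
        stacked = torch.cat([prediction_921, prediction_931], dim=1)
        alpha = torch.sigmoid(self.gate(stacked))           # per-voxel selection weight
        return alpha * prediction_921 + (1.0 - alpha) * prediction_931

comparator_950 = VoxelComparator()
final_990 = comparator_950(torch.rand(1, 1, 24, 128, 128), torch.rand(1, 1, 24, 128, 128))
```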
- the FCN image segmentation and/or lesion detection above may be implemented as a computer platform 1000 shown in FIG. 10 .
- the computer platform 1000 may include one or more training servers 1004 and 1006 , one or more prediction engines 1008 and 1010 , one or more databases 1012 , one or more model repositories 1002 , and user devices 1014 and 1016 associated with users 1022 and 1024 . These components of the computer platform 1000 are inter-connected and in communication with one another via public or private communication networks 1001 .
- the training servers and prediction engines 1004, 1006, 1008, and 1010 may be implemented as a central server or a plurality of servers distributed in the communication networks.
- the training servers 1004 and 1006 may be responsible for training the FCN segmentation model discussed above.
- the prediction engines 1008 and 1010 may be responsible for analyzing an input image using the FCN segmentation model to generate a segmentation mask for the input image. While the various servers are shown in FIG. 10 as implemented as separate servers, they may be alternatively combined in a single server or single group of distributed servers combining the functionality of training and prediction.
- the user devices 1014 and 1016 may be of any form of mobile or fixed electronic devices including but not limited to desktop personal computer, laptop computers, tablets, mobile phones, personal digital assistants, and the like. The user devices 1014 and 1016 may be installed with a user interface for accessing the digital platform.
- the one or more databases 1012 of FIG. 10 may be hosted in a central database server or a plurality of distributed database servers.
- the one or more databases 1012 may be implemented as being hosted virtually in a cloud by a cloud service provider.
- the one or more databases 1012 may organize data in any form, including but not limited to a relational database containing data tables, a graphic database containing nodes and relationships, and the like.
- the one or more databases 1012 may be configured to store, for example, images and their labeled masks collected from various sources. These images and labels may be used as a training data corpus for the training server 1006 for generating DCNN segmentation models.
- the one or more model repositories 1002 may be used to store, for example, the segmentation model with its trained parameters.
- the model repository 1002 may be integrated as part of the predictive engines 1008 and 1010.
- FIG. 11 shows an exemplary computer system 1100 for implementing any of the computing components of FIGS. 1-10 .
- the computer system 1100 may include communication interfaces 1102 , system circuitry 1104 , input/output (I/O) interfaces 1106 , storage 1109 , and display circuitry 1108 that generates machine interfaces 1110 locally or for remote display, e.g., in a web browser running on a local or remote machine.
- the machine interfaces 1110 and the I/O interfaces 1106 may include GUIs, touch sensitive displays, voice or facial recognition inputs, buttons, switches, speakers and other user interface elements.
- I/O interfaces 1106 include microphones, video and still image cameras, headset and microphone input/output jacks, Universal Serial Bus (USB) connectors, memory card slots, and other types of inputs.
- the I/O interfaces 1106 may further include magnetic or optical media interfaces (e.g., a CDROM or DVD drive), serial and parallel bus interfaces, and keyboard and mouse interfaces.
- the communication interfaces 1102 may include wireless transmitters and receivers (“transceivers”) 1112 and any antennas 1114 used by the transmitting and receiving circuitry of the transceivers 1112 .
- the transceivers 1112 and antennas 1114 may support Wi-Fi network communications, for instance, under any version of IEEE 802.11, e.g., 802.11n or 802.11ac.
- the communication interfaces 1102 may also include wireline transceivers 1116 .
- the wireline transceivers 1116 may provide physical layer interfaces for any of a wide range of communication protocols, such as any type of Ethernet, data over cable service interface specification (DOCSIS), digital subscriber line (DSL), Synchronous Optical Network (SONET), or other protocol.
- the storage 1109 may be used to store various initial, intermediate, or final data or models needed for the implementation of the computer platform 1000.
- the storage 1109 may be separate or integrated with the one or more databases 1012 of FIG. 10 .
- the storage 1109 may be centralized or distributed, and may be local or remote to the computer system 1100 .
- the storage 1109 may be hosted remotely by a cloud computing service provider.
- the system circuitry 1104 may include hardware, software, firmware, or other circuitry in any combination.
- the system circuitry 1104 may be implemented, for example, with one or more systems on a chip (SoC), application specific integrated circuits (ASIC), microprocessors, discrete analog and digital circuits, and other circuitry.
- the system circuitry 1104 is part of the implementation of any desired functionality related to the reconfigurable computer platform 1000 .
- the system circuitry 1104 may include one or more instruction processors 1118 and memories 1120 .
- the memories 1120 store, for example, control instructions 1126 and an operating system 1124.
- the instruction processors 1118 execute the control instructions 1126 and the operating system 1124 to carry out any desired functionality related to the reconfigurable computer platform 1000.
- the circuitry may alternatively be implemented as circuitry that includes an instruction processor, such as a Central Processing Unit (CPU), microcontroller, or a microprocessor; an Application Specific Integrated Circuit (ASIC), Programmable Logic Device (PLD), or Field Programmable Gate Array (FPGA); or as circuitry that includes discrete logic or other circuit components, including analog circuit components, digital circuit components or both; or any combination thereof.
- the circuitry may include discrete interconnected hardware components and/or may be combined on a single integrated circuit die, distributed among multiple integrated circuit dies, or implemented in a Multiple Chip Module (MCM) of multiple integrated circuit dies in a common package, as examples.
- the circuitry may further include or access instructions for execution by the circuitry.
- the instructions may be stored in a tangible storage medium that is other than a transitory signal, such as a flash memory, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM); or on a magnetic or optical disc, such as a Compact Disc Read Only Memory (CDROM), Hard Disk Drive (HDD), or other magnetic or optical disk; or in or on another machine-readable medium.
- a product such as a computer program product, may include a storage medium and instructions stored in or on the medium, and the instructions when executed by the circuitry in a device may cause the device to implement any of the processing described above or illustrated in the drawings.
- the implementations may be distributed as circuitry among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems.
- Parameters, databases, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented in many different ways, including as data structures such as linked lists, hash tables, arrays, records, objects, or implicit storage mechanisms.
- Programs may be parts (e.g., subroutines) of a single program, separate programs, distributed across several memories and processors, or implemented in many different ways, such as in a library, such as a shared library (e.g., a Dynamic Link Library (DLL)).
- the DLL may store instructions that perform any of the processing described above or illustrated in the drawings, when executed by the circuitry.
Abstract
This disclosure relates to digital image segmentation, region of interest identification, and object recognition. This disclosure describes a method and a system for image segmentation based on a fully convolutional neural network including an expansion neural network and a contraction neural network. The various convolutional and deconvolution layers of the neural networks are architected to include a coarse-to-fine residual learning module and learning paths, as well as a dense convolution module to extract auto-context features and to facilitate fast, efficient, and accurate training of the neural networks capable of producing prediction masks of regions of interest. While the disclosed method and system are applicable for general image segmentation and object detection/identification, they are particularly suitable for organ, tissue, and lesion segmentation and detection in medical images.
Description
- This application is a continuation application of U.S. patent application Ser. No. 16/104,449 filed on Aug. 17, 2018, which is incorporated by reference in its entirety.
- This disclosure relates generally to computer segmentation and object detection in digital images and particularly to segmentation of multi-dimensional and multi-channel medical images and to detection of lesions based on convolutional neural networks.
- A digital image may contain one or more regions of interest (ROIs). In many applications, only image data contained within the one or more ROIs of a digital image may need to be retained for further processing and for information extraction by computers. Efficient and accurate identification of these ROIs thus constitutes a critical step in image processing applications, including but not limited to applications that handle high-volume and/or real-time digital images. Each ROI of a digital image may contain pixels forming patches with drastic variation in texture and pattern, making accurate and efficient identification of the boundary between these ROIs and the rest of the digital image a challenging task for a computer. In some imaging processing applications, an entire ROI or a subsection of an ROI may need to be further identified and classified. For example, in the field of computer-aided medical image analysis and diagnosis, an ROI in a medical image may correspond to a particular organ of a human body and the organ region of the image may need to be further processed to identify, e.g., lesions within the organ, and to determine the nature of the identified lesions.
- This disclosure is directed to an enhanced convolutional neural network including a contraction neural network and an expansion neural network. These neural networks are connected in tandem and are enhanced using a coarse-to-fine architecture and densely connected convolutional module to extract auto-context features for more accurate and more efficient segmentation and object detection in digital images.
- The present disclosure describes a method for image segmentation. The method includes receiving, by a computer comprising a memory storing instructions and a processor in communication with the memory, a set of training images labeled with a corresponding set of ground truth segmentation masks. The method includes establishing, by the computer, a fully convolutional neural network comprising a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem. The method includes iteratively training, by the computer, the full convolution neural network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by down-sampling, by the computer, a training image of the set of training images through the multi-layer contraction convolutional neural network to generate an intermediate feature map, wherein a resolution of the intermediate feature map is lower than a resolution of the training image; up-sampling, by the computer, the intermediate feature map through the multi-layer expansion convolutional neural network to generate a first feature map and a second feature map, wherein a resolution of the second feature map is larger than the resolution of the first feature map; generating, by the computer based on the first feature map and the second feature map, a predictive segmentation mask for the training image; generating, by the computer based on a loss function, an end loss based on a difference between the predictive segmentation mask and a ground truth segmentation mask corresponding to the training image; back-propagating, by the computer, the end loss through the full convolutional neural network; and minimizing, by the computer, the end loss by adjusting a set of training parameters of the fully convolutional neural network using gradient descent.
- The present disclosure also describes a computer image segmentation system for digital images. The computer image segmentation system for digital images includes a communication interface circuitry; a database; a predictive model repository; and a processing circuitry in communication with the database and the predictive model repository. The processing circuitry configured to: receive a set of training images labeled with a corresponding set of ground truth segmentation masks. The processing circuitry configured to establish a fully convolutional neural network comprising a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem. The processing circuitry configured to iteratively train the full convolution neural network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by configuring the processing circuitry to: down-sample a training image of the set of training images through the multi-layer contraction convolutional neural network to generate an intermediate feature map, wherein a resolution of the intermediate feature map is lower than a resolution of the training image; up-sample the intermediate feature map through the multi-layer expansion convolutional neural network to generate a first feature map and a second feature map, wherein a resolution of the second feature map is larger than the resolution of the first feature map; generate, based on the first feature map and the second feature map, a predictive segmentation mask for the training image; generate, based on a loss function, an end loss based on a difference between the predictive segmentation mask and a ground truth segmentation mask corresponding to the training image; back-propagating, by the computer, the end loss through the full convolutional neural network; and minimizing, by the computer, the end loss by adjusting a set of training parameters of the fully convolutional neural network using gradient descent.
- The present disclosure also describes a non-transitory computer readable storage medium storing instructions. The instructions, when executed by a processor, cause the processor to receive a set of training images labeled with a corresponding set of ground truth segmentation masks. The instructions, when executed by a processor, cause the processor to establish a fully convolutional neural network comprising a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem. The instructions, when executed by a processor, cause the processor to iteratively train the full convolution neural network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by causing the processor to: down-sample a training image of the set of training images through the multi-layer contraction convolutional neural network to generate an intermediate feature map, wherein a resolution of the intermediate feature map is lower than a resolution of the training image; up-sample the intermediate feature map through the multi-layer expansion convolutional neural network to generate a first feature map and a second feature map, wherein a resolution of the second feature map is larger than the resolution of the first feature map; generate, based on the first feature map and the second feature map, a predictive segmentation mask for the training image; generate, based on a loss function, an end loss based on a difference between the predictive segmentation mask and a ground truth segmentation mask corresponding to the training image; back-propagating, by the computer, the end loss through the full convolutional neural network; and minimizing, by the computer, the end loss by adjusting a set of training parameters of the fully convolutional neural network using gradient descent.
- FIG. 1 illustrates a general data/logic flow of various fully convolutional neural networks (FCNs) for implementing image segmentation and object detection.
- FIG. 2 illustrates an exemplary general implementation and data/logic flows of the fully convolutional neural network of FIG. 1.
- FIG. 3 illustrates an exemplary implementation and data/logic flows of the fully convolutional neural network of FIG. 1.
- FIG. 4 illustrates another exemplary implementation and data/logic flows of the fully convolutional neural network of FIG. 1.
- FIG. 5 illustrates an exemplary implementation and data/logic flows of the fully convolutional neural network enhanced by a coarse-to-fine architecture.
- FIG. 6A illustrates an exemplary implementation and data/logic flows of the fully convolutional neural network having a coarse-to-fine architecture with auxiliary segmentation masks.
- FIG. 6B illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6A that combines the auxiliary segmentation masks by concatenation.
- FIG. 6C illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6B with further improvement.
- FIG. 6D illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6A that combines the auxiliary segmentation masks by summation.
- FIG. 6E illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6D with further improvement.
- FIG. 6F illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6B as stage I, further including a dense-convolution (DenseConv) module in a stage II.
- FIG. 6G illustrates an exemplary implementation and data/logic flows of the enhanced fully convolutional neural network of FIG. 6F with further improvement.
- FIG. 7 illustrates an exemplary implementation and data/logic flows of the DenseConv module of FIG. 6F.
- FIG. 8 illustrates a flow diagram of a method for training the exemplary implementation in FIG. 6F or 6G.
- FIG. 9 illustrates an exemplary implementation and data/logic flows of combining various fully convolutional neural networks.
- FIG. 10 shows an exemplary computer platform for segmenting digital images.
- FIG. 11 illustrates a computer system that may be used to implement various computing components and functionalities of the computer platform of FIG. 10.
- A digital image may contain one or more regions of interest (ROIs). An ROI may include a particular type of object. In many applications, only image data within the ROIs contains useful or relevant information. As such, recognition of ROIs in a digital image and identification of boundaries for these ROIs using computer vision often constitute a critical first step before further image processing is performed. A digital image may contain multiple ROIs of a same type or may contain ROIs of different types. For example, a digital image may contain only human faces or may contain both human faces and other objects of interest. Identification of ROIs in a digital image is often alternatively referred to as image segmentation. The term “digital image” may alternatively be referred to as “image”.
- An image may be a two-dimensional (2D) image. A 2D image includes pixels having two-dimensional coordinates, which may be denoted along an x-axis and a y-axis. The two-dimensional coordinates of the pixels may correspond to a spatial 2D surface. The spatial 2D surface may be a planar surface or a curved surface projected from a three-dimensional object.
- An image may have multiple channels. The multiple channels may be different chromatic channels, for example and not limited to, red-green-blue (RGB) color channels. The multiple channels may be different modality channels for a same object, representing images of the same object taken under different imaging conditions. For example, in conventional photography, different modalities may correspond to different combinations of focus, aperture, exposure parameters, and the like. For another example, in medical images based on Magnetic Resonance Imaging (MRI), different modality channels may include but are not limited to T2-weighted imaging (T2W), diffusion weighted imaging (DWI), apparent diffusion coefficient (ADC) and K-trans channels.
- An image may be a three-dimensional (3D) image. A 3D image includes pixels having three-dimensional coordinates, which may be denoted along an x-axis, a y-axis, and a z-axis. The three-dimensional coordinates of the pixels may correspond to a spatial 3D space. For example, MRI images in each modality channel may be three dimensional, including a plurality of slices of 2D images.
- A 3D image may also have multiple channels, effectively forming a four-dimensional (4D) image. The 4D image including multiple channels may be referred to as pseudo 4D image. A 4D image includes pixels having four-dimensional coordinates, which may be denoted along an x-axis, a y-axis, a z-axis, and a channel-number.
- ROIs for an image, once determined, may be represented by a digital mask containing a same number of pixels as the digital image or a down-sized number of pixels from the digital image. A digital mask may alternatively be referred to as a mask or a segmentation mask. Each pixel of the mask may contain a value used to denote whether a particular corresponding pixel of the digital image is among any ROI, and if it is, which type of ROI among multiple types of ROIs it falls within. For example, if there is only a single type of ROI, a binary mask is sufficient to represent all ROIs. In particular, each pixel of the ROI mask may be either zero or one, representing whether the pixel is or is not among the ROIs. For a mask capable of representing multiple types of ROI, each pixel may be at one of a number of values, each corresponding to one type of ROI. A multi-value mask, however, may be decomposed into a combination of the more fundamental binary masks, each for one type of ROI.
- For a 2D image, its mask may correspondingly be 2D, including pixels having two-dimensional coordinates. When an image includes multiple channels, its mask may nevertheless be a single combined mask, wherein the single mask corresponds to all the channels in the image. In some other embodiments, the mask of a multi-channel image may be a multi-channel mask, wherein each of the multiple channels of the mask corresponds to one or more channels of the multi-channel image.
- For a 3D image, its mask may correspondingly be a 3D mask, including pixels having three-dimensional coordinates along an x-axis, a y-axis, and a z-axis. When a 3D image includes multiple channels, its mask may be a three-dimensional mask having either a single channel or multiple channels, similar to the 2D mask described above.
- ROI masks are particularly useful for further processing of the digital image. For example, an ROI mask can be used as a filter to determine a subset of image data that are among particular types of ROIs and that need to be further analyzed and processed. Image data outside of these particular types of ROIs may be removed from further analysis. Reducing the amount of data that needs to be further processed may be advantageous in situations where processing speed is essential and memory space is limited. As such, automatic identification of ROIs in a digital image presents a technological problem to be overcome before further processing can be performed on the data which form the digital image.
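- The filtering role of an ROI mask described above can be sketched as follows; this is an illustrative example only, and the array shapes, the assumed ROI region, and the zero-fill convention are assumptions for illustration.

```python
import numpy as np

# Hypothetical 2D image and a binary ROI mask of the same shape.
image = np.random.rand(128, 128)
roi_mask = np.zeros((128, 128), dtype=bool)
roi_mask[32:96, 32:96] = True  # assumed ROI region

# Option 1: zero out everything outside the ROI before further processing.
masked_image = np.where(roi_mask, image, 0.0)

# Option 2: keep only the ROI pixels, reducing the amount of data to process.
roi_pixels = image[roi_mask]
print(image.size, roi_pixels.size)  # 16384 total pixels vs. 4096 ROI pixels
```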
- ROI identification and ROI mask generation, or image segmentation, may be implemented in various applications, including but not limited to face identification and recognition, object identification and recognition, satellite map processing, and general computer vision and image processing. For example, ROI identification and segmentation may be implemented in medical image processing. Such medical images may include but are not limited to Computed Tomography (CT) images, Magnetic Resonance Imaging (MRI) images, ultrasound images, X-Ray images, and the like. In Computer-Aided Diagnosis (CAD), a single image or a group of images may first be analyzed and segmented into ROIs and non-ROIs. One or more ROI masks, alternatively referred to as segmentation masks, may be generated. An ROI in a medical image may be specified at various levels depending on the application. For example, an ROI may be an entire organ. As such, a corresponding binary ROI mask may be used to mark the location of the organ tissues, with the regions outside of the ROI being regions that are not part of the organ. For another example, an ROI may represent a lesion in an organ, or tissue of one or more particular types in the organ. These different levels of ROIs may be hierarchical. For example, a lesion may be part of an organ.
- The present disclosure may be particularly applied to different types of images obtained by imaging various types of human tissues or organs, to perform ROI identification, ROI mask generation, and image segmentation, including, for example and not limited to, brain segmentation, pancreas segmentation, lung segmentation, or prostate segmentation.
- In one embodiment, MR images of the prostate from one or more patients may be processed using computer aided diagnosis (CAD). Prostate segmentation for marking the boundary of the prostate organ is usually the first step in prostate MR image processing and plays an important role in computer aided diagnosis of prostate diseases. One key to prostate segmentation is to accurately determine the boundary of the prostate tissues, either normal or pathological. Because images of normal prostate tissues may vary in texture, and an abnormal prostate tissue may additionally contain patches of distinct or varying texture and patterns, identification of prostate tissues using computer vision may be particularly challenging. Misidentifying a pathological portion of the prostate tissue as not being part of the prostate and masking it out from subsequent CAD analysis may lead to unacceptable false diagnostic negatives. Accordingly, the need to accurately and reliably identify an ROI in a digital image, such as a medical image of a prostate or other organs, is critical to proper medical diagnosis.
- Segmentation of images may be performed by a computer using a model developed using deep neural network-based machine learning algorithms. For example, a segmentation model may be based on a Fully Convolutional Network (FCN) or Deep Convolutional Neural Networks (DCNN). Model parameters may be trained using labeled images. The image labels in this case may contain ground truth masks, which may be produced by human experts or via other independent processes. The FCN may contain only convolution layers. In an exemplary implementation, digital images of lungs may be processed by such neural networks for lung segmentation and computer aided diagnosis. During the training process, the model learns various features and patterns of lung tissues using the labeled images. These features and patterns include both global and local features and patterns, as represented by various convolution layers of the FCN. The knowledge obtained during the training process is embodied in the set of model parameters representing the trained FCN model. As such, a trained FCN model may process an input digital image with an unknown mask and output a predicted segmentation mask. It is critical to architect the FCN to facilitate efficient and accurate learning of image features of relevance to a particular type of images.
- As
FIG. 1 shows, a system 100 for CAD using a fully convolutional network (FCN) may include two stages: tissue segmentation and lesion detection. A first stage 120 performs tissue (or organ) segmentation from images 110 to generate a tissue segmentation mask 140. A second stage 160 performs lesion detection from the tissue segmentation mask 140 and the images 110 to generate the lesion mask 180. In some implementations, the lesion mask may be non-binary. For example, within the organ region of the image 110 as represented by the segmentation mask 140, there may be multiple lesion regions of different pathological types. The lesion mask 180 thus may correspondingly be non-binary as discussed above, and contains both spatial information (as to where the lesions are) and information as to the type of each lesion. According to the lesion mask 180, the system may generate a diagnosis with a certain probability for a certain disease. A lesion region may be a portion of the organ containing cancer, a tumor, or the like. In some embodiments, the system may only include the stage of tissue segmentation 120. In other embodiments, the system may only include the stage of lesion detection 160. In each of the first stage 120 and the second stage 160, an FCN with a slight variation may be used. The first stage 120 and the second stage 160 may use the same type of FCN, or may use different types of FCN. The FCNs for the first stage 120 and the second stage 160 are described in further detail below. - In one exemplary implementation, the tissue or organ may be a prostate, and CAD of the prostate may be implemented in two steps. The first step may include determining a segmentation boundary of prostate tissue by processing input MR prostate images, producing an ROI mask for the prostate; and the second step includes detecting a diseased portion of the prostate tissue, e.g., a prostate tumor or cancer, by processing the ROI-masked MR images of the prostate tissue. In another embodiment, MR images of the prostate from one or more patients may be used in computer aided diagnosis (CAD). For each patient, volumetric prostate MR images are acquired with multiple channels from multiple modalities including, e.g., T2 W, DWI, ADC and K-trans, where ADC maps may be calculated from DWI, and K-trans images may be obtained using dynamic contrast enhanced (DCE) MR perfusion.
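- The two-stage arrangement described above can be sketched as a simple pipeline. The following is a hypothetical illustration only; the stage models, the thresholding of probabilities, and the concatenation of the mask with the image for the second stage are assumptions and not the disclosed implementation.

```python
import torch

def run_cad_pipeline(image: torch.Tensor,
                     segmentation_fcn: torch.nn.Module,
                     lesion_fcn: torch.nn.Module,
                     threshold: float = 0.5) -> tuple:
    """Hypothetical two-stage CAD pipeline: tissue segmentation, then lesion detection.

    `image` is a batch of single- or multi-channel 2D/3D images. Both stage
    models are assumed to output voxel-wise probabilities in [0, 1].
    """
    # Stage 1: tissue/organ segmentation mask from the raw image.
    tissue_prob = segmentation_fcn(image)
    tissue_mask = (tissue_prob > threshold).float()

    # Stage 2: lesion detection from both the segmentation mask and the image
    # (here combined by channel concatenation, which is an assumption).
    lesion_prob = lesion_fcn(torch.cat([image, tissue_mask], dim=1))
    lesion_mask = (lesion_prob > threshold).float()
    return tissue_mask, lesion_mask
```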
- The FCN may be adapted to the form of the input images (either two-dimensional or three-dimensional images, and either single-channel or multi-channel images). For example, the FCN may be adapted to include features of an appropriate number of dimensions for processing a single-channel or a multi-channel input image. - In one embodiment, for example, the images 110 may be one or more 2D images each with a single channel (SC). The
first stage 120 may use a two-dimensional single-channel FCN (2D, SC) 122 to perform tissue segmentation to obtain the segmentation mask 140 of an input image 110. The segmentation mask 140 may be a single 2D mask (or a mask with a single channel). The second stage 160 may use a two-dimensional single-channel FCN (2D, SC) 162 to perform lesion detection. The FCN (2D, SC) 162 operates on the single-channel segmentation mask 140 and the single-channel image 110 (via arrow 190) to generate the lesion mask 180. Accordingly, the lesion mask 180 may be a single-channel 2D mask. - In a second embodiment, the images 110 may be one or more 2D images each with multiple channels (MC). The multiple channels may be different chromatic channels (e.g., red, blue, and green colors) or other different imaging modality channels (e.g., T2 W, DWI, ADC and K-trans for MR imaging). The
first stage 120 may use a two-dimensional single-channel FCN (2D, SC) 122 to perform tissue segmentation of multiple channels of an input image 110 to obtain a multi-channel segmentation mask 140. In particular, since the two-dimensional single-channel FCN (2D, SC) 122 is used in the tissue segmentation, each channel of the multi-channel image 110 may be regarded as independent from other channels, and thus each channel may be processed individually. Therefore, the segmentation mask 140 may be a 2D mask with multiple channels. The second stage 160 may use a two-dimensional multi-channel FCN (2D, MC) 164 to perform lesion detection. The FCN (2D, MC) 164 may operate simultaneously on the multi-channel segmentation mask 140 and the multi-channel image 110 to generate the lesion mask 180. The two-dimensional multi-channel FCN (2D, MC) 164 may process the multi-channel mask 140 and the multi-channel image 110 in a combined manner to generate a single-channel lesion mask 180. - In a third embodiment, the images 110 may be one or more 2D images with multiple channels. The
first stage 120 may use a two-dimensional multi-channel FCN (2D, MC) 124 to perform tissue segmentation of a multi-channel image 110 in a combined manner to obtain a single-channel segmentation mask 140. The second stage 160 may use a two-dimensional single-channel FCN (2D, SC) 162 to perform lesion detection in the single-channel mask 140. In particular, the two-dimensional single-channel FCN (2D, SC) 162 operates on the segmentation mask 140 and the multi-channel image 110 to generate a multi-channel lesion mask 180. - In a fourth embodiment, the images 110 may be one or more 2D images with multiple channels. The
first stage 120 may use a two-dimensional multi-channel FCN (2D, MC) 124 to perform tissue segmentation on the multi-channel image 110 to obtain a single-channel segmentation mask 140. The second stage 160 may use a two-dimensional multi-channel FCN (2D, MC) 164 to perform lesion detection. The two-dimensional multi-channel FCN (2D, MC) 164 operates on the single-channel segmentation mask 140 and the multi-channel images 110 to generate a single-channel lesion mask 180. - In a fifth embodiment, the images 110 may be one or more 3D images with a single channel. The
first stage 120 may use a three-dimensional single-channel FCN (3D, SC) 126 to perform tissue segmentation of a single-channel image 110 to obtain a three-dimensional single-channel segmentation mask 140. The second stage 160 may use a three-dimensional single-channel FCN (3D, SC) 166 to perform lesion detection. The FCN (3D, SC) 166 operates on the single-channel segmentation mask 140 and the single-channel image 110 to generate a three-dimensional single-channel lesion mask 180. - In a sixth embodiment, the images 110 may be one or more 3D images with multiple channels. The multiple channels may be different chromatic channels (e.g., red, blue, and green colors) or other different modality channels (e.g., T2 W, DWI, ADC and K-trans for MR imaging). The
first stage 120 may use a three-dimensional single-channel FCN (3D, SC) 126 to perform tissue segmentation to obtain a three-dimensional multi-channel segmentation mask 140. In particular, each channel may be regarded as independent from other channels, and thus each channel is processed individually by the FCN (3D, SC) 126. The second stage 160 may use a three-dimensional single-channel FCN (3D, SC) 166 to perform lesion detection. The FCN (3D, SC) 166 operates on the multi-channel segmentation mask 140 and the multi-channel images 110 to generate the lesion mask 180. Since the single-channel FCN (3D, SC) 166 is used in the lesion detection, each channel of the multiple channels may be processed independently. Therefore, the lesion mask 180 may be a 3D multi-channel mask. - In a seventh embodiment, the images 110 may be one or more 3D images with multiple channels. The
first stage 120 may use a three-dimensional single-channel FCN (3D, SC) 126 to perform tissue segmentation to obtain the segmentation mask 140. Since the FCN (3D, SC) 126 is used in the tissue segmentation, each channel may be regarded as independent from other channels, and thus each channel may be processed individually. Therefore, the segmentation mask 140 may be a three-dimensional multi-channel mask. The second stage 160 may use a three-dimensional multi-channel FCN (3D, MC) 168 to perform lesion detection. The FCN (3D, MC) 168 operates on the multi-channel segmentation mask 140 and the multi-channel image 110 to generate the lesion mask 180. Since the multi-channel FCN (3D, MC) 168 is used in the lesion detection, multiple channels may be processed in a combined/aggregated manner. Therefore, the lesion mask 180 may be a three-dimensional single-channel mask. - In an eighth embodiment, the images 110 may be one or more 3D images with multiple channels. The
first stage 120 may use a three-dimensional multi-channel FCN (3D, MC) 128 to perform tissue segmentation of the 3D multi-channel image 110 in an aggregated manner to obtain a three-dimensional single-channel segmentation mask 140. The second stage 160 may use a three-dimensional single-channel FCN (3D, SC) 166 to perform lesion detection. The FCN (3D, SC) 166 operates on the single-channel segmentation mask 140 and the multi-channel images 110 to generate a three-dimensional multi-channel lesion mask 180. - In a ninth embodiment, the images 110 may be one or more 3D images with multiple channels. The
first stage 120 may use a three-dimensional multi-channel FCN (3D, MC) 128 to perform tissue segmentation of the multi-channel image 110 in an aggregated manner to obtain a three-dimensional single-channel segmentation mask 140. The second stage 160 may use a three-dimensional multi-channel FCN (3D, MC) 168 to perform lesion detection. The FCN (3D, MC) 168 operates on the single-channel segmentation mask 140 and the multi-channel image 110 to generate a three-dimensional single-channel lesion mask 180 in an aggregated manner. - Optionally, when the total z-axis depth of a 3D image 110 is relatively shallow, a 2D mask may be sufficient to serve as the mask for the 3D image across its different values along the z-axis. In one exemplary implementation, for a 3D image 110 with a shallow total z-axis range, a
first stage 120 may use a modified FCN (3D, SC) to perform tissue segmentation to obtain the segmentation mask 140, where the segmentation mask 140 may be a 2D mask. - The various two- or three-dimensional and single- or multi-channel FCN models discussed above are further elaborated below.
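- The difference between applying a single-channel FCN to each channel independently and applying a multi-channel FCN to all channels in a combined manner, as described in the embodiments above, can be sketched as follows. This is a hypothetical illustration; the model interfaces and function names are assumptions.

```python
import torch

def segment_per_channel(single_channel_fcn, multi_channel_image: torch.Tensor) -> torch.Tensor:
    """Apply a single-channel FCN to each channel independently, producing a
    multi-channel segmentation mask (one mask channel per image channel)."""
    masks = [single_channel_fcn(multi_channel_image[:, c:c + 1])
             for c in range(multi_channel_image.shape[1])]
    return torch.cat(masks, dim=1)

def segment_combined(multi_channel_fcn, multi_channel_image: torch.Tensor) -> torch.Tensor:
    """Apply a multi-channel FCN to all channels in a combined/aggregated manner,
    producing a single-channel segmentation mask."""
    return multi_channel_fcn(multi_channel_image)
```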
- Segmentation of images and/or target object (e.g., lesion) detection may be performed by a computer using a model developed using deep neural network-based machine learning algorithms. For example, a model may be based on a Fully Convolutional Network (FCN), Deep Convolutional Neural Networks (DCNN), U-Net, or V-Net. Hereinafter, the term “FCN” is used to generally refer to any CNN-based model.
- Model parameters may be trained using labeled images. The image labels in this case may contain ground truth masks, which may be produced by human experts or via other independent processes. The FCN may contain multiple convolution layers. An exemplary embodiment involves processing digital images of lung tissues for computer aided diagnosis. During the training process, the model learns various features and patterns of, e.g., lung tissues or prostate tissues, using the labeled images. These features and patterns include both global and local features and patterns, as represented by various convolution layers of the FCN. The knowledge obtained during the training process is embodied in the set of model parameters representing the trained FCN model. As such, a trained FCN model may process an input digital image with an unknown mask and output a predicted segmentation mask.
- The present disclosure describes an FCN with different variations. The FCN is capable of performing tissue segmentation or lesion detection on 2D or 3D images with single or multiple channels. 3D images are used as an example in the embodiments below. 2D images can be processed similarly, as 2D images may be regarded as 3D images with one z-slice, i.e., a single z-axis value.
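- The observation that a 2D image may be regarded as a 3D image with a single z-slice can be illustrated with a simple reshaping step; this is a sketch only, and the tensor layout (batch, channel, z, y, x) is an assumption.

```python
import torch

# A single-channel 2D image of size 128x128: (batch, channel, y, x).
image_2d = torch.randn(1, 1, 128, 128)

# Regard it as a 3D image with one z-slice: (batch, channel, z, y, x).
image_3d = image_2d.unsqueeze(2)
print(image_3d.shape)  # torch.Size([1, 1, 1, 128, 128])
```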
- An exemplary FCN model for predicting a segmentation/lesion mask for an input digital image is shown as
FCN model 200 in FIG. 2 . The FCN model 200 may be configured to process input 2D/3D images 210. As an example for the illustration below, the input image may be 3D and may be of an exemplary size of 128×128×24, i.e., 3D images having a size of 128 along the x-axis, a size of 128 along the y-axis, and a size of 24 along the z-axis. - A down-
sampling path 220 may comprise a contraction neural network which gradually reduces the resolution of the 2D/3D images to generate feature maps with smaller resolution and larger depth. For example, the output 251 from the down-sampling path 220 may be feature maps of 16×16×3 with a depth of 128, i.e., the feature maps having a size of 16 along the x-axis, a size of 16 along the y-axis, a size of 3 along the z-axis, and a depth of 128 (exemplary only, and corresponding to the number of features). - The down-
sampling path 220 is contracting because it processes an input image into feature maps whose resolutions are progressively reduced through one or more layers of the down-sampling path 220. As such, the term “contraction path” may alternatively be referred to as the “down-sampling path”. - The
FCN model 200 may include an up-sampling path 260, which may be an expansion path to generate high-resolution feature maps for voxel-level prediction. The output 251 from the down-sampling path 220 may serve as an input to the up-sampling path 260. An output 271 from the up-sampling path 260 may be feature maps of 128×128×24 with a depth of 16, i.e., the feature maps having a size of 128 along the x-axis, a size of 128 along the y-axis, a size of 24 along the z-axis, and a depth of 16. - The up-
sampling path 260 processes these feature maps in one or more layers and in an opposite direction to that of the down-sampling path 220 and eventually generates a segmentation mask 290 with a resolution similar or equal to that of the input image. As such, the term “expansion path” may alternatively be referred to as the “up-sampling path”. - The
FCN model 200 may include a convolution step 280, which is performed on the output of the up-sampling stage with the highest resolution to generate map 281. The convolution operation kernel in step 280 may be 1×1×1 for 3D images or 1×1 for 2D images. - The
FCN model 200 may include a rectifier, e.g., sigmoid step 280, which takes map 281 as input to generate a voxel-wise binary classification probability prediction mask 290. - Depending on specific applications, the down-
sampling path 220 and up-sampling path 260 may include any number of convolution layers, pooling layers, or de-convolutional layers. For example, for a training set having a relatively small size, there may be 6-50 convolution layers, 2-6 pooling layers and 2-6 de-convolutional layers. Specifically as an example, there may be 15 convolution layers, 3 pooling layers and 3 de-convolutional layers. The size and number of features in each convolutional or de-convolutional layer may be predetermined and are not limited in this disclosure. - The
FCN model 200 may optionally include one ormore steps 252 connecting the feature maps within the down-sampling path 220 with the feature maps within the up-sampling path 260. Duringsteps 252, the output of de-convolution layers may be fed to the corresponding feature maps generated in the down-sampling path 220 with matching resolution (e.g., by concatenation, as shown below in connection withFIG. 4 ).Steps 252 may provide complementary high-resolution information into the up-sampling path 260 to enhance the final prediction mask, since the de-convolution layers only take coarse features from low-resolution layer as input. Feature maps at different convolution layer and with different level of resolution from the contraction CNN may be input into the expansion CNN as shown by thearrow 252. - The model parameters for the FCN include features or kernels used in various convolutional layers for generating the feature maps, their weight and bias, and other parameters. By training the FCN model using images labeled with ground truth segmentation masks, a set of features and other parameters may be learned such that patterns and textures in an input image with unknown label may be identified. In many circumstances, such as medical image segmentation, this learning process may be challenging due to lack of a large number of samples of certain important texture or patterns in training images relative to other texture or patterns. For example, in a lung image, disease image patches are usually much fewer than other normal image patches and yet it is extremely critical that these disease image patches are correctly identified by the FCN model and segmented as part of the lung. The large number of parameters in a typical multilayer FCN tend to over-fit the network even after data augmentation. As such, the model parameters and the training process are preferably designed to reduce overfitting and promote identification of features that are critical but scarce in the labeled training images.
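- A compact sketch of the kind of contraction/expansion architecture with a skip connection described above is given below. This is not the exact FCN 200; the layer counts, feature depths, and kernel sizes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3x3 convolutions, each followed by batch normalization and ReLU."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
    )

class TinyFCN(nn.Module):
    """Minimal contraction/expansion FCN with one skip connection."""

    def __init__(self, in_channels: int = 1, base: int = 16):
        super().__init__()
        self.down1 = conv_block(in_channels, base)            # full resolution
        self.pool = nn.MaxPool3d(kernel_size=2)                # down-sample by 2
        self.down2 = conv_block(base, base * 2)                # half resolution
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.up1 = conv_block(base * 2, base)                  # after concatenation
        self.head = nn.Conv3d(base, 1, kernel_size=1)          # 1x1x1 prediction head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.down1(x)                  # contraction features (skip source)
        f2 = self.down2(self.pool(f1))      # coarser, deeper features
        u1 = self.up(f2)                    # expand back to full resolution
        u1 = torch.cat([f1, u1], dim=1)     # skip connection by concatenation
        u1 = self.up1(u1)
        return torch.sigmoid(self.head(u1)) # voxel-wise probabilities

# Example: a 3D volume of size 24 x 128 x 128 (z, y, x).
model = TinyFCN(in_channels=1)
probs = model(torch.randn(1, 1, 24, 128, 128))
print(probs.shape)  # torch.Size([1, 1, 24, 128, 128])
```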
- Once the
FCN 200 described above is trained, an input image may then be processed through the down-sampling/contraction path 220 and up-sampling/expansion path 260 to generate a predicted segmentation mask. The predicted segmentation mask may be used as a filter for subsequent processing of the input image. - The training process for the
FCN 200 may involve forward-propagating each of a set of training images through the down-sampling/contraction path 220 and up-sampling/expansion path 260. The set of training images are each associated with a label, e.g., a ground truth mask 291. The training parameters, such as all the convolutional features or kernels, various weights, and biases, may be, e.g., randomly initialized. The output segmentation mask resulting from the forward-propagation may be compared with the ground truth mask 291 of the input image. A loss function 295 may be determined. In one implementation, the loss function 295 may include a softmax cross-entropy loss. In another implementation, the loss function 295 may include a dice coefficient (DC) loss function. Other types of loss functions are also contemplated. Then a back-propagation through the expansion path 260 and then the contraction path 220 may be performed based on, e.g., stochastic gradient descent, aimed at minimizing the loss function 295. By iterating the forward-propagation and back-propagation for the same input images, and for the entire training image set, the training parameters may converge to provide acceptable errors between the predicted masks and ground truth masks for all or most of the input images. The converged training parameters, including but not limited to the convolutional features/kernels and various weights and biases, may form a final predictive model that may be further verified using test images and used to predict segmentation masks for images that the network has never seen before. The model is preferably trained to promote errors on the over-inclusive side to reduce or prevent false negatives in later stages of CAD based on a predicted mask.
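- The forward-propagation, loss computation, and back-propagation cycle described above can be sketched as follows. This is an illustrative training loop only; the loss (binary cross-entropy rather than the losses named above), optimizer settings, and data-loader interface are assumptions.

```python
import torch
import torch.nn as nn

def train_fcn(model: nn.Module, loader, epochs: int = 10, lr: float = 0.01) -> nn.Module:
    """Illustrative training loop for an encoder-decoder FCN.

    `loader` is assumed to yield (image, ground_truth_mask) pairs, and the model
    is assumed to output voxel-wise probabilities in [0, 1].
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        for image, ground_truth in loader:
            optimizer.zero_grad()
            predicted_mask = model(image)              # forward-propagation
            loss = loss_fn(predicted_mask, ground_truth)
            loss.backward()                            # back-propagation
            optimizer.step()                           # gradient descent update
    return model
```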
- FIG. 3 describes one specific example of an FCN. The working of the exemplary implementation of FIG. 3 is described below. A more detailed description is included in U.S. application Ser. No. 15/943,392, filed on Apr. 2, 2018 by the same Applicant as the present application, which is incorporated herein by reference in its entirety. - As shown in
step 310 of FIG. 3 , 2D/3D images may be fed into the FCN. The 2D/3D images may be MR image patches with a size of 128×128×24. - The 2D/
3D images 311 may be fed intostep 312, which includes one or more convolutional layers, for example and not limited to, two convolutional layers. In each convolutional layer, a kernel size may be adopted and number of filters may increase or be the same. For example, a kernel size of 3×3×3 voxels may be used in the convolution sub-step instep 312. Each convolutional sub-step is followed by batch normalization (BN) and rectified-linear unit (ReLU) sub-step. - Number of features at each convolution layer may be predetermined. For example, in the particular implementation of
FIG. 3 , the number of features forstep 312 may be 16. The output of convolution between the input and features may be processed by a ReLU to generate stacks of feature maps. The ReLUs may be of any mathematical form. For example, the ReLUs may include but are not limited to noisy ReLUs, leaky ReLUs, and exponential linear units. - After each convolution of an input with a feature in each layer, the number of pixels in a resulting feature map may be reduced. For example, a 100-pixel by 100-pixel input, after convolution with a 5-pixel by 5-pixel feature sliding through the input with a stride of 1, the resulting feature map may be of 96 pixel by 96 pixel. Further, the input image or a feature map may be, for example, zero padded around the edges to 104-pixel by 104-pixel such that the output feature map is the 100-pixel by 100-pixel resolution. The example below may use zero padding method, so that the resolution does not change before and after convolution function.
- The feature maps from
step 312 may be max-pooled spatially to generate the input to a next layer below. The max-pooling may be performed using any suitable basis, e.g., 2×2, resulting in down-sampling of an input by a factor of 2 in each of the two spatial dimensions. In some implementations, spatial pooling other than max-pooling may be used. Further, different layers of 314, 324, and 334 may be pooled using a same basis or different basis, and using a same or different pooling method. - For example, after max-pooling by a factor of 2, the feature maps 321 have a size of 64×64×12. Since the number of features used in
step 312 is 16. The depth of the feature maps 321 may be 16. - The
feature map 321 may be fed into the next convolution layer 322 with another kernel and number of features. For example, the number of features in step 322 may be 32. - In the example shown in
FIG. 3 , the output from step 324 may be feature maps 331 having a size of 32×32×6. Since the number of features used in step 322 is 32, the depth of the feature maps 331 may be 32. - The
feature map 331 may be fed into the next convolution layer 332 with another kernel and number of features. For example, the number of features in step 332 may be 64. - In the example shown in
FIG. 3 , the output from step 334 may be feature maps 341 having a size of 16×16×3. Since the number of features used in step 332 is 64, the depth of the feature maps 341 may be 64. - The
feature map 341 may be fed into the next convolution layer 342 with another kernel and number of features. For example, the number of features in step 342 may be 128. - In the example shown in
FIG. 3 , the output from step 342 may be feature maps 351 having a size of 16×16×3. There is no size reduction between feature maps 341 and feature maps 351 because there is no max-pooling step in between. Since the number of features used in step 342 is 128, the depth of the feature maps 351 may be 128. - The feature maps 351 of the
final layer 342 of the down-sampling path may be input into an up-sampling path and processed upward through the expansion path. The expansion path, for example, may include one or more de-convolution layers, corresponding to convolution layers on the contraction path, respectively. The up-sampling path, for example, may involve increasing the number of pixels for feature maps in each spatial dimension by a factor of, e.g., 2, but reducing the number of feature maps, i.e., reducing the depth of the feature maps. This reduction may be by a factor of, e.g., 2, or may not be by an integer factor; for example, the previous layer may have a depth of 128, and the reduced depth may be 64. - The feature maps 351 may be fed into a
de-convolution operation 364 with 2×2×2 trainable kernels to increase/expand the size of the input feature maps by a factor of 2. The output of the de-convolution operation 364 may be feature maps 361 having a size of 32×32×6. Since the number of features used in the de-convolution operation 364 is 64, the depth of the feature maps 361 is 64. - The feature maps 361 may be fed into a
step 362. The step 362 may include one or more convolution layers. In the example shown in FIG. 3 , the step 362 includes two convolution layers. Each of the convolution layers includes a convolution function sub-step followed by a batch normalization (BN) and rectified-linear unit (ReLU) sub-step. - Optionally, as
connections 352 a, 352 b, and 352 c in FIG. 3 show, at each expansion layer of 362, 372, and 382, the feature maps from the down-sampling path may be concatenated with the feature maps in the up-sampling path, respectively. Taking 352 c as an example, the feature maps in step 332 of the down-sampling path have a size of 32×32×6 and a depth of 64. The feature maps 361 of the up-sampling path have a size of 32×32×6 and a depth of 64. These two feature maps may be concatenated together to form new feature maps having a size of 32×32×6 and a depth of 128. The new feature maps may be fed as the input into the step 362. The connection 352 c may provide complementary high-resolution information, since the de-convolution layers only take coarse features from low-resolution layers as input. - The output feature maps of
step 362 may be fed into de-convolution layer 374. The de-convolution layer may have a trainable 2×2×2 kernel to increase/expand the size of the input feature maps by a factor of 2. The output of the de-convolution operation 374 may be feature maps 371 having a size of 64×64×12. Since the number of features used in the de-convolution operation 374 is 32, the depth of the feature maps 371 is 32. - Optionally, as the
connection 352 b in FIG. 3 shows, the feature maps from step 322 in the down-sampling path may be concatenated with the feature maps 371. The feature maps in step 322 of the down-sampling path have a size of 64×64×12 and a depth of 32. The feature maps 371 of the up-sampling path have a size of 64×64×12 and a depth of 32. These two feature maps may be concatenated together to form new feature maps having a size of 64×64×12 and a depth of 64. The new feature maps may be fed as the input into the step 372. - The output feature maps of
step 372 may be fed into de-convolution layer 384. The de-convolution layer may have a trainable 2×2×2 kernel to increase/expand the size of the input feature maps by a factor of 2. The output of the de-convolution operation 384 may be feature maps 381 having a size of 128×128×24. Since the number of features used in the de-convolution operation 384 is 16, the depth of the feature maps 381 is 16. - Optionally, as the
connection 352 a in FIG. 3 shows, the feature maps from step 312 in the down-sampling path may be concatenated with the feature maps 381. The feature maps in step 312 of the down-sampling path have a size of 128×128×24 and a depth of 16. The feature maps 381 of the up-sampling path have a size of 128×128×24 and a depth of 16. These two feature maps may be concatenated together to form new feature maps having a size of 128×128×24 and a depth of 32. The new feature maps may be fed as the input into the step 382. - The output feature maps from
step 382 may be fed into a convolution step 390, which is performed on the feature maps with the highest resolution to generate feature maps 391. The feature maps 391 may have a size of 128×128×24 and a depth of one. The convolution operation kernel in step 390 may be 1×1×1 for 3D images or 1×1 for 2D images. - The feature maps 391 may be fed into a
sigmoid step 392, which takes feature maps 391 as input to generate voxel-wise binary classification probabilities, which may be further determined to be the predicted segmentation mask for the tissue or lesion. -
FIG. 4 describes another embodiment of an FCN for segmentation of 2D/3D images with either single or multiple channels. In the example described below, 3D images with multiple channels are taken as an example. However, the FCN model in this disclosure is not limited to 3D images with multiple channels (FCN (3D, MC)). For example, the FCN model in this embodiment may be applied to 2D images with multiple channels (FCN (2D, MC)), 3D images with a single channel (FCN (3D, SC)), or 2D images with a single channel (FCN (2D, SC)), as shown in FIG. 1 . - In
FIG. 4 , the input 2D/3D images with multiple channels 401 may include 3D images having a size of 128×128×24 with three channels. The three channels may be multi-parametric modalities in MR imaging including T2W, T1 and DWI with the highest b-value. In some other situations, the three channels may include red, green, and blue chromatic channels. In one embodiment, each of the three channels may be processed independently. In another embodiment, the three channels may be processed collectively, and as such, the first convolutional layer operating on the input images 401 has 3×3×3×3 convolutional kernels. - The FCN model in
FIG. 4 may include a down-sampling/contracting path and a corresponding up-sampling/expansion path. The down-sampling/contracting path may include one or more convolutional layers 412. - The
convolutional layer 412 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 412 may be 32, so that the feature maps of the convolution layer 412 may have a size of 128×128×24 and a depth of 32. - The FCN model in
FIG. 4 may include convolution layers 414 with strides greater than 1 for gradually reducing the resolution of feature maps and increasing the receptive field to incorporate more spatial information. For example, the convolution layer 414 may have a stride of 2 along the x-axis, y-axis, and z-axis, so that the resolution along the x-axis, y-axis, and z-axis is reduced by a factor of 2. The number of features in the convolution layer 414 may be 64, so that the output feature maps 421 of the convolution layer 414 may have a size of 64×64×12 and a depth of 64. - The feature maps 421 may be fed into one or more convolution layers 422. Each of the
convolutional layers 422 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 422 may be 64, so that the output feature maps 423 of the convolution layer 422 may have a size of 64×64×12 and a depth of 64. - At each resolution level, input feature maps may be directly added to output feature maps, which enables the stacked convolutional layers to learn a residual function. As described by operator ⊕ in
FIG. 4 , the feature maps 421 and 423 may be added together. The output feature maps 425 of operator ⊕ may be a summation of feature maps 421 and 423.
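- The residual addition denoted by the operator ⊕, together with the stride-2 convolutions used for down-sampling, can be sketched as follows. This is an illustrative approximation only; the channel counts and strides follow the example numbers in the text but are not the exact disclosed layers.

```python
import torch
import torch.nn as nn

class ResidualStage(nn.Module):
    """One resolution level: a stride-2 convolution halves the resolution, and
    stacked 3x3x3 conv + BN + ReLU layers produce an output that is added
    element-by-element to the stage input (residual learning)."""

    def __init__(self, in_ch: int, out_ch: int, stride=(2, 2, 2)):
        super().__init__()
        # Down-sampling convolution with stride > 1 (replaces pooling).
        self.down = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )
        self.convs = nn.Sequential(
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = self.down(x)        # analogous to feature maps 421
        out = self.convs(identity)     # analogous to feature maps 423
        return identity + out          # element-by-element summation (operator ⊕)

# Example: 32 features at 24 x 128 x 128 -> 64 features at 12 x 64 x 64.
stage = ResidualStage(32, 64)
summed = stage(torch.randn(1, 32, 24, 128, 128))
print(summed.shape)  # torch.Size([1, 64, 12, 64, 64])
```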
- The feature maps 425 may be fed into a convolution layer 424 with a stride of 2 along the x-axis, y-axis, and z-axis, so that the resolution along the x-axis, y-axis, and z-axis is reduced by a factor of 2. The number of features in the convolution layer 424 may be 128, so that the output feature maps 431 of the convolution layer 424 may have a size of 32×32×6 and a depth of 128. - The
feature map 431 may be fed into one or more convolution layers 432. Each of the convolutional layers 432 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 432 may be 128, so that the output feature maps 433 of the convolution layers 432 may have a size of 32×32×6 and a depth of 128. - At each resolution level, input feature maps may be directly added to output feature maps, which enables the stacked convolutional layers to learn a residual function. As described by the operator ⊕ in
FIG. 4 , the feature maps 431 and 433 may be added together. The output feature maps 435 of operator ⊕ may be an element-by-element summation of feature maps 431 and 433. - The feature maps 435 may be fed into a
convolution layer 434 with a stride of 2 along the x-axis, y-axis, and z-axis, so that the resolution along the x-axis, y-axis, and z-axis is reduced by a factor of 2. The number of features in the convolution layer 434 may be 256, so that the output feature maps 441 of the convolution layer 434 may have a size of 16×16×3 and a depth of 256. - The
feature map 441 may be fed into one or more convolution layers 442. Each of the convolutional layers 442 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 442 may be 256, so that the output feature maps 443 of the convolution layers 442 may have a size of 16×16×3 and a depth of 256. - Again, at each resolution level, input feature maps may be directly added to output feature maps, which enables the stacked convolutional layers to learn a residual function. As described by operator ⊕ in
FIG. 4 , the feature maps 441 and 443 may be added together. The output feature maps 445 of the operator ⊕ may be an element-by-element summation of feature maps 441 and 443. - Optionally, the feature maps 445 may be fed into a
convolution layer 444 with a stride of 2 along the x-axis and y-axis, and with a stride of 1 along the z-axis, so that the resolution along the x-axis and y-axis is reduced by a factor of 2 and the resolution along the z-axis remains the same. The number of features in the convolution layer 444 may be 512, so that the output feature maps 451 of the convolution layer 444 may have a size of 8×8×3 and a depth of 512. - The
feature map 451 may be fed into one or more convolution layers 452. Each of the convolutional layers 452 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 452 may be 512, so that the output feature maps 453 of the convolution layers 452 may have a size of 8×8×3 and a depth of 512. - At each resolution level, input feature maps may be directly added to output feature maps, which enables the stacked convolutional layers to learn a residual function. As described by operator ⊕ in
FIG. 4 , the feature maps 451 and 453 may be added together. The output feature maps 455 of operator ⊕ may be an element-by-element summation of feature maps 451 and 453. - The FCN model in
FIG. 4 may include a corresponding up-sampling/expansion path to increase the resolution of feature maps generated from the down-sampling/contracting path. Optionally, in the up-sampling/expansion path, the feature maps generated in the contracting path may be concatenated with the output of de-convolutional layers to incorporate high-resolution information. - The up-sampling/expansion path may include a
de-convolution layer 464. The de-convolution layer 464 may de-convolute the input feature maps 455 to increase their resolution by a factor of 2 along the x-axis and y-axis. The output feature maps 461 of the de-convolution layer 464 may have a size of 16×16×3. The number of features in the de-convolution layer 464 may be 256, so that the output feature maps 461 of the de-convolution layer 464 may have a depth of 256. - As denoted by operator “©” in
FIG. 4 , the feature maps 443 generated from the convolution layer 442 may be concatenated with the feature maps 461 generated from the de-convolution layer 464. The feature maps 443 may have a size of 16×16×3 and a depth of 256. The feature maps 461 may have a size of 16×16×3 and a depth of 256. The concatenated feature maps 463 may correspondingly have a size of 16×16×3 and a depth of 512. - The
feature map 463 may be fed into one or more convolution layers 462. Each of the convolutional layers 462 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 462 may be 256, so that the output feature maps 465 of the convolution layers 462 may have a size of 16×16×3 and a depth of 256. - As described by the operator ⊕ 466 in
FIG. 4 , the feature maps 461 and 465 may be added together. The output feature maps 467 of the operator ⊕ 466 may be an element-by-element summation of feature maps 461 and 465. - The feature maps 467 may be fed into a
de-convolution layer 474. The de-convolution layer 474 may de-convolute the input feature maps 467 to increase their resolution by a factor of 2 along the x-axis, y-axis and z-axis. The output feature maps 471 of the de-convolution layer 474 may have a size of 32×32×6. The number of features in the de-convolution layer 474 may be 128, so that the output feature maps 471 of the de-convolution layer 474 may have a depth of 128. - As denoted by operator “©” in
FIG. 4 , the feature maps 433 generated from the convolution layer 432 may be concatenated with the feature maps 471 generated from the de-convolution layer 474. The feature maps 433 may have a size of 32×32×6 and a depth of 128. The feature maps 471 may have a size of 32×32×6 and a depth of 128. The concatenated feature maps 473 may correspondingly have a size of 32×32×6 and a depth of 256. - The
feature map 473 may be fed into one or more convolution layers 472. Each of the convolutional layers 472 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 472 may be 128, so that the output feature maps 475 of the convolution layers 472 may have a size of 32×32×6 and a depth of 128. - As described by operator ⊕ 476 in
FIG. 4 , the feature maps 471 and 475 may be added together. The output feature maps 477 of the operator ⊕ 476 may be an element-by-element summation of feature maps 471 and 475. - The feature maps 477 may be fed into a
de-convolution layer 484. The de-convolution layer 484 may de-convolute the input feature maps 477 to increase their resolution by a factor of 2 along the x-axis, y-axis and z-axis. The output feature maps 481 of the de-convolution layer 484 may have a size of 64×64×12. The number of features in the de-convolution layer 484 may be 64, so that the output feature maps 481 of the de-convolution layer 484 may have a depth of 64. - As denoted by operator “©” in
FIG. 4 , the feature maps 423 generated from the convolution layer 422 may be concatenated with the feature maps 481 generated from the de-convolution layer 484. The feature maps 423 may have a size of 64×64×12 and a depth of 64. The feature maps 481 may have a size of 64×64×12 and a depth of 64. The concatenated feature maps 483 may correspondingly have a size of 64×64×12 and a depth of 128. - The
feature map 483 may be fed into one or more convolution layers 482. Each of the convolutional layers 482 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 482 may be 64, so that the output feature maps 485 of the convolution layers 482 may have a size of 64×64×12 and a depth of 64. - As described by operator ⊕ 486 in
FIG. 4 , the feature maps 481 and 485 may be added together. The output feature maps 487 of ⊕ 486 may be an element-by-element summation of feature maps 481 and 485. - The feature maps 487 may be fed into a
de-convolution layer 494. The de-convolution layer 494 may de-convolute the input feature maps 487 to increase their resolution by a factor of 2 along the x-axis, y-axis and z-axis. The output feature maps 491 of the de-convolution layer 494 may have a size of 128×128×24. The number of features in the de-convolution layer 494 may be 32, so that the output feature maps 491 of the de-convolution layer 494 may have a depth of 32. - As denoted by operator “©” in
FIG. 4 , the feature maps 413 generated from the convolution layer 412 may be concatenated with the feature maps 491 generated from the de-convolution layer 494. The feature maps 413 may have a size of 128×128×24 and a depth of 32. The feature maps 491 may have a size of 128×128×24 and a depth of 32. The concatenated feature maps 493 may correspondingly have a size of 128×128×24 and a depth of 64. - The
feature map 493 may be fed into one or more convolution layers 492. Each of the convolutional layers 492 may comprise 3×3×3 convolution kernels, batch normalization (BN), and a ReLU activation function to extract high-level features. The number of features in the convolution layer 492 may be 32, so that the output feature maps 495 of the convolution layers 492 may have a size of 128×128×24 and a depth of 32. - As described by operator ⊕ 496 in
FIG. 4 , the feature maps 491 and 495 may be added together. The output feature maps 497 of ⊕ 496 may be an element-by-element summation of feature maps 491 and 495. - The feature maps 497 may be fed into a
convolutional layer 498 to generate voxel-wise binary classification probabilities. The convolutional layer 498 may include a 1×1×1 kernel and a sigmoid activation function. - During the training phase, Dice loss may be adopted as the objective function, and the Dice loss may be expressed as
$$L_{Dice} = 1 - \frac{2\sum_{i} p_i g_i}{\sum_{i} p_i + \sum_{i} g_i}$$
- where gi and pi are the ground truth label and the predicted label at voxel i, respectively. Post-processing steps may then be applied to refine the initial segmentation generated by the FCN model. In some embodiments, specifically, a 3D Gaussian filter may be used to smooth the predicted probability maps and a connected component analysis may be used to remove small isolated components.
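- A sketch of a Dice-based loss and of the post-processing described above is given below. The exact expression used in the disclosure is not reproduced here; the smoothing term, the Gaussian sigma, the probability threshold, and the minimum component size are illustrative assumptions.

```python
import torch
import numpy as np
from scipy.ndimage import gaussian_filter, label

def dice_loss(pred: torch.Tensor, target: torch.Tensor, smooth: float = 1e-5) -> torch.Tensor:
    """Commonly used Dice loss: 1 - 2*sum(p*g) / (sum(p) + sum(g))."""
    p, g = pred.flatten(), target.flatten()
    intersection = (p * g).sum()
    return 1.0 - (2.0 * intersection + smooth) / (p.sum() + g.sum() + smooth)

def postprocess(probability_map: np.ndarray, sigma: float = 1.0,
                threshold: float = 0.5, min_voxels: int = 50) -> np.ndarray:
    """Smooth the predicted probabilities with a 3D Gaussian filter, threshold
    them, and remove small isolated connected components."""
    smoothed = gaussian_filter(probability_map, sigma=sigma)
    binary = smoothed > threshold
    labeled, num_components = label(binary)
    for component in range(1, num_components + 1):
        if (labeled == component).sum() < min_voxels:
            binary[labeled == component] = False
    return binary.astype(np.uint8)
```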
- The present disclosure describes an FCN with a coarse-to-fine architecture, which takes advantage of residual learning and deep supervision, in order to improve the segmentation performance and training efficiency.
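- The deep-supervision idea elaborated in the following figures, in which auxiliary predictions are produced at several decoder resolutions, up-sampled to the full output resolution, and then combined, can be sketched as follows. This is a schematic example only and not the exact disclosed network; the use of trilinear interpolation and of simple summation for the combination are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryHead(nn.Module):
    """Collapse a decoder feature map to depth one with a 1x1x1 convolution and
    up-sample it to the full output resolution."""

    def __init__(self, in_channels: int, full_size=(24, 128, 128)):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, 1, kernel_size=1)
        self.full_size = full_size

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        logits = self.conv(features)
        return F.interpolate(logits, size=self.full_size, mode="trilinear",
                             align_corners=False)

# Hypothetical decoder feature maps at three resolutions (depths 128, 64, 32).
feats = [torch.randn(1, 128, 3, 16, 16),
         torch.randn(1, 64, 6, 32, 32),
         torch.randn(1, 32, 12, 64, 64)]
heads = [AuxiliaryHead(c) for c in (128, 64, 32)]

# Combine the auxiliary predictions (here by summation) and apply a sigmoid.
combined = torch.stack([h(f) for h, f in zip(heads, feats)]).sum(dim=0)
final_prediction = torch.sigmoid(combined)
print(final_prediction.shape)  # torch.Size([1, 1, 24, 128, 128])
```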
- One embodiment of such an FCN model is shown as 500 in
FIG. 5 . The FCN model includes a down-sampling/contraction path 504 and an up-sampling/expansion path 508, similar to the model described in FIG. 3 . Auxiliary convolutional layers may be applied to the feature maps in the expansion path 508, in order to generate single feature maps which are then up-sampled and fed into a sigmoid function to obtain auxiliary predictions 559. - The
auxiliary predictions 559 may be used to further determine the mask for the organ or lesion. During the training process using input images for the FCN model in FIG. 5 , the output segmentation mask generated from the auxiliary predictions 559 may be compared with the ground truth mask of the input images. A loss function may be determined. In one implementation, the loss function may include a softmax cross-entropy loss. In another implementation, the loss function may include a dice coefficient (DC) loss function. - The
input 2D/3D images 501 in FIG. 5 may be, for example and not limited to, 3D images having a size of 128×128×24. In the expansion path 508, the feature maps 521 may have a size of 16×16×3 and a depth of 128; the feature maps 531 may have a size of 32×32×6 and a depth of 64; the feature maps 541 may have a size of 64×64×12 and a depth of 32; and the feature maps 551 may have a size of 128×128×24 and a depth of 16. - The feature maps 521 may be fed into an auxiliary
convolutional layer 522. The auxiliary convolutional layer 522 may have a kernel size of 1×1×1 for 3D images or 1×1 for 2D images. The output feature maps 523 may have a size of 16×16×3 and a depth of one. The output feature maps 523 may be fed into a de-convolutional layer 524 to increase the resolution, so as to generate feature maps 527 having a size of 32×32×6. - The feature maps 531 may be fed into an auxiliary
convolutional layer 532. The auxiliary convolutional layer 532 may have a kernel size of 1×1×1 for 3D images or 1×1 for 2D images. The output feature maps 533 may have a size of 32×32×6 and a depth of one. The output feature maps 533 may be added onto the feature maps 527 in an element-by-element manner to generate feature maps 535. The feature maps 535 may have a size of 32×32×6 and a depth of one. The feature maps 535 may be fed into a de-convolutional layer 536 to increase the resolution, so as to generate feature maps 537 having a size of 64×64×12. - The feature maps 541 may be fed into an auxiliary
convolutional layer 542. The auxiliary convolutional layer 542 may have a kernel size of 1×1×1 for 3D images or 1×1 for 2D images. The output feature maps 543 may have a size of 64×64×12 and a depth of one. The output feature maps 543 may be added onto the feature maps 537 element-by-element to generate feature maps 545. The feature maps 545 may have a size of 64×64×12 and a depth of one. The feature maps 545 may be fed into a de-convolutional layer 546 to increase the resolution, so as to generate feature maps 547 having a size of 128×128×24. - The feature maps 551 may be fed into an auxiliary
convolutional layer 552. The auxiliary convolutional layer 552 may have a kernel size of 1×1×1 for 3D images or 1×1 for 2D images. The output feature maps 553 may have a size of 128×128×24 and a depth of one. The output feature maps 553 may be added onto the feature maps 547 in an element-by-element manner to generate feature maps 555. The feature maps 555 may have a size of 128×128×24 and a depth of one. - The feature maps 555 may be fed into a sigmoid function, which takes
feature maps 555 as input to generate auxiliary 4 (559), which includes voxel-wise binary classification probabilities and may be further processed to be the mask for the tissue or lesion. - Another enhanced embodiment of an
FCN model 600 is shown in FIG. 6A . In the FCN model 600, the feature maps with reduced resolutions in the up-sampling/expansion path may be fed into auxiliary convolutional layers, up-sampled to the original resolution, and then fed into a sigmoid function to generate their corresponding auxiliary predictions. The one or more auxiliary predictions may be combined together to generate the mask for the tissue or lesion. - In particular, the feature maps 521 may be fed into an auxiliary
convolutional layer 522 to generate feature maps 523 having a size of 16×16×3 and a depth of one. The output feature maps 523 may be fed into one or more de-convolutional layers to generate feature maps 528 and expand the resolution from 16×16×3 to 128×128×24. In the embodiment as shown in FIG. 6A , the one or more de-convolutional layers with respect to feature map 521 (the lowest resolution layer in the expansion network) may include three de-convolutional layers, and each of the three de-convolutional layers may expand the resolution by a factor of 2 in the x-axis, y-axis and z-axis. In another embodiment, the one or more de-convolutional layers with respect to feature map 521 (the lowest resolution layer in the expansion network) may include one de-convolutional layer, which may expand the resolution by a factor of 8 in the x-axis, y-axis and z-axis. As such, the feature maps 528 may recover the full resolution and may then be fed into a sigmoid function to obtain auxiliary 1 (529). - The feature maps 535 at the second lowest resolution layer may be fed into one or more de-convolutional layers to generate
feature maps 538 and expand the resolution from 32×32×6 to 128×128×24 (full resolution). In the embodiment as shown in FIG. 6A , the one or more de-convolutional layers here may include two de-convolutional layers, and each of the two de-convolutional layers may expand the resolution by a factor of 2 in the x-axis, y-axis and z-axis. In another embodiment, the one or more de-convolutional layers may include one de-convolutional layer, which may expand the resolution by a factor of 4 in the x-axis, y-axis and z-axis. The feature maps 538 may be fed into a sigmoid function to obtain auxiliary 2 (539). - The feature maps 545 at the second highest resolution layer may be fed into a de-convolutional layer to generate
feature maps 548 and expand the resolution from 64×64×12 to 128×128×24 (full resolution). In one embodiment as shown in FIG. 6A , the de-convolutional layer may expand the resolution by a factor of 2 in the x-axis, y-axis and z-axis. The feature maps 548 may be fed into a sigmoid function to obtain auxiliary 3 (549). - The one or more
auxiliary predictions 529, 539, 549, and 559 in FIG. 6A may be combined together to generate the mask for the tissue or lesion. The manner of combination will be discussed in more detail below. - For example, one implementation for combining the
auxiliary predictions 529, 539, 549, and 559 is shown in FIG. 6B . In particular, the auxiliary predictions 529, 539, 549, and 559 may be concatenated in step 620 to generate a concatenated auxiliary prediction 621. The concatenated auxiliary prediction 621 may have a size of 128×128×24 and a depth of four. The concatenated auxiliary prediction 621 may be fed into a convolutional layer with a kernel size of 1×1×1 to generate an auxiliary prediction 623. The auxiliary prediction 623 may have a size of 128×128×24 and a depth of one. The auxiliary prediction 623 may be fed into a sigmoid function to obtain the final prediction 629. - The
final prediction 629 may be used to further determine the mask for the tissue or lesion. During the training process of input images for the FCN model in FIG. 6B , the output segmentation mask generated from the prediction 629 may be compared with the ground truth mask of the input images. A loss function may be determined. In one implementation, the loss function may include a softmax cross-entropy loss. In another implementation, the loss function may include a dice coefficient (DC) loss function. - The implementation of
FIG. 6B may be further augmented as shown in FIG. 6C , where an auto-context strategy may be further used in the FCN model. Specifically, the input 2D/3D images 501, in addition to the auxiliary predictions 529, 539, 549, and 559, may be concatenated in step 620 to generate a combined auxiliary prediction 621. The concatenated auxiliary prediction 621 may have a size of 128×128×24 and a depth of five (compared with the depth of four for the implementation of FIG. 6B ). The concatenated auxiliary prediction 621 may be fed into a convolutional layer with a kernel size of 1×1×1 to generate an auxiliary prediction 623. The auxiliary prediction 623 may have a size of 128×128×24 and a depth of one. The auxiliary prediction 623 may be fed into a sigmoid function to obtain the final prediction 629. The final prediction 629 may be used to further determine the mask for the tissue or lesion. - Another implementation for combining
auxiliary predictions 529, 539, 549, and 559 is shown in FIG. 6D , where an element-by-element summation rather than concatenation is used. Specifically, the auxiliary predictions 529, 539, 549, and 559 may be added in step 640 element-by-element to generate a combined auxiliary prediction 641. The combined auxiliary prediction 641 may have a size of 128×128×24 and a depth of one. The combined auxiliary prediction 641 may be fed into a sigmoid function 642 to obtain the final prediction 649. The final prediction 649 may be used to further determine the mask for the tissue or lesion. - The implementation of
FIG. 6D may be further augmented as shown in FIG. 6E . Specifically, the input 2D/3D images 501, in addition to the auxiliary predictions 529, 539, 549, and 559, may be added in step 640 in an element-by-element manner to generate a combined auxiliary prediction 641. The combined auxiliary prediction 641 may have a size of 128×128×24 and a depth of one. The combined auxiliary prediction 641 may be fed into a sigmoid function 642 to obtain the final prediction 649. The final prediction 649 may be used to further determine the mask for the tissue or lesion of the input image. - Fully Convolutional Network (FCN) with Coarse-To-Fine Architecture and Densely Connected Convolutional Module
- Another implementation of the FCN based on
FIG. 6A is shown in FIG. 6F , where the combination of the auxiliary prediction masks 529, 539, 549, and 559 may be further processed by a densely connected convolutional (DenseConv) network to extract auto-context features prior to obtaining the final prediction mask 669. - Particularly in
FIG. 6F, the auxiliary predictions 529, 539, 549, and 559 may be concatenated at step 660 to generate a concatenated auxiliary prediction, also referred to as auto-context input 661. The concatenated auxiliary prediction 661 may have a size of 128×128×24 and a depth of four. The auto-context input 661 may be fed into a DenseConv module 662 to generate a prediction map 663. The prediction map 663 may have a depth of one or more. The prediction map 663 may be fed into a convolutional layer 664 to generate a prediction map 665. The prediction map 665 may have a size of 128×128×24 and a depth of one.
- The
prediction map 665 may be optionally added together with the feature maps 555 in an element-by-element manner to generate a prediction map 667. The prediction map 667 may have a size of 128×128×24 and a depth of one. The prediction map 667 may be fed into a sigmoid function 668 to obtain final prediction 669.
- The
final prediction 669 may be used to further determine the mask for the tissue or lesion. During the training process of input images for the FCN model in FIG. 6F, the output segmentation mask generated from the prediction 669 may be compared with the ground truth mask of the input images. A loss function may be determined. In one implementation, the loss function may include a softmax cross-entropy loss. In another implementation, the loss function may include a dice coefficient (DC) loss function.
- The implementation of
FIG. 6F may be further augmented as shown in FIG. 6G. In FIG. 6G, the input 2D/3D images 501, in addition to the auxiliary predictions 529, 539, 549, and 559, may be concatenated at step 660 to generate the auto-context input 661. The concatenated auxiliary input at 661 may have a size of 128×128×24 and a depth of five rather than the depth of four in FIG. 6F. The auto-context input 661 may be fed into a DenseConv module 662 to generate a prediction map 663. The prediction map 663 may have a depth of one or more. The prediction map 663 may be fed into a convolutional layer 664 to generate a prediction map 665. The prediction map 665 may have a size of 128×128×24 and a depth of one.
- Similar to
FIG. 6F, the prediction map 665 in FIG. 6G may be added together with the feature maps 555 in an element-by-element manner to generate a prediction map 667. The prediction map 667 may have a size of 128×128×24 and a depth of one. The prediction map 667 may be fed into a sigmoid function 668 to obtain final prediction 669. The final prediction 669 may be used to further determine the mask for the tissue or lesion. During the training process of input images for the FCN model in FIG. 6G, the output segmentation mask generated from the prediction 669 may be compared with the ground truth mask of the input images. A loss function may be determined. In one implementation, the loss function may include a softmax cross-entropy loss. In another implementation, the loss function may include a dice coefficient (DC) loss function.
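- As a non-authoritative sketch, the stage-two head of FIGS. 6F and 6G might be wired up as follows; the DenseConv feature extractor is passed in (one possible form is sketched after the FIG. 7 walkthrough below), and the class name, argument names, and shapes are assumptions:

```python
import torch
import torch.nn as nn

class AutoContextHead(nn.Module):
    """Sketch of the FIG. 6F/6G stage: concatenate the auxiliary predictions
    (optionally with the input volume), extract auto-context features with a
    DenseConv module, reduce to one channel, add the expansion-path feature
    map (element 555) element by element, and apply a sigmoid."""

    def __init__(self, dense_conv: nn.Module, dense_out_channels: int):
        super().__init__()
        self.dense_conv = dense_conv
        self.reduce = nn.Conv3d(dense_out_channels, 1, kernel_size=1)  # layer 664

    def forward(self, aux_predictions, skip_feature, image=None):
        maps = list(aux_predictions) if image is None else list(aux_predictions) + [image]
        auto_context = torch.cat(maps, dim=1)      # auto-context input 661
        features = self.dense_conv(auto_context)   # prediction map 663
        logits = self.reduce(features)             # prediction map 665, depth one
        logits = logits + skip_feature             # element-wise add with feature maps 555
        return torch.sigmoid(logits)               # final prediction 669
```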
- The DenseConv module 662 above in FIG. 6F and FIG. 6G may include one or more convolutional layers. One embodiment is shown in FIG. 7 for a DenseConv module 700 including six convolutional layers 710, 720, 730, 740, 750, and 760, each with an exemplary kernel size of 3×3×3 and with a number of feature maps of 16. Each of the six convolutional layers may further include a batch normalization (BN) layer and a ReLU layer.
- An auto-
context input 701 may be fed into the convolutional layer 710 to generate feature maps 713. In some exemplary implementations, the auto-context input 701 may have a size of 128×128×24 and a depth of four. The feature maps 713 may have a size of 128×128×24 and a depth of 16.
- For a current convolutional layer, the input of the current convolutional layer may be concatenated with the output of the current convolutional layer to generate the input for the next convolutional layer.
- For example, as shown in
FIG. 7, the auto-context input 701 may be concatenated with the output 713 of the convolutional layer 710 to generate the feature maps 715. The feature maps 715 may be the input for the convolutional layer 720.
- As shown in
FIG. 7, the feature maps 715 may be fed into the convolutional layer 720 to generate the feature maps 723. The feature maps 723 may be concatenated with the feature maps 715 to generate the feature maps 725. The feature maps 725 may be the input for the convolutional layer 730.
- As further shown in
FIG. 7, the feature maps 725 may be fed into the convolutional layer 730 to generate the feature maps 733. The feature maps 733 may be concatenated with the feature maps 725 to generate the feature maps 735. The feature maps 735 may be the input for the convolutional layer 740. The feature maps 735 may be fed into the convolutional layer 740 to generate the feature maps 743. The feature maps 743 may be concatenated with the feature maps 735 to generate the feature maps 745. The feature maps 745 may be the input for the convolutional layer 750. Furthermore, the feature maps 745 may be fed into the convolutional layer 750 to generate the feature maps 753. The feature maps 753 may be concatenated with the feature maps 745 to generate the feature maps 755. The feature maps 755 may be the input for the convolutional layer 760.
- As shown in
FIG. 7, the feature maps 755 may be fed into the convolutional layer 760 to generate the feature maps, which serve as the output 791 of the DenseConv module 700.
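- The dense connectivity just described may be sketched, purely for illustration, along the following lines; the layer count, growth of 16 feature maps per layer, and 3×3×3 kernels mirror the example of FIG. 7, while the class name, padding choice, and defaults are assumptions:

```python
import torch
import torch.nn as nn

class DenseConvModule(nn.Module):
    """Sketch of the DenseConv module 700: six 3x3x3 convolutional layers with
    batch normalization and ReLU, each producing 16 feature maps, where every
    layer's input is the concatenation of the previous layer's input and output."""

    def __init__(self, in_channels: int = 4, growth: int = 16, num_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv3d(channels, growth, kernel_size=3, padding=1),
                nn.BatchNorm3d(growth),
                nn.ReLU(inplace=True),
            ))
            channels += growth                 # the next layer sees input + output concatenated

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            out = layer(x)                     # e.g. feature maps 713, 723, ...
            if i + 1 < len(self.layers):
                x = torch.cat([x, out], dim=1) # dense connection, e.g. feature maps 715, 725, ...
            else:
                x = out                        # the last layer's maps serve as the output 791
        return x

# Example: an auto-context input of depth four over 24 slices of 128x128.
module = DenseConvModule(in_channels=4)
out = module(torch.rand(1, 4, 24, 128, 128))   # shape (1, 16, 24, 128, 128)
```

- Such a module could, under the same assumptions, be combined with the stage-two head sketched earlier, e.g. AutoContextHead(DenseConvModule(in_channels=4), dense_out_channels=16).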
- The six-convolutional-layer DenseConv implementation above is merely exemplary. In other embodiments, the DenseConv module 700 may include any number of convolutional layers, and each of the convolutional layers may use any kernel size and/or include any number of feature maps.
- During the training process, the output segmentation mask generated in forward-propagation from an FCN model may be compared with the ground truth mask of the input images. A loss function may be determined. In one implementation, the loss function may include a softmax cross-entropy loss. In another implementation, the loss function may include a dice coefficient (DC) loss function. Then a back-propagation through the FCN model may be performed based on, e.g., stochastic gradient descent, aimed at minimizing the loss function. By iterating the forward-propagation and back-propagation for the same input images, and for the entire training image set, the training parameters may converge to provide acceptable errors between the predicted masks and the ground truth masks for all or most of the input images. The converged training parameters, including but not limited to the convolutional features/kernels and various weights and biases, may form a final predictive model that may be further verified using test images and used to predict segmentation or lesion masks for images that the network has never seen before. The FCN model is preferably trained to promote errors on the over-inclusive side to reduce or prevent false negatives in later stages of CAD based on a predicted mask.
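- For illustration, one epoch of the training loop described above might look roughly as follows; the model and loader objects, the learning rate, and the use of a binary cross-entropy on the sigmoid output as a stand-in for the cross-entropy term are assumptions rather than the disclosed configuration:

```python
import torch
import torch.nn as nn

def dice_loss(pred, target, eps: float = 1e-6):
    """Soft dice coefficient (DC) loss on per-voxel probabilities."""
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def train_one_epoch(model, loader, lr: float = 1e-3, use_dice: bool = True):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # stochastic gradient descent
    bce = nn.BCELoss()                                      # cross-entropy stand-in on sigmoid outputs
    for image, mask in loader:
        pred = model(image)                                 # forward-propagation
        loss = dice_loss(pred, mask) if use_dice else bce(pred, mask)
        optimizer.zero_grad()
        loss.backward()                                     # back-propagation through the FCN
        optimizer.step()                                    # adjust the training parameters
```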
- For complex models with multiple stages such as the ones shown in
FIGS. 6F and 6G, including model Stage 1 and Stage 2, the training process may include three steps as shown in FIG. 8.
- Step 810 may include training Stage 1 (of the FCN model, e.g.,
FIGS. 6F and 6G) for generating prediction masks based on one or more auxiliary outputs in Stage 1 by comparing the prediction masks with the ground truth masks. The one or more auxiliary outputs in Stage 1 may include one or more of the Auxiliary 1-4 outputs (529, 539, 549, and 559).
- Step 820 may include fixing the training parameters in
Stage 1 and training Stage 2 (of the FCN model, e.g., FIGS. 6F and 6G) by generating prediction masks based on the output of Stage 2 and comparing the prediction masks with the ground truth masks. Stage 2 may include a DenseConv module, and the output of Stage 2 may include Prediction 669.
- Step 830 may include fine tuning and training of
Stage 1 and Stage 2 jointly, using the model parameters obtained in steps 810 and 820 as initial parameters and generating prediction masks based on the output of Stage 2. Stage 2 may include a DenseConv module, and the output of Stage 2 may include Prediction 669.
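- The three-step schedule of FIG. 8 may be sketched as follows; stage1 is assumed to return a prediction mask derived from its auxiliary outputs, stage2 to map Stage-1 outputs to the final prediction, and loss_fn to be a loss such as the dice loss above, so the scheduling rather than the exact model interface is what the sketch illustrates:

```python
import torch

def train_three_steps(stage1, stage2, loader, loss_fn, epochs_each: int = 10):
    """Sketch of FIG. 8: (810) train Stage 1, (820) fix Stage 1 and train
    Stage 2, (830) fine-tune both stages jointly."""

    def run(parameters, forward, epochs):
        optimizer = torch.optim.SGD(parameters, lr=1e-3)
        for _ in range(epochs):
            for image, mask in loader:
                loss = loss_fn(forward(image), mask)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    # Step 810: train Stage 1 against the ground truth via its auxiliary outputs.
    run(stage1.parameters(), lambda x: stage1(x), epochs_each)

    # Step 820: fix the Stage 1 parameters and train Stage 2 only.
    for p in stage1.parameters():
        p.requires_grad_(False)
    run(stage2.parameters(), lambda x: stage2(stage1(x)), epochs_each)

    # Step 830: fine-tune Stage 1 and Stage 2 jointly, starting from the
    # parameters obtained in steps 810 and 820.
    for p in stage1.parameters():
        p.requires_grad_(True)
    joint = list(stage1.parameters()) + list(stage2.parameters())
    run(joint, lambda x: stage2(stage1(x)), epochs_each)
```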
- System with Different FCN Models
- Because different FCN models trained under different conditions may perform well for different input images and under various circumstances, a segmentation/lesion detection model may include multiple FCN sub-models to improve prediction accuracy.
FIG. 9 shows one embodiment of a system 900 including two different FCN models: a first FCN 920 and a second FCN 930.
- The
first FCN 920 and the second FCN 930 may be different in terms of their architecture. For example, the first FCN 920 may be an FCN model similar to the model in FIG. 3, and the second FCN 930 may be an FCN model similar to the model in FIG. 6F.
- The
first FCN 920 and the second FCN 930 may be different in terms of how the multiple channels are processed. For example, the first FCN 920 may be an FCN model processing each channel individually and independently (e.g., FCN(2D/3D, SC), as described above), and the second FCN 930 may be an FCN model processing all of the multiple channels together in an aggregated manner (e.g., FCN(2D/3D, MC), as described above).
- For
input images 910 of the system 900 in FIG. 9, a first prediction 921 may be generated from the first FCN 920 and a second prediction 931 may be generated from the second FCN 930. The first prediction 921 and the second prediction 931 may be fed into a parameterized comparator 950 to determine which of the predictions 921 and 931 to select, so as to generate a final prediction 990. The parameters of the comparator may be trained.
- The selection by the
comparator 950 may be performed at an individual pixel/voxel level, i.e., the comparator 950 may select, for each individual pixel/voxel, which probability out of the first prediction 921 and the second prediction 931 to use as the final prediction for the corresponding pixel/voxel.
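- One possible, purely illustrative form of such a trainable comparator uses a 1×1×1 convolution over the two stacked probability maps to produce a per-voxel selection weight; the soft gating (a hard per-voxel choice could instead be obtained by thresholding the weight at 0.5) and all names are assumptions, not the disclosed design:

```python
import torch
import torch.nn as nn

class PixelwiseComparator(nn.Module):
    """Sketch of a parameterized comparator (element 950): derive a per-voxel
    weight from the two candidate probability maps and blend them into the
    final prediction (element 990)."""

    def __init__(self):
        super().__init__()
        self.gate = nn.Conv3d(2, 1, kernel_size=1)  # trainable per-voxel comparison

    def forward(self, pred_first, pred_second):
        stacked = torch.cat([pred_first, pred_second], dim=1)  # (N, 2, D, H, W)
        weight = torch.sigmoid(self.gate(stacked))             # selection weight in [0, 1]
        return weight * pred_first + (1.0 - weight) * pred_second
```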
- The FCN image segmentation and/or lesion detection above may be implemented as a computer platform 1000 shown in FIG. 10. The computer platform 1000 may include one or more training servers, one or more prediction engines, one or more databases 1012, one or more model repositories 1002, and user devices associated with one or more users. These components of the computer platform 1000 are inter-connected and in communication with one another via public or private communication networks 1001.
- The training servers and
prediction engines, while shown in FIG. 10 as implemented as separate servers, may be alternatively combined in a single server or a single group of distributed servers combining the functionality of training and prediction. The user devices may be used to access the training servers and the prediction engines via the communication networks 1001.
- The one or
more databases 1012 of FIG. 10 may be hosted in a central database server or a plurality of distributed database servers. For example, the one or more databases 1012 may be implemented as being hosted virtually in a cloud by a cloud service provider. The one or more databases 1012 may organize data in any form, including but not limited to relational databases containing data tables, graph databases containing nodes and relationships, and the like. The one or more databases 1012 may be configured to store, for example, images and their labeled masks collected from various sources. These images and labels may be used as a training data corpus for the training server 1006 for generating DCNN segmentation models.
-
- FIG. 11 shows an exemplary computer system 1100 for implementing any of the computing components of FIGS. 1-10. The computer system 1100 may include communication interfaces 1102, system circuitry 1104, input/output (I/O) interfaces 1106, storage 1109, and display circuitry 1108 that generates machine interfaces 1110 locally or for remote display, e.g., in a web browser running on a local or remote machine. The machine interfaces 1110 and the I/O interfaces 1106 may include GUIs, touch sensitive displays, voice or facial recognition inputs, buttons, switches, speakers and other user interface elements. Additional examples of the I/O interfaces 1106 include microphones, video and still image cameras, headset and microphone input/output jacks, Universal Serial Bus (USB) connectors, memory card slots, and other types of inputs. The I/O interfaces 1106 may further include magnetic or optical media interfaces (e.g., a CDROM or DVD drive), serial and parallel bus interfaces, and keyboard and mouse interfaces.
- The communication interfaces 1102 may include wireless transmitters and receivers (“transceivers”) 1112 and any
antennas 1114 used by the transmitting and receiving circuitry of the transceivers 1112. The transceivers 1112 and antennas 1114 may support Wi-Fi network communications, for instance, under any version of IEEE 802.11, e.g., 802.11n or 802.11ac. The communication interfaces 1102 may also include wireline transceivers 1116. The wireline transceivers 1116 may provide physical layer interfaces for any of a wide range of communication protocols, such as any type of Ethernet, data over cable service interface specification (DOCSIS), digital subscriber line (DSL), Synchronous Optical Network (SONET), or other protocol.
- The
storage 1109 may be used to store various initial, intermediate, or final data or models needed for the implementation of the computer platform 1000. The storage 1109 may be separate from or integrated with the one or more databases 1012 of FIG. 10. The storage 1109 may be centralized or distributed, and may be local or remote to the computer system 1100. For example, the storage 1109 may be hosted remotely by a cloud computing service provider.
- The
system circuitry 1104 may include hardware, software, firmware, or other circuitry in any combination. The system circuitry 1104 may be implemented, for example, with one or more systems on a chip (SoC), application specific integrated circuits (ASIC), microprocessors, discrete analog and digital circuits, and other circuitry. The system circuitry 1104 is part of the implementation of any desired functionality related to the reconfigurable computer platform 1000. As just one example, the system circuitry 1104 may include one or more instruction processors 1118 and memories 1120. The memories 1120 store, for example, control instructions 1126 and an operating system 1124. In one implementation, the instruction processors 1118 execute the control instructions 1126 and the operating system 1124 to carry out any desired functionality related to the reconfigurable computer platform 1000.
- The methods, devices, processing, and logic described above may be implemented in many different ways and in many different combinations of hardware and software. For example, all or parts of the implementations may be circuitry that includes an instruction processor, such as a Central Processing Unit (CPU), microcontroller, or a microprocessor; an Application Specific Integrated Circuit (ASIC), Programmable Logic Device (PLD), or Field Programmable Gate Array (FPGA); or circuitry that includes discrete logic or other circuit components, including analog circuit components, digital circuit components or both; or any combination thereof. The circuitry may include discrete interconnected hardware components and/or may be combined on a single integrated circuit die, distributed among multiple integrated circuit dies, or implemented in a Multiple Chip Module (MCM) of multiple integrated circuit dies in a common package, as examples.
- The circuitry may further include or access instructions for execution by the circuitry. The instructions may be stored in a tangible storage medium that is other than a transitory signal, such as a flash memory, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM); or on a magnetic or optical disc, such as a Compact Disc Read Only Memory (CDROM), Hard Disk Drive (HDD), or other magnetic or optical disk; or in or on another machine-readable medium. A product, such as a computer program product, may include a storage medium and instructions stored in or on the medium, and the instructions when executed by the circuitry in a device may cause the device to implement any of the processing described above or illustrated in the drawings.
- The implementations may be distributed as circuitry among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems. Parameters, databases, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented in many different ways, including as data structures such as linked lists, hash tables, arrays, records, objects, or implicit storage mechanisms. Programs may be parts (e.g., subroutines) of a single program, separate programs, distributed across several memories and processors, or implemented in many different ways, such as in a library, such as a shared library (e.g., a Dynamic Link Library (DLL)). The DLL, for example, may store instructions that perform any of the processing described above or illustrated in the drawings, when executed by the circuitry.
- While the particular invention has been described with reference to illustrative embodiments, this description is not meant to be limiting. Various modifications of the illustrative embodiments and additional embodiments of the invention will be apparent to one of ordinary skill in the art from this description. Those skilled in the art will readily recognize that these and various other modifications can be made to the exemplary embodiments, illustrated and described herein, without departing from the spirit and scope of the present invention. It is therefore contemplated that the appended claims will cover any such modifications and alternate embodiments. Certain proportions within the illustrations may be exaggerated, while other proportions may be minimized. Accordingly, the disclosure and the figures are to be regarded as illustrative rather than restrictive.
Claims (20)
1. A system for performing segmentation of digital images, comprising:
a communication interface circuitry;
a database;
a predictive model repository; and
a processing circuitry in communication with the database and the predictive model repository, the processing circuitry configured to:
receive a set of training images labeled with a corresponding set of ground truth segmentation masks;
establish a fully convolutional neural network comprising a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem; and
iteratively train the fully convolutional neural network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by configuring the processing circuitry to:
down-sample a training image of the set of training images through the multi-layer contraction convolutional neural network to generate an intermediate feature map, wherein a resolution of the intermediate feature map is lower than a resolution of the training image,
up-sample the intermediate feature map through the multi-layer expansion convolutional neural network to generate a first feature map,
generate, based on the training image and the first feature map, a predictive segmentation mask for the training image,
generate an end loss based on a difference between the predictive segmentation mask and a ground truth segmentation mask corresponding to the training image,
back-propagate the end loss through the fully convolutional neural network, and
minimize the end loss by adjusting a set of training parameters of the fully convolutional neural network using gradient descent.
2. The system of claim 1 , wherein the processing circuitry is further configured to:
store the iteratively trained fully convolutional neural network with the set of training parameters in the predictive model repository;
receive an input image, wherein the input image comprises one of a test image or an unlabeled image; and
forward-propagate the input image through the iteratively trained fully convolutional neural network with the set of training parameters to generate an output segmentation mask.
3. The system of claim 1 , wherein when the processing circuitry is configured to generate, based on the training image and the first feature map, the predictive segmentation mask for the training image, the processing circuitry is configured to:
implement a first auxiliary convolutional layer on the first feature map to generate a convoluted first feature map, the convoluted first feature map having a same resolution as the first feature map, the convoluted first feature map having a depth of one;
when the convoluted first feature map has a different resolution from the training image, adjust a resolution of the convoluted first feature map to have the same resolution as the training image; and
generate the predictive segmentation mask for the training image, based on the training image and the resolution-adjusted convoluted first feature map.
4. The system of claim 3 , wherein when the processing circuitry is configured to generate the predictive segmentation mask for the training image, based on the training image and the resolution-adjusted convoluted first feature map, the processing circuitry is configured to:
perform a first sigmoid function on the resolution-adjusted convoluted first feature map to generate a first auxiliary prediction map; and
generate the predictive segmentation mask for the training image, based on the training image and the first auxiliary prediction map.
5. The system of claim 4 , wherein when the processing circuitry is configured to generate the predictive segmentation mask for the training image, based on the training image and the first auxiliary prediction map, the processing circuitry is configured to:
add the training image and the first auxiliary prediction map to generate an auto-context prediction map, or concatenate the training image and the first auxiliary prediction map to generate the auto-context prediction map;
generate the predictive segmentation mask for the training image, based on the auto-context prediction map.
6. The system of claim 5 , wherein when the processing circuitry is configured to generate the predictive segmentation mask for the training image, based on the auto-context prediction map, the processing circuitry is configured to:
perform a densely connected convolutional (DenseConv) operation on the auto-context prediction map to generate a DenseConv prediction map, the DenseConv operation including one or more convolutional layers;
perform a DenseConv auxiliary convolutional layer on the DenseConv prediction map to generate a convoluted DenseConv prediction map;
add the convoluted DenseConv prediction map and the added second feature map to generate the added DenseConv prediction map; and
generate, based on the added DenseConv prediction map, the predictive segmentation mask for the training image.
7. The system of claim 1 , wherein the processing circuitry is further configured to iteratively train the fully convolutional neural network in the end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by configuring the processing circuitry to:
up-sample the intermediate feature map through the multi-layer expansion convolutional neural network to generate a second feature map, wherein a resolution of the second feature map is larger than the resolution of the first feature map;
implement a first auxiliary convolutional layer on the first feature map to generate a convoluted first feature map, the convoluted first feature map having a same resolution as the first feature map, the convoluted first feature map having a depth of one;
implement a second auxiliary convolutional layer on the second feature map to generate a convoluted second feature map, the convoluted second feature map having a same resolution as the second feature map, the convoluted second feature map having a depth of one;
implement a first de-convolutional layer on the convoluted first feature map to generate a de-convoluted first feature map, the de-convoluted first feature map having a larger resolution than the first feature map;
add the de-convoluted first feature map and the convoluted second feature map to generate an added second feature map;
perform a second sigmoid function on the added second feature map to generate a second auxiliary prediction map; and
generate the predictive segmentation mask for the training image, based on the training image and the second auxiliary prediction map.
8. The system of claim 7 , wherein when the processing circuitry is configured to generate the predictive segmentation mask for the training image, based on the training image and the second auxiliary prediction map, the processing circuitry is configured to:
add the training image and the second auxiliary prediction map to generate an auto-context prediction map, or concatenate the training image and the second auxiliary prediction map to generate an auto-context prediction map; and
generate the predictive segmentation mask for the training image, based on the auto-context prediction map.
9. A method for image segmentation, comprising:
receiving, by a computer comprising a memory storing instructions and a processor in communication with the memory, a set of training images labeled with a corresponding set of ground truth segmentation masks;
establishing, by the computer, a fully convolutional neural network comprising a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem; and
iteratively training, by the computer, the fully convolutional neural network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by:
down-sampling a training image of the set of training images through the multi-layer contraction convolutional neural network to generate an intermediate feature map, wherein a resolution of the intermediate feature map is lower than a resolution of the training image;
up-sampling the intermediate feature map through the multi-layer expansion convolutional neural network to generate a first feature map;
generating, based on the training image and the first feature map, a predictive segmentation mask for the training image;
generating an end loss based on a difference between the predictive segmentation mask and a ground truth segmentation mask corresponding to the training image;
back-propagating the end loss through the fully convolutional neural network; and
minimizing the end loss by adjusting a set of training parameters of the fully convolutional neural network using gradient descent.
10. The method of claim 9 , further comprising:
storing, by the computer, the iteratively trained fully convolutional neural network with the set of training parameters in a predictive model repository;
receiving, by the computer, an input image, wherein the input image comprises one of a test image or an unlabeled image; and
forward-propagating, by the computer, the input image through the iteratively trained fully convolutional neural network with the set of training parameters to generate an output segmentation mask.
11. The method of claim 9 , wherein the generating, based on the training image and the first feature map, the predictive segmentation mask for the training image comprises:
implementing, by the computer, a first auxiliary convolutional layer on the first feature map to generate a convoluted first feature map, the convoluted first feature map having a same resolution as the first feature map, the convoluted first feature map having a depth of one;
when the convoluted first feature map has a different resolution from the training image, adjusting, by the computer, a resolution of the convoluted first feature map to have the same resolution as the training image; and
generating, by the computer, the predictive segmentation mask for the training image, based on the training image and the resolution-adjusted convoluted first feature map.
12. The method of claim 11 , wherein the generating the predictive segmentation mask for the training image, based on the training image and the resolution-adjusted convoluted first feature map, comprises:
performing a first sigmoid function on the resolution-adjusted convoluted first feature map to generate a first auxiliary prediction map; and
generating the predictive segmentation mask for the training image, based on the training image and the first auxiliary prediction map.
13. The method of claim 12 , wherein the generating the predictive segmentation mask for the training image, based on the training image and the first auxiliary prediction map comprises:
adding the training image and the first auxiliary prediction map to generate an auto-context prediction map, or concatenating the training image and the first auxiliary prediction map to generate the auto-context prediction map;
generating the predictive segmentation mask for the training image, based on the auto-context prediction map.
14. The method of claim 13 , wherein the generating the predictive segmentation mask for the training image, based on the auto-context prediction map, comprises:
performing a densely connected convolutional (DenseConv) operation on the auto-context prediction map to generate a DenseConv prediction map, the DenseConv operation including one or more convolutional layers;
performing a DenseConv auxiliary convolutional layer on the DenseConv prediction map to generate a convoluted DenseConv prediction map;
adding the convoluted DenseConv prediction map and the added second feature map to generate the added DenseConv prediction map; and
generating, based on the added DenseConv prediction map, the predictive segmentation mask for the training image.
15. The method of claim 9 , wherein the iteratively training the fully convolutional neural network in the end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks further comprises:
up-sampling the intermediate feature map through the multi-layer expansion convolutional neural network to generate a second feature map, wherein a resolution of the second feature map is larger than the resolution of the first feature map;
implementing a first auxiliary convolutional layer on the first feature map to generate a convoluted first feature map, the convoluted first feature map having a same resolution as the first feature map, the convoluted first feature map having a depth of one;
implementing a second auxiliary convolutional layer on the second feature map to generate a convoluted second feature map, the convoluted second feature map having a same resolution as the second feature map, the convoluted second feature map having a depth of one;
implementing a first de-convolutional layer on the convoluted first feature map to generate a de-convoluted first feature map, the de-convoluted first feature map having a larger resolution than the first feature map;
adding the de-convoluted first feature map and the convoluted second feature map to generate an added second feature map;
performing a second sigmoid function on the added second feature map to generate a second auxiliary prediction map; and
generating the predictive segmentation mask for the training image, based on the training image and the second auxiliary prediction map.
16. The method of claim 15 , wherein the generating the predictive segmentation mask for the training image, based on the training image and the second auxiliary prediction map comprises:
adding the training image and the second auxiliary prediction map to generate an auto-context prediction map, or concatenating the training image and the second auxiliary prediction map to generate an auto-context prediction map; and
generating the predictive segmentation mask for the training image, based on the auto-context prediction map.
17. A system for performing segmentation of digital images, comprising:
a communication interface circuitry;
a database;
a predictive model repository; and
a processing circuitry in communication with the database and the predictive model repository, the processing circuitry configured to:
receive a set of training images labeled with a corresponding set of ground truth segmentation masks;
establish a segmentation network comprising a first fully convolutional neural network, a second fully convolutional neural network, and an evaluation network, wherein each of the first and second fully convolutional neural networks comprises a multi-layer contraction convolutional neural network and an expansion convolutional neural network connected in tandem, and the evaluation network is in communication with the first and second fully convolutional neural networks; and
iteratively train the segmentation network in an end-to-end manner using the set of training images and the corresponding set of ground truth segmentation masks by configuring the processing circuitry to:
generate a first predictive segmentation mask for a training image of the set of training images, by the first fully convolutional neural network based on the training image,
generate a second predictive segmentation mask for the training image, by the second fully convolutional neural network based on the training image,
generate a final predictive segmentation mask for the training image, by the evaluation network based on the first predictive segmentation mask and the second predictive segmentation mask,
generate an end loss based on a difference between the final predictive segmentation mask and a ground truth segmentation mask corresponding to the training image,
back-propagate the end loss through the segmentation network, and
minimize the end loss by adjusting a set of training parameters of the segmentation network using gradient descent.
18. The system of claim 17 , wherein the processing circuitry is further configured to:
store the iteratively trained segmentation network with the set of training parameters in the predictive model repository;
receive an input image, wherein the input image comprises one of a test image or an unlabeled image; and
forward-propagate the input image through the iteratively trained segmentation network with the set of training parameters to generate an output segmentation mask.
19. The system of claim 17 , wherein when the processing circuitry is configured to generate the final predictive segmentation mask for the training image, by the evaluation network based on the first predictive segmentation mask and the second predictive segmentation mask, the processing circuitry is configured to:
generate a final predictive value for each pixel of the final predictive segmentation mask, by the evaluation network based on values of corresponding pixels of the first predictive segmentation mask and the second predictive segmentation mask.
20. The system of claim 17 , wherein when the processing circuitry is configured to generate the final predictive segmentation mask for the training image, by the evaluation network based on the first predictive segmentation mask and the second predictive segmentation mask, the processing circuitry is configured to:
add, by the evaluation network, the first predictive segmentation mask and the second predictive segmentation mask to generate the final predictive segmentation mask.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/380,670 US20200058126A1 (en) | 2018-08-17 | 2019-04-10 | Image segmentation and object detection using fully convolutional neural network |
PCT/US2019/044303 WO2020036734A2 (en) | 2018-08-17 | 2019-07-31 | Image segmentation and object detection using fully convolutional neural network |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/104,449 US10304193B1 (en) | 2018-08-17 | 2018-08-17 | Image segmentation and object detection using fully convolutional neural network |
US16/380,670 US20200058126A1 (en) | 2018-08-17 | 2019-04-10 | Image segmentation and object detection using fully convolutional neural network |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/104,449 Continuation US10304193B1 (en) | 2018-08-17 | 2018-08-17 | Image segmentation and object detection using fully convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
US20200058126A1 true US20200058126A1 (en) | 2020-02-20 |
Family
ID=66636269
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/104,449 Expired - Fee Related US10304193B1 (en) | 2018-08-17 | 2018-08-17 | Image segmentation and object detection using fully convolutional neural network |
US16/380,670 Abandoned US20200058126A1 (en) | 2018-08-17 | 2019-04-10 | Image segmentation and object detection using fully convolutional neural network |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/104,449 Expired - Fee Related US10304193B1 (en) | 2018-08-17 | 2018-08-17 | Image segmentation and object detection using fully convolutional neural network |
Country Status (2)
Country | Link |
---|---|
US (2) | US10304193B1 (en) |
WO (1) | WO2020036734A2 (en) |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111415356A (en) * | 2020-03-17 | 2020-07-14 | 北京推想科技有限公司 | Pneumonia symptom segmentation method, pneumonia symptom segmentation device, pneumonia symptom segmentation medium and electronic equipment |
CN111818298A (en) * | 2020-06-08 | 2020-10-23 | 北京航空航天大学 | High-definition video monitoring system and method based on light field |
CN112116605A (en) * | 2020-09-29 | 2020-12-22 | 西北工业大学深圳研究院 | Pancreas CT image segmentation method based on integrated depth convolution neural network |
CN112446859A (en) * | 2020-11-18 | 2021-03-05 | 中国科学院上海技术物理研究所 | Satellite-borne thermal infrared camera image cloud detection method based on deep learning |
US20210098127A1 (en) * | 2019-09-30 | 2021-04-01 | GE Precision Healthcare LLC | Medical imaging stroke model |
US20210118137A1 (en) * | 2019-05-16 | 2021-04-22 | Beijing Boe Technology Development Co., Ltd. | Method and apparatus of labeling target in image, and computer recording medium |
WO2021168745A1 (en) * | 2020-02-24 | 2021-09-02 | 深圳先进技术研究院 | Method and apparatus for training magnetic resonance imaging model |
WO2021183684A1 (en) * | 2020-03-10 | 2021-09-16 | AI:ON Innovations, Inc. | System and methods for mammalian transfer learning |
WO2021202204A1 (en) * | 2020-03-31 | 2021-10-07 | Alibaba Group Holding Limited | Data processing method, means and system |
WO2021236939A1 (en) * | 2020-05-22 | 2021-11-25 | Alibaba Group Holding Limited | Recognition method, apparatus, and device, and storage medium |
US20210383533A1 (en) * | 2020-06-03 | 2021-12-09 | Nvidia Corporation | Machine-learning-based object detection system |
US11232572B2 (en) * | 2019-08-20 | 2022-01-25 | Merck Sharp & Dohme Corp. | Progressively-trained scale-invariant and boundary-aware deep neural network for the automatic 3D segmentation of lung lesions |
WO2022017025A1 (en) * | 2020-07-23 | 2022-01-27 | Oppo广东移动通信有限公司 | Image processing method and apparatus, storage medium, and electronic device |
US20220051405A1 (en) * | 2019-11-12 | 2022-02-17 | Tencent Technology (Shenzhen) Company Limited | Image processing method and apparatus, server, medical image processing device and storage medium |
US20220076422A1 (en) * | 2018-12-21 | 2022-03-10 | Nova Scotia Health Authority | Systems and methods for generating cancer prediction maps from multiparametric magnetic resonance images using deep |
WO2022059920A1 (en) * | 2020-09-15 | 2022-03-24 | 삼성전자주식회사 | Electronic device, control method thereof, and electronic system |
US20220092789A1 (en) * | 2020-04-09 | 2022-03-24 | Zhejiang Lab | Automatic pancreas ct segmentation method based on a saliency-aware densely connected dilated convolutional neural network |
WO2022066725A1 (en) * | 2020-09-23 | 2022-03-31 | Proscia Inc. | Training end-to-end weakly supervised networks in a multi-task fashion at the specimen (supra-image) level |
US11328430B2 (en) * | 2019-05-28 | 2022-05-10 | Arizona Board Of Regents On Behalf Of Arizona State University | Methods, systems, and media for segmenting images |
US11348233B2 (en) * | 2018-12-28 | 2022-05-31 | Shanghai United Imaging Intelligence Co., Ltd. | Systems and methods for image processing |
US11354918B2 (en) * | 2020-03-12 | 2022-06-07 | Korea Advanced Institute Of Science And Technology | Electronic device for recognizing visual stimulus based on spontaneous selective neural response of deep artificial neural network and operating method thereof |
WO2022116163A1 (en) * | 2020-12-04 | 2022-06-09 | 深圳市优必选科技股份有限公司 | Portrait segmentation method, robot, and storage medium |
US11423678B2 (en) | 2019-09-23 | 2022-08-23 | Proscia Inc. | Automated whole-slide image classification using deep learning |
US11455706B2 (en) | 2020-09-15 | 2022-09-27 | Samsung Electronics Co., Ltd. | Electronic apparatus, control method thereof and electronic system |
US20220319141A1 (en) * | 2021-09-15 | 2022-10-06 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for processing image, device and storage medium |
US20220354466A1 (en) * | 2019-09-27 | 2022-11-10 | Google Llc | Automated Maternal and Prenatal Health Diagnostics from Ultrasound Blind Sweep Video Sequences |
US20220383045A1 (en) * | 2021-05-25 | 2022-12-01 | International Business Machines Corporation | Generating pseudo lesion masks from bounding box annotations |
JP2023016717A (en) * | 2021-07-21 | 2023-02-02 | ジーイー・プレシジョン・ヘルスケア・エルエルシー | Systems and methods for fast mammography data handling |
US20230154610A1 (en) * | 2021-11-17 | 2023-05-18 | City University Of Hong Kong | Task interaction netwrok for prostate cancer diagnosis |
WO2023081978A1 (en) * | 2021-11-12 | 2023-05-19 | OMC International Pty Ltd | Systems and methods for draft calculation |
EP4202825A1 (en) * | 2021-12-21 | 2023-06-28 | Koninklijke Philips N.V. | Network architecture for 3d image processing |
US11861881B2 (en) | 2020-09-23 | 2024-01-02 | Proscia Inc. | Critical component detection using deep learning and attention |
GB2621332A (en) * | 2022-08-08 | 2024-02-14 | Twinn Health Ltd | A method and an artificial intelligence system for assessing an MRI image |
Families Citing this family (143)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10387740B2 (en) * | 2016-10-10 | 2019-08-20 | Gyrfalcon Technology Inc. | Object detection and recognition apparatus based on CNN based integrated circuits |
US10621725B2 (en) * | 2017-04-12 | 2020-04-14 | Here Global B.V. | Small object detection from a large image |
US10776668B2 (en) * | 2017-12-14 | 2020-09-15 | Robert Bosch Gmbh | Effective building block design for deep convolutional neural networks using search |
US11669724B2 (en) | 2018-05-17 | 2023-06-06 | Raytheon Company | Machine learning using informed pseudolabels |
US10304193B1 (en) * | 2018-08-17 | 2019-05-28 | 12 Sigma Technologies | Image segmentation and object detection using fully convolutional neural network |
US10796152B2 (en) * | 2018-09-21 | 2020-10-06 | Ancestry.Com Operations Inc. | Ventral-dorsal neural networks: object detection via selective attention |
CN110956575B (en) * | 2018-09-26 | 2022-04-12 | 京东方科技集团股份有限公司 | Method and device for converting image style and convolution neural network processor |
US11169531B2 (en) | 2018-10-04 | 2021-11-09 | Zoox, Inc. | Trajectory prediction on top-down scenes |
US11195418B1 (en) * | 2018-10-04 | 2021-12-07 | Zoox, Inc. | Trajectory prediction on top-down scenes and associated model |
US11188799B2 (en) * | 2018-11-12 | 2021-11-30 | Sony Corporation | Semantic segmentation with soft cross-entropy loss |
US10410352B1 (en) * | 2019-01-25 | 2019-09-10 | StradVision, Inc. | Learning method and learning device for improving segmentation performance to be used for detecting events including pedestrian event, vehicle event, falling event and fallen event using edge loss and test method and test device using the same |
US10402977B1 (en) * | 2019-01-25 | 2019-09-03 | StradVision, Inc. | Learning method and learning device for improving segmentation performance in road obstacle detection required to satisfy level 4 and level 5 of autonomous vehicles using laplacian pyramid network and testing method and testing device using the same |
US10839543B2 (en) * | 2019-02-26 | 2020-11-17 | Baidu Usa Llc | Systems and methods for depth estimation using convolutional spatial propagation networks |
CN110222636B (en) * | 2019-05-31 | 2023-04-07 | 中国民航大学 | Pedestrian attribute identification method based on background suppression |
CN112053367A (en) * | 2019-06-06 | 2020-12-08 | 阿里巴巴集团控股有限公司 | Image processing method, apparatus and storage medium |
JP7130905B2 (en) * | 2019-06-18 | 2022-09-06 | ユーエービー “ニューロテクノロジー” | Fast and Robust Dermatoglyphic Mark Minutia Extraction Using Feedforward Convolutional Neural Networks |
CN110264483B (en) * | 2019-06-19 | 2023-04-18 | 东北大学 | Semantic image segmentation method based on deep learning |
CN110490203B (en) * | 2019-07-05 | 2023-11-03 | 平安科技(深圳)有限公司 | Image segmentation method and device, electronic equipment and computer readable storage medium |
CN110334679B (en) * | 2019-07-11 | 2021-11-26 | 厦门美图之家科技有限公司 | Face point processing method and device |
CN110490878A (en) * | 2019-07-29 | 2019-11-22 | 上海商汤智能科技有限公司 | Image processing method and device, electronic equipment and storage medium |
TWI710762B (en) * | 2019-07-31 | 2020-11-21 | 由田新技股份有限公司 | An image classification system |
GB2586604B (en) * | 2019-08-28 | 2022-10-05 | Canon Kk | 3d representation reconstruction from images using volumic probability data |
CN110599500B (en) * | 2019-09-03 | 2022-08-26 | 南京邮电大学 | Tumor region segmentation method and system of liver CT image based on cascaded full convolution network |
CN110634133A (en) * | 2019-09-04 | 2019-12-31 | 杭州健培科技有限公司 | Knee joint orthopedic measurement method and device based on X-ray plain film |
CN112446266B (en) * | 2019-09-04 | 2024-03-29 | 北京君正集成电路股份有限公司 | Face recognition network structure suitable for front end |
CN110717913B (en) * | 2019-09-06 | 2022-04-22 | 浪潮电子信息产业股份有限公司 | Image segmentation method and device |
CN110659724B (en) * | 2019-09-12 | 2023-04-28 | 复旦大学 | Target detection depth convolution neural network construction method based on target scale |
CN110706209B (en) * | 2019-09-17 | 2022-04-29 | 东南大学 | Method for positioning tumor in brain magnetic resonance image of grid network |
CN110738127B (en) * | 2019-09-19 | 2023-04-18 | 福建技术师范学院 | Helmet identification method based on unsupervised deep learning neural network algorithm |
CN110853045B (en) * | 2019-09-24 | 2022-02-11 | 西安交通大学 | Vascular wall segmentation method and device based on nuclear magnetic resonance image and storage medium |
CN110717420A (en) * | 2019-09-25 | 2020-01-21 | 中国科学院深圳先进技术研究院 | Cultivated land extraction method and system based on remote sensing image and electronic equipment |
US11068747B2 (en) | 2019-09-27 | 2021-07-20 | Raytheon Company | Computer architecture for object detection using point-wise labels |
CN110827238B (en) * | 2019-09-29 | 2023-07-21 | 哈尔滨工程大学 | Improved side-scan sonar image feature extraction method of full convolution neural network |
CN110675419B (en) * | 2019-10-11 | 2022-03-08 | 上海海事大学 | Multi-modal brain glioma image segmentation method for self-adaptive attention gate |
CN110853048A (en) * | 2019-10-14 | 2020-02-28 | 北京缙铖医疗科技有限公司 | MRI image segmentation method, device and storage medium based on rough training and fine training |
CN112686902B (en) * | 2019-10-17 | 2023-02-03 | 西安邮电大学 | Two-stage calculation method for brain glioma identification and segmentation in nuclear magnetic resonance image |
CN110957036B (en) * | 2019-10-24 | 2023-07-14 | 中国人民解放军总医院 | Disease prognosis risk assessment model method based on causal reasoning construction |
CN112734814B (en) * | 2019-10-28 | 2023-10-20 | 北京大学 | Three-dimensional craniofacial cone beam CT image registration method |
US11341635B2 (en) * | 2019-10-31 | 2022-05-24 | Tencent America LLC | Computer aided diagnosis system for detecting tissue lesion on microscopy images based on multi-resolution feature fusion |
CN111047598B (en) * | 2019-11-04 | 2023-09-01 | 华北电力大学(保定) | Deep learning-based ultraviolet discharge light spot segmentation method and device for power transmission and transformation equipment |
CN111047594B (en) * | 2019-11-06 | 2023-04-07 | 安徽医科大学 | Tumor MRI weak supervised learning analysis modeling method and model thereof |
CN110852240A (en) * | 2019-11-06 | 2020-02-28 | 创新奇智(成都)科技有限公司 | Retail commodity detection system and detection method |
CN110992309B (en) * | 2019-11-07 | 2023-08-18 | 吉林大学 | Fundus image segmentation method based on deep information transfer network |
CN111028235B (en) * | 2019-11-11 | 2023-08-22 | 东北大学 | Image segmentation method for enhancing edge and detail information by utilizing feature fusion |
KR20210061146A (en) | 2019-11-19 | 2021-05-27 | 삼성전자주식회사 | Electronic apparatus and control method thereof |
CN110827275B (en) * | 2019-11-22 | 2023-12-22 | 吉林大学第一医院 | Liver nuclear magnetic artery image quality grading method based on raspberry pie and deep learning |
CN110910371A (en) * | 2019-11-22 | 2020-03-24 | 北京理工大学 | Liver tumor automatic classification method and device based on physiological indexes and image fusion |
CN111241908B (en) * | 2019-11-26 | 2023-04-14 | 暨南大学 | Device and method for identifying biological characteristics of young poultry |
CN111191674B (en) * | 2019-11-30 | 2024-08-06 | 北京林业大学 | Primary feature extractor and extraction method based on densely connected perforated convolution network |
CN110895814B (en) * | 2019-11-30 | 2023-04-18 | 南京工业大学 | Aero-engine hole-finding image damage segmentation method based on context coding network |
CN111080598B (en) * | 2019-12-12 | 2020-08-28 | 哈尔滨市科佳通用机电股份有限公司 | Bolt and nut missing detection method for coupler yoke key safety crane |
CN111179231B (en) * | 2019-12-20 | 2024-05-28 | 上海联影智能医疗科技有限公司 | Image processing method, device, equipment and storage medium |
CN111209916B (en) * | 2019-12-31 | 2024-01-23 | 中国科学技术大学 | Focus identification method and system and identification equipment |
CN111340828A (en) * | 2020-01-10 | 2020-06-26 | 南京航空航天大学 | Brain glioma segmentation based on cascaded convolutional neural networks |
CN111275714B (en) * | 2020-01-13 | 2022-02-01 | 武汉大学 | Prostate MR image segmentation method based on attention mechanism 3D convolutional neural network |
CN111275083B (en) * | 2020-01-15 | 2021-06-18 | 浙江工业大学 | Optimization method for realizing residual error network characteristic quantity matching |
CN111275720B (en) * | 2020-01-20 | 2022-05-17 | 浙江大学 | Full end-to-end small organ image identification method based on deep learning |
CN111354002A (en) * | 2020-02-07 | 2020-06-30 | 天津大学 | Kidney and kidney tumor segmentation method based on deep neural network |
CN111311561B (en) * | 2020-02-10 | 2023-10-10 | 浙江未来技术研究院(嘉兴) | Automatic operation area photometry method and device based on microsurgery imaging system |
CN111260060B (en) * | 2020-02-20 | 2022-06-14 | 武汉大学 | Object detection neural network hybrid training method and system based on dynamic intensity |
CN111292331B (en) * | 2020-02-23 | 2023-09-12 | 华为云计算技术有限公司 | Image processing method and device |
EP3994661A4 (en) | 2020-02-24 | 2023-08-02 | Thales Canada Inc. | Method for semantic object detection with knowledge graph |
CN111341438B (en) * | 2020-02-25 | 2023-04-28 | 中国科学技术大学 | Image processing method, device, electronic equipment and medium |
CN111429474B (en) * | 2020-02-27 | 2023-04-07 | 西北大学 | Mammary gland DCE-MRI image focus segmentation model establishment and segmentation method based on mixed convolution |
CN111353504B (en) * | 2020-03-02 | 2023-05-26 | 济南大学 | Source camera identification method based on image block diversity selection and residual prediction module |
JP2023515669A (en) * | 2020-03-05 | 2023-04-13 | マジック リープ, インコーポレイテッド | Systems and Methods for Depth Estimation by Learning Sparse Point Triangulation and Densification for Multiview Stereo |
CN111369582B (en) * | 2020-03-06 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Image segmentation method, background replacement method, device, equipment and storage medium |
CN111476796B (en) * | 2020-03-10 | 2023-04-18 | 西北大学 | Semi-supervised coronary artery segmentation system and segmentation method combining multiple networks |
CN111292317B (en) * | 2020-03-11 | 2022-06-07 | 四川大学华西医院 | Method for enhancing image local feature type multitask segmentation of in-situ cancer region in mammary duct |
CN111368845B (en) * | 2020-03-16 | 2023-04-07 | 河南工业大学 | Feature dictionary construction and image segmentation method based on deep learning |
CN111292324B (en) * | 2020-03-20 | 2022-03-01 | 电子科技大学 | Multi-target identification method and system for brachial plexus ultrasonic image |
CN113435226B (en) * | 2020-03-23 | 2022-09-16 | 北京百度网讯科技有限公司 | Information processing method and device |
CN111462133B (en) * | 2020-03-31 | 2023-06-30 | 厦门亿联网络技术股份有限公司 | System, method, storage medium and equipment for real-time video image segmentation |
CN111476773A (en) * | 2020-04-07 | 2020-07-31 | 重庆医科大学附属儿童医院 | Auricle malformation analysis and identification method, system, medium and electronic terminal |
CN111489364B (en) * | 2020-04-08 | 2022-05-03 | 重庆邮电大学 | Medical image segmentation method based on lightweight full convolution neural network |
CN111489345B (en) * | 2020-04-13 | 2023-08-15 | 中国科学院高能物理研究所 | Training method, device, equipment and storage medium of region segmentation model |
US11676391B2 (en) | 2020-04-16 | 2023-06-13 | Raytheon Company | Robust correlation of vehicle extents and locations when given noisy detections and limited field-of-view image frames |
CN111667488B (en) * | 2020-04-20 | 2023-07-28 | 浙江工业大学 | Medical image segmentation method based on multi-angle U-Net |
CN111783977B (en) * | 2020-04-21 | 2024-04-05 | 北京大学 | Neural network training process intermediate value storage compression method and device based on regional gradient update |
CN111563902B (en) * | 2020-04-23 | 2022-05-24 | 华南理工大学 | Lung lobe segmentation method and system based on three-dimensional convolutional neural network |
CN111553925B (en) * | 2020-04-27 | 2023-06-06 | 南通智能感知研究院 | FCN-based end-to-end crop image segmentation method and system |
CN111598098B (en) * | 2020-05-09 | 2022-07-29 | 河海大学 | Water gauge water line detection and effectiveness identification method based on full convolution neural network |
CN111489354B (en) * | 2020-05-18 | 2023-07-14 | 国网浙江省电力有限公司检修分公司 | Method and device for detecting bird nest on electric power tower, server and storage medium |
CN111709293B (en) * | 2020-05-18 | 2023-10-03 | 杭州电子科技大学 | Chemical structural formula segmentation method based on Resunet neural network |
CN111738295B (en) * | 2020-05-22 | 2024-03-22 | 南通大学 | Image segmentation method and storage medium |
CN111862099B (en) * | 2020-06-04 | 2024-07-19 | 杭州深睿博联科技有限公司 | Blood vessel segmentation method and device based on pyramid architecture and coarse-to-fine strategy |
US11436703B2 (en) * | 2020-06-12 | 2022-09-06 | Samsung Electronics Co., Ltd. | Method and apparatus for adaptive artificial intelligence downscaling for upscaling during video telephone call |
CN111524149B (en) * | 2020-06-19 | 2023-02-28 | 安徽工业大学 | Gas ash microscopic image segmentation method and system based on full convolution residual error network |
CN111931805B (en) * | 2020-06-23 | 2022-10-28 | 西安交通大学 | Knowledge-guided CNN-based small sample similar abrasive particle identification method |
CN111724300B (en) * | 2020-06-30 | 2023-10-13 | 珠海复旦创新研究院 | Single picture background blurring method, device and equipment |
CN111862042B (en) * | 2020-07-21 | 2023-05-23 | 北京航空航天大学 | Pipeline contour detection method based on full convolution neural network |
CN111739027B (en) * | 2020-07-24 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and readable storage medium |
CN112037171B (en) * | 2020-07-30 | 2023-08-15 | 西安电子科技大学 | Multi-mode feature fusion-based multi-task MRI brain tumor image segmentation method |
US10885387B1 (en) * | 2020-08-04 | 2021-01-05 | SUPERB Al CO., LTD. | Methods for training auto-labeling device and performing auto-labeling by using hybrid classification and devices using the same |
US10902291B1 (en) * | 2020-08-04 | 2021-01-26 | Superb Ai Co., Ltd. | Methods for training auto labeling device and performing auto labeling related to segmentation while performing automatic verification by using uncertainty scores and devices using the same |
CN114073536A (en) * | 2020-08-12 | 2022-02-22 | 通用电气精准医疗有限责任公司 | Perfusion imaging system and method |
CN111898324B (en) * | 2020-08-13 | 2022-06-28 | 四川大学华西医院 | Segmentation task assistance-based nasopharyngeal carcinoma three-dimensional dose distribution prediction method |
CN112070686B (en) * | 2020-08-14 | 2023-04-28 | 林红军 | Backlight image cooperative enhancement method based on deep learning |
CN111914948B (en) * | 2020-08-20 | 2024-07-26 | 上海海事大学 | Ocean current machine blade attachment self-adaptive identification method based on rough and fine semantic segmentation network |
CN112070176B (en) * | 2020-09-18 | 2022-05-13 | 福州大学 | Cutting-free end-to-end license plate recognition method |
CN111932482B (en) * | 2020-09-25 | 2021-05-18 | 平安科技(深圳)有限公司 | Method and device for detecting target object in image, electronic equipment and storage medium |
CN112155729B (en) * | 2020-10-15 | 2021-11-23 | 中国科学院合肥物质科学研究院 | Intelligent automatic planning method and system for surgical puncture path and medical system |
WO2022087853A1 (en) * | 2020-10-27 | 2022-05-05 | 深圳市深光粟科技有限公司 | Image segmentation method and apparatus, and computer-readable storage medium |
CN112365434B (en) * | 2020-11-10 | 2022-10-21 | 大连理工大学 | Unmanned aerial vehicle narrow passage detection method based on double-mask image segmentation |
CN113177567B (en) * | 2020-11-11 | 2021-09-17 | 苏州知云创宇信息科技有限公司 | Image data processing method and system based on cloud computing service |
CN114830168B (en) * | 2020-11-16 | 2024-07-23 | 京东方科技集团股份有限公司 | Image reconstruction method, electronic device, and computer-readable storage medium |
CN113903432B (en) * | 2020-11-18 | 2024-08-27 | 苏州律点信息科技有限公司 | Image resolution improving method and device, electronic equipment and storage medium |
WO2022106302A1 (en) | 2020-11-20 | 2022-05-27 | Bayer Aktiengesellschaft | Representation learning |
AU2020281143B1 (en) * | 2020-12-04 | 2021-03-25 | Commonwealth Scientific And Industrial Research Organisation | Creating super-resolution images |
WO2022120739A1 (en) * | 2020-12-10 | 2022-06-16 | 深圳先进技术研究院 | Medical image segmentation method and apparatus based on convolutional neural network |
CN112465060A (en) * | 2020-12-10 | 2021-03-09 | 平安科技(深圳)有限公司 | Method and device for detecting target object in image, electronic equipment and readable storage medium |
CN112541900B (en) * | 2020-12-15 | 2024-01-02 | 平安科技(深圳)有限公司 | Detection method and device based on convolutional neural network, computer equipment and storage medium |
CN112633348B (en) * | 2020-12-17 | 2022-03-15 | 首都医科大学附属北京天坛医院 | Method and device for detecting cerebral arteriovenous malformation and judging dispersion property of cerebral arteriovenous malformation |
CN112651987B (en) * | 2020-12-30 | 2024-06-18 | 内蒙古自治区农牧业科学院 | Method and system for calculating coverage of grasslands of sample side |
CN112767280B (en) * | 2021-02-01 | 2022-06-14 | 福州大学 | Single image raindrop removing method based on loop iteration mechanism |
CN112766229B (en) * | 2021-02-08 | 2022-09-27 | 南京林业大学 | Human face point cloud image intelligent identification system and method based on attention mechanism |
US11562184B2 (en) | 2021-02-22 | 2023-01-24 | Raytheon Company | Image-based vehicle classification |
CN115115567A (en) * | 2021-03-22 | 2022-09-27 | 腾讯云计算(北京)有限责任公司 | Image processing method, image processing device, computer equipment and medium |
CN113160188B (en) * | 2021-04-27 | 2022-07-05 | 福州大学 | Robust blood cell detection method based on circular features |
CN113192035A (en) * | 2021-04-30 | 2021-07-30 | 哈尔滨理工大学 | Improved mammary gland MRI segmentation method based on U-Net network |
CN113344043B (en) * | 2021-05-21 | 2024-05-28 | 北京工业大学 | River turbidity monitoring method based on self-organizing multichannel deep learning network |
CN113269788B (en) * | 2021-05-21 | 2024-03-29 | 东南大学 | Guide wire segmentation method based on depth segmentation network and shortest path algorithm under X-ray perspective image |
CN113343861B (en) * | 2021-06-11 | 2023-09-05 | 浙江大学 | Remote sensing image water body region extraction method based on neural network model |
CN113420770B (en) * | 2021-06-21 | 2024-06-21 | 梅卡曼德(北京)机器人科技有限公司 | Image data processing method, device, electronic equipment and storage medium |
CN113538348B (en) * | 2021-06-29 | 2024-03-26 | 沈阳东软智能医疗科技研究院有限公司 | Processing method of craniocerebral magnetic resonance diffusion weighted image and related products |
US20230034782A1 (en) * | 2021-07-29 | 2023-02-02 | GE Precision Healthcare LLC | Learning-based clean data selection |
CN113601306B (en) * | 2021-08-04 | 2022-07-08 | 上海电器科学研究所(集团)有限公司 | Charging facility box body weld joint polishing method based on one-dimensional segmentation network |
CN113643136B (en) * | 2021-09-01 | 2024-06-18 | 京东科技信息技术有限公司 | Information processing method, system and device |
CN113763251B (en) * | 2021-09-14 | 2023-06-16 | 浙江师范大学 | Image super-resolution amplification model and method thereof |
CN113807332A (en) * | 2021-11-19 | 2021-12-17 | 珠海亿智电子科技有限公司 | Mask robust face recognition network, method, electronic device and storage medium |
CN114241422A (en) * | 2021-12-23 | 2022-03-25 | 长春大学 | Student classroom behavior detection method based on ESRGAN and improved YOLOv5s |
CN114387436B (en) * | 2021-12-28 | 2022-10-25 | 北京安德医智科技有限公司 | Wall coronary artery detection method and device, electronic device and storage medium |
CN114782676B (en) * | 2022-04-02 | 2023-01-06 | 北京广播电视台 | Method and system for extracting region of interest of video |
CN114897922B (en) * | 2022-04-03 | 2024-04-26 | 西北工业大学 | Tissue pathology image segmentation method based on deep reinforcement learning |
CN114678121B (en) * | 2022-05-30 | 2022-09-09 | 上海芯超生物科技有限公司 | Method and system for constructing HP spherical deformation diagnosis model |
CN117529753A (en) * | 2022-05-31 | 2024-02-06 | 北京小米移动软件有限公司 | Training method of image segmentation model, image segmentation method and device |
CN115589377A (en) * | 2022-08-31 | 2023-01-10 | 中国人民解放军陆军工程大学 | Unbalanced protocol identification method based on residual U-Net network |
CN115578370B (en) * | 2022-10-28 | 2023-05-09 | 深圳市铱硙医疗科技有限公司 | Brain image-based metabolic region abnormality detection method and device |
CN115439857B (en) * | 2022-11-03 | 2023-03-24 | 武昌理工学院 | Inclined character recognition method based on complex background image |
CN116258672B (en) * | 2022-12-26 | 2023-11-17 | 浙江大学 | Medical image segmentation method, system, storage medium and electronic equipment |
CN117556208B (en) * | 2023-11-20 | 2024-05-14 | 中国地质大学(武汉) | Intelligent convolution universal network prediction method, equipment and medium for multi-mode data |
CN117635639B (en) * | 2023-11-29 | 2024-08-16 | 陕西中科通大生命科学技术有限公司 | Neural network-based system and method for segmenting abnormal focus of female reproductive system |
CN117936105B (en) * | 2024-03-25 | 2024-06-18 | 杭州安鸿科技股份有限公司 | Multimode melanoma immunotherapy prediction method based on deep learning network |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8521664B1 (en) * | 2010-05-14 | 2013-08-27 | Google Inc. | Predictive analytical model matching |
CN108603922A (en) * | 2015-11-29 | 2018-09-28 | 阿特瑞斯公司 | Automatic cardiac volume is divided |
US9916522B2 (en) * | 2016-03-11 | 2018-03-13 | Kabushiki Kaisha Toshiba | Training constrained deconvolutional networks for road scene semantic segmentation |
US9589374B1 (en) * | 2016-08-01 | 2017-03-07 | 12 Sigma Technologies | Computer-aided diagnosis system for medical images using deep convolutional neural networks |
- 2018-08-17 US US16/104,449 patent/US10304193B1/en not_active Expired - Fee Related
- 2019-04-10 US US16/380,670 patent/US20200058126A1/en not_active Abandoned
- 2019-07-31 WO PCT/US2019/044303 patent/WO2020036734A2/en active Application Filing
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170200067A1 (en) * | 2016-01-08 | 2017-07-13 | Siemens Healthcare Gmbh | Deep Image-to-Image Network Learning for Medical Image Analysis |
US20180285700A1 (en) * | 2016-09-27 | 2018-10-04 | Facebook, Inc. | Training Image-Recognition Systems Using a Joint Embedding Model on Online Social Networks |
US10387740B2 (en) * | 2016-10-10 | 2019-08-20 | Gyrfalcon Technology Inc. | Object detection and recognition apparatus based on CNN based integrated circuits |
US20180161986A1 (en) * | 2016-12-12 | 2018-06-14 | The Charles Stark Draper Laboratory, Inc. | System and method for semantic simultaneous localization and mapping of static and dynamic objects |
US20180253622A1 (en) * | 2017-03-06 | 2018-09-06 | Honda Motor Co., Ltd. | Systems for performing semantic segmentation and methods thereof |
US20190042888A1 (en) * | 2017-08-02 | 2019-02-07 | Preferred Networks, Inc. | Training method, training apparatus, region classifier, and non-transitory computer readable medium |
US10140544B1 (en) * | 2018-04-02 | 2018-11-27 | 12 Sigma Technologies | Enhanced convolutional neural network for image segmentation |
US10304193B1 (en) * | 2018-08-17 | 2019-05-28 | 12 Sigma Technologies | Image segmentation and object detection using fully convolutional neural network |
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11922629B2 (en) * | 2018-12-21 | 2024-03-05 | Nova Scotia Health Authority | Systems and methods for generating cancer prediction maps from multiparametric magnetic resonance images using deep learning |
US20220076422A1 (en) * | 2018-12-21 | 2022-03-10 | Nova Scotia Health Authority | Systems and methods for generating cancer prediction maps from multiparametric magnetic resonance images using deep learning |
US11348233B2 (en) * | 2018-12-28 | 2022-05-31 | Shanghai United Imaging Intelligence Co., Ltd. | Systems and methods for image processing |
US11735316B2 (en) * | 2019-05-16 | 2023-08-22 | Beijing Boe Technology Development Co., Ltd. | Method and apparatus of labeling target in image, and computer recording medium |
US20210118137A1 (en) * | 2019-05-16 | 2021-04-22 | Beijing Boe Technology Development Co., Ltd. | Method and apparatus of labeling target in image, and computer recording medium |
US11328430B2 (en) * | 2019-05-28 | 2022-05-10 | Arizona Board Of Regents On Behalf Of Arizona State University | Methods, systems, and media for segmenting images |
US11232572B2 (en) * | 2019-08-20 | 2022-01-25 | Merck Sharp & Dohme Corp. | Progressively-trained scale-invariant and boundary-aware deep neural network for the automatic 3D segmentation of lung lesions |
US11776130B2 (en) * | 2019-08-20 | 2023-10-03 | Merck Sharp & Dohme Llc | Progressively-trained scale-invariant and boundary-aware deep neural network for the automatic 3D segmentation of lung lesions |
US20220138954A1 (en) * | 2019-08-20 | 2022-05-05 | Merck Sharp & Dohme Corp. | Progressively-trained scale-invariant and boundary-aware deep neural network for the automatic 3d segmentation of lung lesions |
US11462032B2 (en) | 2019-09-23 | 2022-10-04 | Proscia Inc. | Stain normalization for automated whole-slide image classification |
US11423678B2 (en) | 2019-09-23 | 2022-08-23 | Proscia Inc. | Automated whole-slide image classification using deep learning |
US20220354466A1 (en) * | 2019-09-27 | 2022-11-10 | Google Llc | Automated Maternal and Prenatal Health Diagnostics from Ultrasound Blind Sweep Video Sequences |
US20210098127A1 (en) * | 2019-09-30 | 2021-04-01 | GE Precision Healthcare LLC | Medical imaging stroke model |
US11545266B2 (en) * | 2019-09-30 | 2023-01-03 | GE Precision Healthcare LLC | Medical imaging stroke model |
US12051199B2 (en) * | 2019-11-12 | 2024-07-30 | Tencent Technology (Shenzhen) Company Limited | Image processing method and apparatus, server, medical image processing device and storage medium |
US20220051405A1 (en) * | 2019-11-12 | 2022-02-17 | Tencent Technology (Shenzhen) Company Limited | Image processing method and apparatus, server, medical image processing device and storage medium |
WO2021168745A1 (en) * | 2020-02-24 | 2021-09-02 | 深圳先进技术研究院 | Method and apparatus for training magnetic resonance imaging model |
WO2021183684A1 (en) * | 2020-03-10 | 2021-09-16 | AI:ON Innovations, Inc. | System and methods for mammalian transfer learning |
US12094605B2 (en) | 2020-03-10 | 2024-09-17 | AI:ON Innovations, Inc. | System and methods for mammalian transfer learning |
US11705245B2 (en) | 2020-03-10 | 2023-07-18 | AI:ON Innovations, Inc. | System and methods for mammalian transfer learning |
US11354918B2 (en) * | 2020-03-12 | 2022-06-07 | Korea Advanced Institute Of Science And Technology | Electronic device for recognizing visual stimulus based on spontaneous selective neural response of deep artificial neural network and operating method thereof |
CN111415356A (en) * | 2020-03-17 | 2020-07-14 | 北京推想科技有限公司 | Pneumonia symptom segmentation method, pneumonia symptom segmentation device, pneumonia symptom segmentation medium and electronic equipment |
JP2023519658A (en) * | 2020-03-31 | 2023-05-12 | アリババ・グループ・ホールディング・リミテッド | Data processing method, means and system |
WO2021202204A1 (en) * | 2020-03-31 | 2021-10-07 | Alibaba Group Holding Limited | Data processing method, means and system |
US11941802B2 (en) | 2020-03-31 | 2024-03-26 | Alibaba Group Holding Limited | Data processing method, means and system |
JP7564229B2 (en) | 2020-03-31 | 2024-10-08 | アリババ・グループ・ホールディング・リミテッド | Data processing method, means, and system |
US11562491B2 (en) * | 2020-04-09 | 2023-01-24 | Zhejiang Lab | Automatic pancreas CT segmentation method based on a saliency-aware densely connected dilated convolutional neural network |
US20220092789A1 (en) * | 2020-04-09 | 2022-03-24 | Zhejiang Lab | Automatic pancreas ct segmentation method based on a saliency-aware densely connected dilated convolutional neural network |
WO2021236939A1 (en) * | 2020-05-22 | 2021-11-25 | Alibaba Group Holding Limited | Recognition method, apparatus, and device, and storage medium |
US11907838B2 (en) | 2020-05-22 | 2024-02-20 | Alibaba Group Holding Limited | Recognition method, apparatus, and device, and storage medium |
US20210383533A1 (en) * | 2020-06-03 | 2021-12-09 | Nvidia Corporation | Machine-learning-based object detection system |
CN111818298A (en) * | 2020-06-08 | 2020-10-23 | 北京航空航天大学 | High-definition video monitoring system and method based on light field |
WO2022017025A1 (en) * | 2020-07-23 | 2022-01-27 | Oppo广东移动通信有限公司 | Image processing method and apparatus, storage medium, and electronic device |
US11455706B2 (en) | 2020-09-15 | 2022-09-27 | Samsung Electronics Co., Ltd. | Electronic apparatus, control method thereof and electronic system |
WO2022059920A1 (en) * | 2020-09-15 | 2022-03-24 | 삼성전자주식회사 | Electronic device, control method thereof, and electronic system |
WO2022066725A1 (en) * | 2020-09-23 | 2022-03-31 | Proscia Inc. | Training end-to-end weakly supervised networks in a multi-task fashion at the specimen (supra-image) level |
US11861881B2 (en) | 2020-09-23 | 2024-01-02 | Proscia Inc. | Critical component detection using deep learning and attention |
CN112116605A (en) * | 2020-09-29 | 2020-12-22 | 西北工业大学深圳研究院 | Pancreas CT image segmentation method based on integrated depth convolution neural network |
CN112446859A (en) * | 2020-11-18 | 2021-03-05 | 中国科学院上海技术物理研究所 | Satellite-borne thermal infrared camera image cloud detection method based on deep learning |
WO2022116163A1 (en) * | 2020-12-04 | 2022-06-09 | 深圳市优必选科技股份有限公司 | Portrait segmentation method, robot, and storage medium |
US20220383045A1 (en) * | 2021-05-25 | 2022-12-01 | International Business Machines Corporation | Generating pseudo lesion masks from bounding box annotations |
JP7395668B2 (en) | 2021-07-21 | 2023-12-11 | ジーイー・プレシジョン・ヘルスケア・エルエルシー | System and method for high speed mammography data handling |
US11954853B2 (en) | 2021-07-21 | 2024-04-09 | GE Precision Healthcare LLC | Systems and methods for fast mammography data handling |
JP2023016717A (en) * | 2021-07-21 | 2023-02-02 | ジーイー・プレシジョン・ヘルスケア・エルエルシー | Systems and methods for fast mammography data handling |
US20220319141A1 (en) * | 2021-09-15 | 2022-10-06 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for processing image, device and storage medium |
WO2023081978A1 (en) * | 2021-11-12 | 2023-05-19 | OMC International Pty Ltd | Systems and methods for draft calculation |
US20230154610A1 (en) * | 2021-11-17 | 2023-05-18 | City University Of Hong Kong | Task interaction network for prostate cancer diagnosis |
US11961618B2 (en) * | 2021-11-17 | 2024-04-16 | City University Of Hong Kong | Task interaction network for prostate cancer diagnosis |
WO2023117953A1 (en) | 2021-12-21 | 2023-06-29 | Koninklijke Philips N.V. | Network architecture for 3d image processing |
EP4202825A1 (en) * | 2021-12-21 | 2023-06-28 | Koninklijke Philips N.V. | Network architecture for 3d image processing |
GB2621332A (en) * | 2022-08-08 | 2024-02-14 | Twinn Health Ltd | A method and an artificial intelligence system for assessing an MRI image |
GB2621332B (en) * | 2022-08-08 | 2024-09-11 | Twinn Health Ltd | A method and an artificial intelligence system for assessing an MRI image |
Also Published As
Publication number | Publication date |
---|---|
WO2020036734A3 (en) | 2020-07-23 |
US10304193B1 (en) | 2019-05-28 |
WO2020036734A2 (en) | 2020-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10304193B1 (en) | Image segmentation and object detection using fully convolutional neural network | |
US10140544B1 (en) | Enhanced convolutional neural network for image segmentation | |
CN108776969B (en) | Breast ultrasound image tumor segmentation method based on full convolution network | |
US11508146B2 (en) | Convolutional neural network processing method and apparatus | |
Hu et al. | Automatic tumor segmentation in breast ultrasound images using a dilated fully convolutional network combined with an active contour model | |
Al-Masni et al. | Skin lesion segmentation in dermoscopy images via deep full resolution convolutional networks | |
US20210233244A1 (en) | System and method for image segmentation using a joint deep learning model | |
KR20230059799A (en) | A Connected Machine Learning Model Using Collaborative Training for Lesion Detection | |
US20210012504A1 (en) | Encoder Regularization of a Segmentation Model | |
KR20190042429A (en) | Method for image processing | |
CN115239716B (en) | Medical image segmentation method based on shape prior U-Net | |
Khademi et al. | Spatio-temporal hybrid fusion of cae and swin transformers for lung cancer malignancy prediction | |
Chutia et al. | Classification of lung diseases using an attention-based modified DenseNet model | |
US20230342933A1 (en) | Representation Learning for Organs at Risk and Gross Tumor Volumes for Treatment Response Prediction | |
US12106549B2 (en) | Self-supervised learning for artificial intelligence-based systems for medical imaging analysis | |
CN113379770B (en) | Construction method of nasopharyngeal carcinoma MR image segmentation network, image segmentation method and device | |
CN113658119B (en) | Human brain injury detection method and device based on VAE | |
WO2023281317A1 (en) | Method and system for analyzing magnetic resonance images | |
CN114586065A (en) | Method and system for segmenting images | |
KR20220023841A (en) | Magnetic resonance image analysis system and method for alzheimer's disease classification | |
JP2021117964A (en) | Medical system and medical information processing method | |
Wang et al. | Effect of data augmentation of renal lesion image by nine-layer convolutional neural network in kidney CT | |
Jayasekara Pathiranage | Convolutional neural networks for predicting skin lesions of melanoma | |
US20230206438A1 (en) | Multi arm machine learning models with attention for lesion segmentation | |
US20230267607A1 (en) | Hybrid convolutional wavelet networks for predicting treatment response via radiological images of bowel disease |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |