US20220005192A1 - Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network - Google Patents
- Publication number
- US20220005192A1 (U.S. application Ser. No. 17/480,520)
- Authority
- US
- United States
- Prior art keywords
- subject
- image
- learning model
- medical condition
- convrnn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/0012—Biomedical image inspection
- A61B5/0042—Features or image-related aspects of imaging apparatus classified in A61B5/00, e.g. for MRI, optical tomography or impedance tomography apparatus; arrangements of imaging apparatus in a room adapted for image acquisition of a particular organ or body part for the brain
- A61B5/02042—Determining blood loss or bleeding, e.g. during a surgical procedure
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
- A61B5/7282—Event detection, e.g. detecting unique waveforms indicative of a medical condition
- G06F18/24—Classification techniques
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06K9/46
- G06K9/6267
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0445
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06T1/0007—Image acquisition
- G06T11/003—Reconstruction from projections, e.g. tomography
- G06T7/11—Region-based segmentation
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- A61B2505/01—Emergency care
- A61B2576/026—Medical imaging apparatus involving image processing or analysis specially adapted for a particular organ or body part for the brain
- G06T2200/04—Indexing scheme for image data processing or generation, in general involving 3D image data
- G06T2207/10081—Computed x-ray tomography [CT]
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30016—Brain
- G06T2207/30101—Blood vessel; Artery; Vein; Vascular
- G06V2201/031—Recognition of patterns in medical or anatomical images of internal organs
Definitions
- the present disclosure relates to systems and methods for intracerebral hemorrhage detection, and more particularly to a method and system for end-to-end intracerebral hemorrhage detection, segmentation and volume estimation based on a multi-task fully convolutional network.
- Intracerebral hemorrhage (ICH) has a fatality rate of up to 40% within one month of onset, and up to 54% by the end of the first year. Only 12% to 39% of survivors achieve long-term functional independence after an intervention. Therefore, fast and accurate ICH diagnosis is essential for improving treatment outcomes.
- ICH diagnosis mainly consists of two closely related tasks: 1) ICH detection with its subtype classification, and 2) ICH bleeding region segmentation with volume estimation.
- Existing ICH analysis approaches usually treat the two tasks independently by building two learning models and processing the images separately, even though the two tasks may use the same set of input head scan image slices.
- a 2D convolutional neural network (CNN) is usually applied to each slice of a head CT scan to generate a slice-level probability score, with post-processing of the slice-level probabilities to obtain a subject-level ICH prediction.
- a 3D CNN may be used to analyze a whole head CT scan to obtain a subject-level result directly.
- U-Net is usually applied to segment each scan slice (e.g. 2D U-Net) or the entire head CT (e.g. 3D U-Net).
- Although the ICH detection and segmentation tasks share the same set of input data, features are extracted by two separate models and are not shared between the two tasks. This may decrease algorithm performance and increase computation time in both the training and prediction stages.
- Embodiments of the disclosure address the above problems by providing a method and system for end-to-end intracerebral hemorrhage detection, segmentation and volume estimation based on a multi-task fully convolutional network.
- a novel deep learning-based architecture is disclosed to handle the challenging automatic intracerebral hemorrhage (ICH) detection and segmentation tasks.
- Embodiments of the disclosure provide a system for detecting an intracerebral hemorrhage.
- the system includes a communication interface configured to receive a sequence of image slices and an end-to-end multi-task learning model.
- the sequence of image slices includes head scan images of a subject acquired by an image acquisition device.
- the end-to-end multi-task learning model includes an encoder, a bi-directional Convolutional Recurrent Neural Network (ConvRNN), a decoder, and a classifier.
- the system further includes at least one processor configured to extract feature maps from each image slice using the encoder, capture contextual information between adjacent image slices using the bi-directional ConvRNN, and detect the ICH of the subject using the classifier based on the extracted feature maps of the image slices and the contextual information or segment each image slice using the decoder to obtain an ICH region based on the extracted feature maps of the image slice.
- Embodiments of the disclosure also provide a method for detecting an intracerebral hemorrhage (ICH).
- the method includes receiving a sequence of image slices and an end-to-end multi-task learning model.
- the sequence of image slices includes head scan images of a subject acquired by an image acquisition device.
- the end-to-end multi-task learning model includes an encoder, a bi-directional Convolutional Recurrent Neural Network (ConvRNN), a decoder, and a classifier.
- the method also includes extracting, by at least one processor, feature maps from each image slice using the encoder.
- the method further includes capturing, by at least one processor, contextual information between adjacent image slices using the bi-directional ConvRNN.
- the method also includes detecting, by at least one processor, the ICH of the subject using the classifier based on the extracted feature maps of the image slices and the contextual information or segmenting, by the at least one processor, each image slice using the decoder to obtain an ICH region based on the extracted feature maps of the image slice.
- embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform a method for detecting an intracerebral hemorrhage (ICH).
- the method includes receiving a sequence of image slices and an end-to-end multi-task learning model.
- the sequence of image slices includes head scan images of a subject acquired by an image acquisition device.
- the end-to-end multi-task learning model includes an encoder, a bi-directional Convolutional Recurrent Neural Network (ConvRNN), a decoder, and a classifier.
- the method also includes extracting, by at least one processor, feature maps from each image slice using the encoder.
- the method further includes capturing, by at least one processor, contextual information between adjacent image slices using the bi-directional ConvRNN.
- the method also includes detecting, by at least one processor, the ICH of the subject using the classifier based on the extracted feature maps of the image slices and the contextual information or segmenting each image slice using the decoder to obtain an ICH region based on the extracted feature maps of the image slice.
- FIG. 1 illustrates a schematic diagram of an exemplary ICH detection system, according to embodiments of the disclosure.
- FIG. 2 illustrates an exemplary end-to-end multi-task learning model, according to embodiments of the disclosure.
- FIG. 3 illustrates a block diagram of an exemplary image processing device, according to embodiments of the disclosure.
- FIG. 4 is a flow diagram of an exemplary method for slice-level ICH detection and segmentation, according to embodiments of the disclosure.
- FIG. 5 is a flow diagram of an exemplary method for subject-level ICH detection, segmentation and volume estimation, according to embodiments of the disclosure.
- FIG. 6 is a flow diagram of an exemplary attention module, according to embodiments of the disclosure.
- FIG. 7 is a flowchart of an exemplary method for training an end-to-end multi-task learning model, according to embodiments of the disclosure.
- FIG. 8 is a flowchart of an exemplary method for ICH detection and segmentation using an end-to-end multi-task learning model, according to embodiments of the disclosure.
- this end-to-end multi-task learning model may include the following modules: an encoder, a decoder, a bi-directional Convolutional Recurrent Neural Network (ConvRNN), a classification module, and an optional attention module.
- the encoder module extracts task-relevant features from the scan slices, which are used by the ConvRNN module and the decoder module as input.
- the decoder module combines features of different granularities to generate segmentation results and estimation of bleeding volume as the output of the end-to-end multi-task learning model.
- the ConvRNN module captures context information between adjacent slices without losing spatial information encoded in the slice-level feature maps.
- the classification module then utilizes the output feature maps from the ConvRNN module to generate slice-level and subject-level classification results.
- the end-to-end multi-task learning model may optionally include an attention module to magnify signals from salient features and filter out task-irrelevant features.
- the learning model can perform ICH detection and segmentation tasks simultaneously. This enables information sharing and complementation between the two different but closely related tasks. Modules of the learning model are jointly optimized, which can preserve the overall performance of the two tasks while reducing time consumption in both training and prediction stages.
- the model can seamlessly combine the advantages of Convolutional Neural Network (CNN) on image feature extraction and that of ConvRNN on sequential learning of spatial information. Model performance can be further improved by using the combined 2D and 3D information extracted from slice-level and subject-level images.
- the model can be flexible on the type of ICH classification labels it predicts and supports different training scenarios.
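The shared multi-task pipeline described above can be illustrated with a heavily simplified sketch. Each module below is stubbed with trivial NumPy operations purely to show the data flow; all function names and the stand-in computations (pooling as "encoding", neighbor averaging as "recurrent context", thresholding as "classification") are hypothetical and are not the disclosed architecture.

```python
import numpy as np

def encode(slice_2d):
    # Stand-in for the CNN encoder: 2x2 average pooling as "feature extraction".
    h, w = slice_2d.shape
    return slice_2d[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def conv_rnn(feature_seq):
    # Stand-in for the bi-directional ConvRNN: blend each slice's features
    # with its neighbors to mimic capturing inter-slice context.
    out, n = [], len(feature_seq)
    for i, f in enumerate(feature_seq):
        prev = feature_seq[max(i - 1, 0)]
        nxt = feature_seq[min(i + 1, n - 1)]
        out.append((prev + f + nxt) / 3.0)
    return out

def classify(feature):
    # Stand-in slice-level classifier: mean activation thresholded at 0.5.
    return float(feature.mean() > 0.5)

def decode(feature, out_shape):
    # Stand-in decoder: nearest-neighbor upsampling back to input size,
    # thresholded into a binary segmentation mask.
    up = feature.repeat(2, axis=0).repeat(2, axis=1)
    return (up[: out_shape[0], : out_shape[1]] > 0.5).astype(np.uint8)

def forward(scan):
    # scan: list of 2D slices; one encoder pass feeds both task heads.
    feats = [encode(s) for s in scan]
    ctx = conv_rnn(feats)
    labels = [classify(f) for f in ctx]                       # detection task
    masks = [decode(f, s.shape) for f, s in zip(ctx, scan)]   # segmentation task
    return labels, masks

scan = [np.zeros((8, 8)), np.ones((8, 8)), np.ones((8, 8))]
labels, masks = forward(scan)
print(labels)  # → [0.0, 1.0, 1.0]
```

Note how `feats` is computed once and consumed by both heads; that shared computation is the point of the multi-task design.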
- FIG. 1 illustrates an exemplary ICH detection system 100 , according to some embodiments of the present disclosure.
- ICH detection system 100 is configured to classify and segment biomedical images acquired by an image acquisition device 105 .
- the biomedical images may be head scan images that can be used to diagnose ICH.
- image acquisition device 105 may use one or more imaging modalities suitable for head scans, including, e.g., Magnetic Resonance Imaging (MRI), Computed Tomography (CT), functional MRI (e.g., fMRI, DCE-MRI and diffusion MRI), Cone Beam CT (CBCT), Positron Emission Tomography (PET), Single-Photon Emission Computed Tomography (SPECT), etc.
- the biomedical images captured may be two-dimensional (2D) or three-dimensional (3D) images.
- a 3D image may contain multiple 2D image slices.
- ICH detection system 100 may include components for performing two phases, a training phase and a prediction phase.
- ICH detection system 100 may include a training database 101 and a model training device 102 .
- ICH detection system 100 may include an image processing device 103 and a biomedical image database 104 .
- ICH detection system 100 may include more or fewer of the components shown in FIG. 1 .
- ICH detection system 100 may include only image processing device 103 and head scan image database 104 .
- ICH detection system 100 may optionally include a network 106 to facilitate the communication among the various components of ICH detection system 100 , such as databases 101 and 104 , devices 102 , 103 , and 105 .
- network 106 may be a local area network (LAN), a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service), a client-server, a wide area network (WAN), etc.
- components of ICH detection system 100 may be remote from each other or in different locations, and be connected through network 106 as shown in FIG. 1 .
- certain components of ICH detection system 100 may be located on the same site or inside one device.
- training database 101 may be located on-site with or be part of model training device 102 .
- model training device 102 and image processing device 103 may be inside the same computer or processing device.
- Model training device 102 may use the training data received from training database 101 to train a learning model for classifying and segmenting the sequence of image slices received from, e.g., head scan image database 104 . As shown in FIG. 1 , model training device 102 may communicate with training database 101 to receive one or more sets of training data. Each set of training data may include head scan images (2D or 3D) from a subject and the corresponding ground-truth ICH labels and segmentation masks that provide the classification and segmentation result for each image.
- ICH can be further categorized into 5 subtypes: epidural hemorrhage (EDH), subdural hemorrhage (SDH), subarachnoid hemorrhage (SAH), cerebral parenchymal hemorrhage (CPH) and intraventricular hemorrhage (IVH).
- ICH subtype labels on slice-level or subject-level may be included in training data.
- the ground truth data may further include a set of segmentation masks of the bleeding region.
- the training images are previously segmented or annotated by expert operators, with each pixel/voxel classified and labeled, e.g., with value 1 if the pixel/voxel indicates bleeding, or value 0 otherwise.
- the ground truth data may be probability maps where each pixel/voxel is associated with a probability value indicating how likely the pixel/voxel indicates bleeding.
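As a toy illustration of the two ground-truth encodings just described: a binary mask marks bleeding pixels with 1, while a probability map can be binarized with a threshold (the 0.5 threshold below is an assumed choice, not specified by the disclosure).

```python
import numpy as np

# Binary ground-truth mask: 1 = bleeding pixel, 0 = otherwise.
binary_mask = np.array([[1, 0],
                        [0, 0]], dtype=np.uint8)

# Probability-map ground truth: per-pixel likelihood of bleeding,
# binarized here with an assumed 0.5 threshold.
prob_map = np.array([[0.9, 0.2],
                     [0.4, 0.1]])
binarized = (prob_map >= 0.5).astype(np.uint8)
print(binarized.tolist())  # → [[1, 0], [0, 0]]
```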
- the aim of the training phase is to learn a mapping between a sequence of head scan image slices and the ground truth labels with classification and segmentation masks by finding the best fit between predictions and ground truth values over the sets of training data.
- the training phase may be performed “online” or “offline.”
- An “online” training refers to performing the training phase contemporaneously with the prediction phase, e.g., learning the model in real-time just prior to classifying and segmenting a biomedical image.
- An “online” training may have the benefit to obtain a most updated learning model based on the training data that is then available.
- an “online” training may be computationally costly to perform and may not always be feasible if the training data is large and/or the model is complicated.
- an “offline” training is used where the training phase is performed separately from the prediction phase. The model trained offline is saved and reused for classifying and segmenting images.
- Model training device 102 may be implemented with hardware specially programmed by software that performs the training process.
- model training device 102 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with FIG. 3 ).
- the processor may conduct the training by performing instructions of a training process stored in the computer-readable medium.
- Model training device 102 may additionally include input and output interfaces to communicate with training database 101 , network 106 , and/or a user interface (not shown).
- the user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the learning model, and/or manually or semi-automatically providing prediction results associated with a sequence of images for training.
- the end-to-end multi-task learning model may be a fully convolutional network (FCN) that includes an encoder module, a decoder module, a bi-directional ConvRNN module, a classification module and an optional attention module (discussed in detail in connection with FIG. 2 ).
- the model may include more or fewer modules, and the structure of the model is not limited to what is disclosed, as long as the model performs multiple tasks simultaneously and utilizes contextual information between adjacent image slices.
- the trained learning model may be used by image processing device 103 to detect ICH in new head scan images (images that are not associated with ground-truth detection results).
- Image processing device 103 may receive a trained learning model, e.g., end-to-end multi-task learning model 200 , from model training device 102 .
- Image processing device 103 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with FIG. 3 ).
- the processor may perform instructions of a sequence of image slices classification and segmentation process stored in the medium.
- Image processing device 103 may additionally include input and output interfaces (discussed in detail in connection with FIG. 3 ) to communicate with head scan image database 104 , network 106 , and/or a user interface (not shown).
- the user interface may be used for selecting one or more head scan images of a subject for classification and segmentation, initiating the classification and segmentation process, displaying the head scan image and/or the classification and segmentation results.
- Types of ICH detection results that the learning model may provide depend on the types of ground-truth ICH labels used in model training.
- the segmentation mask can include, but is not limited to, the following examples: 1) binary ICH masks; 2) detailed ICH subtype masks; 3) ICH or subtype masks together with other desired labels, such as skulls, normal brain tissues and regions outside the brain.
- Image processing device 103 may communicate with head scan image database 104 to receive one or more head scan images.
- the images stored in head scan image database 104 may include 2D image slices from a 3D scan.
- the images may be acquired by image acquisition devices 105 .
- Image processing device 103 uses the trained learning model received from model training device 102 to perform one or more of: (1) predict whether ICH exists, (2) predict the subtype of ICH, and (3) determine segmentation masks of the image optionally with an estimated bleeding volume.
- FIG. 2 illustrates an exemplary end-to-end multi-task learning model 200 (hereafter, “learning model 200 ”), according to embodiments of the disclosure.
- learning model 200 can be a Fully Convolutional Network (FCN).
- learning model 200 may include an encoder module 202 , a decoder module 204 , a ConvRNN module 206 , a classification module 208 , and optionally an attention module 210 .
- encoder module 202 may include a sequence of convolution/pooling layers to extract task-relevant features from the image slices, e.g., head CT scan slices.
- Decoder module 204 is used to combine feature maps of different granularities to generate segmentation masks.
- Bi-directional ConvRNN module 206 is used to utilize the contextual information between adjacent slices to improve the slice-level feature maps.
- Classification module 208 utilizes output feature maps from ConvRNN module 206 to generate slice-level or subject-level ICH predictions.
- Attention module 210 is an optional component to magnify signals from salient features and filter out task-irrelevant features.
- encoder module 202 may be in any suitable convolutional neural network (CNN) architecture, including but not limited to the CNN component of commonly used image classification architectures such as VGG, ResNet, and DenseNet.
- In FIG. 2 , encoder module 202 employs a VGG architecture as an example to illustrate the feature map extraction procedure.
- convolutional layers use multiple 3×3 kernel-sized filters and pooling layers use 2×2 filters. It is contemplated that classification architectures other than VGG can be used.
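One VGG-style encoder stage of this kind, a 3×3 convolution followed by 2×2 max pooling that halves spatial resolution, can be sketched as follows. The single channel and the identity kernel are illustrative only; a trained encoder learns many multi-channel filters.

```python
import numpy as np

def conv3x3_same(img, kernel):
    # 3x3 convolution with "same" zero padding, one channel, one filter.
    padded = np.pad(img, 1)
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + 3, j:j + 3] * kernel).sum()
    return out

def maxpool2x2(x):
    # 2x2 max pooling: halves each spatial dimension.
    h, w = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Identity kernel (arbitrary choice) so the example output is easy to follow.
kernel = np.zeros((3, 3)); kernel[1, 1] = 1.0
img = np.arange(16, dtype=float).reshape(4, 4)
features = maxpool2x2(conv3x3_same(img, kernel))
print(features.shape)  # → (2, 2)
```

Stacking several such conv/pool stages yields progressively coarser, more abstract feature maps, which is what the decoder later fuses at matching granularities.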
- ConvRNN module 206 may be used to learn contextual information between adjacent image slices along the axial axis and enhance the quality of the feature maps generated from encoder module 202 .
- ConvRNN module 206 may be implemented by either bidirectional Convolutional Long Short-Term Memory (ConvLSTM) or bidirectional Convolutional Gated Recurrent Unit (ConvGRU).
- “Adjacent image slices” refer to every two image slices immediately next to each other in the sequence of image slices. Most of the structural information captured by adjacent image slices is common, while the differences between the image slices can provide valuable information about bleeding or other abrupt changes. Therefore, contextual information captured by ConvRNN module 206 can assist the detection and segmentation of the bleeding regions.
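The bi-directional recurrent pass that propagates such inter-slice context can be sketched with a toy GRU whose gates use scalar weights (the 1×1-convolution special case of a ConvGRU; a real ConvGRU applies learned convolutional gates). All weight values below are arbitrary illustrative choices, and averaging the two directions is just one way to fuse them.

```python
import numpy as np

def gru_step(h_prev, x, wz=1.0, uz=1.0, wh=1.0, uh=1.0):
    # GRU update applied element-wise over a 2D feature map.
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(wz * x + uz * h_prev)          # update gate
    h_tilde = np.tanh(wh * x + uh * h_prev)    # candidate state
    return (1 - z) * h_prev + z * h_tilde

def bidirectional_context(feature_seq):
    # Forward pass over the slice sequence.
    fwd, h = [], np.zeros_like(feature_seq[0])
    for x in feature_seq:
        h = gru_step(h, x); fwd.append(h)
    # Backward pass over the reversed sequence.
    bwd, h = [], np.zeros_like(feature_seq[0])
    for x in reversed(feature_seq):
        h = gru_step(h, x); bwd.append(h)
    bwd.reverse()
    # Fuse the two directions (averaging is one simple choice).
    return [(f + b) / 2 for f, b in zip(fwd, bwd)]

# Three slice-level feature maps; the middle slice differs from its neighbors.
slices = [np.full((2, 2), v) for v in (0.0, 1.0, 0.0)]
ctx = bidirectional_context(slices)
```

Each output map keeps its 2D spatial shape while reflecting context from both axial directions, which is the property the disclosure attributes to the ConvRNN module.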
- Feature maps generated from ConvRNN module 206 may be fed to classification module 208 to generate slice-level or subject-level ICH detection results.
- classification module 208 may be composed of convolution and pooling layers. For example, convolutional layers may use multiple 3×3 kernel-sized filters and pooling layers may use 2×2 filters, as shown in FIG. 2 .
- Slice-level ICH detection predicts whether ICH exists on each input image slice (discussed in detail in connection with FIG. 4 ).
- slice-level feature maps are fused to obtain subject-level feature maps using operations such as average/max pooling (discussed in detail in connection with FIG. 5 ).
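The slice-to-subject fusion can be sketched directly: pooling across the slice axis collapses a stack of slice-level feature maps into one subject-level map. The feature values below are arbitrary illustrative numbers.

```python
import numpy as np

# Two slice-level feature maps stacked along a new "slice" axis.
slice_feats = np.stack([
    np.array([[0.1, 0.2], [0.0, 0.9]]),   # slice 1
    np.array([[0.3, 0.1], [0.8, 0.2]]),   # slice 2
])

# Element-wise max over slices (average pooling works the same way).
subject_max = slice_feats.max(axis=0)
subject_avg = slice_feats.mean(axis=0)
print(subject_max.tolist())  # → [[0.3, 0.2], [0.8, 0.9]]
```

Max pooling keeps the strongest per-location response across slices, which suits detection (a bleed on any slice should surface at the subject level); average pooling instead smooths over the stack.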
- Feature maps generated from ConvRNN module 206 are also used by decoder module 204 to produce segmentation masks.
- upsampling via deconvolution is used to determine a segmentation map having the same size as the input image slice.
- FIG. 2 illustrates how decoder module 204 generates a segmentation map by fusing features of different granularities. For example, the output from ConvRNN module 206 is first 2× upsampled and then fused with feature maps from encoder module 202 that have the same granularity. The fusion operation can improve segmentation performance.
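One such decoder step might be sketched as follows. Nearest-neighbor upsampling stands in for the deconvolution, and element-wise addition is used as an assumed fusion operation (concatenation followed by convolution is another common choice); the feature values are illustrative only.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling (stand-in for learned deconvolution).
    return x.repeat(2, axis=0).repeat(2, axis=1)

convrnn_out = np.array([[1.0, 2.0],
                        [3.0, 4.0]])      # coarse 2x2 feature map
encoder_skip = np.ones((4, 4))            # encoder map of matching granularity

# Upsample the coarse map, then fuse with the same-resolution encoder features.
fused = upsample2x(convrnn_out) + encoder_skip
print(fused.shape)  # → (4, 4)
```

The skip connection re-injects fine spatial detail that the coarse ConvRNN output has lost, which is why the fusion helps segmentation quality.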
- attention module 210 can be used to enhance feature maps before a fusion operation.
- FIG. 6 is a flow diagram of an exemplary attention module 600 , according to embodiments of the disclosure.
- convolutional filter 602 and convolutional filter 603 are first applied to two feature maps, respectively.
- Convolutional filter 602 and convolutional filter 603 can be any suitable filters of any suitable sizes. For example, 3×3 kernel-sized filters are illustrated in FIG. 6 .
- the results are then element-wise summed.
- ReLU activation 604 and Sigmoid activation 605 are sequentially applied to the summed feature map.
- the resulting feature map is downsampled using convolutional filter 606.
- Sigmoid activation 607 is used to determine attention weights from the downsampled feature map.
- the attention weight map is upsampled, e.g., using 2×2 filters, and then element-wise multiplied with the input feature maps to obtain an enhanced feature map.
- FIG. 6 shows one possible implementation of attention module 210 .
- the module can be implemented using any suitable architectures, operations, and activations (e.g., a tanh activation function) as long as it can effectively enhance feature maps.
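- Following the flow of FIG. 6, one hypothetical PyTorch rendering of such an attention module is sketched below; the single-channel weight map and the exact filter sizes are assumptions rather than the figure's precise design.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Attention module following the FIG. 6 flow: two 3x3 convolutions on the
    input maps, element-wise sum, ReLU then Sigmoid, a downsampling convolution
    and Sigmoid to obtain attention weights, 2x upsampling, and element-wise
    multiplication with the input features."""
    def __init__(self, channels):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=1)    # filter 602
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=1)    # filter 603
        self.down = nn.Conv2d(channels, 1, kernel_size=2, stride=2)  # filter 606
        self.up = nn.Upsample(scale_factor=2)                        # 2x2 upsampling

    def forward(self, feat_a, feat_b):
        s = torch.sigmoid(torch.relu(self.conv_a(feat_a) + self.conv_b(feat_b)))  # 604, 605
        w = self.up(torch.sigmoid(self.down(s)))                     # 607: attention weights
        return feat_a * w                                            # enhanced feature map
```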
- model training device 102 may jointly train encoder module 202 , decoder module 204 , ConvRNN module 206 , classification module 208 , and attention module 210 , using the training data from training database 101 .
- the end-to-end multi-task learning model is trained as one model rather than different modules separately.
- the jointly trained learning model can utilize information of the adjacent image slices of the subject to provide a better prediction.
- Training an end-to-end multi-task learning model refers to determining one or more parameters of at least one layer in the network.
- a convolution layer of encoder module 202 or decoder module 204 may include at least one filter or kernel.
- One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process.
- the end-to-end multi-task learning model may be trained using supervised learning, semi-supervised learning, or unsupervised learning.
- a joint loss function may be used by the joint training to account for losses associated with the various tasks performed by learning model 200 .
- the losses of the various tasks may be weighted according to their significance or another predetermined priority. For example, returning to FIG. 2, a loss function may be defined as
- L = λ_sli-b·L_sli-b + λ_sub-b·L_sub-b + λ_sli-m·L_sli-m + λ_sub-m·L_sub-m + λ_seg·L_seg + λ_v·L_v + μ_1·Ω_1 + μ_2·Ω_2
- L_sli-b denotes a slice-level binary classification loss
- L_sub-b denotes a subject-level binary classification loss
- L_sli-m denotes a slice-level ICH subtype classification loss
- L_sub-m denotes a subject-level ICH subtype classification loss.
- L_sli-b, L_sub-b, L_sli-m, and L_sub-m can each be a binary cross entropy (BCE) loss or any other suitable type of loss.
- when subtype classification is not performed, the parameters λ_sli-m or λ_sub-m can be set to 0. L_seg denotes a segmentation loss, which can be a soft dice loss, a cross entropy, or any other suitable type of loss.
- L_v denotes an ICH volume estimation loss, which can be a quadratic loss or any other suitable type of loss.
- Ω_1 and Ω_2 are the L1/L2 norms of all model parameters, used to control the model complexity.
- μ_1 and μ_2 may be used to control the strength of the L1/L2 regularizations.
- the parameters (λ., μ.) can be pre-defined or optimized together with other model parameters during training.
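- A hedged PyTorch sketch of such a joint loss might look as follows, with BCE classification terms, a soft dice segmentation term, a quadratic volume term, and L2 regularization; the subtype terms are omitted (their weights set to 0) and the weights `lam`/`mu` are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def joint_loss(slice_logits, slice_y, subj_logit, subj_y,
               seg_prob, seg_mask, vol_pred, vol_true,
               lam=(1.0, 1.0, 1.0, 0.5), mu=1e-4, params=None):
    """Weighted sum of slice-level BCE, subject-level BCE, a soft dice
    segmentation loss, a quadratic volume loss, and L2 regularization."""
    l_sli = F.binary_cross_entropy_with_logits(slice_logits, slice_y)
    l_sub = F.binary_cross_entropy_with_logits(subj_logit, subj_y)
    inter = (seg_prob * seg_mask).sum()
    l_seg = 1 - (2 * inter + 1) / (seg_prob.sum() + seg_mask.sum() + 1)  # soft dice
    l_vol = (vol_pred - vol_true) ** 2                                   # quadratic loss
    reg = mu * sum(p.pow(2).sum() for p in params) if params else 0.0    # L2 term
    return (lam[0] * l_sli + lam[1] * l_sub +
            lam[2] * l_seg + lam[3] * l_vol + reg)
```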
- the loss function can be optimized by using neural network optimizers such as Stochastic Gradient Descent (SGD), AdaGrad, or Adam. Backpropagation can be used to compute gradients and optimize parameters.
- learning model 200 may be employed to do only ICH detection (using encoder module 202 , ConvRNN module 206 , and classification module 208 ), only segmentation (using encoder module 202 , ConvRNN module 206 , and decoder module 204 ), or ICH detection and segmentation simultaneously (using all the modules of learning model 200 ).
- learning model 200 can improve performance and reduce computation time.
- learning model 200 still preserves the flexibility of doing these tasks separately by only including a subset of modules.
- FIG. 3 illustrates an exemplary image processing device 103 , according to some embodiments of the present disclosure.
- image processing device 103 may be a special-purpose computer, or a general-purpose computer.
- image processing device 103 may be a computer custom-built for hospitals to perform image acquisition and image processing tasks, or a server placed in the cloud.
- image processing device 103 may include a communication interface 302 , a storage 304 , a memory 306 , a processor 308 , and a bus 310 .
- Communication interface 302 , storage 304 , memory 306 , and processor 308 are connected with bus 310 and communicate with each other through bus 310 .
- Communication interface 302 may include a network adaptor, a cable connector, a serial connector, a USB connector, a parallel connector, a high-speed data transmission adaptor, such as fiber, USB 3.0, thunderbolt, and the like, a wireless network adaptor, such as a WiFi adaptor, a telecommunication (3G, 4G/LTE and the like) adaptor, etc.
- Image processing device 103 may be connected to other components of ICH detection system 100 and network 106 through communication interface 302 .
- communication interface 302 receives biomedical images (each including a sequence of image slices) from image acquisition device 105 .
- communication interface 302 also receives the trained learning model, e.g., end-to-end multi-task learning model 200, from model training device 102.
- Storage 304 /memory 306 may be a non-transitory computer-readable medium, such as a read-only memory (ROM), a random access memory (RAM), a phase-change random access memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), an electrically erasable programmable read-only memory (EEPROM), other types of random access memories (RAMs), a flash disk or other forms of flash memory, a cache, a register, a static memory, a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD) or other optical storage, a cassette tape or other magnetic storage devices, or any other non-transitory medium that may be used to store information or instructions capable of being accessed by a computer device, etc.
- storage 304 may store the trained learning model, e.g., end-to-end multi-task learning model 200 , and data, such as feature maps generated while executing the computer programs, etc.
- memory 306 may store computer-executable instructions, such as one or more image processing programs.
- feature maps may be extracted at different granularities from image slices stored in storage 304 . The feature maps may be read from storage 304 one by one or simultaneously and stored in memory 306 .
- Processor 308 may be a processing device that includes one or more general processing devices, such as a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), and the like. More specifically, the processor may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor running other instruction sets, or a processor that runs a combination of instruction sets. The processor may also be one or more dedicated processing devices such as application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), system-on-chip (SoCs), and the like. Processor 308 may be communicatively coupled to memory 306 and configured to execute the computer-executable instructions stored thereon.
- processor 308 is configured to detect an ICH (optionally including its subtype) or segment the images, or both. For example, as shown in FIG. 4, processor 308 may first receive an image (including the sequence of image slices) from image acquisition device 105. Processor 308 then uses the trained learning model, such as end-to-end multi-task learning model 200, to predict whether ICH exists in the received image and outputs a segmentation mask of the image.
- the trained learning model may include slice-level encoder-decoder-attention modules 402 (e.g., including encoder module 202 , decoder module 204 , and attention module 210 ), a ConvRNN module 404 (e.g., ConvRNN module 206 ), and a slice-level classification module 406 (e.g., classification module 208 ).
- processor 308 may apply the learning model to the image to perform the slice-level classification (e.g., using the slice-level classification module) and segmentation (e.g., using the decoder module) in parallel.
- processor 308 may execute the encoder module to extract feature maps from the received image. The feature maps may then be fed to the ConvRNN module and the attention module in parallel.
- the ConvRNN module encodes contextual information into the feature maps, which are then provided to the slice-level classification module.
- processor 308 may additionally execute the attention module to enhance the feature maps before they are used for segmentation in the decoder module.
- Processor 308 may execute the slice-level classification module to generate a slice-level classification result. In some embodiments, this classification result can be either an ICH identification result or an ICH subtype label depending on the ground truth label types used in the model training.
- the decoder module produces the segmentation mask using the enhanced maps.
- the decoder module may produce a probability map indicating the probability that each pixel in the image slice belongs to a bleeding region.
- Processor 308 may then perform a thresholding to obtain a segmentation mask. For example, processor 308 may set pixels with probabilities above 0.8 as 1 (i.e., belong to a bleeding region) and the remaining pixels as 0 (i.e., not belong to a bleeding region). The threshold may be set by an operator or automatically selected by processor 308 .
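- The thresholding step can be sketched as follows; the 0.8 cutoff is merely the example value above, and the function name is illustrative.

```python
import numpy as np

def threshold_mask(prob_map, threshold=0.8):
    """Binarize a per-pixel probability map: pixels with probability above the
    threshold are set to 1 (bleeding region), the rest to 0."""
    return (prob_map > threshold).astype(np.uint8)
```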
- processor 308 may also process a sequence of image slices (e.g., slice 1, slice 2, . . . , slice n of a head scan) of a subject to generate a subject-level ICH prediction and segmentation mask according to data diagram 500 as shown in FIG. 5.
- Each slice may be individually processed by the slice-level encoder-decoder-attention modules in parallel.
- slice 1 is processed by slice-level encoder-decoder-attention modules 510 and slice 2 is processed by slice-level encoder-decoder-attention modules 520.
- Certain intermediate information generated by the slice-level encoder-decoder-attention modules of every two adjacent image slices may be fed to the ConvRNNs.
- information of slice 1 (generated by modules 510 ) and information of slice 2 (generated by modules 520 ) are fed into ConvRNN 520 .
- the ConvRNNs then extract contextual information between the adjacent slices, which will be used to enhance the feature maps.
- feature maps of each image slice generated using the ConvRNNs may be fused to obtain the subject-level feature maps using fusion operations (e.g., average pooling, max pooling, or a combination of the two).
- the subject-level feature maps from some or all ConvRNNs are then fed into a subject-level classification module 530 to obtain a subject-level ICH prediction.
- decoder and attention modules may be used to generate segmentation masks for image slices of the subject in parallel. Voxel size information extracted from the segmentation masks may be used to estimate the bleeding volume.
- An exemplary image classification and segmentation process will be described in connection with FIG. 8 .
- processor 308 may use only some modules of learning model 200 to perform either only ICH detection (e.g., using encoder module 202 , ConvRNN module 206 , and classification module 208 ) or only segmentation (e.g., using encoder module 202 , ConvRNN module 206 , and decoder module 204 ).
- model training device 102 can have the same or similar structures as image processing device 103.
- model training device 102 includes a processor, among other components, configured to jointly train the modules in the end-to-end multi-task learning model. An exemplary network training process will be described in connection with FIG. 7.
- FIG. 7 is a flow-chart of an exemplary method 700 for training an end-to-end multi-task learning model, according to embodiments of the disclosure.
- method 700 may be implemented by model training device 102 in FIG. 1 .
- method 700 is not limited to that exemplary embodiment.
- Method 700 may include steps S 702 -S 712 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 7 .
- model training device 102 may communicate with training database 101 to receive one or more sets of training data.
- Each set of training data may include a sequence of image slices of a subject (e.g., a patient's head scan images), its corresponding ground truth ICH labels (optionally also ICH subtype labels), and segmentation masks that provide the segmentation result for each image slice.
- model training device 102 may initialize the parameters of a learning model. Training the learning model is a process of determining one or more parameters of the learning model. In some embodiments, the learning model may include encoder module 202, decoder module 204, ConvRNN module 206, classification module 208, and attention module 210. Consistent with the present disclosure, model training device 102 jointly trains the various modules in the learning model, using the training data from training database 101. That is, the sets of parameters of the modules are trained together. The parameters may be initially set to certain values. The initial values may be predetermined, selected by an operator, or decided by model training device 102 based on prior experience with similar image slices. For example, parameters of a learning model previously trained for head scan image slices of patient A may be used as initial values for the parameters of the learning model being trained for head scan image slices of patient B.
- model training device 102 may calculate the value of a joint loss function.
- the joint loss function may be a weighted combination of a slice-level classification loss, a subject-level classification loss, a segmentation loss, and an ICH volume loss.
- binary cross entropy (BCE) loss may be used to represent the slice-level classification loss and the subject-level classification loss.
- soft dice loss or cross entropy may be used to represent the segmentation loss.
- quadratic loss may be used to represent the ICH volume loss.
- the calculated value may be compared with a predetermined threshold.
- the predetermined threshold is also known as the stopping criteria for iterative methods. The smaller it is, the more optimal the parameters, but the longer it takes (i.e., more iterations) for the computation to converge. Therefore, the threshold may be selected to balance the accuracy of the prediction and the computational cost.
- If the value is below the predetermined threshold (step S708: Yes), the method is considered to have converged, and the joint loss function is minimized.
- In step S710, model training device 102 outputs the learning model with the optimized sets of parameters, and method 700 concludes. Otherwise (step S708: No), model training device 102 may further adjust the sets of parameters jointly in step S712.
- a stochastic gradient descent related method with backpropagation may be used. For example, the parameters may be adjusted using the gradient ∇L of the loss function L with respect to all parameters, computed over image slices sampled from the training dataset.
- Method 700 may return to step S706 to calculate the value of the loss function based on outputs obtained from the learning model with the adjusted sets of parameters. Each pass of steps S706-S712 is considered one iteration. Method 700 iterates until the value of the loss function is reduced to below the predetermined threshold (step S708).
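- The iteration of steps S706-S712 can be sketched as a standard gradient descent loop; this is a hypothetical helper (the real training would use the joint loss and the full multi-task model), with the learning rate and iteration cap as illustrative values.

```python
import torch

def train_until_converged(model, loss_fn, batches, threshold=1e-3, max_iters=500):
    """Iterates steps S706-S712: compute the loss (S706), stop once it falls
    below the threshold (S708/S710), otherwise adjust all parameters jointly
    by SGD with backpropagation (S712)."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for i in range(max_iters):
        loss = loss_fn(model, batches[i % len(batches)])  # step S706
        if loss.item() < threshold:                       # step S708: converged?
            break                                         # step S710: output model
        opt.zero_grad()
        loss.backward()                                   # gradients via backpropagation
        opt.step()                                        # step S712: joint update
    return model
```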
- FIG. 8 is a flowchart of an exemplary method 800 for ICH detection and segmentation using an end-to-end multi-task learning model, according to embodiments of the disclosure.
- method 800 may be implemented by image processing device 103 in FIG. 1 using learning model 200 in FIG. 2 .
- method 800 is not limited to that exemplary embodiment.
- Method 800 may include steps S 802 -S 814 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 8 .
- image processing device 103 receives a sequence of image slices, e.g., from head scan image database 104 .
- the image slices may capture an ICH.
- Image processing device 103 may additionally receive a learning model, e.g., end-to-end multi-task learning model 200 .
- the learning model may be trained using method 700 .
- image processing device 103 extracts feature maps from the image slices.
- the individual slices may be processed independently and in parallel.
- the feature maps may be extracted at different granularities by encoder module 202 of learning model 200 .
- image processing device 103 extracts contextual information between every two adjacent image slices among the sequence of image slices. For example, feature maps of adjacent image slices (e.g., slice 1 and slice 2 in data diagram 500) are fed into a ConvRNN module (e.g., ConvRNN 520 of data diagram 500). In some embodiments, the spatial relationship between the two feature maps is extracted.
- image processing device 103 may perform ICH prediction on each image slice.
- Image processing device 103 can also perform a subject-level ICH prediction. For example, all slice-level feature maps and contextual information between image slices are used to determine if the subject has an ICH.
- slice-level classification module 406 is used to detect ICH for each image slice and subject-level classification module 530 is used to detect ICH for the subject.
- image processing device 103 can further perform ICH subtype prediction on each image slice if ground truth subtype labels are used to train the learning model.
- Image processing device 103 can additionally perform a subject-level ICH subtype prediction. For example, all slice-level feature maps and contextual information between image slices are used to determine the ICH subtype of the subject.
- image processing device 103 may segment the image slices using a decoder module, e.g., decoder module 204.
- the decoder module fuses all slice-level feature maps and contextual information between image slices to produce a segmentation mask for each image slice.
- image processing device 103 may segment the image slices using parallel computing.
- the decoder may produce a probability map indicating the probability that each pixel in the image slice belongs to the bleeding region. Based on the probability map, image processing device 103 may then perform thresholding to obtain a segmentation mask.
- image processing device 103 may estimate the bleeding volume of the subject.
- the segmentation masks generated in step S 812 are used to obtain the voxel size information and calculate the bleeding volume.
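- A volume estimate from binary slice masks and voxel spacing can be sketched as follows; the function name and the per-axis spacing argument are illustrative assumptions.

```python
import numpy as np

def estimate_bleeding_volume(masks, voxel_spacing_mm):
    """Count segmented voxels across all slice masks and multiply by the
    physical volume of one voxel (mm^3), converting to milliliters."""
    dx, dy, dz = voxel_spacing_mm
    n_voxels = sum(int(m.sum()) for m in masks)
    return n_voxels * dx * dy * dz / 1000.0   # mm^3 -> mL
```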
- the computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices.
- the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed.
- the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
Description
- This application is a continuation of U.S. application Ser. No. 16/861,114, filed Apr. 28, 2020, which claims the benefit of priority to U.S. Provisional Application No. 62/842,482, filed on May 2, 2019. The entire contents of both priority applications are incorporated herein by reference.
- The present disclosure relates to systems and methods for intracerebral hemorrhage detection, and more particularly, to a method and system for end-to-end intracerebral hemorrhage detection, segmentation, and volume estimation based on a multi-task fully convolutional network.
- Intracerebral hemorrhage (ICH), though a common subtype of stroke, may lead to severe disabilities or death. The fatality rate can be up to 40% within one month of ICH onset, and up to 54% by the end of the first year. Only 12% to 39% of survivors achieve long-term functional independence after an intervention. Therefore, fast and accurate ICH diagnosis is essential for improving treatment outcomes.
- ICH diagnosis mainly consists of two closely related tasks: 1) ICH detection with its subtype classification, and 2) ICH bleeding region segmentation with volume estimation. Existing ICH analysis approaches usually treat the two tasks independently by building two learning models and processing the images separately, though the two tasks may use a same set of input head scan image slices.
- For example, for the ICH detection and subtype classification task, a 2D convolutional neural network (CNN) is usually applied to each slice of a head CT scan to generate a slice-level probability score, with post-processing of the slice-level probabilities to obtain a subject-level ICH prediction. Alternatively, a 3D CNN may be used to analyze a whole head CT scan to obtain a subject-level result directly. For the ICH segmentation and volume estimation task, a U-Net is usually applied to segment each scan slice (e.g., 2D U-Net) or the entire head CT (e.g., 3D U-Net).
- These current ICH diagnosis approaches have at least three issues. First, though the ICH detection and segmentation tasks share a same set of input data, features are extracted in two separate models and are not shared between the two tasks. This may decrease the algorithm performance and increase computation time in both the training and prediction stages. Second, the application of 2D CNN models for the ICH classification task is unable to capture contextual information along the axial axis of the head scan, while ICH classification approaches with simple 3D convolution designs have unsatisfactory performance. Third, existing approaches are designed to utilize models trained using either slice-level labels or subject-level labels, which cannot satisfy various real clinical needs.
- Embodiments of the disclosure address the above problems by providing a method and system for end-to-end intracerebral hemorrhage detection, segmentation, and volume estimation based on a multi-task fully convolutional network.
- A novel deep learning-based architecture is disclosed to handle the challenging automatic intracerebral hemorrhage (ICH) detection and segmentation tasks.
- In one aspect, embodiments of the disclosure provide a system for detecting an intracerebral hemorrhage. The system includes a communication interface configured to receive a sequence of image slices and an end-to-end multi-task learning model. The sequence of image slices is the head scan images of a subject and acquired by an image acquisition device. The end-to-end multi-task learning model includes an encoder, a bi-directional Convolutional Recurrent Neural Network (ConvRNN), a decoder, and a classifier. The system further includes at least one processor configured to extract feature maps from each image slice using the encoder, capture contextual information between adjacent image slices using the bi-directional ConvRNN, and detect the ICH of the subject using the classifier based on the extracted feature maps of the image slices and the contextual information or segment each image slice using the decoder to obtain an ICH region based on the extracted feature maps of the image slice.
- In another aspect, embodiments of the disclosure also provide a method for detecting an intracerebral hemorrhage (ICH). The method includes receiving a sequence of image slices and an end-to-end multi-task learning model. The sequence of image slices is head scan images of a subject and acquired by an image acquisition device. The end-to-end multi-task learning model includes an encoder, a bi-directional Convolutional Recurrent Neural Network (ConvRNN), a decoder, and a classifier. The method also includes extracting, by at least one processor, feature maps from each image slice using the encoder. The method further includes capturing, by at least one processor, contextual information between adjacent image slices using the bi-directional ConvRNN. The method also includes detecting, by at least one processor, the ICH of the subject using the classifier based on the extracted feature maps of the image slices and the contextual information or segmenting, by the at least one processor, each image slice using the decoder to obtain an ICH region based on the extracted feature maps of the image slice.
- In yet another aspect, embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one processor, causes the at least one processor to perform a method for detecting an intracerebral hemorrhage (ICH). The method includes receiving a sequence of image slices and an end-to-end multi-task learning model. The sequence of image slices is head scan images of a subject and acquired by an image acquisition device. The end-to-end multi-task learning model includes an encoder, a bi-directional Convolutional Recurrent Neural Network (ConvRNN), a decoder, and a classifier. The method also includes extracting, by at least one processor, feature maps from each image slice using the encoder. The method further includes capturing, by at least one processor, contextual information between adjacent image slices using the bi-directional ConvRNN. The method also includes detecting, by at least one processor, the ICH of the subject using the classifier based on the extracted feature maps of the image slices and the contextual information or segmenting each image slice using the decoder to obtain an ICH region based on the extracted feature maps of the image slice.
- It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
-
FIG. 1 illustrates a schematic diagram of an exemplary ICH detection system, according to embodiments of the disclosure. -
FIG. 2 illustrates an exemplary end-to-end multi-task learning model, according to embodiments of the disclosure. -
FIG. 3 illustrates a block diagram of an exemplary image processing device, according to embodiments of the disclosure. -
FIG. 4 is a flow diagram of an exemplary method for slice-level ICH detection and segmentation, according to embodiments of the disclosure. -
FIG. 5 is a flow diagram of an exemplary method for subject-level ICH detection, segmentation and volume estimation, according to embodiments of the disclosure. -
FIG. 6 is a flow diagram of an exemplary attention module, according to embodiments of the disclosure. -
FIG. 7 is a flowchart of an exemplary method for training an end-to-end multi-task learning model, according to embodiments of the disclosure. -
FIG. 8 is a flowchart of an exemplary method for ICH detection and segmentation using an end-to-end multi-task learning model, according to embodiments of the disclosure. - Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings.
- The disclosed systems and methods use an end-to-end multi-task learning model for modeling head scan images to solve the ICH detection and segmentation problems. In some embodiments, this end-to-end multi-task learning model may include the following modules: an encoder, a decoder, a bi-directional Convolutional Recurrent Neural Network (ConvRNN), a classification module, and an optional attention module. In some embodiments, the encoder module extracts task-relevant features from the scan slices, which are used by the ConvRNN module and the decoder module as input. The decoder module combines features of different granularities to generate segmentation results and an estimate of bleeding volume as the output of the end-to-end multi-task learning model. In some embodiments, the ConvRNN module captures contextual information between adjacent slices without losing the spatial information encoded in the slice-level feature maps. The classification module then utilizes the output feature maps from the ConvRNN module to generate slice-level and subject-level classification results. In some embodiments, the end-to-end multi-task learning model may optionally include an attention module to magnify signals from salient features and filter out task-irrelevant features.
- The disclosed systems and methods provide several improvements over conventional approaches. First, the learning model can perform ICH detection and segmentation tasks simultaneously. This enables information sharing and complementation between the two different but closely related tasks. Modules of the learning model are jointly optimized, which can preserve the overall performance of the two tasks while reducing time consumption in both training and prediction stages. Second, the model can seamlessly combine the advantages of Convolutional Neural Network (CNN) on image feature extraction and that of ConvRNN on sequential learning of spatial information. Model performance can be further improved by using the combined 2D and 3D information extracted from slice-level and subject-level images. Third, the model can be flexible on the type of ICH classification labels it predicts and supports different training scenarios.
-
FIG. 1 illustrates an exemplary ICH detection system 100, according to some embodiments of the present disclosure. Consistent with the present disclosure, ICH detection system 100 is configured to classify and segment biomedical images acquired by an image acquisition device 105. In some embodiments, the biomedical images may be head scan images that can be used to diagnose ICH. In some embodiments, image acquisition device 105 may use one or more imaging modalities that are suitable for head scans, including, e.g., Magnetic Resonance Imaging (MRI), Computed Tomography (CT), functional MRI (e.g., fMRI, DCE-MRI, and diffusion MRI), Cone Beam CT (CBCT), Positron Emission Tomography (PET), Single-Photon Emission Computed Tomography (SPECT), etc. In some embodiments, the biomedical images captured may be two-dimensional (2D) or three-dimensional (3D) images. A 3D image may contain multiple 2D image slices. - As shown in
FIG. 1, ICH detection system 100 may include components for performing two phases, a training phase and a prediction phase. To perform the training phase, ICH detection system 100 may include a training database 101 and a model training device 102. To perform the prediction phase, ICH detection system 100 may include an image processing device 103 and a biomedical image database 104. In some embodiments, ICH detection system 100 may include more or fewer components than shown in FIG. 1. For example, when a learning model for classifying and segmenting the images is pre-trained and provided, ICH detection system 100 may include only image processing device 103 and head scan image database 104. -
ICH detection system 100 may optionally include a network 106 to facilitate the communication among the various components of ICH detection system 100, such as databases 101 and 104 and devices 102, 103, and 105. For example, network 106 may be a local area network (LAN), a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service), a client-server, a wide area network (WAN), etc. In some embodiments, network 106 may be replaced by wired data communication systems or devices. - In some embodiments, the various components of
ICH detection system 100 may be remote from each other or in different locations, and be connected through network 106 as shown in FIG. 1. In some alternative embodiments, certain components of ICH detection system 100 may be located on the same site or inside one device. For example, training database 101 may be located on-site with or be part of model training device 102. As another example, model training device 102 and image processing device 103 may be inside the same computer or processing device. -
Model training device 102 may use the training data received from training database 101 to train a learning model for classifying and segmenting the sequence of image slices received from, e.g., head scan image database 104. As shown in FIG. 1, model training device 102 may communicate with training database 101 to receive one or more sets of training data. Each set of training data may include head scan images (2D or 3D) from a subject and the corresponding ground truth ICH labels and segmentation masks that provide the classification and segmentation results for each image. - Based on a bleeding location in the brain, ICH can be further categorized into five subtypes: epidural hemorrhage (EDH), subdural hemorrhage (SDH), subarachnoid hemorrhage (SAH), cerebral parenchymal hemorrhage (CPH), and intraventricular hemorrhage (IVH). In some embodiments, ICH subtype labels on the slice level or subject level may be included in the training data.
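For illustration, one set of training data as described above could be organized as follows. This is a minimal sketch; the field names, array sizes, and the particular subtype value are assumptions for illustration, not part of the disclosure:

```python
import numpy as np

# Hypothetical layout of one training sample: a subject's stack of 2D head
# scan slices with slice-level ICH labels, an optional subject-level subtype
# label, and per-slice ground-truth segmentation masks.
ICH_SUBTYPES = ["EDH", "SDH", "SAH", "CPH", "IVH"]

training_sample = {
    "slices": np.zeros((24, 512, 512)),            # 24 axial slices
    "slice_ich_labels": np.zeros(24, dtype=int),   # 1 if ICH on the slice
    "subject_subtype": "IVH",                      # optional subject label
    "seg_masks": np.zeros((24, 512, 512), dtype=np.uint8),  # 1 = bleeding
}

assert training_sample["subject_subtype"] in ICH_SUBTYPES
print(training_sample["slices"].shape)
```

A 3D scan thus appears as an ordered sequence of 2D slices, matching the slice-level/subject-level label distinction described above.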
- In some embodiments, the ground truth data may further include a set of segmentation masks of the bleeding region. The training images are previously segmented or annotated by expert operators with each pixel/voxel classified and labeled, e.g., with
value 1 if the pixel/voxel indicates a bleeding or value 0 otherwise. In some embodiments, instead of binary values, the ground truth data may be probability maps where each pixel/voxel is associated with a probability value indicating how likely the pixel/voxel indicates a bleeding. The aim of the training phase is to learn a mapping between a sequence of head scan image slices and the ground truth labels with classification and segmentation masks by finding the best fit between predictions and ground truth values over the sets of training data. - In some embodiments, the training phase may be performed "online" or "offline." An "online" training refers to performing the training phase contemporaneously with the prediction phase, e.g., learning the model in real time just prior to classifying and segmenting a biomedical image. An "online" training may have the benefit of obtaining the most updated learning model based on the training data then available. However, an "online" training may be computationally costly to perform and may not always be possible if the training data is large and/or the model is complicated. Consistent with the present disclosure, an "offline" training is used where the training phase is performed separately from the prediction phase. The learned model trained offline is saved and reused for classifying and segmenting images.
-
Model training device 102 may be implemented with hardware specially programmed by software that performs the training process. For example, model training device 102 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with FIG. 3). The processor may conduct the training by performing instructions of a training process stored in the computer-readable medium. Model training device 102 may additionally include input and output interfaces to communicate with training database 101, network 106, and/or a user interface (not shown). The user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the learning model, and/or manually or semi-automatically providing prediction results associated with a sequence of images for training. - Consistent with some embodiments, the end-to-end multi-task learning model may be a fully convolutional network (FCN) that includes an encoder module, a decoder module, a bi-directional ConvRNN module, a classification module, and an optional attention module (discussed in detail in connection with
FIG. 2). However, it is contemplated that the model may include more or fewer modules, and the structure of the model is not limited to what is disclosed as long as the model performs multiple tasks simultaneously and utilizes contextual information between adjacent image slices. - The trained learning model may be used by
image processing device 103 to detect ICH in new head scan images (images that are not associated with ground-truth detection results). Image processing device 103 may receive a trained learning model, e.g., end-to-end multi-task learning model 200, from model training device 102. Image processing device 103 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with FIG. 3). The processor may perform instructions of a sequence of image slices classification and segmentation process stored in the medium. Image processing device 103 may additionally include input and output interfaces (discussed in detail in connection with FIG. 3) to communicate with head scan image database 104, network 106, and/or a user interface (not shown). The user interface may be used for selecting one or more head scan images of a subject for classification and segmentation, initiating the classification and segmentation process, and displaying the head scan image and/or the classification and segmentation results. - The types of ICH detection results that the learning model may provide depend on the types of ground truth ICH labels used in model training. In some embodiments, given a trained model ψ and a sequence of input image scans x=(x1, x2, . . . , xn), the model returns ŷ=ψ(x). For example, if only slice-level binary labels (whether ICH exists or not in each training image slice) and segmentation masks are provided in the model training, the model returns ŷ=(ŷsli-b, ŷseg, ŷsub-b, ŷv). If only subject-level ICH subtype labels are available in the model training, the model returns ŷ=(ŷsli-b, ŷseg, ŷsub-b, ŷsub-m, ŷv).
If only slice-level ICH subtype labels are used in the model training, the model returns ŷ=(ŷsli-b, ŷsli-m, ŷseg, ŷsub-b, ŷsub-m, ŷv), where ŷseg is the predicted segmentation, ŷsli-b is the ICH predictions for all slices, ŷsli-m is the subtype predictions for all slices, ŷsub-b is the subject-level ICH prediction, ŷsub-m is the subject-level subtype prediction, and ŷv is the ICH volume estimation.
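The dependence of the returned tuple ŷ on the available label types can be sketched as follows; the dictionary keys are hypothetical stand-ins for the ŷ components named above, and the scenario names are illustrative only:

```python
# Hypothetical sketch of how the prediction tuple y_hat varies with the
# ground-truth label types used during training (key names are illustrative).
def select_outputs(all_outputs, label_scenario):
    """Return only the predictions supported by the training labels."""
    scenarios = {
        # slice-level binary labels + segmentation masks only
        "slice_binary": ["y_sli_b", "y_seg", "y_sub_b", "y_v"],
        # subject-level subtype labels available
        "subject_subtype": ["y_sli_b", "y_seg", "y_sub_b", "y_sub_m", "y_v"],
        # slice-level subtype labels available
        "slice_subtype": ["y_sli_b", "y_sli_m", "y_seg",
                          "y_sub_b", "y_sub_m", "y_v"],
    }
    return {k: all_outputs[k] for k in scenarios[label_scenario]}

outputs = {k: None for k in
           ["y_sli_b", "y_sli_m", "y_seg", "y_sub_b", "y_sub_m", "y_v"]}
print(sorted(select_outputs(outputs, "slice_binary")))
```

The richer the training labels, the more components of ŷ the trained model can return.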
- In some embodiments, the segmentation masks can include, but are not limited to, the following examples: 1) binary ICH masks; 2) detailed ICH subtype masks; 3) ICH or subtype masks together with other desired labels such as skulls, normal brain tissues, and regions outside the brain.
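As one possible encoding of the third kind of mask, each label can be mapped to an integer value; the specific label-to-integer assignment below is an assumption for illustration only:

```python
import numpy as np

# Hypothetical integer encoding for a combined subtype/background mask:
# 0 = outside brain, 1 = normal tissue, 2 = skull, 3..7 = ICH subtypes.
LABELS = {"outside": 0, "normal": 1, "skull": 2,
          "EDH": 3, "SDH": 4, "SAH": 5, "CPH": 6, "IVH": 7}

mask = np.full((4, 4), LABELS["normal"], dtype=np.uint8)
mask[1:3, 1:3] = LABELS["IVH"]           # a small IVH bleeding region

# A binary ICH mask is recovered by collapsing all subtype labels to 1.
binary_mask = (mask >= 3).astype(np.uint8)
print(int(binary_mask.sum()))   # 4
```

This shows how a single subtype mask can serve both the subtype and the binary segmentation tasks.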
-
Image processing device 103 may communicate with head scan image database 104 to receive one or more head scan images. In some embodiments, the images stored in head scan image database 104 may include 2D image slices from a 3D scan. The images may be acquired by image acquisition devices 105. Image processing device 103 uses the trained learning model received from model training device 102 to perform one or more of: (1) predicting whether ICH exists, (2) predicting the subtype of ICH, and (3) determining segmentation masks of the image, optionally with an estimated bleeding volume. -
FIG. 2 illustrates an exemplary end-to-end multi-task learning model 200 (hereafter, "learning model 200"), according to embodiments of the disclosure. For example, learning model 200 can be a Fully Convolutional Network (FCN). In some embodiments, learning model 200 may include an encoder module 202, a decoder module 204, a ConvRNN module 206, a classification module 208, and optionally an attention module 210. As shown in FIG. 2, encoder module 202 may include a sequence of convolution/pooling layers to extract task-relevant features from the image slices, e.g., head CT scan slices. Decoder module 204 is used to combine feature maps of different granularities to generate segmentation masks. Bi-directional ConvRNN module 206 is used to utilize the contextual information between adjacent slices to improve the slice-level feature maps. Classification module 208 utilizes output feature maps from ConvRNN module 206 to generate slice-level or subject-level ICH predictions. Attention module 210 is an optional component to magnify signals from salient features and filter out task-irrelevant features. - In some embodiments,
encoder module 202 may be in any suitable convolutional neural network (CNN) architecture, including but not limited to the CNN component of commonly used image classification architectures such as VGG, ResNet, and DenseNet. In FIG. 2, encoder module 202 employs a VGG architecture as an example to illustrate a feature map extraction procedure. For example, convolutional layers use multiple 3×3 kernel-sized filters and pooling layers use 2×2 size filters. It is contemplated that classification architectures other than VGG can be used. - Consistent with the present disclosure,
ConvRNN module 206 may be used to learn contextual information between adjacent image slices along the axial axis and enhance the quality of the feature maps generated by encoder module 202. For example, ConvRNN module 206 may be implemented by either a bidirectional Convolutional Long Short-Term Memory (ConvLSTM) or a bidirectional Convolutional Gated Recurrent Unit (ConvGRU). "Adjacent image slices" refers to every two image slices immediately next to each other in the sequence of image slices. Most of the structural information captured by adjacent image slices is common to both, while the differences between the image slices can provide valuable information about bleeding or other abrupt changes. Therefore, the contextual information captured by ConvRNN module 206 can assist the detection and segmentation of the bleeding regions. - Feature maps generated from
ConvRNN module 206 may be fed to classification module 208 to generate slice-level or subject-level ICH detection results. In some embodiments, classification module 208 may be composed of convolution and pooling layers. For example, convolutional layers may use multiple 3×3 kernel-sized filters and pooling layers may use 2×2 size filters, as shown in FIG. 2. Slice-level ICH detection predicts whether ICH exists on each input image slice (discussed in detail in connection with FIG. 4). In some embodiments, slice-level feature maps are fused to obtain subject-level feature maps using operations such as average/max pooling (discussed in detail in connection with FIG. 5). - Feature maps generated from
ConvRNN module 206 are also used by decoder module 204 to produce segmentation masks. In some embodiments, upsampling via deconvolution is used to determine a segmentation map having the same size as the input image slice. FIG. 2 illustrates how decoder module 204 generates a segmentation map by fusing features of different granularities. For example, the output from ConvRNN module 206 is first 2× upsampled and then fused with the feature maps from encoder module 202 that have the same granularity. The fusion operation can improve segmentation performance. - In some embodiments,
attention module 210 can be used to enhance feature maps before a fusion operation. FIG. 6 is a flow diagram of an exemplary attention module 600, according to embodiments of the disclosure. As shown in FIG. 6, convolutional filter 602 and convolutional filter 603 are first applied to two feature maps, respectively. Convolutional filter 602 and convolutional filter 603 can be any suitable filters of any suitable sizes. For example, 3×3 kernel-sized filters are illustrated in FIG. 6. The results are then element-wise summed. ReLU activation 604 and Sigmoid activation 605 are sequentially applied to the summed feature map. The resulting feature map is downsampled using convolutional filter 606. Sigmoid activation 607 is used to determine attention weights from the downsampled feature map. The attention weight map is upsampled, e.g., using 2×2 filters, and then element-wise multiplied with the input feature maps to obtain an enhanced feature map. FIG. 6 shows one possible implementation of attention module 210. However, it is contemplated that the module can be implemented using other architectures, operations, and activations (e.g., a tanh activation function) as long as it can effectively enhance feature maps. - Consistent with the present disclosure, returning to
FIG. 1, model training device 102 may jointly train encoder module 202, decoder module 204, ConvRNN module 206, classification module 208, and attention module 210, using the training data from training database 101. In other words, the end-to-end multi-task learning model is trained as one model rather than as separate modules. As information propagates among the nodes in the ConvRNN module during the joint training, the jointly trained learning model can utilize information of the adjacent image slices of the subject to provide a better prediction. - As used herein, "training" an end-to-end multi-task learning model refers to determining one or more parameters of at least one layer in the network. For example, a convolution layer of
encoder module 202 or decoder module 204 may include at least one filter or kernel. One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by, e.g., a backpropagation-based training process. The end-to-end multi-task learning model may be trained using supervised learning, semi-supervised learning, or unsupervised learning. - In some embodiments, a joint loss function may be used by the joint training to account for losses associated with the various tasks performed by learning
model 200. In some embodiments, the losses of the various tasks may be weighted according to their significance or another predetermined priority. For example, returning to FIG. 2, a loss function may be defined as
-
ℒ = λsli-bℒsli-b + λsub-bℒsub-b + λsli-mℒsli-m + λsub-mℒsub-m + λsegℒseg + λvℒv + ρ1∥ψ∥1 + ρ2∥ψ∥2,
- where parameter {λ.} controls the weights of the different subtask loss functions contributing to the total loss ℒ. Among them, ℒsli-b denotes a slice-level binary classification loss, ℒsub-b denotes a subject-level binary classification loss, ℒsli-m denotes a slice-level ICH subtype classification loss, and ℒsub-m denotes a subject-level ICH subtype classification loss. In some embodiments, ℒsli-b, ℒsub-b, ℒsli-m, and ℒsub-m can be binary cross entropy (BCE) losses or any other suitable types of losses. In some embodiments, if ground truth of the slice-level or subject-level ICH subtype is not provided in the training database, parameters λsli-m or λsub-m can be set to 0. ℒseg denotes a segmentation loss, which can be a soft dice loss, a cross entropy, or any other suitable type of loss. ℒv denotes an ICH volume estimation loss, which can be a quadratic loss or any other suitable type of loss. ∥ψ∥1 and ∥ψ∥2 are the L1/L2 norms of all model parameters, used to control the model complexity. {ρ.} may be used to control the strength of the L1/L2 regularizations. The parameters (λ., ρ.) can be pre-defined or optimized together with other model parameters during training.
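A simplified numerical sketch of such a weighted joint loss is shown below, collapsing the slice-level and subject-level classification terms into a single BCE term for brevity; the weight values and inputs are placeholders, not values from the disclosure:

```python
import numpy as np

# Illustrative weighted joint loss: BCE for classification, soft dice for
# segmentation, quadratic loss for volume, plus L1/L2 penalties on the
# model parameters psi. All lam_*/rho_* weights are placeholders.
def bce(p, y):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def soft_dice_loss(p, y, eps=1e-6):
    return 1 - (2 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)

def joint_loss(p_cls, y_cls, p_seg, y_seg, v_hat, v, params,
               lam_cls=1.0, lam_seg=1.0, lam_v=1.0, rho1=1e-4, rho2=1e-4):
    reg = rho1 * np.sum(np.abs(params)) + rho2 * np.sum(params ** 2)
    return (lam_cls * bce(p_cls, y_cls)
            + lam_seg * soft_dice_loss(p_seg, y_seg)
            + lam_v * (v_hat - v) ** 2
            + reg)

p_seg = np.array([0.9, 0.8, 0.1]); y_seg = np.array([1.0, 1.0, 0.0])
loss = joint_loss(np.array([0.9]), np.array([1.0]),
                  p_seg, y_seg, 10.2, 10.0, np.array([0.5, -0.5]))
print(loss > 0)
```

Setting a weight to 0, as described for λsli-m or λsub-m, simply removes that subtask's contribution from the total.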
- In some embodiments, the loss function can be optimized by using neural network optimizers such as Stochastic Gradient Descent (SGD), AdaGrad, or Adam. Backpropagation can be used to compute gradients and optimize parameters.
- When applied by
image processing device 103, some or all the modules of learningmodel 200 may be used. For example, learningmodel 200 may be employed to do only ICH detection (usingencoder module 202,ConvRNN module 206, and classification module 208), only segmentation (usingencoder module 202,ConvRNN module 206, and decoder module 204), or ICH detection and segmentation simultaneously (using all the modules of learning model 200). When performing detection and segmentation at the same time, learningmodel 200 can improve performance and reduce computation time. On the other hand, learningmodel 200 still preserves the flexibility of doing these tasks separately by only including a subset of modules. -
FIG. 3 illustrates an exemplary image processing device 103, according to some embodiments of the present disclosure. In some embodiments, image processing device 103 may be a special-purpose computer or a general-purpose computer. For example, image processing device 103 may be a computer custom-built for hospitals to perform image acquisition and image processing tasks, or a server placed in the cloud. As shown in FIG. 3, image processing device 103 may include a communication interface 302, a storage 304, a memory 306, a processor 308, and a bus 310. Communication interface 302, storage 304, memory 306, and processor 308 are connected with bus 310 and communicate with each other through bus 310. -
Communication interface 302 may include a network adaptor, a cable connector, a serial connector, a USB connector, a parallel connector, a high-speed data transmission adaptor, such as fiber, USB 3.0, Thunderbolt, and the like, a wireless network adaptor, such as a WiFi adaptor, a telecommunication (3G, 4G/LTE, and the like) adaptor, etc. Image processing device 103 may be connected to other components of ICH detection system 100 and network 106 through communication interface 302. In some embodiments, communication interface 302 receives biomedical images (each including a sequence of image slices) from image acquisition device 105. In some embodiments, communication interface 302 also receives the trained learning model, e.g., end-to-end multi-task learning model 200, from model training device 102. -
Storage 304/memory 306 may be a non-transitory computer-readable medium, such as a read-only memory (ROM), a random access memory (RAM), a phase-change random access memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), an electrically erasable programmable read-only memory (EEPROM), other types of random access memories (RAMs), a flash disk or other forms of flash memory, a cache, a register, a static memory, a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD) or other optical storage, a cassette tape or other magnetic storage devices, or any other non-transitory medium that may be used to store information or instructions capable of being accessed by a computer device, etc. - In some embodiments,
storage 304 may store the trained learning model, e.g., end-to-end multi-task learning model 200, and data, such as feature maps generated while executing the computer programs, etc. In some embodiments, memory 306 may store computer-executable instructions, such as one or more image processing programs. In some embodiments, feature maps may be extracted at different granularities from image slices stored in storage 304. The feature maps may be read from storage 304 one by one or simultaneously and stored in memory 306. -
Processor 308 may be a processing device that includes one or more general processing devices, such as a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), and the like. More specifically, the processor may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor running other instruction sets, or a processor that runs a combination of instruction sets. The processor may also be one or more dedicated processing devices such as application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), systems-on-chip (SoCs), and the like. Processor 308 may be communicatively coupled to memory 306 and configured to execute the computer-executable instructions stored thereon. - In some embodiments,
processor 308 is configured to detect an ICH (optionally including its subtype), segment the images, or both. For example, as shown in FIG. 4, processor 308 may first receive an image (including the sequence of image slices) from image acquisition device 105. Processor 308 then uses the trained learning model, such as end-to-end multi-task learning model 200, to predict whether ICH exists in the received image and outputs a segmentation mask of the image. The trained learning model may include slice-level encoder-decoder-attention modules 402 (e.g., including encoder module 202, decoder module 204, and attention module 210), a ConvRNN module 404 (e.g., ConvRNN module 206), and a slice-level classification module 406 (e.g., classification module 208). - Consistent with some embodiments,
processor 308 may apply the learning model to the image to perform the slice-level classification (e.g., using the slice-level classification module) and segmentation (e.g., using the decoder module) in parallel. For example, processor 308 may execute the encoder module to extract feature maps from the received image. The feature maps may then be fed to the ConvRNN module and the attention module in parallel. The ConvRNN module encodes contextual information into the feature maps, which are provided to the slice-level classification module. In some embodiments, processor 308 may additionally execute the attention module to enhance the feature maps before they are used for segmentation in the decoder module. Processor 308 may execute the slice-level classification module to generate a slice-level classification result. In some embodiments, this classification result can be either an ICH identification result or an ICH subtype label, depending on the ground truth label types used in the model training. - In parallel, the decoder module produces the segmentation mask using the enhanced feature maps. In some embodiments, the decoder module may produce a probability map indicating the probability that each pixel in the image slice belongs to a bleeding region.
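One way to binarize such a probability map is a simple threshold; this is a minimal sketch, with 0.8 used only as an example threshold value:

```python
import numpy as np

# Sketch of thresholding a per-pixel bleeding probability map into a binary
# segmentation mask: pixels above the threshold become 1, the rest become 0.
def threshold_mask(prob_map, threshold=0.8):
    return (prob_map > threshold).astype(np.uint8)

prob_map = np.array([[0.95, 0.30],
                     [0.81, 0.79]])
mask = threshold_mask(prob_map)
print(mask.tolist())   # [[1, 0], [1, 0]]
```

Lowering the threshold trades higher sensitivity to bleeding pixels for more false positives.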
Processor 308 may then perform a thresholding to obtain a segmentation mask. For example, processor 308 may set pixels with probabilities above 0.8 as 1 (i.e., belonging to a bleeding region) and the remaining pixels as 0 (i.e., not belonging to a bleeding region). The threshold may be set by an operator or automatically selected by processor 308. - In an exemplary embodiment,
processor 308 may also process a sequence of image slices (e.g., slice 1, slice 2, . . . , slice n of a head scan) of a subject to generate a subject-level ICH prediction and segmentation mask according to data diagram 500 as shown in FIG. 5. Each slice may be individually processed by the slice-level encoder-decoder-attention modules in parallel. For example, slice 1 is processed by slice-level encoder-decoder-attention modules 510 and slice 2 is processed by slice-level encoder-decoder-attention modules 520. Certain intermediate information generated by the slice-level encoder-decoder-attention modules of every two adjacent image slices may be fed to the ConvRNNs. For example, information of slice 1 (generated by modules 510) and information of slice 2 (generated by modules 520) are fed into ConvRNN 520. The ConvRNNs then extract contextual information between the adjacent slices, which will be used to enhance the feature maps. In some embodiments, feature maps of each image slice generated using the ConvRNNs may be fused to obtain the subject-level feature maps using fusion operations (e.g., average pooling, max pooling, or a combination of the two). For example, the subject-level feature maps from some or all ConvRNNs are then fed into a subject-level classification module 530 to obtain a subject-level ICH prediction. In some embodiments, decoder and attention modules may be used to generate segmentation masks for image slices of the subject in parallel. Voxel size information extracted from the segmentation masks may be used to estimate the bleeding volume. An exemplary image classification and segmentation process will be described in connection with FIG. 8. - In some embodiments,
processor 308 may use only some modules of learning model 200 to perform either only ICH detection (e.g., using encoder module 202, ConvRNN module 206, and classification module 208) or only segmentation (e.g., using encoder module 202, ConvRNN module 206, and decoder module 204). - Consistent with the present disclosure,
model training device 102 can have the same or similar structure as image processing device 103. In some embodiments, model training device 102 includes a processor, among other components, configured to jointly train the modules in the end-to-end multi-task learning model. An exemplary network training process will be described in connection with FIG. 7. -
FIG. 7 is a flowchart of an exemplary method 700 for training an end-to-end multi-task learning model, according to embodiments of the disclosure. For example, method 700 may be implemented by model training device 102 in FIG. 1. However, method 700 is not limited to that exemplary embodiment. Method 700 may include steps S702-S712 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 7. - In step S702,
model training device 102 may communicate with training database 101 to receive one or more sets of training data. Each set of training data may include a sequence of image slices of a subject (e.g., a patient's head scan images) and its corresponding ground truth ICH labels (optionally also ICH subtype labels) and segmentation masks that provide the segmentation result for each image slice. - In
step 704,model training device 102 may initialize the parameters of a learning model. Training the learning model is a process of determining one or more parameters of the learning model, in some embodiments, the learning model may includeencoder module 202,decoder module 204,ConvRNN module 206,classification module 208, andattention module 210. Consistent with the present disclosure,model training device 102 jointly trains the various modules in the learning model, using the training data fromtraining database 101. That is, the set of parameters of the modules are trained together. The parameters may be initially set to certain values. The initial values may be predetermined, selected by an operator, or decided bymodel training device 102 based on prior experience of similar image slices. For example, parameters of a learning model previously trained for head scan image slices of patient A may be used as initial values for the parameters of the learning model being trained for head scan image slices of patient B. - In step S706,
model training device 102 may calculate the value of a joint loss function. In some embodiments, the joint loss function may be a weighted combination of a slice-level classification loss, a subject-level classification loss, a segmentation loss, and an ICH volume loss. In some embodiments, binary cross entropy (BCE) loss may be used to represent the slice-level classification loss and the subject-level classification loss. In some embodiments, soft dice loss or cross entropy may be used to represent the segmentation loss. In some embodiments, quadratic loss may be used to represent the ICH volume loss. - In step S708, the calculated value may be compared with a predetermined threshold. The predetermined threshold is also known as the stopping criterion for iterative methods. The smaller it is, the more optimal the parameters, but the longer it takes (i.e., more iterations) for the computation to converge. Therefore, the threshold may be selected to balance the accuracy of the prediction and the computational cost.
-
- If the value is below the predetermined threshold (step S708: Yes), the method is considered to have converged, and the joint loss function is minimized. In step S710,
model training device 102 outputs the learning model with the optimized sets of parameters and method 700 concludes. Otherwise (step S708: No), model training device 102 may further adjust the sets of parameters jointly in step S712. In some embodiments, a stochastic gradient descent related method with backpropagation may be used. For example, the parameters may be adjusted with a gradient ∇ of the loss function ℒ with respect to all parameters over image slices sampled from the training dataset. Method 700 may return to step S706 to calculate the value of the loss function based on outputs obtained from the learning model with the adjusted sets of parameters. Each pass of steps S706-S712 is considered one iteration. Method 700 iterates until the value of the loss function is reduced to below the predetermined threshold (step S708). -
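A toy sketch of the iterative loop in steps S706-S712 is shown below, with a one-parameter quadratic loss standing in for the full joint loss; the learning rate and threshold are arbitrary illustrative values:

```python
# Toy version of the training loop: compute the loss (S706), compare it with
# the stopping threshold (S708), output the converged parameter (S710) or
# adjust it by gradient descent (S712) and iterate.
def train(theta=5.0, target=2.0, lr=0.1, threshold=1e-4, max_iters=1000):
    for iteration in range(max_iters):
        loss = (theta - target) ** 2          # step S706
        if loss < threshold:                  # step S708
            return theta, iteration           # step S710: converged
        grad = 2 * (theta - target)           # gradient of the loss
        theta -= lr * grad                    # step S712: adjust parameters
    return theta, max_iters

theta, iters = train()
print(abs(theta - 2.0) < 0.05)
```

A smaller threshold yields a parameter closer to the optimum at the cost of more iterations, as described in step S708.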
FIG. 8 is a flowchart of an exemplary method 800 for ICH detection and segmentation using an end-to-end multi-task learning model, according to embodiments of the disclosure. For example, method 800 may be implemented by image processing device 103 in FIG. 1 using learning model 200 in FIG. 2. However, method 800 is not limited to that exemplary embodiment. Method 800 may include steps S802-S814 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 8. - In step S802,
image processing device 103 receives a sequence of image slices, e.g., from head scan image database 104. The image slices may capture an ICH. Image processing device 103 may additionally receive a learning model, e.g., end-to-end multi-task learning model 200. In some embodiments, the learning model may be trained using method 700. - In step S804,
image processing device 103 extracts feature maps from the image slices. In some embodiments, the individual slices may be processed independently and in parallel. For example, the feature maps may be extracted at different granularities by encoder module 202 of learning model 200. In step S806, image processing device 103 extracts contextual information between every two adjacent image slices among the sequence of image slices. For example, feature maps of adjacent image slices (e.g., slice 1 and slice 2 in data diagram 500) are fed into the ConvRNN module (e.g., ConvRNN 520 of data diagram 500). In some embodiments, the spatial relationship between two feature maps is extracted. - In step S808,
image processing device 103 may perform ICH prediction on each image slice. Image processing device 103 can also perform a subject-level ICH prediction. For example, all slice-level feature maps and the contextual information between image slices are used to determine whether the subject has an ICH. For example, slice-level classification module 406 is used to detect ICH for each image slice and subject-level classification module 530 is used to detect ICH for the subject. - In step S810,
image processing device 103 can further perform ICH subtype prediction on each image slice if ground truth subtype labels are used to train the learning model. Image processing device 103 can additionally perform a subject-level ICH subtype prediction. For example, all slice-level feature maps and the contextual information between image slices are used to determine the ICH subtype of the subject. - In step S812,
image processing device 103 may segment the image slices using a decoder module, e.g., decoder module 204. In some embodiments, the decoder module fuses all slice-level feature maps and the contextual information between image slices to produce a segmentation mask for each image slice. In some embodiments, image processing device 103 may segment the image slices using parallel computing. In some embodiments, the decoder may produce a probability map indicating the probability that each pixel in the image slice belongs to the bleeding region. Based on the probability map, image processing device 103 may then perform thresholding to obtain a segmentation mask. - In step S814,
image processing device 103 may estimate the bleeding volume of the subject. For example, the segmentation masks generated in step S812 are combined with the voxel size information to calculate the bleeding volume. - Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
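Taken together, steps S804-S814 describe an inference pipeline: recurrent context propagation across the slice sequence, slice- and subject-level classification, probability-map thresholding, and bleeding-volume estimation. The following NumPy sketch illustrates that data flow only; every function name is hypothetical, and the elementwise recurrence and max-pooling aggregation are simple stand-ins for the disclosure's ConvRNN module and classification heads, not their actual architecture.

```python
import numpy as np

def extract_context(feature_maps, w_x=0.5, w_h=0.5):
    """Stand-in for the ConvRNN module (step S806): scan the slice sequence,
    mixing each slice's feature map with the hidden state carried over from
    the preceding slice."""
    hidden = np.zeros_like(feature_maps[0])
    context = []
    for feat in feature_maps:
        hidden = np.tanh(w_x * feat + w_h * hidden)  # simple recurrent update
        context.append(hidden)
    return context

def classify(context, threshold=0.5):
    """Step S808: per-slice ICH probabilities (mean activation squashed to
    [0, 1]) plus a subject-level call by max-pooling over slices."""
    slice_probs = [float(1 / (1 + np.exp(-c.mean()))) for c in context]
    slice_preds = [p >= threshold for p in slice_probs]
    subject_pred = max(slice_probs) >= threshold
    return slice_preds, subject_pred

def segment(prob_maps, threshold=0.5):
    """Step S812: binarize each per-pixel bleeding-probability map into a
    segmentation mask."""
    return [(p >= threshold).astype(np.uint8) for p in prob_maps]

def bleeding_volume_ml(masks, voxel_volume_mm3):
    """Step S814: count foreground voxels across all slice masks and scale
    by the physical voxel volume; 1000 mm^3 = 1 mL."""
    n_voxels = sum(int(m.sum()) for m in masks)
    return n_voxels * voxel_volume_mm3 / 1000.0

# Toy run: 3 slices of 4x4 arrays standing in for encoder feature maps.
rng = np.random.default_rng(0)
feats = [rng.random((4, 4)) for _ in range(3)]
context = extract_context(feats)
slice_preds, subject_pred = classify(context)
masks = segment([np.abs(c) for c in context])
volume = bleeding_volume_ml(masks, voxel_volume_mm3=2.5)
```

In the disclosed system the probability maps come from decoder module 204 and the voxel size from the scan metadata; the sketch merely shows how masks and voxel volume combine into a bleeding-volume estimate.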
- It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
- It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/480,520 US11748879B2 (en) | 2019-05-02 | 2021-09-21 | Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962842482P | 2019-05-02 | 2019-05-02 | |
US16/861,114 US11170504B2 (en) | 2019-05-02 | 2020-04-28 | Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network |
US17/480,520 US11748879B2 (en) | 2019-05-02 | 2021-09-21 | Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/861,114 Continuation US11170504B2 (en) | 2019-05-02 | 2020-04-28 | Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220005192A1 true US20220005192A1 (en) | 2022-01-06 |
US11748879B2 US11748879B2 (en) | 2023-09-05 |
Family
ID=73016538
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/861,114 Active 2040-05-14 US11170504B2 (en) | 2019-05-02 | 2020-04-28 | Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network |
US17/480,520 Active 2040-06-05 US11748879B2 (en) | 2019-05-02 | 2021-09-21 | Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/861,114 Active 2040-05-14 US11170504B2 (en) | 2019-05-02 | 2020-04-28 | Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network |
Country Status (1)
Country | Link |
---|---|
US (2) | US11170504B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210174939A1 (en) * | 2019-12-09 | 2021-06-10 | Tencent America LLC | Deep learning system for detecting acute intracranial hemorrhage in non-contrast head ct images |
CN115272165A (en) * | 2022-05-10 | 2022-11-01 | 推想医疗科技股份有限公司 | Image feature extraction method, and training method and device of image segmentation model |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112419335B (en) * | 2020-11-19 | 2022-07-22 | 哈尔滨理工大学 | Shape loss calculation method of cell nucleus segmentation network |
CN112508010A (en) * | 2020-11-30 | 2021-03-16 | 广州金域医学检验中心有限公司 | Method, system, device and medium for identifying digital pathological section target area |
US11615612B2 (en) * | 2020-12-04 | 2023-03-28 | Texas Instruments Incorporated | Systems and methods for image feature extraction |
CN112434663B (en) * | 2020-12-09 | 2023-04-07 | 国网湖南省电力有限公司 | Power transmission line forest fire detection method, system and medium based on deep learning |
CN114638776A (en) * | 2020-12-15 | 2022-06-17 | 通用电气精准医疗有限责任公司 | Method and system for early evaluation of stroke and method for segmenting brain region |
CN112842297B (en) * | 2020-12-16 | 2024-05-14 | 大连医科大学附属第一医院 | Signal processing method, apparatus, computer device and storage medium |
CN112819032B (en) * | 2021-01-11 | 2023-10-27 | 平安科技(深圳)有限公司 | Multi-model-based slice feature classification method, device, equipment and medium |
CN112652032B (en) * | 2021-01-14 | 2023-05-30 | 深圳科亚医疗科技有限公司 | Modeling method for organ, image classification device, and storage medium |
CN112734748B (en) * | 2021-01-21 | 2022-05-17 | 广东工业大学 | Image segmentation system for hepatobiliary and biliary calculi |
CN112819763A (en) * | 2021-01-25 | 2021-05-18 | 北京工业大学 | End-to-end three-dimensional breast ultrasound image tumor segmentation method based on Guide-MultiScale-Net |
CN113409243B (en) * | 2021-02-24 | 2023-04-18 | 浙江工业大学 | Blood vessel segmentation method combining global and neighborhood information |
CN112949550B (en) * | 2021-03-19 | 2022-04-15 | 中国科学院空天信息创新研究院 | Water body identification method, system and medium based on deep learning |
US11763545B2 (en) * | 2021-03-26 | 2023-09-19 | Adobe Inc. | Generating confidence-adaptive pixel-level predictions utilizing a multi-exit pixel-level prediction neural network |
CN113160151B (en) * | 2021-04-02 | 2023-07-25 | 浙江大学 | Panoramic sheet decayed tooth depth identification method based on deep learning and attention mechanism |
CN113051689B (en) * | 2021-04-25 | 2022-03-25 | 石家庄铁道大学 | Bearing residual service life prediction method based on convolution gating circulation network |
CN113177981B (en) * | 2021-04-29 | 2022-10-14 | 中国科学院自动化研究所 | Double-channel craniopharyngioma invasiveness classification and focus region segmentation system thereof |
CN113205133B (en) * | 2021-04-30 | 2024-01-26 | 成都国铁电气设备有限公司 | Tunnel water stain intelligent identification method based on multitask learning |
CN113222944B (en) * | 2021-05-18 | 2022-10-14 | 湖南医药学院 | Cell nucleus segmentation method and cancer auxiliary analysis system and device based on pathological image |
CN113327301B (en) * | 2021-05-25 | 2023-04-07 | 成都信息工程大学 | Strong convection extrapolation method and system based on depth analogy network under multi-dimensional radar data |
CN113349810B (en) * | 2021-05-27 | 2022-03-01 | 北京安德医智科技有限公司 | Cerebral hemorrhage focus identification and hematoma expansion prediction system and device |
CN113506258B (en) * | 2021-07-02 | 2022-06-07 | 中国科学院精密测量科学与技术创新研究院 | Under-sampling lung gas MRI reconstruction method for multitask complex value deep learning |
CN113658188B (en) * | 2021-08-18 | 2022-04-01 | 北京石油化工学院 | Solution crystallization process image semantic segmentation method based on improved Unet model |
CN113763390A (en) * | 2021-08-31 | 2021-12-07 | 山东师范大学 | Brain tumor image segmentation and enhancement system based on multi-task generation countermeasure network |
CN113920313B (en) * | 2021-09-29 | 2022-09-09 | 北京百度网讯科技有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN113947609B (en) * | 2021-10-12 | 2024-04-19 | 中南林业科技大学 | Deep learning network structure and multi-label aortic dissection CT image segmentation method |
CN113887499B (en) * | 2021-10-21 | 2022-11-18 | 清华大学 | Sand dune image recognition model, creation method thereof and sand dune image recognition method |
CN114332535B (en) * | 2021-12-30 | 2022-07-15 | 宁波大学 | sMRI image classification method based on high-resolution complementary attention UNet classifier |
CN114648509B (en) * | 2022-03-25 | 2023-05-12 | 中国医学科学院肿瘤医院 | Thyroid cancer detection system based on multi-classification task |
CN114419619B (en) * | 2022-03-29 | 2022-06-10 | 北京小蝇科技有限责任公司 | Erythrocyte detection and classification method and device, computer storage medium and electronic equipment |
US20240048152A1 (en) * | 2022-08-03 | 2024-02-08 | Arm Limited | Weight processing for a neural network |
CN115760810B (en) * | 2022-11-24 | 2024-04-12 | 江南大学 | Medical image segmentation apparatus, method and computer-readable storage medium |
CN116630386B (en) * | 2023-06-12 | 2024-02-20 | 新疆生产建设兵团医院 | CTA scanning image processing method and system thereof |
CN116580328B (en) * | 2023-07-12 | 2023-09-19 | 江西省水利科学院(江西省大坝安全管理中心、江西省水资源管理中心) | Intelligent recognition method for leakage danger of thermal infrared image dykes and dams based on multitasking assistance |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180033144A1 (en) * | 2016-09-21 | 2018-02-01 | Realize, Inc. | Anomaly detection in volumetric images |
US20180032846A1 (en) * | 2016-08-01 | 2018-02-01 | Nvidia Corporation | Fusing multilayer and multimodal deep neural networks for video classification |
US20180288086A1 (en) * | 2017-04-03 | 2018-10-04 | Royal Bank Of Canada | Systems and methods for cyberbot network detection |
US20190139218A1 (en) * | 2017-11-06 | 2019-05-09 | Beijing Curacloud Technology Co., Ltd. | System and method for generating and editing diagnosis reports based on medical images |
US10650230B2 (en) * | 2018-06-13 | 2020-05-12 | Sap Se | Image data extraction using neural networks |
US10691962B2 (en) * | 2017-09-22 | 2020-06-23 | Toyota Motor Engineering & Manufacturing North America, Inc. | Systems and methods for rear signal identification using machine learning |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180032846A1 (en) * | 2016-08-01 | 2018-02-01 | Nvidia Corporation | Fusing multilayer and multimodal deep neural networks for video classification |
US20180033144A1 (en) * | 2016-09-21 | 2018-02-01 | Realize, Inc. | Anomaly detection in volumetric images |
US20180082443A1 (en) * | 2016-09-21 | 2018-03-22 | Realize, Inc. | Anomaly detection in volumetric images |
US20180288086A1 (en) * | 2017-04-03 | 2018-10-04 | Royal Bank Of Canada | Systems and methods for cyberbot network detection |
US10691962B2 (en) * | 2017-09-22 | 2020-06-23 | Toyota Motor Engineering & Manufacturing North America, Inc. | Systems and methods for rear signal identification using machine learning |
US20190139218A1 (en) * | 2017-11-06 | 2019-05-09 | Beijing Curacloud Technology Co., Ltd. | System and method for generating and editing diagnosis reports based on medical images |
US10650230B2 (en) * | 2018-06-13 | 2020-05-12 | Sap Se | Image data extraction using neural networks |
Also Published As
Publication number | Publication date |
---|---|
US11170504B2 (en) | 2021-11-09 |
US20200349697A1 (en) | 2020-11-05 |
US11748879B2 (en) | 2023-09-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11748879B2 (en) | Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network | |
US10573005B2 (en) | Automatic method and system for vessel refine segmentation in biomedical images using tree structure based deep learning model | |
US11574406B2 (en) | Systems and methods for image segmentation using a scalable and compact convolutional neural network | |
US10460447B2 (en) | Method and system for performing segmentation of image having a sparsely distributed object | |
US10769791B2 (en) | Systems and methods for cross-modality image segmentation | |
Mirikharaji et al. | Star shape prior in fully convolutional networks for skin lesion segmentation | |
US10846854B2 (en) | Systems and methods for detecting cancer metastasis using a neural network | |
EP3975117A1 (en) | Image segmentation method and apparatus, and training method and apparatus for image segmentation model | |
KR101977067B1 (en) | Method for reconstructing diagnosis map by deep neural network-based feature extraction and apparatus using the same | |
US20220351863A1 (en) | Method and System for Disease Quantification of Anatomical Structures | |
CN110738702B (en) | Three-dimensional ultrasonic image processing method, device, equipment and storage medium | |
EP4064187A1 (en) | Automatic hemorrhage expansion detection from head ct images | |
CN110599444B (en) | Device, system and non-transitory readable storage medium for predicting fractional flow reserve of a vessel tree | |
CN114972729A (en) | Method and system for label efficient learning for medical image analysis | |
CN115861250B (en) | Semi-supervised medical image organ segmentation method and system for self-adaptive data set | |
CN109410187B (en) | Systems, methods, and media for detecting cancer metastasis in a full image | |
CN116129184A (en) | Multi-phase focus classification method, device, equipment and readable storage medium | |
CN114913174B (en) | Method, apparatus and storage medium for vascular system variation detection | |
Adegun et al. | Deep convolutional network-based framework for melanoma lesion detection and segmentation | |
CN112862785B (en) | CTA image data identification method, device and storage medium | |
US11301999B2 (en) | Shape-aware organ segmentation by predicting signed distance maps | |
CN112651960A (en) | Image processing method, device, equipment and storage medium | |
KR102442591B1 (en) | Method, program, and apparatus for generating label | |
US20240104722A1 (en) | Method for detection and characterization of lesions | |
US20220392059A1 (en) | Method and system for representation learning with sparse convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
AS | Assignment |
Owner name: CURACLOUD CORPORATION, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GAO, FENG;YIN, YOUBING;GUO, DANFENG;AND OTHERS;SIGNING DATES FROM 20200426 TO 20200427;REEL/FRAME:057558/0249
Owner name: KEYAMED NA, INC., WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CURACLOUD CORPORATION;REEL/FRAME:057558/0265
Effective date: 20201229 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |