US20200193603A1 - Automated segmentation utilizing fully convolutional networks - Google Patents

Publication number
US20200193603A1
Authority
US
United States
Prior art keywords
model
image
training
cnn model
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/800,922
Inventor
Daniel Irving GOLDEN
Matthieu Le
Jesse Lieman-Sifry
Hok Kan Lau
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Arterys Inc
Original Assignee
Arterys Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arterys Inc
Priority to US16/800,922
Publication of US20200193603A1
Assigned to ARES CAPITAL CORPORATION, AS COLLATERAL AGENT: security interest (see document for details). Assignors: ARTERYS INC.
Legal status: Abandoned

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T7/00 Image analysis
                    • G06T7/0002 Inspection of images, e.g. flaw detection
                        • G06T7/0012 Biomedical image inspection
                    • G06T7/10 Segmentation; Edge detection
                        • G06T7/11 Region-based segmentation
                        • G06T7/136 Segmentation; Edge detection involving thresholding
                        • G06T7/143 Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
                        • G06T7/149 Segmentation; Edge detection involving deformable models, e.g. active contour models
                • G06T2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T2207/10 Image acquisition modality
                        • G06T2207/10072 Tomographic images
                            • G06T2207/10088 Magnetic resonance imaging [MRI]
                    • G06T2207/20 Special algorithmic details
                        • G06T2207/20081 Training; Learning
                        • G06T2207/20084 Artificial neural networks [ANN]
                    • G06T2207/30 Subject of image; Context of image processing
                        • G06T2207/30004 Biomedical image processing
                            • G06T2207/30048 Heart; Cardiac
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/04 Architecture, e.g. interconnection topology
                            • G06N3/045 Combinations of networks
                                • G06N3/0454
                        • G06N3/08 Learning methods
                            • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present disclosure generally relates to automated segmentation of anatomical structures.
  • Magnetic Resonance Imaging is often used in cardiac imaging to assess patients with known or suspected cardiac pathologies.
  • cardiac MRI may be used to quantify metrics related to heart failure and similar pathologies through its ability to accurately capture high-resolution cine images of the heart. These high-resolution images allow the volumes of relevant anatomical regions of the heart (such as the ventricles and muscle) to be measured, either manually, or with the help of semi- or fully-automated software.
  • a cardiac MRI cine sequence consists of one or more spatial slices, each of which contains multiple time points (e.g., 20 time points) throughout a full cardiac cycle.
  • The short axis (SAX) view consists of a series of slices along the long axis of the left ventricle. Each slice is in the plane of the short axis of the left ventricle, which is orthogonal to the ventricle's long axis.
  • The 2-chamber (2CH) view is a long axis (LAX) view that shows either the left ventricle and left atrium or the right ventricle and right atrium. The 3-chamber (3CH) view is an LAX view that shows either the left ventricle, left atrium and aorta, or the right ventricle, right atrium and aorta. The 4-chamber (4CH) view is an LAX view that shows the left ventricle, left atrium, right ventricle and right atrium.
  • these views may be captured directly in the scanner (e.g., steady-state free precession (SSFP) MRI) or may be created via multi-planar reconstructions (MPRs) of a volume aligned in a different orientation (such as the axial, sagittal or coronal planes, e.g., 4D Flow MRI).
  • the SAX view has multiple spatial slices, usually covering the entire volume of the heart, but the 2CH, 3CH and 4CH views often only have a single spatial slice. All series are cine, and have multiple time points encompassing a complete cardiac cycle.
  • ejection fraction represents the fraction of blood in the left ventricle (LV) that is pumped out with every heartbeat. Abnormally low EF readings are often associated with heart failure. Measurement of EF depends on the ventricular blood pool volume both at the end systolic phase, when the LV is maximally contracted, and at the end diastolic phase, when the LV is maximally dilated.
  • In order to measure the volume of the LV, the ventricle is typically segmented in the SAX view.
  • the radiologist reviewing the case will first determine the end systole (ES) and end diastole (ED) time points by manually cycling through time points for a single slice and determining the time points at which the ventricle is maximally contracted or dilated, respectively. After determining those two time points, the radiologist will draw contours around the LV in all slices of the SAX series where the ventricle is visible.
  • the area of the ventricle in each slice may be calculated by summing the pixels within the contour and multiplying by the in-plane pixel spacing (e.g., in mm per pixel) in the x and y directions.
  • the total ventricular volume can then be determined by summing the areas in each spatial slice and multiplying by the distance between slices (e.g., in millimeters (mm)). This yields a volume in cubic mm.
  • Other methods of integrating over the slice areas to determine the total volume may also be used, such as variants of Simpson's rule, which, instead of approximating the discrete integral using straight line segments, does so using quadratic segments. Volumes are typically calculated at ES and ED, and ejection fraction and similar metrics may be determined from the volumes.
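  • The area, volume and ejection fraction arithmetic described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the masks, the pixel spacing and the slice spacing are made-up values, the function names are hypothetical, and the ejection fraction is computed with the standard (EDV - ESV) / EDV definition.

```python
def slice_area_mm2(mask, dx_mm, dy_mm):
    """Area of one slice: pixels inside the contour times the area of one pixel."""
    n_pixels = sum(sum(1 for v in row if v) for row in mask)
    return n_pixels * dx_mm * dy_mm


def volume_mm3(masks, dx_mm, dy_mm, slice_spacing_mm):
    """Sum of the slice areas times the distance between slices."""
    return sum(slice_area_mm2(m, dx_mm, dy_mm) for m in masks) * slice_spacing_mm


def ejection_fraction(edv, esv):
    """Fraction of the end diastolic volume that is pumped out per heartbeat."""
    return (edv - esv) / edv


# Toy SAX stack with two slices (hypothetical data, 1 = inside the contour).
ed_masks = [[[1, 1], [1, 1]], [[1, 1], [1, 0]]]  # 4 + 3 = 7 pixels at ED
es_masks = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]  # 1 + 2 = 3 pixels at ES

edv = volume_mm3(ed_masks, dx_mm=1.5, dy_mm=1.5, slice_spacing_mm=8.0)
esv = volume_mm3(es_masks, dx_mm=1.5, dy_mm=1.5, slice_spacing_mm=8.0)
ef = ejection_fraction(edv, esv)
```

  • With these toy values the end diastolic volume is 126 cubic mm and the end systolic volume is 54 cubic mm, so the ejection fraction is 4/7, or about 57%.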
  • To measure the LV blood pool volume, the radiologist typically creates contours along the LV endocardium (interior wall of the myocardial muscle) on about 10 spatial slices at each of two time points (ES and ED), for a total of about 20 contours. Although some semi-automated contour placement tools exist (e.g., using an active contours or “snakes” algorithm), these still typically require some manual adjustment of the contours, particularly with images that have noise or artifacts. The whole process of creating these contours may take 10 minutes or more, mostly involving manual adjustments.
  • Example LV endocardium contours are shown as images 100a-100k in FIG. 1, which shows the contours at a single time point over a full SAX stack. From 100a to 100k, the slices proceed from the apex of the left ventricle to the base of the left ventricle.
  • the most basic method of creating ventricular contours is to complete the process manually with some sort of polygonal or spline drawing tool, without any automated algorithms or tools.
  • the user may, for example, create a freehand drawing of the outline of the ventricle, or drop spline control points which are then connected with a smoothed spline contour.
  • After initial creation of the contour, depending on the software's user interface, the user typically has some ability to modify the contour, e.g., by moving, adding or deleting control points or by moving the spline segments.
  • FIG. 2 shows a contour 202 for the LV endocardium.
  • the snakes algorithm is common, and although modifying its resulting contours can be significantly faster than generating contours from scratch, the snakes algorithm has several significant disadvantages.
  • the snakes algorithm requires a “seed.”
  • the “seed contour” that will be improved by the algorithm must be either set by the user or by a heuristic.
  • the snakes algorithm knows only about local context.
  • the cost function for snakes typically awards credit when the contour overlaps edges in the image; however, there is no way to inform the algorithm that the edge detected is the one desired; e.g., there is no explicit differentiation between the endocardium versus the border of other anatomical entities (e.g., the other ventricle, the lungs, the liver).
  • the algorithm is highly reliant on predictable anatomy and the seed being properly set.
  • the snakes algorithm is greedy.
  • the energy function of snakes is often optimized using a greedy algorithm, such as gradient descent, which iteratively moves the free parameters in the direction of the negative gradient of the cost function.
  • Gradient descent, and many similar optimization algorithms, are susceptible to getting stuck in local minima of the cost function. This manifests as a contour that is potentially bound to the wrong edge in the image, such as an imaging artifact or an edge between the blood pool and a papillary muscle.
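  • The dependence on the seed can be illustrated with a toy example. The sketch below runs plain gradient descent on a made-up one-dimensional cost function with two minima; it is not a snakes energy function, and the cost, the learning rate and the seed values are all assumptions chosen for illustration.

```python
def cost(x):
    """Toy cost: shallow minimum near x = 1.13, deeper minimum near x = -1.30."""
    return x ** 4 - 3 * x ** 2 + x


def grad(x):
    """Derivative of the toy cost."""
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x, learning_rate=0.01, steps=2000):
    """Repeatedly step against the gradient, starting from the seed x."""
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


x_from_right = gradient_descent(2.0)   # seed to the right of the hump
x_from_left = gradient_descent(-2.0)   # seed to the left of the hump
```

  • Both runs converge, but the run seeded at 2.0 settles in the shallower minimum and never reaches the deeper one, which is the analogue of a contour that locks onto the wrong edge because of where it was seeded.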
  • the snakes algorithm has a small representation space. The snakes algorithm generally has only a few dozen tunable parameters, and therefore does not have the capacity to represent a diverse set of possible images on which segmentation is desired.
  • Many different factors can affect the captured image of the ventricle, including anatomy (e.g., size, shape of ventricle, pathologies, prior surgeries, papillary muscles), imaging protocol (e.g., contrast agents, pulse sequence, scanner type, receiver coil quality and type, patient positioning, image resolution) and other factors (e.g., motion artifacts). Because of the great diversity of recorded images and the small number of tunable parameters, a snakes algorithm can only perform well on a small subset of “well-behaved” cases.
  • the snakes algorithm's popularity primarily stems from the fact that the snakes algorithm can be deployed without any explicit “training,” which makes it relatively simple to implement.
  • the snakes algorithm cannot be adequately tuned to work on more challenging cases.
  • Papillary muscles are muscles on the interior of the endocardium of both the left and right ventricles. Papillary muscles serve to keep the mitral and tricuspid valves closed when the pressure on the valves increases during ventricular contraction.
  • FIG. 3 shows example SSFP MRI images 300a (end diastole) and 300b (end systole) which show the papillary muscles and myocardium of the left ventricle. Note that at end diastole (image 300a), the primary challenge is in distinguishing the papillary muscles from the blood pool in which they are embedded, while at end systole (image 300b), the primary challenge is in distinguishing the papillary muscles from the myocardium.
  • When performing a segmentation of the ventricular blood pool (either manual or automated), the papillary muscles may be either included within the contour or excluded from the contour.
  • the contour that surrounds the blood pool is often colloquially referred to as an “endocardium contour,” regardless of whether the papillary muscles are included within the contour or excluded from the contour.
  • the term “endocardium” is not strictly accurate because the contour does not smoothly map to the true surface of the endocardium; despite this, the term “endocardium contour” is used for convenience.
  • Endocardium contours are typically created on every image in the SAX stack to measure the blood volume within the ventricle. The most accurate measure of blood volume will therefore be made if the papillary muscles are excluded from the endocardium contour.
  • Because the muscles are numerous and small, excluding them from a manual contour requires significantly more care when creating the contour, which makes the process dramatically more onerous.
  • the papillary muscles are typically included within the endocardium contour, resulting in a modest overestimate of the ventricular blood volume.
  • Automated or semi-automated utilities may speed up the process of excluding the papillary muscles from the endocardium contour, but they have significant caveats.
  • the snakes algorithm (discussed above) is not appropriate for excluding the papillary muscles at end diastole because its canonical formulation only allows for contouring of a single connected region without holes.
  • Although the algorithm may be adapted to handle holes within the contour, it would have to be significantly reformulated to handle both small and large connected regions simultaneously, since the papillary muscles are so much smaller than the blood pool.
  • It is therefore not possible for the canonical snakes algorithm to be used to segment the blood pool and exclude the papillary muscles at end diastole.
  • At end systole, when the majority of the papillary muscle mass abuts the myocardium, the snakes algorithm will by default exclude the majority of the papillary muscles from the endocardium contour and it cannot be made to include them (since there is little or no intensity boundary between the papillary muscles and the myocardium). Therefore, in the standard formulation, the snakes algorithm can only include the papillary muscles at end diastole and only exclude them at end systole, resulting in inconsistent measurements of blood pool volume over the course of a cardiac cycle. This is a major limitation of the snakes algorithm, preventing clinical use of its output without significant correction by the user.
  • An alternate semi-automated method of creating a blood pool contour is using a “flood fill” algorithm.
  • the user selects an initial seed point, and all pixels that are connected to the seed point whose intensity gradients and distance from the seed point do not exceed a threshold are included within the selected mask.
  • Although flood fill requires the segmented region to be connected, it carries the advantage that it allows for the connected region to have holes. Therefore, because papillary muscles can be distinguished from the blood pool based on their intensity, a flood fill algorithm can be formulated, either dynamically through user input or in a hard-coded fashion, to exclude papillary muscles from the segmentation.
  • Flood fill could also be used to include papillary muscles within the endocardium segmentation at end diastole; however, at end systole, because the bulk of the papillary muscles are connected to the myocardium (making the two regions nearly indistinguishable from one another), flood fill cannot be used to include the papillary muscles within the endocardium segmentation.
  • Beyond the inability to distinguish papillary muscles from myocardium at end systole, the major disadvantage of flood fill is that, though it may significantly reduce the effort required for the segmentation process when compared to a fully-manual segmentation, it still requires a great deal of user input to dynamically determine the flood fill gradient and distance thresholds. The applicant has found that, while accurate segmentations can be created using a flood fill tool, creating them with acceptable clinical precision still requires significant manual adjustment.
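  • A flood fill of the kind described above can be sketched as a breadth-first search from the seed point. The sketch below is one possible reading of the description, not the applicant's tool: it assumes 4-connectivity, measures the "intensity gradient" as the absolute intensity step between neighbouring pixels, and uses Euclidean distance from the seed. The function name, parameter names and toy intensities are hypothetical.

```python
from collections import deque


def flood_fill(image, seed, max_step, max_dist):
    """Return the set of (row, col) pixels connected to `seed`.

    A 4-connected neighbour is added when the intensity step from the current
    pixel does not exceed `max_step` and its Euclidean distance from the seed
    does not exceed `max_dist`.
    """
    rows, cols = len(image), len(image[0])
    selected = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in selected:
                continue
            step = abs(image[nr][nc] - image[r][c])
            dist = ((nr - seed[0]) ** 2 + (nc - seed[1]) ** 2) ** 0.5
            if step <= max_step and dist <= max_dist:
                selected.add((nr, nc))
                queue.append((nr, nc))
    return selected


# Bright blood pool (9) with one dark "papillary muscle" pixel (1) inside,
# surrounded by darker myocardium (2). Intensities are made up.
image = [
    [2, 2, 2, 2, 2],
    [2, 9, 9, 9, 2],
    [2, 9, 1, 9, 2],
    [2, 9, 9, 9, 2],
    [2, 2, 2, 2, 2],
]
mask = flood_fill(image, seed=(1, 1), max_step=2, max_dist=10)
```

  • In this toy image the selected region is the ring of eight bright pixels: the dark centre pixel is left out as a hole, and the surrounding myocardium is never reached. The result depends entirely on the two thresholds, which is the user input the text identifies as the main cost of the approach.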
  • Cardiac segmentations are typically created on images from a short axis or SAX stack.
  • One major disadvantage of performing segmentations on the SAX stack is that the SAX plane is nearly parallel to the plane of the mitral and tricuspid valves. This has two effects. First, the valves are very difficult to distinguish on slices from the SAX stack. Second, assuming the SAX stack is not exactly parallel to the valve plane, there will be at least one slice near the base of the heart that is partially in the ventricle and partially in the atrium.
  • the user may be required to define the regions of different landmarks in the heart in order to see different cardiac views (e.g., 2CH, 3CH, 4CH, SAX) and segment the ventricles.
  • the landmarks required to segment the LV and see 2CH, 3CH, and 4CH left heart views include LV apex, mitral valve, and aortic valve.
  • the landmarks required to segment the RV and see the corresponding views include RV apex, tricuspid valve and pulmonary valve.
  • LandMarkDetect is based on two notable components. First, a variation of the U-Net neural network is used, as discussed in Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation.” In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241. Springer International Publishing, 2015. Second, the landmark is encoded during training using a Gaussian function of arbitrarily chosen standard deviation.
  • the LandMarkDetect neural network 500 of FIG. 5 differs from U-Net in the use of average pooling layers in place of max pooling layers.
  • LandMarkDetect relies on a limited pre-processing strategy which consists of removing the mean of the 3D image (i.e., centering the input data).
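  • The two LandMarkDetect ingredients described above, Gaussian encoding of a landmark and mean removal, can be sketched as follows. This is an illustrative sketch only: it assumes the training target is a 3D heat map with an unnormalized, isotropic Gaussian centred on the landmark voxel, and the function names, grid size and standard deviation are hypothetical.

```python
import math


def gaussian_target(shape, landmark, sigma):
    """Encode a landmark as a 3D Gaussian heat map that peaks at its location."""
    depth, height, width = shape
    lz, ly, lx = landmark
    return [[[math.exp(-((z - lz) ** 2 + (y - ly) ** 2 + (x - lx) ** 2)
                       / (2.0 * sigma ** 2))
              for x in range(width)]
             for y in range(height)]
            for z in range(depth)]


def center(volume):
    """Subtract the mean voxel intensity so that the input has zero mean."""
    voxels = [v for plane in volume for row in plane for v in row]
    mean = sum(voxels) / len(voxels)
    return [[[v - mean for v in row] for row in plane] for plane in volume]


target = gaussian_target((3, 3, 3), landmark=(1, 1, 1), sigma=1.0)
centered = center([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])
```

  • The heat map equals 1 at the landmark and decays with distance, so a larger standard deviation gives the network a wider, easier target at the cost of localization sharpness.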
  • a machine learning system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, the at least one processor: receives learning data including a plurality of batches of labeled image sets, each image set including image data representative of an anatomical structure, and each image set including at least one label which identifies the region of a particular part of the anatomical structure depicted in each image of the image set; trains a fully convolutional neural network (CNN) model to segment at least one part of the anatomical structure utilizing the received learning data; and stores the trained CNN model in the at least one nontransitory processor-readable storage medium of the machine learning system.
  • the CNN model may include a contracting path and an expanding path
  • the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer
  • the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and including a transpose convolution operation which performs upsampling and interpolation with a learned kernel.
  • the CNN model may include a concatenation of feature maps from a corresponding layer in the contracting path through a skip connection.
  • the image data may be representative of a heart during one or more time points throughout a cardiac cycle.
  • the image data may include ultrasound data or visible light photograph data.
  • the CNN model may include a contracting path which may include a first convolutional layer which has between 1 and 2000 feature maps.
  • the CNN model may include a number of convolutional layers, and each convolutional layer may include a convolutional kernel of size 3×3 and a stride of 1.
  • the CNN model may include a number of pooling layers, and each pooling layer may include a 2×2 max-pooling layer with a stride of 2.
  • the CNN model may include four pooling layers and four upsampling layers.
  • the CNN model may include a number of convolutional layers, and the CNN model may pad the input to each convolutional layer using a zero padding operation.
  • the CNN model may include a plurality of nonlinear activation function layers.
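  • The layer parameters listed above determine how the feature map size changes through the network. The sketch below traces the size along one image axis, assuming one convolution per stage, a zero padding of 1 pixel per side (the amount that keeps a 3×3, stride-1 convolution size-preserving) and 2× upsampling; the 256-pixel input and the function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Output size of a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output size of a pooling layer along one axis."""
    return (size - window) // stride + 1


def trace_sizes(size, depth=4):
    """Sizes after each stage of a contracting and then an expanding path."""
    sizes = [size]
    for _ in range(depth):      # contracting path: convolution, then pooling
        size = pool_out(conv_out(size))
        sizes.append(size)
    for _ in range(depth):      # expanding path: convolution, then 2x upsampling
        size = conv_out(size) * 2
        sizes.append(size)
    return sizes


sizes = trace_sizes(256)
```

  • With four pooling layers and four upsampling layers the trace is 256, 128, 64, 32, 16, 32, 64, 128, 256, so the output has the same size as the input. Matching sizes at each level are also what allows feature maps from the contracting path to be concatenated in the expanding path.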
  • the at least one processor may augment the learning data via modification of at least some of the image data in the plurality of batches of labeled image sets.
  • the at least one processor may modify at least some of the image data in the plurality of batches of labeled image sets according to at least one of: a horizontal flip, a vertical flip, a shear amount, a shift amount, a zoom amount, a rotation amount, a brightness level, or a contrast level.
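  • A few of the listed modifications can be sketched on small nested-list images. The sketch below covers only flips and a horizontal shift, and it applies the same transform to the image and to its label mask so that the two stay aligned; the function names, the 50% flip probability and the shift range are assumptions.

```python
import random


def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]


def vflip(img):
    """Reverse the row order (vertical flip)."""
    return img[::-1]


def shift_right(img, n, fill=0):
    """Shift columns right by n pixels, padding with `fill` on the left."""
    n = max(0, min(n, len(img[0])))
    return [[fill] * n + row[:len(row) - n] for row in img]


def augment(image, mask, rng):
    """Apply the same randomly chosen transforms to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:
        image, mask = vflip(image), vflip(mask)
    n = rng.randint(0, 1)
    return shift_right(image, n), shift_right(mask, n)


image = [[5, 0], [0, 0]]
mask = [[1, 0], [0, 0]]
aug_image, aug_mask = augment(image, mask, random.Random(0))
```

  • Intensity changes such as brightness or contrast would be applied to the image only, since they do not move the labeled region.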
  • the CNN model may include a plurality of hyperparameters stored in the at least one nontransitory processor-readable storage medium, and the at least one processor may configure the CNN model according to a plurality of configurations, each configuration including a different combination of values for the hyperparameters; for each of the plurality of configurations, validate the accuracy of the CNN model; and select at least one configuration based at least in part on the accuracies determined by the validations.
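  • The configure, validate and select loop described above can be sketched as an exhaustive search over combinations of values. The text does not say how the combinations are generated, so the grid enumeration here is an assumption; the hyperparameter names and the `toy_validate` function, which stands in for training and validating a model, are hypothetical.

```python
from itertools import product


def grid(search_space):
    """Yield every combination of hyperparameter values as a dict."""
    names = sorted(search_space)
    for values in product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


def select_best(search_space, validate):
    """Validate each configuration and return the most accurate one."""
    scored = [(validate(cfg), cfg) for cfg in grid(search_space)]
    best_accuracy, best_cfg = max(scored, key=lambda item: item[0])
    return best_cfg, best_accuracy


def toy_validate(cfg):
    """Stand-in for training and validating a model; NOT a real accuracy."""
    return (1.0
            - abs(cfg["learning_rate"] - 0.001) * 100
            - abs(cfg["first_layer_maps"] - 64) / 1000)


space = {"learning_rate": [0.01, 0.001], "first_layer_maps": [32, 64, 128]}
best_cfg, best_accuracy = select_best(space, toy_validate)
```

  • In practice `validate` would train the CNN model with the given configuration and score it on held-out data, which makes each call expensive; that cost is why the number of combinations is usually kept small or sampled rather than enumerated.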
  • the at least one processor may, for each image set, identify whether the image set is missing a label for any of a plurality of parts of the anatomical structure; and for image sets identified as missing at least one label, modify a training loss function to account for the identified missing labels.
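  • One way to account for missing labels is to leave the unlabeled parts out of the loss, so that the model is neither rewarded nor penalized for its predictions on them. The sketch below does this for a per-pixel binary cross-entropy; the choice of loss, the class names and the data layout are assumptions, not details taken from the text.

```python
import math


def masked_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over the classes that have a label.

    probs:  {class_name: [predicted probability per pixel]}
    labels: {class_name: [0 or 1 per pixel]}, or None when the label is missing
    """
    total, count = 0.0, 0
    for name, p_list in probs.items():
        y_list = labels.get(name)
        if y_list is None:      # missing label: contributes nothing to the loss
            continue
        for p, y in zip(p_list, y_list):
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


probs = {"lv_endo": [0.9, 0.1], "rv_endo": [0.5, 0.5]}
labels_full = {"lv_endo": [1, 0], "rv_endo": [1, 0]}
labels_missing = {"lv_endo": [1, 0], "rv_endo": None}

loss_full = masked_bce(probs, labels_full)
loss_missing = masked_bce(probs, labels_missing)
```

  • With the right-ventricle label missing, the loss depends only on the left-ventricle predictions, so image sets that were only partially annotated can still be used for training.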
  • the image data may include volumetric images, and each label may include a volumetric label mask or contour.
  • Each convolutional layer of the CNN model may include a convolutional kernel of size N×N×K pixels, where N and K are positive integers.
  • Each convolutional layer of the CNN model may include a convolutional kernel of size N×M pixels, where N and M are positive integers.
  • the image data may be representative of a heart during one or more time points throughout a cardiac cycle, wherein a subset of the plurality of batches of labeled image sets may include labels which exclude papillary muscles.
  • the CNN model may utilize data for at least one image which may be at least one of: adjacent to the processed image with respect to space or adjacent to the processed image with respect to time.
  • the CNN model may utilize data for at least one image which may be adjacent to the processed image with respect to space and may utilize data for at least one image which is adjacent to the processed image with respect to time.
  • the CNN model may utilize at least one of temporal information or phase information.
  • the image data may include at least one of steady-state free precession (SSFP) magnetic resonance imaging (MRI) data or 4D flow MRI data.
  • a method of operating a machine learning system, which may include at least one nontransitory processor-readable storage medium that may store at least one of processor-executable instructions or data and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, may be summarized as including: receiving, by the at least one processor, learning data including a plurality of batches of labeled image sets, each image set including image data representative of an anatomical structure, and each image set including at least one label which identifies the region of a particular part of the anatomical structure depicted in each image of the image set; training, by the at least one processor, a fully convolutional neural network (CNN) model to segment at least one part of the anatomical structure utilizing the received learning data; and storing, by the at least one processor, the trained CNN model in the at least one nontransitory processor-readable storage medium of the machine learning system.
  • Training the CNN model may include training a CNN model including a contracting path and an expanding path
  • the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer
  • the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and may include a transpose convolution operation which performs upsampling and interpolation with a learned kernel.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data and, subsequent to each upsampling layer, the CNN model may include a concatenation of feature maps from a corresponding layer in the contracting path through a skip connection.
  • Receiving learning data may include receiving image data that may be representative of a heart during one or more time points throughout a cardiac cycle.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and the CNN model may include a contracting path which may include a first convolutional layer which has between 1 and 2000 feature maps.
  • Training a CNN model may include training a CNN model which may include a plurality of convolutional layers to segment at least one part of the anatomical structure utilizing the received learning data, and each convolutional layer may include a convolutional kernel of size 3×3 and a stride of 1.
  • Training a CNN model may include training a CNN model which may include a plurality of pooling layers to segment at least one part of the anatomical structure utilizing the received learning data, and each pooling layer may include a 2×2 max-pooling layer with a stride of 2.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and the CNN model may include four pooling layers and four upsampling layers.
  • Training a CNN model may include training a CNN model which may include a plurality of convolutional layers to segment at least one part of the anatomical structure utilizing the received learning data, and the CNN model may pad the input to each convolutional layer using a zero padding operation.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and the CNN model may include a plurality of nonlinear activation function layers.
  • the method may further include augmenting, by the at least one processor, the learning data via modification of at least some of the image data in the plurality of batches of labeled image sets.
  • the method may further include modifying, by the at least one processor, at least some of the image data in the plurality of batches of labeled image sets according to at least one of: a horizontal flip, a vertical flip, a shear amount, a shift amount, a zoom amount, a rotation amount, a brightness level, or a contrast level.
  • the CNN model may include a plurality of hyperparameters stored in the at least one nontransitory processor-readable storage medium, and the method may further include configuring, by the at least one processor, the CNN model according to a plurality of configurations, each configuration comprising a different combination of values for the hyperparameters; for each of the plurality of configurations, validating, by the at least one processor, the accuracy of the CNN model; and selecting, by the at least one processor, at least one configuration based at least in part on the accuracies determined by the validations.
  • the method may further include, for each image set, identifying, by the at least one processor, whether the image set is missing a label for any of a plurality of parts of the anatomical structure; and for image sets identified as missing at least one label, modifying, by the at least one processor, a training loss function to account for the identified missing labels.
  • Receiving learning data may include receiving image data which may include volumetric images, and each label may include a volumetric label mask or contour.
  • Training a CNN model may include training a CNN model which may include a plurality of convolutional layers to segment at least one part of the anatomical structure utilizing the received learning data, and each convolutional layer of the CNN model may include a convolutional kernel of size N×N×K pixels, where N and K are positive integers.
  • Training a CNN model may include training a CNN model which may include a plurality of convolutional layers to segment at least one part of the anatomical structure utilizing the received learning data, and each convolutional layer of the CNN model may include a convolutional kernel of size N×M pixels, where N and M are positive integers.
  • Receiving learning data may include receiving image data representative of a heart during one or more time points throughout a cardiac cycle, and wherein a subset of the plurality of batches of labeled image sets may include labels which exclude papillary muscles.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and for each processed image, the CNN model may utilize data for at least one image which is at least one of: adjacent to the processed image with respect to space or adjacent to the processed image with respect to time.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and for each processed image, the CNN model may utilize data for at least one image which is adjacent to the processed image with respect to space and utilizes data for at least one image which is adjacent to the processed image with respect to time.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and for each processed image, the CNN model may utilize at least one of temporal information or phase information.
  • Receiving learning data may include receiving image data which may include at least one of steady-state free precession (SSFP) magnetic resonance imaging (MRI) data or 4D flow MRI data.
  • a machine learning system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, the at least one processor: receives image data which represents an anatomical structure; processes the received image data through a fully convolutional neural network (CNN) model to generate per-class probabilities for each pixel of each image of the image data, each class corresponding to one of a plurality of parts of the anatomical structure represented by the image data; and for each image of the image data, generates a probability map for each of the plurality of classes using the generated per-class probabilities; and stores the generated probability maps in the at least one nontransitory processor-readable storage medium.
  • The CNN model may include a contracting path and an expanding path. The contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer. The expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and may include a transpose convolution operation which performs upsampling and interpolation with a learned kernel.
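The transpose convolution mentioned above can be illustrated in one dimension. In this minimal sketch, each input sample scatters a scaled copy of the kernel into the output at a stride of 2, which doubles the resolution. In a trained network the kernel weights would be learned; the fixed kernel here is an assumption chosen so the result is easy to check by hand.

```python
import numpy as np

def transpose_conv1d(x, kernel, stride=2):
    """1D transpose convolution: each input sample adds a scaled kernel copy to the output."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += value * kernel
    return out

x = np.array([1.0, 2.0, 3.0])
kernel = np.array([1.0, 1.0])   # with this kernel the result equals nearest-neighbour upsampling
upsampled = transpose_conv1d(x, kernel)   # [1, 1, 2, 2, 3, 3]
```

A longer kernel makes neighbouring copies overlap, which is how the operation combines upsampling with interpolation.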
  • the image data may be representative of a heart during one or more time points throughout a cardiac cycle.
  • the at least one processor may autonomously cause an indication of at least one of the plurality of parts of the anatomical structure to be displayed on a display based at least in part on the generated probability maps.
  • the at least one processor may post-process the processed image data to ensure at least one physical constraint is met.
  • the image data may be representative of a heart during one or more time points throughout a cardiac cycle, and the at least one physical constraint may include at least one of: the volume of the myocardium is the same at all time points, or the right ventricle and the left ventricle cannot overlap each other.
  • the at least one processor may, for each image of the image data, transform the plurality of probability maps into a label mask by setting the class of each pixel to the class with the highest probability.
  • the at least one processor may, for each image of the image data, set the class of each pixel to a background class when all of the class probabilities for the pixel are below a determined threshold.
  • the at least one processor may, for each image of the image data, set the class of each pixel to a background class when the pixel is not part of a largest connected region for the class to which the pixel is associated.
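The three post-processing steps above (highest-probability class, background below a threshold, largest connected region only) can be sketched as follows. The array layout, the 0.5 threshold, the use of 0 as the background value, and 4-connectivity are assumptions made for illustration.

```python
import numpy as np
from collections import deque

def probabilities_to_mask(probs, threshold=0.5):
    """probs: (num_classes, H, W). Returns (H, W) ints; 0 is background, class c maps to c + 1."""
    mask = probs.argmax(axis=0) + 1
    mask[probs.max(axis=0) < threshold] = 0   # all class probabilities below the threshold
    return mask

def keep_largest_region(mask, label):
    """Reset pixels of `label` outside its largest 4-connected region to background."""
    visited = np.zeros(mask.shape, dtype=bool)
    regions = []
    for start in zip(*np.nonzero(mask == label)):
        if visited[start]:
            continue
        region, queue = [], deque([start])
        visited[start] = True
        while queue:                           # breadth-first flood fill
            r, c = queue.popleft()
            region.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] == label and not visited[nr, nc]):
                    visited[nr, nc] = True
                    queue.append((nr, nc))
        regions.append(region)
    cleaned = mask.copy()
    for region in sorted(regions, key=len)[:-1]:   # every region except the largest
        for r, c in region:
            cleaned[r, c] = 0
    return cleaned

probs = np.zeros((2, 4, 5))
probs[0, 0:2, 0:2] = 0.9    # main region of class 0
probs[0, 3, 4] = 0.8        # isolated stray pixel of class 0
probs[1, 2, 2] = 0.3        # class 1 candidate below the threshold
mask = probabilities_to_mask(probs)
cleaned = keep_largest_region(mask, label=1)
```

In this toy input the stray pixel at (3, 4) survives thresholding but is removed by the connected-region step.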
  • the at least one processor may convert each of the label masks for the image data into respective spline contours.
  • the at least one processor may autonomously cause the generated contours to be displayed with the image data on a display.
  • the at least one processor may receive a user modification of at least one of the displayed contours; and store the modified contour in the at least one nontransitory processor-readable storage medium.
  • the at least one processor may determine the volume of at least one of the plurality of parts of the anatomical structure utilizing the generated contours.
  • the anatomical structure may include a heart, and the at least one processor may determine the volume of at least one of the plurality of parts of the heart at a plurality of time points of a cardiac cycle utilizing the generated contours.
  • the at least one processor may automatically determine which of the plurality of time points of the cardiac cycle correspond to an end systole phase and an end diastole phase of the cardiac cycle based on the time points determined to have a minimum volume and a maximum volume, respectively.
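Selecting the end systole and end diastole time points from per-time-point volumes reduces to finding the minimum and the maximum. A minimal sketch follows; the volume values are made up for illustration.

```python
def find_es_ed(volumes):
    """Return (end_systole_index, end_diastole_index) from per-time-point volumes."""
    indices = range(len(volumes))
    end_systole = min(indices, key=volumes.__getitem__)    # minimum volume
    end_diastole = max(indices, key=volumes.__getitem__)   # maximum volume
    return end_systole, end_diastole

lv_volumes_ml = [142.0, 120.5, 88.0, 61.2, 70.3, 101.9, 135.4]   # made-up values
es_index, ed_index = find_es_ed(lv_volumes_ml)
```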
  • the at least one processor may cause the determined volume of the at least one of the plurality of parts of the anatomical structure to be displayed on a display.
  • the image data may include volumetric images.
  • Each convolutional layer of the CNN model may include a convolutional kernel of size N×N×K pixels, where N and K are positive integers.
  • a method of operating a machine learning system may include at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data, and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, and may be summarized as including receiving, by the at least one processor, image data which represents an anatomical structure; processing, by the at least one processor, the received image data through a fully convolutional neural network (CNN) model to generate per-class probabilities for each pixel of each image of the image data, each class corresponding to one of a plurality of parts of the anatomical structure represented by the image data; and for each image of the image data, generating, by the at least one processor, a probability map for each of the plurality of classes using the generated per-class probabilities; and storing, by the at least one processor, the generated probability maps in the at least one nontransitory processor-readable storage medium.
  • Processing the received image data through the CNN model may include processing the received image data through a CNN model which may include a contracting path and an expanding path, the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer, and the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and may include a transpose convolution operation which performs upsampling and interpolation with a learned kernel.
  • Receiving image data may include receiving image data that is representative of a heart during one or more time points throughout a cardiac cycle.
  • the method may further include autonomously causing, by the at least one processor, an indication of at least one of the plurality of parts of the anatomical structure to be displayed on a display based at least in part on the generated probability maps.
  • the method may further include post-processing, by the at least one processor, the processed image data to ensure at least one physical constraint is met.
  • Receiving image data may include receiving image data that may be representative of a heart during one or more time points throughout a cardiac cycle, and the at least one physical constraint may include at least one of: the volume of the myocardium is the same at all time points, or the right ventricle and the left ventricle cannot overlap each other.
  • the method may further include for each image of the image data, transforming, by the at least one processor, the plurality of probability maps into a label mask by setting the class of each pixel to the class with the highest probability.
  • the method may further include for each image of the image data, setting, by the at least one processor, the class of each pixel to a background class when all of the class probabilities for the pixel are below a determined threshold.
  • the method may further include for each image of the image data, setting, by the at least one processor, the class of each pixel to a background class when the pixel is not part of a largest connected region for the class to which the pixel is associated.
  • the method may further include converting, by the at least one processor, each of the label masks for the image data into respective spline contours.
  • the method may further include autonomously causing, by the at least one processor, the generated contours to be displayed with the image data on a display.
  • the method may further include receiving, by the at least one processor, a user modification of at least one of the displayed contours; and storing, by the at least one processor, the modified contour in the at least one nontransitory processor-readable storage medium.
  • the method may further include determining, by the at least one processor, the volume of at least one of the plurality of parts of the anatomical structure utilizing the generated contours.
  • the anatomical structure may include a heart, and the method may further include determining, by the at least one processor, the volume of at least one of the plurality of parts of the heart at a plurality of time points of a cardiac cycle utilizing the generated contours.
  • the method may further include automatically determining, by the at least one processor, which of the plurality of time points of the cardiac cycle correspond to an end systole phase and an end diastole phase of the cardiac cycle based on the time points determined to have a minimum volume and a maximum volume, respectively.
  • the method may further include causing, by the at least one processor, the determined volume of the at least one of the plurality of parts of the anatomical structure to be displayed on a display.
  • Receiving image data may include receiving volumetric image data.
  • Processing the received image data through a CNN model may include processing the received image data through a CNN model in which each convolutional layer may include a convolutional kernel of size N×N×K pixels, where N and K are positive integers.
  • a machine learning system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, the at least one processor: receives a plurality of sets of 3D MRI images, the images in each of the plurality of sets represent an anatomical structure of a patient; receives a plurality of annotations for the plurality of sets of 3D MRI images, each annotation indicative of a landmark of an anatomical structure of a patient depicted in a corresponding image; trains a convolutional neural network (CNN) model to predict the locations of the plurality of landmarks utilizing the 3D MRI images; and stores the trained CNN model in the at least one nontransitory processor-readable storage medium of the machine learning system.
  • the at least one processor may train a fully convolutional neural network (CNN) model to predict the locations of the plurality of landmarks utilizing the 3D MRI images.
  • the at least one processor may train a CNN model which has an output which is one or more sets of spatial coordinates, each set of the one or more spatial coordinates identifying a location of one of the plurality of landmarks.
  • the CNN model may include a contracting path followed by one or more fully connected layers.
  • The CNN model may include a contracting path and an expanding path. The contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by one or more convolutional layers. The expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by one or more convolutional layers and comprising a transpose convolution operation which performs upsampling and interpolation with a learned kernel.
  • the at least one processor may, for each of one or more landmarks of the anatomical structure, define a 3D label map based at least in part on the received sets of 3D MRI images and the received plurality of annotations, each 3D label map may encode a likelihood that the landmark is located at a particular location on the 3D label map, wherein the at least one processor may train the CNN model to segment the one or more landmarks utilizing the 3D MRI images and the generated 3D label maps.
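A label map that encodes landmark likelihood can be sketched with a Gaussian centered on the annotated position, in line with the Gaussian encoding described for FIG. 29. The value of `sigma`, the volume size, and the use of the maximum-valued voxel to recover a position are illustrative assumptions.

```python
import numpy as np

def gaussian_label_map(shape, landmark, sigma=2.0):
    """3D label map that peaks at 1.0 at `landmark` (z, y, x) and decays with distance."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    squared_distance = sum((g - p) ** 2 for g, p in zip(grids, landmark))
    return np.exp(-squared_distance / (2.0 * sigma ** 2))

def predict_landmark(label_map):
    """Recover a landmark estimate as the voxel with the highest value."""
    return np.unravel_index(np.argmax(label_map), label_map.shape)

label_map = gaussian_label_map((16, 16, 16), landmark=(4, 9, 12))
predicted = predict_landmark(label_map)
```

A soft target of this kind gives the network a gradient signal near the landmark rather than a single positive voxel in an otherwise empty volume.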
  • the images in each of the plurality of sets may represent a heart of a patient at different respective time points of a cardiac cycle, and each annotation may be indicative of a landmark of a heart of a patient depicted in a corresponding image.
  • the at least one processor may receive a set of 3D MRI images; process the received 3D MRI images through the CNN model to detect at least one of the one or more landmarks; and cause the detected at least one of the plurality of landmarks to be presented on a display.
  • the at least one processor may process the received 3D MRI images through the CNN model and output at least one of: a point or a label map.
  • the at least one processor may process the received 3D MRI images through the CNN model to detect at least one of the plurality of landmarks at a plurality of time points; and cause the detected at least one of the plurality of landmarks at a plurality of time points to be presented on a display.
  • the CNN model may utilize phase information associated with the received 3D MRI images.
  • a method of operating a machine learning system may include at least one nontransitory processor-readable storage medium that may store at least one of processor-executable instructions or data, and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, and may be summarized as including receiving, by the at least one processor, a plurality of sets of 3D MRI images, the images in each of the plurality of sets represent an anatomical structure of a patient; receiving, by the at least one processor, a plurality of annotations for the plurality of sets of 3D MRI images, each annotation indicative of a landmark of an anatomical structure of a patient depicted in a corresponding image; training, by the at least one processor, a convolutional neural network (CNN) model to predict the locations of the plurality of landmarks utilizing the 3D MRI images; and storing, by the at least one processor, the trained CNN model in the at least one nontransitory processor-readable storage medium of the machine learning system.
  • Training a CNN model may include training a fully convolutional neural network (CNN) model to predict the locations of the plurality of landmarks utilizing the 3D MRI images.
  • Training a CNN model may include training a CNN model which has an output which is one or more sets of spatial coordinates, each set of the one or more spatial coordinates identifying a location of one of the plurality of landmarks.
  • Training a CNN model may include training a CNN model which may include a contracting path followed by one or more fully connected layers.
  • Training the CNN model may include training a CNN model which may include a contracting path and an expanding path, the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer, and the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and includes a transpose convolution operation which performs upsampling and interpolation with a learned kernel.
  • the method may further include for each of a plurality of landmarks of the anatomical structure, defining, by the at least one processor, a 3D label map based at least in part on the received sets of 3D MRI images and the received plurality of annotations, each 3D label map encodes a likelihood that the landmark is located at a particular location on the 3D label map;
  • Receiving a plurality of sets of 3D MRI images may include receiving a plurality of sets of 3D MRI images, and the images in each of the plurality of sets may represent a heart of a patient at different respective time points of a cardiac cycle, and each annotation may be indicative of a landmark of a heart of a patient depicted in a corresponding image.
  • the method may further include receiving, by the at least one processor, a set of 3D MRI images; processing, by the at least one processor, the received 3D MRI images through the CNN model to detect at least one of the plurality of landmarks; and causing, by the at least one processor, the detected at least one of the plurality of landmarks to be presented on a display.
  • the method may further include processing, by the at least one processor, the received 3D MRI images through the CNN model to detect at least one of the plurality of landmarks at a plurality of time points; and causing, by the at least one processor, the detected at least one of the plurality of landmarks at a plurality of time points to be presented on a display.
  • Training a CNN model may include training a CNN model which may utilize phase information associated with the received 3D MRI images.
  • a medical image processing system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data, cardiac MRI image data, and initial contours or masks that delineate the endocardium and epicardium of the heart; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, in operation, the at least one processor: accesses the cardiac MRI image data and initial contours or masks for a series; autonomously calculates an image intensity threshold that differentiates blood from papillary and trabeculae muscles in the interior of the endocardium contour; and autonomously applies the image intensity threshold to define a contour or mask that describes the boundary of the papillary and trabeculae muscles.
  • the at least one processor may compare a distribution of intensity values within the endocardium contour to a distribution of intensity values for a region between the endocardium contour and the epicardium contour.
  • the at least one processor may calculate each of the distributions of intensity values using a kernel density estimation of an empirical intensity distribution.
  • the at least one processor may determine the image intensity threshold to be the pixel intensity at the intersection of first and second probability distribution functions, the first probability distribution function being for the set of pixels within the endocardium contour, and the second probability distribution function being for the set of pixels in the region between the endocardium contour and the epicardium contour.
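The threshold selection described above can be sketched with a hand-written Gaussian kernel density estimate. The synthetic intensities, the bandwidth, and the choice to search for the crossing between the two density peaks are assumptions made for illustration.

```python
import numpy as np

def kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

def intensity_threshold(inside_endo, myocardium, bandwidth=10.0):
    """Intensity at which the two estimated densities cross, searched between their peaks."""
    lo = min(inside_endo.min(), myocardium.min())
    hi = max(inside_endo.max(), myocardium.max())
    grid = np.linspace(lo, hi, 512)
    pdf_inside = kde(inside_endo, grid, bandwidth)
    pdf_myo = kde(myocardium, grid, bandwidth)
    start, stop = sorted((int(pdf_myo.argmax()), int(pdf_inside.argmax())))
    diff = pdf_inside[start:stop + 1] - pdf_myo[start:stop + 1]
    crossing = np.nonzero(np.diff(np.sign(diff)) != 0)[0][0]   # first sign change
    return float(grid[start + crossing])

rng = np.random.default_rng(0)
blood = rng.normal(200, 20, 1000)        # synthetic bright blood pixels
papillary = rng.normal(80, 15, 150)      # synthetic dark muscle inside the endocardium contour
inside_endo = np.concatenate([blood, papillary])
myocardium = rng.normal(80, 15, 800)     # synthetic pixels between endocardium and epicardium
threshold = intensity_threshold(inside_endo, myocardium)
muscle_mask = inside_endo < threshold    # pixels classified as papillary or trabeculae muscle
```

Pixels inside the endocardium contour that fall below the threshold are treated as muscle, and the rest as blood pool.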
  • the initial contours or masks that delineate the endocardium of the heart may include the papillary and trabeculae muscles in the interior of the endocardium contour.
  • the at least one processor may calculate connected components of the blood pool region and discard one or more of the calculated connected components from the blood pool region.
  • the at least one processor may convert the connected components discarded from the blood pool region into the papillary and trabeculae muscle region.
  • the at least one processor may discard from the blood pool region all but the largest connected component in the blood pool region.
  • the at least one processor may allow the calculated contour or mask that describes the boundary of the papillary and trabeculae muscles to be edited by a user.
  • a machine learning system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data, medical imaging data of the heart, and a trained convolutional neural network (CNN) model; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, in operation, the at least one processor: calculates contours or masks delineating the endocardium and epicardium of the heart in the medical imaging data using the trained CNN model; and anatomically localizes pathologies or functional characteristics of the myocardial muscle using the calculated contours or masks.
  • the at least one processor may calculate the ventricular insertion points at which the right ventricular wall attaches to the left ventricle.
  • the at least one processor may calculate the ventricular insertion points based on the proximity of contours or masks delineating the left ventricle epicardium to one or both of the right ventricle endocardium or the right ventricle epicardium.
  • the at least one processor may calculate the ventricular insertion points in one or more two-dimensional cardiac images based on the two points in the cardiac image in which the left ventricle epicardium boundary diverges from one or both of the right ventricle endocardium boundary or the right ventricle epicardium boundary.
  • the at least one processor may calculate the ventricular insertion points based on the intersection between acquired long axis views of the left ventricle and the delineation of the left ventricle epicardium.
  • the at least one processor may calculate at least one ventricular insertion point based on the intersection between the left ventricle epicardium contour and the left heart 3-chamber long axis plane.
  • the at least one processor may calculate at least one ventricular insertion point based on the intersection between the left ventricle epicardium contour and the left heart 4-chamber long axis plane.
  • the at least one processor may calculate at least one ventricular insertion point based on the intersection between the left heart 3-chamber long axis plane and one or both of the right ventricle epicardium contour or the right ventricle endocardium contour.
  • the at least one processor may calculate at least one ventricular insertion point based on the intersection between the left heart 4-chamber long axis plane and one or both of the right ventricle epicardium contour or the right ventricle endocardium contour.
  • the at least one processor may allow a user to manually delineate the location of one or more of the ventricular insertion points.
  • the at least one processor may use a combination of contours and ventricular insertion points to present the anatomical location of pathologies or functional characteristics of the myocardial muscle in a standardized format.
  • the standardized format may be one or both of a 16- or 17-segment model of the myocardial muscle.
  • the medical imaging data of the heart may be one or more of functional cardiac images, myocardial delayed enhancement images or myocardial perfusion images.
  • the medical imaging data of the heart may be cardiac magnetic resonance images.
  • the trained CNN model may have been trained on annotated cardiac images of the same type as those for which the trained CNN model will be used for inference.
  • the trained CNN model may have been trained on one or more of functional cardiac images, myocardial delayed enhancement images or myocardial perfusion images.
  • the data on which the trained CNN model may have been trained may be cardiac magnetic resonance images.
  • the trained CNN model may have been trained on annotated cardiac images of a different type than those for which the trained CNN model will be used for inference.
  • the trained CNN model may have been trained on one or more of functional cardiac images, myocardial delayed enhancement images or myocardial perfusion images.
  • the data on which the trained CNN model may have been trained may be cardiac magnetic resonance images.
  • the at least one processor may fine tune the trained CNN model on data of the same type for which the CNN model will be used for inference. To fine tune the trained CNN model, the at least one processor may retrain some or all of the layers of the trained CNN model.
  • the at least one processor may apply postprocessing to the contours or masks delineating the endocardium and epicardium of the heart to minimize the amount of non-myocardial tissue that is present in the region of the heart identified as myocardium.
  • the at least one processor may apply morphological operations to the region of the heart identified as myocardium to reduce its area. The morphological operations may include one or more of erosion or dilation.
  • the at least one processor may modify the threshold applied to probability maps predicted by the trained CNN model to only identify pixels of myocardium for which the trained CNN model expresses a probability above a threshold that the pixels are part of the myocardium.
  • the threshold by which probability map values may be converted to class labels is greater than 0.5.
  • the at least one processor may shift vertices of contours that delineate the myocardium towards or away from the center of the ventricle of the heart to reduce the identified area of myocardium.
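Two of the area-reducing options above, a probability threshold above 0.5 and morphological erosion, can be compared on a toy probability map. The square region, the 0.8 threshold, and the cross-shaped structuring element are illustrative assumptions; a real myocardium mask is ring-shaped.

```python
import numpy as np

def erode(mask):
    """One step of binary erosion with a 4-connected (cross-shaped) structuring element."""
    padded = np.pad(mask, 1, constant_values=False)
    return (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
            & padded[1:-1, :-2] & padded[1:-1, 2:])

prob = np.zeros((7, 7))
prob[1:6, 1:6] = 0.6     # low-confidence border of the region
prob[2:5, 2:5] = 0.9     # high-confidence core
default_mask = prob > 0.5            # 25 pixels
strict_mask = prob > 0.8             # 9 pixels: only the confident core remains
eroded_mask = erode(default_mask)    # 9 pixels: one layer of boundary pixels removed
```

Both options trade some true myocardium at the boundary for a lower chance of including neighbouring non-myocardial tissue.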
  • the pathologies or functional characteristics of the myocardial muscle may include one or more of myocardial scarring, myocardial infarction, coronary stenosis, or perfusion characteristics.
  • FIG. 1 is an example of a number of LV endocardium segmentations at a single time point over a full SAX (short axis) stack. From left to right, top to bottom, slices proceed from the apex of the left ventricle to the base of the left ventricle.
  • FIG. 2 is an example of an LV endocardium contour generated using a snakes algorithm.
  • FIG. 3 is two SSFP images showing ventricles, myocardium, and papillary muscles on the interior of the left ventricle endocardium. The SSFP image on the left shows end diastole, and the SSFP image on the right shows end systole.
  • FIG. 4 is two images which show the challenges of distinguishing between the ventricle and atrium on the basal slice in the SAX plane.
  • FIG. 5 is a diagram of the U-Net network architecture used in LandMarkDetect.
  • FIG. 6 is a diagram of the Deep Ventricle network architecture with two convolutional layers per pooling layer and four pooling/upsampling operations, according to one illustrated implementation.
  • FIG. 7 is a flow diagram of the creation of a lightning memory-mapped database (LMDB) for training with SSFP data, according to one illustrated implementation.
  • FIG. 8 is a flow diagram of a pipeline process for training a convolutional neural network model, according to one illustrated implementation.
  • FIG. 9 is a flow diagram which illustrates a process for an inference pipeline for SSFP data, according to one illustrated implementation.
  • FIG. 10 is a screenshot of an in-application SSFP inference result for LV endo at one time point and slice index.
  • FIG. 11 is a screenshot of an in-application SSFP inference result for LV epi at one time point and slice index.
  • FIG. 12 is a screenshot of an in-application SSFP inference result for RV endo at one time point and slice index.
  • FIG. 13 is a screenshot of in-application SSFP parameters calculated from automatically segmented ventricles.
  • FIG. 14 is a screenshot which depicts two, three, and four chamber views with parallel lines that indicate the planes of a SAX stack.
  • FIG. 15 is a screenshot which depicts in a left panel two, three, and four chamber views showing a series of segmentation planes that are not parallel for the right ventricle. A right panel depicts a reconstructed image along the highlighted plane seen in the two, three, and four chamber views.
  • FIG. 16 is a screenshot which illustrates segmenting the RV. Points in the contour (right panel) define the spline and are what is stored in a database. The contour is projected into the LAX views (left panel).
  • FIG. 17 is a screenshot which illustrates segmenting the same slice of the RV as in FIG. 16 , but with each of the two, three, and four chamber views slightly rotated to emphasize the segmentation plane with a depth effect.
  • FIG. 18 is a schematic diagram which illustrates creation of a lightning memory-mapped database (LMDB) for training with 4D Flow data, according to one illustrated implementation.
  • FIG. 19 is a diagram which shows a multi-planar reconstruction (top left), an RV endo mask (top right), a LV epi mask (bottom left), and an LV endo mask (bottom right) generated from a SAX plane, available labels, and the image data. These masks may be stored in one array, and along with the image, may be stored in an LMDB under a single unique key.
  • FIG. 20 is a diagram which is similar to the diagram of FIG. 19 , except the LV epi mask is missing.
  • FIG. 21 is a flow diagram which illustrates an inference pipeline for 4D Flow, according to one illustrated implementation.
  • FIG. 22 is a screenshot depicting an in-application inference for LV endo on a 4D Flow study.
  • FIG. 23 is a screenshot depicting an in-application inference for LV epi on a 4D Flow study.
  • FIG. 24 is a screenshot depicting an in-application inference for RV endo on a 4D Flow study.
  • FIG. 25 is a screenshot which illustrates locating the left ventricle apex (LVA) using a web application, according to one illustrated implementation.
  • FIG. 26 is a screenshot which illustrates locating the right ventricle apex (RVA) using a web application, according to one illustrated implementation.
  • FIG. 27 is a screenshot which illustrates locating the mitral valve (MV) using a web application, according to one illustrated implementation.
  • FIG. 28 is a flow diagram which illustrates a process for creation of a training database, according to one illustrated implementation.
  • FIG. 29 is a diagram which illustrates encoding of a landmark position on an image with a Gaussian evaluated on the image.
  • FIG. 30 is a flow diagram of a preprocessing pipeline for the images and landmarks, according to one illustrated implementation.
  • FIG. 31 is a plurality of screenshots which depict an example of a pre-processed input image and encoded mitral valve landmark for one patient. From top to bottom, from left to right, sagittal, axial, and coronal views are shown.
  • FIG. 32 is a plurality of screenshots which depict an example of pre-processed input image and encoded tricuspid valve landmark for one patient. From top to bottom, from left to right, sagittal, axial, and coronal views are shown.
  • FIG. 33 is a diagram which illustrates prediction of a landmark position from network output.
  • FIG. 34 is an example image showing flow information overlaid on an anatomical image.
  • FIG. 35 is a block diagram of an example processor-based device used to implement one or more of the functions described herein, according to one non-limiting illustrated implementation.
  • FIG. 36 is a diagram of a fully convolutional encoder-decoder architecture with skip connections that utilizes a smaller expanding path than contracting path.
  • FIG. 37 shows box plots comparing the relative absolute volume error (RAVE) between FastVentricle and DeepVentricle for each of left ventricle (LV) Endo, LV Epi, and right ventricle (RV) Endo at ED (left panels) and ES (right panels).
  • FIG. 38 shows a random input (left) that is optimized using gradient descent for DeepVentricle and FastVentricle (middle) to fit the label map (right, RV endo in red, LV endo in cyan, LV epi in blue).
  • FIG. 39 shows examples of network predictions on different slices and time points for studies with low RAVE for both DeepVentricle and FastVentricle.
  • FIG. 40 is an image showing relevant components of the cardiac anatomy as seen on cardiac MRI.
  • FIG. 41 is an image demonstrating an endocardium contour that includes the papillary muscles on the interior.
  • FIG. 42 is an image demonstrating a blood pool or endocardium contour that excludes the papillary muscles from the interior.
  • FIG. 43 is a flow diagram of one implementation of a process for delineating papillary and trabeculae muscles.
  • FIG. 44 is a flow diagram of one implementation of the papillary and trabeculae muscle intensity threshold calculation.
  • FIG. 45 is an illustration of the calculation of the overlap of pixel distribution between the blood pool and the myocardium.
  • FIG. 46 is a flow diagram of one implementation of a process to identify and display myocardial defects.
  • FIG. 47 is an image illustrating the location of the ventricular insertion points.
  • FIG. 6 shows a convolutional neural network (CNN) architecture 600 , referred to herein as DeepVentricle, utilized for ventricular segmentation on cardiac SSFP studies.
  • the network 600 includes two paths: the left side is a contracting path 602 , which includes convolution layers 606 and pooling layers 608 , and the right side is an expanding path 604 , which includes upsampling or transpose convolution layers 610 and convolution layers 606 .
  • the number of free parameters in the network 600 determines the entropic capacity of the model, which is essentially the amount of information the model can remember. A significant fraction of these free parameters reside in the convolutional kernels of each layer in the network 600 .
  • the network 600 is configured such that, after every pooling layer 608, the number of feature maps doubles and the spatial resolution is halved. After every upsampling layer 610, the number of feature maps is halved and the spatial resolution is doubled. With this scheme, the number of feature maps for each layer across the network 600 can be fully described by the number (e.g., between 1 and 2000 feature maps) in the first layer. In at least some implementations, the number of feature maps in the first layer is 128.
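  • The doubling and halving scheme described above can be sketched as a short schedule calculation. This is only an illustration: the function name `feature_map_schedule` and the 256-pixel input size are assumptions made for the example, while the 128 first-layer feature maps and four pooling operations are values mentioned in this description.

```python
def feature_map_schedule(first_layer_maps, input_size, num_pool_ops):
    """Return (feature_maps, spatial_size) for each resolution level,
    walking down the contracting path and back up the expanding path."""
    maps, size = first_layer_maps, input_size
    levels = [(maps, size)]
    for _ in range(num_pool_ops):
        # After each pooling layer: feature maps double, resolution halves.
        maps, size = maps * 2, size // 2
        levels.append((maps, size))
    for _ in range(num_pool_ops):
        # After each upsampling layer: feature maps halve, resolution doubles.
        maps, size = maps // 2, size * 2
        levels.append((maps, size))
    return levels


# input_size=256 is an assumed value used only for illustration.
schedule = feature_map_schedule(first_layer_maps=128, input_size=256, num_pool_ops=4)
for maps, size in schedule:
    print(f"{maps:5d} feature maps at {size}x{size}")
```

  • Because every level follows from the first-layer count, a hyperparameter search only needs to vary that single number to scale the whole network.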
  • the network 600 includes two convolutional layers 606 before every pooling layer 608, with convolution kernels of size 3×3 and stride 1. Different combinations of these parameters (number of layers, convolution kernel size, convolution stride) may also be used. Based on a hyperparameter search, it was found that four pooling and upsampling operations worked best for the data under examination, though the results are only moderately sensitive to this number.
  • Downsampling the feature maps with a pooling operation may be an important step for learning higher level abstract features by means of convolutions that have a larger field of view in the space of the original image.
  • the network 600 utilizes a 2×2 max pooling operation with stride 2 to downsample images after every set of convolutions. Learned downsampling, i.e., convolving the input volume with a 2×2 convolution with stride 2, may also be used, but such may increase computational complexity. Generally, different combinations of pooling size and stride may also be used.
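  • As a minimal sketch of the 2×2, stride-2 max pooling operation on a single feature map, the following NumPy example may help. The function name `max_pool_2x2` is hypothetical, and dropping a trailing odd row or column is an assumption of this sketch rather than something stated in the description.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single (H, W) feature map."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Assumption: a trailing odd row/column is dropped before pooling.
    trimmed = feature_map[: h2 * 2, : w2 * 2]
    # Group pixels into non-overlapping 2x2 blocks and take each block's max.
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))


example = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(example)
print(pooled)
```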
  • Upsampling the activation volumes back to the original resolution is necessary in a fully convolutional network for pixel-wise segmentation.
  • some systems may use an upsampling operation, then a 2×2 convolution, then a concatenation of feature maps from the corresponding contracting layer through a skip connection, and finally two 3×3 convolutions.
  • the network 600 replaces the upsampling and 2×2 convolution with a single transpose convolution layer 610, which performs upsampling and interpolation with a learned kernel, improving the ability of the model to resolve fine details. That operation is followed with the skip connection concatenation, as shown by the bold arrows from the contracting path 602 to the expanding path 604 in FIG. 6. Following this concatenation, two 3×3 convolutional layers are applied.
  • rectified linear units are used for all activations following convolutions.
  • Other nonlinear activation functions including PReLU (parametric ReLU) and ELU (exponential linear unit) may also be used.
  • Model hyperparameters may be stored in at least one nontransitory processor-readable storage medium (e.g., configuration file) that is read during training.
  • Parameters that describe the model may include:
  • Parameters that describe the training data to use may include:
  • Parameters that describe the data augmentation to use during training may include:
  • Parameters that describe training include:
  • a random search may be performed over these hyperparameters and the model with the highest validation accuracy may be chosen.
  • LMDB Lightning Memory-mapped Database
  • This database architecture holds many advantages over other means of storing the training data. Such advantages include: mapping of keys is lexicographical for speed; image/segmentation mask pairs are stored in the format required for training so they require no further preprocessing at training time; and reading image/segmentation mask pairs is a computationally cheap transaction.
  • the training data may generally be stored in a variety of other formats, including named files on disk and real-time generation of masks from the ground truth database for each image. These methods may achieve the same result, though they likely slow down the training process.
  • a new LMDB may be created for every unique set of inputs/targets that are to be used to train a model on. This ensures that there is no slowdown during training for image preprocessing.
  • the SSFP model disclosed herein attempts to distinguish four classes, namely, background, LV Endocardium, LV Epicardium and RV Endocardium.
  • the network output may include three probability maps, one for each non-background class.
  • ground truth binary masks for each of the three classes are provided to the network, along with the pixel data.
  • the network loss may be determined as the sum of the loss over the three classes. If any of the three ground truth masks are missing for an image (meaning that we have no data, as opposed to the ground truth being an empty mask), that mask may be ignored when calculating the loss.
  • Missing ground truth data is explicitly accounted for during the training process.
  • the network may be trained on an image for which the LV endocardium contour is defined, even if the LV epicardium and RV endocardium contour locations are not known.
  • a more basic architecture that could not account for missing data could only have been trained on a subset (e.g., 20 percent) of training images that have all three types of contours defined. Reducing the training data volume in this way would result in significantly reduced accuracy.
  • the full training data volume is used, allowing the network to learn more robust features.
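  • The handling of missing ground truth masks described above can be sketched as a loss that simply skips any class whose mask is absent. This is a hedged illustration, not the disclosed implementation: the function name `masked_cross_entropy`, the use of `None` to mark a missing mask, the class names, and the choice of a per-class binary cross-entropy averaged over pixels are all assumptions of this example.

```python
import numpy as np


def masked_cross_entropy(probs, masks, eps=1e-7):
    """Sum per-class, per-pixel binary cross-entropy, ignoring missing masks.

    probs: dict mapping class name -> (H, W) array of probabilities in [0, 1].
    masks: dict mapping class name -> (H, W) binary array, or None if the
           ground truth for that class is missing (assumed convention).
    """
    total = 0.0
    for name, p in probs.items():
        target = masks.get(name)
        if target is None:
            # No data for this class: it contributes nothing to the loss.
            continue
        p = np.clip(p, eps, 1.0 - eps)
        bce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
        total += float(bce.mean())
    return total


# Toy example: only the LV endocardium mask is available for this image.
probs = {name: np.full((4, 4), 0.5) for name in ("lv_endo", "lv_epi", "rv_endo")}
masks = {"lv_endo": np.zeros((4, 4)), "lv_epi": None, "rv_endo": None}
loss = masked_cross_entropy(probs, masks)
print(loss)
```

  • Note the distinction the description draws: an all-zero mask is real ground truth (the structure is absent from the slice) and is penalized normally, whereas a missing mask is excluded entirely.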
  • FIG. 7 shows a process 700 for creation of an SSFP LMDB.
  • contour information is extracted from an SSFP ground truth database 704 . These contours are stored in the ground truth database 704 as dictionaries of contour X positions and Y positions, associated with specific SSFP slice locations and time points.
  • the pixel data from the corresponding SSFP DICOM (Digital Imaging and Communications in Medicine) image 708 is paired with a Boolean mask created from this information.
  • the system preprocesses the images and masks by normalizing the images, cropping the images/masks, and resizing the images/masks.
  • the MRIs are normalized such that they have a mean of zero and that the 1st and 99th percentile of a batch of images fall at −0.5 and 0.5, i.e., their “usable range” falls between −0.5 and 0.5.
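  • One way to sketch the percentile-based part of this normalization is a linear rescaling that places the 1st and 99th percentiles of a batch at −0.5 and 0.5. The function name `normalize_batch` is hypothetical, and this sketch makes a simplifying assumption: a single linear map cannot in general also force the mean to exactly zero, so only the percentile condition is enforced here and the mean ends up near zero only for roughly symmetric intensity distributions.

```python
import numpy as np


def normalize_batch(images):
    """Linearly rescale a batch so its 1st/99th percentiles map to -0.5/0.5."""
    images = np.asarray(images, dtype=np.float64)
    p1, p99 = np.percentile(images, [1, 99])
    if p99 == p1:
        # Degenerate (constant) batch: return zeros rather than divide by zero.
        return np.zeros_like(images)
    return (images - p1) / (p99 - p1) - 0.5


rng = np.random.default_rng(0)
batch = rng.normal(loc=100.0, scale=20.0, size=(4, 8, 8))  # synthetic intensities
normalized = normalize_batch(batch)
print(np.percentile(normalized, [1, 99]))
```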
  • the images may be cropped and resized such that the ventricle contours take up a larger percentage of the image. This results in more total foreground class pixels, making it easier to resolve fine details (especially the corners) of the ventricles and helping the model converge, all with less computing power.
  • a unique key for SSFP LMDBs is defined to be the combination of the series instance UID and SOP instance UID.
  • the image and mask metadata, including the time point, slice index and LMDB key are stored in a dataframe.
  • the normalized, cropped, and resized image and the cropped and resized mask are stored in the LMDB for each key.
  • FIG. 8 shows a process 800 that illustrates model training.
  • Keras, an open-source wrapper built on TensorFlow, may be used to train models.
  • equivalent results may be achieved using raw TensorFlow, Theano, Caffe, Torch, MXNet, MATLAB, or other libraries for tensor math.
  • a dataset may be split into a training set, validation set, and test set.
  • the training set is used for model gradient updates
  • the validation set is used to evaluate the model on “held out” data during training (e.g., for early stopping)
  • the test set is not used at all in the training process.
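  • A minimal sketch of such a three-way split is shown below. The function name `split_studies`, the 10 percent validation and test fractions, the fixed seed, and the choice to split by study identifier (so that images from one study never straddle two sets) are assumptions of this example and are not specified in the description.

```python
import random


def split_studies(study_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle study identifiers and partition them into train/val/test lists."""
    ids = sorted(study_ids)              # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    n_val = int(len(ids) * val_fraction)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


all_ids = [f"study_{i:03d}" for i in range(100)]   # hypothetical identifiers
train_ids, val_ids, test_ids = split_studies(all_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```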
  • training is invoked.
  • image and mask data is read from the LMDB training set, one batch at a time.
  • the images and masks are distorted according to the distortion hyperparameters stored in a model hyperparameter file, as discussed above.
  • the batch is processed through the network.
  • the loss/gradients are calculated.
  • weights are updated as per the specified optimizer and optimizer learning rate. In at least some implementations, loss may be calculated using a per-pixel cross-entropy loss function and the Adam update rule.
  • the system may determine whether the epoch is complete. If the epoch is not complete, the process returns to act 804 to read another batch of training data. At 816 , if the epoch is complete, metrics are calculated on the validation set. Such metrics may include, for example, validation loss, validation accuracy, relative accuracy versus a naive model that predicts only the majority class, f1 score, precision, and recall.
  • validation loss may be monitored to determine if the model improved.
  • the weights of the model at that time may be saved.
  • the early stopping counter may be reset to zero, and training for another epoch may begin at 804 .
  • Metrics other than validation loss, such as validation accuracy, could also be used to evaluate model performance.
  • the early stopping counter is incremented by 1.
  • training is begun for another epoch at 804 .
  • if the counter has reached its limit, training of the model is stopped. This “early stopping” methodology is used to prevent overfitting, but other methods of overfitting prevention exist, such as utilizing a smaller model or increasing the level of dropout or L2 regularization.
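  • The early stopping logic of process 800 can be sketched as a loop around a per-epoch training callback. The names `train_with_early_stopping`, `run_epoch`, and `patience` are hypothetical; `run_epoch` stands in for reading batches, updating weights, and returning the validation loss for that epoch.

```python
def train_with_early_stopping(run_epoch, patience, max_epochs=1000):
    """Run epochs until validation loss fails to improve `patience` times in a row.

    run_epoch(epoch) is a placeholder that trains one epoch and returns the
    validation loss. Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = None
    counter = 0
    epoch = -1
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            # Model improved: this is where the weights would be saved.
            best_loss, best_epoch = val_loss, epoch
            counter = 0                  # reset the early stopping counter
        else:
            counter += 1                 # no improvement this epoch
            if counter >= patience:
                break                    # counter reached its limit
    return best_epoch, best_loss, epoch


# Simulated validation losses, for illustration only.
simulated = [1.0, 0.8, 0.9, 0.7, 0.75, 0.72, 0.71, 0.6]
result = train_with_early_stopping(lambda e: simulated[e], patience=3,
                                   max_epochs=len(simulated))
print(result)
```

  • In this simulated run, training stops after three consecutive epochs without improvement, so the later, lower loss is never reached. This illustrates why the counter limit is itself a hyperparameter worth tuning.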
  • Data from the test set may be used to show examples of segmentations, but this information is not used for training or for ranking models with respect to one another.
  • Inference is the process of utilizing a trained model for prediction on new data.
  • a web application (or “web app”) may be used for inference.
  • FIG. 9 displays an example pipeline or process 900 by which predictions may be made on new SSFP studies.
  • the user may invoke the inference service (e.g., by clicking a “generate missing contours” icon), which automatically generates any missing (not yet created) contours.
  • Such contours may include LV Endo, LV Epi, or RV Endo, for example.
  • inference may be invoked automatically when the study is either loaded by the user in the application or when the study is first uploaded by the user to a server. If inference is performed at upload time, the predictions may be stored in a nontransitory processor-readable storage medium at that time but not displayed until the user opens the study.
  • the inference service is responsible for loading a model, generating contours, and displaying them for the user.
  • images are sent to an inference server.
  • the production model or network that is used by the inference service is loaded onto the inference server.
  • the network may have been previously selected from the corpus of models trained during hyperparameter search. The network may be chosen based on a tradeoff between accuracy, memory usage and speed of execution. The user may alternatively be given a choice between a “fast” or “accurate” model via a user preference option.
  • one batch of images at a time is processed by the inference server.
  • the images are preprocessed (e.g., normalized, cropped) using the same parameters that were utilized during training, discussed above.
  • inference-time distortions are applied and the average inference result is taken on, for example, 10 distorted copies of each input image. This feature creates inference results that are robust to small variations in brightness, contrast, orientation, etc.
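  • A hedged sketch of averaging predictions over distorted copies follows. The function name `averaged_inference`, the `predict` callback, and the jitter ranges are assumptions. Only brightness and contrast distortions are used here because they leave pixel positions unchanged; geometric distortions such as rotations would require mapping each prediction back to the original frame before averaging.

```python
import numpy as np


def averaged_inference(predict, image, num_copies=10, seed=0):
    """Average model output over randomly intensity-distorted copies of an image.

    predict: placeholder callable mapping an (H, W) image to an (H, W)
             probability map.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(image.shape, dtype=np.float64)
    for _ in range(num_copies):
        gain = rng.uniform(0.9, 1.1)       # contrast jitter (assumed range)
        offset = rng.uniform(-0.05, 0.05)  # brightness jitter (assumed range)
        total += predict(image * gain + offset)
    return total / num_copies


def toy_predict(x):
    """Stand-in for a trained network: a pixel-wise sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))


image = np.linspace(-0.5, 0.5, 16).reshape(4, 4)
averaged = averaged_inference(toy_predict, image)
print(averaged.round(3))
```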
  • Inference is performed at the slice locations and time points in the requested batch.
  • a forward pass through the network is computed.
  • For a given image, the model generates per-class probabilities for each pixel during the forward pass, which results in a set of probability maps, one for each class, with values ranging from 0 to 1.
  • the probability maps are transformed into a single label mask by setting the class of each pixel to the class with the highest label map probability.
  • the system may perform postprocessing. For example, in at least some implementations, if all probabilities for a pixel are below 0.5, the pixel class for that pixel is set to background. Further, to remove spurious predicted pixels, any pixels in the label map that are not part of the largest connected region for that class may be converted to background. In at least some implementations, spurious pixels may be removed by comparing neighboring segmentation maps in time and space and removing outliers. Alternatively, because a given ventricle may occasionally appear in a single slice as two distinct connected regions (for example, because the RV is non-convex near the base of the heart), multiple connected regions may be allowed, but small regions or regions that are distant from the centroid of all detected regions across slice locations and times may be removed.
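  • The first two postprocessing steps (highest-probability class with a 0.5 background threshold, then keeping only the largest connected region per class) can be sketched as follows. The function names are hypothetical, 4-connectivity is an assumption, and the breadth-first search is just one simple way to find connected regions.

```python
from collections import deque

import numpy as np


def largest_component(binary):
    """Return a mask containing only the largest 4-connected region of `binary`."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    best = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy, sx] or seen[sy, sx]:
                continue
            region, queue = [], deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    out = np.zeros((h, w), dtype=bool)
    for y, x in best:
        out[y, x] = True
    return out


def probabilities_to_labels(prob_maps, threshold=0.5):
    """Convert (C, H, W) non-background probability maps to an (H, W) label map.

    Label 0 is background; label c (1-based) is the c-th probability map.
    """
    labels = prob_maps.argmax(axis=0) + 1
    labels[prob_maps.max(axis=0) < threshold] = 0    # all classes below threshold
    for c in range(1, prob_maps.shape[0] + 1):
        mask = labels == c
        labels[mask & ~largest_component(mask)] = 0  # drop spurious regions
    return labels


toy = np.zeros((2, 5, 5))
toy[0, 0:2, 0:2] = 0.9   # main region for class 1
toy[0, 4, 4] = 0.8       # isolated spurious pixel for class 1
toy[1, 2, 2] = 0.4       # class 2 pixel below the 0.5 threshold
label_map = probabilities_to_labels(toy)
print(label_map)
```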
  • postprocessing to satisfy one or more physical constraints may be performed at 914 .
  • postprocessing may ensure that the myocardium volume is the same at all time points.
  • the system may dynamically adjust the threshold used to binarize the endocardium and epicardium probability maps before converting them to contours. The thresholds can be adjusted to minimize the discrepancy in reported myocardium volume using nonlinear least squares, for example.
  • the postprocessing act may ensure that the RV and LV do not overlap. To achieve this, the system may only allow any given pixel to belong to one class, which is the class with the highest inferred probability. The user may have a configuration option to enable or disable imposition of selected constraints.
  • a new batch is added to the processing pipeline at 908 until inference has been performed at all slice locations and all time points.
  • the mask may be converted to a spline contour.
  • the first step is to convert the mask to a polygon by marking all the pixels on the border of the mask.
  • This polygon is then converted to a set of control points for a spline using a corner detection algorithm, based on A. Rosenfeld and J. S. Weszka. “An improved method of angle detection on digital curves.” Computers, IEEE Transactions on, C-24(9):940-941, September 1975.
  • a typical polygon from one of these masks will have hundreds of vertices.
  • the corner detection attempts to reduce this to a set of approximately sixteen spline control points. This reduces storage requirements and results in a smoother-looking segmentation.
  • these splines are stored in a database and displayed to the user in the web application. If the user modifies a spline, the database may be updated with the modified spline.
  • volumes are calculated by creating a volumetric mesh from all vertices for a given time point.
  • the vertices are ordered on every slice of the 3D volume.
  • An open cubic spline is generated that connects the first vertex in each contour, a second spline that connects the second vertex, etc., for each vertex in the contour, until a cylindrical grid of vertices is obtained which is used to define the mesh.
  • the internal volume of the polygonal mesh is then calculated. Based on the calculated volumes, the system autonomously identifies the end systole phase and end diastole phase as the time points with the minimum and maximum volumes, respectively, and labels these time points for the user.
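  • Selecting the end diastole and end systole time points from a per-time-point volume curve reduces to finding the maximum and minimum. The sketch below uses a hypothetical function name and made-up volumes.

```python
def find_ed_es(volumes):
    """Return (ed_index, es_index): time points of maximum and minimum volume."""
    indices = range(len(volumes))
    es_index = min(indices, key=lambda t: volumes[t])  # end systole: smallest volume
    ed_index = max(indices, key=lambda t: volumes[t])  # end diastole: largest volume
    return ed_index, es_index


# Illustrative ventricular volumes (mL) over one cardiac cycle.
volumes_ml = [120.0, 100.0, 70.0, 55.0, 80.0, 110.0, 125.0]
ed_index, es_index = find_ed_es(volumes_ml)
print(ed_index, es_index)
```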
  • FIGS. 10, 11, and 12 show example images 1000 , 1100 and 1200 , respectively, of in-application inference results for LV Endo contour 1002 , LV Epi contour 1102 , and RV Endo contour 1202 , respectively, at a single time point and slice location.
  • contours e.g., contours 1002 , 1102 and 1202
  • the system calculates and shows ventricle volumes at ED and ES to the user, as well as multiple computed measurements.
  • An example interface 1300 is shown in FIG. 13 which displays multiple computed measurements.
  • these measurements include stroke volume (SV) 1302, which is the volume of blood ejected from the ventricle in one cardiac cycle; ejection fraction (EF) 1304, which is the fraction of the blood pool ejected from the ventricle in one cardiac cycle; cardiac output (CO) 1306, which is the average rate at which blood leaves the ventricle; ED mass 1308, which is the mass of the myocardium (i.e., epicardium-endocardium) for the ventricle at end diastole; and ES mass 1310, which is the mass of the myocardium for the ventricle at end systole.
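  • These measurements follow from the ED and ES volumes using standard relationships, sketched below. The function names are hypothetical. Computing cardiac output as stroke volume times heart rate and converting myocardial volume to mass with a density of 1.05 g/mL are common conventions assumed for this example; the description does not state which formulas or density the system uses.

```python
def cardiac_measurements(edv_ml, esv_ml, heart_rate_bpm):
    """Compute SV (mL), EF (fraction), and CO (L/min) from ED/ES volumes."""
    sv = edv_ml - esv_ml                  # volume ejected in one cardiac cycle
    ef = sv / edv_ml                      # fraction of the blood pool ejected
    co = sv * heart_rate_bpm / 1000.0     # assumed: CO = SV x heart rate
    return {"SV_ml": sv, "EF": ef, "CO_l_per_min": co}


def myocardial_mass(epi_volume_ml, endo_volume_ml, density_g_per_ml=1.05):
    """Mass of the myocardium: (epicardial - endocardial volume) x density.

    The default density is an assumed, commonly used value.
    """
    return (epi_volume_ml - endo_volume_ml) * density_g_per_ml


measurements = cardiac_measurements(edv_ml=150.0, esv_ml=60.0, heart_rate_bpm=70.0)
ed_mass_g = myocardial_mass(epi_volume_ml=250.0, endo_volume_ml=150.0)
print(measurements, ed_mass_g)
```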
  • For 4D Flow data, the same DeepVentricle architecture, hyperparameter search methodology, and training database as described above for SSFP data may be used. Training a 4D Flow model may be the same as in the SSFP operation discussed above, but the creation of an LMDB and inference may be different for the 4D Flow implementation.
  • FIG. 14 shows a set 1400 of SAX planes (also referred to as a SAX stack) for the LV in which each SAX plane is parallel for a two chamber view 1402 , a three chamber view 1404 and a four chamber view 1406 .
  • FIG. 15 shows a set 1500 of views of a SAX stack in which the segmentation planes are not parallel for the RV for a two chamber view 1502 , a three chamber view 1504 , a four chamber view 1506 and a reconstructed image 1508 .
  • This is motivated by the fact that it is slightly easier to segment the ventricle if the segmentation plane does not intersect the valve plane but rather is parallel to it. However, this is not a requirement, and it is possible to get accurate results without using this feature.
  • FIGS. 16 and 17 segmentations are performed on a multi-planar reconstruction of the image data on each SAX plane.
  • Points 1602 on a contour 1604 in an image 1606 define the spline and are what is stored in the database.
  • the contour 1604 is projected into a two chamber LAX view 1608 , three chamber LAX view 1610 and four chamber LAX view 1612 .
  • FIG. 17 shows images 1702 , 1704 , 1706 and 1708 in which the same slice of FIG. 16 is segmented, but with each of the two chamber view 1704 , three chamber view 1706 and four chamber view 1708 slightly rotated to emphasize the segmentation plane with a depth effect.
  • FIG. 18 shows a process 1800 of creating a training LMDB from clinician annotations.
  • 4D Flow annotations may be stored in a MongoDB 1802 .
  • the system extracts the contours and landmarks, respectively. Contours are stored as a series of (x, y, z) points defining the splines of the contour.
  • Landmarks are stored as a single four-dimensional coordinate (x, y, z, t) for each landmark.
  • the system calculates a rotation matrix to rotate the contour points onto the x-y plane.
  • the system may also define a sampling grid, i.e., a set of (x, y, z) points, on the original plane of the contour.
  • the system rotates both the contour and the sampling grid by the same rotation matrix such that they are in the x-y plane. It is now a simple task to determine which points of the sampling grid are within the 2D vertices defining the contour. This is a simple computational geometry problem for 2D polygons.
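  • Once the contour and sampling grid lie in the x-y plane, deciding which grid points fall inside the contour is a point-in-polygon test. The sketch below uses the even-odd ray casting rule, which is one standard choice and not necessarily the one used by the system. The function names are hypothetical, and sampling at pixel centers is an assumption.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd ray casting test for a point against a 2D polygon.

    vertices: list of (x, y) tuples in order around the polygon.
    """
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside   # crossing to the right of the point
    return inside


def rasterize(vertices, width, height):
    """Build a Boolean mask by testing each pixel center (assumed convention)."""
    return [[point_in_polygon(col + 0.5, row + 0.5, vertices)
             for col in range(width)]
            for row in range(height)]


square = [(1, 1), (4, 1), (4, 4), (1, 4)]
mask = rasterize(square, width=5, height=5)
for row in mask:
    print("".join("#" if v else "." for v in row))
```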
  • 4D Flow DICOMs are stored in a database 1810 .
  • the system uses the landmark annotations from act 1806 and the 4D Flow DICOMs from the database 1810 to define and generate images along a SAX stack.
  • this SAX stack is different from that of the original SAX stacks in which the ground truth contours were defined.
  • the system defines the stack to be orthogonal to the line connecting the left ventricle apex (LVA) and the mitral valve (MV).
  • the system defines there to be a number (e.g., 14) of slices between the LVA and MV, as this is similar to the number of slices in most SSFP SAX stacks. Different numbers of slices may also be used. More slices would increase the training set diversity, though the actual on-disk size would increase more rapidly than the increase in diversity. The results are expected to be insensitive to the exact number of slices.
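  • The geometry of such a stack can be sketched by placing slice centers at even intervals along the line from the LVA to the MV and using the unit vector along that line as the common plane normal. The function name `sax_stack_planes` is hypothetical, and whether the end slices sit exactly on the two landmarks is an assumption of this example.

```python
import numpy as np


def sax_stack_planes(lva, mv, num_slices=14):
    """Return slice centers and the shared plane normal for a SAX stack.

    lva, mv: (x, y, z) landmark positions. Every slice plane is orthogonal
    to the LVA-MV line, so one unit normal describes the whole stack.
    """
    lva = np.asarray(lva, dtype=np.float64)
    mv = np.asarray(mv, dtype=np.float64)
    axis = mv - lva
    normal = axis / np.linalg.norm(axis)
    # Assumption: first and last slices are centered on the LVA and MV.
    fractions = np.linspace(0.0, 1.0, num_slices)
    centers = lva + fractions[:, None] * axis
    return centers, normal


centers, normal = sax_stack_planes(lva=(0.0, 0.0, 0.0), mv=(0.0, 0.0, 65.0))
print(centers[:3], normal)
```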
  • the SAX stack may be oriented such that the RV is always on the left of the image (as is conventional in cardiac MR) by ensuring that aortic valve (AV) is oriented to the right of the line connecting the LVA to the MV. Although consistency of orientation is likely important for achieving good results, the exact chosen orientation is arbitrary.
  • all the available contours for a given study are interpolated to be on a single non-curved SAX stack for simplicity and speed in training and inference.
  • a linear interpolator is set up for each ventricle and time point described by the original sampling grids, i.e., series of (x, y, z) points, and their corresponding masks.
  • the system then interpolates the ground truth masks from their original SAX stacks onto the study's common SAX stack. An example of this is shown in the view 1900 of FIG. 19, which shows a multi-planar reconstruction 1902, an RV endo mask 1904, an LV epi mask 1906 and an LV endo mask 1908.
  • a sentinel is used within the interpolated ground truth masks to indicate when labels are missing.
  • An example visualization 2000 of this is shown in FIG. 20 , which shows a multi-planar reconstruction 2002 , an RV endo mask 2004 , a missing LV epi mask 2006 , and an LV endo mask 2008 .
  • the masks may be projected onto the axial plane, and training and inference may be performed in the axial plane. This may achieve similar accuracy, but may result in a slight loss of resolution due to the need to re-project inferred contours back into the SAX stack to display within the application's user interface.
  • preprocessing acts may include normalizing the images, cropping the images/masks, and resizing the images/masks.
  • the system defines a unique key for 4D Flow LMDBs to be a 32 character hash of the string combination of the time index, slice index, side (“right” or “left”), layer (“endo” or “epi”), upload ID, workspace ID (unique identifier for one person's annotations), and workflow key (unique identifier for a given user's workflow in which they did the work). Any of a number of other unique keys for each image/mask pair may alternatively be used.
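  • A key of this kind can be sketched with a standard hash function. The description states only that the key is a 32 character hash of the string combination of these fields; the use of MD5 (whose hexadecimal digest happens to be 32 characters long), the underscore separator, the field order, and the function name `lmdb_key` are assumptions of this example.

```python
import hashlib


def lmdb_key(time_index, slice_index, side, layer, upload_id, workspace_id, workflow_key):
    """Build a 32-character key from the fields identifying one image/mask pair."""
    parts = [time_index, slice_index, side, layer, upload_id, workspace_id, workflow_key]
    joined = "_".join(str(p) for p in parts)   # assumed separator
    # Assumed hash: MD5 yields a 32-character hexadecimal digest.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# All identifier values below are made up for illustration.
key = lmdb_key(3, 7, "left", "endo", "upload-123", "workspace-456", "workflow-789")
print(key, len(key))
```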
  • the system stores the image and mask metadata, including the time point, slice index and LMDB key in a dataframe. The normalized, cropped, and resized image and the cropped and resized mask are stored in an LMDB 1822 for each key.
  • FIG. 21 shows a pipeline for a process 2100 by which the system makes predictions on new 4D Flow studies.
  • the user can invoke the inference service through a pipeline that is similar to the inference pipeline described above and shown in FIG. 9 .
  • Landmarks have already been defined, either manually or automatically (e.g., by an automatic landmark finding algorithm discussed below).
  • the landmark positions are used to create a standard LV SAX stack on which to perform inference.
  • the SAX stack is created in the same way that the SAX stack was created during training, described above.
  • the metadata required to describe each MPR in the SAX stack is calculated from the landmark locations.
  • the plane of each MPR is fully defined by a point on the plane and the normal of the plane, but the system also passes the vector connecting the mitral valve and aortic valve in this implementation to ensure the image will be oriented correctly. That is, the right ventricle is to the left in the images.
  • Another set of landmarks, such as the mitral valve and tricuspid valve may also suffice for ensuring the right ventricle was to the left in the images.
  • the MPR metadata is then sent to the compute servers, which hold a distributed version of the data (each compute node has a few time points of data).
  • each node renders the requested MPRs for the time points it has available.
  • the generated MPR images, along with their metadata, including the time point, orientation, position, and slice index, are then distributed evenly by time point across multiple inference servers.
  • the network is loaded onto each inference node.
  • one batch of images at a time is processed by each inference node.
  • the images are preprocessed.
  • a forward pass is computed.
  • the predictions are postprocessed, and spline contours are created in the same way as in the SSFP implementations discussed above.
  • the generated splines are forwarded back to the web server after all batches have been processed, where the splines are joined with the inference results from other inference nodes.
  • the web server ensures that the volume is contiguous (i.e., no missing contours in the middle of the volume) by interpolating between neighboring slices if a contour is missing.
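  • Filling a missing contour from its neighbors can be sketched as linear interpolation between corresponding control points. This example assumes that neighboring contours have the same number of control points in matching order and that a missing contour is represented by `None`. Neither assumption is stated in the description, and the function name is hypothetical.

```python
import numpy as np


def fill_missing_contours(contours):
    """Fill gaps in the middle of a slice-ordered list of contours.

    contours: list where each entry is an (N, 2) array of control points or
    None if no contour was produced for that slice. Gaps at either end of
    the stack are left as None, since only interior gaps break contiguity.
    """
    filled = list(contours)
    known = [i for i, c in enumerate(contours) if c is not None]
    for i, c in enumerate(contours):
        if c is not None:
            continue
        below = [k for k in known if k < i]
        above = [k for k in known if k > i]
        if not below or not above:
            continue
        lo, hi = below[-1], above[0]
        w = (i - lo) / (hi - lo)          # fractional position between neighbors
        filled[i] = ((1.0 - w) * np.asarray(contours[lo], dtype=np.float64)
                     + w * np.asarray(contours[hi], dtype=np.float64))
    return filled


base = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
stack = [None, base, None, base + 2.0, None]
filled = fill_missing_contours(stack)
print(filled[2])
```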
  • the web server saves the contours in the database, then presents the contours to the user via the web application. If the user edits a spline, the spline's updated version is saved in the database alongside the original, automatically-generated version. In at least some implementations, comparing manually edited contours with their original, automatically-generated versions, may be used to retrain or fine-tune a model only on inference results that required manual correction.
  • FIGS. 22, 23, and 24 show images 2200 , 2300 and 2400 , respectively, of in-application inference for LV Endo (contour 2202 ), LV Epi (contour 2302 ), and RV Endo (contour 2402 ), respectively at a single time point and slice location.
  • the calculated volumes at ED and ES may be presented to the user, as well as multiple computed measurements (see FIG. 13 ).
  • the DeepVentricle architecture for this implementation is nearly identical to that discussed above, except convolutional kernels are (N×M×K) pixels rather than just (N×M) pixels, where N, M and K are positive integers which may be equal to or different from each other.
  • the model parameters also look similar, but the addition of a depth component in describing the training data may be necessary to fully describe the shape of the volumetric input image.
  • a training LMDB is utilized for this implementation, as with other implementations.
  • the LMDB for this implementation may be created in a similar way to that of the 4D Flow implementation discussed above.
  • many more slices are utilized to define the SAX such that the slice spacing between neighboring slices is similar to that of the pixel spacing in the x and y directions (i.e., pixel spacing is nearly isotropic in three dimensions). It is likely that similar results could be achieved with non-isotropic pixel spacing, as long as the ratio between pixel spacings is conserved across all studies.
  • the SAX MPRs and masks are then ordered by spatial slice and these slices are concatenated into one coherent volumetric image. Model training occurs according to the same pipeline as described above with reference to FIG. 8 .
  • the inference pipeline closely resembles that of the 4D Flow implementation as well. However, in this implementation, neighboring MPRs need to be concatenated into one volumetric image before inference.
  • An additional implementation of the DeepVentricle automatic segmentation model is one in which only the blood pool of the ventricle is segmented and the papillary muscles are excluded.
  • the papillary muscles are small and irregularly shaped, they are typically included in the segmented areas for convenience.
  • the architecture, hyperparameters, and training database of this implementation, which excludes the papillary muscles from the blood pool, are all similar to the SSFP implementation discussed above.
  • the ground truth segmentation database contains left and right ventricle endocardium annotations that exclude the papillary muscles rather than include them.
  • segmentations that exclude the papillary muscles from endocardium contours are onerous to create, the quantity of training data may be significantly less than what can be acquired for segmentations that do not exclude the papillary muscles.
  • a convolutional neural network that was trained on data for which papillary muscles were included in endocardium segmentations and excluded from epicardium segmentations may be used. This allows the network to learn to segment the general size and shape of each class. That network is then fine-tuned on a smaller set of data that excludes the papillary muscles from the segmentation.
  • the result is a segmentation model that segments the same classes as before, but with papillary muscles excluded from endocardium segmentations. This results in a more accurate measure of ventricular blood pool volume than has been previously available when the papillary muscles were included within the endocardium contour.
  • a standard 2D approach includes the network operating on a single slice from the 3D volume at a time. In this case, only the information from that single slice is used to classify or segment the data in that slice. The problem with this approach is that no context from surrounding time points or slices is incorporated into inference for the slice of interest.
  • a standard 3D approach utilizes a 3D kernel and incorporates volumetric information in order to make volumetric predictions. However, this approach is slow and requires significant computational resources for both training and inference.
  • Spatial context is particularly useful for ventricular segmentation near the base of the heart, where the mitral and tricuspid valves are difficult to distinguish on a single 2D slice.
  • Temporal context, and enforcing consistency of the segmentations, may be useful for all parts of the segmentation.
  • the problem is interpreted as a 2D problem, making predictions on a single slice at a time, but with adjacent slices (either in space, time or both) interpreted as additional “channels” of the image.
  • the network operates with 2D convolutions, but incorporates data from nearby spatial and temporal locations, and synthesizes the information via the standard neural network technique of creating feature maps via linear combinations of the input channels convolved with learned kernels.
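  • The "adjacent slices as channels" idea can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `slices_as_channels`, the `radius` parameter, and the clamping of out-of-range neighbors at the ends of the stack are all assumptions made for the example.

```python
import numpy as np

def slices_as_channels(volume, index, radius=1):
    """Return a (2*radius + 1, H, W) array holding slice `index` and its neighbors.

    `volume` has shape (num_slices, H, W). Neighbors that fall outside the stack
    are clamped to the nearest valid slice (an assumption; the text does not
    specify how edges are handled).
    """
    num_slices = volume.shape[0]
    neighbor_ids = np.clip(np.arange(index - radius, index + radius + 1), 0, num_slices - 1)
    return volume[neighbor_ids]

# Toy stack of 5 slices of 4x4 pixels; every pixel of slice k has the value k.
volume = np.arange(5, dtype=np.float32)[:, None, None] * np.ones((5, 4, 4), dtype=np.float32)

# The first slice has no lower neighbor, so it is repeated in the first channel.
channels = slices_as_channels(volume, index=0, radius=1)
```

  • A 2D convolution applied to `channels` then mixes information from all included slices in its first layer, which is the behavior described above. For temporal neighbors, wrapping around the cardiac cycle may be more appropriate than clamping.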
  • a second approach is specific to some intricacies of cardiac MRI, though it may be used in any scenario in which orthogonal (or oblique) planes of data are acquired.
  • SAX short axis
  • LAX long axis
  • the LAX planes are orthogonal to the SAX stack, and the LAX planes typically have significantly higher spatial resolution in the direction along the left ventricle's long axis. That is, an LAX image created by an MPR of a SAX stack has poorer resolution than a native LAX image, since the SAX inter-slice spacing is significantly coarser than the LAX in-plane pixel spacing. Because of the higher spatial resolution in the long axis direction, it is much easier to see the valves in the LAX images compared with the SAX images.
  • a two-stage ventricle segmentation model may be utilized.
  • the ventricles are segmented in one or more LAX planes. Because of the high spatial resolution of these images, the segmentation can be very precise.
  • a disadvantage is that each LAX image consists of only a single plane instead of a volume. If this LAX segmentation is projected to the SAX stack, the LAX segmentation appears as a line on each of the SAX images. This line may be created precisely if the line is aggregated across segmentations from multiple LAX views (e.g., 2CH, 3CH, 4CH; see the heading “Interface for defining valve planes for manual LV/RV volumes” below).
  • This line may be used to bound the SAX segmentation, which is generated via a different model that operates on the SAX images.
  • the SAX segmentation model uses both the raw SAX DICOM data as well as the predicted projected lines from the LAX model(s) as inputs in order to make its prediction.
  • the predicted LAX lines serve to guide and bound the SAX predictions, and particularly aid the model near the base of the heart and valve plane, where the segmentations are often ambiguous when viewed on the SAX stack alone.
  • This technique may be used for any cardiac imaging, including 4D Flow in which the entire volume is acquired at once (and SAX and LAX images are not collected separately), and has the advantage of requiring only 2D kernels to be employed, albeit in two chained models.
  • SSFP cine studies contain 4 dimensions of data (3 space, 1 time), and 4D Flow studies contain 5 dimensions of data (3 space, 1 time, 4 channels of information). These 4 channels of information are the anatomy (i.e., signal intensity), x axis phase, y axis phase, and z axis phase.
  • the simplest way to build a model uses only signal intensities at each 3D spatial point, and does not incorporate the temporal information or, for 4D Flow, the flow information. This simple model takes as input 3D data cubes of shape (x, y, z).
  • the time and phase data are incorporated as well. This is particularly useful for at least a few reasons.
  • time may be added as an additional “channel” to the intensity data.
  • the model then takes as input 3D data blobs of shape (X, Y, NTIMES) or 4D data blobs of shape (X, Y, Z, NTIMES), where NTIMES is the number of time points to include. This may be all time points, or a few time points surrounding the time point of interest. If all time points are included, it may be desirable or necessary to pad the data with a few “wrapped around” time points, since time represents a cardiac cycle and is intrinsically cyclical.
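  • The "wrapped around" padding of the time axis can be sketched with NumPy's `wrap` padding mode. The function name `pad_time_cyclically` and the choice of time as the last axis are assumptions for this example.

```python
import numpy as np

def pad_time_cyclically(data, pad):
    """Pad the last (time) axis with `pad` wrapped-around time points on each side."""
    # No padding on the spatial axes, `pad` samples before and after on the time axis.
    pad_width = [(0, 0)] * (data.ndim - 1) + [(pad, pad)]
    return np.pad(data, pad_width, mode="wrap")

# Toy input of shape (X, Y, NTIMES) = (2, 2, 5); each time point holds its own index.
cine = np.tile(np.arange(5), (2, 2, 1))
padded = pad_time_cyclically(cine, pad=2)
```

  • With this padding, a convolution over time sees the end of the cardiac cycle just before its beginning, which reflects the cyclical nature of the data.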
  • the model may then either involve 2D/3D convolutions with time points as additional “channels” of the data, or 3D/4D convolutions. In the former case, the output may be 2D/3D at a single time of interest. In the latter case, the output may be 3D/4D and may include data at the same time points as were included in the input.
  • Phase data, as acquired in 4D Flow, may also be incorporated in an analogous way, using either each direction of phase (x, y, z) as an additional channel of the input data, or using only the phase magnitude as a single additional channel.
  • the input has shape (X, Y, Z, 4) where the 4 indicates pixel intensity and the three components of phase. With time, this shape is (X, Y, Z, NTIMES, 4). In such implementations, the model therefore operates with 4 or 5-dimensional convolutions.
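  • The two ways of adding phase data as channels can be sketched as follows. The function name `build_flow_input` and the `magnitude_only` flag are hypothetical, and the random arrays merely stand in for real acquisitions.

```python
import numpy as np

def build_flow_input(intensity, phase_x, phase_y, phase_z, magnitude_only=False):
    """Stack anatomy and phase data along a trailing channel axis.

    Returns shape (X, Y, Z, 4) with the three phase components as channels, or
    (X, Y, Z, 2) when only the phase magnitude is used.
    """
    if magnitude_only:
        magnitude = np.sqrt(phase_x**2 + phase_y**2 + phase_z**2)
        return np.stack([intensity, magnitude], axis=-1)
    return np.stack([intensity, phase_x, phase_y, phase_z], axis=-1)

# Placeholder data for a single time point (values are random, for shape only).
rng = np.random.default_rng(0)
anatomy, vx, vy, vz = (rng.normal(size=(8, 8, 4)) for _ in range(4))

full = build_flow_input(anatomy, vx, vy, vz)                          # (8, 8, 4, 4)
compact = build_flow_input(anatomy, vx, vy, vz, magnitude_only=True)  # (8, 8, 4, 2)
```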
  • Systems and methods discussed herein also allow for automated detection of the region of multiple cardiac landmarks in a 3D MRI.
  • the system handles diverse sets of MRIs with varying position, orientation, and appearance of the imaged heart.
  • the system effectively deals with the problem of learning from a database with incomplete annotations. More specifically, the system addresses the problem of detecting every landmark in an image, when only some landmarks have been located for each input volumetric image on the training set.
  • the pipeline is an end-to-end machine learning algorithm which autonomously outputs the required landmark position from raw 3D images.
  • the system requires no pre-processing or prior knowledge from the user.
  • the detected landmarks in the volumetric image may be used to project the image along the 2CH, 3CH, 4CH, and SAX views. This allows these views to be created automatically, with no intervention by the user.
  • cardiac landmarks are located using a neural network with many layers.
  • the architecture is three dimensional (3D) and uses 3D convolutions. This description focuses on the detection of three left ventricle landmarks (LV apex, mitral valve, and aortic valve), and three right ventricle landmarks (RV apex, tricuspid valve, and pulmonary valve). However, it is noted that this method may be applied for the detection of more diverse cardiac landmarks with comparable results, if these annotations are available as part of the ground truth.
  • the landmark detection method of the present disclosure is based on convolutional neural networks.
  • the information necessary for landmark detection is extracted from a database of clinical images, along with their annotations (i.e., landmark positions).
  • FIGS. 25, 26, and 27 show images 2500 , 2600 , 2700 , respectively, of three patients where the left ventricle apex, mitral valve, and right ventricle apex, respectively, have been positioned using a web application, such as the web application discussed above. Note how annotations for the aortic valve, pulmonary valve, and tricuspid valve are missing in this example.
  • First, the data handling pipeline is described. This section details the process which is followed to create the database of images with their annotations, along with the specific method used to encode landmark location.
  • Second, the architecture of the machine learning approach is presented, including how the network transforms the input 3D image into a prediction of landmark location.
  • Third, how the model is trained to the available data is described.
  • Fourth, the inference pipeline is detailed. It is shown how one can apply the neural network to a previously unseen image to predict the region of all six landmarks.
  • A database of 4D Flow data is used, which includes three dimensional (3D) magnetic resonance images (MRI) of the heart, stored as series of two dimensional (2D) DICOM images.
  • 3D volumetric images are acquired throughout a single cardiac cycle, each corresponding to one snapshot of the heartbeat.
  • the initial database thus corresponds to the 3D images of different patients at different time steps.
  • Each 3D MRI presents a number of landmark annotations, from zero to six landmarks, placed by the user of the web application.
  • the landmark annotations, if present, are stored as vectors of coordinates (x, y, z, t) indicating the position (x, y, z) of the landmark in the 3D MRI corresponding to the time point t.
  • FIG. 28 shows a process 2800 which may be followed to handle 2D DICOM slices of the 4D Flow images 2802 , and the annotations 2804 stored in a MongoDB database.
  • the landmark coordinates are extracted from the MongoDB database.
  • the 3D MRIs are extracted from the series of 2D DICOM images by stacking 2D DICOM images from a single time point together according to their location along the z-axis (i.e., the 2D images are stacked along the depth dimension to create 3D volumes). This results in a volumetric 3D image representing a full view of the heart.
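  • The stacking step can be sketched as follows. To keep the example self-contained, each slice is represented as a hypothetical `(z_location, pixel_array)` pair rather than a parsed DICOM object; in practice the location would be read from the DICOM header.

```python
import numpy as np

def stack_slices(slices):
    """Build a 3D volume from (z_location, 2D pixel array) pairs of one time point.

    Slices are ordered by their location along the z-axis and stacked along a
    new depth axis, giving an array of shape (H, W, num_slices).
    """
    ordered = sorted(slices, key=lambda item: item[0])
    return np.stack([pixels for _, pixels in ordered], axis=-1)

# Three unordered toy slices; every pixel holds the slice's z location.
slices = [(z, np.full((4, 4), z, dtype=np.float32)) for z in (10.0, 0.0, 5.0)]
volume = stack_slices(slices)
```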
  • the LMDB is built with 3D images that have been annotated with at least one landmark position. This means that images with no ground truth landmarks are not included in the LMDB.
  • the label maps are defined which encode the annotation information in a way understandable by the neural network which will be used in later stages.
  • the position of a landmark is encoded by indicating, at each position in the 3D volume, how likely the position is to be at the landmark position. To do so, a 3D Gaussian probability distribution is created, centered on the position of the ground truth landmark with standard deviation corresponding to observed inter-rater variability of that type of landmark across all the training data.
  • the standard deviation of the LV Apex coordinates across all users is computed.
  • the standard deviation of the Gaussian used to encode each landmark is defined. This process allows for the setting of this parameter in a principled manner.
  • the standard deviation is different for each landmark, and depends on the complexity of locating the landmark. Specifically, more difficult landmarks have larger Gaussian standard deviation in the target probability maps. Further, the standard deviation is different along the x, y, and z axis, reflecting the fact that the uncertainty might be larger along one direction rather than another because of the anatomy of the heart and/or the resolution of the images.
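  • A label map with a different standard deviation along each axis can be sketched as follows. The function name `gaussian_label_map` is hypothetical, the sigma values are made up for illustration, and scaling the Gaussian so that its peak equals 1 is an assumption (the text does not state a normalization).

```python
import numpy as np

def gaussian_label_map(shape, center, sigma):
    """Axis-aligned 3D Gaussian centered on a landmark, with per-axis sigma.

    `shape`, `center`, and `sigma` are (x, y, z) triples in voxel units. The map
    is scaled to have a peak value of 1 at `center` (an assumption).
    """
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    exponent = sum(((g - c) / s) ** 2 for g, c, s in zip(grids, center, sigma))
    return np.exp(-0.5 * exponent)

# Illustrative values only: larger uncertainty in x and y than in z.
label = gaussian_label_map(shape=(16, 16, 8), center=(5, 9, 3), sigma=(2.0, 2.0, 1.0))
peak = np.unravel_index(np.argmax(label), label.shape)
```

  • Because the z sigma is smaller here, the map falls off faster along z than along x or y, which is how per-axis uncertainty is expressed in the target.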
  • FIG. 29 shows this transition from a landmark position, identified with a cross 2902 in a view 2904 , to a Gaussian 2906 in a view 2908 evaluated on the image for the 2D case.
  • the images are preprocessed.
  • the goal is to normalize the image size and appearance for future training.
  • FIG. 30 shows a process 3000 for a preprocessing pipeline.
  • the 3D MRIs 3002 and label maps 3004 are resized to a predefined size $n_x \times n_y \times n_z$ such that all of the MRIs can be fed to the same neural network.
  • the intensities of the MRI pixels are clipped between the 1st and 99th percentiles. This means that the pixel intensity will saturate at the values corresponding to the 1st and 99th percentiles. This removes outlier pixel intensities that may be caused by artifacts.
  • the intensities are then scaled to lie between 0 and 1.
  • the intensity histogram is then normalized using contrast limited adaptive histogram equalization to maximize contrast in the image and minimize intra-image intensity differences (as may be caused by, for example, magnetic field inhomogeneities).
  • the image is centered to have zero mean.
  • Other strategies may be used for the normalization of the image intensity, such as normalizing the variance of the input to one, and may yield similar results.
  • This pipeline results in preprocessed images 3018 and labels 3020 which can be fed to the network.
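  • The intensity part of this pipeline can be sketched as follows. Only percentile clipping, scaling to the range 0 to 1, and mean centering are shown; resizing and contrast limited adaptive histogram equalization are deliberately omitted to keep the example dependency-free. The function name `normalize_intensity` and the synthetic input are assumptions.

```python
import numpy as np

def normalize_intensity(image, low_pct=1.0, high_pct=99.0):
    """Clip to the given percentiles, scale to [0, 1], then center to zero mean."""
    low, high = np.percentile(image, [low_pct, high_pct])
    clipped = np.clip(image, low, high)                  # saturate outlier intensities
    scaled = (clipped - low) / max(high - low, 1e-12)    # guard against a constant image
    return scaled - scaled.mean()

# Synthetic volume with one extreme outlier voxel standing in for an artifact.
rng = np.random.default_rng(0)
mri = rng.normal(loc=100.0, scale=20.0, size=(16, 16, 8))
mri[0, 0, 0] = 1e6
normalized = normalize_intensity(mri)
```

  • Without the clipping step, the single outlier would compress every other intensity into a tiny fraction of the scaled range.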
  • FIGS. 31 and 32 show example images 3100 and 3200 for two patients of the pre-processed 3D MRI and encoded labels.
  • FIG. 31 shows a sagittal view 3102 , an axial view 3104 , and a coronal view 3106 of a preprocessed input image and encoded mitral valve landmark for one patient.
  • FIG. 32 shows a sagittal view 3202 , an axial view 3204 , and a coronal view 3206 of a preprocessed input image and encoded mitral valve landmark for another patient.
  • the uncertainty for the localization of the tricuspid valve is larger than the uncertainty for the mitral valve.
  • the uncertainty is different from one axis to the other.
  • an upload ID is defined to be the key that identifies the pair (MRI, label map), which is stored in a training LMDB database at 2816 .
  • the pair (MRI, label map) is written to the LMDB.
  • a deep neural network is used for the detection of the landmarks.
  • the network takes as input a preprocessed 3D MRI and outputs six 3D label maps, one per landmark.
  • the architecture used in this implementation is similar or identical to the architecture described above.
  • the network is composed of two symmetric paths: a contracting path and an expanding path (see FIG. 6 ).
  • the systems and methods of the present disclosure advantageously handle missing information in the labels while still being able to predict all landmarks simultaneously.
  • the network used for landmark detection differs from the DeepVentricle implementation discussed above in three main ways.
  • the architecture is three dimensional: the network processes a 3D MRI in a single pass, producing a 3D label map for every landmark.
  • the network predicts 6 classes, one for each landmark.
  • the parameters selected after the hyperparameter search can differ from the DeepVentricle parameters, and are specifically selected to solve the problem at hand. Additionally, the standard deviation used to define the label maps, discussed above, may be considered as a hyperparameter.
  • the output of the network is a 3D map which encodes where the landmark is positioned. High values of the map may correspond to likely landmark position, and low values may correspond to unlikely landmark position.
  • the following discussion describes how the deep neural network can be trained using the LMDB database of 3D MRI and label map pairs.
  • the overall objective is to tune the parameters of the network such that the network is able to predict the position of the heart landmarks on previously unseen images.
  • a flowchart of the training process is shown in FIG. 8 and described above.
  • the training database may be split into a training set used to train the model, a validation set used to quantify the quality of the model, and a test set.
  • the split requires all the images from a single patient to lie in the same set. This guarantees that the model is not validated on patients used for training.
  • Data from the test set may be used to show examples of landmark localization, but this information is not used for training or for ranking models with respect to one another.
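  • A patient-level split of this kind can be sketched as follows. The function name `split_by_patient`, the split fractions, and the identifier format are all assumptions for the example; the text does not state the proportions used.

```python
import random

def split_by_patient(image_to_patient, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole patients to train/validation/test so no patient spans two sets.

    `image_to_patient` maps an image identifier to its patient identifier.
    `fractions` are illustrative (train, validation, test) proportions.
    """
    patients = sorted(set(image_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_train = int(round(fractions[0] * len(patients)))
    n_val = int(round(fractions[1] * len(patients)))
    groups = {
        "train": set(patients[:n_train]),
        "validation": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    return {
        name: sorted(img for img, pat in image_to_patient.items() if pat in members)
        for name, members in groups.items()
    }

# Ten hypothetical patients with three time points (3D images) each.
image_to_patient = {f"p{p}_t{t}": f"p{p}" for p in range(10) for t in range(3)}
splits = split_by_patient(image_to_patient)
```

  • Splitting on patients rather than on individual images matters here because the time points of one patient are highly correlated; mixing them across sets would inflate validation scores.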
  • the gradient of the loss is used in order to update the parameters of the neural network.
  • weighting the loss in regions where the landmark is present may be utilized to provide faster convergence of the network. More precisely, when computing the loss, a larger weight may be applied to the region of the image near the landmarks, compared to the rest of the image. As a result, the network converges more rapidly. However, a non-weighted loss may also be used with good results, albeit with a longer training time.
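  • One possible form of such a weighted loss is sketched below. The text does not specify the loss function or the weighting scheme, so the squared-error form, the `near_weight` value, and the `threshold` used to decide which voxels count as "near the landmark" are all assumptions.

```python
import numpy as np

def weighted_squared_error(prediction, target, near_weight=10.0, threshold=0.1):
    """Weighted mean squared error that emphasizes voxels near the landmark.

    Voxels whose target value exceeds `threshold` are treated as near the
    landmark and receive `near_weight`; all other voxels receive weight 1.
    """
    weights = np.where(target > threshold, near_weight, 1.0)
    return float(np.sum(weights * (prediction - target) ** 2) / np.sum(weights))

# Toy case: the target has one landmark voxel and the prediction misses it entirely.
target = np.zeros((4, 4, 4))
target[1, 2, 3] = 1.0
prediction = np.zeros((4, 4, 4))

weighted = weighted_squared_error(prediction, target)                # 10 / 73
plain = weighted_squared_error(prediction, target, near_weight=1.0)  # 1 / 64
```

  • In this toy case the missed landmark voxel contributes far more to the weighted loss than to the plain one, so its gradient is correspondingly larger.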
  • the landmark position is inferred by pre-processing the image in a similar fashion as to what is described above with reference to FIG. 28 . More precisely, the image may be resized, clipped, scaled, the image's histogram equalized, and the image may be centered.
  • the network outputs one map per landmark, for a total of six 3D maps in the case of six landmarks. These maps describe the probability that each landmark is found in a particular position. Alternatively, the maps can be considered as encoding an inverse distance function from the true location of the landmark (i.e., high value results in small distance, low value results in large distance).
  • the landmark position 3302 can be determined by looking for the maximum value of the output of the neural network for each landmark. This position is then projected into the space of the original non-preprocessed 3D input MRI for the final landmark localization (e.g., undoing any spatial distortions that were applied to the volume during inference).
  • several other strategies may be used to translate the label maps into landmark position coordinates. For instance, one could take the expected location using the label map as a 3D probability density. Note that taking the maximum corresponds to considering the mode of the density. Alternatively, the probability estimate may be first smoothed before selecting the maximum or expected value as the location.
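  • Both the maximum (mode) and the expected-location strategies can be sketched as follows. The function name `landmark_from_map` is hypothetical, and the mapping back to the original image assumes that resizing was the only spatial change, handled here by a simple per-axis scale factor.

```python
import numpy as np

def landmark_from_map(label_map, original_shape, use_expectation=False):
    """Convert one predicted 3D label map into coordinates in the original image.

    By default the voxel with the maximum value (the mode) is used. With
    `use_expectation=True` the map is normalized to sum to 1 and the expected
    location is returned instead.
    """
    if use_expectation:
        density = label_map / label_map.sum()
        grids = np.meshgrid(*[np.arange(n) for n in label_map.shape], indexing="ij")
        position = np.array([float((g * density).sum()) for g in grids])
    else:
        position = np.array(np.unravel_index(np.argmax(label_map), label_map.shape), dtype=float)
    # Undo the resize: scale each axis by original size / network input size.
    scale = np.array(original_shape, dtype=float) / np.array(label_map.shape, dtype=float)
    return position * scale

# Toy network output with a single nonzero voxel.
label_map = np.zeros((8, 8, 4))
label_map[2, 6, 1] = 1.0
mode = landmark_from_map(label_map, original_shape=(16, 16, 8))
mean = landmark_from_map(label_map, original_shape=(16, 16, 8), use_expectation=True)
```

  • The two estimates coincide for this single-peak toy map; on real outputs the expected location is smoother but can be pulled away from the peak when the map has several high-valued regions.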
  • the dataset is made of clinical studies uploaded on the web application by previous users.
  • the annotations may be placed by the user on the different images. As explained previously, this dataset is split into training, validation, and test sets.
  • the neural network may be trained using the pipeline previously described above and shown in FIG. 8 . Batches of data extracted from the train set are sequentially fed to the neural network. The gradient of the loss between the network prediction and the real landmark location is computed and backpropagated to update the intrinsic parameters of the network. Other model hyperparameters (e.g., network size, shape) are chosen using hyper-parameter search, as discussed above.
  • the trained model may be stored on servers as part of a cloud service.
  • the model can be loaded on multiple servers at inference time in order to carry out the detection of landmarks at several time points in parallel. This process is similar to the approach used for DeepVentricle, which is shown in FIG. 9 and discussed above.
  • the user can select a “Views” button under a “Cardiac” section. This opens a new panel on the right of the image with a “Locate Landmarks” button. Selecting this button automatically locates the six landmarks on every 3D image at every time point. A list of the located landmarks is visible on the right panel. Selecting the landmark name brings the focus of the image to the predicted landmark location, and the user is allowed to make any modifications deemed necessary. Once the user is satisfied, the user can select a “standard views” button which creates the standard 2, 3, 4 chamber and SAX views of the heart.
  • the 3D images acquired are 4D Flow sequences.
  • the previously described model may be augmented to include flow information.
  • flow velocity information is available at every time point of the acquisition for every patient.
  • the standard deviation along the time axis may be computed at every voxel of the 3D image.
  • Standard deviation magnitude is associated with the amount of blood flow variation of that pixel over the course of one heartbeat.
  • This standard deviation image is then normalized according to the previously described normalization pipeline: resizing, clipping, scaling, histogram equalization, centering.
  • the Fourier transform of the 4D signal may be computed along the last dimension, and various frequency bins may be used to encode the signal. More generally, the whole time series may be input to the network, at the expense of requiring additional computation and memory power.
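  • Both temporal encodings can be sketched as follows, assuming the flow data is an array whose last axis is time. The function names and the choice of keeping the magnitudes of the lowest `num_bins` frequency bins are assumptions; the text only says that various frequency bins may be used.

```python
import numpy as np

def temporal_std(flow):
    """Standard deviation over the last (time) axis at every voxel."""
    return flow.std(axis=-1)

def temporal_frequency_bins(flow, num_bins):
    """Magnitudes of the lowest `num_bins` frequency bins along the time axis."""
    spectrum = np.fft.rfft(flow, axis=-1)
    return np.abs(spectrum[..., :num_bins])

# Toy 4D signal (X, Y, Z, time): one pulsatile voxel, everything else static.
t = np.arange(16)
flow = np.zeros((4, 4, 2, 16))
flow[1, 1, 0] = np.sin(2 * np.pi * t / 16)

std_image = temporal_std(flow)                    # shape (4, 4, 2)
bins = temporal_frequency_bins(flow, num_bins=3)  # shape (4, 4, 2, 3)
```

  • The standard deviation collapses the time series to one value per voxel, while the frequency bins keep a small number of channels per voxel; both are much cheaper than feeding the full time series to the network.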
  • the input to the neural network may also be extended with an additional channel. More precisely, a four dimensional (4D) tensor may be defined where the last dimension encodes as separate channels the anatomical pixel intensity and the flow magnitude or components of velocity.
  • the network described above may be extended to accept such a tensor as input. This requires the extension of the first layer to accept a 4D tensor. The subsequent steps of network training, inference, and user interactions remain similar to what has been previously described.
  • the automatic location of cardiac landmarks may be achieved by directly predicting the coordinates (x, y, z) of the different landmarks.
  • a different network architecture may be used.
  • This alternative network may be composed of a contracting path, followed by several fully connected layers, with a length-three vector of (x, y, z) coordinates as the output for each landmark.
  • This is a regression, rather than a segmentation network. Note that, in the regression network, unlike in the segmentation network, there is no expanding path in the network.
  • Other architectures may also be used with the same output format.
  • time may also be included in the output as a fourth dimension if 4D data (x, y, z, time) is given as input.
  • the output of the network is eighteen scalars corresponding to three coordinates for each of the six landmarks in the input image.
  • Such an architecture may be trained in a similar fashion to the previously described landmark detector.
  • the only update needed is the re-formulation of the loss to account for the change in the network output format (a spatial point in this implementation, as opposed to the probability map used in the first implementation).
  • One reasonable loss function may be the L2 (squared) distance between the output of the network and the real landmark coordinate, but other loss functions may be used as well, as long as the loss functions are related to the quantity of interest, namely the distance error.
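  • The L2 (squared) distance loss for the eighteen-scalar output can be sketched as follows. The function name `squared_distance_loss` is hypothetical, and averaging over the six landmarks is an assumption.

```python
import numpy as np

def squared_distance_loss(predicted, target):
    """Mean squared Euclidean distance between predicted and true landmarks.

    Both inputs hold 18 scalars, interpreted as six (x, y, z) triples.
    """
    predicted = np.asarray(predicted, dtype=float).reshape(-1, 3)
    target = np.asarray(target, dtype=float).reshape(-1, 3)
    return float(np.mean(np.sum((predicted - target) ** 2, axis=1)))

# Toy case: only the first landmark is off, by the vector (3, 4, 0).
target = np.zeros(18)
predicted = np.zeros(18)
predicted[:3] = [3.0, 4.0, 0.0]
loss = squared_distance_loss(predicted, target)   # (3^2 + 4^2) / 6 = 25 / 6
```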
  • the first landmark detection implementation discussed above may also be extended using a second neural network which acts as a discriminator network.
  • the discriminator network may be trained to distinguish good, realistic landmark locations from bad, unrealistic ones.
  • the initial network of the implementation may be used to generate several landmark proposals for each type of landmark, such as by using all local maxima of the predicted landmark probability distribution.
  • the discriminator network may then evaluate each proposal, for example, by using a classification architecture on a high-resolution patch, centered around the landmark proposal.
  • the proposal with the highest probability of being the true landmark may then be taken as the output.
  • This implementation may help choose the correct landmark location in ambiguous situations, for example, in the presence of noise or artifacts.
  • Another approach for the detection of cardiac landmarks is the use of reinforcement learning.
  • an agent walking along the 3D image is considered.
  • the agent is at first placed in the center of the image.
  • the agent then follows a policy until the agent reaches the landmark position.
  • the policy represents the decision making process of the agent at each step: take a step left, right, up, down, above, or under.
  • This policy may be learned using a deep neural network approximating the Bellman equation for the state-action function Q using the Q-learning algorithm.
  • One Q function can then be learned for each of the landmarks to be detected.
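  • The agent's walk under a greedy policy can be sketched as follows. In the approach described above, Q would be approximated by a deep neural network trained with Q-learning; here it is replaced by a hand-written toy function (`make_toy_q`) that rewards moving closer to a known landmark, purely so the example runs. The stopping rule (halt when no action has a positive value) is also an assumption.

```python
# The six unit steps available to the agent in a 3D image.
ACTIONS = {
    "left": (-1, 0, 0), "right": (1, 0, 0),
    "up": (0, -1, 0), "down": (0, 1, 0),
    "above": (0, 0, 1), "under": (0, 0, -1),
}

def make_toy_q(landmark):
    """Stand-in for a learned Q function: the reduction in squared distance."""
    def dist2(point):
        return sum((a - b) ** 2 for a, b in zip(point, landmark))

    def q(state, action):
        next_state = tuple(s + d for s, d in zip(state, ACTIONS[action]))
        return dist2(state) - dist2(next_state)

    return q

def follow_greedy_policy(shape, q_function, max_steps=1000):
    """Start at the image center and repeatedly take the highest-valued action."""
    state = tuple(n // 2 for n in shape)
    for _ in range(max_steps):
        best = max(ACTIONS, key=lambda a: q_function(state, a))
        if q_function(state, best) <= 0:   # no step improves: assume the landmark is reached
            break
        state = tuple(s + d for s, d in zip(state, ACTIONS[best]))
    return state

found = follow_greedy_policy((32, 32, 16), make_toy_q(landmark=(5, 20, 3)))
```

  • With one Q function per landmark, the same walk would be repeated with a different `q_function` for each landmark to be detected.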
  • a neural network may be used to directly predict parameters defining the locations and orientations of the planes for standard views. For instance, a network may be trained to calculate the 3D rotation angle, translation and rescaling needed to move a starting pixel grid to a long axis view. Separate models may be trained to predict different transformations or a single model may be used to output several views.
  • a user is able to mark points that lie on the valve plane using the available long axis views.
  • the valve plane is determined from these input points by performing a regression to find the plane that best fits.
  • the normal for the plane is set to point away from the apex of the ventricle.
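  • The plane fit and normal orientation can be sketched as follows. The text says only that a regression finds the best-fitting plane; the total least squares fit via singular value decomposition used here is one possible choice, and the function name `fit_valve_plane` and the sample points are hypothetical.

```python
import numpy as np

def fit_valve_plane(points, apex):
    """Fit a plane to user-marked points and orient its normal away from the apex.

    Returns (centroid, unit normal). The normal is the direction of least
    variance of the centered points, flipped if needed so that it points from
    the apex side toward the valve side.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                       # right singular vector of the smallest singular value
    if np.dot(normal, centroid - np.asarray(apex, dtype=float)) < 0:
        normal = -normal
    return centroid, normal

# Four marked points lying in the plane z = 10, with the apex below at z = 0.
points = [(0.0, 0.0, 10.0), (4.0, 0.0, 10.0), (0.0, 4.0, 10.0), (4.0, 4.0, 10.0)]
centroid, normal = fit_valve_plane(points, apex=(2.0, 2.0, 0.0))
```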
  • FIG. 35 shows an environment 3500 that includes a processor-based device 3504 suitable for implementing various functionality described herein.
  • some portion of the implementations will be described in the general context of processor-executable instructions or logic, such as program application modules, objects, or macros being executed by one or more processors.
  • the described implementations can be practiced with various processor-based system configurations, including handheld devices, such as smartphones and tablet computers, wearable devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, personal computers (“PCs”), network PCs, minicomputers, mainframe computers, and the like.
  • the processor-based device 3504 may include one or more processors 3506 , a system memory 3508 and a system bus 3510 that couples various system components including the system memory 3508 to the processor(s) 3506 .
  • the processor-based device 3504 will at times be referred to in the singular herein, but this is not intended to limit the implementations to a single system, since in certain implementations, there will be more than one system or other networked computing device involved.
  • Non-limiting examples of commercially available systems include ARM processors from a variety of manufacturers, Core microprocessors from Intel Corporation, U.S.A., PowerPC microprocessors from IBM, Sparc microprocessors from Sun Microsystems, Inc., PA-RISC series microprocessors from Hewlett-Packard Company, and 68xxx series microprocessors from Motorola Corporation.
  • the processor(s) 3506 may be any logic processing unit, such as one or more central processing units (CPUs), microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), etc. Unless described otherwise, the construction and operation of the various blocks shown in FIG. 35 are of conventional design. As a result, such blocks need not be described in further detail herein, as they will be understood by those skilled in the relevant art.
  • the system bus 3510 can employ any known bus structures or architectures, including a memory bus with memory controller, a peripheral bus, and a local bus.
  • the system memory 3508 includes read-only memory (“ROM”) 3512 and random access memory (“RAM”) 3515 .
  • a basic input/output system (“BIOS”) 3516 which can form part of the ROM 3512 , contains basic routines that help transfer information between elements within processor-based device 3504 , such as during start-up. Some implementations may employ separate buses for data, instructions and power.
  • the processor-based device 3504 may also include one or more solid state memories, for instance flash memory or a solid state drive (SSD), which provides nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the processor-based device 3504 .
  • the processor-based device 3504 can employ other nontransitory computer- or processor-readable media, for example a hard disk drive, an optical disk drive, or memory card media drive.
  • Program modules can be stored in the system memory 3508 , such as an operating system 3530 , one or more application programs 3532 , other programs or modules 3534 , drivers 3536 and program data 3538 .
  • the application programs 3532 may, for example, include panning/scrolling logic 3532 a.
  • Such panning/scrolling logic may include, but is not limited to, logic that determines when and/or where a pointer (e.g., finger, stylus, cursor) enters a user interface element that includes a region having a central portion and at least one margin.
  • Such panning/scrolling logic may include, but is not limited to, logic that determines a direction and a rate at which at least one element of the user interface element should appear to move, and causes updating of a display to cause the at least one element to appear to move in the determined direction at the determined rate.
  • the panning/scrolling logic 3532 a may, for example, be stored as one or more executable instructions.
  • the panning/scrolling logic 3532 a may include processor and/or machine executable logic or instructions to generate user interface objects using data that characterizes movement of a pointer, for example data from a touch-sensitive display or from a computer mouse or trackball, or other user interface device.
  • the system memory 3508 may also include communications programs 3540 , for example a server and/or a Web client or browser for permitting the processor-based device 3504 to access and exchange data with other systems such as user computing systems, Web sites on the Internet, corporate intranets, or other networks as described below.
  • the communications programs 3540 in the depicted implementation are markup language based, such as Hypertext Markup Language (HTML), Extensible Markup Language (XML) or Wireless Markup Language (WML), and operate with markup languages that use syntactically delimited characters added to the data of a document to represent the structure of the document.
  • a number of servers and/or Web clients or browsers are commercially available such as those from Mozilla Corporation of California and Microsoft of Washington.
  • the operating system 3530 can be stored on any other of a large variety of nontransitory processor-readable media (e.g., hard disk drive, optical disk drive, SSD and/or flash memory).
  • a user can enter commands and information via a pointer, for example through input devices such as a touch screen 3548 via a finger 3544 a , stylus 3544 b , or via a computer mouse or trackball 3544 c which controls a cursor.
  • Other input devices can include a microphone, joystick, game pad, tablet, scanner, biometric scanning device, etc.
  • I/O devices are connected to the processor(s) 3506 through an interface 3546 such as a touch-screen controller and/or a universal serial bus (“USB”) interface that couples user input to the system bus 3510 , although other interfaces such as a parallel port, a game port, a wireless interface, or a serial port may be used.
  • the touch screen 3548 can be coupled to the system bus 3510 via a video interface 3550 , such as a video adapter to receive image data or image information for display via the touch screen 3548 .
  • the processor-based device 3504 can include other output devices, such as speakers, vibrator, haptic actuator, etc.
  • the processor-based device 3504 may operate in a networked environment using one or more of the logical connections to communicate with one or more remote computers, servers and/or devices via one or more communications channels, for example, one or more networks 3514 a , 3514 b .
  • These logical connections may facilitate any known method of permitting computers to communicate, such as through one or more LANs and/or WANs, such as the Internet, and/or cellular communications networks.
  • Such networking environments are well known in wired and wireless enterprise-wide computer networks, intranets, extranets, the Internet, and other types of communication networks including telecommunications networks, cellular networks, paging networks, and other mobile networks.
  • the processor-based device 3504 may include one or more wired or wireless communications interfaces 3552 a , 3552 b (e.g., cellular radios, WI-FI radios, Bluetooth radios) for establishing communications over the network, for instance the Internet 3514 a or cellular network 3514 b.
  • program modules, application programs, or data, or portions thereof can be stored in a server computing system (not shown).
  • the network connections shown in FIG. 35 are only some examples of ways of establishing communications between computers, and other connections may be used, including wirelessly.
  • the processor(s) 3506 , system memory 3508 , network and communications interfaces 3552 a , 3554 b are illustrated as communicably coupled to each other via the system bus 3510 , thereby providing connectivity between the above-described components.
  • the above-described components may be communicably coupled in a different manner than illustrated in FIG. 35 .
  • one or more of the above-described components may be directly coupled to other components, or may be coupled to each other, via intermediary components (not shown).
  • system bus 3510 is omitted and the components are coupled directly to each other using suitable connections.
  • Cardiac Magnetic Resonance (CMR) imaging is commonly used to assess cardiac structure and function.
  • One disadvantage of CMR is that postprocessing of exams is tedious. Without automation, precise assessment of cardiac function via CMR typically requires an annotator to spend tens of minutes per case manually contouring ventricular structures. Automatic contouring can lower the required time per patient by generating contour suggestions that can be lightly modified by the annotator.
  • Fully convolutional networks FCNs
  • FCNs a variant of convolutional neural networks
  • FCNs are limited by their computational cost, which increases the monetary cost and degrades the user experience of production systems.
  • FastVentricle architecture a FCN architecture for ventricular segmentation based on the recently developed ENet architecture.
  • FastVentricle is 4× faster and runs with 6× less memory than the previous state-of-the-art ventricular segmentation architecture while still maintaining excellent clinical accuracy.
  • ES end systole
  • ED end diastole
  • FIG. 36 shows a schematic representation of a fully convolutional encoder-decoder architecture with skip connections that utilizes a smaller expanding path than contracting path.
  • Active contour models are a heuristic-based approach to segmentation that have been utilized previously for segmentation of the ventricles. See Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision (1988) 321-331; Zhu, W., et al.: A geodesic-active-contour-based variational model for short-axis cardiac MRI segmentation. Int. Journal of Computer Math. 90(1) (2013).
  • active contour-based methods not only perform poorly on images with low contrast, they are also sensitive to initialization and hyperparameter values. Deep learning methods for segmentation have recently defined state-of-the-art with the use of fully convolutional networks (FCNs).
  • U-Net originally developed for use in the biomedical community where there are often fewer training images and even finer resolution is required, added the use of skip connections between the contracting and expanding paths to preserve details. See Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2015) 234-241.
  • ENet an alternative FCN design
  • ENet is an asymmetrical architecture that is optimized for speed. Paszke, A., Chaurasia, A., et al.: ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147 (2016).
  • ENet utilizes early downsampling to reduce the input size using only a few feature maps. This improves speed, given that much of the network's computational load takes place when the image is at full resolution, and has minimal effect on accuracy since much of the visual information at this stage is redundant.
  • ENet discloses the primary purpose of the expanding path in FCNs is to upsample and fine-tune the details learned by the contracting path rather than to learn complicated upsampling features; hence, ENet utilizes an expanding path that is smaller than its contracting path.
  • ENet also makes use of bottleneck modules, which are convolutions with a small receptive field that are applied in order to project the feature maps into a lower dimensional space in which larger kernels can be applied. See He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE CVPR. (2016) 770-778. Bottlenecks also contain residual connections in the spirit of the He, K. paper referenced immediately above.
  • ENet also uses a path parallel to the bottleneck path that solely includes zero or more pooling layers to directly pass information from a higher resolution layer to the lower resolution layers.
  • ENet leverages a diversity of low cost convolution operations.
  • ENet also uses cheaper asymmetric (1×n and n×1) convolutions and dilated convolutions. See Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015).
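Why a 1×n and n×1 pair is cheaper than a full n×n kernel can be seen from a weight count. The sketch below is an illustration only and is not taken from ENet or FastVentricle; the helper names are hypothetical, and it uses plain Python lists. It also checks the special case in which the n×n kernel is an outer product of a column and a row, where the two-step form gives exactly the same output. A general n×n kernel does not factor this way, so in a network the asymmetric pair is a cheaper substitute rather than an exact equivalent.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def square_kernel_weights(n):
    """Weights in one n x n kernel (per input/output channel pair)."""
    return n * n


def asymmetric_pair_weights(n):
    """Weights in a 1 x n kernel followed by an n x 1 kernel."""
    return 2 * n


# Illustrative values only.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 8, 7, 6],
         [5, 4, 3, 2]]
row = [1, 2, 1]        # 1 x 3 kernel
col = [1, 0, -1]       # 3 x 1 kernel
full = [[c * r for r in row] for c in col]  # separable 3 x 3 kernel

one_step = correlate2d_valid(image, full)
two_step = correlate2d_valid(correlate2d_valid(image, [row]),
                             [[c] for c in col])
```

For n = 5 the count is 25 weights for the square kernel against 10 for the asymmetric pair.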
  • Deep learning has been successfully applied to ventricle segmentation. See Avendi, M., et al.: A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. MedIA 30 (2016); Tran, P. V.: A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv preprint arXiv:1604.00494 (2016).
  • FastVentricle an ENet variation with UNet style skip connections for segmentation of the LV endocardium, LV epicardium, and RV endocardium. More specifically, we add the possibility to use skip connections from the contracting path to the expanding path where the image size is similar.
  • Annotated contour types include LV endocardium, LV epicardium and RV endocardium. Scans are annotated at ED and ES. Contours were annotated with different frequencies; 96% (1097) of scans have LV endocardium contours, 22% (247) have LV epicardium contours and 85% (966) have RV endocardium contours.
  • RAVE is defined as
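Reading RAVE by its expanded name (relative absolute volume error), a minimal sketch is the absolute difference between a predicted and a ground-truth volume, divided by the ground-truth volume. That form, and the function and argument names, are assumptions for illustration; the volumes in the usage lines are made up and are not results from this disclosure.

```python
import statistics


def rave(predicted_volume, ground_truth_volume):
    """Relative absolute volume error, assuming |V_pred - V_truth| / V_truth."""
    if ground_truth_volume <= 0:
        raise ValueError("ground-truth volume must be positive")
    return abs(predicted_volume - ground_truth_volume) / ground_truth_volume


def median_rave(volume_pairs):
    """Median RAVE over (predicted, ground_truth) volume pairs."""
    return statistics.median(rave(p, t) for p, t in volume_pairs)


# Hypothetical volumes in millilitres, for illustration only.
pairs = [(95.0, 100.0), (110.0, 100.0), (52.0, 50.0)]
summary = median_rave(pairs)
```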
  • Other normalization schemes, such as adaptive histogram equalization, are also possible.
  • We crop and resize the images such that the ventricle contours take up a larger percentage of the image; the actual crop and resize factors are hyperparameters. Cropping the image increases the fraction of the image that is taken up by the foreground (ventricle) class, making it easier to resolve fine details and helping the model converge.
  • FIG. 37 shows box plots comparing the relative absolute volume error (RAVE) between FastVentricle and DeepVentricle for each of LV Endo, LV Epi, and RV Endo at ED (left panels) and ES (right panels).
  • the line at the center of the box denotes the median RAVE, and the ends of the box show the 25th (Q1) and 75th (Q3) percentiles of the distribution. Whiskers are defined as per Matplotlib defaults.
  • the hyperparameters of the UNet architecture include the use of batch normalization, the dropout probability, the number of convolution layers, the number of initial filters, and the number of pooling layers.
  • the hyperparameters of the ENet architecture include the kernel size for asymmetric convolutions, the number of times Section 2 of the network is repeated, the number of initial bottleneck modules, the number of initial filters, the projection ratio, the dropout probability and whether or not to use skip connections (See Paszke, referenced above, for details on these parameters).
  • the hyperparameters also include the batch size, learning rate, crop fraction and image size.
  • FIG. 37 shows box plots of the RAVE comparing DeepVentricle and FastVentricle for each combination of ventricular structure (LV Endo, LV Epi, RV Endo) and phase (ES, ED), where the sample size is specified in Table 2 below.
  • the median RAVE is: i) 4.5% for DeepVentricle and 5.5% for FastVentricle for LV endo, ii) 5.6% for DeepVentricle and 4.2% for FastVentricle for LV epi, iii) 7.6% for DeepVentricle and 9.0% for FastVentricle for RV endo.
  • FIG. 39 presents examples of network predictions on different slices and time points for studies with low RAVE for both DeepVentricle and FastVentricle.
  • FIG. 39 shows DeepVentricle and FastVentricle predictions for a healthy patient with low RAVE (top) and on a patient with hypertrophic cardiomyopathy (bottom).
  • RV endo is outlined in red, LV endo in green, and LV epi in blue.
  • the X axis of the grid corresponds to time indices sampled throughout the cardiac cycle and the Y axis corresponds to slice indices sampled from apex (low slice index) to base (high slice index).
  • Model performance at the apex and center of the ventricle is better than at the base, as it is often ambiguous from just the basal slice where the valve plane (separating ventricle from atrium) is located.
  • segmentations at ED tend to be better than at ES, as the chambers at ES are smaller and dark-colored papillary muscles tend to blend with the myocardium when the heart is contracted.
  • any automated algorithm should be faster than manual annotations, and lightweight enough to deploy easily.
  • In Table 1, we find that this embodiment of FastVentricle is roughly 4× faster than DeepVentricle and uses 6× less memory for inference. Because the model contains more layers, FastVentricle takes longer to initialize before being ready to perform inference. However, in a production setting, the model only needs to be initialized once when provisioning the server, so this additional cost is incidental.
  • Neural networks are infamous for being black boxes, i.e., it is very difficult to “look inside” and understand why a certain prediction is being made. This is especially troublesome in the medical setting, as doctors prefer to use tools that they can understand.
  • a model “input” and a real segmentation mask as the target, we perform backpropagation to update the pixel values in the input image such that the loss is minimized.
  • FIG. 38 shows the result of such an optimization for DeepVentricle and FastVentricle.
  • the model is confident in its predictions when the endocardium is light and the contrast with the epicardium is high. The model seems to have learned to ignore the anatomy surrounding the heart.
  • the optimized input for DeepVentricle is less noisy than that for FastVentricle, probably because the former model is larger and utilizes skip connections at the full resolution of the input image. DeepVentricle also seems to “imagine” structures which look like papillary muscles inside the ventricles.
  • FIG. 38 shows a random input (left) that is optimized using gradient descent for DeepVentricle and FastVentricle (middle) to fit the label map (right, RV endo in red, LV endo in cyan, LV epi in blue).
  • the generated image has many qualities that the network is “looking for” when making its predictions, such as high contrast between endocardium and epicardium and the presence of papillary muscles.
  • the myocardial muscle and the blood pool i.e., blood within the ventricles of the heart
  • the papillary and trabeculae muscles are small muscles within the ventricle of the heart that abut both the myocardial muscle and the blood pool.
  • different institutions have different policies about whether the papillary and trabeculae muscles should be included in the volume of the blood pool or not.
  • the papillary and trabeculae muscles should be excluded from the contour that defines the blood pool.
  • the contour that defines the blood pool is often assumed to be the inner boundary of the myocardial muscle.
  • the volumes of the papillary and trabeculae muscles are included in the blood pool volume, leading to a small overestimate of the blood pool volume.
  • FIG. 40 is an image 4000 that shows the relevant parts of the cardiac anatomy, including the myocardial muscle 4002 surrounding the left ventricle.
  • the blood pool 4004 of the left ventricle is also shown.
  • the contour 4006 of the epicardium i.e., the outer surface of the heart
  • the epicardium contour 4006 defines the outer boundary of the left ventricle's myocardial muscle.
  • the contour 4008 of the endocardium i.e., the surface separating the left ventricle's blood pool from the myocardial muscle
  • the endocardium contour defines the inner boundary of the left ventricle myocardial muscle.
  • Note that in FIG. 40, the endocardium contour 4008 includes papillary and trabeculae muscles 4010 in the interior. It would also be valid to exclude the papillary and trabeculae muscles 4010 from the interior of the endocardium contour 4008 .
  • FIG. 41 is an image 4100 that shows the case where the papillary and trabeculae muscles 4110 are included in the interior of the endocardium contour 4108 .
  • the myocardial muscle 4102 surrounding the left ventricle is also shown.
  • the measured volume of the blood pool will be a slight overestimate, since the volume also includes the papillary and trabeculae muscles 4110 .
  • FIG. 42 is an image 4200 that shows an alternate case where the papillary and trabeculae muscles 4210 are excluded from the endocardium contour 4208 .
  • the myocardial muscle 4202 surrounding the left ventricle and the blood pool 4204 of the left ventricle are also shown.
  • the estimate of the volume of the blood pool will be more accurate, but the contour 4208 is significantly more tortuous and, if drawn manually, more cumbersome to delineate.
  • SSFP Steady state free precession
  • Perfusion imaging using gadolinium-based contrast is used to identify biomarkers of coronary stenosis.
  • Late gadolinium enhancement imaging is used to assess myocardial infarction. In all of these imaging protocols, and in others, the anatomical orientations and the need for contouring tend to be similar.
  • Images are typically acquired both in short axis orientations, in which the imaging plane is parallel to the short axis of the left ventricle, and in long axis orientations, in which the imaging plane is parallel to the long axis of the left ventricle. Contours delineating the myocardial muscle and the blood pool are used in all three imaging protocols to assess different components of the cardiac function and anatomy.
  • CNN convolutional neural network
  • FIG. 43 shows one implementation of a process 4300 for automatically delineating papillary and trabeculae muscles from the ventricular blood pool.
  • cardiac MRI image data 4302 and initial contours 4304 delineating the inner and outer boundary of the myocardium are available.
  • the papillary and trabeculae muscles are on the interior of the initial left ventricle endocardium contour; i.e., they are included within the blood pool and excluded from the myocardium.
  • masks defining the myocardium and the blood pool are calculated at 4306 .
  • masks defining the myocardium and blood pool are available at the beginning of the process 4300 and do not need to be calculated from the initial contours 4304 .
  • An intensity threshold that will be used to delineate the blood pool from the papillary and trabeculae muscles is then calculated at 4308 . At least one implementation of the intensity threshold calculation is described below with reference to a method 4400 shown in FIG. 44 .
  • the intensity threshold is then applied to the pixels within the blood pool mask at 4310 .
  • Those pixels include the blood pool and the papillary and trabeculae muscles.
  • pixels of high signal intensity are assigned to the blood pool class and pixels of low signal intensity are assigned to the papillary and trabeculae muscle class.
  • a connected component analysis is used to determine the largest connected component of pixels of the blood pool class. Pixels that are part of the blood pool class (due to their high signal intensity) but are not part of the largest connected component of blood pool pixels are then assumed to be holes in the papillary and trabeculae muscles and are converted to the papillary and trabeculae muscles class.
  • the resulting boundaries separating the papillary and trabeculae muscles from the blood pool and myocardium are then calculated at 4314 and stored or displayed to the user.
  • the pixels that are determined to be part of the blood pool are summed to determine the net volume of the ventricular blood pool. That volume may then be stored, displayed to the user, or used in a subsequent calculation, such as cardiac ejection fraction.
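The thresholding and connected-component steps of process 4300 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation of this disclosure: images and masks are nested lists, "high intensity" is taken as `>= threshold`, components use 4-connectivity, and the function name `split_blood_pool` is hypothetical.

```python
from collections import deque


def split_blood_pool(image, blood_pool_mask, threshold):
    """Return (blood, muscle) boolean masks for pixels inside blood_pool_mask."""
    rows, cols = len(image), len(image[0])
    # Bright pixels inside the mask start out in the blood pool class.
    bright = [[bool(blood_pool_mask[r][c]) and image[r][c] >= threshold
               for c in range(cols)] for r in range(rows)]

    # Find the largest connected component of bright pixels (assumed 4-connected).
    seen = [[False] * cols for _ in range(rows)]
    largest = []
    for r in range(rows):
        for c in range(cols):
            if not bright[r][c] or seen[r][c]:
                continue
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bright[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(largest):
                largest = component

    keep = set(largest)
    blood = [[(r, c) in keep for c in range(cols)] for r in range(rows)]
    # Everything else inside the mask, including bright "holes", becomes muscle.
    muscle = [[bool(blood_pool_mask[r][c]) and not blood[r][c]
               for c in range(cols)] for r in range(rows)]
    return blood, muscle


# Toy example: a bright ring of blood around dark muscle with one bright hole.
image = [[9, 9, 9, 9, 9],
         [9, 1, 1, 1, 9],
         [9, 1, 9, 1, 9],
         [9, 1, 1, 1, 9],
         [9, 9, 9, 9, 9]]
mask = [[1] * 5 for _ in range(5)]
blood, muscle = split_blood_pool(image, mask, threshold=5)
blood_pixel_count = sum(sum(row) for row in blood)
```

Multiplying `blood_pixel_count` by the pixel spacing and slice spacing would give the blood pool volume contribution of this slice.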
  • FIG. 44 shows one example implementation of a process 4400 for calculating an image intensity threshold. It should be appreciated that other methods may be used to calculate the image intensity threshold.
  • cardiac MRI image data 4402 and masks 4404 representing the myocardial muscle and the blood pool are available.
  • the blood pool mask includes both the blood pool and the papillary and trabeculae muscles.
  • the masks may have been derived from contours delineating the myocardial muscle, or they may have been derived via some other method.
  • the papillary and trabeculae muscles are contained within the blood pool mask (see FIG. 41 ).
  • Pixel intensity distributions of the myocardium and blood pool are calculated at 4406 .
  • a kernel density estimate of the pixel intensities may be calculated at 4408 . If the data is approximately normally distributed, Silverman's rule of thumb can be used to determine the kernel bandwidth in the density estimate. See, e.g., Silverman, Bernard W. Density estimation for statistics and data analysis. Vol. 26. CRC press, 1986. Other bandwidths may alternatively be used based on the distribution of the data.
  • the pixel intensity at which the density estimates overlap, i.e., where the probability that a given pixel intensity was drawn from the myocardium pixel intensity distribution is equal to the probability that the pixel was drawn from the blood pool distribution, is then computed at 4410 .
  • This pixel intensity may be chosen as the intensity threshold that separates the blood pool pixels from the papillary and trabeculae muscle pixels.
  • FIG. 45 is a graph 4500 that shows the distribution overlap 4410 of pixel intensity distribution between the blood pool and the myocardium. Shown are example distributions of the pixel intensities in the myocardium 4502 and the pixel intensities in the blood pool 4504 .
  • the y-axis represents the probability distribution function and the x-axis, in arbitrary units, represents pixel intensity.
  • the threshold used to separate the blood pool from the papillary and trabeculae muscles is the location 4506 of overlap between the two distributions 4502 and 4504 .
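The threshold calculation of process 4400 can be sketched with a Gaussian kernel density estimate for each tissue class and a search for the intensity at which the two estimates are equal. The sketch below makes several assumptions that are not stated in the source: a Gaussian kernel, the normal-reference form of Silverman's rule (1.06 · σ · n^(−1/5)), a myocardium that is darker on average than the blood pool, and a single crossing between the two class means, which is located by bisection. All function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Normal-reference rule of thumb: 1.06 * sigma * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


def overlap_threshold(myocardium, blood_pool, iterations=60):
    """Intensity at which the two density estimates are equal."""
    f_myo = gaussian_kde(myocardium, silverman_bandwidth(myocardium))
    f_blood = gaussian_kde(blood_pool, silverman_bandwidth(blood_pool))
    lo, hi = statistics.mean(myocardium), statistics.mean(blood_pool)
    if not (lo < hi and f_myo(lo) > f_blood(lo) and f_blood(hi) > f_myo(hi)):
        raise ValueError("expected darker myocardium and separable distributions")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f_myo(mid) > f_blood(mid):
            lo = mid   # still on the myocardium-dominated side
        else:
            hi = mid   # already on the blood-dominated side
    return 0.5 * (lo + hi)


# Made-up intensities in arbitrary units.
myocardium_pixels = [40, 45, 50, 55, 60]
blood_pool_pixels = [140, 145, 150, 155, 160]
threshold = overlap_threshold(myocardium_pixels, blood_pool_pixels)
```

Because the two toy samples have the same shape and are shifted by 100 units, the estimates cross midway between the class means.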
  • FIG. 46 shows one implementation for a process 4600 that uses a pre-trained CNN model to identify and localize myocardial properties.
  • cardiac imaging data 4602 and a pre-trained CNN model 4604 are available.
  • cardiac imaging data is a short-axis magnetic resonance (MR) acquisition, but other imaging planes (e.g., the long axis) and other imaging modalities (e.g., computed tomography or ultrasound) would work similarly.
  • the trained CNN model 4604 has been trained on data that is of the same type as the cardiac image data 4602 (e.g., the same imaging modality, same contrast injection protocol, and, if applicable, same MR pulse sequence).
  • the trained CNN model 4604 has been trained on data of a different type than the cardiac image data 4602 .
  • the data on which the CNN model 4604 has been trained is data from functional cardiac magnetic resonance imaging (e.g., via a contrast-free SSFP imaging sequence) and the cardiac image data is data from a cardiac perfusion or myocardial delayed enhancement study.
  • the CNN model 4604 will have been trained on data of a different type than the cardiac image data 4602 and then fine tuned (i.e., by re-training some or all of the layers while potentially holding some weights fixed) on data of the same type as the cardiac image data 4602 .
  • the trained CNN model 4604 is used to infer inner and outer myocardial contours at 4606 .
  • the CNN model 4604 first generates one or more probability maps, which are then converted to contours.
  • the contours are postprocessed at 4608 to minimize the probability that tissue that is not part of the myocardium is included within the region delineated as the myocardium.
  • This postprocessing may take on many forms. For example, postprocessing may include applying morphological operations, such as morphological erosion, to the region of the heart identified as myocardium to reduce its area.
  • Postprocessing may additionally or alternatively include modifying the threshold that is applied to the probability map output of the trained CNN model such that the region identified as myocardium is limited to CNN output for which the probability map indicates high confidence that the region is myocardium.
  • Postprocessing may additionally or alternatively include shifting vertices of contours that delineate the myocardium towards or away from the center of the ventricle of the heart to reduce the identified area of myocardium, or any combination of the above processes.
  • the postprocessing is applied to masks delineating the cardiac regions as opposed to contours.
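Two of the postprocessing options described above, raising the probability threshold and morphological erosion of a mask, can be sketched as follows. This is an illustration under assumptions, not the implementation of this disclosure: masks are nested lists, the structuring element is a 3×3 cross, pixels outside the image count as background, and the function names are hypothetical.

```python
def threshold_probability_map(prob_map, threshold=0.5):
    """Keep only pixels whose myocardium probability is at least `threshold`."""
    return [[p >= threshold for p in row] for row in prob_map]


def erode(mask):
    """Binary erosion with a 3x3 cross; outside the image counts as background."""
    rows, cols = len(mask), len(mask[0])

    def on(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    return [
        [on(r, c) and on(r - 1, c) and on(r + 1, c)
         and on(r, c - 1) and on(r, c + 1)
         for c in range(cols)]
        for r in range(rows)
    ]


# Toy probability map: a confident core surrounded by uncertain pixels.
prob_map = [[0.1, 0.2, 0.2, 0.2, 0.1],
            [0.2, 0.6, 0.7, 0.6, 0.2],
            [0.2, 0.7, 0.95, 0.7, 0.2],
            [0.2, 0.6, 0.7, 0.6, 0.2],
            [0.1, 0.2, 0.2, 0.2, 0.1]]

default_mask = threshold_probability_map(prob_map, 0.5)   # 3x3 block
strict_mask = threshold_probability_map(prob_map, 0.9)    # centre pixel only
eroded_mask = erode(default_mask)                         # centre pixel only
```

Both options shrink the region labeled as myocardium, which trades sensitivity for a lower chance of including neighboring tissue.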
  • the ventricular insertion points at which the right ventricular wall attaches to the left ventricle are determined at 4610 .
  • the insertion points are designated manually by users of the software. In other implementations, the insertion points are calculated automatically.
  • FIG. 47 is an image 4700 that shows the ventricular insertion points.
  • the left ventricle epicardium contour 4702 and the right ventricle contour 4704 are shown.
  • the right ventricle contour may be the right ventricle endocardium contour or the right ventricle epicardium contour.
  • the inferior insertion point 4706 and the anterior insertion point 4708 are indicated.
  • the automated system for identifying the insertion points e.g., act 4610 of FIG. 46
  • the insertion point locations 4706 and 4708 are defined as the locations where the distances between the two contours diverge.
  • the locations of the insertion points 4706 and 4708 are defined as the points of intersection between either the left ventricle epicardium contour 4702 or one of the right ventricle contours 4704 with the planes that define the left heart 2-chamber view (left ventricle and left atrium) and the left heart 3-chamber view (left ventricle, left atrium and aorta).
  • myocardial regions are localized and quantified at 4612 . Any potential pathologies (such as infarction) or characteristics (such as perfusion characteristics) of the myocardium may be quantified.
  • act 4612 of FIG. 46 may be performed at any phase of this process and in at least some implementations may precede the determination of one or more of contours (e.g., act 4606 ) and insertion points (e.g., act 4610 ).
  • regions of interest are manually detected and delineated by a user.
  • regions of interest are detected, delineated or both by an automated system, such as a CNN or other image processing technique, such as, but not limited to, any of the image processing techniques discussed in Karim, Rashed, et al. “Evaluation of current algorithms for segmentation of scar tissue from late gadolinium enhancement cardiovascular magnetic resonance of the left atrium: an open-access grand challenge.” Journal of Cardiovascular Magnetic Resonance 15.1 (2013): 105. Quantification of defects, such as relative hyper- or hypo-intensity, or biologically inferred quantities, such as absolute myocardial perfusion, is then performed. See, e.g., [Christian 2004] Christian, Timothy F., et al.
  • the myocardial contours and insertion points are used to segment the myocardium into a standard format at 4614 , such as a 17-segment model. See, e.g., Cerqueira, Manuel D., et al. “Standardized myocardial segmentation and nomenclature for tomographic imaging of the heart.” Circulation 105.4 (2002): 539-542.
  • the defects or myocardial properties are then localized using the standard format.
  • the resulting characteristics are displayed on a display 4616 to the user.
  • signal bearing media include, but are not limited to, the following: recordable type media such as floppy disks, hard disk drives, CD ROMs, digital tape, and computer memory.

Abstract

Systems and methods for automated segmentation of anatomical structures (e.g., heart). Convolutional neural networks (CNNs) may be employed to autonomously segment parts of an anatomical structure represented by image data, such as 3D MRI data. The CNN utilizes two paths, a contracting path and an expanding path. In at least some implementations, the expanding path includes fewer convolution operations than the contracting path. Systems and methods also autonomously calculate an image intensity threshold that differentiates blood from papillary and trabeculae muscles in the interior of an endocardium contour, and autonomously apply the image intensity threshold to define a contour or mask that describes the boundary of the papillary and trabeculae muscles. Systems and methods also calculate contours or masks delineating the endocardium and epicardium using the trained CNN model, and anatomically localize pathologies or functional characteristics of the myocardial muscle using the calculated contours or masks.

Description

    BACKGROUND
  • Technical Field
  • The present disclosure generally relates to automated segmentation of anatomical structures.
  • Description of the Related Art
  • Magnetic Resonance Imaging (MRI) is often used in cardiac imaging to assess patients with known or suspected cardiac pathologies. In particular, cardiac MRI may be used to quantify metrics related to heart failure and similar pathologies through its ability to accurately capture high-resolution cine images of the heart. These high-resolution images allow the volumes of relevant anatomical regions of the heart (such as the ventricles and muscle) to be measured, either manually, or with the help of semi- or fully-automated software.
  • A cardiac MRI cine sequence consists of one or more spatial slices, each of which contains multiple time points (e.g., 20 time points) throughout a full cardiac cycle. Typically, some subset of the following views are captured as separate series: the short axis (SAX) view, which consists of a series of slices along the long axis of the left ventricle, each slice being in the plane of the short axis of the left ventricle, which is orthogonal to the ventricle's long axis; the 2-chamber (2CH) view, a long axis (LAX) view that shows either the left ventricle and left atrium or the right ventricle and right atrium; the 3-chamber (3CH) view, an LAX view that shows either the left ventricle, left atrium and aorta, or the right ventricle, right atrium and aorta; and the 4-chamber (4CH) view, an LAX view that shows the left ventricle, left atrium, right ventricle and right atrium.
  • Depending on the type of acquisition, these views may be captured directly in the scanner (e.g., steady-state free precession (SSFP) MRI) or may be created via multi-planar reconstructions (MPRs) of a volume aligned in a different orientation (such as the axial, sagittal or coronal planes, e.g., 4D Flow MRI). The SAX view has multiple spatial slices, usually covering the entire volume of the heart, but the 2CH, 3CH and 4CH views often only have a single spatial slice. All series are cine, and have multiple time points encompassing a complete cardiac cycle.
  • Several important measurements of cardiac function depend on accurate measurements of ventricular volume. For example, ejection fraction (EF) represents the fraction of blood in the left ventricle (LV) that is pumped out with every heartbeat. Abnormally low EF readings are often associated with heart failure. Measurement of EF depends on the ventricular blood pool volume both at the end systolic phase, when the LV is maximally contracted, and at the end diastolic phase, when the LV is maximally dilated.
  • In order to measure the volume of the LV, the ventricle is typically segmented in the SAX view. The radiologist reviewing the case will first determine the end systole (ES) and end diastole (ED) time points by manually cycling through time points for a single slice and determining the time points at which the ventricle is maximally contracted or dilated, respectively. After determining those two time points, the radiologist will draw contours around the LV in all slices of the SAX series where the ventricle is visible.
  • Once the contours are created, the area of the ventricle in each slice may be calculated by summing the pixels within the contour and multiplying by the in-plane pixel spacing (e.g., in mm per pixel) in the x and y directions. The total ventricular volume can then be determined by summing the areas in each spatial slice and multiplying by the distance between slices (e.g., in millimeters (mm)). This yields a volume in cubic mm. Other methods of integrating over the slice areas to determine the total volume may also be used, such as variants of Simpson's rule, which, instead of approximating the discrete integral using straight line segments, does so using quadratic segments. Volumes are typically calculated at ES and ED, and ejection fraction and similar metrics may be determined from the volumes.
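The slice-summation volume calculation described above, and the ejection fraction derived from it, can be sketched as follows. The sketch assumes each contour has already been rasterized to a binary mask (a nested list), uses the standard ejection fraction formula (EDV − ESV) / EDV, and uses hypothetical function names and made-up spacing values; it does not implement the Simpson's rule variants.

```python
def slice_area_mm2(mask, spacing_x_mm, spacing_y_mm):
    """Pixel count inside the contour times the in-plane spacing in x and y."""
    n_pixels = sum(1 for row in mask for value in row if value)
    return n_pixels * spacing_x_mm * spacing_y_mm


def ventricular_volume_mm3(masks, spacing_x_mm, spacing_y_mm, slice_distance_mm):
    """Sum of slice areas multiplied by the distance between slices."""
    total_area = sum(slice_area_mm2(m, spacing_x_mm, spacing_y_mm) for m in masks)
    return total_area * slice_distance_mm


def ejection_fraction(ed_volume, es_volume):
    """Fraction of the end-diastolic volume ejected per beat."""
    return (ed_volume - es_volume) / ed_volume


# Two toy slices per phase; 1.5 mm pixels and 8 mm between slices (made up).
ed_masks = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
es_masks = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
edv = ventricular_volume_mm3(ed_masks, 1.5, 1.5, 8.0)
esv = ventricular_volume_mm3(es_masks, 1.5, 1.5, 8.0)
ef = ejection_fraction(edv, esv)
```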
  • To measure the LV blood pool volume, the radiologist typically creates contours along the LV endocardium (interior wall of the myocardial muscle) on about 10 spatial slices at each of two time points (ES and ED), for a total of about 20 contours. Although some semi-automated contour placement tools exist (e.g., using an active contours or “snakes” algorithm), these still typically require some manual adjustment of the contours, particularly with images that have noise or artifacts. The whole process of creating these contours may take 10 minutes or more, mostly involving manual adjustments. Example LV endocardium contours are shown as images 100 a-100 k in FIG. 1, which shows the contours at a single time point over a full SAX stack. From 100 a to 100 k, the slices proceed from the apex of the left ventricle to the base of the left ventricle.
  • Although the above description is specific to measurements of the LV blood pool (via contouring of the LV endocardium) the same volume measurements often need to be performed on the right ventricle (RV) blood pool for assessing functional pathology in the right ventricle. In addition, measurements are sometimes needed of the myocardial (heart muscle) mass, which requires contouring the epicardium (the outer surface of the myocardium). Each of these four contours (LV endocardium, LV epicardium, RV endocardium, RV epicardium) can take an experienced radiologist on the order of 10 minutes or more to create and correct, even using semi-automated tools. Creating all four contours can take 30 minutes or longer.
  • The most obvious consequence of the onerousness of this process is that reading cardiac MRI studies is expensive. Another important consequence is that contour-based measurements are generally not performed unless absolutely necessary, which limits the diagnostic information that can be extracted from each performed cardiac MRI study. Fully automated contour generation and volume measurement would clearly have a significant benefit, not only to radiologist throughput, but also to the quantity of diagnostic information that can be extracted from each study.
  • Limitations of Active Contours-Based Methods
  • The most basic method of creating ventricular contours is to complete the process manually with some sort of polygonal or spline drawing tool, without any automated algorithms or tools. In this case, the user may, for example, create a freehand drawing of the outline of the ventricle, or drop spline control points which are then connected with a smoothed spline contour. After initial creation of the contour, depending on the software's user interface, the user typically has some ability to modify the contour, e.g., by moving, adding or deleting control points or by moving the spline segments.
  • To reduce the onerousness of this process, most software packages that support ventricular segmentation include semi-automated segmentation tools. One algorithm for semi-automated ventricular segmentation is the “snakes” algorithm (known more formally as “active contours”). See Kass, M., Witkin, A., & Terzopoulos, D. (1988). “Snakes: Active contour models.” International Journal of Computer Vision, 1(4), 321-331. The snakes algorithm generates a deformable spline, which is constrained to wrap to intensity gradients in the image through an energy-minimization approach. Practically, this approach seeks to both constrain the contour to areas of high gradient in the image (edges) and also minimize “kinks” or areas of high orientation gradient (curvature) in the contour. The optimal result is a smooth contour that wraps tightly to the edges of the image. An example successful result from the snakes algorithm on the left ventricle endocardium in a 4D Flow cardiac study is shown in an image 200 of FIG. 2, which shows a contour 202 for the LV endocardium.
  • Although the snakes algorithm is common, and although modifying its resulting contours can be significantly faster than generating contours from scratch, the snakes algorithm has several significant disadvantages. In particular, the snakes algorithm requires a “seed.” The “seed contour” that will be improved by the algorithm must be either set by the user or by a heuristic. Moreover, the snakes algorithm knows only about local context. The cost function for snakes typically awards credit when the contour overlaps edges in the image; however, there is no way to inform the algorithm that the edge detected is the one desired; e.g., there is no explicit differentiation between the endocardium versus the border of other anatomical entities (e.g., the other ventricle, the lungs, the liver). Therefore, the algorithm is highly reliant on predictable anatomy and the seed being properly set. Further, the snakes algorithm is greedy. The energy function of snakes is often optimized using a greedy algorithm, such as gradient descent, which iteratively moves the free parameters in the direction of the gradient of the cost function. However, gradient descent, and many similar optimization algorithms, are susceptible to getting stuck in local minima of the cost function. This manifests as a contour that is potentially bound to the wrong edge in the image, such as an imaging artifact or an edge between the blood pool and a papillary muscle. Additionally, the snakes algorithm has a small representation space. The snakes algorithm generally has only a few dozen tunable parameters, and therefore does not have the capacity to represent a diverse set of possible images on which segmentation is desired. 
Many different factors can affect the perceived captured image of the ventricle, including anatomy (e.g., size, shape of ventricle, pathologies, prior surgeries, papillary muscles), imaging protocol (e.g., contrast agents, pulse sequence, scanner type, receiver coil quality and type, patient positioning, image resolution) and other factors (e.g., motion artifacts). Because of the great diversity of recorded images and the small number of tunable parameters, a snakes algorithm can only perform well on a small subset of “well-behaved” cases.
  • Despite these and other disadvantages, the snakes algorithm remains popular primarily because it can be deployed without any explicit “training,” which makes it relatively simple to implement. However, it cannot be adequately tuned to work on more challenging cases.
  • Challenges of Excluding Papillary Muscles from Blood Pool
  • Papillary muscles are muscles on the interior of the endocardium of both the left and right ventricles. Papillary muscles serve to keep the mitral and tricuspid valves closed when the pressure on the valves increases during ventricular contraction. FIG. 3 shows example SSFP MRI images 300 a (end diastole) and 300 b (end systole) which show the papillary muscles and myocardium of the left ventricle. Note that at end diastole (image 300 a), the primary challenge is in distinguishing the papillary muscles from the blood pool in which they are embedded, while at end systole (image 300 b), the primary challenge is in distinguishing the papillary muscles from the myocardium.
  • When performing a segmentation of the ventricular blood pool (either manual or automated), the papillary muscles may be either included within the contour or excluded from the contour. Note that the contour that surrounds the blood pool is often colloquially referred to as an “endocardium contour,” regardless of whether the papillary muscles are included within the contour or excluded from the contour. In the latter case, the term “endocardium” is not strictly accurate because the contour does not smoothly map to the true surface of the endocardium; despite this, the term “endocardium contour” is used for convenience.
  • Endocardium contours are typically created on every image in the SAX stack to measure the blood volume within the ventricle. The most accurate measure of blood volume will therefore be made if the papillary muscles are excluded from the endocardium contour. However, because the muscles are numerous and small, excluding them from a manual contour requires significantly more care to be taken when creating the contour, dramatically increasing the onerousness of the process. As a result, when creating manual contours, the papillary muscles are typically included within the endocardium contour, resulting in a modest overestimate of the ventricular blood volume. Technically, this measures the sum of the blood pool volume and the papillary muscle volume.
  • Automated or semi-automated utilities may speed up the process of excluding the papillary muscles from the endocardium contour, but they have significant caveats. The snakes algorithm (discussed above) is not appropriate for excluding the papillary muscles at end diastole because its canonical formulation only allows for contouring of a single connected region without holes. Although the algorithm may be adapted to handle holes within the contour, the algorithm would have to be significantly reformulated to handle both small and large connected regions simultaneously since the papillary muscles are so much smaller than the blood pool. In short, it is not possible for the canonical snakes algorithm to be used to segment the blood pool and exclude the papillary muscles at end diastole.
  • At end systole, when the majority of the papillary muscle mass abuts the myocardium, the snakes algorithm will by default exclude the majority of the papillary muscles from the endocardium contour and it cannot be made to include them (since there is little or no intensity boundary between the papillary muscles and the myocardium). Therefore, in the standard formulation, the snakes algorithm can only include the papillary muscles at end diastole and only exclude them at end systole, resulting in inconsistent measurements of blood pool volume over the course of a cardiac cycle. This is a major limitation of the snakes algorithm, preventing clinical use of its output without significant correction by the user.
  • An alternate semi-automated method of creating a blood pool contour is using a “flood fill” algorithm. Under the flood fill algorithm, the user selects an initial seed point, and all pixels that are connected to the seed point whose intensity gradients and distance from the seed point do not exceed a threshold are included within the selected mask. Although, like the snakes algorithm, flood fill requires the segmented region to be connected, flood fill carries the advantage that it allows for the connected region to have holes. Therefore, because papillary muscles can be distinguished from the blood pool based on their intensity, a flood fill algorithm can be formulated (either dynamically through user input, or in a hard-coded fashion) to exclude papillary muscles from the segmentation. Flood fill could also be used to include papillary muscles in the endocardium segmentation at end diastole; however, at end systole, because the bulk of the papillary muscles are connected to the myocardium (making the two regions nearly indistinguishable from one another), flood fill cannot be used to include the papillary muscles within the endocardium segmentation.
  • Beyond the inability to distinguish papillary muscles from myocardium at end systole, the major disadvantage of flood fill is that, though it may significantly reduce the effort required for the segmentation process when compared to a fully-manual segmentation, it still requires a great deal of user input to dynamically determine the flood fill gradient and distance thresholds. The applicant has found that, while accurate segmentations can be created using a flood fill tool, creating them with acceptable clinical precision still requires significant manual adjustment.
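  • The flood fill procedure discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular product's tool: the function name, the use of 4-connectivity, the interpretation of "intensity gradient" as the intensity step between neighboring pixels, and the use of Euclidean distance from the seed are all assumptions made for the example.

```python
from collections import deque
import math


def flood_fill(image, seed, max_step, max_distance):
    """Grow a mask from `seed` over 4-connected pixels.

    A neighbor is added when the intensity step from the current pixel does not
    exceed `max_step` and its distance from the seed does not exceed `max_distance`.
    """
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or mask[nr][nc]:
                continue
            step = abs(image[nr][nc] - image[r][c])
            dist = math.hypot(nr - seed[0], nc - seed[1])
            if step <= max_step and dist <= max_distance:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


# Hypothetical example: a bright blood pool (100) surrounding one dark
# papillary-muscle pixel (20), enclosed by dark myocardium (10).
example_image = [
    [10, 10, 10, 10, 10],
    [10, 100, 100, 100, 10],
    [10, 100, 20, 100, 10],
    [10, 100, 100, 100, 10],
    [10, 10, 10, 10, 10],
]
example_mask = flood_fill(example_image, seed=(1, 1), max_step=30, max_distance=10)
example_count = sum(v for row in example_mask for v in row)
```

  • In this toy example the filled region is connected but contains a hole at the dark central pixel, which is the property that lets flood fill exclude papillary muscles at end diastole. Both thresholds still have to be chosen by the user, as noted above.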
  • Challenges of Segmenting Basal Slices on the Short Axis View
  • Cardiac segmentations are typically created on images from a short axis or SAX stack. One major disadvantage of performing segmentations on the SAX stack is that the SAX plane is nearly parallel to the plane of the mitral and tricuspid valves. This has two effects. First, the valves are very difficult to distinguish on slices from the SAX stack. Second, assuming the SAX stack is not exactly parallel to the valve plane, there will be at least one slice near the base of the heart that is partially in the ventricle and partially in the atrium.
  • An example case where both the left ventricle and left atrium are visible in a single slice is shown in images 400 a and 400 b of FIG. 4. If the clinician fails to refer to the current SAX slice projected on a corresponding LAX view, it may not be obvious that the SAX slice spans both the ventricle and atrium. Further, even if the LAX view is available, it may be difficult to tell on the SAX slice where the valve is located, and therefore, where the segmentation of the ventricle should end, since the ventricle and atrium have similar signal intensities. Segmentation near the base of the heart is therefore one of the major sources of error for ventricular segmentation.
  • Landmarks
  • In the 4D Flow workflow of a cardiac imaging application, the user may be required to define the regions of different landmarks in the heart in order to see different cardiac views (e.g., 2CH, 3CH, 4CH, SAX) and segment the ventricles. The landmarks required to segment the LV and see 2CH, 3CH, and 4CH left heart views include LV apex, mitral valve, and aortic valve. The landmarks required to segment the RV and see the corresponding views include RV apex, tricuspid valve and pulmonary valve.
  • A pre-existing method to locate landmarks on 3D T1-weighted MRI is described in Payer, Christian, Darko Stern, Horst Bischof, and Martin Urschler. “Regressing Heatmaps for Multiple Landmark Localization using CNNs.” In Proc Medical Image Computing & Computer Assisted Intervention (MICCAI) 2016. Springer Verlag. The method developed in this paper is referred to herein as “LandMarkDetect.” LandMarkDetect is based on two notable components. First, a variation of the U-Net neural network is used, as discussed in Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation.” In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241. Springer International Publishing, 2015. Second, the landmark is encoded during training using a Gaussian function of arbitrarily chosen standard deviation. The LandMarkDetect neural network 500 of FIG. 5 differs from U-Net in the use of average pooling layers in place of max pooling layers.
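  • The Gaussian encoding of a landmark mentioned above can be sketched as follows. This is a minimal sketch, assuming an isotropic, unnormalized Gaussian evaluated on the voxel grid; the function name, the volume size, and the standard deviation are hypothetical and are not taken from the cited paper.

```python
import numpy as np


def gaussian_heatmap(shape, landmark, sigma):
    """Return a volume whose value decays as a Gaussian of distance to `landmark`."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    sq_dist = sum((g - p) ** 2 for g, p in zip(grids, landmark))
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Hypothetical example: an 8x8x8 volume with a landmark at voxel (2, 3, 4).
example_heatmap = gaussian_heatmap((8, 8, 8), (2, 3, 4), sigma=1.5)
example_peak = np.unravel_index(example_heatmap.argmax(), example_heatmap.shape)
```

  • At inference time, a predicted landmark location can be read back from a regressed heatmap by taking the position of its maximum, as `example_peak` does here.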
  • One limitation of LandMarkDetect is the lack of a method to handle missing landmarks. It is assumed that every single landmark has been precisely located on each image. Another limitation is the absence of a hyperparameter search except for the kernel and layer size. Yet another limitation is the fixed upsampling layer with no parameter to learn. Further, LandMarkDetect relies on a limited pre-processing strategy which consists of removing the mean (i.e., centering the input data) of the 3D image.
  • Accordingly, there is a need for systems and methods which address some or all of the above-discussed shortcomings.
  • BRIEF SUMMARY
  • A machine learning system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, the at least one processor: receives learning data including a plurality of batches of labeled image sets, each image set including image data representative of an anatomical structure, and each image set including at least one label which identifies the region of a particular part of the anatomical structure depicted in each image of the image set; trains a fully convolutional neural network (CNN) model to segment at least one part of the anatomical structure utilizing the received learning data; and stores the trained CNN model in the at least one nontransitory processor-readable storage medium of the machine learning system. The CNN model may include a contracting path and an expanding path, the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer, and the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and includes a transpose convolution operation which performs upsampling and interpolation with a learned kernel. Subsequent to each upsampling layer, the CNN model may include a concatenation of feature maps from a corresponding layer in the contracting path through a skip connection. The image data may be representative of a heart during one or more time points throughout a cardiac cycle. The image data may include ultrasound data or visible light photograph data. The CNN model may include a contracting path which may include a first convolutional layer which has between 1 and 2000 feature maps. 
The CNN model may include a number of convolutional layers, and each convolutional layer may include a convolutional kernel of size 3×3 and a stride of 1. The CNN model may include a number of pooling layers, and each pooling layer may include a 2×2 max-pooling layer with a stride of 2. The CNN model may include four pooling layers and four upsampling layers. The CNN model may include a number of convolutional layers, and the CNN model may pad the input to each convolutional layer using a zero padding operation. The CNN model may include a plurality of nonlinear activation function layers.
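  • To make the layer parameters above concrete, the following sketch traces the spatial size of the feature maps through a contracting path and an expanding path. It is illustrative only: it assumes one pixel of zero padding per side for each 3×3 convolution (so that the convolution preserves size), one convolution before each pooling layer, and transpose convolutions that exactly double the spatial size. None of these specifics is stated in the summary above, and the function names are hypothetical.

```python
def conv_output_size(size, kernel=3, stride=1, pad=1):
    """Standard convolution size formula; pad=1 keeps a 3x3, stride-1 conv size-preserving."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """A 2x2 max-pooling layer with a stride of 2 halves an even spatial size."""
    return (size - window) // stride + 1


def trace_sizes(input_size, n_pool=4):
    """List the spatial size at the input, after each pooling layer, and after each upsampling layer."""
    sizes = [input_size]
    size = input_size
    for _ in range(n_pool):  # contracting path
        size = pool_output_size(conv_output_size(size))
        sizes.append(size)
    for _ in range(n_pool):  # expanding path (assumed factor-2 upsampling)
        size = size * 2
        sizes.append(size)
    return sizes


example_sizes = trace_sizes(256)
```

  • Under these assumptions the size after each upsampling layer matches the size at the corresponding level of the contracting path, which is what allows the feature maps to be concatenated through skip connections.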
  • The at least one processor may augment the learning data via modification of at least some of the image data in the plurality of batches of labeled image sets.
  • The at least one processor may modify at least some of the image data in the plurality of batches of labeled image sets according to at least one of: a horizontal flip, a vertical flip, a shear amount, a shift amount, a zoom amount, a rotation amount, a brightness level, or a contrast level.
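  • A few of the listed modifications can be sketched as follows. This is a minimal sketch, assuming 2D images stored as NumPy arrays with a matching label mask; the function name and the parameter ranges are hypothetical. Geometric changes (flips) are applied identically to the image and its label so they stay aligned, while brightness and contrast changes are applied to the image only.

```python
import numpy as np


def augment(image, label, rng, max_brightness=0.1, max_contrast=0.2):
    """Randomly flip image and label together, then jitter image brightness and contrast."""
    if rng.random() < 0.5:  # horizontal flip
        image, label = image[:, ::-1], label[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, label = image[::-1, :], label[::-1, :]
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    mean = image.mean()
    image = (image - mean) * contrast + mean + brightness
    return image, label


# Hypothetical example: a random 4x6 image with a small labeled region.
example_rng = np.random.default_rng(0)
example_image = example_rng.random((4, 6))
example_label = np.zeros((4, 6), dtype=np.uint8)
example_label[1:3, 2:4] = 1
aug_image, aug_label = augment(example_image, example_label, example_rng)
```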
  • The CNN model may include a plurality of hyperparameters stored in the at least one nontransitory processor-readable storage medium, and the at least one processor may configure the CNN model according to a plurality of configurations, each configuration including a different combination of values for the hyperparameters; for each of the plurality of configurations, validate the accuracy of the CNN model; and select at least one configuration based at least in part on the accuracies determined by the validations.
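  • The configure, validate, and select steps above can be sketched as an exhaustive search over a small grid. This is one possible search strategy, shown for illustration only; the hyperparameter names and the `toy_validate` function are hypothetical placeholders for training the CNN model with a given configuration and measuring its accuracy on validation data.

```python
from itertools import product


def enumerate_configurations(search_space):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


def select_best(search_space, validate):
    """Validate every configuration and return the most accurate one with its score."""
    scored = [(validate(cfg), cfg) for cfg in enumerate_configurations(search_space)]
    best_score, best_cfg = max(scored, key=lambda item: item[0])
    return best_cfg, best_score


def toy_validate(cfg):
    """Placeholder score; a real system would train and validate the model here."""
    return -abs(cfg["learning_rate"] - 1e-3) - abs(cfg["first_layer_feature_maps"] - 64) / 1000.0


example_space = {"learning_rate": [1e-2, 1e-3], "first_layer_feature_maps": [32, 64]}
example_best, example_score = select_best(example_space, toy_validate)
```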
  • The at least one processor may, for each image set, identify whether the image set is missing a label for any of a plurality of parts of the anatomical structure; and for image sets identified as missing at least one label, modify a training loss function to account for the identified missing labels. The image data may include volumetric images, and each label may include a volumetric label mask or contour. Each convolutional layer of the CNN model may include a convolutional kernel of size N×N×K pixels, where N and K are positive integers. Each convolutional layer of the CNN model may include a convolutional kernel of size N×M pixels, where N and M are positive integers. The image data may be representative of a heart during one or more time points throughout a cardiac cycle, wherein a subset of the plurality of batches of labeled image sets may include labels which exclude papillary muscles. For each processed image, the CNN model may utilize data for at least one image which may be at least one of: adjacent to the processed image with respect to space or adjacent to the processed image with respect to time. For each processed image, the CNN model may utilize data for at least one image which may be adjacent to the processed image with respect to space and may utilize data for at least one image which is adjacent to the processed image with respect to time. For each processed image, the CNN model may utilize at least one of temporal information or phase information. The image data may include at least one of steady-state free precession (SSFP) magnetic resonance imaging (MRI) data or 4D flow MRI data.
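  • The summary above does not specify how the training loss function is modified for missing labels. One possible approach, shown here purely as an assumption, is to compute a per-class cross-entropy and exclude from the average any class whose label is missing for the image set. The function name and array layout are hypothetical.

```python
import numpy as np


def masked_cross_entropy(probs, targets, label_present, eps=1e-7):
    """Per-class binary cross-entropy averaged only over classes that have a label.

    probs, targets: arrays of shape (n_classes, H, W)
    label_present:  boolean array of shape (n_classes,)
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    per_pixel = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    per_class = per_pixel.mean(axis=(1, 2))
    weights = label_present.astype(float)
    if weights.sum() == 0:
        return 0.0  # nothing labeled: this image set contributes no loss
    return float((per_class * weights).sum() / weights.sum())


# Hypothetical example: two classes, the second of which has no label.
example_targets = np.zeros((2, 2, 2))
example_present = np.array([True, False])
probs_a = np.full((2, 2, 2), 0.5)
probs_b = probs_a.copy()
probs_b[1] = 0.9  # changes only the unlabeled class
loss_a = masked_cross_entropy(probs_a, example_targets, example_present)
loss_b = masked_cross_entropy(probs_b, example_targets, example_present)
```

  • Because the unlabeled class carries zero weight, predictions for that class do not affect the loss, so the model is not penalized for segmenting a structure that the annotator simply did not contour.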
  • A method of operating a machine learning system may include at least one nontransitory processor-readable storage medium that may store at least one of processor-executable instructions or data, and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, and may be summarized as including receiving, by the at least one processor, learning data including a plurality of batches of labeled image sets, each image set including image data representative of an anatomical structure, and each image set including at least one label which identifies the region of a particular part of the anatomical structure depicted in each image of the image set; training, by the at least one processor, a fully convolutional neural network (CNN) model to segment at least one part of the anatomical structure utilizing the received learning data; and storing, by the at least one processor, the trained CNN model in the at least one nontransitory processor-readable storage medium of the machine learning system. Training the CNN model may include training a CNN model including a contracting path and an expanding path, the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer, and the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and may include a transpose convolution operation which performs upsampling and interpolation with a learned kernel. Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data and, subsequent to each upsampling layer, the CNN model may include a concatenation of feature maps from a corresponding layer in the contracting path through a skip connection. 
Receiving learning data may include receiving image data that may be representative of a heart during one or more time points throughout a cardiac cycle. Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and the CNN model may include a contracting path which may include a first convolutional layer which has between 1 and 2000 feature maps. Training a CNN model may include training a CNN model which may include a plurality of convolutional layers to segment at least one part of the anatomical structure utilizing the received learning data, and each convolutional layer may include a convolutional kernel of size 3×3 and a stride of 1. Training a CNN model may include training a CNN model which may include a plurality of pooling layers to segment at least one part of the anatomical structure utilizing the received learning data, and each pooling layer may include a 2×2 max-pooling layer with a stride of 2.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and the CNN model may include four pooling layers and four upsampling layers.
  • Training a CNN model may include training a CNN model which may include a plurality of convolutional layers to segment at least one part of the anatomical structure utilizing the received learning data, and the CNN model may pad the input to each convolutional layer using a zero padding operation.
  • Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and the CNN model may include a plurality of nonlinear activation function layers.
  • The method may further include augmenting, by the at least one processor, the learning data via modification of at least some of the image data in the plurality of batches of labeled image sets.
  • The method may further include modifying, by the at least one processor, at least some of the image data in the plurality of batches of labeled image sets according to at least one of: a horizontal flip, a vertical flip, a shear amount, a shift amount, a zoom amount, a rotation amount, a brightness level, or a contrast level.
  • The CNN model may include a plurality of hyperparameters stored in the at least one nontransitory processor-readable storage medium, and may further include configuring, by the at least one processor, the CNN model according to a plurality of configurations, each configuration comprising a different combination of values for the hyperparameters; for each of the plurality of configurations, validating, by the at least one processor, the accuracy of the CNN model; and selecting, by the at least one processor, at least one configuration based at least in part on the accuracies determined by the validations.
  • The method may further include, for each image set, identifying, by the at least one processor, whether the image set is missing a label for any of a plurality of parts of the anatomical structure; and for image sets identified as missing at least one label, modifying, by the at least one processor, a training loss function to account for the identified missing labels. Receiving learning data may include receiving image data which may include volumetric images, and each label may include a volumetric label mask or contour.
  • Training a CNN model may include training a CNN model which may include a plurality of convolutional layers to segment at least one part of the anatomical structure utilizing the received learning data, and each convolutional layer of the CNN model may include a convolutional kernel of size N×N×K pixels, where N and K are positive integers.
  • Training a CNN model may include training a CNN model which may include a plurality of convolutional layers to segment at least one part of the anatomical structure utilizing the received learning data, and each convolutional layer of the CNN model may include a convolutional kernel of size N×M pixels, where N and M are positive integers. Receiving learning data may include receiving image data representative of a heart during one or more time points throughout a cardiac cycle, and wherein a subset of the plurality of batches of labeled image sets may include labels which exclude papillary muscles. Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and for each processed image, the CNN model may utilize data for at least one image which is at least one of: adjacent to the processed image with respect to space or adjacent to the processed image with respect to time. Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and for each processed image, the CNN model may utilize data for at least one image which is adjacent to the processed image with respect to space and utilizes data for at least one image which is adjacent to the processed image with respect to time. Training a CNN model may include training a CNN model to segment at least one part of the anatomical structure utilizing the received learning data, and for each processed image, the CNN model may utilize at least one of temporal information or phase information. Receiving learning data may include receiving image data which may include at least one of steady-state free precession (SSFP) magnetic resonance imaging (MRI) data or 4D flow MRI data.
  • A machine learning system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, the at least one processor: receives image data which represents an anatomical structure; processes the received image data through a fully convolutional neural network (CNN) model to generate per-class probabilities for each pixel of each image of the image data, each class corresponding to one of a plurality of parts of the anatomical structure represented by the image data; and for each image of the image data, generates a probability map for each of the plurality of classes using the generated per-class probabilities; and stores the generated probability maps in the at least one nontransitory processor-readable storage medium. The CNN model may include a contracting path and an expanding path, the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer, and the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and may include a transpose convolution operation which performs upsampling and interpolation with a learned kernel. The image data may be representative of a heart during one or more time points throughout a cardiac cycle.
  • The at least one processor may autonomously cause an indication of at least one of the plurality of parts of the anatomical structure to be displayed on a display based at least in part on the generated probability maps. The at least one processor may post-process the processed image data to ensure at least one physical constraint is met. The image data may be representative of a heart during one or more time points throughout a cardiac cycle, and the at least one physical constraint may include at least one of: the volume of the myocardium is the same at all time points, or the right ventricle and the left ventricle cannot overlap each other. The at least one processor may, for each image of the image data, transform the plurality of probability maps into a label mask by setting the class of each pixel to the class with the highest probability. The at least one processor may, for each image of the image data, set the class of each pixel to a background class when all of the class probabilities for the pixel are below a determined threshold. The at least one processor may, for each image of the image data, set the class of each pixel to a background class when the pixel is not part of a largest connected region for the class to which the pixel is associated. The at least one processor may convert each of the label masks for the image data into respective spline contours. The at least one processor may autonomously cause the generated contours to be displayed with the image data on a display. The at least one processor may receive a user modification of at least one of the displayed contours; and store the modified contour in the at least one nontransitory processor-readable storage medium. The at least one processor may determine the volume of at least one of the plurality of parts of the anatomical structure utilizing the generated contours. 
The anatomical structure may include a heart, and the at least one processor may determine the volume of at least one of the plurality of parts of the heart at a plurality of time points of a cardiac cycle utilizing the generated contours. The at least one processor may automatically determine which of the plurality of time points of the cardiac cycle correspond to an end systole phase and an end diastole phase of the cardiac cycle based on the time points determined to have a minimum volume and a maximum volume, respectively. The at least one processor may cause the determined volume of the at least one of the plurality of parts of the anatomical structure to be displayed on a display. The image data may include volumetric images. Each convolutional layer of the CNN model may include a convolutional kernel of size N×N×K pixels, where N and K are positive integers.
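  • The automatic identification of the end systole and end diastole time points from per-time-point volumes can be sketched as follows. The function name and the example volumes are hypothetical.

```python
def find_es_ed(volumes):
    """Return (es_index, ed_index): the time points of minimum and maximum volume."""
    indices = range(len(volumes))
    es_index = min(indices, key=volumes.__getitem__)
    ed_index = max(indices, key=volumes.__getitem__)
    return es_index, ed_index


# Hypothetical ventricular volumes (mL) at eight time points of one cardiac cycle.
example_volumes = [140.0, 120.0, 90.0, 60.0, 55.0, 80.0, 110.0, 135.0]
example_es, example_ed = find_es_ed(example_volumes)
```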
  • A method of operating a machine learning system may include at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data, and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, and may be summarized as including receiving, by the at least one processor, image data which represents an anatomical structure; processing, by the at least one processor, the received image data through a fully convolutional neural network (CNN) model to generate per-class probabilities for each pixel of each image of the image data, each class corresponding to one of a plurality of parts of the anatomical structure represented by the image data; and for each image of the image data, generating, by the at least one processor, a probability map for each of the plurality of classes using the generated per-class probabilities; and storing, by the at least one processor, the generated probability maps in the at least one nontransitory processor-readable storage medium. Processing the received image data through the CNN model may include processing the received image data through a CNN model which may include a contracting path and an expanding path, the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer, and the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and may include a transpose convolution operation which performs upsampling and interpolation with a learned kernel. Receiving image data may include receiving image data that is representative of a heart during one or more time points throughout a cardiac cycle.
  • The method may further include autonomously causing, by the at least one processor, an indication of at least one of the plurality of parts of the anatomical structure to be displayed on a display based at least in part on the generated probability maps.
  • The method may further include post-processing, by the at least one processor, the processed image data to ensure at least one physical constraint is met. Receiving image data may include receiving image data that may be representative of a heart during one or more time points throughout a cardiac cycle, and the at least one physical constraint may include at least one of: the volume of the myocardium is the same at all time points, or the right ventricle and the left ventricle cannot overlap each other.
  • The method may further include for each image of the image data, transforming, by the at least one processor, the plurality of probability maps into a label mask by setting the class of each pixel to the class with the highest probability. The method may further include for each image of the image data, setting, by the at least one processor, the class of each pixel to a background class when all of the class probabilities for the pixel are below a determined threshold.
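  • The conversion from per-class probability maps to a label mask can be sketched as follows. This is a minimal illustration in plain Python, not the disclosed implementation; the function name `to_label_mask`, the default threshold of 0.5, and the use of 0 as the background class are assumptions made for the example.

```python
def to_label_mask(prob_maps, threshold=0.5, background=0):
    """Collapse per-class probability maps into one label mask.

    prob_maps: dict mapping a class id to a 2D list of probabilities.
    All maps are assumed to have the same shape (hypothetical layout).
    """
    classes = sorted(prob_maps)
    rows = len(prob_maps[classes[0]])
    cols = len(prob_maps[classes[0]][0])
    mask = [[background] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Class with the highest probability at this pixel.
            best = max(classes, key=lambda k: prob_maps[k][r][c])
            # If even the best class is below the threshold, all classes
            # are, so the pixel stays background.
            if prob_maps[best][r][c] >= threshold:
                mask[r][c] = best
    return mask
```

  • For example, with two classes whose maps are `{1: [[0.9, 0.2], [0.1, 0.4]], 2: [[0.05, 0.7], [0.2, 0.3]]}`, the sketch assigns classes 1 and 2 to the top row and background to the bottom row, where no class reaches the threshold.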
  • The method may further include for each image of the image data, setting, by the at least one processor, the class of each pixel to a background class when the pixel is not part of a largest connected region for the class to which the pixel is associated.
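  • One way to keep only the largest connected region of a class is a flood fill over the label mask, sketched below. The function name `keep_largest_region` and the choice of 4-connectivity are assumptions; the disclosure does not specify a connectivity rule.

```python
from collections import deque


def keep_largest_region(mask, cls, background=0):
    """Return a copy of mask in which every pixel of class `cls` that is
    outside the largest 4-connected region of that class is set to
    `background`."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != cls or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and mask[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    out = [row[:] for row in mask]
    if regions:
        keep = set(max(regions, key=len))
        for region in regions:
            for y, x in region:
                if (y, x) not in keep:
                    out[y][x] = background
    return out
```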
  • The method may further include converting, by the at least one processor, each of the label masks for the image data into respective spline contours.
  • The method may further include autonomously causing, by the at least one processor, the generated contours to be displayed with the image data on a display.
  • The method may further include receiving, by the at least one processor, a user modification of at least one of the displayed contours; and storing, by the at least one processor, the modified contour in the at least one nontransitory processor-readable storage medium.
  • The method may further include determining, by the at least one processor, the volume of at least one of the plurality of parts of the anatomical structure utilizing the generated contours.
  • The anatomical structure may include a heart, and the method may further include determining, by the at least one processor, the volume of at least one of the plurality of parts of the heart at a plurality of time points of a cardiac cycle utilizing the generated contours.
  • The method may further include automatically determining, by the at least one processor, which of the plurality of time points of the cardiac cycle correspond to an end systole phase and an end diastole phase of the cardiac cycle based on the time points determined to have a minimum volume and a maximum volume, respectively.
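  • The selection of the end systole and end diastole phases from per-time-point volumes can be sketched as follows. The function name `find_es_ed` is hypothetical, and the volumes would in practice come from the generated contours.

```python
def find_es_ed(volumes):
    """Return (es_index, ed_index) for a list of ventricular volumes,
    one entry per time point of the cardiac cycle.

    End systole is taken as the time point of minimum volume and
    end diastole as the time point of maximum volume.
    """
    indices = range(len(volumes))
    es_index = min(indices, key=volumes.__getitem__)
    ed_index = max(indices, key=volumes.__getitem__)
    return es_index, ed_index
```

  • For an illustrative (made-up) series of volumes `[120.0, 95.5, 60.2, 58.9, 80.1, 118.3]`, the sketch selects time point 3 as end systole and time point 0 as end diastole.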
  • The method may further include causing, by the at least one processor, the determined volume of the at least one of the plurality of parts of the anatomical structure to be displayed on a display. Receiving image data may include receiving volumetric image data. Processing the received image data through a CNN model may include processing the received image data through a CNN model in which each convolutional layer may include a convolutional kernel of sizes N×N×K pixels, where N and K are positive integers.
  • A machine learning system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, the at least one processor: receives a plurality of sets of 3D MRI images, the images in each of the plurality of sets represent an anatomical structure of a patient; receives a plurality of annotations for the plurality of sets of 3D MRI images, each annotation indicative of a landmark of an anatomical structure of a patient depicted in a corresponding image; trains a convolutional neural network (CNN) model to predict the locations of the plurality of landmarks utilizing the 3D MRI images; and stores the trained CNN model in the at least one nontransitory processor-readable storage medium of the machine learning system. The at least one processor may train a fully convolutional neural network (CNN) model to predict the locations of the plurality of landmarks utilizing the 3D MRI images. The at least one processor may train a CNN model which has an output which is one or more sets of spatial coordinates, each set of the one or more spatial coordinates identifying a location of one of the plurality of landmarks. The CNN model may include a contracting path followed by one or more fully connected layers. The CNN model may include a contracting path and an expanding path, the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by one or more convolutional layers, and the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by one or more convolutional layers and comprises a transpose convolution operation which performs upsampling and interpolation with a learned kernel.
  • The at least one processor may, for each of one or more landmarks of the anatomical structure, define a 3D label map based at least in part on the received sets of 3D MRI images and the received plurality of annotations, each 3D label map may encode a likelihood that the landmark is located at a particular location on the 3D label map, wherein the at least one processor may train the CNN model to segment the one or more landmarks utilizing the 3D MRI images and the generated 3D label maps. The images in each of the plurality of sets may represent a heart of a patient at different respective time points of a cardiac cycle, and each annotation may be indicative of a landmark of a heart of a patient depicted in a corresponding image.
  • The at least one processor may receive a set of 3D MRI images; process the received 3D MRI images through the CNN model to detect at least one of the one or more landmarks; and cause the detected at least one of the plurality of landmarks to be presented on a display. The at least one processor may process the received 3D MRI images through the CNN model and output at least one of: a point or a label map. The at least one processor may process the received 3D MRI images through the CNN model to detect at least one of the plurality of landmarks at a plurality of time points; and cause the detected at least one of the plurality of landmarks at a plurality of time points to be presented on a display. The CNN model may utilize phase information associated with the received 3D MRI images.
  • A method of operating a machine learning system may include at least one nontransitory processor-readable storage medium that may store at least one of processor-executable instructions or data, and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, and may be summarized as including receiving, by the at least one processor, a plurality of sets of 3D MRI images, the images in each of the plurality of sets represent an anatomical structure of a patient; receiving, by the at least one processor, a plurality of annotations for the plurality of sets of 3D MRI images, each annotation indicative of a landmark of an anatomical structure of a patient depicted in a corresponding image; training, by the at least one processor, a convolutional neural network (CNN) model to predict the locations of the plurality of landmarks utilizing the 3D MRI images; and storing, by the at least one processor, the trained CNN model in the at least one nontransitory processor-readable storage medium of the machine learning system. Training a CNN model may include training a fully convolutional neural network (CNN) model to predict the locations of the plurality of landmarks utilizing the 3D MRI images. Training a CNN model may include training a CNN model which has an output which is one or more sets of spatial coordinates, each set of the one or more spatial coordinates identifying a location of one of the plurality of landmarks. Training a CNN model may include training a CNN model which may include a contracting path followed by one or more fully connected layers. 
Training the CNN model may include training a CNN model which may include a contracting path and an expanding path, the contracting path may include a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer, and the expanding path may include a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer and includes a transpose convolution operation which performs upsampling and interpolation with a learned kernel.
  • The method may further include for each of a plurality of landmarks of the anatomical structure, defining, by the at least one processor, a 3D label map based at least in part on the received sets of 3D MRI images and the received plurality of annotations, each 3D label map encodes a likelihood that the landmark is located at a particular location on the 3D label map. Receiving a plurality of sets of 3D MRI images may include receiving a plurality of sets of 3D MRI images, and the images in each of the plurality of sets may represent a heart of a patient at different respective time points of a cardiac cycle, and each annotation may be indicative of a landmark of a heart of a patient depicted in a corresponding image.
  • The method may further include receiving, by the at least one processor, a set of 3D MRI images; processing, by the at least one processor, the received 3D MRI images through the CNN model to detect at least one of the plurality of landmarks; and causing, by the at least one processor, the detected at least one of the plurality of landmarks to be presented on a display.
  • The method may further include processing, by the at least one processor, the received 3D MRI images through the CNN model to detect at least one of the plurality of landmarks at a plurality of time points; and causing, by the at least one processor, the detected at least one of the plurality of landmarks at a plurality of time points to be presented on a display.
  • Training a CNN model may include training a CNN model which may utilize phase information associated with the received 3D MRI images.
  • A medical image processing system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data, cardiac MRI image data, and initial contours or masks that delineate the endocardium and epicardium of the heart; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, in operation, the at least one processor: accesses the cardiac MRI image data and initial contours or masks for a series; autonomously calculates an image intensity threshold that differentiates blood from papillary and trabeculae muscles in the interior of the endocardium contour; and autonomously applies the image intensity threshold to define a contour or mask that describes the boundary of the papillary and trabeculae muscles. To calculate the image intensity threshold, the at least one processor may compare a distribution of intensity values within the endocardium contour to a distribution of intensity values for a region between the endocardium contour and the epicardium contour. The at least one processor may calculate each of the distributions of intensity values using a kernel density estimation of an empirical intensity distribution. The at least one processor may determine the image intensity threshold to be the pixel intensity at the intersection of first and second probability distribution functions, the first probability distribution function being for the set of pixels within the endocardium contour, and the second probability distribution function being for the set of pixels in the region between the endocardium contour and the epicardium contour. The initial contours or masks that delineate the endocardium of the heart may include the papillary and trabeculae muscles in the interior of the endocardium contour. 
The at least one processor may calculate connected components of the blood pool region and discard one or more of the calculated connected components from the blood pool region. The at least one processor may convert the connected components discarded from the blood pool region into the papillary and trabeculae muscle region. The at least one processor may discard from the blood pool region all but the largest connected component in the blood pool region. The at least one processor may allow the calculated contour or mask that describes the boundary of the papillary and trabeculae muscles to be edited by a user.
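  • The intensity threshold calculation summarized above can be sketched with a Gaussian kernel density estimate for each pixel population and a scan for the point where the two estimated densities cross. This is an illustration under stated assumptions, not the disclosed implementation: the function names, the fixed bandwidth of 5.0 intensity units, the scan resolution, and the assumption that blood is brighter on average than myocardium are all choices made for the example.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density
    estimate of `samples` at a given intensity."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf


def intersection_threshold(inside_endo, myocardium, bandwidth=5.0, steps=1000):
    """Scan from the mean myocardium intensity to the mean intensity
    inside the endocardium contour and return the intensity at which the
    two density estimates cross, or None if no crossing is found."""
    pdf_inside = gaussian_kde(inside_endo, bandwidth)
    pdf_myo = gaussian_kde(myocardium, bandwidth)
    start = sum(myocardium) / len(myocardium)
    stop = sum(inside_endo) / len(inside_endo)
    prev_x = start
    prev_diff = pdf_myo(start) - pdf_inside(start)
    for i in range(1, steps + 1):
        x = start + (stop - start) * i / steps
        diff = pdf_myo(x) - pdf_inside(x)
        # A sign change means the two densities intersect in this step.
        if prev_diff > 0 >= diff:
            return 0.5 * (prev_x + x)
        prev_x, prev_diff = x, diff
    return None
```

  • Pixels inside the endocardium contour with intensity below the returned threshold would then be candidates for the papillary and trabeculae muscle region, and the remaining pixels would be candidates for the blood pool.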
  • A machine learning system may be summarized as including at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data, medical imaging data of the heart, and a trained convolutional neural network (CNN) model; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, in operation, the at least one processor: calculates contours or masks delineating the endocardium and epicardium of the heart in the medical imaging data using the trained CNN model; and anatomically localizes pathologies or functional characteristics of the myocardial muscle using the calculated contours or masks. The at least one processor may calculate the ventricular insertion points at which the right ventricular wall attaches to the left ventricle. The at least one processor may calculate the ventricular insertion points based on the proximity of contours or masks delineating the left ventricle epicardium to one or both of the right ventricle endocardium or the right ventricle epicardium.
  • The at least one processor may calculate the ventricular insertion points in one or more two-dimensional cardiac images based on the two points in the cardiac image in which the left ventricle epicardium boundary diverges from one or both of the right ventricle endocardium boundary or the right ventricle epicardium boundary. The at least one processor may calculate the ventricular insertion points based on the intersection between acquired long axis views of the left ventricle and the delineation of the left ventricle epicardium. The at least one processor may calculate at least one ventricular insertion point based on the intersection between the left ventricle epicardium contour and the left heart 3-chamber long axis plane. The at least one processor may calculate at least one ventricular insertion point based on the intersection between the left ventricle epicardium contour and the left heart 4-chamber long axis plane. The at least one processor may calculate at least one ventricular insertion point based on the intersection between the left heart 3-chamber long axis plane and one or both of the right ventricle epicardium contour or the right ventricle endocardium contour. The at least one processor may calculate at least one ventricular insertion point based on the intersection between the left heart 4-chamber long axis plane and one or both of the right ventricle epicardium contour or the right ventricle endocardium contour.
  • The at least one processor may allow a user to manually delineate the location of one or more of the ventricular insertion points. The at least one processor may use a combination of contours and ventricular insertion points to present the anatomical location of pathologies or functional characteristics of the myocardial muscle in a standardized format. The standardized format may be one or both of a 16- or 17-segment model of the myocardial muscle. The medical imaging data of the heart may be one or more of functional cardiac images, myocardial delayed enhancement images or myocardial perfusion images. The medical imaging data of the heart may be cardiac magnetic resonance images.
  • The trained CNN model may have been trained on annotated cardiac images of the same type as those for which the trained CNN model will be used for inference. The trained CNN model may have been trained on one or more of functional cardiac images, myocardial delayed enhancement images or myocardial perfusion images. The data on which the trained CNN model may have been trained may be cardiac magnetic resonance images. The trained CNN model may have been trained on annotated cardiac images of a different type than those for which the trained CNN model will be used for inference. The trained CNN model may have been trained on one or more of functional cardiac images, myocardial delayed enhancement images or myocardial perfusion images. The data on which the trained CNN model may have been trained may be cardiac magnetic resonance images.
  • The at least one processor may fine tune the trained CNN model on data of the same type for which the CNN model will be used for inference. To fine tune the trained CNN model, the at least one processor may retrain some or all of the layers of the trained CNN model. The at least one processor may apply postprocessing to the contours or masks delineating the endocardium and epicardium of the heart to minimize the amount of non-myocardial tissue that is present in the region of the heart identified as myocardium. To postprocess the contours or masks, the at least one processor may apply morphological operations to the region of the heart identified as myocardium to reduce its area. The morphological operations may include one or more of erosion or dilation. To postprocess the contours or masks, the at least one processor may modify the threshold applied to probability maps predicted by the trained CNN model to only identify pixels of myocardium for which the trained CNN model expresses a probability above a threshold that the pixels are part of the myocardium. The threshold by which probability map values may be converted to class labels is greater than 0.5. To postprocess the contours or masks, the at least one processor may shift vertices of contours that delineate the myocardium towards or away from the center of the ventricle of the heart to reduce the identified area of myocardium. The pathologies or functional characteristics of the myocardial muscle may include one or more of myocardial scarring, myocardial infarction, coronary stenosis, or perfusion characteristics.
  • BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
  • In the drawings, identical reference numbers identify similar elements or acts. The sizes and relative positions of elements in the drawings are not necessarily drawn to scale. For example, the shapes of various elements and angles are not necessarily drawn to scale, and some of these elements may be arbitrarily enlarged and positioned to improve drawing legibility. Further, the particular shapes of the elements as drawn, are not necessarily intended to convey any information regarding the actual shape of the particular elements, and may have been solely selected for ease of recognition in the drawings.
  • FIG. 1 is an example of a number of LV endocardium segmentations at a single time point over a full SAX (short axis) stack. From left to right, top to bottom, slices proceed from the apex of the left ventricle to the base of the left ventricle.
  • FIG. 2 is an example of an LV endocardium contour generated using a snakes algorithm.
  • FIG. 3 is two SSFP images showing ventricles, myocardium, and papillary muscles on the interior of the left ventricle endocardium. The SSFP image on the left shows end diastole, and the SSFP image on the right shows end systole.
  • FIG. 4 is two images which show the challenges of distinguishing between the ventricle and atrium on the basal slice in the SAX plane.
  • FIG. 5 is a diagram of the U-Net network architecture used in LandMarkDetect.
  • FIG. 6 is a diagram of the Deep Ventricle network architecture with two convolutional layers per pooling layer and four pooling/upsampling operations, according to one illustrated implementation.
  • FIG. 7 is a flow diagram of the creation of a lightning memory-mapped database (LMDB) for training with SSFP data, according to one illustrated implementation.
  • FIG. 8 is a flow diagram of a pipeline process for training a convolutional neural network model, according to one illustrated implementation.
  • FIG. 9 is a flow diagram which illustrates a process for an inference pipeline for SSFP data, according to one illustrated implementation.
  • FIG. 10 is a screenshot of an in-application SSFP inference result for LV endo at one time point and slice index.
  • FIG. 11 is a screenshot of an in-application SSFP inference result for LV epi at one time point and slice index.
  • FIG. 12 is a screenshot of an in-application SSFP inference result for RV endo at one time point and slice index.
  • FIG. 13 is a screenshot of in-application SSFP calculated parameters from automatically segmented ventricles.
  • FIG. 14 is a screenshot which depicts two, three, and four chamber views with parallel lines that indicate the planes of a SAX stack.
  • FIG. 15 is a screenshot which depicts in a left panel two, three, and four chamber views showing a series of segmentation planes that are not parallel for the right ventricle. A right panel depicts a reconstructed image along the highlighted plane seen in the two, three, and four chamber views.
  • FIG. 16 is a screenshot which illustrates segmenting the RV. Points in the contour (right panel) define the spline and are what is stored in a database. The contour is projected into the LAX views (left panel).
  • FIG. 17 is a screenshot which illustrates segmenting the same slice of the RV as in FIG. 16, but with each of the two, three, and four chamber views slightly rotated to emphasize the segmentation plane with a depth effect.
  • FIG. 18 is a schematic diagram which illustrates creation of a lightning memory-mapped database (LMDB) for training with 4D Flow data, according to one illustrated implementation.
  • FIG. 19 is a diagram which shows a multi-planar reconstruction (top left), an RV endo mask (top right), an LV epi mask (bottom left), and an LV endo mask (bottom right) generated from a SAX plane, available labels, and the image data. These masks may be stored in one array, and along with the image, may be stored in an LMDB under a single unique key.
  • FIG. 20 is a diagram which is similar to the diagram of FIG. 19, except the LV epi mask is missing.
  • FIG. 21 is a flow diagram which illustrates an inference pipeline for 4D Flow, according to one illustrated implementation.
  • FIG. 22 is a screenshot depicting an in-application inference for LV endo on a 4D Flow study.
  • FIG. 23 is a screenshot depicting an in-application inference for LV epi on a 4D Flow study.
  • FIG. 24 is a screenshot depicting an in-application inference for RV endo on a 4D Flow study.
  • FIG. 25 is a screenshot which illustrates locating the left ventricle apex (LVA) using a web application, according to one illustrated implementation.
  • FIG. 26 is a screenshot which illustrates locating the right ventricle apex (RVA) using a web application, according to one illustrated implementation.
  • FIG. 27 is a screenshot which illustrates locating the mitral valve (MV) using a web application, according to one illustrated implementation.
  • FIG. 28 is a flow diagram which illustrates a process for creation of a training database, according to one illustrated implementation.
  • FIG. 29 is a diagram which illustrates encoding of a landmark position on an image with a Gaussian evaluated on the image.
  • FIG. 30 is a flow diagram of a preprocessing pipeline for the images and landmarks, according to one illustrated implementation.
  • FIG. 31 is a plurality of screenshots which depict an example of a pre-processed input image and encoded mitral valve landmark for one patient. From top to bottom, from left to right, sagittal, axial, and coronal views are shown.
  • FIG. 32 is a plurality of screenshots which depict an example of a pre-processed input image and encoded tricuspid valve landmark for one patient. From top to bottom, from left to right, sagittal, axial, and coronal views are shown.
  • FIG. 33 is a diagram which illustrates prediction of a landmark position from network output.
  • FIG. 34 is an example image showing flow information overlaid on an anatomical image.
  • FIG. 35 is a block diagram of an example processor-based device used to implement one or more of the functions described herein, according to one non-limiting illustrated implementation.
  • FIG. 36 is a diagram of a fully convolutional encoder-decoder architecture with skip connections that utilizes a smaller expanding path than contracting path.
  • FIG. 37 shows box plots comparing the relative absolute volume error (RAVE) between FastVentricle and DeepVentricle for each of left ventricle (LV) Endo, LV Epi, and right ventricle (RV) Endo at ED (left panels) and ES (right panels).
  • FIG. 38 shows a random input (left) that is optimized using gradient descent for DeepVentricle and FastVentricle (middle) to fit the label map (right, RV endo in red, LV endo in cyan, LV epi in blue).
  • FIG. 39 shows examples of network predictions on different slices and time points for studies with low RAVE for both DeepVentricle and FastVentricle.
  • FIG. 40 is an image showing relevant components of the cardiac anatomy as seen on cardiac MRI.
  • FIG. 41 is an image demonstrating an endocardium contour that includes the papillary muscles on the interior.
  • FIG. 42 is an image demonstrating a blood pool or endocardium contour that excludes the papillary muscles from the interior.
  • FIG. 43 is a flow diagram of one implementation of a process for delineating papillary and trabeculae muscles.
  • FIG. 44 is a flow diagram of one implementation of the papillary and trabeculae muscle intensity threshold calculation.
  • FIG. 45 is an illustration of the calculation of the overlap of pixel distribution between the blood pool and the myocardium.
  • FIG. 46 is a flow diagram of one implementation of a process to identify and display myocardial defects.
  • FIG. 47 is an image illustrating the location of the ventricular insertion points.
  • DETAILED DESCRIPTION
  • In the following description, certain specific details are set forth in order to provide a thorough understanding of various disclosed implementations. However, one skilled in the relevant art will recognize that implementations may be practiced without one or more of these specific details, or with other methods, components, materials, etc. In other instances, well-known structures associated with computer systems, server computers, and/or communications networks have not been shown or described in detail to avoid unnecessarily obscuring descriptions of the implementations.
  • Unless the context requires otherwise, throughout the specification and claims that follow, the word “comprising” is synonymous with “including,” and is inclusive or open-ended (i.e., does not exclude additional, unrecited elements or method acts).
  • Reference throughout this specification to “one implementation” or “an implementation” means that a particular feature, structure or characteristic described in connection with the implementation is included in at least one implementation. Thus, the appearances of the phrases “in one implementation” or “in an implementation” in various places throughout this specification are not necessarily all referring to the same implementation. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more implementations.
  • As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. It should also be noted that the term “or” is generally employed in its sense including “and/or” unless the context clearly dictates otherwise.
  • The headings and Abstract of the Disclosure provided herein are for convenience only and do not interpret the scope or meaning of the implementations.
  • SSFP Automated Ventricle Segmentation
  • DeepVentricle Architecture
  • FIG. 6 shows a convolutional neural network (CNN) architecture 600, referred to herein as DeepVentricle, utilized for ventricular segmentation on cardiac SSFP studies. The network 600 includes two paths: the left side is a contracting path 602, which includes convolution layers 606 and pooling layers 608, and the right side is an expanding path 604, which includes upsampling or transpose convolution layers 610 and convolution layers 606.
  • The number of free parameters in the network 600 determines the entropic capacity of the model, which is essentially the amount of information the model can remember. A significant fraction of these free parameters reside in the convolutional kernels of each layer in the network 600. The network 600 is configured such that, after every pooling layer 608, the number of feature maps doubles and the spatial resolution is halved. After every upsampling layer 610, the number of feature maps is halved and the spatial resolution is doubled. With this scheme, the number of feature maps for each layer across the network 600 can be fully described by the number (e.g., between 1 and 2000 feature maps) in the first layer. In at least some implementations, the number of feature maps in the first layer is 128. It was found that using additional feature maps improved model accuracy, with a moderate cost of increased computational complexity, memory usage and trained model disk usage. Other values for the number of initial feature maps could also suffice, dependent on the amount of training data and desired tradeoffs between available computational resources and model performance.
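  • The doubling and halving scheme means the feature map count at every level follows from the count in the first layer and the number of pooling operations. A minimal sketch, in which the function name `feature_maps_per_level` is hypothetical:

```python
def feature_maps_per_level(num_init_filters, num_pooling_layers):
    """Return (contracting, expanding) lists of feature map counts.

    The count doubles after every pooling layer on the contracting path
    and halves after every upsampling layer on the expanding path.
    """
    contracting = [num_init_filters * 2 ** i
                   for i in range(num_pooling_layers + 1)]
    # The expanding path retraces the contracting path in reverse,
    # starting one level above the bottom of the network.
    expanding = contracting[-2::-1]
    return contracting, expanding
```

  • With 128 feature maps in the first layer and four pooling operations, the sketch gives 128, 256, 512, 1024 and 2048 feature maps on the contracting path, and 1024, 512, 256 and 128 on the expanding path.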
  • In at least some implementations, the network 600 includes two convolutional layers 606 before every pooling layer 608, with convolution kernels of size 3×3 and stride 1. Different combinations of these parameters (number of layers, convolution kernel size, convolution stride) may also be used. Based on a hyperparameter search, it was found that four pooling and upsampling operations worked best for the data under examination, though the results are only moderately sensitive to this number.
  • Without applying any padding to input images (this lack of padding is called “valid” padding), convolutions that are larger than 1×1 naturally reduce the size of the output feature maps, as only (image_size−conv_size+1) convolutions can fit across a given image. Using valid padding, output segmentation maps are only 388×388 pixels for input images which are 572×572 pixels. Segmenting the full image therefore requires a tiling approach, and segmentation of the borders of the original image is not possible. In the network 600 according to at least some implementations of the present disclosure, zero-padding of width (conv_size−2) is utilized before every convolution such that segmentation maps are always the same resolution as the input (known as “same” padding). Valid padding may also be used.
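  • The effect of the padding choice on output size can be checked with simple arithmetic. The sketch below assumes two 3×3 stride-1 convolutions per level, four 2×2 pooling operations, and upsampling that exactly doubles the side length; the function names are hypothetical. For the 3×3 kernels described above, the padding width (conv_size−2) is one pixel per side.

```python
def conv_output_size(size, conv_size, pad=0):
    """Side length after one stride-1 convolution with `pad` zeros
    added on each side of the input."""
    return size + 2 * pad - conv_size + 1


def network_output_size(size, num_pooling_layers=4, num_conv_layers=2,
                        conv_size=3, pad=0):
    """Side length of the segmentation map for a square input."""
    # Contracting path: convolutions, then 2x2 pooling with stride 2.
    for _ in range(num_pooling_layers):
        for _ in range(num_conv_layers):
            size = conv_output_size(size, conv_size, pad)
        size //= 2
    # Convolutions at the bottom of the network.
    for _ in range(num_conv_layers):
        size = conv_output_size(size, conv_size, pad)
    # Expanding path: upsampling by 2, then convolutions.
    for _ in range(num_pooling_layers):
        size *= 2
        for _ in range(num_conv_layers):
            size = conv_output_size(size, conv_size, pad)
    return size
```

  • Under these assumptions, `network_output_size(572)` reproduces the 388-pixel output for valid padding, while `network_output_size(256, pad=1)` returns 256. In this sketch the output matches the input only when the input side length is divisible by 2 raised to the number of pooling layers.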
  • Downsampling the feature maps with a pooling operation may be an important step for learning higher level abstract features by means of convolutions that have a larger field of view in the space of the original image. In at least some implementations, the network 600 utilizes a 2×2 max pooling operation with stride 2 to downsample images after every set of convolutions. Learned downsampling, i.e., convolving the input volume with a 2×2 convolution with stride 2, may also be used, but this may increase computational complexity. Generally, different combinations of pooling size and stride may also be used.
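  • A 2×2 max pooling operation with stride 2 can be sketched on a single feature map as follows. The function name `max_pool_2x2` is hypothetical, and plain Python lists are used only for clarity.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map by taking the maximum of each
    non-overlapping 2x2 block (stride 2). A trailing odd row or
    column is dropped."""
    rows = len(feature_map) // 2 * 2
    cols = len(feature_map[0]) // 2 * 2
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]
```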
  • Upsampling the activation volumes back to the original resolution is necessary in a fully convolutional network for pixel-wise segmentation. To increase the resolution of the activation volumes in the network, some systems may use an upsampling operation, then a 2×2 convolution, then a concatenation of feature maps from the corresponding contracting layer through a skip connection, and finally two 3×3 convolutions. In at least some implementations, the network 600 replaces the upsampling and 2×2 convolution with a single transpose convolution layer 610, which performs upsampling and interpolation with a learned kernel, improving the ability of the model to resolve fine details. That operation is followed with the skip connection concatenation, as shown by the bold arrows from the contracting path 602 to the expanding path 604 in FIG. 6. Following this concatenation, two 3×3 convolutional layers are applied.
  • In at least some implementations, rectified linear units (ReLUs) are used for all activations following convolutions. Other nonlinear activation functions, including PReLU (parametric ReLU) and ELU (exponential linear unit) may also be used.
  • Model Hyperparameters
  • Model hyperparameters may be stored in at least one nontransitory processor-readable storage medium (e.g., configuration file) that is read during training. Parameters that describe the model may include:
      • num_pooling_layers: the total number of pooling (and upsampling) layers;
      • pooling_type: the type of pooling operation to use (e.g., max);
      • num_init_filters: the number of filters (convolutional kernels) for the first layer;
      • num_conv_layers: the number of convolution layers between each pooling operation;
      • conv_kernel_size: the edge length, in pixels, of the convolutional kernel;
      • dropout_prob: the probability that a particular node's activation is set to zero on a given forward/backward pass of a batch through the network;
      • border_mode: the method of zero-padding the input feature map before convolution;
      • activation: the nonlinear activation function to use after each convolution;
      • weight_init: the means for initializing the weights in the network;
      • batch_norm: whether or not to utilize batch normalization after each nonlinearity in the down-sampling/contracting part of the network;
      • batch_norm_momentum: momentum in the batch normalization computation of means and standard deviations on a per-feature basis;
      • down_trainable: whether to allow the downsampling part of the network to learn upon seeing new data;
      • bridge_trainable: whether to allow the bridge convolutions to learn;
      • up_trainable: whether to allow the upsampling part of the network to learn; and
      • out_trainable: whether to allow the final convolution that produces pixel-wise probabilities to learn.
  • Parameters that describe the training data to use may include:
      • crop_frac: the fractional size of the images in the LMDB relative to the originals;
      • height: the height of the images, in pixels; and
      • width: the width of the images, in pixels.
  • Parameters that describe the data augmentation to use during training may include:
      • horizontal_flip: whether to randomly flip the input/label pair in the horizontal direction;
      • vertical_flip: whether to randomly flip the input/label pair in the vertical direction;
      • shear_amount: the positive/negative limiting value by which to shear the image/label pair;
      • shift_amount: the max fractional value by which to shift the image/label pair;
      • zoom_amount: the max fractional value by which to zoom in on the image/label pair;
      • rotation_amount: the positive/negative limiting value by which to rotate the image/label pair;
      • zoom_warping: whether to utilize zooming and warping together;
      • brightness: the positive/negative limiting value by which to change the image brightness;
      • contrast: the positive/negative limiting value by which to change the image contrast; and
      • alpha, beta: the first and second parameters describing the strength of elastic deformation.
  • Parameters that describe training include:
      • batch_size: the number of examples to show the network on each forward/backward pass;
      • max_epoch: the maximum number of iterations through the data;
      • optimizer_name: the name of the optimizer function to use;
      • optimizer_lr: the value of the learning rate;
      • objective: the objective function to use;
      • early_stopping_monitor: the parameter to monitor to determine when model training should stop; and
      • early_stopping_patience: the number of epochs to wait after the early_stopping_monitor value has not improved before stopping model training.
  • To choose the optimal model, a random search may be performed over these hyperparameters and the model with the highest validation accuracy may be chosen.
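  • A random search of this kind may be sketched in Python as follows. The candidate values in SEARCH_SPACE, the function names, and the toy_evaluate stand-in are all hypothetical; in practice the evaluate callable would train a model with the sampled configuration and return its validation accuracy.

```python
import random

# Hypothetical candidate values; the parameter names follow the lists above.
SEARCH_SPACE = {
    "num_pooling_layers": [3, 4, 5],
    "num_init_filters": [32, 64, 128],
    "num_conv_layers": [1, 2, 3],
    "conv_kernel_size": [3, 5],
    "dropout_prob": [0.0, 0.25, 0.5],
    "optimizer_lr": [1e-3, 1e-4, 1e-5],
}


def sample_config(rng):
    """Draw one value uniformly at random for every hyperparameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(evaluate, num_trials, seed=0):
    """Return the sampled configuration with the highest score from evaluate."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Stand-in for 'train a model and report validation accuracy' (illustration only)."""
    return 1.0 - 0.1 * abs(config["num_pooling_layers"] - 4) - 0.01 * config["dropout_prob"]


best_config, best_score = random_search(toy_evaluate, num_trials=50)
print(best_config, best_score)
```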
  • Training Database
  • In at least some implementations, a Lightning Memory-mapped Database (LMDB) that stores preprocessed image/segmentation mask pairs for training may be used. This database architecture holds many advantages over other means of storing the training data. Such advantages include: mapping of keys is lexicographical for speed; image/segmentation mask pairs are stored in the format required for training so they require no further preprocessing at training time; and reading image/segmentation mask pairs is a computationally cheap transaction.
  • The training data may generally be stored in a variety of other formats, including named files on disk and real-time generation of masks from the ground truth database for each image. These methods may achieve the same result, though they likely slow down the training process.
  • A new LMDB may be created for every unique set of inputs/targets that are to be used to train a model on. This ensures that there is no slowdown during training for image preprocessing.
  • Treatment of Missing Data During Training
  • Unlike previous models which were only concerned with two classes for a cell discrimination task, foreground and background, the SSFP model disclosed herein attempts to distinguish four classes, namely, background, LV Endocardium, LV Epicardium and RV Endocardium. To accomplish this, the network output may include three probability maps, one for each non-background class. During training, ground truth binary masks for each of the three classes are provided to the network, along with the pixel data. The network loss may be determined as the sum of the loss over the three classes. If any of the three ground truth masks are missing for an image (meaning that we have no data, as opposed to the ground truth being an empty mask), that mask may be ignored when calculating the loss.
  • Missing ground truth data is explicitly accounted for during the training process. For example, the network may be trained on an image for which the LV endocardium contour is defined, even if the LV epicardium and RV endocardium contour locations are not known. A more basic architecture that could not account for missing data could only have been trained on a subset (e.g., 20 percent) of training images that have all three types of contours defined. Reducing the training data volume in this way would result in significantly reduced accuracy. Thus, by explicitly modifying the loss function to account for missing data, the full training data volume is used, allowing the network to learn more robust features.
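  • The handling of missing ground truth may be illustrated with a minimal Python sketch. It assumes, for illustration only, that each class is scored with a mean per-pixel binary cross-entropy and that a missing mask is represented by None; the class names and function names are hypothetical. An empty mask (all zeros) still contributes to the loss, whereas a missing mask is skipped.

```python
import math


def binary_cross_entropy(prob_map, mask, eps=1e-7):
    """Mean per-pixel binary cross-entropy between a probability map and a 0/1 mask."""
    total, count = 0.0, 0
    for prob_row, mask_row in zip(prob_map, mask):
        for p, y in zip(prob_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= math.log(p) if y else math.log(1.0 - p)
            count += 1
    return total / count


def masked_loss(prob_maps, ground_truth):
    """Sum the per-class loss, ignoring classes whose ground truth mask is None."""
    loss = 0.0
    for name, prob_map in prob_maps.items():
        mask = ground_truth.get(name)
        if mask is None:  # no annotation available for this class
            continue
        loss += binary_cross_entropy(prob_map, mask)
    return loss


prob_maps = {
    "lv_endo": [[0.9, 0.1], [0.1, 0.1]],
    "lv_epi": [[0.5, 0.5], [0.5, 0.5]],
    "rv_endo": [[0.1, 0.1], [0.1, 0.1]],
}
ground_truth = {
    "lv_endo": [[1, 0], [0, 0]],
    "lv_epi": None,                 # missing: excluded from the loss
    "rv_endo": [[0, 0], [0, 0]],    # empty mask: still included
}
loss = masked_loss(prob_maps, ground_truth)
print(loss)
```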
  • Training Database Creation
  • FIG. 7 shows a process 700 for creation of an SSFP LMDB. At 702, contour information is extracted from an SSFP ground truth database 704. These contours are stored in the ground truth database 704 as dictionaries of contour X positions and Y positions, associated with specific SSFP slice locations and time points. At 706, the pixel data from the corresponding SSFP DICOM (Digital Imaging and Communications in Medicine) image 708 is paired with a Boolean mask created from this information. At 710, the system preprocesses the images and masks by normalizing the images, cropping the images/masks, and resizing the images/masks. In at least some implementations, the MRIs are normalized such that they have a mean of zero and that the 1st and 99th percentile of a batch of images fall at −0.5 and 0.5, i.e., their “usable range” falls between −0.5 and 0.5. The images may be cropped and resized such that the ventricle contours take up a larger percentage of the image. This results in more total foreground class pixels, making it easier to resolve fine details (especially the corners) of the ventricles and helping the model converge, all with less computing power.
  • At 712, a unique key for SSFP LMDBs is defined to be the combination of the series instance UID and SOP instance UID. At 714, the image and mask metadata, including the time point, slice index and LMDB key are stored in a dataframe. At 716, the normalized, cropped, and resized image and the cropped and resized mask are stored in the LMDB for each key.
  • DeepVentricle Training
  • FIG. 8 shows a process 800 that illustrates model training. In at least some implementations, Keras, an open-source wrapper built on TensorFlow, may be used to train models. However, equivalent results may be achieved using raw TensorFlow, Theano, Caffe, Torch, MXNet, MATLAB, or other libraries for tensor math.
  • In at least some implementations, a dataset may be split into a training set, validation set, and test set. The training set is used for model gradient updates, the validation set is used to evaluate the model on “held out” data during training (e.g., for early stopping), and the test set is not used at all in the training process.
  • At 802, training is invoked. At 804, image and mask data is read from the LMDB training set, one batch at a time. At 806, in at least some implementations the images and masks are distorted according to the distortion hyperparameters stored in a model hyperparameter file, as discussed above. At 808, the batch is processed through the network. At 810, the loss/gradients are calculated. At 812, weights are updated as per the specified optimizer and optimizer learning rate. In at least some implementations, loss may be calculated using a per-pixel cross-entropy loss function, and the weights may be updated using the Adam update rule.
  • At 814, the system may determine whether the epoch is complete. If the epoch is not complete, the process returns to act 804 to read another batch of training data. At 816, if the epoch is complete, metrics are calculated on the validation set. Such metrics may include, for example, validation loss, validation accuracy, relative accuracy versus a naive model that predicts only the majority class, f1 score, precision, and recall.
  • At 818, validation loss may be monitored to determine if the model improved. At 820, if the model did improve, the weights of the model at that time may be saved. At 822, the early stopping counter may be reset to zero, and training for another epoch may begin at 804. Metrics other than validation loss, such as validation accuracy, could also be used to evaluate model performance. At 824, if the model did not improve after an epoch, the early stopping counter is incremented by 1. At 826, if the counter has not reached its limit, training is begun for another epoch at 804. At 828, if the counter has reached its limit, training the model is stopped. This “early stopping” methodology is used to prevent overfitting, but other methods of overfitting prevention exist, such as utilizing a smaller model or increasing the level of dropout or L2 regularization.
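  • The early stopping logic of acts 818 through 828 may be sketched in Python as follows. The sketch replays a precomputed list of per-epoch validation losses rather than training a model, and the function name and return value are hypothetical; the patience argument plays the role of the early_stopping_patience hyperparameter.

```python
def run_early_stopping(validation_losses, patience):
    """Return (best_epoch, last_epoch_run) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = None
    counter = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            # Acts 820 and 822: the model improved, so record it (weights would be
            # saved here) and reset the early stopping counter.
            best_loss, best_epoch = loss, epoch
            counter = 0
        else:
            # Act 824: no improvement, so increment the counter.
            counter += 1
            if counter >= patience:
                # Act 828: the counter reached its limit, so stop training.
                return best_epoch, epoch
    return best_epoch, len(validation_losses) - 1


losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
print(run_early_stopping(losses, patience=3))
```

  • With a patience of three epochs, training in this example stops after epoch index 5 and the weights from epoch index 2 are retained; with a patience of four, training would continue and reach the lower loss at epoch index 6.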
  • At no point is data from the test set used when training the model. Data from the test set may be used to show examples of segmentations, but this information is not used for training or for ranking models with respect to one another.
  • Inference
  • Inference is the process of utilizing a trained model for prediction on new data. In at least some implementations, a web application (or “web app”) may be used for inference. FIG. 9 displays an example pipeline or process 900 by which predictions may be made on new SSFP studies. At 902, after a user has loaded a study in the web application, the user may invoke the inference service (e.g., by clicking a “generate missing contours” icon), which automatically generates any missing (not yet created) contours. Such contours may include LV Endo, LV Epi, or RV Endo, for example. In at least some implementations, inference may be invoked automatically when the study is either loaded by the user in the application or when the study is first uploaded by the user to a server. If inference is performed at upload time, the predictions may be stored in a nontransitory processor-readable storage medium at that time but not displayed until the user opens the study.
  • The inference service is responsible for loading a model, generating contours, and displaying them for the user. After inference is invoked at 902, at 904 images are sent to an inference server. At 906, the production model or network that is used by the inference service is loaded onto the inference server. The network may have been previously selected from the corpus of models trained during hyperparameter search. The network may be chosen based on a tradeoff between accuracy, memory usage and speed of execution. The user may alternatively be given a choice between a “fast” or “accurate” model via a user preference option.
  • At 908, one batch of images at a time is processed by the inference server. At 910, the images are preprocessed (e.g., normalized, cropped) using the same parameters that were utilized during training, discussed above. In at least some implementations, inference-time distortions are applied and the average inference result is taken on, for example, 10 distorted copies of each input image. This feature creates inference results that are robust to small variations in brightness, contrast, orientation, etc.
  • Inference is performed at the slice locations and time points in the requested batch. At 912, a forward pass through the network is computed. For a given image, the model generates per-class probabilities for each pixel during the forward pass, which results in a set of probability maps, one for each class, with values ranging from 0 to 1. The probability maps are transformed into a single label mask by setting the class of each pixel to the class with the highest label map probability.
  • At 914, the system may perform postprocessing. For example, in at least some implementations, if all probabilities for a pixel are below 0.5, the pixel class for that pixel is set to background. Further, to remove spurious predicted pixels, any pixels in the label map that are not part of the largest connected region for that class may be converted to background. In at least some implementations, spurious pixels may be removed by comparing neighboring segmentation maps in time and space and removing outliers. Alternatively, because a given ventricle may occasionally appear in a single slice as two distinct connected regions (for example, because the RV is non-convex near the base of the heart), multiple connected regions may be allowed, but small regions or regions that are distant from the centroid of all detected regions across slice locations and times may be removed.
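  • The conversion of probability maps into a label mask, together with the removal of all but the largest connected region, may be sketched in Python as follows. The sketch assumes one probability map per non-background class, stored as nested lists, and 4-connectivity between pixels; the connectivity rule and all function names are assumptions made for illustration.

```python
from collections import deque

BACKGROUND = 0


def label_mask(prob_maps, threshold=0.5):
    """Assign each pixel its most probable class (1..C), or background if all are below threshold."""
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    mask = [[BACKGROUND] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            probs = [pm[r][c] for pm in prob_maps]
            best = max(range(len(probs)), key=probs.__getitem__)
            if probs[best] >= threshold:
                mask[r][c] = best + 1
    return mask


def keep_largest_region(mask, label):
    """Set to background every pixel of the label outside its largest 4-connected region."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != label or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one connected region
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and mask[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    if not regions:
        return [row[:] for row in mask]
    largest = set(max(regions, key=len))
    return [[BACKGROUND if mask[r][c] == label and (r, c) not in largest else mask[r][c]
             for c in range(width)] for r in range(height)]


class_1 = [[0.9, 0.8, 0.1, 0.7],
           [0.9, 0.2, 0.1, 0.1]]
class_2 = [[0.1, 0.1, 0.1, 0.1],
           [0.1, 0.6, 0.1, 0.1]]
raw = label_mask([class_1, class_2])
cleaned = keep_largest_region(raw, label=1)
print(raw)
print(cleaned)
```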
  • In at least some implementations, postprocessing to satisfy one or more physical constraints may be performed at 914. For example, postprocessing may ensure that the myocardium volume is the same at all time points. To achieve this, the system may dynamically adjust the threshold used to binarize the endocardium and epicardium probability maps before converting them to contours. The thresholds can be adjusted to minimize the discrepancy in reported myocardium volume using nonlinear least squares, for example. As another example of a physical constraint, the postprocessing act may ensure that the RV and LV do not overlap. To achieve this, the system may only allow any given pixel to belong to one class, which is the class with the highest inferred probability. The user may have a configuration option to enable or disable imposition of selected constraints.
  • At 916, if not all batches have been processed, a new batch is added to the processing pipeline at 908 until inference has been performed at all slice locations and all time points.
  • In at least some implementations, once the label mask has been created, to ease viewing, user interaction, and database storage, the mask may be converted to a spline contour. The first step is to convert the mask to a polygon by marking all the pixels on the border of the mask. This polygon is then converted to a set of control points for a spline using a corner detection algorithm, based on A. Rosenfeld and J. S. Weszka. “An improved method of angle detection on digital curves.” Computers, IEEE Transactions on, C-24(9):940-941, September 1975. A typical polygon from one of these masks will have hundreds of vertices. The corner detection attempts to reduce this to a set of approximately sixteen spline control points. This reduces storage requirements and results in a smoother-looking segmentation.
  • At 918, these splines are stored in a database and displayed to the user in the web application. If the user modifies a spline, the database may be updated with the modified spline.
  • In at least some implementations, volumes are calculated by creating a volumetric mesh from all vertices for a given time point. The vertices are ordered on every slice of the 3D volume. An open cubic spline is generated that connects the first vertex in each contour, a second spline that connects the second vertex, etc., for each vertex in the contour, until a cylindrical grid of vertices is obtained which is used to define the mesh. The internal volume of the polygonal mesh is then calculated. Based on the calculated volumes, the system autonomously determines which time points represent the end systole phase and the end diastole phase, namely the times with the minimum and maximum volumes, respectively, and these time points are labeled for the user.
  • FIGS. 10, 11, and 12 show example images 1000, 1100 and 1200, respectively, of in-application inference results for LV Endo contour 1002, LV Epi contour 1102, and RV Endo contour 1202, respectively, at a single time point and slice location.
  • At the same time that contours (e.g., contours 1002, 1102 and 1202) are displayed to the user, the system calculates and shows ventricle volumes at ED and ES to the user, as well as multiple computed measurements. An example interface 1300 that displays such measurements is shown in FIG. 13. In at least some implementations, these measurements include stroke volume (SV) 1302, which is the volume of blood ejected from the ventricle in one cardiac cycle; ejection fraction (EF) 1304, which is the fraction of the blood pool ejected from the ventricle in one cardiac cycle; cardiac output (CO) 1306, which is the average rate at which blood leaves the ventricle; ED mass 1308, which is the mass of the myocardium (i.e., epicardium-endocardium) for the ventricle at end diastole; and ES mass 1310, which is the mass of the myocardium for the ventricle at end systole.
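  • The volume-derived measurements may be illustrated with a short Python sketch. It takes ED and ES to be the time points of maximum and minimum ventricular volume, and uses the conventional relations SV = EDV − ESV, EF = SV / EDV and CO = SV × heart rate. The heart rate input, the units, and the function name are assumptions made for illustration and are not specified above.

```python
def cardiac_measurements(volumes_ml, heart_rate_bpm):
    """Derive ED/ES time points and volume-based measurements from per-time-point volumes."""
    indices = range(len(volumes_ml))
    ed_index = max(indices, key=volumes_ml.__getitem__)  # end diastole: maximum volume
    es_index = min(indices, key=volumes_ml.__getitem__)  # end systole: minimum volume
    edv, esv = volumes_ml[ed_index], volumes_ml[es_index]
    stroke_volume = edv - esv
    return {
        "ed_index": ed_index,
        "es_index": es_index,
        "stroke_volume_ml": stroke_volume,
        "ejection_fraction": stroke_volume / edv,
        # mL per beat times beats per minute, converted to litres per minute
        "cardiac_output_l_per_min": stroke_volume * heart_rate_bpm / 1000.0,
    }


# Hypothetical ventricular volumes (mL) over one cardiac cycle.
result = cardiac_measurements([120.0, 100.0, 70.0, 50.0, 80.0, 110.0], heart_rate_bpm=60)
print(result)
```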
  • For 4D Flow data, the same DeepVentricle architecture, hyperparameter search methodology, and training database as described above for SSFP data may be used. Training a 4D Flow model may be the same as in the SSFP operation discussed above, but the creation of an LMDB and inference may be different for the 4D Flow implementation.
  • Training Database Creation for 4D Flow Data
  • Whereas the SSFP DICOM files are acquired and stored in SAX orientation, 4D Flow DICOMs are collected and stored as axial slices. In order to create a SAX multi-planar reconstruction (MPR) of the data, the user may need to place the relevant landmarks for the left and/or right heart. These landmarks are then used to define unique SAX planes for each ventricle as defined by the ventricle apex and valves. FIG. 14 shows a set 1400 of SAX planes (also referred to as a SAX stack) for the LV in which each SAX plane is parallel for a two chamber view 1402, a three chamber view 1404 and a four chamber view 1406.
  • The application may also allow the user to have SAX planes that are not parallel if desired. FIG. 15 shows a set 1500 of views of a SAX stack in which the segmentation planes are not parallel for the RV for a two chamber view 1502, a three chamber view 1504, a four chamber view 1506 and a reconstructed image 1508. This is motivated by the fact that it is slightly easier to segment the ventricle if the segmentation plane does not intersect the valve plane but rather is parallel to it. However, this is not a requirement, and it is possible to get accurate results without using this feature.
  • As shown in the images 1600 and 1700 of FIGS. 16 and 17, respectively, segmentations are performed on a multi-planar reconstruction of the image data on each SAX plane. Points 1602 on a contour 1604 in an image 1606 define the spline and are what is stored in the database. The contour 1604 is projected into a two chamber LAX view 1608, three chamber LAX view 1610 and four chamber LAX view 1612. FIG. 17 shows images 1702, 1704, 1706 and 1708 in which the same slice of FIG. 16 is segmented, but with each of the two chamber view 1704, three chamber view 1706 and four chamber view 1708 slightly rotated to emphasize the segmentation plane with a depth effect.
  • FIG. 18 shows a process 1800 of creating a training LMDB from clinician annotations. 4D Flow annotations may be stored in a MongoDB 1802. At 1804 and 1806, the system extracts the contours and landmarks, respectively. Contours are stored as a series of (x, y, z) points defining the splines of the contour. Landmarks are stored as a single four-dimensional coordinate (x, y, z, t) for each landmark.
  • At 1808, in order to convert the contours into boolean masks, the system calculates a rotation matrix to rotate the contour points onto the x-y plane. The system may also define a sampling grid, i.e., a set of (x, y, z) points, on the original plane of the contour. The system rotates both the contour and the sampling grid by the same rotation matrix such that they are in the x-y plane. It is now a simple task to determine which points of the sampling grid are within the 2D vertices defining the contour. This is a simple computational geometry problem for 2D polygons.
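  • Once the contour and the sampling grid lie in the x-y plane, the inside/outside test may be performed with a standard ray-casting (even-odd) rule, as in the following Python sketch. The sketch assumes the rotation has already been applied, samples each pixel at its center, and uses hypothetical function names; the disclosure does not specify which point-in-polygon method is used.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd ray casting: count polygon edges crossed by a ray from (x, y) toward +x."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contour_to_mask(vertices, height, width):
    """Rasterize a closed 2D contour into a boolean mask, sampling at pixel centers."""
    return [[point_in_polygon(c + 0.5, r + 0.5, vertices) for c in range(width)]
            for r in range(height)]


# Hypothetical contour: a square with corners at (1, 1) and (4, 4) on a 5x5 grid.
square = [(1, 1), (4, 1), (4, 4), (1, 4)]
mask = contour_to_mask(square, height=5, width=5)
for row in mask:
    print("".join("#" if v else "." for v in row))
```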
  • 4D Flow DICOMs are stored in a database 1810. At 1812, the system uses the landmark annotations from act 1806 and the 4D Flow DICOMs from the database 1810 to define and generate images along a SAX stack. In general, this SAX stack is different from that of the original SAX stacks in which the ground truth contours were defined. The system defines the stack to be orthogonal to the line connecting the left ventricle apex (LVA) and the mitral valve (MV). Other combinations of appropriate landmarks, such as the right ventricle apex (RVA) and tricuspid valve (TV), would work similarly.
  • In at least some implementations, the system places a number (e.g., 14) of slices between the LVA and MV, as this is similar to the number of slices in most SSFP SAX stacks. Different numbers of slices may also be used. More slices would increase the training set diversity, though the actual on-disk size would increase more rapidly than the increase in diversity. The results are expected to be insensitive to the exact number of slices.
  • Four slices may be appended to the SAX stack past the LVA and four more past the MV. This ensures that the full ventricle is within the SAX stack. The results are likely insensitive to the exact number of additional slices used. The SAX stack may be oriented such that the RV is always on the left of the image (as is conventional in cardiac MR) by ensuring that aortic valve (AV) is oriented to the right of the line connecting the LVA to the MV. Although consistency of orientation is likely important for achieving good results, the exact chosen orientation is arbitrary.
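  • The placement of SAX planes along the line from the LVA to the MV may be sketched in Python as follows. The sketch assumes that the slices are evenly spaced, that the first and last of the 14 slices pass through the LVA and the MV, respectively, and that the appended slices continue at the same spacing; these conventions and the function name are assumptions. In-plane orientation using the aortic valve is omitted.

```python
def sax_stack(lva, mv, num_slices=14, num_extra=4):
    """Return the shared plane normal and the center point of every SAX plane."""
    axis = [m - a for a, m in zip(lva, mv)]
    length = sum(v * v for v in axis) ** 0.5
    normal = [v / length for v in axis]       # planes are orthogonal to the LVA-MV line
    spacing = length / (num_slices - 1)       # assumed: slices span LVA to MV inclusive
    centers = []
    for i in range(-num_extra, num_slices + num_extra):
        centers.append([a + n * spacing * i for a, n in zip(lva, normal)])
    return normal, centers


# Hypothetical landmark coordinates in millimeters.
normal, centers = sax_stack(lva=(0.0, 0.0, 0.0), mv=(0.0, 0.0, 65.0))
print(len(centers), centers[0], centers[-1])
```

  • Each plane is then fully described by its center point and the shared normal, giving 22 planes in this example: 14 between the landmarks and four beyond each end.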
  • At 1814, in at least some implementations, all the available contours for a given study are interpolated to be on a single non-curved SAX stack for simplicity and speed in training and inference. Once the planes of the SAX stack are defined, a linear interpolator is set up for each ventricle and time point described by the original sampling grids, i.e., series of (x, y, z) points, and their corresponding masks. The system then interpolates the ground truth masks from their original SAX stacks onto the study's common SAX stack. An example of this is shown in the view 1900 of FIG. 19, which shows a multi-planar reconstruction 1902, an RV endo mask 1904, an LV epi mask 1906 and an LV endo mask 1908. A sentinel is used within the interpolated ground truth masks to indicate when labels are missing. An example visualization 2000 of this is shown in FIG. 20, which shows a multi-planar reconstruction 2002, an RV endo mask 2004, a missing LV epi mask 2006, and an LV endo mask 2008.
  • In at least some implementations, instead of projecting the ground truth masks to a common SAX stack, the masks may be projected onto the axial plane, and training and inference may be performed in the axial plane. This may achieve similar accuracy, but may result in a slight loss of resolution due to the need to re-project inferred contours back into the SAX stack to display within the application's user interface.
  • At 1816, the system performs preprocessing operations. For example, preprocessing acts may include normalizing the images, cropping the images/masks, and resizing the images/masks.
  • At 1818, the system defines a unique key for 4D Flow LMDBs to be a 32 character hash of the string combination of the time index, slice index, side (“right” or “left”), layer (“endo” or “epi”), upload ID, workspace ID (unique identifier for one person's annotations), and workflow key (unique identifier for a given user's workflow in which they did the work). Any of a number of other unique keys for each image/mask pair may alternatively be used. At 1820, the system stores the image and mask metadata, including the time point, slice index and LMDB key in a dataframe. The normalized, cropped, and resized image and the cropped and resized mask are stored in an LMDB 1822 for each key.
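  • One way to obtain a 32 character key from these fields is to hash their string combination with MD5, whose hexadecimal digest is 32 characters long. The choice of MD5, the underscore separator, the field order, and the example identifier values below are assumptions made for illustration; the disclosure does not name the hash function.

```python
import hashlib


def lmdb_key(time_index, slice_index, side, layer, upload_id, workspace_id, workflow_key):
    """Build a 32 character hexadecimal key from the fields identifying one image/mask pair."""
    parts = (time_index, slice_index, side, layer, upload_id, workspace_id, workflow_key)
    combined = "_".join(str(p) for p in parts)  # separator is an arbitrary choice
    return hashlib.md5(combined.encode("utf-8")).hexdigest()


# Hypothetical identifier values.
key_left = lmdb_key(3, 7, "left", "endo", "upload-001", "workspace-42", "workflow-a")
key_right = lmdb_key(3, 7, "right", "endo", "upload-001", "workspace-42", "workflow-a")
print(key_left, key_right)
```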
  • DeepVentricle Inference for 4D Flow Data
  • As with SSFP DeepVentricle inference discussed above, a web application may be used for inference for 4D Flow data. FIG. 21 shows a pipeline for a process 2100 by which the system makes predictions on new 4D Flow studies. At 2102, after a user has loaded a study in the web application, the user can invoke the inference service through a pipeline that is similar to the inference pipeline described above and shown in FIG. 9. Landmarks have already been defined, either manually or automatically (e.g., by an automatic landmark finding algorithm discussed below).
  • The landmark positions are used to create a standard LV SAX stack on which to perform inference. The SAX stack is created in the same way that the SAX stack was created during training, described above. At 2104, the metadata required to describe each MPR in the SAX stack is calculated from the landmark locations. The plane of each MPR is fully defined by a point on the plane and the normal of the plane, but the system also passes the vector connecting the mitral valve and aortic valve in this implementation to ensure the image will be oriented correctly. That is, the right ventricle is to the left in the images. Another set of landmarks, such as the mitral valve and tricuspid valve, may also suffice for ensuring that the right ventricle is to the left in the images.
  • At 2106, the MPR metadata is then sent to the compute servers, which hold a distributed version of the data (each compute node has a few time points of data). At 2108, each node renders the requested MPRs for the time points it has available. At 2110, the generated MPR images, along with their metadata, including the time point, orientation, position, and slice index, are then distributed evenly by time point across multiple inference servers. At 2112, the network is loaded onto each inference node.
  • At 2114, one batch of images at a time is processed by each inference node. At 2116, the images are preprocessed. At 2118, a forward pass is computed. At 2120, the predictions are postprocessed, and spline contours are created in the same way as in the SSFP implementations discussed above.
  • At 2122, the generated splines are forwarded back to the web server after all batches have been processed, where the splines are joined with the inference results from other inference nodes. The web server ensures that the volume is contiguous (i.e., no missing contours in the middle of the volume) by interpolating between neighboring slices if a contour is missing. At 2124, the web server saves the contours in the database, then presents the contours to the user via the web application. If the user edits a spline, the spline's updated version is saved in the database alongside the original, automatically-generated version. In at least some implementations, a comparison of manually edited contours with their original, automatically-generated versions may be used to retrain or fine-tune a model only on inference results that required manual correction.
  • FIGS. 22, 23, and 24 show images 2200, 2300 and 2400, respectively, of in-application inference for LV Endo (contour 2202), LV Epi (contour 2302), and RV Endo (contour 2402), respectively at a single time point and slice location. As with SSFP, the calculated volumes at ED and ES may be presented to the user, as well as multiple computed measurements (see FIG. 13).
  • 3D End-to-End Convolutional Architecture
  • Another approach for end-to-end segmentation of the ventricles is to utilize volumetric images, volumetric masks, and 3D convolutional kernels throughout. Both the description and operation of this implementation closely follow that of the SSFP implementation discussed above, but with a few key differences. Thus, for brevity, the following discussion focuses primarily on such differences.
  • The DeepVentricle architecture for this implementation is nearly identical to that discussed above, except convolutional kernels are (N×M×K) pixels rather than just (N×M) pixels, where N, M and K are positive integers which may be equal to or different from each other. The model parameters also look similar, but the addition of a depth component in describing the training data may be necessary to fully describe the shape of the volumetric input image.
  • A training LMDB is utilized for this implementation, as with other implementations. The LMDB for this implementation may be created in a similar way to that of the 4D Flow implementation discussed above. However, for this implementation, many more slices are utilized to define the SAX such that the slice spacing between neighboring slices is similar to that of the pixel spacing in the x and y directions (i.e., pixel spacing is nearly isotropic in three dimensions). It is likely that similar results could be achieved with non-isotropic pixel spacing, as long as the ratio between pixel spacings is conserved across all studies. The SAX MPRs and masks are then ordered by spatial slice and these slices are concatenated into one coherent volumetric image. Model training occurs according to the same pipeline as described above with reference to FIG. 8.
  • The inference pipeline closely resembles that of the 4D Flow implementation as well. However, in this implementation, neighboring MPRs need to be concatenated into one volumetric image before inference.
  • Exclusion of Papillary Muscles
  • An additional implementation of the DeepVentricle automatic segmentation model is one in which only the blood pool of the ventricle is segmented and the papillary muscles are excluded. In practice, because the papillary muscles are small and irregularly shaped, they are typically included in the segmented areas for convenience. The architecture, hyperparameters, and training database of this implementation, which excludes the papillary muscles from the blood pool, are all similar to the SSFP implementation discussed above. However, in this implementation the ground truth segmentation database contains left and right ventricle endocardium annotations that exclude the papillary muscles rather than include them.
  • Because segmentations that exclude the papillary muscles from endocardium contours are onerous to create, the quantity of training data may be significantly less than what can be acquired for segmentations that do not exclude the papillary muscles. To compensate for this, first a convolutional neural network that was trained on data for which papillary muscles were included in endocardium segmentations and excluded from epicardium segmentations may be used. This allows the network to learn to segment the general size and shape of each class. That network is then fine-tuned on a smaller set of data that excludes the papillary muscles from the segmentation. The result is a segmentation model that segments the same classes as before, but with papillary muscles excluded from endocardium segmentations. This results in a more accurate measure of ventricular blood pool volume than has been previously available when the papillary muscles were included within the endocardium contour.
  • Synthesis of Other Views for Automated Volumes
  • Traditional image classification or segmentation neural network architectures operate on a single, possibly multi-channel (e.g., RGB), possibly volumetric, image at a time. A standard 2D approach includes the network operating on a single slice from the 3D volume at a time. In this case, only the information from that single slice is used to classify or segment the data in that slice. The problem with this approach is that no context from surrounding time points or slices is incorporated into inference for the slice of interest. A standard 3D approach utilizes a 3D kernel and incorporates volumetric information in order to make volumetric predictions. However, this approach is slow and requires significant computational resources for both training and inference.
  • A few hybrid approaches, discussed below, may be used to optimize the tradeoff between memory/computation and the availability of spatiotemporal context to the model. Spatial context is particularly useful for ventricular segmentation near the base of the heart, where the mitral and tricuspid valves are difficult to distinguish on a single 2D slice. Temporal context, and enforcing consistency of the segmentations, may be useful for all parts of the segmentation.
  • In a first approach, the problem is interpreted as a 2D problem, making predictions on a single slice at a time, but with adjacent slices (either in space, time or both) interpreted as additional “channels” of the image. For example, at time t=5 and slice=10, a 9-channel image may be created where the data at the following time/slice combinations is packed into the following 9 channels: t=4, slice=9; t=4, slice=10; t=4, slice=11; t=5, slice=9; t=5, slice=10; t=5, slice=11; t=6, slice=9; t=6, slice=10; and t=6, slice=11. In this configuration, the network operates with 2D convolutions, but incorporates data from nearby spatial and temporal locations, and synthesizes the information via the standard neural network technique of creating feature maps via linear combinations of the input channels convolved with learned kernels.
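  • As a minimal sketch of the channel packing described above, the following NumPy function builds the 9-channel image for a given time point and slice. The function name, the array layout (time, slice, height, width), and the boundary handling (wrapping in time, clamping in space) are illustrative assumptions and are not specified by the disclosure.

```python
import numpy as np

def pack_neighbor_channels(cine, t, s, dt=1, ds=1):
    """Pack neighbouring time points and slices into the channel axis.

    cine: array of shape (n_times, n_slices, height, width) (assumed layout).
    Returns an array of shape (height, width, (2*dt + 1) * (2*ds + 1)).
    """
    n_times, n_slices = cine.shape[:2]
    channels = []
    for ti in range(t - dt, t + dt + 1):
        for si in range(s - ds, s + ds + 1):
            # Assumption: wrap around in time (the cardiac cycle is cyclic)
            # and clamp to the first/last slice in space.
            t_wrapped = ti % n_times
            s_clamped = min(max(si, 0), n_slices - 1)
            channels.append(cine[t_wrapped, s_clamped])
    return np.stack(channels, axis=-1)

# Example from the text: t=5, slice=10 gives channels ordered
# (t=4, s=9), (t=4, s=10), (t=4, s=11), (t=5, s=9), ..., (t=6, s=11).
cine = np.random.default_rng(0).random((20, 12, 64, 64))
image = pack_neighbor_channels(cine, t=5, s=10)
```

  • The resulting array can be fed to a network that uses ordinary 2D convolutions, with the slice of interest in the middle channel.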
  • A second approach is specific to some intricacies of cardiac MRI, though it may be used in any scenario in which orthogonal (or oblique) planes of data are acquired. In standard SSFP cardiac MRI, a short axis (SAX) stack is acquired along with one or more long axis (LAX) planes. The LAX planes are orthogonal to the SAX stack, and the LAX planes typically have significantly higher spatial resolution in the direction along the left ventricle's long axis. That is, an LAX image created by an MPR of a SAX stack has poorer resolution than a native LAX image, since the SAX inter-slice spacing is significantly coarser than the LAX in-plane pixel spacing. Because of the higher spatial resolution in the long axis direction, it is much easier to see the valves in the LAX images compared with the SAX images.
  • Thus, a two-stage ventricle segmentation model may be utilized. In a first stage, the ventricles are segmented in one or more LAX planes. Because of the high spatial resolution of these images, the segmentation can be very precise. A disadvantage is the LAX plane consists of only a single plane instead of a volume. If this LAX segmentation is projected to the SAX stack, the LAX segmentation appears as a line on each of the SAX images. This line may be created precisely if the line is aggregated across segmentations from multiple LAX views (e.g., 2CH, 3CH, 4CH; see the heading “Interface for defining valve planes for manual LV/RV volumes” below). This line may be used to bound the SAX segmentation, which is generated via a different model that operates on the SAX images. The SAX segmentation model uses both the raw SAX DICOM data as well as the predicted projected lines from the LAX model(s) as inputs in order to make its prediction. The predicted LAX lines serve to guide and bound the SAX predictions, and particularly aid the model near the base of the heart and valve plane, where the segmentations are often ambiguous when viewed on the SAX stack alone.
  • This technique may be used for any cardiac imaging, including 4D Flow in which the entire volume is acquired at once (and SAX and LAX images are not collected separately), and has the advantage of requiring only 2D kernels to be employed, albeit in two chained models.
  • Use of Time or Flow Information for Automated Volumes
  • SSFP cine studies contain 4 dimensions of data (3 space, 1 time), and 4D Flow studies contain 5 dimensions of data (3 space, 1 time, 4 channels of information). These 4 channels of information are the anatomy (i.e., signal intensity), x axis phase, y axis phase, and z axis phase. The simplest way to build a model uses only signal intensities at each 3D spatial point, and does not incorporate the temporal information or, for 4D Flow, the flow information. This simple model takes as input 3D data cubes of shape (x, y, z).
  • To capitalize on all the data available, in at least some implementations, the time and phase data are incorporated as well. This is particularly useful for at least a few reasons. First, because the movement of the heart usually follows a predictable pattern during the cardiac cycle, relative movement of pixels can particularly help identify anatomical regions. Second, usually around 20 time points are recorded for a cardiac cycle, which means that the heart moves only slightly between frames. Knowing that predictions should change only slightly between frames can serve as a way of regularizing the model output. Third, flow information can be used to locate structures, such as valves, which have very regular flow patterns that vary between low and high flow.
  • To incorporate time data, time may be added as an additional “channel” to the intensity data. In such implementations, the model then takes as input 3D data blobs of shape (X, Y, NTIMES) or 4D data blobs of shape (X, Y, Z, NTIMES), where NTIMES is the number of time points to include. This may be all time points, or a few time points surrounding the time point of interest. If all time points are included, it may be desirable or necessary to pad the data with a few “wrapped around” time points, since time represents a cardiac cycle and is intrinsically cyclical. The model may then either involve 2D/3D convolutions with time points as additional “channels” of the data, or 3D/4D convolutions. In the former case, the output may be 2D/3D at a single time of interest. In the latter case, the output may be 3D/4D and may include data at the same time points as were included in the input.
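  • The “wrapped around” padding mentioned above can be sketched with NumPy's `np.pad` in `"wrap"` mode. The function name and the choice of padding symmetrically on both ends of the time axis are assumptions made for illustration only.

```python
import numpy as np

def pad_time_cyclic(volume, n_wrap):
    """Pad the last (time) axis with wrapped-around time points.

    volume: array of shape (X, Y, NTIMES) or (X, Y, Z, NTIMES).
    n_wrap: number of time points to add on each end (assumed symmetric).
    """
    pad_width = [(0, 0)] * (volume.ndim - 1) + [(n_wrap, n_wrap)]
    return np.pad(volume, pad_width, mode="wrap")

# Toy volume in which every voxel stores its own time index.
toy = np.broadcast_to(np.arange(20.0), (4, 4, 20)).copy()
padded = pad_time_cyclic(toy, n_wrap=2)
```

  • With this padding, a convolution along the time axis sees the end of the cardiac cycle as adjacent to its beginning.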
  • Phase data, as acquired in 4D Flow, may also be incorporated in an analogous way, using either each direction of phase (x, y, z) as an additional channel of the input data, or using only the phase magnitude as a single additional channel. Without time, and with all three components of flow, the input has shape (X, Y, Z, 4) where the 4 indicates pixel intensity and the three components of phase. With time, this shape is (X, Y, Z, NTIMES, 4). In such implementations, the model therefore operates with 4 or 5-dimensional convolutions.
  • Automated 4D Flow Landmarks
  • Systems and methods discussed herein also allow for automated detection of the region of multiple cardiac landmarks in a 3D MRI. The system handles diverse sets of MRIs with varying position, orientation, and appearance of the imaged heart. Moreover, the system effectively deals with the problem of learning from a database with incomplete annotations. More specifically, the system addresses the problem of detecting every landmark in an image, when only some landmarks have been located for each input volumetric image on the training set.
  • Generally, the pipeline is an end-to-end machine learning algorithm which autonomously outputs the required landmark position from raw 3D images. Advantageously, the system requires no pre-processing or prior knowledge from the user. Further, the detected landmarks in the volumetric image may be used to project the image along the 2CH, 3CH, 4CH, and SAX views. Such allows these views to be created automatically, with no intervention by the user.
  • Initially, a first implementation of the solution is discussed. In this implementation, cardiac landmarks are located using a neural network with many layers. The architecture is three dimensional (3D) and uses 3D convolutions. This description focuses on the detection of three left ventricle landmarks (LV apex, mitral valve, and aortic valve), and three right ventricle landmarks (RV apex, tricuspid valve, and pulmonary valve). However, it is noted that this method may be applied for the detection of more diverse cardiac landmarks with comparable results, if these annotations are available as part of the ground truth.
  • Similar to the previously described DeepVentricle architecture, the landmark detection method of the present disclosure is based on convolutional neural networks. The information necessary for landmark detection is extracted from a database of clinical images, along with their annotations (i.e., landmark positions). FIGS. 25, 26, and 27 show images 2500, 2600, 2700, respectively, of three patients where the left ventricle apex, mitral valve, and right ventricle apex, respectively, have been positioned using a web application, such as the web application discussed above. Note how annotations for the aortic valve, pulmonary valve, and tricuspid valve are missing in this example.
  • First, the data handling pipeline is described. This section details the process which is followed to create the database of images with their annotations, along with the specific method used to encode landmark location. Second, the architecture of the machine learning approach is presented. How the network transforms the input 3D image into a prediction of landmark location is presented. Third, how the model is trained to the available data is described. Finally, the inference pipeline is detailed. It is shown how one can apply the neural network to an image never used before to predict the region of all six landmarks.
  • Data Handling Pipeline
  • For the presented machine learning approach, a database of 4D Flow data is used, which includes three dimensional (3D) magnetic resonance images (MRI) of the heart, stored as series of two dimensional (2D) DICOM images. Typically, around 20 3D volumetric images are acquired throughout a single cardiac cycle, each corresponding to one snapshot of the heartbeat. The initial database thus corresponds to the 3D images of different patients at different time steps. Each 3D MRI presents a number of landmark annotations, from zero to six landmarks, placed by the user of the web application. The landmark annotations, if present, are stored as vectors of coordinates (x, y, z, t) indicating the position (x, y, z) of the landmark in the 3D MRI corresponding to the time point t.
  • FIG. 28 shows a process 2800 which may be followed to handle 2D DICOM slices of the 4D Flow images 2802, and the annotations 2804 stored in a MongoDB database.
  • At 2806, the landmark coordinates are extracted from the MongoDB database. Then, at 2808, the 3D MRIs are extracted from the series of 2D DICOM images by stacking 2D DICOM images from a single time point together according to their location along the z-axis (i.e., the 2D images are stacked along the depth dimension to create 3D volumes). This results in a volumetric 3D image representing a full view of the heart. The LMDB is built with 3D images that have been annotated with at least one landmark position. This means that images with no ground truth landmarks are not included in the LMDB.
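  • The stacking at 2808 can be sketched as follows. The sketch assumes that each 2D slice has already been read from its DICOM file together with a scalar location along the z-axis; the function name and the (location, image) pair representation are hypothetical.

```python
import numpy as np

def stack_slices(slices):
    """Stack 2D slices from one time point into a 3D volume.

    slices: list of (z_location, 2D array) pairs, in any order.
    Returns an array of shape (height, width, n_slices), ordered by z.
    """
    ordered = sorted(slices, key=lambda pair: pair[0])
    return np.stack([image for _, image in ordered], axis=-1)

# Three toy slices supplied out of order; each is filled with its z value.
toy_slices = [(z, np.full((8, 8), z)) for z in (2.0, 0.0, 1.0)]
stacked = stack_slices(toy_slices)
```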
  • At 2810, the label maps are defined which encode the annotation information in a way understandable by the neural network which will be used in later stages. The position of a landmark is encoded by indicating, at each position in the 3D volume, how likely the position is to be at the landmark position. To do so, a 3D Gaussian probability distribution is created, centered on the position of the ground truth landmark with standard deviation corresponding to observed inter-rater variability of that type of landmark across all the training data.
  • To understand inter-rater variability, consider one specific landmark such as the LV apex. For every study in which the LV apex was annotated by more than one user or “rater,” the standard deviation of the LV apex coordinates across all users is computed. By repeating this process for each landmark, the standard deviation for the Gaussian used to encode each landmark is defined. This process allows for the setting of this parameter in a principled manner. Among the different advantages of using this approach, it is noted that the standard deviation is different for each landmark, and depends on the complexity of locating the landmark. Specifically, more difficult landmarks have larger Gaussian standard deviation in the target probability maps. Further, the standard deviation is different along the x, y, and z axes, reflecting the fact that the uncertainty might be larger along one direction rather than another because of the anatomy of the heart and/or the resolution of the images.
  • Note that alternative strategies may also be used to define the standard deviation (arbitrary value, parameter search) and may lead to comparable results. FIG. 29 shows this transition from a landmark position, identified with a cross 2902 in a view 2904, to a Gaussian 2906 in a view 2908 evaluated on the image for the 2D case.
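  • The label map encoding at 2810 can be sketched as an axis-aligned 3D Gaussian with a separate standard deviation per axis. The function name is hypothetical, and scaling the map so that its peak equals 1 (rather than normalizing it to sum to 1) is an assumption.

```python
import numpy as np

def gaussian_label_map(shape, center, sigma):
    """Encode a landmark as a 3D Gaussian evaluated on the voxel grid.

    shape:  (nx, ny, nz) size of the volume.
    center: (x, y, z) ground truth landmark position, in voxels.
    sigma:  (sx, sy, sz) per-axis standard deviation, e.g., derived from
            inter-rater variability for that landmark type.
    """
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    exponent = sum(((g - c) / s) ** 2 for g, c, s in zip(grids, center, sigma))
    return np.exp(-0.5 * exponent)  # peak value of 1 at the landmark (assumed)

# Larger sigma along y than along x or z: the map decays more slowly in y.
label = gaussian_label_map((32, 32, 16), center=(10, 20, 8), sigma=(2.0, 4.0, 1.0))
```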
  • At 2812, once the 3D volumes have been defined for both the MRI and the label map, the images are preprocessed. Generally, the goal is to normalize the image size and appearance for future training.
  • FIG. 30 shows a process 3000 for a preprocessing pipeline. At 3006 and 3008, the 3D MRIs 3002 and label maps 3004, respectively, are resized to a predefined size $n_x \times n_y \times n_z$ such that all of the MRIs can be fed to the same neural network. At 3010, the intensities of the MRI pixels are clipped between the 1st and 99th percentiles. This means that the pixel intensities saturate at the values corresponding to the 1st and 99th percentiles. This removes outlier pixel intensities that may be caused by artifacts. At 3012, the intensities are then scaled to lie between 0 and 1. At 3014, the intensity histogram is then normalized using contrast limited adaptive histogram equalization to maximize contrast in the image and minimize intra-image intensity differences (as may be caused by, for example, magnetic field inhomogeneities). Finally, at 3016 the image is centered to have zero mean. Other strategies may be used for the normalization of the image intensity, such as normalizing the variance of the input to one, and may yield similar results. This pipeline results in preprocessed images 3018 and labels 3020 which can be fed to the network.
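  • The intensity steps 3010, 3012 and 3016 can be sketched in NumPy as shown below. This is a partial illustration only: the resizing at 3006/3008 and the contrast limited adaptive histogram equalization at 3014 are deliberately omitted, and the function name is hypothetical.

```python
import numpy as np

def normalize_intensity(volume, low_pct=1.0, high_pct=99.0):
    """Clip to percentiles, scale to [0, 1], and center to zero mean."""
    low, high = np.percentile(volume, [low_pct, high_pct])
    clipped = np.clip(volume, low, high)                 # step 3010
    scaled = (clipped - low) / max(high - low, 1e-8)     # step 3012
    # Step 3014 (adaptive histogram equalization) is omitted in this sketch.
    return scaled - scaled.mean()                        # step 3016

rng = np.random.default_rng(0)
raw = rng.normal(100.0, 20.0, size=(16, 16, 8))
raw[0, 0, 0] = 1e6  # simulated artifact; clipping removes its influence
normalized = normalize_intensity(raw)
```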
  • FIGS. 31 and 32 show example images 3100 and 3200 for two patients of the pre-processed 3D MRI and encoded labels. In particular, FIG. 31 shows a sagittal view 3102, an axial view 3104, and a coronal view 3106 of a preprocessed input image and encoded mitral valve landmark for one patient, and FIG. 32 shows a sagittal view 3202, an axial view 3204, and a coronal view 3206 of a preprocessed input image and encoded mitral valve landmark for another patient. As can be seen in FIGS. 31 and 32, the uncertainty for the localization of the tricuspid valve is larger than the uncertainty for the mitral valve. Moreover, the uncertainty is different from one axis to the other.
  • Returning to FIG. 28, at 2814 an upload ID is defined to be the key that identifies the pair (MRI, label map), which is stored in a training LMDB database at 2816. Finally, at 2818 the pair (MRI, label map) is written to the LMDB.
  • Network Architecture
  • As noted above, a deep neural network is used for the detection of the landmarks. The network takes as input a preprocessed 3D MRI and outputs six 3D label maps, one per landmark. The architecture used in this implementation is similar or identical to the architecture described above. The network is composed of two symmetric paths: a contracting path and an expanding path (see FIG. 6).
  • As not all landmarks may be available in the training data, the systems and methods of the present disclosure advantageously handle missing information in the labels while still being able to predict all landmarks simultaneously.
  • The network used for landmark detection differs from the DeepVentricle implementation discussed above in three main ways. First, the architecture is three dimensional: the network processes a 3D MRI in a single pass, producing a 3D label map for every landmark. Second, the network predicts 6 classes, one for each landmark. Third, the parameters selected after the hyperparameter search can differ from the DeepVentricle parameters, and are specifically selected to solve the problem at hand. Additionally, the standard deviation used to define the label maps, discussed above, may be considered as a hyperparameter. The output of the network is a 3D map which encodes where the landmark is positioned. High values of the map may correspond to likely landmark position, and low values may correspond to unlikely landmark position.
  • Training
  • The following discussion describes how the deep neural network can be trained using the LMDB database of 3D MRI and label map pairs. The overall objective is to tune the parameters of the network such that the network is able to predict the position of the heart landmarks on previously unseen images. A flowchart of the training process is shown in FIG. 8 and described above.
  • The training database may be split into a training set used to train the model, a validation set used to quantify the quality of the model, and a test set. The split enforces all the images from a single patient to lie in the same set. This guarantees that the model is not validated with patients used for training. At no point is data from the test set used when training the model. Data from the test set may be used to show examples of landmark localization, but this information is not used for training or for ranking models with respect to one another.
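  • A patient-level split of this kind can be sketched as follows. The split fractions, the random seed, and the function name are hypothetical; the only property taken from the description is that all images from a single patient land in the same set.

```python
import random

def split_by_patient(patient_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return image indices for train/validation/test, grouped by patient.

    patient_ids: one patient identifier per image.
    fractions:   assumed train/validation/test proportions of patients.
    """
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    groups = {
        "train": set(patients[:n_train]),
        "validation": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    return {
        name: [i for i, p in enumerate(patient_ids) if p in members]
        for name, members in groups.items()
    }

# 40 images from 10 patients (4 time points each).
image_patients = [f"patient_{i % 10}" for i in range(40)]
split = split_by_patient(image_patients)
```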
  • During the training, the gradient of the loss is used in order to update the parameters of the neural network. In at least some implementations, weighting the loss in regions where the landmark is present may be utilized to provide faster convergence of the network. More precisely, when computing the loss, a larger weight may be applied to the region of the image near the landmarks, compared to the rest of the image. As a result, the network converges more rapidly. However, a non-weighted loss may also be used with good results, albeit with a longer training time.
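  • One possible form of such a weighted loss is sketched below. The description does not specify the loss function, the weight, or how the region near a landmark is defined, so the squared error, the weight of 10, and the threshold on the target label map are all assumptions made for illustration.

```python
import numpy as np

def weighted_squared_error(prediction, target, near_weight=10.0, threshold=0.1):
    """Weighted mean squared error between predicted and target label maps.

    Voxels where the target map exceeds `threshold` are treated as being
    near the landmark and receive `near_weight`; all others receive 1.
    """
    weights = np.where(target > threshold, near_weight, 1.0)
    return float(np.sum(weights * (prediction - target) ** 2) / np.sum(weights))

# A prediction that misses the landmark entirely is penalized more heavily
# by the weighted loss than by the unweighted one (near_weight=1).
target_map = np.zeros((8, 8, 8))
target_map[4, 4, 4] = 1.0
empty_prediction = np.zeros_like(target_map)
loss_weighted = weighted_squared_error(empty_prediction, target_map)
loss_unweighted = weighted_squared_error(empty_prediction, target_map, near_weight=1.0)
```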
  • Inference
  • Given a new image, the landmark position is inferred by pre-processing the image in a similar fashion as to what is described above with reference to FIG. 28. More precisely, the image may be resized, clipped, scaled, the image's histogram equalized, and the image may be centered. The network outputs one map per landmark, for a total of six 3D maps in the case of six landmarks. These maps describe the probability that each landmark is found in a particular position. Alternatively, the maps can be considered as encoding an inverse distance function from the true location of the landmark (i.e., high value results in small distance, low value results in large distance).
  • As such, as shown in a diagram 3300 of FIG. 33, the landmark position 3302 can be determined by looking for the maximum value of the output of the neural network for each landmark. This position is then projected into the space of the original non-preprocessed 3D input MRI for the final landmarks localization (e.g., undoing any spatial distortions that were applied to the volume during inference). Note that several other strategies may be used to translate the label maps into landmark position coordinates. For instance, one could take the expected location using the label map as a 3D probability density. Note that taking the maximum corresponds to considering the mode of the density. Alternatively, the probability estimate may be first smoothed before selecting the maximum or expected value as the location.
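  • The two decoding strategies mentioned above (the maximum, i.e., the mode, and the expected location) can be sketched as follows. The function name is hypothetical, and mapping back to the original image space by a simple per-axis scale factor assumes that the only spatial change during preprocessing was a resize.

```python
import numpy as np

def landmark_from_map(label_map, original_shape=None, use_expectation=False):
    """Convert a predicted 3D label map into landmark coordinates."""
    if use_expectation:
        # Treat the map as an (unnormalized) 3D probability density.
        prob = np.clip(label_map, 0.0, None)
        prob = prob / prob.sum()
        grids = np.meshgrid(*[np.arange(n) for n in label_map.shape], indexing="ij")
        position = np.array([(g * prob).sum() for g in grids])
    else:
        # Mode of the density: location of the maximum value.
        index = np.unravel_index(np.argmax(label_map), label_map.shape)
        position = np.array(index, dtype=float)
    if original_shape is not None:
        # Assumption: undo a pure resize with a per-axis scale factor.
        position = position * np.array(original_shape) / np.array(label_map.shape)
    return position

predicted_map = np.zeros((16, 16, 8))
predicted_map[3, 5, 2] = 1.0
pos_mode = landmark_from_map(predicted_map)
pos_mean = landmark_from_map(predicted_map, use_expectation=True)
pos_original = landmark_from_map(predicted_map, original_shape=(32, 32, 16))
```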
  • Data Gathering
  • In at least some implementations, the dataset is made of clinical studies uploaded on the web application by previous users. The annotations may be placed by the user on the different images. As explained previously, this dataset is split into a train, validation, and test set.
  • The neural network may be trained using the pipeline previously described above and shown in FIG. 8. Batches of data extracted from the train set are sequentially fed to the neural network. The gradient of the loss between the network prediction and the real landmark location is computed and backpropagated to update the intrinsic parameters of the network. Other model hyperparameters (e.g., network size, shape) are chosen using hyper-parameter search, as discussed above.
  • Model Accessibility
  • The trained model may be stored on servers as part of a cloud service. The model can be loaded on multiple servers at inference time in order to carry out the detection of landmarks at several time points in parallel. This process is similar to the approach used for DeepVentricle, which is shown in FIG. 9 and discussed above.
  • User Interactions
  • When a cardiac MRI is uploaded to the web application, the user can select a “Views” button under a “Cardiac” section. This opens a new panel on the right of the image with a “Locate Landmarks” button. Selecting this button automatically locates the six landmarks on every 3D image at every time point. A list of the located landmarks is visible on the right panel. Selecting the landmark name brings the focus of the image to the predicted landmark location, and the user is allowed to make any modifications deemed necessary. Once the user is satisfied, the user can select a “standard views” button which creates the standard 2, 3, 4 chamber and SAX views of the heart.
  • In at least some implementations, the 3D images acquired are 4D Flow sequences. This means that the phase of the signal is also acquired, and may be used to quantify the velocity of the blood flow in the heart and arteries, as shown in the image 3400 of FIG. 34 which shows four different views. This information can be useful to locate the different landmarks of the heart. In this case, the previously described model may be augmented to include flow information.
  • Image Pre-Processing
  • In 4D Flow, flow velocity information is available at every time point of the acquisition for every patient. In order to make full use of this information, the standard deviation along the time axis may be computed at every voxel of the 3D image. Standard deviation magnitude is associated with the amount of blood flow variation of that pixel over the course of one heartbeat. This standard deviation image is then normalized according to the previously described normalization pipeline: resizing, clipping, scaling, histogram equalization, centering. Note that several other approaches can be considered to encode the temporal information of the flow data. For instance, the Fourier transform of the 4D signal may be computed along the last dimension, and various frequency bins may be used to encode the signal. More generally, the whole time series may be input to the network, at the expense of requiring additional computation and memory power.
  • Network Extension
  • The input to the neural network may also be extended with an additional channel. More precisely, a four dimensional (4D) tensor may be defined where the last dimension encodes as separate channels the anatomical pixel intensity and the flow magnitude or components of velocity. The network described above may be extended to accept such tensor as input. This requires the extension of the first layer to accept a 4D tensor. The subsequent steps of network training, inference, and user interactions remain similar to what has been previously described.
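  • The temporal standard deviation image and the extended input tensor can be sketched together as follows. The sketch assumes that the flow data is supplied as a per-voxel speed (velocity magnitude) of shape (X, Y, Z, NTIMES) and that the channel axis is last; both function names are hypothetical, and the normalization of each channel is omitted.

```python
import numpy as np

def temporal_std_image(speed):
    """Standard deviation along the time axis at every voxel.

    speed: array of shape (X, Y, Z, NTIMES) (assumed flow magnitude).
    Returns an array of shape (X, Y, Z).
    """
    return speed.std(axis=-1)

def stack_channels(anatomy, flow_feature):
    """Combine anatomy and flow feature into a 4D tensor (X, Y, Z, 2)."""
    return np.stack([anatomy, flow_feature], axis=-1)

# Toy data: only one voxel has flow that varies over the cardiac cycle.
speed = np.zeros((4, 4, 4, 20))
speed[1, 1, 1, :] = np.arange(20.0)
anatomy = np.ones((4, 4, 4))
flow_std = temporal_std_image(speed)
network_input = stack_channels(anatomy, flow_std)
```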
  • In at least some implementations, the automatic location of cardiac landmarks may be achieved by directly predicting the coordinates (x, y, z) of the different landmarks. For that, a different network architecture may be used. This alternative network may be composed of a contracting path, followed with several fully connected layers, with a length-three vector of (x, y, z) coordinates as the output for each landmark. This is a regression, rather than a segmentation network. Note that, in the regression network, unlike in the segmentation network, there is no expanding path in the network. Other architectures may also be used with the same output format. In at least some implementations, time may also be included in the output as a fourth dimension if 4D data (x, y, z, time) is given as input.
  • Assuming time is not incorporated, the output of the network is eighteen scalars corresponding to three coordinates for each of the six landmarks in the input image. Such an architecture may be trained in a similar fashion to the previously described landmark detector. The only update needed is the re-formulation of the loss to account for the change in the network output format (a spatial point in this implementation, as opposed to the probability map used in the first implementation). One reasonable loss function may be the L2 (squared) distance between the output of the network and the real landmark coordinate, but other loss functions may be used as well, as long as the loss functions are related to the quantity of interest, namely the distance error.
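  • The L2 (squared) distance loss for the regression output can be sketched as below. Averaging the squared distances over the six landmarks, and the function name, are assumptions; the handling of landmarks with no ground truth is not addressed in this sketch.

```python
import numpy as np

def l2_landmark_loss(predicted, target):
    """Mean squared Euclidean distance between predicted and true landmarks.

    predicted, target: eighteen scalars each (six landmarks x three
    coordinates), in any shape that reshapes to (6, 3).
    """
    predicted = np.asarray(predicted, dtype=float).reshape(-1, 3)
    target = np.asarray(target, dtype=float).reshape(-1, 3)
    squared_distances = np.sum((predicted - target) ** 2, axis=1)
    return float(np.mean(squared_distances))

# One landmark is off by (3, 4, 0) voxels: squared distance 25, averaged over 6.
true_coords = np.zeros(18)
predicted_coords = np.zeros(18)
predicted_coords[:3] = [3.0, 4.0, 0.0]
regression_loss = l2_landmark_loss(predicted_coords, true_coords)
```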
  • The first landmark detection implementation discussed above may also be extended using a second neural network which acts as a discriminator network. The discriminator network may be trained in order to distinguish good and realistic landmark locations, from bad and unrealistic ones. In this formulation, the initial network of the implementation may be used to generate several landmark proposals for each type of landmark, such as by using all local maxima of the predicted landmark probability distribution. The discriminator network may then evaluate each proposal, for example, by using a classification architecture on a high-resolution patch, centered around the landmark proposal. The proposal with the highest probability of being the true landmark may then be taken as the output. This implementation may possibly help choose the correct landmark location in ambiguous situations, for example, in the presence of noise or artifacts.
  • Another approach for the detection of cardiac landmarks is the use of reinforcement learning. In this different framework, an agent walking along the 3D image is considered. The agent is at first placed in the center of the image. The agent then follows a policy until the agent reaches the landmark position. The policy represents the decision making process of the agent at each step: take a step left, right, up, down, above, or under. This policy may be learned using a deep neural network approximating the Bellman equation for the state-action function Q using the Q-learning algorithm. One Q function can then be learned for each of the landmarks to be detected.
  • In at least some implementations, a neural network may be used to directly predict parameters defining the locations and orientations of the planes for standard views. For instance, a network may be trained to calculate the 3D rotation angle, translation and rescaling needed to move a starting pixel grid to a long axis view. Separate models may be trained to predict different transformations or a single model may be used to output several views.
  • Interface for Defining Valve Planes for Manual LV/RV Volumes
  • In order to have a more accurate segmentation of the left and right ventricles, it may be advantageous to identify the position and orientation of the valves of the heart. In at least some implementations, within the aforementioned ventricle segmentation interface, a user is able to mark points that lie on the valve plane using the available long axis views. The valve plane is determined from these input points by performing a regression to find the plane that best fits. The normal for the plane is set to point away from the apex of the ventricle. Once the plane has been defined, any portion of the volume that lies on the positive side is subtracted from the total volume for the ventricle. This ensures that nothing outside the valve is included in determining the volume of the ventricle.
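  • The plane fitting and volume clipping described above can be sketched as follows. The description only states that a regression finds the best-fitting plane; the sketch uses an orthogonal least-squares fit via singular value decomposition as one possible choice. The function names, the voxel-index coordinate system, and the boolean mask representation of the ventricle are assumptions.

```python
import numpy as np

def fit_valve_plane(points, apex):
    """Fit a plane to user-marked points; orient its normal away from the apex.

    points: (n, 3) array of at least three non-collinear points on the valve.
    apex:   (3,) position of the ventricle apex.
    Returns (centroid, unit normal).
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e., the plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    if np.dot(np.asarray(apex, dtype=float) - centroid, normal) > 0:
        normal = -normal  # make the normal point away from the apex
    return centroid, normal

def clip_mask_at_plane(mask, centroid, normal):
    """Remove voxels of a boolean mask that lie on the positive side."""
    grids = np.meshgrid(*[np.arange(n) for n in mask.shape], indexing="ij")
    coords = np.stack(grids, axis=-1)
    signed_distance = (coords - centroid) @ normal
    return mask & (signed_distance <= 0)

# Valve points on the plane z = 5.5, apex below the plane at z = 0.
valve_points = [(0, 0, 5.5), (10, 0, 5.5), (0, 10, 5.5), (10, 10, 5.5)]
centroid, normal = fit_valve_plane(valve_points, apex=(5, 5, 0))
ventricle = np.ones((12, 12, 10), dtype=bool)
clipped = clip_mask_at_plane(ventricle, centroid, normal)
```

  • Multiplying the number of remaining voxels by the voxel volume then gives a ventricular volume that excludes everything beyond the valve plane.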
  • Example Processor-Based Device
  • FIG. 35 shows an environment 3500 that includes a processor-based device 3504 suitable for implementing various functionality described herein. Although not required, some portion of the implementations will be described in the general context of processor-executable instructions or logic, such as program application modules, objects, or macros being executed by one or more processors. Those skilled in the relevant art will appreciate that the described implementations, as well as other implementations, can be practiced with various processor-based system configurations, including handheld devices, such as smartphones and tablet computers, wearable devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, personal computers (“PCs”), network PCs, minicomputers, mainframe computers, and the like.
  • The processor-based device 3504 may include one or more processors 3506, a system memory 3508 and a system bus 3510 that couples various system components including the system memory 3508 to the processor(s) 3506. The processor-based device 3504 will at times be referred to in the singular herein, but this is not intended to limit the implementations to a single system, since in certain implementations, there will be more than one system or other networked computing device involved. Non-limiting examples of commercially available systems include, but are not limited to, ARM processors from a variety of manufacturers, Core microprocessors from Intel Corporation, U.S.A., PowerPC microprocessors from IBM, Sparc microprocessors from Sun Microsystems, Inc., PA-RISC series microprocessors from Hewlett-Packard Company, and 68xxx series microprocessors from Motorola Corporation.
  • The processor(s) 3506 may be any logic processing unit, such as one or more central processing units (CPUs), microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), etc. Unless described otherwise, the construction and operation of the various blocks shown in FIG. 35 are of conventional design. As a result, such blocks need not be described in further detail herein, as they will be understood by those skilled in the relevant art.
  • The system bus 3510 can employ any known bus structures or architectures, including a memory bus with memory controller, a peripheral bus, and a local bus. The system memory 3508 includes read-only memory (“ROM”) 3512 and random access memory (“RAM”) 3515. A basic input/output system (“BIOS”) 3516, which can form part of the ROM 3512, contains basic routines that help transfer information between elements within processor-based device 3504, such as during start-up. Some implementations may employ separate buses for data, instructions and power.
  • The processor-based device 3504 may also include one or more solid state memories, for instance flash memory or a solid state drive (SSD), which provides nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the processor-based device 3504. Although not depicted, the processor-based device 3504 can employ other nontransitory computer- or processor-readable media, for example a hard disk drive, an optical disk drive, or memory card media drive.
  • Program modules can be stored in the system memory 3508, such as an operating system 3530, one or more application programs 3532, other programs or modules 3534, drivers 3536 and program data 3538.
  • The application programs 3532 may, for example, include panning/scrolling 3532 a. Such panning/scrolling logic may include, but is not limited to logic that determines when and/or where a pointer (e.g., finger, stylus, cursor) enters a user interface element that includes a region having a central portion and at least one margin. Such panning/scrolling logic may include, but is not limited to logic that determines a direction and a rate at which at least one element of the user interface element should appear to move, and causes updating of a display to cause the at least one element to appear to move in the determined direction at the determined rate. The panning/scrolling logic 3532 a may, for example, be stored as one or more executable instructions. The panning/scrolling logic 3532 a may include processor and/or machine executable logic or instructions to generate user interface objects using data that characterizes movement of a pointer, for example data from a touch-sensitive display or from a computer mouse or trackball, or other user interface device.
  • The system memory 3508 may also include communications programs 3540, for example a server and/or a Web client or browser for permitting the processor-based device 3504 to access and exchange data with other systems such as user computing systems, Web sites on the Internet, corporate intranets, or other networks as described below. The communications programs 3540 in the depicted implementation are markup language based, such as Hypertext Markup Language (HTML), Extensible Markup Language (XML) or Wireless Markup Language (WML), and operate with markup languages that use syntactically delimited characters added to the data of a document to represent the structure of the document. A number of servers and/or Web clients or browsers are commercially available such as those from Mozilla Corporation of California and Microsoft of Washington.
  • While shown in FIG. 35 as being stored in the system memory 3508, the operating system 3530, application programs 3532, other programs/modules 3534, drivers 3536, program data 3538 and server and/or communications programs 3540 (e.g., browser) can be stored on any other of a large variety of nontransitory processor-readable media (e.g., hard disk drive, optical disk drive, SSD and/or flash memory).
  • A user can enter commands and information via a pointer, for example through input devices such as a touch screen 3548 via a finger 3544 a, stylus 3544 b, or via a computer mouse or trackball 3544 c which controls a cursor. Other input devices can include a microphone, joystick, game pad, tablet, scanner, biometric scanning device, etc. These and other input devices (i.e., “I/O devices”) are connected to the processor(s) 3506 through an interface 3546 such as touch-screen controller and/or a universal serial bus (“USB”) interface that couples user input to the system bus 3510, although other interfaces such as a parallel port, a game port or a wireless interface or a serial port may be used. The touch screen 3548 can be coupled to the system bus 3510 via a video interface 3550, such as a video adapter to receive image data or image information for display via the touch screen 3548. Although not shown, the processor-based device 3504 can include other output devices, such as speakers, vibrator, haptic actuator, etc.
  • The processor-based device 3504 may operate in a networked environment using one or more of the logical connections to communicate with one or more remote computers, servers and/or devices via one or more communications channels, for example, one or more networks 3514 a, 3514 b. These logical connections may facilitate any known method of permitting computers to communicate, such as through one or more LANs and/or WANs, such as the Internet, and/or cellular communications networks. Such networking environments are well known in wired and wireless enterprise-wide computer networks, intranets, extranets, the Internet, and other types of communication networks including telecommunications networks, cellular networks, paging networks, and other mobile networks.
  • When used in a networking environment, the processor-based device 3504 may include one or more wired or wireless communications interfaces 3552 a, 3552 b (e.g., cellular radios, WI-FI radios, Bluetooth radios) for establishing communications over the network, for instance the Internet 3514 a or cellular network 3514 b.
  • In a networked environment, program modules, application programs, or data, or portions thereof, can be stored in a server computing system (not shown). Those skilled in the relevant art will recognize that the network connections shown in FIG. 35 are only some examples of ways of establishing communications between computers, and other connections may be used, including wirelessly.
  • For convenience, the processor(s) 3506, system memory 3508, network and communications interfaces 3552 a, 3552 b are illustrated as communicably coupled to each other via the system bus 3510, thereby providing connectivity between the above-described components. In alternative implementations of the processor-based device 3504, the above-described components may be communicably coupled in a different manner than illustrated in FIG. 35. For example, one or more of the above-described components may be directly coupled to other components, or may be coupled to each other, via intermediary components (not shown). In some implementations, system bus 3510 is omitted and the components are coupled directly to each other using suitable connections.
  • FastVentricle
  • Cardiac Magnetic Resonance (CMR) imaging is commonly used to assess cardiac structure and function. One disadvantage of CMR is that postprocessing of exams is tedious. Without automation, precise assessment of cardiac function via CMR typically requires an annotator to spend tens of minutes per case manually contouring ventricular structures. Automatic contouring can lower the required time per patient by generating contour suggestions that can be lightly modified by the annotator. Fully convolutional networks (FCNs), a variant of convolutional neural networks, have been used to rapidly advance the state of the art in automated segmentation, which makes FCNs a natural choice for ventricular segmentation. However, FCNs are limited by their computational cost, which increases the monetary cost and degrades the user experience of production systems. To combat this shortcoming, we have developed the FastVentricle architecture, an FCN architecture for ventricular segmentation based on the recently developed ENet architecture. FastVentricle is 4× faster and runs with 6× less memory than the previous state-of-the-art ventricular segmentation architecture while still maintaining excellent clinical accuracy.
  • FastVentricle—Introduction
  • Patients with known or suspected cardiovascular disease often receive a cardiac MRI to evaluate cardiac function. These scans are annotated with ventricular contours in order to calculate cardiac volumes at end systole (ES) and end diastole (ED). From the cardiac volumes, relevant diagnostic quantities such as ejection fraction and myocardial mass can be calculated. Manual contouring can take upwards of 30 minutes per case, so radiologists often use automation tools to help speed up the process.
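  • By way of non-limiting illustration, the diagnostic quantities mentioned above may be derived from the ventricular volumes as sketched below. The function names are hypothetical, and the myocardial density of 1.05 g/mL is a commonly used value that is assumed here rather than taken from the present disclosure.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Fraction of the end-diastolic volume ejected by end systole."""
    return (edv_ml - esv_ml) / edv_ml

def myocardial_mass_g(epi_volume_ml, endo_volume_ml, density_g_per_ml=1.05):
    """Myocardial mass from the volume enclosed between the epicardium and
    endocardium contours. The density value is an assumption."""
    return (epi_volume_ml - endo_volume_ml) * density_g_per_ml
```

  • For example, an end-diastolic volume of 120 mL and an end-systolic volume of 50 mL correspond to an ejection fraction of approximately 58%.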
  • FIG. 36 shows a schematic representation of a fully convolutional encoder-decoder architecture with skip connections that utilizes a smaller expanding path than contracting path.
  • Active contour models are a heuristic-based approach to segmentation that has been utilized previously for segmentation of the ventricles. See Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision (1988) 321-331; Zhu, W., et al.: A geodesic-active-contour-based variational model for short-axis cardiac MRI segmentation. Int. Journal of Computer Math. 90(1) (2013). However, active contour-based methods not only perform poorly on images with low contrast, they are also sensitive to initialization and hyperparameter values. Deep learning methods for segmentation have recently defined the state of the art with the use of fully convolutional networks (FCNs). See Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE CVPR. (2015) 3431-3440. The general idea behind FCNs is to use a downsampling path to learn relevant features at a variety of spatial scales followed by an upsampling path to combine the features for pixelwise prediction (see FIG. 36). DeconvNet pioneered the use of a symmetric contracting-expanding architecture for more detailed segmentations, at the cost of longer training and inference time, and the need for larger computational resources. See Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE ICCV. (2015) 1520-1528. U-Net, originally developed for use in the biomedical community where there are often fewer training images and even finer resolution is required, added the use of skip connections between the contracting and expanding paths to preserve details. See Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2015) 234-241.
  • One disadvantage of fully symmetric architectures in which there is a one-to-one correspondence between downsampling and upsampling layers is that they can be slow, especially for large input images. ENet, an alternative FCN design, is an asymmetrical architecture that is optimized for speed. See Paszke, A., Chaurasia, A., et al.: ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147 (2016). ENet utilizes early downsampling to reduce the input size using only a few feature maps. This improves speed, given that much of the network's computational load takes place when the image is at full resolution, and has minimal effect on accuracy since much of the visual information at this stage is redundant. Furthermore, the ENet authors show that the primary purpose of the expanding path in FCNs is to upsample and fine-tune the details learned by the contracting path rather than to learn complicated upsampling features; hence, ENet utilizes an expanding path that is smaller than its contracting path. ENet also makes use of bottleneck modules, which are convolutions with a small receptive field that are applied in order to project the feature maps into a lower dimensional space in which larger kernels can be applied. See He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE CVPR. (2016) 770-778. Bottlenecks also contain residual connections in the spirit of the He, K. paper referenced immediately above. ENet also uses a path parallel to the bottleneck path that solely includes zero or more pooling layers to directly pass information from a higher resolution layer to the lower resolution layers. Finally, throughout the network, ENet leverages a diversity of low cost convolution operations. In addition to the more-expensive n×n convolutions, ENet also uses cheaper asymmetric (1×n and n×1) convolutions and dilated convolutions. See Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015).
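  • By way of non-limiting illustration, the saving offered by asymmetric convolutions can be seen by counting weights. The sketch below compares one n×n convolution with a 1×n convolution followed by an n×1 convolution, assuming equal input and output channel counts and ignoring biases; the function names are hypothetical.

```python
def conv_weights(kernel_h, kernel_w, channels_in, channels_out):
    """Number of weights in a 2D convolution layer, ignoring biases."""
    return kernel_h * kernel_w * channels_in * channels_out

def full_vs_asymmetric(n, channels):
    """Weights of one n x n convolution versus a 1 x n convolution followed
    by an n x 1 convolution, with the channel count held constant."""
    full = conv_weights(n, n, channels, channels)
    asymmetric = (conv_weights(1, n, channels, channels)
                  + conv_weights(n, 1, channels, channels))
    return full, asymmetric
```

  • Under these assumptions, the asymmetric pair uses 2n rather than n² weights per channel pair, so the relative saving grows with the kernel size n.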
  • Deep learning has been successfully applied to ventricle segmentation. See Avendi, M., et al.: A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. MedIA 30 (2016); Tran, P. V.: A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv preprint arXiv:1604.00494 (2016). Here, we present FastVentricle, an ENet variation with UNet style skip connections for segmentation of the LV endocardium, LV epicardium, and RV endocardium. More specifically, we add the possibility to use skip connections from the contracting path to the expanding path where the image size is similar. In detail, we add skip connections between the output of the initial block and the input of Section 5, and between the output of Section 1 and the input of Section 4 (refer to Paszke, referenced above, for nomenclature of the Sections). In the present disclosure, we compare FastVentricle to a previous UNet variant, DeepVentricle. See Lau, H. K., et al.: DeepVentricle: Automated cardiac MRI ventricle segmentation using deep learning. Conference on Machine Intelligence in Medical Imaging (2016). We show that inference with FastVentricle requires significantly less time and memory than inference with DeepVentricle and that FastVentricle achieves segmentation accuracy that is indistinguishable from that of DeepVentricle.
  • FastVentricle—Methods
  • Training Data.
  • We use a database of 1143 short-axis cine Steady State Free Precession (SSFP) scans annotated as part of standard clinical care at a partner institution to train and validate our model. We split the data chronologically with 80% for training, 10% for validation, and 10% as a hold-out set. All experiments discussed in this section of the present disclosure use the validation set. Annotated contour types include LV endocardium, LV epicardium and RV endocardium. Scans are annotated at ED and ES. Contours were annotated with different frequencies; 96% (1097) of scans have LV endocardium contours, 22% (247) have LV epicardium contours and 85% (966) have RV endocardium contours.
  • Training.
  • In at least some implementations, we use the Keras deep learning package with TensorFlow as the backend to implement and train all of our models, although other deep learning packages would also suffice. See Chollet, F.: Keras. https://github.com/fchollet/keras (2015); Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al.: Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016). We modify the standard per-pixel cross-entropy loss to account for missing ground truth annotations in our dataset. We discard the component of the loss that is calculated on images for which ground truth is missing; we only backpropagate the component of the loss for which ground truth is known. This allows us to train on our full training dataset, including series with missing contours. In at least some implementations, weights are updated per the Adam rule. See Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). In at least some implementations, we monitor the per pixel accuracy to determine when the model has converged. To compare different models, we use relative absolute volume error (RAVE) as volume accuracy is paramount for precise derived diagnostic quantities. RAVE is defined as |Vpred−Vtruth|/Vtruth, where Vtruth is the ground truth volume, and Vpred is the volume computed from the gathered 2D predicted segmentation mask. Using a relative metric ensures that equal weight is given to pediatric and adult hearts. Volumes may be calculated from segmentation masks using a frustum approximation.
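  • By way of non-limiting illustration, the loss masking and the volume metric described above may be sketched in plain Python as follows. The sketch assumes one binary probability map per contour type and interprets the frustum approximation as the standard frustum volume formula applied between adjacent slices; both are assumptions, and all function names are hypothetical.

```python
import math

def masked_cross_entropy(pred, truth, has_truth, eps=1e-7):
    """Mean per-pixel binary cross-entropy over contour types with ground truth.
    pred / truth: dict of contour name -> flat list of probabilities / 0-1 labels.
    has_truth: dict of contour name -> bool."""
    total, count = 0.0, 0
    for name, probs in pred.items():
        if not has_truth[name]:
            continue  # a missing annotation contributes no loss and no gradient
        for p, t in zip(probs, truth[name]):
            p = min(max(p, eps), 1.0 - eps)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0

def frustum_volume_ml(slice_areas_mm2, spacing_mm):
    """Volume from per-slice mask areas, treating each pair of adjacent
    slices as the two faces of a frustum (assumed interpretation)."""
    volume_mm3 = 0.0
    for a1, a2 in zip(slice_areas_mm2, slice_areas_mm2[1:]):
        volume_mm3 += spacing_mm / 3.0 * (a1 + a2 + math.sqrt(a1 * a2))
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3

def rave(v_pred, v_truth):
    """Relative absolute volume error, |Vpred - Vtruth| / Vtruth."""
    return abs(v_pred - v_truth) / v_truth
```

  • For example, a predicted volume of 95 mL against a ground truth volume of 100 mL yields a RAVE of 0.05, regardless of whether the heart is pediatric or adult in size.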
  • Data Preprocessing.
  • In at least some implementations, we normalize all MRIs such that the 1st and 99th percentile of a batch of images fall at −0.5 and 0.5, i.e. their “usable range” falls between −0.5 and 0.5. Other normalization schemes, such as adaptive histogram equalization, are also possible. We crop and resize the images such that the ventricle contours take up a larger percentage of the image; the actual crop and resize factors are hyperparameters. Cropping the image increases the fraction of the image that is taken up by the foreground (ventricle) class, making it easier to resolve fine details and helping the model converge.
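  • By way of non-limiting illustration, the percentile normalization described above may be sketched as follows. The sketch operates on a flat list of pixel values for a batch, uses linearly interpolated percentiles, and does not clip values outside the usable range; these choices and the function names are assumptions.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def normalize_batch(pixels):
    """Affinely rescale so the 1st percentile maps to -0.5 and the 99th to 0.5."""
    ordered = sorted(pixels)
    p1 = percentile(ordered, 1)
    p99 = percentile(ordered, 99)
    scale = (p99 - p1) or 1.0  # guard against a constant batch
    return [(v - p1) / scale - 0.5 for v in pixels]
```

  • Pixels brighter than the 99th percentile or darker than the 1st percentile fall outside the interval from −0.5 to 0.5 in this sketch, so outliers do not compress the usable range.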
  • FIG. 37 shows box plots comparing the relative absolute volume error (RAVE) between FastVentricle and DeepVentricle for each of LV Endo, LV Epi, and RV Endo at ED (left panels) and ES (right panels). The line at the center of each box denotes the median RAVE, and the ends of the box show the 25th (Q1) and 75th (Q3) percentiles of the distribution. Whiskers are defined as per Matplotlib defaults.
  • Hyperparameter Search.
  • We fine tune the ENet and UNet network architectures using random hyperparameter search. See Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. Journal of Machine Learning Research 13(February) (2012) 281-305. In practice, for each of the UNet and ENet architectures, we: i) run models with random sets of hyperparameters for a fixed number of epochs, ii) select from the resulting corpus of models the N models with the highest validation set accuracy (where N is a small integer that we agree on beforehand), iii) select the final model from the N candidates based on lowest average RAVE. In at least some implementations, the hyperparameters of the UNet architecture include the use of batch normalization, the dropout probability, the number of convolution layers, the number of initial filters, and the number of pooling layers. In at least some implementations, the hyperparameters of the ENet architecture include the kernel size for asymmetric convolutions, the number of times Section 2 of the network is repeated, the number of initial bottleneck modules, the number of initial filters, the projection ratio, the dropout probability and whether or not to use skip connections (See Paszke, referenced above, for details on these parameters). For both architectures, in at least some implementations, the hyperparameters also include the batch size, learning rate, crop fraction and image size.
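  • By way of non-limiting illustration, the three-step selection procedure described above may be sketched as follows. The callable `train_and_score` stands in for training a model for a fixed number of epochs and returning its validation accuracy and average RAVE; it, the other function names, and the sampling ranges shown are hypothetical and are not the ranges used in our experiments.

```python
import random

def sample_enet_params(rng):
    """Draw one random ENet-style configuration (illustrative ranges only)."""
    return {
        "asymmetric_kernel_size": rng.choice([3, 5, 7]),
        "section2_repeats": rng.randint(1, 4),
        "initial_filters": rng.choice([8, 16, 32]),
        "dropout": rng.uniform(0.0, 0.5),
        "use_skip_connections": rng.choice([True, False]),
        "learning_rate": 10 ** rng.uniform(-5, -2),
    }

def random_search(sample_params, train_and_score, n_trials, top_n, seed=0):
    """i) evaluate random configurations, ii) keep the top_n by validation
    accuracy, iii) return the shortlisted entry with the lowest average RAVE."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        params = sample_params(rng)
        accuracy, mean_rave = train_and_score(params)
        results.append((params, accuracy, mean_rave))
    shortlist = sorted(results, key=lambda r: r[1], reverse=True)[:top_n]
    return min(shortlist, key=lambda r: r[2])
```

  • Shortlisting on accuracy before ranking on RAVE reflects the procedure above: accuracy is monitored during training, while RAVE is the metric of clinical interest.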
  • FastVentricle—Results
  • Note that these results describe a single embodiment of FastVentricle and different results may be achieved with different design parameters.
  • Volume Error Analysis.
  • FIG. 37 shows box plots of the RAVE comparing DeepVentricle and FastVentricle for each combination of ventricular structure (LV Endo, LV Epi, RV Endo) and phase (ES, ED), where the sample size is specified in Table 2 below. We find that the performance of the models is very similar across structures and phases. Indeed, the median RAVE is: i) 4.5% for DeepVentricle and 5.5% for FastVentricle for LV endo, ii) 5.6% for DeepVentricle and 4.2% for FastVentricle for LV epi, iii) 7.6% for DeepVentricle and 9.0% for FastVentricle for RV endo. ES is the more difficult phase for both models as the regions to be segmented are smaller, and RV Endo is the most difficult of the structures for both models due to its more complex shape. Although the models are trained on only ES and ED annotations, we are able to perform visually pleasing inference on all time points. FIG. 39 presents examples of network predictions on different slices and time points for studies with low RAVE for both DeepVentricle and FastVentricle. In particular, FIG. 39 shows DeepVentricle and FastVentricle predictions for a healthy patient with low RAVE (top) and for a patient with hypertrophic cardiomyopathy (bottom). RV endo is outlined in red, LV endo in green, and LV epi in blue. The X axis of the grid corresponds to time indices sampled throughout the cardiac cycle and the Y axis corresponds to slice indices sampled from apex (low slice index) to base (high slice index). Model performance at the apex and center of the ventricle is better than at the base, as it is often ambiguous from just the basal slice where the valve plane (separating ventricle from atrium) is located. Additionally, segmentations at ED tend to be better than at ES, as the chambers at ES are smaller and dark-colored papillary muscles tend to blend with the myocardium when the heart is contracted.
  • Finally, we note that, for ENet, the best 5 models in the hyperparameter search in terms of validation set accuracy all use skip connections, demonstrating the value of skip connections for this problem.
  • TABLE 1
    Accuracy, model speed, and computational complexity for DeepVentricle
    and FastVentricle. Inference time per sample and GPU memory required
    for inference calculated with a batch size of 16.
    Metric                                    DeepVentricle   FastVentricle
    Average RAVE                              0.089           0.093
    Inference GPU time per sample (msec)      31              7
    Initialization GPU time (sec)             1.3             13.3
    Number of parameters                      19,249,059      755,529
    GPU memory required for inference (MB)    1,800           270
    Size of the weight file (MB)              220             10
  • Statistical Analysis.
  • We measure the statistical significance of the difference between DeepVentricle and FastVentricle's RAVE distributions for each combination of phase and anatomy for which we have ground truth annotations. We use the Wilcoxon-Mann-Whitney test, using the SciPy 0.17.0 implementation with default parameters, to assess the null hypothesis that the distributions of DeepVentricle and FastVentricle's RAVE are equal. Table 2 displays the results. We find that there is no statistical evidence to claim that one model is better than the other, since the lowest measured p-value is 0.10.
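  • By way of non-limiting illustration, the U statistic underlying the Wilcoxon-Mann-Whitney test may be computed as sketched below. This is a plain Python sketch rather than the SciPy implementation used in our analysis: it assigns average ranks to ties and derives a two-sided p-value from a normal approximation without tie or continuity corrections, so its p-values may differ from those reported in Table 2. The function name is hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic of sample x, two-sided p-value from a normal
    approximation without tie or continuity correction)."""
    combined = sorted((value, index) for index, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[combined[k][1]] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided
```

  • Here `x` and `y` would be the per-study RAVE values of the two models for one combination of phase and anatomy.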
  • Computational Complexity and Inference Speed.
  • To be clinically and commercially viable, any automated algorithm should be faster than manual annotations, and lightweight enough to deploy easily. As seen in Table 1 above, we find that this embodiment of FastVentricle is roughly 4× faster than DeepVentricle and uses 6× less memory for inference. Because the model contains more layers, FastVentricle takes longer to initialize before being ready to perform inference. However, in a production setting, the model only needs to be initialized once when provisioning the server, so this additional cost is incidental.
  • Internal Representation.
  • Neural networks are infamous for being black boxes, i.e., it is very difficult to “look inside” and understand why a certain prediction is being made. This is especially troublesome in the medical setting, as doctors prefer to use tools that they can understand. We follow the results of Mordvintsev, A., et al.: Deep Dream. https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html (2015) Accessed: 2017-01-17, to visualize the features that DeepVentricle is “looking for” when performing inference. Beginning with random noise as a model “input” and a real segmentation mask as the target, we perform backpropagation to update the pixel values in the input image such that the loss is minimized. FIG. 38 shows the result of such an optimization for DeepVentricle and FastVentricle. We find that, as a doctor would, the model is confident in its predictions when the endocardium is light and the contrast with the epicardium is high. The model seems to have learned to ignore the anatomy surrounding the heart. We also note that the optimized input for DeepVentricle is less noisy than that for FastVentricle, probably because the former model is larger and utilizes skip connections at the full resolution of the input image. DeepVentricle also seems to “imagine” structures which look like papillary muscles inside the ventricles.
  • TABLE 2
    U statistics and p-values from the Wilcoxon-Mann-Whitney test, along with
    corresponding sample sizes, for our comparison of DeepVentricle and FastVentricle
    for every combination of phase and ventricular anatomy on our validation
    set. With the available data we do not see a statistically significant
    difference between DeepVentricle and FastVentricle.
                  LV Endo   LV Endo   LV Epi   LV Epi   RV Endo   RV Endo
                  (ED)      (ES)      (ED)     (ES)     (ED)      (ES)
    U statistic   6819      6819      181      165      6420      5952
    p-value       0.38      0.10      0.62     0.67     0.59      0.46
    Sample size   113       110       20       19       111       106
  • FIG. 38 shows a random input (left) that is optimized using gradient descent for DeepVentricle and FastVentricle (middle) to fit the label map (right, RV endo in red, LV endo in cyan, LV epi in blue). The generated image has many qualities that the network is “looking for” when making its predictions, such as high contrast between endocardium and epicardium and the presence of papillary muscles.
  • FastVentricle—Discussion
  • Performance.
  • Though accuracy may be the most important property of a model when making clinical decisions, speed of algorithm execution is also critical for maintaining positive user experience and minimizing infrastructure costs. Within the bounds of our experiments, we find no statistically significant difference between the accuracy of DeepVentricle and that of the 4× faster FastVentricle. This suggests that FastVentricle can replace DeepVentricle in a clinical setting with no detrimental effects.
  • FastVentricle—Conclusion
  • We show that a new ENet-based FCN with skip connections, FastVentricle, can be used for quick and efficient segmentation of cardiac anatomy. Trained on a sparsely annotated database, our algorithm provides LV endo, LV epi, and RV endo contours to clinicians for the purpose of calculating important diagnostic quantities such as ejection fraction and myocardial mass. FastVentricle is 4× faster and runs with 6× less memory than the previous state-of-the-art.
  • Papillary and Trabeculae Muscle Delineation
  • Two major structures are of primary interest when assessing the left ventricle via cardiac magnetic resonance: the myocardial muscle and the blood pool (i.e., blood within the ventricles of the heart) of the ventricle. Between these two structures are the papillary and trabeculae muscles, which are small muscles within the ventricle of the heart that abut both the myocardial muscle and the blood pool. When assessing the volume of the ventricular blood pool, different institutions have different policies about whether the papillary and trabeculae muscles should be included in the volume of the blood pool or not. Technically, to assess the blood volume, the papillary and trabeculae muscles should be excluded from the contour that defines the blood pool. However, because of the relatively regular shape of the inner boundary of the myocardial muscle and the relatively irregular shape of the papillary and trabeculae muscles, for convenience, the contour that defines the blood pool is often assumed to be the inner boundary of the myocardial muscle. In that case, the volumes of the papillary and trabeculae muscles are included in the blood pool volume, leading to a small overestimate of the blood pool volume.
  • FIG. 40 is an image 4000 that shows the relevant parts of the cardiac anatomy, including the myocardial muscle 4002 surrounding the left ventricle. The blood pool 4004 of the left ventricle is also shown. The contour 4006 of the epicardium (i.e., the outer surface of the heart), referred to herein as the epicardium contour 4006, defines the outer boundary of the left ventricle's myocardial muscle. The contour 4008 of the endocardium (i.e., the surface separating the left ventricle's blood pool from the myocardial muscle), referred to herein as the endocardium contour, defines the inner boundary of the left ventricle myocardial muscle. Note that in FIG. 40, the endocardium contour 4008 includes papillary and trabeculae muscles 4010 in the interior. It would also be valid to exclude the papillary and trabeculae muscles 4010 from the interior of the endocardium contour 4008.
  • FIG. 41 is an image 4100 that shows the case where the papillary and trabeculae muscles 4110 are included in the interior of the endocardium contour 4108. The myocardial muscle 4102 surrounding the left ventricle is also shown. Under the assumption that the endocardium contour 4108 constitutes the boundary of the blood pool 4104, the measured volume of the blood pool will be a slight overestimate, since the volume also includes the papillary and trabeculae muscles 4110.
  • FIG. 42 is an image 4200 that shows an alternate case where the papillary and trabeculae muscles 4210 are excluded from the endocardium contour 4208. The myocardial muscle 4202 surrounding the left ventricle and the blood pool 4204 of the left ventricle are also shown. In this case, the estimate of the volume of the blood pool will be more accurate, but the contour 4208 is significantly more tortuous and, if drawn manually, more cumbersome to delineate.
  • An automated system that delineates the boundary of the endocardium while excluding the papillary and trabeculae muscles from the interior of the contour is extremely helpful, as such a system allows for accurate measures of the volume of the blood pool with minimal effort from annotators. However, such a system needs to rely on sophisticated image processing techniques to ensure that the delineation of the contours is insensitive to variations in human anatomy and magnetic resonance (MR) acquisition parameters.
  • Identification and Localization of Myocardial Properties
  • There are many types of studies that can be performed with cardiac magnetic resonance, each of which may assess different aspects of the cardiac anatomy or function. Steady state free precession (SSFP) imaging without contrast is used to visualize the anatomy for quantifying cardiac function. Perfusion imaging using gadolinium-based contrast is used to identify biomarkers of coronary stenosis. Late gadolinium enhancement imaging, also using gadolinium-based contrast, is used to assess myocardial infarction. In all of these imaging protocols, and in others, the anatomical orientations and the need for contouring tend to be similar. Images are typically acquired both in short axis orientations, in which the imaging plane is parallel to the short axis of the left ventricle, and in long axis orientations, in which the imaging plane is parallel to the long axis of the left ventricle. Contours delineating the myocardial muscle and the blood pool are used in all three imaging protocols to assess different components of the cardiac function and anatomy.
  • Although there are differences in the imaging protocols of each of these types of imaging, the relative contrast between the blood pool and the myocardium is mostly consistent (with the myocardium being darker than the blood pool); therefore, a single convolutional neural network (CNN) model may be used to delineate the myocardium and blood pool in all three imaging protocols. The use of a single CNN that operates equivalently on data from all of these imaging protocols, instead of separate CNNs for each imaging protocol, simplifies the management, deployment, and execution of CNN models in practice. However, annotated image data with ground truth contours is required for validation of all imaging protocols for which the CNN is expected to function.
  • Papillary and Trabeculae Muscle Delineation
  • FIG. 43 shows one implementation of a process 4300 for automatically delineating papillary and trabeculae muscles from the ventricular blood pool. Initially, both cardiac MRI image data 4302 and initial contours 4304 delineating the inner and outer boundary of the myocardium are available. The papillary and trabeculae muscles are on the interior of the initial left ventricle endocardium contour; i.e., they are included within the blood pool and excluded from the myocardium. From the contours, masks defining the myocardium and the blood pool (including the papillary and trabeculae muscles) are calculated at 4306. In at least some implementations, masks defining the myocardium and blood pool are available at the beginning of the process 4300 and do not need to be calculated from the initial contours 4304.
  • An intensity threshold that will be used to delineate the blood pool from the papillary and trabeculae muscles is then calculated at 4308. At least one implementation of the intensity threshold calculation is described below with reference to a method 4400 shown in FIG. 44.
  • The intensity threshold is then applied to the pixels within the blood pool mask at 4310. Those pixels include the blood pool and the papillary and trabeculae muscles. After thresholding, pixels of high signal intensity are assigned to the blood pool class and pixels of low signal intensity are assigned to the papillary and trabeculae muscle class.
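  • The thresholding at 4310 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_blood_pool`, the use of NumPy arrays, and the choice to assign pixels exactly at the threshold to the blood pool class are all assumptions made for the example.

```python
import numpy as np


def split_blood_pool(image, blood_pool_mask, threshold):
    """Split the pixels inside the blood pool mask into two classes.

    Returns (blood, muscle) boolean masks. Pixels at or above the
    threshold are treated as blood pool (an assumption of this sketch);
    pixels below it are treated as papillary/trabeculae muscle.
    Pixels outside the blood pool mask belong to neither class.
    """
    image = np.asarray(image)
    inside = np.asarray(blood_pool_mask, dtype=bool)
    blood = inside & (image >= threshold)
    muscle = inside & (image < threshold)
    return blood, muscle
```

  • Keeping the two classes as separate boolean masks makes the later steps (connected component analysis and pixel counting) simple mask operations.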
  • At 4312, in at least some implementations, a connected component analysis is used to determine the largest connected component of pixels of the blood pool class. Pixels that are part of the blood pool class (due to their high signal intensity) but are not part of the largest connected component of blood pool pixels are then assumed to be holes in the papillary and trabeculae muscles and are converted to the papillary and trabeculae muscles class.
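  • The connected component step at 4312 can be sketched as follows. The helper names and the use of 4-connectivity are assumptions of this example; the text does not specify the connectivity, and a library routine for component labeling could be used in place of the explicit breadth-first search.

```python
from collections import deque

import numpy as np


def largest_component(mask):
    """Return a boolean mask of the largest 4-connected component."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first search over one component.
            seen[r, c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    out = np.zeros_like(mask)
    for y, x in best:
        out[y, x] = True
    return out


def reassign_isolated_blood(blood, muscle):
    """Move blood-class pixels outside the largest component to the muscle class."""
    blood = np.asarray(blood, dtype=bool)
    muscle = np.asarray(muscle, dtype=bool)
    keep = largest_component(blood)
    holes = blood & ~keep
    return keep, muscle | holes
```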
  • In at least some implementations, the resulting boundaries separating the papillary and trabeculae muscles from the blood pool and myocardium are then calculated at 4314 and stored or displayed to the user. In at least some implementations, the pixels that are determined to be part of the blood pool are summed to determine the net volume of the ventricular blood pool. That volume may then be stored, displayed to the user, or used in a subsequent calculation, such as cardiac ejection fraction.
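  • The volume and ejection fraction calculations can be sketched as follows. The text states only that blood pool pixels are summed; converting the count to milliliters through an in-plane pixel area and a slice spacing, and the parameter names used here, are assumptions of this example. The ejection fraction formula (EDV − ESV) / EDV is the standard definition.

```python
def blood_pool_volume_ml(pixel_counts, pixel_area_mm2, slice_spacing_mm):
    """Convert per-slice blood pool pixel counts into a volume in milliliters.

    Assumes every slice has the same pixel area and slice spacing.
    """
    voxel_mm3 = pixel_area_mm2 * slice_spacing_mm
    return sum(pixel_counts) * voxel_mm3 / 1000.0  # 1 ml = 1000 mm^3


def ejection_fraction(end_diastolic_ml, end_systolic_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    if end_diastolic_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (end_diastolic_ml - end_systolic_ml) / end_diastolic_ml
```

  • For example, three slices with 1000, 1200, and 800 blood pool pixels, a pixel area of 2.25 mm² and a slice spacing of 8 mm give 3000 × 18 mm³ = 54 ml.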
  • FIG. 44 shows one example implementation of a process 4400 for calculating an image intensity threshold. It should be appreciated that other methods may be used to calculate the image intensity threshold. Initially, both cardiac MRI image data 4402 and masks 4404 representing the myocardial muscle and the blood pool are available. The blood pool mask includes both the blood pool and the papillary and trabeculae muscles. The masks may have been derived from contours delineating the myocardial muscle, or they may have been derived via some other method. The papillary and trabeculae muscles are contained within the blood pool mask (see FIG. 41).
  • Pixel intensity distributions of the myocardium and blood pool are calculated at 4406. For each of those two distributions, a kernel density estimate of the pixel intensities may be calculated at 4408. If the data is approximately normally distributed, Silverman's rule of thumb can be used to determine the kernel bandwidth in the density estimate. See, e.g., Silverman, Bernard W. Density estimation for statistics and data analysis. Vol. 26. CRC press, 1986. Other bandwidths may alternatively be used based on the distribution of the data.
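  • A kernel density estimate with a Silverman bandwidth can be sketched as follows. The Gaussian kernel, the function names, and the particular form of the rule of thumb used here, h = 0.9 · min(σ, IQR/1.34) · n^(−1/5), are assumptions of this example; the text names Silverman's rule but does not fix the kernel or the variant.

```python
import numpy as np


def silverman_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    x = np.asarray(samples, dtype=float)
    sigma = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    # Fall back to the standard deviation if the interquartile range is zero.
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return 0.9 * spread * x.size ** (-0.2)


def kernel_density(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    x = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    h = silverman_bandwidth(x) if bandwidth is None else bandwidth
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))
```

  • In the process 4400, such an estimate would be evaluated once for the myocardium pixel intensities and once for the blood pool pixel intensities, on a shared intensity grid.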
  • The pixel intensity at which the density estimates overlap, i.e., where the probability that a given pixel intensity was drawn from the myocardium pixel intensity distribution is equal to the probability that the pixel was drawn from the blood pool distribution, is then computed at 4410. This pixel intensity may be chosen as the intensity threshold that separates the blood pool pixels from the papillary and trabeculae muscle pixels.
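  • The threshold selection at 4410 can be sketched as follows, given the two density estimates sampled on a shared intensity grid. The function name, the restriction of the search to the interval between the two modes, and the linear interpolation between grid points are assumptions of this example.

```python
import numpy as np


def crossing_threshold(grid, myo_density, blood_density):
    """Intensity at which the two density estimates are equal.

    The search is limited to the interval between the modes of the two
    densities, so a crossing in the far tails is ignored.
    """
    grid = np.asarray(grid, dtype=float)
    myo = np.asarray(myo_density, dtype=float)
    blood = np.asarray(blood_density, dtype=float)
    diff = myo - blood
    lo, hi = sorted((int(np.argmax(myo)), int(np.argmax(blood))))
    for i in range(lo, hi):
        if diff[i] == 0.0:
            return float(grid[i])
        if diff[i] * diff[i + 1] <= 0.0:
            # Sign change: interpolate linearly between the two grid points.
            t = diff[i] / (diff[i] - diff[i + 1])
            return float(grid[i] + t * (grid[i + 1] - grid[i]))
    raise ValueError("densities do not cross between their modes")
```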
  • FIG. 45 is a graph 4500 that shows the overlap 4410 of the pixel intensity distributions of the blood pool and the myocardium. Shown are example distributions of the pixel intensities in the myocardium 4502 and the pixel intensities in the blood pool 4504. The y-axis represents the probability distribution function and the x-axis, in arbitrary units, represents pixel intensity. In the illustrated implementation, the threshold used to separate the blood pool from the papillary and trabeculae muscles is the location 4506 of overlap between the two distributions 4502 and 4504.
  • Identification and Localization of Myocardial Properties
  • FIG. 46 shows one implementation for a process 4600 that uses a pre-trained CNN model to identify and localize myocardial properties. Initially, cardiac imaging data 4602 and a pre-trained CNN model 4604 are available. In at least one implementation of the process 4600, cardiac imaging data is a short-axis magnetic resonance (MR) acquisition, but other imaging planes (e.g., the long axis) and other imaging modalities (e.g., computed tomography or ultrasound) would work similarly. In at least some implementations, the trained CNN model 4604 has been trained on data that is of the same type as the cardiac image data 4602 (e.g., the same imaging modality, same contrast injection protocol, and, if applicable, same MR pulse sequence). In other implementations, the trained CNN model 4604 has been trained on data of a different type than the cardiac image data 4602. In some implementations, the data on which the CNN model 4604 has been trained is data from functional cardiac magnetic resonance imaging (e.g., via a contrast-free SSFP imaging sequence) and the cardiac image data is data from a cardiac perfusion or myocardial delayed enhancement study.
  • In at least some implementations, the CNN model 4604 will have been trained on data of a different type than the cardiac image data 4602 and then fine tuned (i.e., by re-training some or all of the layers while potentially holding some weights fixed) on data of the same type as the cardiac image data 4602.
  • The trained CNN model 4604 is used to infer inner and outer myocardial contours at 4606. In at least some implementations, the CNN model 4604 first generates one or more probability maps, which are then converted to contours. In at least some implementations, the contours are postprocessed at 4608 to minimize the probability that tissue that is not part of the myocardium is included within the region delineated as the myocardium. This postprocessing may take on many forms. For example, postprocessing may include applying morphological operations, such as morphological erosion, to the region of the heart identified as myocardium to reduce its area. Postprocessing may additionally or alternatively include modifying the threshold that is applied to the probability map output of the trained CNN model such that the region identified as myocardium is limited to CNN output for which the probability map indicates high confidence that the region is myocardium. Postprocessing may additionally or alternatively include shifting vertices of contours that delineate the myocardium towards or away from the center of the ventricle of the heart to reduce the identified area of myocardium, or any combination of the above processes.
  • In at least some implementations, the postprocessing is applied to masks delineating the cardiac regions as opposed to contours.
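  • Two of the postprocessing options described above, a stricter probability threshold and morphological erosion applied to a mask, can be sketched as follows. The default threshold of 0.8, the 3×3 cross-shaped structuring element, and the function names are assumptions of this example and are not taken from the text.

```python
import numpy as np


def erode(mask, iterations=1):
    """Binary erosion with a 3x3 cross; pixels outside the image count as background."""
    out = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        p = np.pad(out, 1, constant_values=False)
        # A pixel survives only if it and its four neighbors are all set.
        out = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
               & p[1:-1, :-2] & p[1:-1, 2:])
    return out


def conservative_myocardium(prob_map, threshold=0.8, erosion_iterations=1):
    """Keep only high-confidence myocardium pixels, then shrink the region."""
    mask = np.asarray(prob_map, dtype=float) >= threshold
    return erode(mask, erosion_iterations)
```

  • Both operations only ever remove pixels from the myocardium mask, which matches the stated goal of reducing the chance that non-myocardial tissue is included.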
  • In at least some implementations, the ventricular insertion points at which the right ventricular wall attaches to the left ventricle are determined at 4610. In at least some implementations, the insertion points are designated manually by users of the software. In other implementations, the insertion points are calculated automatically.
  • FIG. 47 is an image 4700 that shows the ventricular insertion points. The left ventricle epicardium contour 4702 and the right ventricle contour 4704 are shown. The right ventricle contour may be the right ventricle endocardium contour or the right ventricle epicardium contour. The inferior insertion point 4706 and the anterior insertion point 4708 are indicated. In at least one implementation, the automated system for identifying the insertion points (e.g., act 4610 of FIG. 46) analyzes the distance of the right ventricle contour 4704 from the left ventricle epicardium contour 4702. The insertion point locations 4706 and 4708 are defined as the locations where the distances between the two contours diverge.
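  • One possible reading of the distance-based approach is sketched below: for each point of the right ventricle contour, the distance to the nearest point of the left ventricle epicardium contour is computed, and the insertion points are taken as the two ends of the run of points that lie close to the left ventricle. The tolerance `tol`, the nearest-point distance, and the function name are assumptions of this example; the text does not define how divergence is detected. The sketch does not decide which of the two returned points is anterior and which is inferior.

```python
import numpy as np


def insertion_points(rv_contour, lv_epi_contour, tol):
    """Return the two ends of the RV contour segment that hugs the LV.

    Both contours are (N, 2) arrays of ordered points on closed curves.
    """
    rv = np.asarray(rv_contour, dtype=float)
    lv = np.asarray(lv_epi_contour, dtype=float)
    # Distance from every RV point to its nearest LV epicardium point.
    dist = np.linalg.norm(rv[:, None, :] - lv[None, :, :], axis=2).min(axis=1)
    close = dist <= tol
    if close.all() or not close.any():
        raise ValueError("no transition between close and distant points")
    # Cyclic run boundaries: first and last close point of the run.
    starts = np.flatnonzero(close & ~np.roll(close, 1))
    ends = np.flatnonzero(close & ~np.roll(close, -1))
    if len(starts) != 1:
        raise ValueError("expected a single run of close points")
    return rv[starts[0]], rv[ends[0]]
```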
  • In at least one other implementation, the locations of the insertion points 4706 and 4708 are defined as the points of intersection between either the left ventricle epicardium contour 4702 or one of the right ventricle contours 4704 and the planes that define the left heart 2-chamber view (left ventricle and left atrium) and the left heart 3-chamber view (left ventricle, left atrium and aorta).
  • Once the myocardium has been delineated, myocardial regions are localized and quantified at 4612. Any potential pathologies (such as infarction) or characteristics (such as perfusion characteristics) of the myocardium may be quantified. Note that the act 4612 of FIG. 46 may be performed at any phase of this process and in at least some implementations may precede the determination of one or more of contours (e.g., act 4606) and insertion points (e.g., act 4610). In at least one implementation of the system, regions of interest (such as regions of relative enhancement) are manually detected and delineated by a user. In other implementations of the system, regions of interest are detected, delineated or both by an automated system, such as a CNN or other image processing technique, such as, but not limited to, any of the image processing techniques discussed in Karim, Rashed, et al. “Evaluation of current algorithms for segmentation of scar tissue from late gadolinium enhancement cardiovascular magnetic resonance of the left atrium: an open-access grand challenge.” Journal of Cardiovascular Magnetic Resonance 15.1 (2013): 105. Quantification of defects, such as relative hyper- or hypo-intensity, or biologically inferred quantities, such as absolute myocardial perfusion, is then performed. See, e.g., [Christian 2004] Christian, Timothy F., et al. “Absolute myocardial perfusion in canines measured by using dual-bolus first-pass MR imaging.” Radiology 232.3 (2004): 677-684. In at least some implementations, instead of detecting specific defects, properties of the entire myocardial region (e.g., perfusion) are assessed.
  • Once defects or other characteristics of the myocardium are determined, in at least some implementations, the myocardial contours and insertion points are used to segment the myocardium into a standard format at 4614, such as a 17-segment model. See, e.g., Cerqueira, Manuel D., et al. “Standardized myocardial segmentation and nomenclature for tomographic imaging of the heart.” Circulation 105.4 (2002): 539-542. In those implementations, the defects or myocardial properties are then localized using the standard format. In at least some implementations, the resulting characteristics are displayed on a display 4616 to the user.
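  • The angular part of such a segmentation can be sketched as follows. The 17-segment model divides the basal and mid levels into six segments each, the apical level into four, and adds one apex segment. Everything else here is an assumption of this example: the sector index is counted from a reference direction through one insertion point, in the direction of increasing angle, and no mapping from sector index to the named segments of the standard is attempted.

```python
import math

# Number of circumferential segments at each level of the 17-segment model.
SECTORS_PER_LEVEL = {"basal": 6, "mid": 6, "apical": 4, "apex": 1}


def sector_index(point, center, reference_point, level):
    """Index of the equal-angle sector containing a myocardial point.

    Angles are measured from the ray center -> reference_point (for
    example, an insertion point), in the direction of increasing atan2.
    """
    n = SECTORS_PER_LEVEL[level]
    ref = math.atan2(reference_point[1] - center[1], reference_point[0] - center[0])
    ang = math.atan2(point[1] - center[1], point[0] - center[0])
    rel = (ang - ref) % (2.0 * math.pi)
    # Guard against rel landing exactly on 2*pi through rounding.
    return min(int(rel / (2.0 * math.pi / n)), n - 1)
```

  • A defect mask can then be localized by computing the sector index of each defect pixel and accumulating counts per level and sector.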
  • The foregoing detailed description has set forth various implementations of the devices and/or processes via the use of block diagrams, schematics, and examples. Insofar as such block diagrams, schematics, and examples contain one or more functions and/or operations, it will be understood by those skilled in the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one implementation, the present subject matter may be implemented via Application Specific Integrated Circuits (ASICs). However, those skilled in the art will recognize that the implementations disclosed herein, in whole or in part, can be equivalently implemented in standard integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more controllers (e.g., microcontrollers), as one or more programs running on one or more processors (e.g., microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one of ordinary skill in the art in light of this disclosure.
  • Those of skill in the art will recognize that many of the methods or algorithms set out herein may employ additional acts, may omit some acts, and/or may execute acts in a different order than specified.
  • In addition, those skilled in the art will appreciate that the mechanisms taught herein are capable of being distributed as a program product in a variety of forms, and that an illustrative implementation applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of signal bearing media include, but are not limited to, the following: recordable type media such as floppy disks, hard disk drives, CD ROMs, digital tape, and computer memory.
  • The various implementations described above can be combined to provide further implementations. To the extent that they are not inconsistent with the specific teachings and definitions herein, all of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet, including but not limited to U.S. Provisional Patent Application No. 61/571,908 filed Jul. 7, 2011; U.S. patent application Ser. No. 14/118,964 filed Nov. 20, 2013; PCT Patent Application No. PCT/US2012/045575 filed Jul. 5, 2012; U.S. Provisional Patent Application No. 61/928,702 filed Jan. 17, 2014; U.S. patent application Ser. No. 15/112,130 filed Jul. 15, 2016; PCT Patent Application No. PCT/US2015/011851 filed Jan. 16, 2015; U.S. Provisional Patent Application No. 62/260,565 filed Nov. 29, 2015; U.S. Provisional Patent Application No. 62/415,203 filed Oct. 31, 2016; U.S. Provisional Patent Application No. 62/415,666 filed Nov. 1, 2016; PCT Patent Application No. PCT/US2016/064028 filed Nov. 29, 2016; and U.S. Provisional Patent Application No. 62/451,482 filed Jan. 27, 2017; are incorporated herein by reference, in their entirety. Aspects of the implementations can be modified, if necessary, to employ systems, circuits and concepts of the various patents, applications and publications to provide yet further implementations.
  • These and other changes can be made to the implementations in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific implementations disclosed in the specification and the claims, but should be construed to include all possible implementations along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.

Claims (21)

1-20. (canceled)
21. A computer-implemented machine learning method, comprising:
training a fully convolutional neural network (CNN) model to generate a trained CNN model for segmenting an anatomical structure based, at least in part, on a plurality of images, wherein each of a subset of the plurality of images includes at least one label which identifies a region of a particular part of the anatomical structure depicted in the image, the trained CNN model comprising an expanding path that includes a plurality of convolutional layers and a plurality of upsampling layers, wherein each upsampling layer is preceded by at least one convolutional layer and comprises a fixed upsampling operation without a learned kernel and a convolution operation with a learned kernel, and the convolution operation is preceded by the fixed upsampling operation and succeeded by a concatenation of feature maps; and
storing the trained CNN model in a nontransitory processor-readable storage medium.
22. The method of claim 21, wherein training the CNN model further comprises selecting the CNN model based, at least in part, on validation accuracy of the CNN model.
23. The method of claim 22, further comprising performing a random search over hyperparameters associated with a set of CNN models to determine a highest validation accuracy.
24. The method of claim 23, wherein the hyperparameters describe at least one of a model, training of the model, training data to use, or data augmentation to use during training.
25. The method of claim 21, wherein the concatenation of feature maps is based, at least in part, on a skip connection from another path of the trained CNN model.
26. The method of claim 25, wherein the another path is a contracting path that includes a plurality of convolutional layers and a plurality of pooling layers.
27. The method of claim 26, wherein the number of pooling layers in the contracting path equals the number of upsampling layers in the expanding path.
28. A computer-readable medium storing contents that, when executed by one or more processors, cause the one or more processors to perform actions comprising:
training a fully convolutional neural network (CNN) model to generate a trained CNN model for segmenting an anatomical structure based, at least in part, on a plurality of images, wherein each of a subset of the plurality of images includes at least one label which identifies at least a portion of the anatomical structure depicted in the image, the trained CNN model comprising an expanding path that includes a plurality of convolutional layers and a plurality of upsampling layers, wherein each upsampling layer is preceded by at least one convolutional layer and comprises a fixed upsampling operation without a learned kernel and a convolution operation with a learned kernel, and the convolution operation is preceded by the fixed upsampling operation and succeeded by a concatenation of feature maps; and
storing the trained CNN model.
29. The computer-readable medium of claim 28, wherein the trained CNN model further includes skip connections between layers in the expanding path and another path of the trained CNN model.
30. The computer-readable medium of claim 29, wherein the skip connections are residual connections that add or subtract values of feature maps.
31. The computer-readable medium of claim 29, wherein the concatenation of feature maps is based on at least one of the skip connections.
32. The computer-readable medium of claim 28, wherein training the CNN model further comprises selecting the CNN model based, at least in part, on validation accuracy of the CNN model.
33. The computer-readable medium of claim 32, wherein the actions further comprise performing a random search over hyperparameters associated with a set of CNN models to determine a highest validation accuracy.
34. The computer-readable medium of claim 33, wherein the hyperparameters describe at least one of a model, training of the model, training data to use, or data augmentation to use during training.
35. A system, comprising:
at least one processor; and
memory storing contents that, when executed by the at least one processor, cause the system to:
train a fully convolutional neural network (CNN) model to generate a trained CNN model for segmenting an anatomical structure based, at least in part, on a plurality of images, wherein each of a subset of the plurality of images includes at least one label which identifies at least a portion of the anatomical structure depicted in the image, the trained CNN model comprising an expanding path that includes a plurality of convolutional layers and a plurality of upsampling layers, wherein each upsampling layer is preceded by at least one convolutional layer and comprises a fixed upsampling operation without a learned kernel and a convolution operation with a learned kernel, and the convolution operation is preceded by the fixed upsampling operation and succeeded by a concatenation of feature maps; and
store the trained CNN model.
36. The system of claim 35, wherein to train the CNN model, the contents further cause the system to select the CNN model based, at least in part, on validation accuracy of the CNN model.
37. The system of claim 36, wherein the contents further cause the system to perform a random search over hyperparameters associated with a set of CNN models to determine a highest validation accuracy.
38. The system of claim 37, wherein the hyperparameters describe at least one of a model, training of the model, training data to use, or data augmentation to use during training.
39. The system of claim 35, wherein each upsampling layer halves the number of feature maps and doubles the spatial resolution.
40. The system of claim 39, wherein the trained CNN model further comprises a same number of pooling layers as the upsampling layers and wherein each pooling layer doubles the number of feature maps and halves the spatial resolution.
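The upsampling layer recited in claims 21, 28, 35 and 39 (a fixed upsampling operation, followed by a convolution with a learned kernel, followed by a concatenation of feature maps) can be illustrated with the shape-level sketch below. It is an illustration only and is not part of the claims: nearest-neighbor upsampling, a 1×1 convolution, a (channels, height, width) array layout, and the function names are all assumptions of this example, and the weights are passed in rather than trained.

```python
import numpy as np


def upsample_nearest(x, factor=2):
    """Fixed (non-learned) upsampling of a (C, H, W) feature array."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def conv1x1(x, weights):
    """1x1 convolution; weights has shape (C_out, C_in) and would be learned."""
    return np.einsum("oc,chw->ohw", weights, x)


def upsampling_layer(x, weights, skip):
    """Upsample, convolve, then concatenate with skip-connection feature maps."""
    up = upsample_nearest(x)          # doubles the spatial resolution
    conv = conv1x1(up, weights)       # here: halves the number of feature maps
    return np.concatenate([conv, skip], axis=0)
```

With an input of shape (8, 4, 4), weights of shape (4, 8), and skip feature maps of shape (4, 8, 8), the convolution output has shape (4, 8, 8) and the concatenated result has shape (8, 8, 8).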
US16/800,922 2017-01-27 2020-02-25 Automated segmentation utilizing fully convolutional networks Abandoned US20200193603A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/800,922 US20200193603A1 (en) 2017-01-27 2020-02-25 Automated segmentation utilizing fully convolutional networks

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201762451482P 2017-01-27 2017-01-27
US15/879,732 US10600184B2 (en) 2017-01-27 2018-01-25 Automated segmentation utilizing fully convolutional networks
US16/800,922 US20200193603A1 (en) 2017-01-27 2020-02-25 Automated segmentation utilizing fully convolutional networks

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/879,732 Continuation US10600184B2 (en) 2017-01-27 2018-01-25 Automated segmentation utilizing fully convolutional networks

Publications (1)

Publication Number Publication Date
US20200193603A1 true US20200193603A1 (en) 2020-06-18

Family

ID=62977998

Family Applications (3)

Application Number Title Priority Date Filing Date
US15/879,742 Active 2038-08-22 US10902598B2 (en) 2017-01-27 2018-01-25 Automated segmentation utilizing fully convolutional networks
US15/879,732 Active 2038-05-01 US10600184B2 (en) 2017-01-27 2018-01-25 Automated segmentation utilizing fully convolutional networks
US16/800,922 Abandoned US20200193603A1 (en) 2017-01-27 2020-02-25 Automated segmentation utilizing fully convolutional networks

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US15/879,742 Active 2038-08-22 US10902598B2 (en) 2017-01-27 2018-01-25 Automated segmentation utilizing fully convolutional networks
US15/879,732 Active 2038-05-01 US10600184B2 (en) 2017-01-27 2018-01-25 Automated segmentation utilizing fully convolutional networks

Country Status (5)

Country Link
US (3) US10902598B2 (en)
EP (1) EP3573520A4 (en)
JP (1) JP2020510463A (en)
CN (1) CN110475505B (en)
WO (1) WO2018140596A2 (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10871536B2 (en) 2015-11-29 2020-12-22 Arterys Inc. Automated cardiac volume segmentation
US10902598B2 (en) 2017-01-27 2021-01-26 Arterys Inc. Automated segmentation utilizing fully convolutional networks
US10973486B2 (en) 2018-01-08 2021-04-13 Progenics Pharmaceuticals, Inc. Systems and methods for rapid neural network-based image segmentation and radiopharmaceutical uptake determination
CN112767413A (en) * 2021-01-06 2021-05-07 武汉大学 Remote sensing image depth semantic segmentation method integrating region communication and symbiotic knowledge constraints
US11051779B2 (en) * 2018-09-13 2021-07-06 Siemens Healthcare Gmbh Processing image frames of a sequence of cardiac images
US20210287375A1 (en) * 2020-03-11 2021-09-16 Purdue Research Foundation System architecture and method of processing images
US11321844B2 (en) 2020-04-23 2022-05-03 Exini Diagnostics Ab Systems and methods for deep-learning-based segmentation of composite images
US11361430B2 (en) * 2017-04-18 2022-06-14 King's College London System and method for medical imaging
US11386988B2 (en) 2020-04-23 2022-07-12 Exini Diagnostics Ab Systems and methods for deep-learning-based segmentation of composite images
US11403663B2 (en) * 2018-05-17 2022-08-02 Spotify Ab Ad preference embedding model and lookalike generation engine
US11424035B2 (en) 2016-10-27 2022-08-23 Progenics Pharmaceuticals, Inc. Network for medical image analysis, decision support system, and related graphical user interface (GUI) applications
US11436724B2 (en) 2020-10-30 2022-09-06 International Business Machines Corporation Lesion detection artificial intelligence pipeline computing system
US11537428B2 (en) 2018-05-17 2022-12-27 Spotify Ab Asynchronous execution of creative generator and trafficking workflows and components therefor
US11534125B2 (en) 2019-04-24 2022-12-27 Progenics Pharmaceuticals, Inc. Systems and methods for automated and interactive analysis of bone scan images for detection of metastases
US11544407B1 (en) 2019-09-27 2023-01-03 Progenics Pharmaceuticals, Inc. Systems and methods for secure cloud-based medical image upload and processing
US11551353B2 (en) 2017-11-22 2023-01-10 Arterys Inc. Content based image retrieval for lesion analysis
US11564621B2 (en) 2019-09-27 2023-01-31 Progenics Pharmaceuticals, Inc. Systems and methods for artificial intelligence-based image analysis for cancer assessment
US11587236B2 (en) 2020-10-30 2023-02-21 International Business Machines Corporation Refining lesion contours with combined active contour and inpainting
WO2023075480A1 (en) * 2021-10-28 2023-05-04 주식회사 온택트헬스 Method and apparatus for providing clinical parameter for predicted target region in medical image, and method and apparatus for screening medical image for labeling
US11657508B2 (en) 2019-01-07 2023-05-23 Exini Diagnostics Ab Systems and methods for platform agnostic whole body image segmentation
WO2023096985A1 (en) * 2021-11-24 2023-06-01 Riverain Technologies Llc Method for the automatic detection of aortic disease and automatic generation of a reformatted aortic volume
US11688517B2 (en) 2020-10-30 2023-06-27 Guerbet Multiple operating point false positive removal for lesion identification
US11688063B2 (en) 2020-10-30 2023-06-27 Guerbet Ensemble machine learning model architecture for lesion detection
US11694329B2 (en) 2020-10-30 2023-07-04 International Business Machines Corporation Logistic model to determine 3D z-wise lesion connectivity
US11721428B2 (en) 2020-07-06 2023-08-08 Exini Diagnostics Ab Systems and methods for artificial intelligence-based image analysis for detection and characterization of lesions
US11749401B2 (en) 2020-10-30 2023-09-05 Guerbet Seed relabeling for seed-based segmentation of a medical image
US11763456B2 (en) 2020-03-11 2023-09-19 Purdue Research Foundation Systems and methods for processing echocardiogram images
US11900597B2 (en) 2019-09-27 2024-02-13 Progenics Pharmaceuticals, Inc. Systems and methods for artificial intelligence-based image analysis for cancer assessment
US11948283B2 (en) 2019-04-24 2024-04-02 Progenics Pharmaceuticals, Inc. Systems and methods for interactive adjustment of intensity windowing in nuclear medicine images

Families Citing this family (177)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10331852B2 (en) 2014-01-17 2019-06-25 Arterys Inc. Medical imaging and efficient sharing of medical imaging information
US10117597B2 (en) 2014-01-17 2018-11-06 Arterys Inc. Apparatus, methods and articles for four dimensional (4D) flow magnetic resonance imaging using coherency identification for magnetic resonance imaging flow data
US10663711B2 (en) 2017-01-04 2020-05-26 Corista, LLC Virtual slide stage (VSS) method for viewing whole slide images
US10580131B2 (en) * 2017-02-23 2020-03-03 Zebra Medical Vision Ltd. Convolutional neural network for segmentation of medical anatomical images
CN106887225B (en) * 2017-03-21 2020-04-07 百度在线网络技术(北京)有限公司 Acoustic feature extraction method and device based on convolutional neural network and terminal equipment
US10699412B2 (en) * 2017-03-23 2020-06-30 Petuum Inc. Structure correcting adversarial network for chest X-rays organ segmentation
GB201705876D0 (en) 2017-04-11 2017-05-24 Kheiron Medical Tech Ltd Recist
GB201705911D0 (en) * 2017-04-12 2017-05-24 Kheiron Medical Tech Ltd Abstracts
US10261903B2 (en) 2017-04-17 2019-04-16 Intel Corporation Extend GPU/CPU coherency to multi-GPU cores
US11468286B2 (en) * 2017-05-30 2022-10-11 Leica Microsystems Cms Gmbh Prediction guided sequential data learning method
US10699410B2 (en) * 2017-08-17 2020-06-30 Siemens Healthcare GmbH Automatic change detection in medical images
EP3625767B1 (en) * 2017-09-27 2021-03-31 Google LLC End to end network model for high resolution image segmentation
US10891723B1 (en) 2017-09-29 2021-01-12 Snap Inc. Realistic neural network based image style transfer
EP3471054B1 (en) * 2017-10-16 2022-02-09 Siemens Healthcare GmbH Method for determining at least one object feature of an object
US10783640B2 (en) 2017-10-30 2020-09-22 Beijing Keya Medical Technology Co., Ltd. Systems and methods for image segmentation using a scalable and compact convolutional neural network
JP6545887B2 (en) * 2017-11-24 2019-07-17 キヤノンメディカルシステムズ株式会社 Medical data processing apparatus, magnetic resonance imaging apparatus, and learned model generation method
US11580410B2 (en) * 2018-01-24 2023-02-14 Rensselaer Polytechnic Institute 3-D convolutional autoencoder for low-dose CT via transfer learning from a 2-D trained network
US10595727B2 (en) * 2018-01-25 2020-03-24 Siemens Healthcare Gmbh Machine learning-based segmentation for cardiac medical imaging
US10885630B2 (en) 2018-03-01 2021-01-05 Intuitive Surgical Operations, Inc Systems and methods for segmentation of anatomical structures for image-guided surgery
US11024025B2 (en) * 2018-03-07 2021-06-01 University Of Virginia Patent Foundation Automatic quantification of cardiac MRI for hypertrophic cardiomyopathy
CA3100642A1 (en) 2018-05-21 2019-11-28 Corista, LLC Multi-sample whole slide image processing in digital pathology via multi-resolution registration and machine learning
GB2574372B (en) * 2018-05-21 2021-08-11 Imagination Tech Ltd Implementing Traditional Computer Vision Algorithms As Neural Networks
US10853726B2 (en) * 2018-05-29 2020-12-01 Google Llc Neural architecture search for dense image prediction tasks
WO2019241155A1 (en) 2018-06-11 2019-12-19 Arterys Inc. Simulating abnormalities in medical images with generative adversarial networks
CA3103538A1 (en) * 2018-06-11 2019-12-19 Socovar, Societe En Commandite System and method for determining coronal artery tissue type based on an oct image and using trained engines
HUE058687T2 (en) 2018-06-14 2022-09-28 Kheiron Medical Tech Ltd Immediate workup
EP3598344A1 (en) * 2018-07-19 2020-01-22 Nokia Technologies Oy Processing sensor data
CN109087298B (en) * 2018-08-17 2020-07-28 电子科技大学 Alzheimer's disease MRI image classification method
KR102174379B1 (en) * 2018-08-27 2020-11-04 주식회사 딥바이오 System and method for medical diagnosis using neural network performing segmentation
US11164067B2 (en) * 2018-08-29 2021-11-02 Arizona Board Of Regents On Behalf Of Arizona State University Systems, methods, and apparatuses for implementing a multi-resolution neural network for use with imaging intensive applications including medical imaging
CN109345538B (en) * 2018-08-30 2021-08-10 华南理工大学 Retinal vessel segmentation method based on convolutional neural network
US10303980B1 (en) * 2018-09-05 2019-05-28 StradVision, Inc. Learning method, learning device for detecting obstacles and testing method, testing device using the same
JP7213412B2 (en) * 2018-09-12 2023-01-27 学校法人立命館 MEDICAL IMAGE EXTRACTION APPARATUS, MEDICAL IMAGE EXTRACTION METHOD, AND COMPUTER PROGRAM
CN109308695A (en) * 2018-09-13 2019-02-05 镇江纳兰随思信息科技有限公司 Cancer cell identification method based on an improved U-net convolutional neural network model
CN109242863B (en) * 2018-09-14 2021-10-26 北京市商汤科技开发有限公司 Ischemic stroke image region segmentation method and device
CN109272512B (en) * 2018-09-25 2022-02-15 南昌航空大学 Method for automatically segmenting left ventricle inner and outer membranes
US20200104678A1 (en) * 2018-09-27 2020-04-02 Google Llc Training optimizer neural networks
CN109559315B (en) * 2018-09-28 2023-06-02 天津大学 Water surface segmentation method based on multipath deep neural network
CN109410318B (en) * 2018-09-30 2020-09-08 先临三维科技股份有限公司 Three-dimensional model generation method, device, equipment and storage medium
WO2020077202A1 (en) * 2018-10-12 2020-04-16 The Medical College Of Wisconsin, Inc. Medical image segmentation using deep learning models trained with random dropout and/or standardized inputs
US11651584B2 (en) * 2018-10-16 2023-05-16 General Electric Company System and method for memory augmented domain adaptation
CN109446951B (en) 2018-10-16 2019-12-10 腾讯科技(深圳)有限公司 Semantic segmentation method, device and equipment for three-dimensional image and storage medium
CN109509203B (en) * 2018-10-17 2019-11-05 哈尔滨理工大学 Semi-automatic brain image segmentation method
WO2020080698A1 (en) 2018-10-19 2020-04-23 삼성전자 주식회사 Method and device for evaluating subjective quality of video
KR102525578B1 (en) 2018-10-19 2023-04-26 삼성전자주식회사 Method and Apparatus for video encoding and Method and Apparatus for video decoding
CN112889089B (en) * 2018-10-19 2024-03-05 克莱米特有限责任公司 Machine learning techniques for identifying clouds and cloud shadows in satellite imagery
WO2020080765A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
KR102312337B1 (en) * 2018-10-19 2021-10-14 삼성전자주식회사 AI encoding apparatus and operating method for the same, and AI decoding apparatus and operating method for the same
WO2020080873A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
WO2020080665A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
US11720997B2 (en) 2018-10-19 2023-08-08 Samsung Electronics Co., Ltd. Artificial intelligence (AI) encoding device and operating method thereof and AI decoding device and operating method thereof
KR102285738B1 (en) * 2018-10-19 2021-08-05 삼성전자주식회사 Method and apparatus for assessing subjective quality of a video
CN109508647A (en) * 2018-10-22 2019-03-22 北京理工大学 Spectral database extension method based on generative adversarial network
CA3117959A1 (en) * 2018-10-30 2020-05-07 Allen Institute Segmenting 3d intracellular structures in microscopy images using an iterative deep learning workflow that incorporates human contributions
CN109448006B (en) * 2018-11-01 2022-01-28 江西理工大学 Attention-based U-shaped dense connection retinal vessel segmentation method
WO2020093042A1 (en) * 2018-11-02 2020-05-07 Deep Lens, Inc. Neural networks for biomedical image analysis
CN109523077B (en) * 2018-11-15 2022-10-11 云南电网有限责任公司 Wind power prediction method
CN113591750A (en) * 2018-11-16 2021-11-02 北京市商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium
CN110009640B (en) * 2018-11-20 2023-09-26 腾讯科技(深圳)有限公司 Method, apparatus and readable medium for processing cardiac video
WO2020108009A1 (en) * 2018-11-26 2020-06-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for improving quality of low-light images
TW202022713A (en) * 2018-12-05 2020-06-16 宏碁股份有限公司 Method and system for evaluating cardiac status, electronic device and ultrasonic scanning device
CN109711411B (en) * 2018-12-10 2020-10-30 浙江大学 Image segmentation and identification method based on capsule neurons
CN111309800A (en) * 2018-12-11 2020-06-19 北京京东尚科信息技术有限公司 Data storage and reading method and device
US10943352B2 (en) * 2018-12-17 2021-03-09 Palo Alto Research Center Incorporated Object shape regression using wasserstein distance
US10740901B2 (en) * 2018-12-17 2020-08-11 Nvidia Corporation Encoder regularization of a segmentation model
EP3671660A1 (en) * 2018-12-20 2020-06-24 Dassault Systèmes Designing a 3d modeled object via user-interaction
KR20200084808A (en) 2019-01-03 2020-07-13 삼성전자주식회사 Method and system for performing dilated convolution operation in neural network
CN109584254B (en) * 2019-01-07 2022-12-20 浙江大学 Heart left ventricle segmentation method based on deep full convolution neural network
CN109872325B (en) * 2019-01-17 2022-11-15 东北大学 Full-automatic liver tumor segmentation method based on two-way three-dimensional convolutional neural network
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 Three-dimensional image segmentation method and system based on fully convolutional neural networks
CN109949334B (en) * 2019-01-25 2022-10-04 广西科技大学 Contour detection method based on deep reinforced network residual error connection
WO2020154664A1 (en) * 2019-01-25 2020-07-30 The Johns Hopkins University Predicting atrial fibrillation recurrence after pulmonary vein isolation using simulations of patient-specific magnetic resonance imaging models and machine learning
US10373027B1 (en) * 2019-01-30 2019-08-06 StradVision, Inc. Method for acquiring sample images for inspecting label among auto-labeled images to be used for learning of neural network and sample image acquiring device using the same
CN109886159B (en) * 2019-01-30 2021-03-26 浙江工商大学 Face detection method under non-limited condition
US11544572B2 (en) * 2019-02-15 2023-01-03 Capital One Services, Llc Embedding constrained and unconstrained optimization programs as neural network layers
DE102019203024A1 (en) * 2019-03-06 2020-09-10 Robert Bosch Gmbh Padding method for a convolutional neural network
CN109949318B (en) * 2019-03-07 2023-11-14 西安电子科技大学 Full convolution neural network epileptic focus segmentation method based on multi-modal image
EP3716201A1 (en) * 2019-03-25 2020-09-30 Siemens Healthcare GmbH Medical image enhancement
CN110009619A (en) * 2019-04-02 2019-07-12 清华大学深圳研究生院 A kind of image analysis method based on fluorescence-encoded liquid phase biochip
CN110101401B (en) * 2019-04-18 2023-04-07 浙江大学山东工业技术研究院 Liver contrast agent digital subtraction angiography method
CN110111313B (en) * 2019-04-22 2022-12-30 腾讯科技(深圳)有限公司 Medical image detection method based on deep learning and related equipment
CN110047073B (en) * 2019-05-05 2021-07-06 北京大学 X-ray weld image defect grading method and system
CN112396169B (en) * 2019-08-13 2024-04-02 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN110969182A (en) * 2019-05-17 2020-04-07 丰疆智能科技股份有限公司 Convolutional neural network construction method and system based on farmland image
US11328430B2 (en) * 2019-05-28 2022-05-10 Arizona Board Of Regents On Behalf Of Arizona State University Methods, systems, and media for segmenting images
CN112102221A (en) * 2019-05-31 2020-12-18 深圳市前海安测信息技术有限公司 3D UNet network model construction method and device for detecting tumor and storage medium
CN110298366B (en) * 2019-07-05 2021-05-04 北华航天工业学院 Crop distribution extraction method and device
US20210015438A1 (en) * 2019-07-16 2021-01-21 Siemens Healthcare Gmbh Deep learning for perfusion in medical imaging
CN110599499B (en) * 2019-08-22 2022-04-19 四川大学 MRI image heart structure segmentation method based on multipath convolutional neural network
CN110517241A (en) * 2019-08-23 2019-11-29 吉林大学第一医院 Method based on the full-automatic stomach fat quantitative analysis of NMR imaging IDEAL-IQ sequence
CN110619641A (en) * 2019-09-02 2019-12-27 南京信息工程大学 Automatic segmentation method of three-dimensional breast cancer nuclear magnetic resonance image tumor region based on deep learning
US10957031B1 (en) * 2019-09-06 2021-03-23 Accenture Global Solutions Limited Intelligent defect detection from image data
CN110598784B (en) * 2019-09-11 2020-06-02 北京建筑大学 Machine learning-based construction waste classification method and device
JP7408325B2 (en) * 2019-09-13 2024-01-05 キヤノン株式会社 Information processing equipment, learning methods and programs
ES2813777B2 (en) 2019-09-23 2023-10-27 Quibim S L METHOD AND SYSTEM FOR THE AUTOMATIC SEGMENTATION OF WHITE MATTER HYPERINTENSITIES IN BRAIN MAGNETIC RESONANCE IMAGES
JP2022549669A (en) * 2019-09-24 2022-11-28 カーネギー メロン ユニバーシティ System and method for analyzing medical images based on spatio-temporal data
CN110675411B (en) * 2019-09-26 2023-05-16 重庆大学 Cervical squamous intraepithelial lesion recognition algorithm based on deep learning
US11545266B2 (en) 2019-09-30 2023-01-03 GE Precision Healthcare LLC Medical imaging stroke model
US11331056B2 (en) * 2019-09-30 2022-05-17 GE Precision Healthcare LLC Computed tomography medical imaging stroke model
CN114586051A (en) * 2019-10-01 2022-06-03 雪佛龙美国公司 Method and system for predicting permeability of hydrocarbon reservoirs using artificial intelligence
US11640552B2 (en) * 2019-10-01 2023-05-02 International Business Machines Corporation Two stage training to obtain a best deep learning model with efficient use of computing resources
US11501434B2 (en) * 2019-10-02 2022-11-15 Memorial Sloan Kettering Cancer Center Deep multi-magnification networks for multi-class image segmentation
CN112365504A (en) * 2019-10-29 2021-02-12 杭州脉流科技有限公司 CT left ventricle segmentation method, device, equipment and storage medium
US11232859B2 (en) * 2019-11-07 2022-01-25 Siemens Healthcare Gmbh Artificial intelligence for basal and apical slice identification in cardiac MRI short axis acquisitions
KR20210056179A (en) 2019-11-08 2021-05-18 삼성전자주식회사 AI encoding apparatus and operating method for the same, and AI decoding apparatus and operating method for the same
US11423544B1 (en) 2019-11-14 2022-08-23 Seg AI LLC Segmenting medical images
US10762629B1 (en) 2019-11-14 2020-09-01 SegAI LLC Segmenting medical images
CN110930383A (en) * 2019-11-20 2020-03-27 佛山市南海区广工大数控装备协同创新研究院 Injector defect detection method based on deep learning semantic segmentation and image classification
CN110910368B (en) * 2019-11-20 2022-05-13 佛山市南海区广工大数控装备协同创新研究院 Injector defect detection method based on semantic segmentation
CN111161292B (en) * 2019-11-21 2023-09-05 合肥合工安驰智能科技有限公司 Ore scale measurement method and application system
EP3828828A1 (en) 2019-11-28 2021-06-02 Robovision Improved physical object handling based on deep learning
CN111179149B (en) * 2019-12-17 2022-03-08 Tcl华星光电技术有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110991408B (en) * 2019-12-19 2022-09-06 北京航空航天大学 Method and device for segmenting white matter high signal based on deep learning method
CN110739050B (en) * 2019-12-20 2020-07-28 深圳大学 Left ventricle full-parameter and confidence coefficient quantification method
CN111144486B (en) * 2019-12-27 2022-06-10 电子科技大学 Heart nuclear magnetic resonance image key point detection method based on convolutional neural network
CN111260705B (en) * 2020-01-13 2022-03-15 武汉大学 Prostate MR image multi-task registration method based on deep convolutional neural network
CN111402203B (en) * 2020-02-24 2024-03-01 杭州电子科技大学 Fabric surface defect detection method based on convolutional neural network
CN111311737B (en) * 2020-03-04 2023-03-10 中南民族大学 Three-dimensional modeling method, device and equipment for heart image and storage medium
CN111401373B (en) * 2020-03-04 2022-02-15 武汉大学 Efficient semantic segmentation method based on packet asymmetric convolution
CN111281387B (en) * 2020-03-09 2024-03-26 中山大学 Segmentation method and device for left atrium and atrial scar based on artificial neural network
WO2021183473A1 (en) * 2020-03-09 2021-09-16 Nanotronics Imaging, Inc. Defect detection system
CN111340816A (en) * 2020-03-23 2020-06-26 沈阳航空航天大学 Image segmentation method based on double-U-shaped network framework
CN111462060A (en) * 2020-03-24 2020-07-28 湖南大学 Method and device for detecting standard section image in fetal ultrasonic image
US11205287B2 (en) * 2020-03-27 2021-12-21 International Business Machines Corporation Annotation of digital images for machine learning
US11704803B2 (en) * 2020-03-30 2023-07-18 The Board Of Trustees Of The Leland Stanford Junior University Methods and systems using video-based machine learning for beat-to-beat assessment of cardiac function
CN111466894B (en) * 2020-04-07 2023-03-31 上海深至信息科技有限公司 Ejection fraction calculation method and system based on deep learning
US20210319539A1 (en) * 2020-04-13 2021-10-14 GE Precision Healthcare LLC Systems and methods for background aware reconstruction using deep learning
CN111666972A (en) * 2020-04-28 2020-09-15 清华大学 Liver case image classification method and system based on deep neural network
CN111652886B (en) * 2020-05-06 2022-07-22 哈尔滨工业大学 Liver tumor segmentation method based on improved U-net network
CN112330674B (en) * 2020-05-07 2023-06-30 南京信息工程大学 Self-adaptive variable-scale convolution kernel method based on brain MRI three-dimensional image confidence coefficient
US11532084B2 (en) 2020-05-11 2022-12-20 EchoNous, Inc. Gating machine learning predictions on medical ultrasound images via risk and uncertainty quantification
US11523801B2 (en) 2020-05-11 2022-12-13 EchoNous, Inc. Automatically identifying anatomical structures in medical images in a manner that is sensitive to the particular view in which each image is captured
CN111739028A (en) * 2020-05-26 2020-10-02 华南理工大学 Nail region image acquisition method, system, computing device and storage medium
CA3180114C (en) * 2020-06-02 2023-08-29 Fabian RICHTER Method for property feature segmentation
WO2021255514A1 (en) * 2020-06-15 2021-12-23 Universidade Do Porto Padding method for convolutional neural network layers adapted to perform multivariate time series analysis
CN111739000B (en) * 2020-06-16 2022-09-13 山东大学 System and device for improving left ventricle segmentation accuracy of multiple cardiac views
US11693919B2 (en) * 2020-06-22 2023-07-04 Shanghai United Imaging Intelligence Co., Ltd. Anatomy-aware motion estimation
CN111798462B (en) * 2020-06-30 2022-10-14 电子科技大学 Automatic delineation method of nasopharyngeal carcinoma radiotherapy target area based on CT image
US11216960B1 (en) * 2020-07-01 2022-01-04 Alipay Labs (singapore) Pte. Ltd. Image processing method and system
EP3940629A1 (en) * 2020-07-13 2022-01-19 Koninklijke Philips N.V. Image intensity correction in magnetic resonance imaging
CN112001887B (en) * 2020-07-20 2021-11-09 南通大学 Full convolution genetic neural network method for infant brain medical record image segmentation
CN111738268B (en) * 2020-07-22 2023-11-14 浙江大学 Semantic segmentation method and system for high-resolution remote sensing image based on random block
CN111928794B (en) * 2020-08-04 2022-03-11 北京理工大学 Closed fringe compatible single interference diagram phase method and device based on deep learning
CN111898211B (en) * 2020-08-07 2022-11-01 吉林大学 Intelligent vehicle speed decision method based on deep reinforcement learning and simulation method thereof
CN112085162B (en) * 2020-08-12 2024-02-09 北京师范大学 Neural network-based magnetic resonance brain tissue segmentation method, device, computing equipment and storage medium
CN111968112B (en) * 2020-09-02 2023-12-26 广州海兆印丰信息科技有限公司 CT three-dimensional positioning image acquisition method and device and computer equipment
TWI792055B (en) * 2020-09-25 2023-02-11 國立勤益科技大學 Establishing method of echocardiography judging model with 3d deep learning, echocardiography judging system with 3d deep learning and method thereof
CN116261743A (en) * 2020-09-27 2023-06-13 上海联影医疗科技股份有限公司 System and method for generating radiation treatment plans
US20220114699A1 (en) * 2020-10-09 2022-04-14 The Regents Of The University Of California Spatiotemporal resolution enhancement of biomedical images
US11601661B2 (en) * 2020-10-09 2023-03-07 Tencent America LLC Deep loop filter by temporal deformable convolution
US11636593B2 (en) 2020-11-06 2023-04-25 EchoNous, Inc. Robust segmentation through high-level image understanding
CN112634243B (en) * 2020-12-28 2022-08-05 吉林大学 Image classification and recognition system based on deep learning under strong interference factors
CN112651987A (en) * 2020-12-30 2021-04-13 内蒙古自治区农牧业科学院 Method and system for calculating grassland coverage of sample
CN112712527A (en) * 2020-12-31 2021-04-27 山西三友和智慧信息技术股份有限公司 Medical image segmentation method based on DR-Unet104
CN112750106B (en) * 2020-12-31 2022-11-04 山东大学 Nuclear staining cell counting method based on incomplete marker deep learning, computer equipment and storage medium
CN112734770B (en) * 2021-01-06 2022-11-25 中国人民解放军陆军军医大学第二附属医院 Multi-sequence fusion segmentation method for cardiac nuclear magnetic images based on multilayer cascade
CN112932535B (en) * 2021-02-01 2022-10-18 杜国庆 Medical image segmentation and detection method
CN112785592A (en) * 2021-03-10 2021-05-11 河北工业大学 Medical image depth segmentation network based on multiple expansion paths
EP4060608A1 (en) 2021-03-17 2022-09-21 Robovision Improved vision-based measuring
EP4060612A1 (en) 2021-03-17 2022-09-21 Robovision Improved orientation detection based on deep learning
US20220335615A1 (en) * 2021-04-19 2022-10-20 Fujifilm Sonosite, Inc. Calculating heart parameters
CN112989107B (en) * 2021-05-18 2021-07-30 北京世纪好未来教育科技有限公司 Audio classification and separation method and device, electronic equipment and storage medium
CN113469948B (en) * 2021-06-08 2022-02-25 北京安德医智科技有限公司 Left ventricle segment identification method and device, electronic equipment and storage medium
US11875559B2 (en) * 2021-07-12 2024-01-16 Obvio Health Usa, Inc. Systems and methodologies for automated classification of images of stool in diapers
CN113284074B (en) * 2021-07-12 2021-12-07 中德(珠海)人工智能研究院有限公司 Method and device for removing target object of panoramic image, server and storage medium
WO2023283765A1 (en) * 2021-07-12 2023-01-19 上海联影医疗科技股份有限公司 Method and apparatus for training machine learning models, computer device, and storage medium
CN113674235B (en) * 2021-08-15 2023-10-10 上海立芯软件科技有限公司 Low-cost photoetching hot spot detection method based on active entropy sampling and model calibration
CN113838001B (en) * 2021-08-24 2024-02-13 内蒙古电力科学研究院 Ultrasonic wave full focusing image defect processing method and device based on nuclear density estimation
CN113838027A (en) * 2021-09-23 2021-12-24 杭州柳叶刀机器人有限公司 Method and system for obtaining target image element based on image processing
US20230126963A1 (en) * 2021-10-25 2023-04-27 Analogic Corporation Prediction of extrema of respiratory motion and related systems, methods, and devices
CN114240951B (en) * 2021-12-13 2023-04-07 电子科技大学 Black box attack method of medical image segmentation neural network based on query
CN114549448B (en) * 2022-02-17 2023-08-11 中国空气动力研究与发展中心超高速空气动力研究所 Complex multi-type defect detection evaluation method based on infrared thermal imaging data analysis
WO2023183486A1 (en) * 2022-03-23 2023-09-28 University Of Southern California Deep-learning-driven accelerated mr vessel wall imaging
CN117197020A (en) * 2022-05-23 2023-12-08 上海微创卜算子医疗科技有限公司 Mitral valve opening pitch detection method, electronic device, and storage medium
WO2023235653A1 (en) * 2022-05-30 2023-12-07 Northwestern University Panatomic imaging derived 4d hemodynamics using deep learning
CN115471659B (en) * 2022-09-22 2023-04-25 北京航星永志科技有限公司 Training method and segmentation method of semantic segmentation model and electronic equipment
CN117036376B (en) * 2023-10-10 2024-01-30 四川大学 Lesion image segmentation method and device based on artificial intelligence and storage medium

Family Cites Families (111)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2776844B2 (en) 1988-11-30 1998-07-16 株式会社日立製作所 Magnetic resonance imaging system
US5115812A (en) 1988-11-30 1992-05-26 Hitachi, Ltd. Magnetic resonance imaging method for moving object
JP3137378B2 (en) 1991-09-25 2001-02-19 株式会社東芝 Magnetic resonance imaging equipment
JP3452400B2 (en) 1994-08-02 2003-09-29 株式会社日立メディコ Magnetic resonance imaging equipment
JP3501182B2 (en) 1994-11-22 2004-03-02 株式会社日立メディコ Magnetic resonance imaging device capable of calculating flow velocity images
KR19990082557A (en) 1996-02-09 1999-11-25 윌리암 제이. 버크 Method and apparatus for training neural networks for detecting and classifying objects using uncertain training data
US6324532B1 (en) 1997-02-07 2001-11-27 Sarnoff Corporation Method and apparatus for training a neural network to detect objects in an image
WO2000067185A1 (en) 1999-05-05 2000-11-09 Healthgram, Inc. Portable health record
US6711433B1 (en) 1999-09-30 2004-03-23 Siemens Corporate Research, Inc. Method for providing a virtual contrast agent for augmented angioscopy
US6934698B2 (en) 2000-12-20 2005-08-23 Heart Imaging Technologies Llc Medical image management system
US8166381B2 (en) 2000-12-20 2012-04-24 Heart Imaging Technologies, Llc Medical image management system
DE10117685C2 (en) 2001-04-09 2003-02-27 Siemens Ag Process for processing objects of a standardized communication protocol
US7139417B2 (en) 2001-08-14 2006-11-21 Ge Medical Systems Global Technology Company Llc Combination compression and registration techniques to implement temporal subtraction as an application service provider to detect changes over time to medical imaging
US7158692B2 (en) 2001-10-15 2007-01-02 Insightful Corporation System and method for mining quantitive information from medical images
US7355597B2 (en) 2002-05-06 2008-04-08 Brown University Research Foundation Method, apparatus and computer program product for the interactive rendering of multivalued volume data with layered complementary values
GB0219408D0 (en) 2002-08-20 2002-09-25 Mirada Solutions Ltd Computation of contour
CN100377165C (en) 2003-04-24 2008-03-26 皇家飞利浦电子股份有限公司 Non-invasive left ventricular volume determination
US7254436B2 (en) 2003-05-30 2007-08-07 Heart Imaging Technologies, Llc Dynamic magnetic resonance angiography
JP2008510499A (en) 2004-06-23 2008-04-10 エムツーエス・インコーポレーテッド Anatomical visualization / measurement system
US7292032B1 (en) 2004-09-28 2007-11-06 General Electric Company Method and system of enhanced phase suppression for phase-contrast MR imaging
US7127095B2 (en) 2004-10-15 2006-10-24 The Brigham And Women's Hospital, Inc. Factor analysis in medical imaging
EP1659511A1 (en) 2004-11-18 2006-05-24 Cedara Software Corp. Image archiving system and method for handling new and legacy archives
US7736313B2 (en) 2004-11-22 2010-06-15 Carestream Health, Inc. Detecting and classifying lesions in ultrasound images
US8000768B2 (en) 2005-01-10 2011-08-16 Vassol Inc. Method and system for displaying blood flow
US20070061460A1 (en) 2005-03-24 2007-03-15 Jumpnode Systems,Llc Remote access
US7567707B2 (en) 2005-12-20 2009-07-28 Xerox Corporation Red eye detection and correction
US8126227B2 (en) 2006-12-04 2012-02-28 Kabushiki Kaisha Toshiba X-ray computed tomographic apparatus and medical image processing apparatus
US7764846B2 (en) 2006-12-12 2010-07-27 Xerox Corporation Adaptive red eye correction
WO2008144751A1 (en) * 2007-05-21 2008-11-27 Cornell University Method for segmenting objects in images
US8098918B2 (en) 2007-09-21 2012-01-17 Siemens Corporation Method and system for measuring left ventricle volume
US7806843B2 (en) 2007-09-25 2010-10-05 Marin Luis E External fixator assembly
JP5191787B2 (en) 2008-04-23 2013-05-08 株式会社日立メディコ X-ray CT system
WO2009142167A1 (en) 2008-05-22 2009-11-26 株式会社 日立メディコ Magnetic resonance imaging device and blood vessel image acquisition method
FR2932599A1 (en) 2008-06-12 2009-12-18 Eugene Franck Maizeroi METHOD AND DEVICE FOR IMAGE PROCESSING, IN PARTICULAR FOR PROCESSING MEDICAL IMAGES FOR DETERMINING VOLUMES 3D
WO2010003041A2 (en) 2008-07-03 2010-01-07 Nec Laboratories America, Inc. Mitotic figure detector and counter system and method for detecting and counting mitotic figures
US20110230756A1 (en) 2008-09-30 2011-09-22 University Of Cape Town Fluid flow assessment
JP5422171B2 (en) 2008-10-01 2014-02-19 株式会社東芝 X-ray diagnostic imaging equipment
US8148984B2 (en) 2008-10-03 2012-04-03 Wisconsin Alumni Research Foundation Method for magnitude constrained phase contrast magnetic resonance imaging
US8301224B2 (en) 2008-10-09 2012-10-30 Siemens Aktiengesellschaft System and method for automatic, non-invasive diagnosis of pulmonary hypertension and measurement of mean pulmonary arterial pressure
WO2010056900A1 (en) 2008-11-13 2010-05-20 Avid Radiopharmaceuticals, Inc. Histogram-based analysis method for the detection and diagnosis of neurodegenerative diseases
US20100158332A1 (en) 2008-12-22 2010-06-24 Dan Rico Method and system of automated detection of lesions in medical images
US8457373B2 (en) 2009-03-16 2013-06-04 Siemens Aktiengesellschaft System and method for robust 2D-3D image registration
EP2238897B1 (en) * 2009-04-06 2011-04-13 Sorin CRM SAS Active medical device including means for reconstructing a surface electrocardiogram using an intracardiac electrogram
US10303986B2 (en) 2009-04-07 2019-05-28 Kayvan Najarian Automated measurement of brain injury indices using brain CT images, injury data, and machine learning
US8527251B2 (en) 2009-05-01 2013-09-03 Siemens Aktiengesellschaft Method and system for multi-component heart and aorta modeling for decision support in cardiac disease
JP4639347B1 (en) 2009-11-20 2011-02-23 株式会社墨運堂 Writing instrument
US20110182493A1 (en) 2010-01-25 2011-07-28 Martin Huber Method and a system for image annotation
US8805048B2 (en) 2010-04-01 2014-08-12 Mark Batesole Method and system for orthodontic diagnosis
WO2012018560A2 (en) 2010-07-26 2012-02-09 Kjaya, Llc Adaptive visualization for direct physician use
EP2603136B1 (en) 2010-08-13 2023-07-12 Smith & Nephew, Inc. Detection of anatomical landmarks
US8897519B2 (en) 2010-09-28 2014-11-25 Siemens Aktiengesellschaft System and method for background phase correction for phase contrast flow images
US8374414B2 (en) 2010-11-05 2013-02-12 The Hong Kong Polytechnic University Method and system for detecting ischemic stroke
EP2649587B1 (en) 2010-12-09 2019-04-03 Koninklijke Philips N.V. Volumetric rendering of image data
US8600476B2 (en) 2011-04-21 2013-12-03 Siemens Aktiengesellschaft Patient support table control system for use in MR imaging
CN103635909B (en) 2011-06-27 2017-10-27 皇家飞利浦有限公司 A kind of clinical discovery management system
JP6006307B2 (en) 2011-07-07 2016-10-12 ザ ボード オブ トラスティーズ オブ ザ レランド スタンフォード ジュニア ユニバーシティー Comprehensive cardiovascular analysis by volumetric phase contrast MRI
US9585568B2 (en) 2011-09-11 2017-03-07 Steven D. Wolff Noninvasive methods for determining the pressure gradient across a heart valve without using velocity data at the valve orifice
US8837800B1 (en) 2011-10-28 2014-09-16 The Board Of Trustees Of The Leland Stanford Junior University Automated detection of arterial input function and/or venous output function voxels in medical imaging
US8682049B2 (en) 2012-02-14 2014-03-25 Terarecon, Inc. Cloud-based medical image processing system with access control
US9014781B2 (en) 2012-04-19 2015-04-21 General Electric Company Systems and methods for magnetic resonance angiography
US9165360B1 (en) 2012-09-27 2015-10-20 Zepmed, Llc Methods, systems, and devices for automated analysis of medical scans
US9495752B2 (en) 2012-09-27 2016-11-15 Siemens Product Lifecycle Management Software Inc. Multi-bone segmentation for 3D computed tomography
US20150374237A1 (en) 2013-01-31 2015-12-31 The Regents Of The University Of California Method for accurate and robust cardiac motion self-gating in magnetic resonance imaging
WO2014186838A1 (en) 2013-05-19 2014-11-27 Commonwealth Scientific And Industrial Research Organisation A system and method for remote medical diagnosis
WO2015031576A1 (en) * 2013-08-28 2015-03-05 Siemens Aktiengesellschaft Systems and methods for estimating physiological heart measurements from medical images and clinical data
WO2015031641A1 (en) 2013-08-29 2015-03-05 Mayo Foundation For Medical Education And Research System and method for boundary classification and automatic polyp detection
US9406142B2 (en) 2013-10-08 2016-08-02 The Trustees Of The University Of Pennsylvania Fully automatic image segmentation of heart valves using multi-atlas label fusion and deformable medial modeling
US9668699B2 (en) 2013-10-17 2017-06-06 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks
US9700219B2 (en) 2013-10-17 2017-07-11 Siemens Healthcare Gmbh Method and system for machine learning based assessment of fractional flow reserve
US20150139517A1 (en) 2013-11-15 2015-05-21 University Of Iowa Research Foundation Methods And Systems For Calibration
EP3082575A2 (en) * 2013-12-19 2016-10-26 Cardiac Pacemakers, Inc. System and method for locating neural tissue
JP6301133B2 (en) 2014-01-14 2018-03-28 キヤノンメディカルシステムズ株式会社 Magnetic resonance imaging system
US10117597B2 (en) 2014-01-17 2018-11-06 Arterys Inc. Apparatus, methods and articles for four dimensional (4D) flow magnetic resonance imaging using coherency identification for magnetic resonance imaging flow data
US9430829B2 (en) 2014-01-30 2016-08-30 Case Western Reserve University Automatic detection of mitosis using handcrafted and convolutional neural network features
KR20150098119A (en) 2014-02-19 2015-08-27 삼성전자주식회사 System and method for removing false positive lesion candidate in medical image
US20150324690A1 (en) 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
EP3143599A1 (en) 2014-05-15 2017-03-22 Brainlab AG Indication-dependent display of a medical image
KR20160010157A (en) 2014-07-18 2016-01-27 삼성전자주식회사 Apparatus and Method for 3D computer aided diagnosis based on dimension reduction
US9707400B2 (en) 2014-08-15 2017-07-18 Medtronic, Inc. Systems, methods, and interfaces for configuring cardiac therapy
EP3043318B1 (en) 2015-01-08 2019-03-13 Imbio Analysis of medical images and creation of a report
WO2016141214A1 (en) 2015-03-03 2016-09-09 Nantomics, Llc Ensemble-based research recommendation systems and methods
US10115194B2 (en) 2015-04-06 2018-10-30 IDx, LLC Systems and methods for feature detection in retinal images
US9633306B2 (en) * 2015-05-07 2017-04-25 Siemens Healthcare Gmbh Method and system for approximating deep neural networks for anatomical object detection
US10176408B2 (en) 2015-08-14 2019-01-08 Elucid Bioimaging Inc. Systems and methods for analyzing pathologies utilizing quantitative imaging
CA2994713C (en) 2015-08-15 2019-02-12 Salesforce.Com, Inc. Three-dimensional (3d) convolution with 3d batch normalization
US9569736B1 (en) 2015-09-16 2017-02-14 Siemens Healthcare Gmbh Intelligent medical image landmark detection
US9792531B2 (en) 2015-09-16 2017-10-17 Siemens Healthcare Gmbh Intelligent multi-scale medical image landmark detection
US10192129B2 (en) 2015-11-18 2019-01-29 Adobe Systems Incorporated Utilizing interactive deep learning to select objects in digital visual media
CN108603922A (en) 2015-11-29 2018-09-28 阿特瑞斯公司 Automated cardiac volume segmentation
EP3391284B1 (en) 2015-12-18 2024-04-17 The Regents of The University of California Interpretation and quantification of emergency features on head computed tomography
US10163028B2 (en) 2016-01-25 2018-12-25 Koninklijke Philips N.V. Image data pre-processing
DE102016204225B3 (en) 2016-03-15 2017-07-20 Friedrich-Alexander-Universität Erlangen-Nürnberg Method for automatic recognition of anatomical landmarks and device
CN105825509A (en) 2016-03-17 2016-08-03 电子科技大学 Cerebral vessel segmentation method based on 3D convolutional neural network
US9886758B2 (en) 2016-03-31 2018-02-06 International Business Machines Corporation Annotation of skin image using learned feature representation
CN205665697U (en) 2016-04-05 2016-10-26 陈进民 Medical image recognition and diagnosis system based on cellular neural network or convolutional neural network
CN106127725B (en) 2016-05-16 2019-01-22 北京工业大学 Millimeter-wave radar cloud image segmentation method based on multi-resolution CNN
CN106096632A (en) 2016-06-02 2016-11-09 哈尔滨工业大学 Ventricular function index prediction method based on deep learning and MRI images
CN106096616A (en) 2016-06-08 2016-11-09 四川大学华西医院 Magnetic resonance image feature extraction and classification method based on deep learning
US9589374B1 (en) 2016-08-01 2017-03-07 12 Sigma Technologies Computer-aided diagnosis system for medical images using deep convolutional neural networks
US10582907B2 (en) 2016-10-31 2020-03-10 Siemens Healthcare Gmbh Deep learning based bone removal in computed tomography angiography
US10902598B2 (en) 2017-01-27 2021-01-26 Arterys Inc. Automated segmentation utilizing fully convolutional networks
US10373313B2 (en) 2017-03-02 2019-08-06 Siemens Healthcare Gmbh Spatially consistent multi-scale anatomical landmark detection in incomplete 3D-CT data
EP3616120A1 (en) 2017-04-27 2020-03-04 Retinascan Limited System and method for automated funduscopic image analysis
US20200085382A1 (en) 2017-05-30 2020-03-19 Arterys Inc. Automated lesion detection, segmentation, and longitudinal identification
CN107341265B (en) 2017-07-20 2020-08-14 东北大学 Mammary gland image retrieval system and method fusing depth features
EP3714467A4 (en) 2017-11-22 2021-09-15 Arterys Inc. Content based image retrieval for lesion analysis
US10902591B2 (en) 2018-02-09 2021-01-26 Case Western Reserve University Predicting pathological complete response to neoadjuvant chemotherapy from baseline breast dynamic contrast enhanced magnetic resonance imaging (DCE-MRI)
KR101952887B1 (en) 2018-07-27 2019-06-11 김예현 Method for predicting anatomical landmarks and device for predicting anatomical landmarks using the same
KR102575569B1 (en) 2018-08-13 2023-09-07 쑤저우 레킨 세미컨덕터 컴퍼니 리미티드 Semiconductor device
JP7125312B2 (en) 2018-09-07 2022-08-24 富士フイルムヘルスケア株式会社 MAGNETIC RESONANCE IMAGING APPARATUS, IMAGE PROCESSING APPARATUS, AND IMAGE PROCESSING METHOD
US10646156B1 (en) 2019-06-14 2020-05-12 Cycle Clarity, LLC Adaptive image processing in assisted reproductive imaging modalities

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10871536B2 (en) 2015-11-29 2020-12-22 Arterys Inc. Automated cardiac volume segmentation
US11894141B2 (en) 2016-10-27 2024-02-06 Progenics Pharmaceuticals, Inc. Network for medical image analysis, decision support system, and related graphical user interface (GUI) applications
US11424035B2 (en) 2016-10-27 2022-08-23 Progenics Pharmaceuticals, Inc. Network for medical image analysis, decision support system, and related graphical user interface (GUI) applications
US10902598B2 (en) 2017-01-27 2021-01-26 Arterys Inc. Automated segmentation utilizing fully convolutional networks
US11361430B2 (en) * 2017-04-18 2022-06-14 King's College London System and method for medical imaging
US11551353B2 (en) 2017-11-22 2023-01-10 Arterys Inc. Content based image retrieval for lesion analysis
US10973486B2 (en) 2018-01-08 2021-04-13 Progenics Pharmaceuticals, Inc. Systems and methods for rapid neural network-based image segmentation and radiopharmaceutical uptake determination
US11537428B2 (en) 2018-05-17 2022-12-27 Spotify Ab Asynchronous execution of creative generator and trafficking workflows and components therefor
US11403663B2 (en) * 2018-05-17 2022-08-02 Spotify Ab Ad preference embedding model and lookalike generation engine
US11051779B2 (en) * 2018-09-13 2021-07-06 Siemens Healthcare Gmbh Processing image frames of a sequence of cardiac images
US11657508B2 (en) 2019-01-07 2023-05-23 Exini Diagnostics Ab Systems and methods for platform agnostic whole body image segmentation
US11941817B2 (en) 2019-01-07 2024-03-26 Exini Diagnostics Ab Systems and methods for platform agnostic whole body image segmentation
US11948283B2 (en) 2019-04-24 2024-04-02 Progenics Pharmaceuticals, Inc. Systems and methods for interactive adjustment of intensity windowing in nuclear medicine images
US11937962B2 (en) 2019-04-24 2024-03-26 Progenics Pharmaceuticals, Inc. Systems and methods for automated and interactive analysis of bone scan images for detection of metastases
US11534125B2 (en) 2019-04-24 2022-12-27 Progenics Pharmaceuticals, Inc. Systems and methods for automated and interactive analysis of bone scan images for detection of metastases
US11544407B1 (en) 2019-09-27 2023-01-03 Progenics Pharmaceuticals, Inc. Systems and methods for secure cloud-based medical image upload and processing
US11564621B2 (en) 2019-09-27 2023-01-31 Progenics Pharmaceuticals, Inc. Systems and methods for artificial intelligence-based image analysis for cancer assessment
US11900597B2 (en) 2019-09-27 2024-02-13 Progenics Pharmaceuticals, Inc. Systems and methods for artificial intelligence-based image analysis for cancer assessment
US20210287375A1 (en) * 2020-03-11 2021-09-16 Purdue Research Foundation System architecture and method of processing images
US11763456B2 (en) 2020-03-11 2023-09-19 Purdue Research Foundation Systems and methods for processing echocardiogram images
US11810303B2 (en) * 2020-03-11 2023-11-07 Purdue Research Foundation System architecture and method of processing images
US11386988B2 (en) 2020-04-23 2022-07-12 Exini Diagnostics Ab Systems and methods for deep-learning-based segmentation of composite images
US11321844B2 (en) 2020-04-23 2022-05-03 Exini Diagnostics Ab Systems and methods for deep-learning-based segmentation of composite images
US11721428B2 (en) 2020-07-06 2023-08-08 Exini Diagnostics Ab Systems and methods for artificial intelligence-based image analysis for detection and characterization of lesions
US11688517B2 (en) 2020-10-30 2023-06-27 Guerbet Multiple operating point false positive removal for lesion identification
US11688063B2 (en) 2020-10-30 2023-06-27 Guerbet Ensemble machine learning model architecture for lesion detection
US11694329B2 (en) 2020-10-30 2023-07-04 International Business Machines Corporation Logistic model to determine 3D z-wise lesion connectivity
US11749401B2 (en) 2020-10-30 2023-09-05 Guerbet Seed relabeling for seed-based segmentation of a medical image
US11688065B2 (en) 2020-10-30 2023-06-27 Guerbet Lesion detection artificial intelligence pipeline computing system
US11587236B2 (en) 2020-10-30 2023-02-21 International Business Machines Corporation Refining lesion contours with combined active contour and inpainting
US11436724B2 (en) 2020-10-30 2022-09-06 International Business Machines Corporation Lesion detection artificial intelligence pipeline computing system
CN112767413A (en) * 2021-01-06 2021-05-07 武汉大学 Remote sensing image depth semantic segmentation method integrating region communication and symbiotic knowledge constraints
WO2023075480A1 (en) * 2021-10-28 2023-05-04 주식회사 온택트헬스 Method and apparatus for providing clinical parameter for predicted target region in medical image, and method and apparatus for screening medical image for labeling
WO2023096985A1 (en) * 2021-11-24 2023-06-01 Riverain Technologies Llc Method for the automatic detection of aortic disease and automatic generation of a reformatted aortic volume

Also Published As

Publication number Publication date
EP3573520A4 (en) 2020-11-04
WO2018140596A3 (en) 2018-09-07
US10902598B2 (en) 2021-01-26
CN110475505A (en) 2019-11-19
JP2020510463A (en) 2020-04-09
US10600184B2 (en) 2020-03-24
WO2018140596A2 (en) 2018-08-02
CN110475505B (en) 2022-04-05
US20180218497A1 (en) 2018-08-02
EP3573520A2 (en) 2019-12-04
US20180218502A1 (en) 2018-08-02

Similar Documents

Publication Publication Date Title
US20200193603A1 (en) Automated segmentation utilizing fully convolutional networks
US10871536B2 (en) Automated cardiac volume segmentation
US9968257B1 (en) Volumetric quantification of cardiovascular structures from medical imaging
US20230106440A1 (en) Content based image retrieval for lesion analysis
US20200085382A1 (en) Automated lesion detection, segmentation, and longitudinal identification
US11010630B2 (en) Systems and methods for detecting landmark pairs in images
Zheng et al. Automatic aorta segmentation and valve landmark detection in C-arm CT: application to aortic valve implantation
EP3788633A1 (en) Modality-agnostic method for medical image representation
US11464491B2 (en) Shape-based generative adversarial network for segmentation in medical imaging
Xia et al. Super-resolution of cardiac MR cine imaging using conditional GANs and unsupervised transfer learning
Santiago et al. Fast segmentation of the left ventricle in cardiac MRI using dynamic programming
Abbasi et al. Medical image registration using unsupervised deep neural network: A scoping literature review
Kim et al. Automatic segmentation of the left ventricle in echocardiographic images using convolutional neural networks
Laumer et al. Weakly supervised inference of personalized heart meshes based on echocardiography videos
Singh et al. Attention-guided residual W-Net for supervised cardiac magnetic resonance imaging segmentation
Su et al. Res-DUnet: A small-region attentioned model for cardiac MRI-based right ventricular segmentation
Pereira A comparison of deep learning algorithms for medical image classification and image enhancement
Yang Deformable Models and Machine Learning for Large-Scale Cardiac MRI Image Analytics
Jafari Towards a more robust machine learning framework for computer-assisted echocardiography
Lima et al. Full motion focus: convolutional module for improved left ventricle segmentation over 4D MRI
Abhijit Yadav et al. Improving Right Ventricle Contouring in Cardiac MR Images Using Integrated Approach for Small Datasets
Ahmad et al. Fully automated cardiac MRI segmentation using dilated residual network
Pérez-Pelegrí et al. Convolutional Neural Networks for Segmentation in Short-Axis Cine Cardiac Magnetic Resonance Imaging: Review and Considerations
Metaxas et al. Segmentation and blood flow simulations of patient-specific heart data
Lima et al. for Improved Left Ventricle Segmentation

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: ARES CAPITAL CORPORATION, AS COLLATERAL AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:ARTERYS INC.;REEL/FRAME:061857/0870

Effective date: 20221122