WO2021030629A1 - Three dimensional object segmentation of medical images localized with object detection - Google Patents

Three dimensional object segmentation of medical images localized with object detection Download PDF

Info

Publication number
WO2021030629A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
interest
contrast
segmentation
medical imaging
Prior art date
Application number
PCT/US2020/046239
Other languages
English (en)
French (fr)
Inventor
Omid BAZGIR
Kai Henrik BARCK
Luke Xie
Original Assignee
Genentech, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genentech, Inc. filed Critical Genentech, Inc.
Priority to JP2022508485A priority Critical patent/JP2022544229A/ja
Priority to EP20761979.2A priority patent/EP4014201A1/en
Priority to CN202080057028.2A priority patent/CN114503159A/zh
Publication of WO2021030629A1 publication Critical patent/WO2021030629A1/en
Priority to US17/665,932 priority patent/US11967072B2/en

Classifications

    • G06T7/0012 Biomedical image inspection
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G06T7/174 Segmentation; Edge detection involving the use of two or more images
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06V10/758 Image or video pattern matching involving statistics of pixels or of feature values, e.g. histogram matching
    • G06V10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V20/64 Three-dimensional objects
    • G06T2200/04 Indexing scheme involving 3D image data
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G06T2207/10101 Optical tomography; Optical coherence tomography [OCT]
    • G06T2207/10104 Positron emission tomography [PET]
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30004 Biomedical image processing
    • G06V2201/031 Recognition of patterns in medical or anatomical images of internal organs

Definitions

  • the present disclosure relates to automated object segmentation of medical images, and in particular to techniques for segmenting objects within medical images using a deep learning network that is localized with object detection based on a derived contrast mechanism.
  • Computer vision involves working with digital images and videos to deduce some understanding of contents within these images and videos.
  • Object recognition is associated with computer vision and refers to a collection of related computer vision tasks that involve identifying objects present in an image frame.
  • the tasks include image classification, object localization, object detection, and object segmentation.
  • Image classification involves predicting the class of one or more objects in an image frame.
  • Object localization refers to identifying the location of one or more objects in an image frame and drawing a bounding box around their extent.
  • Object detection combines these two tasks and localizes and classifies one or more objects in an image frame.
  • Object segmentation involves highlighting the specific pixels (generating a mask) of the localized or detected objects instead of a coarse bounding box.
  • Techniques for object recognition generally fall into either machine learning- based approaches or deep learning-based approaches.
  • For machine learning-based approaches, features within images are initially defined using a feature descriptor such as Haar-like features, a scale-invariant feature transform, or a histogram of oriented gradients (HOG), and objects of interest are then detected using a technique such as a support vector machine (SVM) operating on the feature descriptor.
  • In contrast, deep learning techniques are able to perform end-to-end object detection and segmentation without specifically defining features, and are typically based on convolutional neural networks (CNNs) such as region-based networks (R-CNN, Fast R-CNN, Faster R-CNN, and Cascade R-CNN).
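  • The following is a minimal sketch (in Python, assuming scikit-image and scikit-learn, with randomly generated stand-in patches and labels) of the HOG-plus-SVM style of machine learning-based detection described above; it is illustrative only and is not the approach claimed in this disclosure.

      # Minimal HOG + SVM classification sketch (illustrative; hypothetical data).
      import numpy as np
      from skimage.feature import hog
      from sklearn.svm import SVC

      def hog_features(patches):
          # patches: (N, H, W) grayscale image patches
          return np.array([hog(p, pixels_per_cell=(8, 8), cells_per_block=(2, 2)) for p in patches])

      # Hypothetical training patches and binary labels (1 = object of interest, 0 = background).
      train_patches = np.random.rand(100, 64, 64)
      train_labels = np.random.randint(0, 2, size=100)

      clf = SVC(kernel="linear")
      clf.fit(hog_features(train_patches), train_labels)

      # Classify a new, unseen patch.
      prediction = clf.predict(hog_features(np.random.rand(1, 64, 64)))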
  • a computer-implemented method for segmenting objects within medical images includes: obtaining medical images of a subject, the medical images including a first image having a first characteristic and a second image having a second characteristic, where the medical images are generated using one or more medical imaging modalities; locating and classifying, using a localization model, objects within the first image into a plurality of object classes, where the classifying assigns sets of pixels or voxels of the first image into one or more of the plurality of object classes; determining, using the localization model, a bounding box or segmentation mask for an object of interest within the first image based on sets of pixels or voxels assigned with an object class of the plurality of object classes; transferring the bounding box or the segmentation mask onto the second image to define a portion of the second image comprising the object of interest; inputting the portion of the second image into a three-dimensional neural network model constructed for volumetric segmentation using a weighted loss function; and generating, using the three-dimensional neural network model, an estimated segmentation boundary around the object of interest within the portion of the second image.
  • the one or more medical imaging modalities comprise a first medical imaging modality and a second medical imaging modality that is different from the first medical imaging modality, and where the first image is generated from the first medical imaging modality and the second image is generated from the second medical imaging modality.
  • the one or more medical imaging modalities comprise a first medical imaging modality and a second medical imaging modality that is the same as the first medical imaging modality, and where the first image is generated from the first medical imaging modality and the second image is generated from the second medical imaging modality.
  • the first image is a first type of image and the second image is a second type of image, and where the first type of image is different from the second type of image.
  • the first image is a first type of image and the second image is a second type of image, and where the first type of image is the same as the second type of image.
  • the first characteristic is different from the second characteristic.
  • the first characteristic is the same as the second characteristic.
  • the first medical imaging modality is magnetic resonance imaging, diffusion tensor imaging, computerized tomography, positron emission tomography, photoacoustic tomography, X-ray, sonography, or a combination thereof
  • the second medical imaging modality is magnetic resonance imaging, diffusion tensor imaging, computerized tomography, positron emission tomography, photoacoustic tomography, X-ray, sonography, or a combination thereof.
  • the first type of image is a magnetic resonance image, a diffusion tensor image or map, a computerized tomography image, a positron emission tomography image, photoacoustic tomography image, an X-ray image, a sonography image, or a combination thereof
  • the second type of image is a magnetic resonance image, a diffusion tensor image or map, a computerized tomography image, a positron emission tomography image, photoacoustic tomography image, an X-ray image, a sonography image, or a combination thereof.
  • the first characteristic is fractional anisotropy contrast, mean diffusivity contrast, axial diffusivity contrast, radial diffusivity contrast, proton density contrast, T1 relaxation time contrast, T2 relaxation time contrast, diffusion coefficient contrast, low resolution, high resolution, agent contrast, radiotracer contrast, optical absorption contrast, echo distance contrast, or a combination thereof
  • the second characteristic is fractional anisotropy contrast, mean diffusivity contrast, axial diffusivity contrast, radial diffusivity contrast, proton density contrast, T1 relaxation time contrast, T2 relaxation time contrast, diffusion coefficient contrast, low resolution, high resolution, agent contrast, radiotracer contrast, optical absorption contrast, echo distance contrast, or a combination thereof.
  • the one or more medical imaging modalities is diffusion tensor imaging, the first image is a fractional anisotropy (FA) map, the second image is a mean diffusivity (MD) map, the first characteristic is fractional anisotropy contrast, the second characteristic is mean diffusivity contrast, and the object of interest is a kidney of the subject.
  • the locating and classifying the objects within the first image comprises applying one or more clustering algorithms to a plurality of pixels or voxels of the first image.
  • the one or more clustering algorithms include a k-means algorithm that assigns observations to clusters associated with the plurality of object classes.
  • the one or more clustering algorithms further include an expectation maximization algorithm that computes probabilities of cluster memberships based on one or more probability distributions, and where the k-means algorithm initializes the expectation maximization algorithm by estimating initial parameters for each object class of the plurality of object classes.
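  • As a hedged illustration of k-means-initialized expectation maximization clustering over voxel intensities (a sketch using scikit-learn; the flattened intensities are hypothetical, and the 12-class setting simply echoes the EM segmentation of FIG. 5B), one possible implementation is:

      # k-means estimates the initial class means; EM then fits a Gaussian mixture model.
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.mixture import GaussianMixture

      n_classes = 12                             # e.g., the 12-class EM segmentation of FIG. 5B
      intensities = np.random.rand(50000, 1)     # hypothetical flattened voxel intensities

      kmeans = KMeans(n_clusters=n_classes, n_init=10).fit(intensities)
      gmm = GaussianMixture(n_components=n_classes, means_init=kmeans.cluster_centers_)
      gmm.fit(intensities)

      class_probabilities = gmm.predict_proba(intensities)   # cluster-membership probabilities
      class_labels = gmm.predict(intensities)                 # hard assignment per voxel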
  • the segmentation mask is determined and the determining the segmentation mask comprises: identifying a seed location of the object of interest using the sets of pixels or voxels assigned with the object class; growing the seed location by projecting the seed location towards a z-axis representing depth of the segmentation mask; and determining the segmentation mask based on the projected seed location.
  • determining the segmentation mask further comprises performing morphological closing and filling on the segmentation mask.
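  • A minimal sketch of the seed-growing and morphological cleanup steps described above (using NumPy and SciPy; the seed mask, volume depth, and structuring element are hypothetical choices, not values from the disclosure):

      # Project a 2D seed mask along the z-axis, then apply morphological closing and filling.
      import numpy as np
      from scipy import ndimage

      seed_2d = np.zeros((128, 128), dtype=bool)     # hypothetical seed location of the object
      seed_2d[40:80, 50:90] = True

      depth = 32                                     # number of slices along the z-axis
      mask_3d = np.repeat(seed_2d[np.newaxis, :, :], depth, axis=0)

      mask_3d = ndimage.binary_closing(mask_3d, structure=np.ones((3, 3, 3)))
      mask_3d = ndimage.binary_fill_holes(mask_3d)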
  • the method further includes, prior to inputting the portion of the second image into the three-dimensional neural network model, cropping the second image based on the object mask plus a margin to generate the portion of the second image.
  • the method further includes, prior to inputting the portion of the second image into the three-dimensional neural network model, inputting the second image into a deep super resolution neural network to increase resolution of the portion of the second image.
  • the three-dimensional neural network model comprises a plurality of model parameters identified using a set of training data comprising: a plurality of medical images with annotations associated with segmentation boundaries around objects of interest; and a plurality of additional medical images with annotations associated with segmentation boundaries around objects of interest, where the plurality of additional medical images are artificially generated by matching image histograms from the plurality of medical images to image histograms from a plurality of reference maps; and where the plurality of model parameters are identified using the set of training data based on minimizing the weighted loss function.
  • the weighted loss function is a weighted Dice loss function.
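  • A generic soft, class-weighted Dice loss can be sketched as follows (NumPy; the exact weighting used in this disclosure is not reproduced here, and the array shapes are assumptions); minimizing it maximizes the weighted Dice overlap, which down-weights the dominant background relative to the small foreground object:

      import numpy as np

      def weighted_dice_loss(probs, targets, class_weights, eps=1e-6):
          # probs, targets: arrays of shape (n_classes, D, H, W); class_weights: (n_classes,)
          axes = tuple(range(1, probs.ndim))
          intersection = np.sum(probs * targets, axis=axes)
          denominator = np.sum(probs, axis=axes) + np.sum(targets, axis=axes)
          dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
          return 1.0 - np.sum(class_weights * dice_per_class) / np.sum(class_weights)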
  • the three-dimensional neural network model is a modified 3D U-Net model.
  • the modified 3D U-Net model comprises a total number of between 5,000,000 and 12,000,000 learnable parameters.
  • the modified 3D U-Net model comprises a total number of between 800 and 1,700 kernels.
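  • Parameter and kernel totals such as those above are typically measured directly from the network definition; a minimal sketch assuming a PyTorch-style model object (hypothetical helper names; not code from the disclosure):

      import torch.nn as nn

      def count_learnable_parameters(model: nn.Module) -> int:
          return sum(p.numel() for p in model.parameters() if p.requires_grad)

      def count_conv_kernels(model: nn.Module) -> int:
          # Counts output kernels (channels) across all 3D convolutional layers.
          return sum(m.out_channels for m in model.modules() if isinstance(m, nn.Conv3d))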
  • the method further includes: determining a size, surface area, and/or volume of the object of interest based on the estimated boundary around the object of interest; and providing: (i) the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) a size, surface area, and/or volume of the object of interest.
  • the method further includes: determining, by a user, a diagnosis of the subject based on (i) the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) a size, surface area, and/or volume of the object of interest.
  • the method further includes: acquiring, by a user using an imaging system, the medical images of the subject, where the imaging system uses the one or more medical imaging modalities to generate the medical images; determining a size, surface area, and/or volume of the object of interest based on the estimated segmentation boundary around the object of interest; providing: (i) the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) the size, surface area, and/or volume of the object of interest; receiving, by the user, (i) the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) the size, surface area, and/or volume of the object of interest; and determining, by the user, a diagnosis of the subject based on (i) the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) a size, surface area, and/or volume of the object of interest.
  • the method further includes administering, by the user, a treatment with a compound based on (i) the portion of the second image with the estimated segmentation boundary around the object of interest, (ii) a size, surface area, and/or volume of the object of interest, and/or (iii) the diagnosis of the subject.
  • a system includes one or more data processors and a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods disclosed herein.
  • a computer-program product is provided that is tangibly embodied in a non-transitory machine-readable storage medium and that includes instructions configured to cause one or more data processors to perform part or all of one or more methods disclosed herein.
  • Some embodiments of the present disclosure include a system including one or more data processors.
  • the system includes a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.
  • Some embodiments of the present disclosure include a computer-program product tangibly embodied in a non- transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.
  • FIG. 1 shows an example computing environment for segmenting instances of an object of interest according to various embodiments
  • FIG. 2 shows histogram matching used to simulate other contrasts and increase variance of the training dataset according to various embodiments
  • FIG. 3 shows an exemplary U-Net according to various embodiments
  • FIG. 4 shows a process for segmenting instances of an object of interest according to various embodiments
  • FIG. 5A shows diffusion tensor elements according to various embodiments;
  • FIG. 5B shows a fractional anisotropy image used for expectation maximization (EM) segmentation (12 classes) and object detection steps according to various embodiments;
  • FIG. 5C shows super resolved images in the slice direction according to various embodiments;
  • FIGS. 6A-6E show segmentation results using various strategies: FIG. 6A, 3D U-Net; FIG. 6B, detecting the foreground with connected component preprocessing; FIG. 6C, EM segmentation; FIG. 6D, kidney detection via EM segmentation; FIG. 6E, kidney detection via EM segmentation on super-resolved images.
  • the first row shows ground truth manual labels overlaid on magnetic resonance imaging (MRI); the second row shows transparent surface renderings of the ground truth and segmentation masks, with coronal and axial views shown in pairs; the third row shows Dice similarity coefficients (DSCs) as violin plots.
  • Example datasets were selected based on the mean DSCs for each segmentation strategy. All segmentation results are 3D U-Net based except for FIG. 6C, which is EM segmentation only.
  • the present disclosure describes techniques for automated object segmentation of medical images. More specifically, embodiments of the present disclosure provide techniques for segmenting objects within medical images using a deep learning network that is localized with object detection based on a derived contrast mechanism.
  • Medical image segmentation, identifying the pixels of objects (e.g., organs, lesions, or tumors) from background in medical images such as computerized tomography (CT) or MRI images, is a fundamental task in medical image analysis for providing information about the shapes, sizes, and volumes of the objects.
  • a change in organ size or volume can be a predominant feature of a disease process or a manifestation of pathology elsewhere in a subject.
  • tumor or lesion size or volume can be an important independent indicator in subjects with carcinoma (e.g., repeated size measurements during primary systemic therapy produce detailed information about response that could be used to select the most effective treatment regimen and to estimate the subject’s prognosis).
  • the deep learning may not be optimally trained to segment the foreground object of interest (the ‘background effect’); and (v) the background can contain similar-appearing objects (e.g., a liver, heart, or kidney can sometimes look like a tumor), and the deep learning and simpler machine-learning algorithms may not be optimally trained to differentiate between these structures.
  • the techniques for automated object segmentation of the present embodiments use various imaging modalities and/or types of medical images with different characteristics (characteristics that make an object (or its representation in an image or display) distinguishable such as contrast or resolution) as a derived contrast mechanism to locate an object of interest, isolate the object of interest, and subsequently segment the object of interest using a deep learning model.
  • a first image of an object obtained by a first imaging modality may have a first characteristic (e.g., good contrast) that works well to provide a general outline of the object so that the first image may be used for object detection (providing a coarse grain boundary around the object and classification).
  • this first image may be blurry or fuzzy enough that a deep learning network cannot determine for sure exactly where the edges of the object are for accurate object segmentation.
  • a second image of the object obtained using a second imaging modality, or a second image from a different image feature/contrast mechanism of the same modality, may have a second characteristic (e.g., high resolution) that works well to provide a well-defined boundary of the object so that the second image could be used for edge detection and fine-grained object segmentation.
  • the coarse grain boundary of the object is projected on the second image to localize the object within the second image.
  • the coarse grain boundary of the object on the second image is then used to crop the second image prior to object segmentation.
  • the localization and cropping of the second image alleviates the background effect and focuses the deep learning model on the edges of the object to learn the boundary knowledge for fine-grained object segmentation.
  • One illustrative embodiment of the present disclosure is directed to a method that includes initially localizing (e.g., using an algorithm such as Expectation Maximization) an object of interest such as an organ, tumor, or lesion within a first medical image having a first characteristic, projecting a bounding box or segmentation mask of the object of interest onto a second medical image having a second characteristic to define a portion of the second medical image comprising the object of interest, and subsequently inputting the portion of the second medical image into a deep learning model, such as a convolutional neural network model, that is constructed as a detector using a weighted loss function and is capable of segmenting the portion of the second medical image and generating a segmentation boundary around the object of interest.
  • the segmentation boundary may be used to calculate a volume of the object of interest for determining a diagnosis and/or a prognosis.
  • the calculated volume may further be associated with a time point.
  • a volume of the object from the time point may be compared to a volume of the object from a previous time point in order to determine an efficacy of a treatment.
  • the time point analysis provides context of organ or tumor change over time.
  • the specific contents within the object of interest defined by the segmentation boundary can have changes, e.g., more necrotic content or aggressive tumor type.
  • the segmentation boundary and a corresponding segmented area or volume may be used for quantifying an image metric such as image intensity.
  • these approaches utilize multiple medical images with varying characteristics and object detection techniques to detect the general area of an object of interest prior to attempting to segment the object of interest using a deep learning model.
  • This reduces the background effect, decreases the complexity of the input data, and focuses the deep learning model on the edges of the object to learn the boundary knowledge for object segmentation.
  • the complexity of the deep learning model could be decreased (e.g., by reducing the number of kernels per convolutional layer).
  • the deep learning model is constructed with a weighted loss function, which minimizes segmentation error, improves training performance optimization, and further reduces the background effect that may still be apparent in some general areas determined by the object detection techniques.
  • the terms “substantially,” “approximately” and “about” are defined as being largely but not necessarily wholly what is specified (and include wholly what is specified) as understood by one of ordinary skill in the art. In any disclosed embodiment, the term “substantially,” “approximately,” or “about” may be substituted with “within [a percentage] of” what is specified, where the percentage includes 0.1, 1, 5, and 10 percent.
  • a "mask” refers to an image that represents a surface area of a detected object.
  • a mask may include pixels of nonzero intensity to indicate one or more regions of interest (e.g., one or more detected objects) and pixels of zero intensity to indicate background.
  • a "binary mask” refers to a mask in which each pixel value is set to one of two values (e.g., 0 or 1). Zero intensity values can indicate that corresponding pixels are part of a background, and non-zero intensity values (e.g., values of 1) can indicate that corresponding pixels are part of a region of interest.
  • “classification” refers to the process of taking an input (e.g., an image or a portion of an image) and outputting a class (e.g., “organ” or “tumor”) or a probability that the input is a particular class.
  • This may include binary classification (is a member of a class or not), multiclass classification (assigning to one or many classes), providing probabilities of membership in each class (e.g., there is a 90% probability that this input is an organ), and similar classification schemas.
  • “object localization” or “object detection” refers to the process of detecting instances of objects of a particular class in an image.
  • a “bounding box” refers to a rectangular box that represents the general location of an object of a particular class in an image.
  • the bounding box may be defined by the x- and y-axis coordinates of the upper-left and/or upper-right corner and the x- and y-axis coordinates of the lower-right and/or lower-left corner of the rectangle.
  • a “segmentation boundary” refers to an estimated perimeter of an object within an image.
  • a segmentation boundary may be generated during a segmentation process where features of the image are analyzed to determine locations of the edges of the object.
  • the segmentation boundary may further be represented by a mask such as a binary mask.
  • Segmentation refers to determining a location and shape of an object within an image. Segmentation may involve determining a set of pixels that depict an area or perimeter of the object within the image. Segmentation may involve generating a mask such as a binary mask for an object. Segmentation may further involve processing multiple masks corresponding to the object in order to generate a 3D mask of the object.
  • the goal of imaging procedures of the anatomy (e.g., organs or other human or mammalian tissue) and the physiological processes of a subject is generation of image contrast with good spatial resolution.
  • Initial evolution of medical imaging focused on tissue (proton) density function and tissue relaxation properties for signal contrast generation, which are the main principles behind conventional MRI.
  • MRI detects signals from protons of water molecules; however, it can only provide grayscale images in which each pixel contains one integer value. Unless two anatomical regions A and B contain water molecules with different physical or chemical properties, these two regions cannot be distinguished from each other with MRI; no matter how high the image resolution is, region A is indistinguishable from region B.
  • for signal contrast generation, MR parameters such as proton density (PD), T1 and T2 relaxation times, and the diffusion coefficient (D) are widely used.
  • the PD represents water concentration.
  • T1 and T2 are signal relaxation (decay) times after excitation, which are related to environmental factors, such as viscosity and the existence of nearby macromolecules.
  • the diffusion term, D, represents the thermal (or Brownian) motion of water molecules.
  • diffusion imaging (DI) encompasses techniques such as diffusion-spectrum imaging (DSI) and diffusion-weighted imaging (DWI).
  • in diffusion imaging, supplemental MR gradients are employed to create an image that is sensitized to diffusion in a particular direction.
  • the intensity of each image element reflects the best estimate of the rate of water diffusion in the particular direction.
  • biological tissues are highly anisotropic, meaning that their diffusion rates are not the same in every direction.
  • one method is to characterize diffusion with a single scalar, the apparent diffusion coefficient (ADC).
  • An alternative method is to model diffusion in complex materials using a diffusion tensor, a [3 x 3] array of numbers corresponding to diffusion rates in each combination of directions.
  • the three diagonal elements represent diffusion coefficients measured along each of the principal (x-, y- and z-) laboratory axes.
  • the six off-diagonal terms reflect the correlation of random motions between each pair of principal directions.
  • diffusion tensor imaging (DTI) is usually displayed by either condensing the information contained in the tensor into one number (a scalar), or into 4 numbers (to give an R,G,B color and a brightness value, which is known as color fractional anisotropy).
  • the diffusion tensor can also be viewed using glyphs, which are small three dimensional (3D) representations of the major eigenvector or whole tensor.
  • other medical imaging modalities include computed tomography (CT), positron emission tomography (PET), photoacoustic tomography (PAT), sonography, and combinations thereof (e.g., PET-CT and PET-MR).
  • CT and X-ray use x-ray absorption to differentiate between air, soft tissue, and dense structures such as bone.
  • One technique to increase image contrast in X-rays or CT scans is to utilize contrast agents that contain substances that are better at stopping x-rays, making them more visible on an X-ray or CT image, and thus can be used to better visualize soft tissues such as blood vessels.
  • PET uses small amounts of radioactive materials called radiotracers that can be detected and measured in a scan. The measurement differences between areas accumulating or labeled with the radiotracers versus non-accumulating or non-labeled areas are used to generate contrast to visualize structures and functions within the subject.
  • PAT is an imaging modality based on the photoacoustic (PA) effect.
  • a short-pulsed light source is typically used to irradiate the tissue, resulting in broadband PA waves.
  • an initial temperature rise induces a pressure rise, which propagates as a photoacoustic wave and is detected by an ultrasonic transducer to image optical absorption contrast.
  • Ultrasound is a non-invasive diagnostic technique used to image inside the body.
  • a transducer sends out a beam of sound waves into the body. The sound waves are reflected back to the transducer by boundaries between tissues in the path of the beam (e.g., the boundary between fluid and soft tissue or between tissue and bone).
  • When these echoes hit the transducer, they generate electrical signals that are sent to the ultrasound scanner. Using the speed of sound and the time of each echo’s return, the scanner calculates the distance from the transducer to the tissue boundary. These distances are then used to generate contrast to visualize tissues and organs.
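  • The echo-distance calculation described above reduces to distance = (speed of sound × echo return time) / 2; a small worked example, assuming the conventional soft-tissue sound speed of roughly 1540 m/s and a hypothetical echo time:

      speed_of_sound_m_per_s = 1540.0
      echo_return_time_s = 65e-6      # hypothetical round-trip time of 65 microseconds

      # The factor of 2 accounts for the round trip from the transducer to the boundary and back.
      depth_m = speed_of_sound_m_per_s * echo_return_time_s / 2.0
      print(f"Tissue boundary depth: {depth_m * 100:.1f} cm")   # ~5.0 cm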
  • Imaging modalities generate image contrast with spatial resolution adequate to visualize representations of the interior of a body for clinical analysis, medical intervention, and/or medical diagnosis, as well as visual representation of the function of some organs or tissues.
  • However, the image contrast and spatial resolution provided by each of these imaging modalities individually are not sufficient for accurate object segmentation, especially the object segmentation used for obtaining size and volumetric data that is performed by deep learning networks.
  • the techniques described herein use a combination of imaging modalities, types of images, and/or varying characteristics to locate an object of interest, isolate the object of interest, and subsequently segment the object of interest using a deep learning model.
  • some imaging modalities, types of images, and/or characteristics perform better with object detection as compared to object segmentation, whereas other imaging modalities, types of images, and/or characteristics perform better with object segmentation.
  • the characteristics of images that can be leveraged by the derived contrast mechanism include brightness, contrast, and spatial resolution.
  • Brightness or luminous brightness is a measure of relative intensity values across the pixel array after an image has been acquired with a digital camera or digitized by an analog-to-digital converter. The higher the relative intensity value the brighter the pixels and generally the whiter an image will appear; whereas the lower the relative intensity value the darker the pixels and generally the blacker an image will appear.
  • Contrast refers to differentiation that exists between various image features in both analog and digital images. The differentiation within the image can be in the form of different shades of gray, light intensities, or colors. Images having a higher contrast level generally display a greater degree of grayscale, color, or intensity variation than those of lower contrast.
  • Spatial resolution refers to the number of pixels utilized in construction of a digital image. Images having higher spatial resolution are composed with a greater number of pixels than those of lower spatial resolution.
  • the derived contrast mechanism comprises: (i) a first imaging modality capable of obtaining images with characteristics (e.g., DTI-FA) that are used for detecting an object of interest, and (ii) a second imaging modality capable of obtaining images with characteristics (e.g., DTI-MD) that are used for segmenting the object of interest.
  • Various imaging modalities, types of images, and/or characteristics may be combined to improve upon each computer vision task (e.g., object detection or object segmentation).
  • the imaging modalities of the derived contrast mechanism are the same such as MRI or DTI.
  • an imaging modality is used to obtain a first image having a first characteristic and a second image having a second characteristic, where the first image is different from the second image.
  • MRI may be used to obtain diffusion tensor parametric maps for a subject.
  • the diffusion tensor parametric maps may include a first measurement map such as FA map and a second measurement map such as a MD map.
  • an imaging modality is used to obtain a first image having a first characteristic and a second image having a second characteristic, where the first characteristic is different from the second characteristic.
  • CT may be used to obtain multiple CT scans for a subject.
  • the CT scans may include a first CT scan, such as a low resolution CT scan, and a second CT scan, such as a high resolution CT (HRCT) scan.
  • MRI may be used to obtain diffusion tensor parametric maps for a subject.
  • the diffusion tensor parametric maps may include a first MD map, such as a low resolution MD map, and a second MD map, such as a high resolution MD map.
  • the imaging modalities of the derived contrast mechanism are different such as PAT and ultrasound.
  • the PAT may be used to obtain a first type of image having a first characteristic and the ultrasound may be used to obtain a second type of image having a second characteristic, where the first type of image and the first characteristic are different from the second type of image and the second characteristic.
  • Kidney Segmentation (i) FA measurement map for object detection (contrast generated by fractional anisotropy); and (ii) MD measurement map (contrast generated by mean diffusivity) or T2-weighted anatomical image (contrast generated from signal relaxation times after excitation) for object segmentation.
  • Liver Segmentation (i) single echo T2 image for object detection (contrast generated from single-shot echo-planar imaging of signal relaxation times after excitation); and (ii) echo enhanced or T2-weighted anatomical image (contrast generated from a low flip angle, long echo time, and long repetition time used to accentuate the signal relaxation times after excitation) for object segmentation.
  • Liver Segmentation (i) MD measurement map (contrast generated by mean diffusivity) for object detection, and (ii) high resolution MD measurement map (high resolution and contrast generated by mean diffusivity), T2-weighted anatomical image (contrast generated from signal relaxation times after excitation) or PD (contrast generated from water concentration) for object segmentation.
  • Lung and Liver Tumor Segmentation (i) CT scan (low resolution) for object detection of the lung or liver, and (ii) CT scan (HRCT) for object segmentation of the tumor(s).
  • Trabecular Bone (i) CT scan (low resolution) for object detection of the trabecular space (non-cortical bone), and (ii) CT scan (HRCT) for object segmentation of the trabeculae.
  • Tumor or Organ Detection (i) PET high contrast/low resolution (contrast generated by radiotracer measurements) for object detection, and (ii) PET-CT or PET-MR high contrast/high resolution (contrast generated by radiotracer measurements) for object segmentation.
  • Tumor or Organ Detection (i) PAT (contrast generated by optical absorption) for object detection, and (ii) Ultrasound (contrast generated from echo return distance between the transducer and the tissue boundary) for object segmentation.
  • other imaging modalities (e.g., fluoroscopy, magnetic resonance angiography (MRA), and mammography) may also be used.
  • the parameters of any of these imaging modalities can be modified (e.g., different tracers, angle configurations, wave lengths, etc.) to capture different structures or regions of the body, and one or more of these types of modified imaging techniques may be combined with one or more other imaging techniques for implementing various derived contrast mechanisms in accordance with aspects of the present disclosure.
  • a first part of the segmenting pertains to a first vision model constructed to perform localization (object detection) of classes within a first image (e.g., a diffusion tensor parametric map or a CT image). These classes are “semantically interpretable” and correspond to real-world categories such as the liver, the kidney, the heart, and the like.
  • the localization is executed using expectation maximization (EM), You Only Look Once (YOLO, YOLOv2, or YOLOv3), or similar object detection algorithms, which are initialized heuristically with a standard clustering technique (e.g., the k-means clustering technique, Otsu’s method, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique, the mini-batch k-means technique, or the like).
  • the initialization is used to provide the initial estimate of the parameters of the likelihood model for each class.
  • Expectation maximization is an iterative process to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in one or more statistical models.
  • the EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log- likelihood found on the E step.
  • These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.
  • the result of the localization is boundary boxes or segmentation masks around each object with a probability for each class.
  • once the general location of an object of interest (e.g., the kidney) is determined, the boundary box or segmentation mask for the object of interest is projected in a slice direction (axial, coronal, and sagittal) onto a second image (e.g., a diffusion tensor parametric map or a CT image).
  • the boundaries of the projected segmentation mask are used to define a bounding box in the second image around the general location of the object of interest (e.g., the kidney) (a rectangular box drawn completely around a pixel wise mask associated with the object of interest).
  • the bounding box (determined via localization or defined based on boundaries of the projected segmentation mask) is enlarged by a predetermined number of pixels on all sides to ensure coverage of the object of interest.
  • the area within the bounding box is then cropped from the second image to obtain a portion of the second image having the object of interest, and the portion of the second image is used as input into a second vision model to segment the object of interest.
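  • One way to realize this projection-and-crop step is sketched below (hypothetical array shapes; the function name and the 5-voxel margin are illustrative, not values from the disclosure):

      # Derive a bounding box from the projected mask, enlarge it by a margin, and crop.
      import numpy as np

      def crop_around_mask(second_image, projected_mask, margin=5):
          coords = np.argwhere(projected_mask)                  # voxel indices inside the mask
          lower = np.maximum(coords.min(axis=0) - margin, 0)
          upper = np.minimum(coords.max(axis=0) + 1 + margin, projected_mask.shape)
          slices = tuple(slice(lo, hi) for lo, hi in zip(lower, upper))
          return second_image[slices], slices

      second_image = np.random.rand(32, 128, 128)                 # hypothetical second image
      projected_mask = np.zeros(second_image.shape, dtype=bool)   # mask transferred from the first image
      projected_mask[10:20, 40:80, 50:90] = True
      cropped, crop_slices = crop_around_mask(second_image, projected_mask)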
  • the second part of the segmenting pertains to a second vision model (a deep learning neural network) constructed with a weighted loss function (e.g., a Dice loss) to overcome the unbalanced nature between the object of interest and the background, and thus focus training on evaluating segmentation of the object of interest.
  • the second vision model may be trained using an augmented data set such that the deep learning neural network is capable of being trained on a limited set of medical images.
  • the trained second vision model takes as input the cropped portion of the second image and outputs the portion of the second image with an estimated segmentation boundary around the object of interest.
  • the estimated segmentation boundary may be used to calculate a volume, surface area, axial dimensions, largest axial dimension, or other size-related metrics of the object of interest. Any one or more of these metrics may, in turn, be used alone or in conjunction with other factors to determine a diagnosis and/or a prognosis of a subject.
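  • Size-related metrics such as volume and surface area can be derived from the 3D segmentation mask once the voxel spacing is known; a sketch assuming scikit-image and hypothetical spacing and mask values:

      import numpy as np
      from skimage import measure

      voxel_spacing_mm = (1.0, 0.5, 0.5)            # (z, y, x) spacing in mm, hypothetical
      mask = np.zeros((32, 64, 64), dtype=np.uint8)
      mask[8:24, 16:48, 16:48] = 1                  # hypothetical segmented object

      volume_mm3 = mask.sum() * np.prod(voxel_spacing_mm)

      # Surface area from a triangulated mesh of the mask boundary (marching cubes).
      verts, faces, _, _ = measure.marching_cubes(mask, level=0.5, spacing=voxel_spacing_mm)
      surface_area_mm2 = measure.mesh_surface_area(verts, faces)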
  • FIG. 1 illustrates an example computing environment 100 (i.e., a data processing system) for segmenting instances of an object of interest within images using a multi-stage segmentation network according to various embodiments.
  • the segmenting performed by the computing environment 100 in this example includes several stages: an image acquisition stage 105, a model training stage 110, an object detection stage 115, a segmentation stage 120, and an analysis stage 125.
  • the image acquisition stage 105 includes one or more imaging systems 130 (e.g., an MRI imaging system) for obtaining images 135 (e.g., MR images) of various parts of a subject.
  • the imaging systems 130 are configured to use one or more radiological imaging techniques such as x-ray radiography, fluoroscopy, MRI, ultrasound, nuclear medicine functional imaging (e.g., PET), thermography, CT, mammography and the like to obtain the images 135.
  • the imaging systems 130 are able to determine the difference between various structures and functions within the subject based on characteristics (e.g., brightness, contrast, and spatial resolution) associated with each of the imaging systems 130 and generate a series of two-dimensional images.
  • the two-dimensional images can be digitally “stacked” together by computer analysis to reconstruct a three-dimensional image of the subject or a portion of the subject.
  • the two-dimensional images and/or the reconstructed three-dimensional images 135 allow for easier identification and location of basic structures (e.g., organs) as well as possible tumors or abnormalities.
  • Each two-dimensional image and/or the reconstructed three-dimensional image 135 may correspond to a session time and a subject and depict an interior region of the subject.
  • Each two-dimensional image and/or the reconstructed three- dimensional image 135 may further be of a standardized size, resolution, and/or magnification.
  • the one or more imaging systems 130 include a DI system (e.g., an MRI system with special software) configured to apply supplemental MR gradients during the image acquisition.
  • the motion of protons during the application of these gradients affects the signal in the images, thereby, providing information on molecular diffusion.
  • a DTI matrix is obtained from a series of diffusion-weighted images in various gradient directions.
  • the three diffusivity parameters or eigenvalues (λ1, λ2, and λ3) are generated by matrix diagonalization.
  • the diffusivities are scalar indices describing water diffusion in specific voxels (the smallest volumetric elements in the image) associated with the geometry of tissue.
  • DTI properties or indices represented by these maps may include (but are not limited to) molecular diffusion rate (MD map or ADC map), the directional preference of diffusion (FA map), the AD map (diffusion rate along the main axis of diffusion), and RD map (rate of diffusion in the transverse direction).
  • the diffusivities (λ1, λ2, and λ3) obtained by DTI matrix diagonalization can be separated into parallel (λ1) and perpendicular (λ2 and λ3) components relative to the tissue.
  • Fractional anisotropy is an index for the amount of diffusion asymmetry within a voxel, defined in terms of its diffusivities (λ1, λ2, and λ3).
  • the value of FA varies between 0 and 1: when diffusion is isotropic, the diffusion ellipsoid is a sphere and FA = 0; with progressive diffusion anisotropy, the eigenvalues become more unequal, the ellipsoid becomes more elongated, and FA approaches 1.
  • axial diffusivity (AD), λ∥ ≡ λ1 (where λ1 > λ2, λ3), describes the mean diffusion coefficient of water molecules diffusing parallel to a tract within the voxel of interest.
  • radial diffusivity (RD), λ⊥ ≡ (λ2 + λ3)/2, can be defined as the magnitude of water diffusion perpendicular to the main eigenvector.
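  • The scalar DTI indices above can be written out from the tensor eigenvalues; a numerical sketch with a hypothetical tensor, using the standard formulas for MD, AD, RD, and FA (the FA expression is the conventional definition and is not quoted from this disclosure):

      import numpy as np

      D = np.array([[1.7, 0.1, 0.0],    # hypothetical symmetric 3x3 diffusion tensor
                    [0.1, 0.4, 0.0],    # (units of 1e-3 mm^2/s)
                    [0.0, 0.0, 0.3]])

      l1, l2, l3 = np.sort(np.linalg.eigvalsh(D))[::-1]   # eigenvalues, lambda1 >= lambda2 >= lambda3

      MD = (l1 + l2 + l3) / 3.0                           # mean diffusivity
      AD = l1                                             # axial diffusivity
      RD = (l2 + l3) / 2.0                                # radial diffusivity
      FA = np.sqrt(1.5 * ((l1 - MD)**2 + (l2 - MD)**2 + (l3 - MD)**2)
                   / (l1**2 + l2**2 + l3**2))             # fractional anisotropy, in [0, 1]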
  • the images 135 depict one or more objects of interest.
  • the objects of interest can be any ‘thing’ of interest within the subject such as a region (e.g., an abdominal region), an organ (e.g., the kidney), a lesion/tumor (e.g., a malignant liver tumor or a brain lesion), a metabolic function (e.g., synthesis of plasma protein in the liver), and the like.
  • Each of the multiple images 135 may have a same viewing angle, such that each image 135 depicts a plane that is parallel to other planes depicted in other images 135 corresponding to the subject and object of interest. Each of the multiple images 135 may further correspond to a different distance along a perpendicular axis to the plane. In some instances, the multiple images 135 depicting the object of interest undergo a pre-processing step to align each image and generate a three-dimensional image structure for the object of interest.
  • the images 135 comprise diffusion tensor parametric maps depicting one or more objects of interest of the subject.
  • at least two diffusion tensor parametric maps are generated for the object of interest.
  • a diffusion tensor parametric map may be generated by a DTI system and describe a rate and/or direction of diffusion of water molecules in order to provide further context for the object of interest. More than one diffusion tensor parametric map may be generated, such that each diffusion tensor parametric map corresponds to a different direction.
  • diffusion tensor parametric maps may include an image depicting a FA, an image depicting a MD, an image depicting an AD, and/or an image depicting a RD.
  • Each of the diffusion tensor parametric maps may additionally have a viewing angle depicting a same plane and a same distance along a perpendicular axis of the plane as a corresponding MR image, such that each MR image depicting a virtual “slice” of an object of interest has a corresponding diffusion tensor image depicting a same virtual “slice” of the object of interest.
  • the model training stage 110 builds and trains one or more models 140a-140n (‘n’ represents any natural number)(which may be referred to herein individually as a model 140 or collectively as the models 140) to be used by the other stages.
  • the model 140 can be a machine-learning (“ML”) model, such as a convolutional neural network (“CNN”), e.g. an inception neural network, a residual neural network (“Resnet”), a U-Net, a V-Net, a single shot multibox detector (“SSD”) network, or a recurrent neural network (“RNN”), e.g., long short-term memory (“LSTM”) models or gated recurrent units (“GRUs”) models, or any combination thereof.
  • the model 140 can also be any other suitable ML model trained in object detection and/or segmentation from images, such as a three-dimensional CNN (“3DCNN”), a dynamic time warping (“DTW”) technique, a hidden Markov model (“HMM”), etc., or combinations of one or more of such techniques — e.g., CNN-HMM or MCNN (Multi-Scale Convolutional Neural Network).
  • the computing environment 100 may employ the same type of model or different types of models for segmenting instances of an object of interest.
  • model 140 is constructed with a weighted loss function, which compensates for the imbalanced nature within each image between the large field-of-view or background and the small foreground object of interest, as described in further detail herein.
  • samples 145 are generated by acquiring digital images, splitting the images into a subset of images 145a for training (e.g., 90%) and a subset of images 145b for validation (e.g., 10%), preprocessing the subset of images 145a and the subset of images 145b, augmenting the subset of images 145a, and in some instances annotating the subset of images 145a with labels 150.
  • the subset of images 145a are acquired from one or more imaging modalities (e.g., MRI and CT).
  • the subset of images 145a are acquired from a data storage structure such as a database, an image system (e.g., one or more imaging systems 130), or the like associated with the one or more imaging modalities.
  • Each image depicts one or more objects of interest such as a cephalic region, a chest region, an abdominal region, a pelvic region, a spleen, a liver, a kidney, a brain, a tumor, a lesion, or the like.
  • the splitting may be performed randomly (e.g., 90%/10% or 70%/30%) or the splitting may be performed in accordance with a more complex validation technique such as K-Fold Cross-Validation, Leave-one-out Cross-Validation, Leave-one-group-out Cross-Validation, Nested Cross-Validation, or the like to minimize sampling bias and overfitting.
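  • A sketch of the random split and the K-Fold alternative mentioned above (scikit-learn; the file list is a hypothetical placeholder):

      import numpy as np
      from sklearn.model_selection import train_test_split, KFold

      image_paths = np.array([f"image_{i:03d}.nii" for i in range(200)])   # hypothetical

      # Simple random 90%/10% split.
      train_paths, val_paths = train_test_split(image_paths, test_size=0.1, random_state=0)

      # K-Fold cross-validation alternative.
      for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(image_paths):
          fold_train, fold_val = image_paths[train_idx], image_paths[val_idx]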
  • the preprocessing may comprise cropping the images such that each image only contains a single object of interest. In some instances, the preprocessing may further comprise standardization or normalization to put all features on a same scale (e.g., a same size scale or a same color scale or color saturation scale).
  • the images are resized to a predetermined minimum dimension (width or height, e.g., 2,500 pixels) or a predetermined maximum dimension (width or height, e.g., 3,000 pixels) while keeping the original aspect ratio.
  • Augmentation can be used to artificially expand the size of the subset of images 145a by creating modified versions of images in the datasets.
  • Image data augmentation may be performed by creating transformed versions of images in the datasets that belong to the same class as the original image.
  • Transforms include a range of operations from the field of image manipulation, such as shifts, flips, zooms, and the like. In some instances, the operations include random erasing, shifting, brightness, rotation, Gaussian blurring, and/or elastic transformation to ensure that the model 140 is able to perform under circumstances outside those available from the subset of images 145a.
  • Augmentation can additionally or alternatively be used to artificially expand a number of images in the datasets that the model 140 can take as input during training.
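A minimal sketch of such augmentation transforms applied to a 2D slice is shown below; the probabilities and parameter ranges are illustrative assumptions rather than values from the disclosure.

```python
# Sketch: random shift, flip, rotation, and Gaussian blurring of a 2D image slice.
import numpy as np
from scipy import ndimage

def augment_slice(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    out = img.copy()
    if rng.random() < 0.5:                       # random left-right flip
        out = np.fliplr(out)
    if rng.random() < 0.5:                       # random up-down flip
        out = np.flipud(out)
    shift = rng.integers(-10, 11, size=2)        # random shift of up to 10 pixels
    out = ndimage.shift(out, shift, mode="nearest")
    angle = rng.uniform(-15, 15)                 # small random rotation (degrees)
    out = ndimage.rotate(out, angle, reshape=False, mode="nearest")
    if rng.random() < 0.3:                       # occasional Gaussian blurring
        out = ndimage.gaussian_filter(out, sigma=rng.uniform(0.5, 1.5))
    return out

rng = np.random.default_rng(0)
augmented = [augment_slice(np.random.rand(128, 128), rng) for _ in range(4)]
```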
  • the first set of images corresponding to a region of one or more subjects are histogram matched to the second set of images corresponding to a different region of the same or different subjects within the training data set.
  • histograms of images corresponding to a cephalic region may be processed as the reference histogram, and then histograms of images corresponding to an abdominal region are matched with the reference histograms.
  • Histogram matching can be based upon pixel intensity, pixel color, and/or pixel luminance, such that processing of a histogram entails matching the pixel intensity, pixel color, and/or pixel luminance of the histogram to the pixel intensity, pixel color, and/or pixel luminance of a reference histogram.
  • Annotation can be performed manually by one or more humans (annotators such as radiologists or pathologists) confirming the presence of one or more objects of interest in each image of the subset of images 145a and providing labels 150 to the one or more objects of interest, for example, drawing a bounding box (a ground truth) or segmentation boundary, using annotation software, around the area confirmed by the human to include the one or more objects of interest.
  • annotation data may further indicate a type of an object of interest. For example, if an object of interest is a tumor or lesion, then annotation data may indicate a type of tumor or lesion, such as a tumor or lesion in a liver, a lung, a pancreas, and/or a kidney.
  • a subset of images 145 may be transmitted to an annotator device 155 to be included within a training data set (i.e., the subset of images 145a).
  • Input may be provided (e.g., by a radiologist) to the annotator device 155 using (for example) a mouse, track pad, stylus and/or keyboard that indicates (for example) whether the image depicts an object of interest (e.g., a lesion, an organ, etc.); a number of objects of interest depicted within the image; and a perimeter (bounding box or segmentation boundary) of each depicted object of interest within the image.
  • Annotator device 155 may be configured to use the provided input to generate labels 150 for each image.
  • the labels 150 may include a number of objects of interest depicted within an image; a type classification for each depicted object of interest; a number of each depicted object of interest of a particular type; and a perimeter and/or mask of one or more identified objects of interest within an image. In some instances, labels 150 may further include a perimeter and/or mask of one or more identified objects of interest overlaid onto a first type of image and a second type of image.
  • the training process for model 140 includes selecting hyperparameters for the model 140 and performing iterative operations of inputting images from the subset of images 145a into the model 140 to find a set of model parameters (e.g., weights and/or biases) that minimizes a loss or error function for the model 140.
  • the hyperparameters are settings that can be tuned or optimized to control the behavior of the model 140.
  • Most models explicitly define hyperparameters that control different aspects of the models such as memory or cost of execution.
  • additional hyperparameters may be defined to adapt a model to a specific scenario.
  • the hyperparameters may include the number of hidden units of a model, the learning rate of a model, the convolution kernel width, or the number of kernels for a model.
  • the number of model parameters are reduced per convolutional and deconvolutional layer and/or the number of kernels are reduced per convolutional and deconvolutional layer by one half as compared to typical CNNs, as described in detail herein.
  • Each iteration of training can involve finding a set of model parameters for the model 140 (configured with a defined set of hyperparameters) so that the value of the loss or error function using the set of model parameters is smaller than the value of the loss or error function using a different set of model parameters in a previous iteration.
  • the loss or error function can be constructed to measure the difference between the outputs inferred using the models 140 (in some instances, the segmentation boundary around one or more instances of an object of interest is measured with a Dice similarity coefficient) and the ground truth segmentation boundary annotated to the images using the labels 150.
  • the model 140 has been trained and can be validated using the subset of images 145b (testing or validation data set).
  • the validation process includes iterative operations of inputting images from the subset of images 145b into the model 140 using a validation technique such as K-Fold Cross-Validation, Leave-one-out Cross-Validation, Leave-one-group-out Cross-Validation, Nested Cross- Validation, or the like to tune the hyperparameters and ultimately find the optimal set of hyperparameters.
  • a reserved test set of images from the subset of images 145b are input into the model 140 to obtain output (in this example, the segmentation boundary around one or more objects of interest), and the output is evaluated versus ground truth segmentation boundaries using correlation techniques such as the Bland-Altman method and Spearman's rank correlation coefficient, and by calculating performance metrics such as the error, accuracy, precision, recall, receiver operating characteristic curve (ROC), etc.
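A minimal sketch of the Bland-Altman and Spearman correlation evaluation is shown below, assuming per-image volumes derived from predicted and ground-truth segmentation boundaries; the numeric values are placeholders.

```python
# Sketch: Spearman's rank correlation and Bland-Altman statistics for predicted
# versus ground-truth object volumes.
import numpy as np
from scipy.stats import spearmanr

pred_volumes = np.array([101.2, 98.7, 110.5, 95.3, 102.8])   # hypothetical values
gt_volumes = np.array([100.0, 99.5, 108.9, 97.1, 104.0])

rho, p_value = spearmanr(pred_volumes, gt_volumes)           # rank correlation

# Bland-Altman statistics: mean difference (bias) and 95% limits of agreement.
diff = pred_volumes - gt_volumes
mean_pair = (pred_volumes + gt_volumes) / 2.0                # x-axis of a Bland-Altman plot
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print(f"Spearman rho={rho:.3f} (p={p_value:.3f}), bias={bias:.2f}, LoA=+/-{loa:.2f}")
```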
  • the model may be trained and hyperparameters may be tuned on images from the subset of images 145a and the images from the subset of images 145b may only be used for testing and evaluating performance of the model.
  • the training mechanisms described herein focus on training a new model 140.
  • These training mechanisms can also be utilized to fine tune existing models 140 trained from other datasets.
  • a model 140 might have been pre-trained using images of other objects or biological structures or from sections from other subjects or studies (e.g., human trials or murine experiments). In those cases, the models 140 can be used for transfer learning and retrained/validated using the images 135.
  • the model training stage 110 outputs trained models including one or more trained object detection models 160 and one or more trained segmentation models 165.
  • a first image 135 is obtained by a localization controller 170 within the object detection stage 115.
  • the first image 135 depicts an object of interest.
  • the first image is a diffusion tensor parametric map having a first characteristic such as FA or MD contrast.
  • the first image is an MR image having a first characteristic such as a single echo T2 contrast or a T2-weighted anatomical contrast.
  • the first image 135 is a CT image having a first characteristic such as a low resolution or a high resolution.
  • the first image 135 is a CT image having a first characteristic such as agent contrast.
  • the first image 135 is a PET image having a first characteristic such as radiotracer contrast or low resolution. In other instances, the first image 135 is a PET-CT image having a first characteristic such as radiotracer contrast or high resolution. In other instances, the first image 135 is a PET-MR image having a first characteristic such as radiotracer contrast or high resolution. In other instances, the first image 135 is a PAT image having a first characteristic such as optical absorption contrast. In other instances, the first image 135 is an ultrasound image having a first characteristic such as echo or transducer-to-tissue-boundary distance.
  • the localization controller 170 includes processes for localizing, using the one or more object detection models 160, an object of interest within the image 135.
  • the localizing includes: (i) locating and classifying, using object detection models 160, objects within the first image having the first characteristic into a plurality of object classes, where the classifying assigns sets of pixels or voxels of the first image into one or more of the plurality of object classes; and (ii) determining, using the object detection models 160, a bounding box or segmentation mask for the object of interest within the first image based on sets of pixels or voxels assigned with an object class of the plurality of object classes.
  • the object detection models 160 utilize one or more object detection algorithms in order to extract statistical features used to locate and label objects within the first image and predict a bounding box or segmentation mask for the object of interest.
  • the localizing is executed using EM, YOLO, YOLOv2, YOLOv3, or similar object detection algorithms, which are initialized heuristically with a standard clustering technique (e.g., K-means or Otsu's method).
  • the initialization is used to provide the initial estimate of the parameters of the likelihood model for each class. For example in the instance of using EM with a K-means clustering technique, given a fixed number of k clusters, observations are assigned to the k clusters so that the means across clusters (for all variables) are as different from each other as possible.
  • the EM clustering technique then computes posterior probabilities of cluster memberships and cluster boundaries based on one or more prior probability distributions parametrized with the initial estimate of the parameters for each cluster (class).
  • the goal of the EM clustering technique then is to maximize the overall probability or likelihood of the data, given the (final) clusters.
  • the results of the EM clustering technique are different from those computed by K-means clustering technique.
  • the K-means clustering technique will assign observations (pixels or voxels such as pixel or voxel intensities) to clusters to maximize the distances between clusters.
  • the EM clustering technique does not compute actual assignments of observations to clusters, but classification probabilities. In other words, each observation belongs to each cluster with a certain probability. Thereafter, observations may be assigned, by the localization controller 170, to clusters based on the (largest) classification probability.
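A minimal sketch of this K-means-initialized EM clustering applied to pixel intensities is shown below; the class count, the random placeholder image, and the scikit-learn-based implementation are assumptions for illustration.

```python
# Sketch: K-means provides initial class means; EM (Gaussian mixture) then computes
# posterior probabilities of cluster membership for each pixel intensity.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

image = np.random.rand(110, 110)                 # placeholder for the first image
intensities = image.reshape(-1, 1)

# K-means gives the initial estimate of the per-class parameters (the means).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(intensities)

# EM refines the likelihood model and yields classification probabilities.
gmm = GaussianMixture(n_components=4, means_init=kmeans.cluster_centers_,
                      random_state=0).fit(intensities)
posteriors = gmm.predict_proba(intensities)      # probability of each class per pixel
labels = posteriors.argmax(axis=1).reshape(image.shape)  # assign by largest probability
```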
  • the result of the localization is bounding boxes or segmentation masks with a probability for each class.
  • the general location of an object of interest (e.g., the kidney) is isolated based on sets of pixels or voxels assigned with an object class associated with the object of interest.
  • the bounding box or segmentation mask for the object of interest are availed to a map processing controller 175 within the object detection stage 115.
  • a second image 135 is obtained by a map processing controller 175 within the object detection stage 115.
  • the second image 135 depicts the same object of interest depicted in the first image 135.
  • the second image is a diffusion tensor parametric map having a second characteristic such as FA or MD contrast.
  • the second image is an MR image having a second characteristic such as a single echo T2 contrast or a T2-weighted anatomical contrast.
  • the second image 135 is a CT image having a second characteristic such as a low resolution or a high resolution.
  • the second image 135 is a CT image having a second characteristic such as agent contrast.
  • the second image 135 is a PET image having a second characteristic such as radiotracer contrast or low resolution.
  • the second image 135 is a PET-CT image having a second characteristic such as radiotracer contrast or high resolution.
  • the second image 135 is a PET-MR image having a second characteristic such as radiotracer contrast or high resolution.
  • the second image 135 is a PAT image having a second characteristic such as optical absorption contrast.
  • the second image 135 is an ultrasound image having a second characteristic such as echo or transducer-to-tissue-boundary distance.
  • Map processing controller 175 includes processes for overlaying the bounding box or segmentation mask corresponding to the detected object of interest from the first image onto the same object of interest as depicted in the second image.
  • the segmentation mask is projected onto the second image (e.g., two-dimensional slices of the second image) such that the boundaries of the segmentation mask can be used to define a rectangular bounding box enclosing a region of interest corresponding to the object of interest within the second image.
  • the bounding box includes additional padding (e.g., padding of 5 pixels, 10 pixels, 15 pixels, etc.) to each edge of a perimeter of the segmentation mask in order to ensure an entirety of the region of interest is enclosed.
  • Map processing controller 175 further includes processes configured to crop the second image such that only a cropped portion 180 corresponding to the bounding box is depicted.
  • a cropped portion of the second image is generated for each bounding box.
  • each cropped portion may further be resized (e.g., with additional padding) in order to maintain a uniform size.
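A minimal sketch of deriving a padded bounding box from a projected segmentation mask and cropping the second image with it is shown below; the NumPy representation and the 10-pixel pad are illustrative assumptions.

```python
# Sketch: bounding box from a binary mask, expanded by padding, used to crop an image.
import numpy as np

def crop_with_mask(second_image: np.ndarray, mask: np.ndarray, pad: int = 10) -> np.ndarray:
    """Crop `second_image` to the mask's bounding box plus `pad` pixels per edge."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    r0 = max(r0 - pad, 0)
    c0 = max(c0 - pad, 0)
    r1 = min(r1 + pad, mask.shape[0] - 1)
    c1 = min(c1 + pad, mask.shape[1] - 1)
    return second_image[r0:r1 + 1, c0:c1 + 1]

mask = np.zeros((110, 110), dtype=bool)
mask[40:70, 35:80] = True                          # hypothetical object mask
cropped = crop_with_mask(np.random.rand(110, 110), mask, pad=10)
```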
  • the cropped portion(s) 180 of the second image is transmitted to a segmentation controller 185 within the segmentation stage 120.
  • the segmentation controller 185 includes processes for segmenting, using the one or more segmentation models 165, the object of interest within the cropped portion(s) 180 of the second image.
  • the segmenting includes generating, using the one or more segmentation models 165, an estimated segmentation boundary around the object of interest; and outputting, using the one or more segmentation models 165, the cropped portion(s) of the second image with the estimated segmentation boundary 190 around the object of interest.
  • Segmentation may include assessing variations in pixel or voxel intensities for each cropped portion to identify a set of edges and/or contours corresponding to an object of interest.
  • upon identifying the set of edges and/or contours, the one or more segmentation models 165 generate an estimated segmentation boundary 190 for the object of interest.
  • the estimated segmentation boundary 190 corresponds to a three-dimensional representation of the object of interest.
  • the segmenting further includes determining a probability score of the object of interest being present in the estimated segmentation boundary 190, and outputting the probability score with the estimated segmentation boundary 190.
  • the cropped portion(s) of the second image with the estimated segmentation boundary 190 around the object of interest (and optional probability score) may be transmitted to an analysis controller 195 within the analysis stage 125.
  • the analysis controller 195 includes processes for obtaining or receiving the cropped portion(s) of the second image with the estimated segmentation boundary 190 around the object of interest (and optional probability score) and determining analysis results 197 based on the estimated segmentation boundary 190 around the object of interest (and optional probability score).
  • the analysis controller 195 may further include processes for determining a size, axial dimensions, a surface area, and/or a volume of the object of interest based on the estimated segmentation boundary 190 around the object of interest.
  • the estimated segmentation boundary 190 or derivations thereof (e.g., size, axial dimensions, volume of the object of interest, etc.)
  • the estimated segmentation boundary 190 for the object of interest is compared to an estimated segmentation boundary 190 for the same object of interest imaged at a previous time point in order to determine a treatment efficacy for a subject.
  • estimated segmentation boundaries 190 of lesions for a subject may provide information regarding a type of cancer (e.g., a location of a lesion), a metastasis progression (e.g., whether a number of lesions and/or a number of locations of lesion(s) increases for the subject), and a drug efficacy (e.g., whether a number, size, and/or volume of lesion(s) increases or decreases).
  • the computing environment 100 may further include a developer device associated with a developer. Communications from a developer device to components of the computing environment 100 may indicate what types of input images are to be used for the models, a number and type of models to be used, hyperparameters of each model, for example, learning rate and number of hidden layers, how data requests are to be formatted, which training data is to be used (e.g., and how to gain access to the training data) and which validation technique is to be used, and/or how the controller processes are to be configured.
  • the second part of the segmenting pertains to a second vision model (e.g., a deep learning neural network) constructed with a weighted loss function (e.g., a Dice loss).
  • the deep learning neural network is trained on images of one or more objects of interest from subjects.
  • the images are generated from one or more medical imaging modalities.
  • data sets of images generated from some medical imaging modalities can be sparse.
  • the images of the training data are augmented to artificially increase the number and variety of images within the data sets. More specifically, the augmentation may be performed by performing histogram matching to simulate other contrasts within other regions of a same or different subject (e.g., regions of a subject where the object of interest may not be found) and increase variance of the training dataset.
  • each image of a training set, or a subset of images from a training set, may be histogram matched with one or more reference images to generate a new set of images that artificially increases the training dataset size in a so-called data augmentation process, which reduces overfitting.
  • each image of the training set (left image) may be histogram matched with a reference image (center image) to generate a new set of images (right image).
  • Histogram matching is the transformation of the original image so that its histogram matches a reference histogram.
  • Histogram matching is performed by first equalizing both the original and the reference histograms using histogram equalization (which stretches each histogram to fill the dynamic range while keeping it approximately uniform), and then mapping from the original to the reference histogram based on the equalized images and a transformation function. For example, suppose a pixel intensity value 20 in the original image gets mapped to 35 in its equalized image, and a pixel intensity value 55 in the reference image gets mapped to 35 in its equalized image; then a pixel intensity value 20 in the original image should be mapped to a pixel intensity value 55 in the reference image. The mapping from original to equalized to reference image may then be used to transform the original image into the new image.
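A minimal sketch of this equalize-and-map procedure on 8-bit intensity arrays is shown below; the 256-bin histograms, the uint8 dtype, and the random placeholder slices are simplifying assumptions.

```python
# Sketch: histogram matching via cumulative histograms (CDFs) of two images.
import numpy as np

def match_histogram(original: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Transform `original` so its histogram approximately matches `reference`."""
    orig_hist, _ = np.histogram(original.ravel(), bins=256, range=(0, 255))
    ref_hist, _ = np.histogram(reference.ravel(), bins=256, range=(0, 255))

    orig_cdf = np.cumsum(orig_hist) / original.size   # equalization mapping of original
    ref_cdf = np.cumsum(ref_hist) / reference.size    # equalization mapping of reference

    # For each original intensity, find the reference intensity with the closest CDF value.
    mapping = np.searchsorted(ref_cdf, orig_cdf).clip(0, 255).astype(np.uint8)
    return mapping[original.astype(np.uint8)]

original = (np.random.rand(110, 110) * 255).astype(np.uint8)   # e.g., abdominal slice
reference = (np.random.rand(110, 110) * 255).astype(np.uint8)  # e.g., cephalic slice
matched = match_histogram(original, reference)
```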
  • one or both data sets can be further augmented using standard techniques such as rotation and flipping (e.g., rotating each image 90°, flipping left-to-right, flipping up-to-down, and the like) to further increase the number and variety of images available for training.
  • the benefits of using histogram-matching-based data augmentation are: (i) the technique takes advantage of another dataset from the same species, instrument, image intensity range, etc.; (ii) the masks (labels) corresponding to the histogram-matched images are exactly the same as those of the original images in the training set; (iii) the number of images in the training set is multiplied by the number of images used as references; and (iv) the variance in the training set increases while the structure of the images is preserved; only the pixel intensities change, which makes the segmentation framework independent of pixel intensity and dependent on the image and the structure of the object of interest.
  • a modified 3D U-Net 300 extracts features from input images (e.g., cropped portion(s) of the second image) individually, detects an object of interest within the input images, generates a three-dimensional segmentation mask around the shape of the object of interest, and outputs the input images with the three-dimensional segmentation mask around the shape of the object of interest.
  • the 3D U-Net 300 includes a contracting path 305 and an expansive path 310, which gives it a u-shaped architecture.
  • the contracting path 305 is a CNN network that includes repeated application of convolutions (e.g., 3x3x3 convolutions (unpadded convolutions)), each followed by a rectified linear unit (ReLU) and a max pooling operation (e.g., a 2x2x2 max pooling with stride 2 in each direction) for downsampling.
  • the input for a convolutional operation is a three-dimensional volume (i.e., the input images of size n x n x channels, where n is a number of input features) and a set of ‘k’ filters (also called as kernels or feature extractors) each one of size (f x f x f channels, where f is any number, for example, 3 or 5).
  • the output of a convolutional operation is also a three-dimensional volume (also called an output image or feature map) of size m x m x k, where m is the number of output features and k is the number of filters.
  • Each block 315 of the contracting path 305 includes one or more convolutional layers (denoted by gray horizontal arrows), and the number of feature channels changes, e.g., from 1 to 64 in the first block (depending on the starting number of channels), as convolution processes increase the depth of the input image.
  • the gray arrow pointing down between each block 315 is the max pooling process which halves down the size of the input image.
  • the number of feature channels may be doubled.
  • the spatial information of the image data is reduced while feature information is increased.
  • the image after block 320 has been resized to, e.g., 28x28x1024 (this size is merely illustrative, and the size at the end of block 320 could be different depending on the starting size of the input image, n x n x channels).
  • the expansive path 310 is a CNN network that combines the feature and spatial information from the contracting path 305 (upsampling of the feature map from the contracting path 305).
  • the output of three-dimensional segmentation is not just a class label or bounding box parameters. Instead, the output (the three-dimensional segmentation mask) is a complete image (e.g., a high resolution image) in which all the voxels are classified. If a regular convolutional network with pooling layers and dense layers was used, the CNN network would lose the “where” information and only retain the “what” information which is not acceptable for image segmentation. In the instance of image segmentation, both “what” as well as “where” information are used.
  • the image is upsampled to convert a low resolution image to a high resolution image to recover the “where” information.
  • Transposed convolution, represented by the white arrow pointing up, is an exemplary upsampling technique that may be used in the expansive path 310 for upsampling the feature map and expanding the size of images.
  • the image is upsized, e.g., from 28x28x1024 to 56x56x512 via up-convolution (upsampling operators) of 2x2x2 with strides of two in each dimension, and then the image is concatenated with the corresponding image from the contracting path (see the horizontal gray bar 330 from the contracting path 305), which together makes an image of, e.g., size 56x56x1024.
  • the reason for the concatenation is to combine the information from the previous layers (i.e., the high-resolution features from the contracting path 305 are combined with the upsampled output from the expansive path 310) in order to get a more precise prediction.
  • This process continues as a sequence of up-convolutions that halve the number of channels, concatenations with a correspondingly cropped feature map from the contracting path 305, repeated application of convolutions (e.g., two 3x3x3 convolutions) that are each followed by a rectified linear unit (ReLU), and a final convolution in block 335 (e.g., one 1x1x1 convolution) to generate a multi-channel segmentation as a three-dimensional segmentation mask.
  • the U-Net 300 uses the valid part of each convolution without any fully connected layers, i.e., the segmentation map only contains the voxels for which the full context is available in the input image, and uses skip connections that link the context features learned during a contracting block and the localization features learned in an expansion block.
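For illustration, a minimal PyTorch sketch of these contracting and expansive building blocks follows. The channel counts (starting at 32 rather than 64, reflecting the halved-kernel variant), the use of padded convolutions, and the names TinyUNet3D and double_conv are simplifying assumptions, not the exact architecture of the 3D U-Net 300.

```python
# Sketch: one contracting block, one up-convolution with skip connection, and a
# final 1x1x1 convolution, in the spirit of the described 3D U-Net.
import torch
import torch.nn as nn

def double_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3x3 convolutions, each followed by a ReLU."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet3D(nn.Module):
    def __init__(self, in_ch: int = 1, n_classes: int = 2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 32)            # contracting block
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)
        self.enc2 = double_conv(32, 64)               # feature channels doubled
        self.up = nn.ConvTranspose3d(64, 32, kernel_size=2, stride=2)  # up-convolution
        self.dec1 = double_conv(64, 32)               # after concatenation with skip
        self.head = nn.Conv3d(32, n_classes, kernel_size=1)  # final 1x1x1 convolution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s1 = self.enc1(x)
        bottleneck = self.enc2(self.pool(s1))
        up = self.up(bottleneck)
        cat = torch.cat([s1, up], dim=1)              # skip connection (concatenation)
        return self.head(self.dec1(cat))

logits = TinyUNet3D()(torch.randn(1, 1, 16, 64, 64))  # (batch, channel, D, H, W)
```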
  • the output and the ground truth labels are typically compared using a softmax function with cross-entropy loss. While these networks demonstrate improved segmentation performance over traditional CNNs, they do not immediately translate to small foreground objects, small sample sizes, and anisotropic resolution in medical imaging datasets.
  • the 3D U-Net 300 is constructed to include a reduced number of parameters and/or kernels relative to conventional 3D U-Nets (a total of 2,784 kernels and 19,069,955 learnable parameters - see, e.g., Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation").
  • the 3D U-Net 300 has 9,534,978 learnable parameters as compared to a conventional 3D U-Net with 19,069,955 learnable parameters.
  • the total number of kernels is reduced from 2,784 in the conventional 3D U-Net to 1,392 kernels in the 3D U-Net 300.
  • a total number of the learnable parameters for the 3D U-Net 300 is reduced to between 5,000,000 and 12,000,000 learnable parameters.
  • a total number of the kernels for the 3D U-Net 300 is reduced to between 800 and 1,700 kernels. Reduction of parameters and/or kernels is advantageous as it enables the model to handle a smaller sample size (i.e., cropped portion(s) of the second DTI parametric map) more efficiently.
  • the 3D U-Net 300 is constructed for volumetric segmentation using a weighted loss function.
  • the metric used for evaluating segmentation performance was the Dice similarity coefficient (DSC, Equation 1). Therefore, to train the 3D U-Net 300 with the objective of maximizing the DSC, a Dice loss derived from the DSC was minimized over all the images (Equation 2).
  • Dice loss (Equation 3) where weights for the frequently seen background are reduced and weights for the object of interest in the foreground are increased to reach a balanced influence of foreground and background voxels on the loss.
  • in Equations 1-3, N is the number of images, pi represents a predicted mask, and qi represents a ground truth mask corresponding to a target object.
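Equations 1-3 are referenced above but are not reproduced in this text. A standard formulation consistent with the stated variables (N images, predicted masks p_i, ground-truth masks q_i, and weights that rebalance foreground and background) is given below as a hedged reconstruction, not the verbatim equations of the disclosure.

```latex
% Plausible reconstruction of Equations 1-3 (DSC, Dice loss, weighted Dice loss).
\begin{align}
\mathrm{DSC}(p_i, q_i) &= \frac{2\,|p_i \cap q_i|}{|p_i| + |q_i|}
  && \text{(1) Dice similarity coefficient} \\
\mathcal{L}_{\mathrm{Dice}} &= \frac{1}{N}\sum_{i=1}^{N}\bigl(1 - \mathrm{DSC}(p_i, q_i)\bigr)
  && \text{(2) Dice loss over all images} \\
\mathcal{L}_{\mathrm{wDice}} &= \frac{1}{N}\sum_{i=1}^{N}
  \left(1 - \frac{2\sum_{v} w_v\, p_{i,v}\, q_{i,v}}
                 {\sum_{v} w_v\,\bigl(p_{i,v} + q_{i,v}\bigr)}\right)
  && \text{(3) weighted Dice loss}
\end{align}
% Here w_v > 1 for foreground voxels and w_v = 1 for background voxels, so that the
% sparse foreground and the large background have a balanced influence on the loss.
```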
  • the 3D U-Net 300 does not have to be incorporated into the overall computing environment 100 described with respect to FIG. 1 in order to implement object segmentation in accordance with aspects of the present disclosure.
  • various types of models can be used for object segmentation (e.g., a CNN, a Resnet, a typical U-Net, a V-Net, an SSD network, a recurrent neural network (RNN), etc.) so long as the models can be trained for object segmentation of medical images.
  • FIG. 4 illustrates a flowchart for an exemplary process 400 for using the described multi-stage segmentation network to segment instances of an object of interest.
  • Process 400 may be performed using one or more computing systems, models, and networks, as described in Section IV with respect to FIGS. 1-3.
  • Process 400 begins at block 405 where medical images are acquired of a subject.
  • the medical images may depict a cephalic region, a chest region, an abdominal region, a pelvic region, and/or a region corresponding to a limb of the subject.
  • the medical images are generated using one or more medical imaging modalities.
  • a user may operate one or more imaging systems that use the one or more medical imaging modalities to generate the medical images, as discussed in Section IV with respect to FIG. 1.
  • medical images of the subject are obtained.
  • the medical images acquired in step 405 may be retrieved from a data storage device or the one or more medical imaging systems.
  • the medical images include a first image having a first characteristic and a second image having a second characteristic.
  • the images are DTI parametric maps comprising a first measurement map (a first image having a first characteristic) and a second measurement map (a second image having a second characteristic). The first measurement map is different from the second measurement map.
  • the DTI parametric maps are generated by applying supplemental MR gradients during acquisition of the MR image.
  • a user may input parameters for one or more diffusion gradients into an imaging system, and DTI parametric maps are generated by applying supplemental MR gradients during acquisition of the MR image based on the parameters for one or more diffusion gradients (the motion of protons during the application of the gradients affects the signal in the image).
  • a diffusion gradient is applied in more than one direction during acquisition of the MR image.
  • the first measurement map is a fractional anisotropy map and the second measurement map is a mean diffusivity map.
  • objects within the first image are located and classified using a localization model.
  • the classifying assigns sets of pixels or voxels of the first image into one or more of the plurality of object classes.
  • Object classes may include a class corresponding to an object of interest (e.g., depending on a type of object of interest), one or more classes corresponding to different biological structures, one or more classes corresponding to different organs, and/or one or more classes corresponding to different tissues. For example, if an object of interest is a lesion, object classes may be defined for identifying lesions, blood vessels, and/or organs.
  • the locating and classifying may be performed by the localization model using one or more clustering algorithms that assigns sets of pixels or voxels into one or more object classes of the plurality of object classes.
  • the one or more clustering algorithms include a k-means algorithm that assigns observations to clusters associated with the plurality of object classes.
  • the one or more clustering algorithms further include an expectation maximization algorithm that computes probabilities of cluster memberships based on one or more probability distributions.
  • the k-means algorithm may be used to initialize the expectation maximization algorithm by estimating initial parameters for each object class of the plurality of object classes.
  • a bounding box or segmentation mask is determined for an object of interest in the first image using the localization model.
  • the bounding box or segmentation mask is determined for an object of interest based on sets of pixels or voxels assigned with an object class of the plurality of object classes.
  • a seed location of the object of interest is identified using the set of pixels assigned with the object class corresponding to the object of interest.
  • the identified seed location is projected towards a z-axis in order to grow the seed location and determine the segmentation mask.
  • the z-axis represents depth and the seed location is grown to fill the entire volume of the object mask in the third or final dimension.
  • a morphological closing and filling is additionally performed on the segmentation mask.
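A minimal sketch of projecting a 2D seed mask along the z-axis and applying the morphological closing and hole filling described above is shown below; the seed mask, slice count, and structuring element are placeholders.

```python
# Sketch: grow a 2D seed along the slice (z) direction, then close and fill the mask.
import numpy as np
from scipy import ndimage

seed_2d = np.zeros((110, 110), dtype=bool)
seed_2d[45:65, 40:75] = True                       # hypothetical seed from the object class
n_slices = 15

# Project the seed along the z-axis to fill the volume in the third dimension.
mask_3d = np.repeat(seed_2d[np.newaxis, :, :], n_slices, axis=0)

# Morphological closing followed by hole filling to smooth the segmentation mask.
mask_3d = ndimage.binary_closing(mask_3d, structure=np.ones((3, 3, 3)))
mask_3d = ndimage.binary_fill_holes(mask_3d)
```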
  • the bounding box or the segmentation mask is transferred onto the second image to define a portion of the second image comprising the object of interest.
  • Transferring the object mask includes projecting the bounding box or the segmentation mask in a slice direction onto a corresponding region of the image (portion of the second image comprising the object of interest) and/or overlaying the bounding box or the segmentation mask onto a corresponding region of the second image (portion of the second image comprising the object of interest).
  • the segmentation mask is projected onto two-dimensional slices of the second image, such that the boundaries of the segmentation mask within a two-dimensional space can be used to define a rectangular bounding box enclosing a region of interest corresponding to the detected object of interest.
  • the bounding box includes additional padding (e.g., padding of 5 pixels, 10 pixels, 15 pixels, etc.) to each edge of a perimeter of the segmentation mask in order to ensure an entirety of the region of interest is enclosed.
  • the second image is cropped based on the bounding box or segmentation mask plus an optional margin to generate the portion of the second image. Each cropped portion may further be resized (e.g., with additional padding) in order to maintain a uniform size.
  • the portion of the second image is transmitted to a deep super resolution neural network for preprocessing.
  • the deep super resolution neural network may be (for example) a convolutional neural network, a residual neural network, an attention- based neural network, and/or a recursive convolutional neural network.
  • the deep super resolution neural network processes the transmitted portion of the second image in order to improve image spatial resolution (e.g., an enlargement and/or a refining of image details) of the portion of the second image.
  • the portion of the second image is input into a three-dimensional neural network model constructed for volumetric segmentation using a weighted loss function (e.g., a modified 3D U-Net model).
  • the weighted loss function is a weighted Dice loss function.
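As an illustration, a minimal PyTorch sketch of one possible weighted (soft) Dice loss follows; the function name and the simple foreground up-weighting scheme (fg_weight) are assumptions rather than the specific weighting used by the disclosed model.

```python
# Sketch: weighted soft Dice loss that up-weights sparse foreground voxels.
import torch

def weighted_dice_loss(pred: torch.Tensor, target: torch.Tensor,
                       fg_weight: float = 10.0, eps: float = 1e-6) -> torch.Tensor:
    """pred: sigmoid probabilities, target: binary mask, both shaped (B, 1, D, H, W)."""
    # Foreground voxels get weight fg_weight, background voxels get weight 1.
    weights = 1.0 + (fg_weight - 1.0) * target
    intersection = (weights * pred * target).sum()
    denominator = (weights * (pred + target)).sum()
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

pred = torch.sigmoid(torch.randn(2, 1, 16, 64, 64))
target = (torch.rand(2, 1, 16, 64, 64) > 0.95).float()   # sparse hypothetical foreground
loss = weighted_dice_loss(pred, target)
```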
  • the three-dimensional neural network model comprises a plurality of parameters trained using a set of training data. A number of the plurality of model parameters may be reduced relative to a standard three-dimensional U-Net architecture.
  • the set of training data may comprise: a plurality of images with annotations associated with segmentation boundaries around objects of interest; and a plurality of additional images with annotations associated with segmentation boundaries around objects of interest.
  • the plurality of additional images are artificially generated by matching image histograms from the plurality of images to image histograms from a plurality of reference maps (e.g., maps obtained from other regions of subjects).
  • the plurality of model parameters are identified using the set of training data based on minimizing the weighted loss function.
  • the three-dimensional neural network model further comprises a plurality of kernels, and the number of kernels may be reduced relative to a standard three-dimensional U-Net architecture.
  • the three-dimensional neural network model segments the portion of the second image.
  • the segmenting includes generating an estimated segmentation boundary for the object of interest using identified features.
  • the segmenting may include assessing features such as variations in pixel intensities for each cropped portion to identify a set of edges and/or contours corresponding to the object of interest, and generating an estimated segmentation boundary for the object of interest using the identified set of edges and/or contours.
  • the estimated segmentation boundary may represent a three-dimensional perimeter of the object of interest.
  • the three-dimensional neural network model may also determine a classification of the object of interest.
  • objects corresponding to lesions may be classified based on their type or location within a subject, such as a lung lesion, a liver lesion, and/or a pancreatic lesion.
  • objects corresponding to an organ and/or tissue may be classified as healthy, inflamed, fibrotic, necrotic, and/or cast filled.
  • the portion of the second image with the estimated segmentation boundary around the object of interest is outputted.
  • the portion of the second image is provided.
  • the portion of the second image may be stored in a storage device and/or displayed on a user device.
  • action is taken based on the estimated segmentation boundary around the object of interest.
  • the action includes determining a size, surface area, and/or volume of the object of interest based on the estimated segmentation boundary around the object of interest.
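For example, a minimal sketch of deriving the object volume and a largest cross-sectional area from a binary segmentation mask and the voxel spacing is shown below; the spacing values and mask are illustrative assumptions, and cross-sectional area is used as a simple stand-in for a full surface-area computation.

```python
# Sketch: volume and largest cross-sectional area from a binary 3D segmentation mask.
import numpy as np

mask = np.zeros((15, 110, 110), dtype=bool)
mask[4:10, 40:70, 35:80] = True                    # hypothetical segmented object

voxel_spacing_mm = (1.0, 0.2, 0.2)                 # (slice thickness, row, col) in mm
voxel_volume_mm3 = np.prod(voxel_spacing_mm)
volume_mm3 = mask.sum() * voxel_volume_mm3         # object volume

pixel_area_mm2 = voxel_spacing_mm[1] * voxel_spacing_mm[2]
max_slice_area_mm2 = mask.sum(axis=(1, 2)).max() * pixel_area_mm2  # largest cross-section
```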
  • the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) a size, surface area, and/or volume of the object of interest is provided.
  • the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) a size, surface area, and/or volume of the object of interest may be stored in a storage device and/or displayed on a user device.
  • a user may receive or obtain (i) the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) a size, surface area, and/or volume of the object of interest.
  • (i) the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) a size, surface area, and/or volume of the object of interest are used for quantifying an image metric such as image intensity.
  • measures such as the standardized uptake value (SUV), T2, T1, etc. correlate with certain image metrics such as image intensity, and thus quantification of the image metric could be used for determining values/metrics such as SUV specific to the object of interest.
  • the action includes determining a diagnosis of the subject using: (i) the portion of the second image with the estimated segmentation boundary around the object of interest, and/or (ii) a size, surface area, and/or volume of the object of interest.
  • the action includes administering, by a user, a treatment with a compound (e.g., to the subject) based on (i) the portion of the second image with the estimated segmentation boundary around the object of interest, (ii) a size, surface area, and/or volume of the object of interest, and/or (iii) the diagnosis of the subject.
  • the action includes determining a treatment plan based on (i) the portion of the second image with the estimated segmentation boundary around the object of interest, (ii) a size, surface area, and/or volume of the object of interest, and/or (iii) the diagnosis of the subject, such that a dosage for a drug may be calculated based on the size, surface area, and/or volume of the object of interest.
  • the action includes determining if a treatment is effective or if a dosage for a drug needs to be adjusted based on a comparison of a size, surface area, and/or volume corresponding to the object of interest at a first time point to a size, surface area, and/or volume corresponding to the object of interest at a second time point.
  • Kidney segmentation using 3D U-Net localized with Expectation Maximization.
  • Kidney function and activity are highly dependent on kidney volume in a variety of diseases such as polycystic kidney disease, lupus nephritis, renal parenchymal disease, and kidney graft rejection.
  • Automatic evaluation of the kidney through imaging can be used to determine a diagnosis, prognosis, and/or treatment plan for a subject.
  • In vivo imaging modalities offer unique strengths and limitations. MRI, in particular, does not have ionizing radiation, is not operator dependent, and has good tissue contrast that enables kidney segmentation and volume related information.
  • Traditional methods have been used to evaluate the kidney more locally, such as manual tracing, stereology, or general image processing. These methods can be labor intensive or inconsistent. To address these issues, an integrated deep learning model was utilized to segment the kidney.
  • Deep learning segmentation networks have been used for semantic segmentation of large biomedical image datasets. Although these networks offer state-of-the-art performance, they suffer from high computational cost and memory consumption, which limits their field- of-view and depth. Hence, these networks can be particularly problematic for segmenting small objects in limited images typically found in MRI studies. MRI tends to include a large field-of-view or background for preventing aliasing artifacts. When the background represents a significant portion, the network may not be optimally trained to segment the foreground object of interest. Thus, an alternative strategy is needed to reduce the parameters of a large 3D segmentation network, avoid overfitting, and improve network performance.
  • a derived MRI contrast mechanism (use of DTI) was incorporated for the localization step prior to learned segmentation.
  • a 3D U-Net was modified to reduce the number of parameters and incorporated a Dice loss function for the segmentation.
  • augmentation and MRI histogram matching were incorporated to increase the number of training datasets. Additionally, these techniques were in some instances applied on super-resolved images of the dataset to determine whether enhanced images can improve segmentation performance.
  • MRI was performed on a Bruker 7T (Billerica, MA) with a volume transmit and cryogenic surface receive coil.
  • a custom in vivo holder was constructed with 3D printing (Stratasys Dimension) to provide secure positioning of the brain and spine.
  • Diffusion tensor parametric maps were computed, which include: FA, MD, AD, and RD. FA and MD images were used for the integrated semantic segmentation algorithm.
  • the FA images were used for the localization step.
  • the FA images were segmented using EM, which was initialized with K-means (12 classes) heuristically.
  • the general kidney vicinity was isolated using one of the tissue classes and used as the detected object.
  • the 3D U-Net was trained and tested on the MD images inside the detected area. The same detected area was used for the super-resolved images. Since the cropped object had an arbitrary size in the first two dimensions based on the 2D projected mask, all cropped images were resized to 64x64x16 for the original resolution images and resized to 64x64x64 for the super-resolved images.
  • MD images were super resolved in the through-plane direction to improve spatial resolution.
  • the original matrix of 110x110x15 was super resolved 5x to give a resultant matrix of 110x110x75.
  • Images were enhanced using a deep super resolution neural network.
  • FIG. 5A shows the six elements of the diffusion tensor.
  • the changing diffusion contrast is most noticeable in the inner and outer medullary regions.
  • the changing contrast is noticeable in the diagonal (Dxx, Dyy, Dzz) and off-diagonal elements (Dxy, Dxz, Dyz). By contrast, the diffusion contrast does not change in the cortex, resulting in a very low FA (FIG. 5B).
  • This low FA allowed the kidney to be segmented from the background.
  • MR images were super resolved in the through-plane direction as shown in FIG. 5C.
  • the improvements are most obvious in the sagittal and coronal directions. In-plane resolution is minimally affected as shown in the axial slice (FIG. 5C).
  • FIG. 6A shows the results of training the 3D U-Net on the MD images, without any preprocessing.
  • the DSC plot shows a uniform distribution with a mean of 0.49.
  • FIG. 6B the abdominal area is detected as foreground with connected component analysis and cropped using the MD images.
  • the DSC plot displays a normal distribution with a mean of 0.52.
  • FIG. 6C shows the results using EM segmentation alone. A mean DSC of 0.65 was achieved.
  • FIG. 6D represents the results of the integrated strategy: first the kidney was detected using EM segmentation on FA images, then the 3D U-Net was trained on the detected kidney area from MD images. The average DSC of this approach was 0.88.
  • the DSC plot of semantic segmentation with super-resolved MD images (FIG. 6E) is fairly similar to that of semantic segmentation at the original resolution (FIG. 6D). Here, the average DSC was 0.86.
  • the results are summarized in Table 1 with additional comparison metrics, such as volume difference (VD) and positive predictive value (PPV).
  • Table 1: mean and standard deviation of segmentation results using DSC, VD, and PPV; the best value for each method is shown in bold.
  • This example demonstrates the integration of EM based localization and 3D U-Net for kidney segmentation.
  • the localization step led to a significantly improved result of the deep learning method. It was also demonstrated that while the EM segmentation led to an improvement in the performance of deep learning, the EM segmentation method alone performed poorly.
  • the EM segmentation method isolated the kidney in the central slice, however, it did not preserve the joint representation of kidney volume. Thus, the central slice was used for all slices across the volume as the detected rectangular object.
  • a weighted Dice loss can be significant for error minimization and for balancing the object and background. Without the localization step, however, it was found that the performance did not significantly increase with the inclusion of a weighted Dice loss, because the background contained objects and organs that appeared similar to the kidney, which the 3D U-Net alone could not distinguish.
  • Some embodiments of the present disclosure include a system including one or more data processors.
  • the system includes a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.
  • Some embodiments of the present disclosure include a computer-program product tangibly embodied in a non- transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Geometry (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Nuclear Medicine (AREA)
  • Ultra Sonic Daignosis Equipment (AREA)
  • Image Analysis (AREA)
PCT/US2020/046239 2019-08-14 2020-08-13 Three dimensional object segmentation of medical images localized with object detection WO2021030629A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2022508485A JP2022544229A (ja) 2019-08-14 2020-08-13 オブジェクト検出を用いて位置特定された医用画像の三次元オブジェクトセグメンテーション
EP20761979.2A EP4014201A1 (en) 2019-08-14 2020-08-13 Three dimensional object segmentation of medical images localized with object detection
CN202080057028.2A CN114503159A (zh) 2019-08-14 2020-08-13 通过对象检测定位的医学图像的三维对象分割
US17/665,932 US11967072B2 (en) 2019-08-14 2022-02-07 Three-dimensional object segmentation of medical images localized with object detection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962886844P 2019-08-14 2019-08-14
US62/886,844 2019-08-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/665,932 Continuation US11967072B2 (en) 2019-08-14 2022-02-07 Three-dimensional object segmentation of medical images localized with object detection

Publications (1)

Publication Number Publication Date
WO2021030629A1 true WO2021030629A1 (en) 2021-02-18

Family

ID=72243239

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/046239 WO2021030629A1 (en) 2019-08-14 2020-08-13 Three dimensional object segmentation of medical images localized with object detection

Country Status (5)

Country Link
US (1) US11967072B2 (zh)
EP (1) EP4014201A1 (zh)
JP (1) JP2022544229A (zh)
CN (1) CN114503159A (zh)
WO (1) WO2021030629A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297910A (zh) * 2021-04-25 2021-08-24 云南电网有限责任公司信息中心 一种配网现场作业安全带识别方法
CN113450320A (zh) * 2021-06-17 2021-09-28 浙江德尚韵兴医疗科技有限公司 一种基于较深网络结构的超声结节分级与良恶性预测方法
CN113674254A (zh) * 2021-08-25 2021-11-19 上海联影医疗科技股份有限公司 医学图像异常点识别方法、设备、电子装置和存储介质
CN114220032A (zh) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 一种基于通道裁剪的无人机视频小目标检测方法
WO2022236279A1 (en) * 2021-05-04 2022-11-10 The General Hospital Corporation Collaborative artificial intelligence annotation platform leveraging blockchain for medical imaging
WO2022242046A1 (zh) * 2021-05-18 2022-11-24 上海商汤智能科技有限公司 医学图像的展示方法及装置、电子设备、存储介质和计算机程序
CN116030260A (zh) * 2023-03-27 2023-04-28 湖南大学 一种基于长条状卷积注意力的手术全场景语义分割方法
CN116416289A (zh) * 2023-06-12 2023-07-11 湖南大学 基于深度曲线学习的多模图像配准方法、系统及介质
US20230230356A1 (en) * 2021-12-30 2023-07-20 GE Precision Healthcare LLC Methods and systems for image selection
WO2023196474A1 (en) * 2022-04-06 2023-10-12 University Of Virginia Patent Foundation Automation of the blood input function computation pipeline for dynamic fdg pet for human brain using machine learning
US11826109B2 (en) 2020-09-24 2023-11-28 Stryker European Operations Limited Technique for guiding acquisition of one or more registration points on a patient's body
EP4345747A1 (en) * 2022-09-30 2024-04-03 Stryker European Operations Limited Medical image data processing technique

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3649933A4 (en) * 2017-07-07 2021-07-14 Osaka University PAIN DETERMINATION USING TREND ANALYSIS, MEDICAL DEVICE WITH MACHINE LEARNING, ECONOMIC DISCRIMINATION MODEL AND IOT, TAILORED MACHINE LEARNING AND NOVEL BRAIN WAVE CHARACTERISTICS DETERMINATION OF PAIN
US11950953B1 (en) * 2018-02-23 2024-04-09 Robert Edwin Douglas Image processing via a modified segmented structure
US11544509B2 (en) * 2020-06-30 2023-01-03 Nielsen Consumer Llc Methods, systems, articles of manufacture, and apparatus to classify labels based on images using artificial intelligence
KR102490645B1 (ko) * 2020-07-16 2023-01-25 고려대학교 산학협력단 흡수에너지 기반 전기장 암치료 계획 시스템 및 방법
TWI790508B (zh) * 2020-11-30 2023-01-21 宏碁股份有限公司 血管偵測裝置及基於影像的血管偵測方法
EP4016453B1 (en) * 2020-12-21 2023-08-09 Siemens Healthcare GmbH Method and system for automated segmentation of biological object parts in mri
CN112699937B (zh) * 2020-12-29 2022-06-21 江苏大学 基于特征引导网络的图像分类与分割的装置、方法、设备及介质
US11630882B2 (en) * 2021-04-23 2023-04-18 Hewlett-Packard Development Company, L.P. Adaptation of one shot machine learning model trained on other registered images of other content for usage to instead identify whether input image includes content of registered images
CN114708973B (zh) * 2022-06-06 2022-09-13 首都医科大学附属北京友谊医院 一种用于对人体健康进行评估的设备和存储介质
CN115311135A (zh) * 2022-06-24 2022-11-08 西南交通大学 一种基于3dcnn的各向同性mri分辨率重建方法
CN115082529B (zh) * 2022-06-30 2023-04-14 华东师范大学 一种大体组织多维信息采集和分析系统与方法
WO2024026255A1 (en) * 2022-07-25 2024-02-01 Memorial Sloan-Kettering Cancer Center Systems and methods for automated tumor segmentation in radiology imaging using data mined line annotations
CN115187950B (zh) * 2022-09-13 2022-11-22 安徽中科星驰自动驾驶技术有限责任公司 用于深度学习图像数据增强的新型平衡掩码二次采样方法
CN115359325B (zh) * 2022-10-19 2023-01-10 腾讯科技(深圳)有限公司 图像识别模型的训练方法、装置、设备和介质
CN115619799B (zh) * 2022-12-15 2023-04-18 北京科技大学 一种基于迁移学习的晶粒图像分割方法及系统
CN115953413B (zh) * 2023-03-13 2023-06-02 同心智医科技(北京)有限公司 一种mra图像分割方法、装置及存储介质
CN116486196B (zh) * 2023-03-17 2024-01-23 哈尔滨工业大学(深圳) 病灶分割模型训练方法、病灶分割方法及装置
CN116843648A (zh) * 2023-07-04 2023-10-03 北京大学口腔医学院 基于锥形束ct影像的髁突骨改建三维自动定量测量系统
CN116758289B (zh) * 2023-08-14 2023-10-24 中国石油大学(华东) 一种自补偿学习的小样本图像分割方法
CN117893532B (zh) * 2024-03-14 2024-05-24 山东神力索具有限公司 基于图像处理的模锻索具用模具裂纹缺陷检测方法


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6909805B2 (en) * 2001-01-31 2005-06-21 Matsushita Electric Industrial Co., Ltd. Detecting and utilizing add-on information from a scanned document image
US9996919B2 (en) * 2013-08-01 2018-06-12 Seoul National University R&Db Foundation Method for extracting airways and pulmonary lobes and apparatus therefor
US10980519B2 (en) * 2015-07-14 2021-04-20 Duke University Systems and methods for extracting prognostic image features
US20190295260A1 (en) * 2016-10-31 2019-09-26 Konica Minolta Laboratory U.S.A., Inc. Method and system for image segmentation using controlled feedback
US10885630B2 (en) * 2018-03-01 2021-01-05 Intuitive Surgical Operations, Inc. Systems and methods for segmentation of anatomical structures for image-guided surgery

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014115151A1 (en) * 2013-01-24 2014-07-31 Tylerton International Holdings Inc. Body structure imaging
US20180330207A1 (en) * 2016-01-08 2018-11-15 Siemens Healthcare Gmbh Deep Image-to-Image Network Learning for Medical Image Analysis
WO2019136349A2 (en) * 2018-01-08 2019-07-11 Progenics Pharmaceuticals, Inc. Systems and methods for rapid neural network-based image segmentation and radiopharmaceutical uptake determination

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ö. ÇIÇEK, A. ABDULKADIR, S. S. LIENKAMP, T. BROX, O. RONNEBERGER: "3D U-Net: learning dense volumetric segmentation from sparse annotation", International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2016, pages 424-432

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11826109B2 (en) 2020-09-24 2023-11-28 Stryker European Operations Limited Technique for guiding acquisition of one or more registration points on a patient's body
CN113297910A (zh) * 2021-04-25 2021-08-24 Information Center of Yunnan Power Grid Co., Ltd. Safety belt recognition method for distribution network field operations
WO2022236279A1 (en) * 2021-05-04 2022-11-10 The General Hospital Corporation Collaborative artificial intelligence annotation platform leveraging blockchain for medical imaging
WO2022242046A1 (zh) * 2021-05-18 2022-11-24 Shanghai SenseTime Intelligent Technology Co., Ltd. Medical image display method and apparatus, electronic device, storage medium, and computer program
CN113450320A (zh) * 2021-06-17 2021-09-28 Zhejiang Deshang Yunxing Medical Technology Co., Ltd. Ultrasound nodule grading and benign-malignant prediction method based on a deeper network structure
CN113674254A (zh) * 2021-08-25 2021-11-19 Shanghai United Imaging Healthcare Co., Ltd. Medical image abnormal point identification method, device, electronic apparatus, and storage medium
CN113674254B (zh) * 2021-08-25 2024-05-14 Shanghai United Imaging Healthcare Co., Ltd. Medical image abnormal point identification method, device, electronic apparatus, and storage medium
CN114220032A (zh) * 2021-12-21 2022-03-22 Yituo Communication Group Co., Ltd. Small object detection method for UAV video based on channel pruning
US11908174B2 (en) * 2021-12-30 2024-02-20 GE Precision Healthcare LLC Methods and systems for image selection
US20230230356A1 (en) * 2021-12-30 2023-07-20 GE Precision Healthcare LLC Methods and systems for image selection
WO2023196474A1 (en) * 2022-04-06 2023-10-12 University Of Virginia Patent Foundation Automation of the blood input function computation pipeline for dynamic fdg pet for human brain using machine learning
EP4345747A1 (en) * 2022-09-30 2024-04-03 Stryker European Operations Limited Medical image data processing technique
CN116030260A (zh) * 2023-03-27 2023-04-28 Hunan University Surgical full-scene semantic segmentation method based on strip convolution attention
CN116416289B (zh) * 2023-06-12 2023-08-25 Hunan University Multimodal image registration method, system, and medium based on deep curve learning
CN116416289A (zh) * 2023-06-12 2023-07-11 Hunan University Multimodal image registration method, system, and medium based on deep curve learning

Also Published As

Publication number Publication date
EP4014201A1 (en) 2022-06-22
CN114503159A (zh) 2022-05-13
US11967072B2 (en) 2024-04-23
JP2022544229A (ja) 2022-10-17
US20220230310A1 (en) 2022-07-21

Similar Documents

Publication Publication Date Title
US11967072B2 (en) Three-dimensional object segmentation of medical images localized with object detection
US10769791B2 (en) Systems and methods for cross-modality image segmentation
US9968257B1 (en) Volumetric quantification of cardiovascular structures from medical imaging
US10593035B2 (en) Image-based automated measurement model to predict pelvic organ prolapse
Zhao et al. A novel U-Net approach to segment the cardiac chamber in magnetic resonance images with ghost artifacts
CN113711271A (zh) Deep convolutional neural network for tumor segmentation by positron emission tomography
Diciotti et al. 3-D segmentation algorithm of small lung nodules in spiral CT images
EP3444781B1 (en) Image processing apparatus and image processing method
Oghli et al. Automatic fetal biometry prediction using a novel deep convolutional network architecture
ES2914387T3 (es) Immediate study
Zhang et al. A novel and efficient tumor detection framework for pancreatic cancer via CT images
US11896407B2 (en) Medical imaging based on calibrated post contrast timing
US20090306496A1 (en) Automatic segmentation of articular cartilage from MRI
KR20230059799A (ko) 병변 검출을 위해 공동 훈련을 이용하는 연결형 머신 러닝 모델
Jung et al. Deep learning for medical image analysis: Applications to computed tomography and magnetic resonance imaging
WO2022099303A1 (en) Machine learning techniques for tumor identification, classification, and grading
Lu et al. Cardiac chamber segmentation using deep learning on magnetic resonance images from patients before and after atrial septal occlusion surgery
Firjani et al. A novel image-based approach for early detection of prostate cancer
Pengiran Mohamad et al. Transition of traditional method to deep learning based computer-aided system for breast cancer using Automated Breast Ultrasound System (ABUS) images: a review
US11244472B2 (en) Method, system and computer program for determining position and/or orientation parameters of an anatomical structure
Xiao et al. PET and CT image fusion of lung cancer with siamese pyramid fusion network
Cairone et al. Robustness of radiomics features to varying segmentation algorithms in magnetic resonance images
Patel et al. Detection of prostate cancer using deep learning framework
JP2023545570A (ja) Detecting anatomical anomalies by segmentation results with and without shape priors
Meça Applications of Deep Learning to Magnetic Resonance Imaging (MRI)

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 20761979
    Country of ref document: EP
    Kind code of ref document: A1
ENP Entry into the national phase
    Ref document number: 2022508485
    Country of ref document: JP
    Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 2020761979
    Country of ref document: EP
    Effective date: 20220314