WO2019108695A1 - Detecting intratumor heterogeneity of molecular subtypes in pathology slide images using deep-learning

Info

Publication number
WO2019108695A1
Authority
WO
WIPO (PCT)
Prior art keywords
patch
scaled
level
patches
molecular subtype
Prior art date
Application number
PCT/US2018/062911
Other languages
French (fr)
Inventor
Mustafa Jaber
Bing SONG
Christopher Szeto
Charles VASKE
Original Assignee
Nantomics, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantomics, Llc
Priority to SG11202003330PA
Priority to CA3079438A (CA3079438A1)
Priority to KR1020207014947A (KR20200066732A)
Priority to AU2018374207A (AU2018374207A1)
Publication of WO2019108695A1
Priority to IL274101A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20: Supervised data analysis
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B10/00: Other methods or instruments for diagnosis, e.g. instruments for taking a cell sample, for biopsy, for vaccination diagnosis; Sex determination; Ovulation-period determination; Throat striking implements
    • A61B10/0041: Detection of breast cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00: ICT specially adapted for the handling or processing of medical images
    • G16H30/40: ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • This disclosure relates generally to imaging for pathology applications, and more specifically to using pathology slide images for molecular subtyping techniques.
  • Breast cancer is the most common noncutaneous cancer diagnosed in women, with over 266,120 new cases estimated in the United States in 2018.
  • Several distinct breast cancer molecular subtypes, based on hormone receptor (HR) and human epidermal growth factor receptor 2 (HER2) status, have been identified. These molecular subtypes include: luminal A and luminal B (HR-positive/HER2-negative breast cancer), HER2-positive, basal-like (HR-negative/HER2-negative), and normal-like.
  • HR and HER2 status are important in predicting prognosis and response to therapy as these vary among the subtypes.
  • Immunohistochemistry (IHC) or in situ hybridization (ISH) assays are the standard methods used to distinguish subtypes.
  • molecular signature assays such as MammaPrint, Oncotype DX, and Predictor Analysis of Microarray 50 (PAM50) have gained ground as supplementary prognostic indicators due to studies demonstrating more significant differential survival between identified subgroups when compared to standard clinicopathological factors.
  • PAM50 subtyping (as part of the NanoString Prosigna panel) is becoming more routine in early-stage breast cancers to determine the likelihood of responding to chemotherapy.
  • these signature-based tests are not ubiquitously employed in part due to their high cost and extended processing times compared to IHC.
  • Hematoxylin and eosin (H&E)-stained biopsy slides are routinely collected during pathological examination and are often digitally recorded as pathology slide images (SIs).
  • Slide imaging (e.g., whole slide imaging) refers to the scanning of conventional glass slides to produce digital slides and is used by pathologists for diagnostic,
  • Machine learning approaches can extract knowledge from SIs beyond that of which a human is capable, as evidenced by the many computer-assisted diagnosis (CAD) software solutions created to augment pathological inspection workflows. It has been previously demonstrated that even genetic subtyping can be approximated using SIs as input to machine learning models. Deep learning methods are an emerging set of influential machine learning technologies well suited to these image-based classification tasks. Recent advances in both
  • IDC invasive ductal carcinomas
  • DCIS benign ductal carcinoma in situ
  • cancer subtypes such as those classified by the expression-based PAM50 assay are prognostic independent of standard
  • heterogeneity of molecular subtypes using only pathology slide images (SIs), e.g., of hematoxylin and eosin (H&E)-stained biopsy tissue sections
  • SIs pathology slide images
  • H&E hematoxylin and eosin
  • a classifier model may be trained using previously subtyped SIs and subsequently used to classify cancer-specific patches within a test SI into major molecular subtypes (e.g., basal-like, HER2-enriched, luminal A, luminal B, and normal-like).
  • major molecular subtypes e.g., basal-like, HER2-enriched, luminal A, luminal B, and normal-like.
  • advanced machine learning methods can approximate molecular tests using only routinely collected SIs, and thus may increase prognostic capabilities by detecting aggressive minority subclones.
  • a plurality of training SIs, e.g., each corresponding to a patient, is obtained and segmented into a plurality of scaled patches.
  • the plurality of training SIs may comprise hematoxylin and eosin (H&E)-stained whole slide images.
  • Each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI.
  • Each scaled patch of the plurality of scaled patches is converted into a multiscale descriptor using a deep-learning neural network such as one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning convolutional neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors.
  • a logits layer of an Inception-v3 neural network may be configured to map each of the one or more patch representations to a patch-level descriptor.
  • the patch-level descriptors may comprise multidimensional descriptive vectors.
  • Principal component analysis or another dimensionality reduction technique may be used to reduce dimensions of the multidimensional descriptive vectors.
  • a classifier model is configured and trained to process the multiscale descriptors such that, for each training SI, the classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches corresponding to the training SI and determine a SI-level molecular subtype classification based on the patch-level molecular subtype classifications.
  • the patch-level molecular subtype classification and SI-level molecular subtype classification may be heterogeneous classifications comprising a plurality of molecular subtypes.
  • a molecular subtyping engine is configured to use the trained classifier model to determine a SI-level molecular subtype classification for a test SI.
  • each of the scaled patches may comprise relatively high-zoom level patches and relatively low-zoom level patches with respect to each other, such as one or more of 5x, 10x, 20x, and 40x zoom-level patch representations.
  • the SI-level molecular subtype classification may be determined based on majority area voting criteria or weighting criteria for the plurality of scaled patches.
  • the weighting criteria may be based on at least one of cellular density and transcriptional activity.
  • the SI-level molecular subtype classification may comprise at least one of a Prosigna Breast Cancer Prognostic Gene Signature Assay or PAM50 subtype classification, such as one of basal-like, HER2-enriched, luminal A, luminal B, and normal-like or a combination of different subtype classifications.
  • a Prosigna Breast Cancer Prognostic Gene Signature Assay or PAM50 subtype classification such as one of basal-like, HER2-enriched, luminal A, luminal B, and normal-like or a combination of different subtype classifications.
  • a subset of the plurality of scaled patches may be selected for further processing, for example, by clustering the plurality of scaled patches using unsupervised clustering such as k-means clustering or random selection to define cancer-enriched areas.
  • the subset of the plurality of scaled patches may be selected to summarize tumor content within a training SI.
  • the plurality of scaled patches may be filtered for a minimum color variance, and each scaled patch determined to be empty space or background may be eliminated from further processing based on the filtering.
  • the classifier model may comprise one or more of a multiclass support vector machine (SVM) including a radial basis function (RBF) kernel, a naive Bayes classifier, a decision tree, a boosted tree, a random forest classifier, a neural network, a nearest neighbor classifier, a linear classifier, and a nonlinear classifier.
  • SVM multiclass support vector machine
  • RBF radial basis function
  • a test SI may be obtained.
  • the test SI may be segmented into a plurality of scaled patches, where each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within the test SI.
  • Each scaled patch of the plurality of scaled patches may be converted into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors.
  • the multiscale descriptors may be processed using the trained classifier model, where the trained classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches and determine a SI-level molecular subtype classification based on the patch-level molecular subtype classifications.
  • An indication of a selected region of interest determined to be cancer-enriched within the test SI may be obtained, e.g., from a user via a user interface, or may be selected automatically based on one or more of biological criteria, an output of a heuristic machine learning or image processing algorithm, or an output of a deep-learning convolutional algorithm.
  • the selected region of interest may be a centroid or closed curve, and the plurality of scaled patches may comprise the selected region of interest.
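The majority area voting and weighting criteria described above can be illustrated with a minimal sketch. The function name `slide_level_subtype`, the `min_minor_fraction` cutoff of 0.2, and the example labels are hypothetical choices made for illustration and do not come from the disclosure. Each patch is assumed to cover an equal area unless a per-patch weight (for example, a cellular density estimate) is supplied.

```python
from collections import defaultdict


def slide_level_subtype(patch_labels, weights=None, min_minor_fraction=0.2):
    """Aggregate patch-level subtype calls into a slide-level call.

    patch_labels: one subtype label per scaled patch.
    weights: optional per-patch weights (assumed, e.g., cellular density);
        with equal weights this reduces to majority area voting.
    min_minor_fraction: hypothetical cutoff above which a non-majority
        subtype is reported as evidence of heterogeneity.
    """
    if not patch_labels:
        raise ValueError("at least one patch label is required")
    if weights is None:
        weights = [1.0] * len(patch_labels)
    if len(weights) != len(patch_labels):
        raise ValueError("weights and patch_labels must have equal length")

    # Accumulate the (weighted) area assigned to each subtype.
    totals = defaultdict(float)
    for label, weight in zip(patch_labels, weights):
        totals[label] += weight
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("weights must sum to a positive value")

    fractions = {label: total / grand_total for label, total in totals.items()}
    majority = max(fractions, key=fractions.get)
    minor = sorted(
        label
        for label, fraction in fractions.items()
        if label != majority and fraction >= min_minor_fraction
    )
    return {
        "majority": majority,
        "fractions": fractions,
        "heterogeneous": bool(minor),
        "minor_subtypes": minor,
    }


# Ten equally weighted patches: seven luminal A, three basal-like.
labels = ["luminal A"] * 7 + ["basal-like"] * 3
result = slide_level_subtype(labels)
```

Reporting the per-subtype fractions alongside the majority call keeps minority subtypes visible, which is the property the patch-based approach relies on for detecting heterogeneity.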
  • FIG. 1 illustrates a graphical representation of a pathology slide image analyzed in accordance with an embodiment.
  • FIG. 2 illustrates a block diagram of example operations for
  • FIG. 3 illustrates a block diagram of a system for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment.
  • FIG. 4 illustrates a flow diagram of example operations for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment.
  • FIG. 5 illustrates a flow diagram of example operations for
  • FIG. 6 illustrates a graphical representation of exemplary scaled patches of pathology slide images in accordance with an embodiment.
  • FIG. 7 illustrates a graphical representation of subtyping cancer-enriched scaled patches of pathology slide images in accordance with an
  • FIG. 8 illustrates graphical representations of independent evidence of heterogeneity in accordance with an embodiment.
  • FIG. 9 illustrates a block diagram of an exemplary client-server relationship that can be used for implementing one or more aspects of the various embodiments.
  • FIG. 10 illustrates a block diagram of a distributed computer system that can be used for implementing one or more aspects of the various embodiments.
  • any of the various embodiments herein may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
  • the following specification is, therefore, not to be taken in a limiting sense.
  • The term “coupled to” is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms “coupled to” and “coupled with” are used synonymously. Within the context of a networked environment where two or more components or devices are able to exchange data, the terms “coupled to” and “coupled with” are also used to mean “communicatively coupled with”, possibly via one or more intermediary devices.
  • inventive subject matter is considered to include all possible combinations of the disclosed elements. As such, if one embodiment comprises elements A, B, and C, and another embodiment comprises elements B and D, then the inventive subject matter is also considered to include other remaining combinations of A, B, C, or D, even if not explicitly discussed herein.
  • The transitional term “comprising” means to have as parts or members, or to be those parts or members. As used herein, the transitional term “comprising” is inclusive or open-ended and does not exclude additional, unrecited elements or method steps.
  • computing devices having at least one processor (e.g., ASIC, FPGA, DSP, x86, ARM, ColdFire, GPU, multi-core processors, etc.) configured to execute software
  • processor e.g., ASIC, FPGA, DSP, x86, ARM, ColdFire, GPU, multi-core processors, etc.
  • a server can include one or more computers operating as a web server, database server, or other type of computer server in a manner to fulfill described roles, responsibilities, or functions.
  • a computer program product comprising a non-transitory, tangible computer readable medium storing the instructions that cause a processor to execute the disclosed steps.
  • the various servers, systems, databases, or interfaces can exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods.
  • Data exchanges can be conducted over a packet-switched network, a circuit-switched network, the Internet, LAN, WAN, VPN, or other type of network.
  • any language directed to a computer should be read to include any suitable combination of computing devices, including servers, interfaces, systems, databases, agents, peers, engines, controllers, modules, or other types of computing devices operating individually or collectively.
  • the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, FPGA, PLA, solid state drive, RAM, flash, ROM, etc.).
  • the software instructions configure or program the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclosed apparatus.
  • the disclosed technologies can be embodied as a computer program product that includes a non-transitory computer readable medium storing the software instructions that causes a processor to execute the disclosed steps associated with implementations of computer-based algorithms, processes, methods, or other instructions.
  • the various servers, systems, databases, or interfaces exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods.
  • Data exchanges among devices can be conducted over a packet-switched network, the Internet, LAN, WAN, VPN, or other type of packet switched network; a circuit switched network; cell switched network; or other type of network.
  • the focus of the disclosed inventive subject matter is to enable construction or configuration of a computing device to operate on vast quantities of digital data, beyond the capabilities of a human for purposes including determining molecular subtype classifications and detecting intratumor heterogeneity within a digitally recorded pathology whole slide image.
  • cancer subtypes e.g., breast cancer subtypes
  • PAM50 assay prognostic independent of standard clinicopathologic factors
  • intratumoral heterogeneity has been difficult to detect using less targeted approaches such as RNA sequencing.
  • a classifier model is trained to identify cancer-rich patches using subtyped SIs and subsequently used to classify cancer-specific patches within a test SI into major molecular subtypes (e.g., basal-like, HER2-enriched, luminal A, luminal B, and normal-like in the case of breast cancer).
  • major molecular subtypes e.g., basal-like, HER2-enriched, luminal A, luminal B, and normal-like in the case of breast cancer.
  • a relatively minimal number of such cancer-rich SI patches may be used to classify patients into molecularly defined subtypes (i.e., PAM50), which are typically undifferentiable using pathology-slide analysis.
  • test results show that patch-level analyses as described herein can accurately identify heterogeneous tumors.
  • One distinct advantage of the patch-based subtyping systems and methods presented herein is that the ability to directly observe intratumor heterogeneity is retained without resorting to numerical deconvolution methods.
  • the techniques herein can be leveraged to identify cancer patients presenting at least two molecular subtypes within the same tissue section, and to support these cases as mixed populations using independent data, including overall survival data.
  • a machine learning method has been achieved that can approximate advanced testing for molecular subtypes by using only routinely collected diagnostic pathology SIs, and possibly increase prognostic capabilities by detecting aggressive minority subclones that may become dominant in a tumor over time.
  • the methods herein relate to prognostic intrinsic subtype heterogeneity identified in diagnostic SIs.
  • FIG. 1 illustrates a graphical representation of a pathology slide image analyzed in accordance with an embodiment of the present invention.
  • a pathology slide image (SI) 100 may be generated when a pathologist wishes to look at a biopsy of a suspected cancer or make other medical diagnoses.
  • SI 100 may include more than two million cells.
  • a hematoxylin and eosin stain may be used for distinguishing the various structures within the whole slide pathology image.
  • hematoxylin is a dark blue or violet stain that binds to various
  • tissue/cellular regions 102 i.e., basophilic substances such as DNA and RNA
  • eosin is a red or pink stain that binds to acidophilic substances
  • a scaled patch 106 of SI 100 may be selected for medical diagnosis and study based on the various distinguished tissue areas.
  • one or more scaled patches of SI 100 may be selected to determine molecular subtype classifications and detect intratumor heterogeneity.
  • FIG. 2 illustrates a block diagram of a system for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment.
  • In system 200, a fixed-size scaled patch-based approach allows analysis of regions while simultaneously capturing micro- and macroscopic characteristics of a SI.
  • SIs, e.g., breast invasive carcinoma (BRCA) diagnostic whole-slide images of formalin-fixed paraffin-embedded (FFPE) blocks with associated PAM50 labels obtained from TCGA data sources
  • BRCA breast invasive carcinoma
  • FFPE formalin-fixed paraffin-embedded
  • the 1600 x 1600-pixel patches 202 may be filtered for a minimum color variance to eliminate empty (i.e., background) patches from further processing. Further, each 1600 x 1600-pixel patch 202 may be converted into 400 x 400-pixel patch representations 204 at, for example, one or more of 5x, 10x, 20x, and 40x magnification scales centered on a same location or point by down-sampling and cropping to the center 400 x 400 pixels.
  • At least one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning convolutional neural network may be used to map each 400 x 400-pixel color patch 204 to patch-level descriptors (i.e., descriptive vectors) 208 at each zoom level.
  • patch-level descriptors i.e., descriptive vectors
  • Inception-v3 image recognition neural network 206 may be used to map each color patch 204 to patch-level descriptors 208.
  • principal component analysis (PCA) or another dimensionality reduction technique may be used to reduce dimensions of the patch-level descriptors.
  • PCA principal component analysis
  • the reduced-dimension patch-level descriptors 208 for the one or more zoom levels (e.g., one or more of 5x, 10x, 20x, and 40x magnification) may be combined (e.g., concatenated) into a multiscale descriptor 210.
  • analyzed locations may be filtered to include only cancer-enriched locations (as opposed to extracellular matrix or adjacent normal tissue) to reduce computational complexity and ensure a hygienic input to train classifier model 214, which may be one or more of a multiclass support vector machine (SVM) including a radial basis function (RBF) kernel, a naive Bayes classifier, a decision tree, a boosted tree, a random forest classifier, a neural network, a nearest neighbor classifier, a linear classifier, and a nonlinear classifier.
  • SVM multiclass support vector machine
  • RBF radial basis function
  • a plurality of scaled patches 204 selected (e.g., randomly) for training may be grouped using, for example, unsupervised clustering such as k-means clustering, where the number of clusters may be determined empirically.
  • Clusters of scaled patches with sufficient cellularity may be investigated further (e.g., by a pathologist) to identify clusters enriched for tumor content. For example, for each SI, patches that fall within the cancer-rich clusters may be used for further analysis.
  • classifier model 214 may comprise a multiclass support vector machine, a class of models generally known to exhibit superior performance on large data sets, and may be trained to determine patch-level molecular subtype classifications 216, e.g., for multiscale descriptor 210. These patch-level molecular subtype classifications 216 may then be used to infer a SI-level molecular subtype classification 218 and detect molecular subtype heterogeneity 220.
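The background filtering and multi-zoom conversion of a 1600 x 1600-pixel patch described above can be sketched as follows. This is a hedged illustration using NumPy, not the disclosed implementation. It assumes the source patch was scanned at a native 20x magnification, so only 20x, 10x, and 5x representations are derived. It uses block averaging as the down-sampling method and treats "color variance" as the variance over all pixel values, compared against an arbitrary threshold of 50. The function names and these parameter values are hypothetical.

```python
import numpy as np


def is_background(patch, min_color_variance=50.0):
    """Flag a patch as empty when its pixel values barely vary (assumed criterion)."""
    return float(np.var(patch.astype(np.float32))) < min_color_variance


def downsample(image, factor):
    """Reduce an (H, W, C) image by an integer factor using block averaging."""
    height, width, channels = image.shape
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the factor")
    blocks = image.reshape(height // factor, factor, width // factor, factor, channels)
    return blocks.mean(axis=(1, 3))


def center_crop(image, size):
    """Return the central size x size window of an (H, W, C) image."""
    height, width = image.shape[:2]
    top, left = (height - size) // 2, (width - size) // 2
    return image[top:top + size, left:left + size]


def patch_representations(patch, native_zoom=20, zooms=(20, 10, 5), out_size=400):
    """Build one out_size x out_size view per zoom level, all sharing one center."""
    patch = patch.astype(np.float32)
    representations = {}
    for zoom in zooms:
        if native_zoom % zoom:
            raise ValueError("each zoom must divide the native zoom")
        reduced = downsample(patch, native_zoom // zoom)
        representations[zoom] = center_crop(reduced, out_size)
    return representations


# Synthetic stand-ins for a tissue patch and an empty background patch.
rng = np.random.default_rng(0)
tissue = rng.integers(0, 256, size=(1600, 1600, 3), dtype=np.uint8)
blank = np.full((1600, 1600, 3), 240, dtype=np.uint8)
reps = patch_representations(tissue)
```

Under these assumptions the 20x view is the central 400 x 400 pixels at full resolution, while the 5x view summarizes the entire 1600 x 1600 region, so the same location is described at both cellular and tissue scale.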
  • FIG. 3 illustrates a block diagram of a system for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment.
  • elements for determining molecular subtype classifications based on pathology slide images include training engine 310, subtype classification engine 320, persistent storage device 330, and main memory device 340.
  • training engine 310 may be configured to obtain training SIs 1 to N 302, 304, 306 from either one or both of persistent storage device 330 and main memory device 340.
  • Training engine 310 may then configure and train classifier model 214 (e.g., an SVM), which may be stored in either one or both of persistent storage device 330 and main memory device 340, using the training SIs 1 to N 302, 304, 306 as training inputs. For example, training engine 310 may segment each of the training SIs 1 to N 302, 304, 306 into a plurality of scaled patches 204, where each scaled patch of the plurality of scaled patches 204 comprises one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI.
  • classifier model 214 e.g., an SVM
  • training engine 310 may segment each of the training SIs 1 to N 302, 304, 306 into a plurality of scaled patches 204, where each scaled patch of the plurality of scaled patches 204 comprises one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI.
  • Training engine 310 may then convert each scaled patch of the plurality of scaled patches 204 into a multiscale descriptor using a deep-learning neural network 206 (e.g., one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning convolutional neural network) by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor 208 and combining the patch-level descriptors to generate a multiscale descriptor 210.
  • the patch-level descriptors may be one or more of concatenated, averaged, stacked, or mathematically or empirically mixed or manipulated to generate a multiscale descriptor 210.
  • Training engine 310 may configure and train classifier model 214 to process the multiscale descriptors 210 such that, for each training SI 1 to N 302, 304, 306, classifier model 214 is operable to assign a patch-level molecular subtype classification 216 to each of the plurality of scaled patches corresponding to a training SI, and determine a SI-level molecular subtype classification 218 or heterogeneous classification 220 based on the patch-level molecular subtype classifications 216.
  • Training engine 310 may configure subtype classification engine 320 to use trained classifier model 314 to determine a SI-level molecular subtype
  • subtype classification engine 320 may obtain test SI 312; segment test SI 312 into a plurality of scaled patches 204, where each scaled patch of the plurality of scaled patches 204 comprises one or more patch representations at one or more zoom levels that are centered at a location within test SI 312; convert each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network 206 by, for each scaled patch, mapping each of the set of patch representations to a patch-level descriptor 208 and combining (e.g., concatenating, averaging, stacking, mathematically or empirically mixing or manipulating, etc.) the patch-level descriptors into a multiscale descriptor 210.
  • combining e.g., concatenating, averaging, stacking, mathematically or empirically mixing or manipulating, etc.
  • Subtype classification engine 320 may then process the multiscale descriptors 210 using trained classifier model 314, where trained classifier model 314 is operable to assign a patch-level molecular subtype classification 216 to each of the plurality of scaled patches and determine a SI-level molecular subtype classification 218 or heterogeneous classification 220 based on the patch-level molecular subtype classifications 216.
  • classification engine 320, persistent storage device 330, and main memory device 340 should be read to include any suitable combination of computing devices, including servers, interfaces, systems, databases, agents, peers, engines, controllers, modules, or other types of computing devices operating individually or collectively to perform the functions ascribed to the various elements.
  • computing devices including servers, interfaces, systems, databases, agents, peers, engines, controllers, modules, or other types of computing devices operating individually or collectively to perform the functions ascribed to the various elements.
  • client-server relationship such as by one or more servers, one or more client devices (e.g., one or more user devices) and/or by a combination of one or more servers and client devices.
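Two of the combining options named above for turning per-zoom patch-level descriptors into a multiscale descriptor, concatenation and averaging, can be sketched as follows. The function name `combine_descriptors`, the dictionary layout keyed by zoom level, and the toy two-element vectors are assumptions for illustration. Real descriptors would come from the convolutional network and be far longer.

```python
def combine_descriptors(descriptors_by_zoom, method="concatenate"):
    """Merge per-zoom patch-level descriptors into one multiscale descriptor.

    descriptors_by_zoom: mapping of zoom level -> list of floats
        (assumed layout, one descriptor vector per zoom level).
    method: "concatenate" joins the vectors end to end;
        "average" takes the element-wise mean and needs equal lengths.
    """
    if not descriptors_by_zoom:
        raise ValueError("at least one patch-level descriptor is required")

    # Sort by zoom so every patch lays out its descriptor in the same order.
    zooms = sorted(descriptors_by_zoom)
    vectors = [list(descriptors_by_zoom[zoom]) for zoom in zooms]

    if method == "concatenate":
        return [value for vector in vectors for value in vector]
    if method == "average":
        length = len(vectors[0])
        if any(len(vector) != length for vector in vectors):
            raise ValueError("averaging requires descriptors of equal length")
        return [sum(values) / len(vectors) for values in zip(*vectors)]
    raise ValueError("unknown method: " + method)


# Toy descriptors for one scaled patch at three zoom levels.
patch_descriptors = {20: [5.0, 6.0], 5: [1.0, 2.0], 10: [3.0, 4.0]}
concatenated = combine_descriptors(patch_descriptors)
averaged = combine_descriptors(patch_descriptors, method="average")
```

Concatenation preserves which zoom level each feature came from at the cost of a longer vector, whereas averaging keeps the dimension fixed but blends the scales together.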
  • FIG. 4 illustrates a flow diagram of example operations for
  • a plurality of training Sis 1 to N 302, 304, 306, e.g., each corresponding to a patient, is obtained and segmented into a plurality of scaled patches at step 402.
  • each scaled patch of the plurality of scaled patches may comprise one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI.
  • each scaled patch of the plurality of scaled patches is converted into a multiscale descriptor using a deep-learning neural network such as at least one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning convolutional neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors.
  • a logits layer of an Inception-v3 neural network may be configured to map each of the one or more patch representations to a patch-level descriptor.
  • the patch-level descriptors may comprise multidimensional descriptive vectors, and principal component analysis (PCA) or another dimensionality reduction technique may be used to reduce dimensions of the multidimensional descriptive vectors.
  • combining the patch-level descriptors may comprise one or more of concatenating, averaging, stacking, or mathematically or empirically mixing or manipulating the patch-level descriptors to generate a multiscale descriptor.
  • a neural network may be used to determine or learn an optimal method of combining the patch-level descriptors to generate a multiscale descriptor.
  • a classifier model (e.g., an SVM) is configured and trained to process the multiscale descriptors such that, for each training SI, the classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches corresponding to the training SI and determine a SI-level molecular subtype classification based on the patch-level molecular subtype classifications.
  • the patch-level molecular subtype classification and SI-level molecular subtype classification may be heterogeneous classifications comprising a plurality of molecular subtypes.
  • a molecular subtyping engine is configured to use the trained classifier model to determine a SI-level molecular subtype classification for a test SI at step 408.
  • FIG. 5 illustrates a flow diagram of example operations for
  • a subtype classification engine, e.g., subtype classification engine 320, is configured to use the trained classifier model to determine a SI-level molecular subtype classification for a test SI.
  • a test SI is obtained at step 502.
  • the test SI is segmented into a plurality of scaled patches, where each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within the test SI.
  • each scaled patch of the plurality of scaled patches is converted into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the set of patch representations to a patch-level descriptor and combining the patch-level descriptors.
  • combining the patch-level descriptors may comprise one or more of concatenating, averaging, stacking, or mathematically or empirically mixing or manipulating the patch-level descriptors to generate a multiscale descriptor.
  • a neural network may be used to determine or learn an optimal method of combining the patch-level descriptors to generate a multiscale descriptor.
  • the multiscale descriptors are processed using the trained classifier model, where the trained classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches and determine a SI-level molecular subtype classification based on the patch-level molecular subtype classifications.
  • an indication of a selected region of interest determined to be cancer-enriched within the test SI may be obtained, e.g., from a user via a user interface, or selected automatically based on, for example, one or more of biological criteria, an output of a heuristic machine learning or image processing algorithm, or an output of a deep-learning
  • the selected region of interest may be a centroid or closed curve
  • the plurality of scaled patches may comprise the selected region of interest.
  • Test results with respect to the various embodiments herein have been obtained based on 1,142 diagnostic (training) SIs from 793 breast cancer patients with associated PAM50 labels that were obtained from TCGA sources.
  • each training SI was 122,600 x 220,968 pixels at the 5x magnification level, resulting in 2,709,065 total analysis locations.
  • 1,985,745 locations remained.
  • Each location was down-sampled from the 20x zoom level to represent 20x, 10x, and 5x zoom levels centered on a same location, resulting in 5,957,235 400 x 400-pixel color patches. These two-dimensional color patches were converted to vectors of length 2048 using an
  • PCA Principal component analysis
  • a patch-level descriptor length of 768 was found to retain > 96% variance in each zoom level.
  • the total data set size was a matrix of 1,985,745 locations x 2304 features.
  • FIG. 6 illustrates a graphical representation of exemplary scaled patches of pathology slide images in accordance with an embodiment.
  • five leading clusters had mostly cancer-rich samples (>80% of patches were cancer-rich).
  • Cluster 3 602 was 100% cancer-rich and represented 5.51% of the patches
  • Cluster 5 604 was 91.67% cancer-rich and represented 4.52% of the patches
  • Cluster 11 606 was 87.50% cancer-rich and represented 4.03% of the patches
  • Cluster 16 608 was 87.50% cancer-rich and represented 4.31% of the patches
  • Cluster 2 610 was 82.61% cancer-rich and represented 5.21% of the patches.
  • Table 3 summarizes the accuracy of subtype classifications at the patch, SI, and patient level in held-out test samples in fivefold cross-validation of the training SI samples.
  • Table 4 shows performance in two validation sets: one unselected group of 223 patients, and a second group containing 104 patients with low-confidence RNAseq-based PAM50 classifications.
  • Confusion matrices between true labels (columns) and predicted labels (rows) at the patient level for patients unselected (left) and low-confidence (right) by RNAseq-based classification
  • FIG. 7 illustrates a graphical representation of subtyping cancer-enriched scaled patches of pathology slide images in accordance with an
  • patch-level subtype classification results on four SI examples are shown. Particularly, patch A 702 was determined to comprise 100% basal-like subtypes; patch B 704 was determined to have 2.53% basal-like, 68.35% HER2-enriched, and 29.11% luminal A subtypes; patch C 706 was determined to have 100% luminal A subtypes; and patch D 708 was determined to have 2.50% basal-like, 1.25% HER2-enriched, 8.75% luminal A and 87.50% luminal B subtypes.
  • FIG. 8 illustrates graphical representations 800 of independent evidence of heterogeneity in accordance with an embodiment.
  • representation A 802 seventy-six SIs with > 30% of patches classified as basal-like and > 30% of patches classified as luminal A were considered as possible heterogeneous (HET) samples. These HET samples were analyzed by comparing them to pure luminal A (PLA) and pure basal-like (PBL) samples. To define pure subtypes, thresholds that maximized agreement between patch-based classifications and RNAseq-based classifications were identified using Youden analysis.
  • a threshold of at least 63.7% of patches classifying as luminal A was found to maximize agreement with RNAseq-based luminal A classification, with a true-positive rate (TPR) of 0.80 and false positive rate (FPR) of 0.15.
  • RNAseq expression profiles were compared between pure and heterogeneous settings as defined by image-based classifications.
  • SSC Scatter Separability Criterion
  • progesterone receptor (PR/PGR)
  • human epidermal growth factor receptor 2 (HER2/ERBB2)
  • HET HR expression levels were significantly distinct from both pure subsets in all 3 receptors (p-values range from 3.4 x 10^-7 to 3.0 x 10^-3).
  • luminal A and basal-like subtypes have been shown to have significantly different prognoses, survival analysis was used to confirm that the HET subset has prognostic value, as illustrated in representation D 808.
  • heterogeneous samples similar analyses could be performed using the embodiments herein for other subtype combinations such as, e.g., HER2-enriched and luminal A, luminal A and luminal B, or even three-way subtype combinations.
  • Intra-tumor heterogeneity may play a role in reducing concordance with expression-based subtyping.
  • the embodiments herein summarize scaled patches into a patient-level classification by majority area, whereas expression profiles are summaries based on total transcript counts.
  • the classification framework presented herein has novel application as a method for detecting intratumor heterogeneity. Inspection of patients that were misclassified revealed patterns of discordant subtypes at the patch level. Further evidence that these tumors are in fact heterogeneous populations was found in hormone-receptor expression levels, transcriptomic profiles, and survival characteristics. Specifically, patients that were classified as luminal A subtype but had basal-like subclones have poorer survival compared to homogeneous luminal A patients. The ability to identify aggressive subclonal populations from diagnostic pathology images has significant prognostic implications. For example, the specific regions located by such methods could be further confirmed as molecularly distinct subclones by laser
  • a computer includes a processor for executing instructions and one or more memories for storing instructions and data.
  • a computer may also include, or be coupled to, one or more mass storage devices, such as one or more magnetic disks, internal hard disks and removable disks, magneto optical disks, optical disks, etc.
  • Client-server relationship 900 comprises client 910 in communication with server 920 via network 930 and illustrates one possible division of determining molecular subtype classifications based on pathology slide images between client 910 and server 920.
  • client 910, in accordance with the various embodiments described above, may obtain a test SI and send the test SI to server 920.
  • Server 920 may, in turn, receive the test SI from client 910; segment the test SI into a plurality of scaled patches, where each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within the test SI; convert each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors; determine a SI-level molecular subtype classification for the test SI using a classifier model trained to process the multiscale descriptors such that a patch-level molecular subtype classification is assigned to each of the plurality of scaled patches, and the SI-level molecular subtype classification is determined based on the patch-level molecular subtype classifications; and send the SI-level molecular subtype classification to client 910.
  • client-server relationship illustrated in FIG. 9 is only one of many client-server relationships that are possible for implementing the systems, apparatus, and methods described herein. As such, the client-server relationship illustrated in FIG. 9 should not, in any way, be construed as limiting.
  • client devices 910 can include cellular smartphones, kiosks, personal data assistants, tablets, robots, vehicles, web cameras, or other types of computing devices.
  • a computer program is a set of computer program instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result.
  • a computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • Apparatus 1000 comprises a processor 1010 operatively coupled to a persistent storage device 1020 and a main memory device 1030.
  • Processor 1010 controls the overall operation of apparatus 1000 by executing computer program instructions that define such operations.
  • the computer program instructions may be stored in persistent storage device 1020, or other computer-readable medium, and loaded into main memory device 1030 when execution of the computer program instructions is desired.
  • training engine 310 and subtype classification engine 320 may comprise one or more components of computer 1000.
  • Apparatus 1000 can be defined by the computer program instructions stored in main memory device 1030 and/or persistent storage device 1020 and controlled by processor 1010 executing the computer program instructions.
  • the computer program instructions can be implemented as computer executable code programmed by one skilled in the art to perform an algorithm defined by the method steps of FIGS. 4 and 5.
  • the processor 1010 executes an algorithm defined by the method steps of FIGS. 4 and 5.
  • Apparatus 1000 also includes one or more network interfaces 1080 for communicating with other devices via a network.
  • Apparatus 1000 may also include one or more input/output devices 1090 that enable user interaction with apparatus 1000 (e.g., display, keyboard, mouse, speakers, buttons, etc.).
  • Processor 1010 may include both general and special purpose microprocessors and may be the sole processor or one of multiple processors of apparatus 1000.
  • Processor 1010 may comprise one or more central processing units (CPUs), and one or more graphics processing units (GPUs), which, for example, may work separately from and/or multi-task with one or more CPUs to accelerate processing, e.g., for various image processing applications described herein.
  • Processor 1010, persistent storage device 1020, and/or main memory device 1030 may include, be supplemented by, or incorporated in, one or more application-specific integrated circuits (ASICs) and/or one or more field programmable gate arrays (FPGAs).
  • Persistent storage device 1020 and main memory device 1030 each comprise a tangible non-transitory computer readable storage medium.
  • Persistent storage device 1020, and main memory device 1030 may each include high-speed random access memory, such as dynamic random access memory (DRAM), static random access memory (SRAM), double data rate synchronous dynamic random access memory (DDR RAM), or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices such as internal hard disks and removable disks, magneto-optical disk storage devices, optical disk storage devices, flash memory devices, semiconductor memory devices, such as erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), digital versatile disc read-only memory (DVD-ROM) disks, or other non-volatile solid state storage devices.
  • Input/output devices 1090 may include peripherals, such as a printer, scanner, display screen, etc.
  • input/output devices 1090 may include a display device such as a cathode ray tube (CRT), plasma or liquid crystal display (LCD) monitor for displaying information (e.g., a DNA accessibility prediction result) to a user, a keyboard, and a pointing device such as a mouse or a trackball by which the user can provide input to apparatus 1000.
  • apparatus 1000 may utilize one or more neural networks or other deep-learning techniques to perform training engine 310 and subtype classification engine 320 or other systems or apparatuses discussed herein.
  • FIG. 10 is a high-level representation of some of the components of such a computer for illustrative purposes.
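As an illustrative, non-limiting sketch of the Youden analysis referred to above for defining pure subtypes, the following Python code scans candidate thresholds on the per-slide fraction of patches assigned to a subtype and keeps the threshold that maximizes Youden's J statistic, J = TPR - FPR. The function name, argument names, and the rule that a slide is called positive when its fraction is at or above the threshold are assumptions made for illustration; the source does not specify how the analysis was implemented. For reference, the reported TPR of 0.80 and FPR of 0.15 would correspond to J = 0.65.

```python
from typing import Sequence, Tuple


def youden_threshold(
    fractions: Sequence[float],
    is_positive: Sequence[bool],
) -> Tuple[float, float, float, float]:
    """Return (threshold, J, TPR, FPR) for the threshold maximizing J = TPR - FPR.

    fractions[i]   share of patches in slide i assigned to the subtype
    is_positive[i] True when the reference label (e.g., RNAseq-based) is
                   that subtype
    A slide is called positive when its fraction is >= threshold (assumed rule).
    """
    positives = sum(1 for y in is_positive if y)
    negatives = len(is_positive) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative slide")

    best = None
    # Every observed fraction is a candidate cut point.
    for threshold in sorted(set(fractions)):
        tp = sum(1 for f, y in zip(fractions, is_positive) if y and f >= threshold)
        fp = sum(1 for f, y in zip(fractions, is_positive) if not y and f >= threshold)
        tpr, fpr = tp / positives, fp / negatives
        j = tpr - fpr
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or j > best[1]:
            best = (threshold, j, tpr, fpr)
    return best
```

For example, with hypothetical fractions `[0.9, 0.8, 0.7, 0.4, 0.3, 0.2]` whose first three slides carry the reference subtype, the sketch selects a threshold of 0.7.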

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Radiology & Medical Imaging (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biotechnology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Veterinary Medicine (AREA)
  • Surgery (AREA)
  • Molecular Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Oncology (AREA)
  • Image Analysis (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

Techniques are provided for determining molecular subtype classifications based on pathology slide images (SIs). A plurality of training SIs is segmented into a plurality of scaled patches. Each scaled patch is converted into a multiscale descriptor using a deep-learning neural network by mapping each of one or more patch representations to a patch-level descriptor and combining the patch-level descriptors. A classifier model is configured and trained to process the multiscale descriptors such that, for each training SI, the classifier model is operable to assign a patch-level molecular subtype classification to each of the scaled patches corresponding to the training SI and determine a SI-level molecular subtype classification based on the patch-level molecular subtype classifications. A molecular subtyping engine is configured to use the trained classifier model to determine a SI-level molecular subtype classification for a test SI.

Description

DETECTING INTRATUMOR HETEROGENEITY OF MOLECULAR SUBTYPES IN PATHOLOGY SLIDE IMAGES USING DEEP-LEARNING
TECHNICAL FIELD
[0001] This disclosure relates generally to imaging for pathology applications, and more specifically to using pathology slide images for molecular subtyping techniques.
BACKGROUND
[0002] In general, various cancers can have distinct molecular subtypes that affect patient responses to therapy. For example, breast cancer is the most common noncutaneous cancer diagnosed in women, with over 266,120 new cases estimated in the United States in 2018. Several distinct breast cancer molecular subtypes, based on hormone receptor (HR) and human epidermal growth factor receptor 2 (HER2) status, have been identified. These molecular subtypes include: luminal A and luminal B (HR-positive/HER2-negative breast cancer), HER2-positive, basal-like (HR-negative/HER2-negative), and normal-like. HR and HER2 status are important in predicting prognosis and response to therapy as these vary among the subtypes. Immunohistochemistry (IHC) or in situ hybridization (ISH) assays are the standard methods used to distinguish subtypes. Recently, molecular signature assays such as MammaPrint, Oncotype DX, and Predictor Analysis of Microarray 50 (PAM50) have gained ground as supplementary prognostic indicators due to studies demonstrating more significant differential survival between identified subgroups when compared to standard clinicopathological factors. Particularly, PAM50 subtyping (as part of the NanoString Prosigna panel) is becoming more routine in early-stage breast cancers to determine the likelihood of responding to chemotherapy. However, these signature-based tests are not ubiquitously employed in part due to their high cost and extended processing times compared to IHC.
[0003] Unlike molecular signature assays, hematoxylin and eosin (H&E)-stained biopsy slides are routinely collected during pathological examination and are often digitally recorded as pathology slide images (SIs). In general, slide imaging (e.g., whole slide imaging) refers to the scanning of conventional glass slides to produce digital slides and is used by pathologists for diagnostic, educational and research purposes.
[0004] Machine learning approaches can extract knowledge from SIs beyond that of which a human is capable, as evidenced by the many computer-assisted diagnosis (CAD) software solutions created to augment pathological inspection workflows. It has been previously demonstrated that even genetic subtyping can be approximated using SIs as input to machine learning models. Deep learning methods are an emerging set of influential machine learning technologies well suited to these image-based classification tasks. Recent advances in both computational power and convolutional network architectures have greatly increased the applicability of these techniques for several new domains in biology including -omics analysis, biomedical signal processing and biomedical imaging.
[0005] Of interest to SI analysis is the use of scaled patch representations that allow concurrent use of high-zoom patches that capture cellular level information with lower-zoom patches that capture global interdependence of tissue structures. Scaled patch representations of SIs have been used to build highly accurate context-aware stacked convolutional neural networks (CNN) for distinguishing between invasive ductal carcinomas (IDC) and benign ductal carcinoma in situ (DCIS). Similarly, this same approach has been used to accurately detect whether biopsy samples from nearby lymph node tissue were positive for metastases.
[0006] While the use of scaled patch representations may increase performance in SI-based classification tasks, the computational complexity of training on all possible scaled patches from gigapixel SIs is substantial. As such, previous studies have employed strategies that limit the analyzed patches to a subset of the total image. For example, in a study of subtypes in breast cancer, a minimum filter on the blue-yellow channel at 20x magnification has been used to select patches rich in epithelial cells. Similarly, in a study of non-small cell lung cancer SIs, only the top ten cell-dense 1,000 x 1,000-pixel patches at 40x magnification were used. However, these strategies leveraged tissue-specific knowledge of cell morphology in their respective indications. Until now, generalizable methods for focusing on information-rich image patches have been seen only as an area of ongoing research.
SUMMARY
[0007] As described above, cancer subtypes such as those classified by the expression-based PAM50 assay are prognostic independent of standard clinicopathologic factors, yet the molecular testing required to elucidate these subtypes has not been routinely performed. Furthermore, intratumoral heterogeneity has been difficult to detect using less targeted approaches such as RNA sequencing.
[0008] However, systems, methods, and articles of manufacture for determining molecular subtype classifications and detecting intratumor heterogeneity of molecular subtypes using only pathology slide images (SIs) (e.g., of hematoxylin and eosin (H&E)-stained biopsy tissue sections) are described herein. Further, evidence is presented showing that these patch-level analyses can accurately identify heterogeneous tumors. Particularly, a classifier model may be trained using previously subtyped SIs and subsequently used to classify cancer-specific patches within a test SI into major molecular subtypes (e.g., basal-like, HER2-enriched, luminal A, luminal B, and normal-like). As such, advanced machine learning methods can approximate molecular tests using only routinely collected SIs, and thus may increase prognostic capabilities by detecting aggressive minority subclones.
[0009] In one embodiment, a plurality of training SIs, e.g., each corresponding to a patient, is obtained and segmented into a plurality of scaled patches. The plurality of training SIs may comprise hematoxylin and eosin (H&E)-stained whole slide images. Each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI. Each scaled patch of the plurality of scaled patches is converted into a multiscale descriptor using a deep-learning neural network such as one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning convolutional neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors. A logits layer of an Inception-v3 neural network may be configured to map each of the one or more patch representations to a patch-level descriptor. The patch-level descriptors may comprise multidimensional descriptive vectors. Principal component analysis (PCA) or another dimensionality reduction technique may be used to reduce dimensions of the multidimensional descriptive vectors. A classifier model is configured and trained to process the multiscale descriptors such that, for each training SI, the classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches corresponding to the training SI and determine a SI-level molecular subtype classification based on the patch-level molecular subtype classifications. The patch-level molecular subtype classification and SI-level molecular subtype classification may be heterogeneous classifications comprising a plurality of molecular subtypes. A molecular subtyping engine is configured to use the trained classifier model to determine a SI-level molecular subtype classification for a test SI.
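By way of a hedged illustration only, the step of combining patch-level descriptors into a multiscale descriptor might be sketched in Python as follows. The function name `combine_descriptors`, the `ZOOM_LEVELS` ordering, and the restriction to two combination modes (concatenating and averaging) are assumptions introduced for this example; the deep-learning mapping from image to descriptor and any dimensionality reduction are assumed to have been applied already. The sizes in the usage lines follow the reported test configuration, in which a reduced descriptor length of 768 per zoom level yields 2304 features across three zoom levels.

```python
from typing import Dict, List, Sequence

# Hypothetical zoom-level ordering; the keys are illustrative only.
ZOOM_LEVELS = ("20x", "10x", "5x")


def combine_descriptors(
    patch_descriptors: Dict[str, Sequence[float]],
    mode: str = "concatenate",
) -> List[float]:
    """Combine per-zoom patch-level descriptors into one multiscale descriptor."""
    vectors = [list(patch_descriptors[zoom]) for zoom in ZOOM_LEVELS]
    if mode == "concatenate":
        # Keeps the features of each zoom level side by side.
        return [value for vector in vectors for value in vector]
    if mode == "average":
        # Element-wise mean; only defined for equal-length descriptors.
        length = len(vectors[0])
        if any(len(vector) != length for vector in vectors):
            raise ValueError("averaging needs descriptors of equal length")
        return [sum(column) / len(vectors) for column in zip(*vectors)]
    raise ValueError(f"unknown mode: {mode}")


# Three reduced descriptors of length 768 concatenate to 2304 features.
example = {zoom: [0.0] * 768 for zoom in ZOOM_LEVELS}
multiscale = combine_descriptors(example)
```

Concatenation preserves which zoom level each feature came from, at the cost of a longer vector; averaging keeps the length fixed but mixes the zoom levels together.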
[0010] In some embodiments, each of the scaled patches may comprise relatively high-zoom level patches and relatively low-zoom level patches with respect to each other, such as one or more of 5x, 10x, 20x, and 40x zoom-level patch representations.
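The following Python sketch illustrates, under stated assumptions, how a scaled patch with 20x, 10x, and 5x representations centered on a same location might be derived from a 20x image: progressively larger crops are taken around the center and down-sampled to a common output size. The block-averaging resampler, the grayscale nested-list image, and the function names are assumptions made for brevity and are not taken from the source; the default size of 400 pixels follows the patch size reported in the test results.

```python
from typing import Dict, List

Image = List[List[float]]  # grayscale for brevity; real slide images are RGB


def crop_centered(image: Image, row: int, col: int, size: int) -> Image:
    """Return a size x size crop centered at (row, col)."""
    top, left = row - size // 2, col - size // 2
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        raise ValueError("crop falls outside the image")
    return [line[left:left + size] for line in image[top:top + size]]


def downsample(image: Image, factor: int) -> Image:
    """Block-average by an integer factor (an assumed resampling method)."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    area = factor * factor
    return [
        [
            sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ) / area
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def scaled_patch(image_20x: Image, row: int, col: int, size: int = 400) -> Dict[str, Image]:
    """Build 20x, 10x, and 5x representations centered on one location."""
    factors = {"20x": 1, "10x": 2, "5x": 4}
    return {
        zoom: downsample(crop_centered(image_20x, row, col, size * factor), factor)
        for zoom, factor in factors.items()
    }
```

Each representation has the same pixel dimensions, so the lower zoom levels cover a wider field of view around the same center while the highest zoom level retains cellular detail.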
[0011] In some embodiments, the SI-level molecular subtype classification may be determined based on majority area voting criteria or weighting criteria for the plurality of scaled patches. The weighting criteria may be based on at least one of cellular density and transcriptional activity.
[0012] In some embodiments, the SI-level molecular subtype classification may comprise at least one of a Prosigna Breast Cancer Prognostic Gene Signature Assay or PAM50 subtype classification, such as one of basal-like, HER2-enriched, luminal A, luminal B, and normal-like or a combination of different subtype classifications.
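A minimal Python sketch of the aggregation described above is given below, assuming that every scaled patch covers the same area so that majority area voting reduces to counting patch labels. The function name, the optional `weights` argument, and the returned per-subtype fractions are assumptions for illustration; how a weight such as cellular density would be measured is not specified here.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def slide_level_classification(
    patch_labels: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[str, Dict[str, float]]:
    """Aggregate patch-level subtype labels into an SI-level call.

    Without weights every patch counts equally (majority area voting under
    the equal-area assumption). Optional weights stand in for criteria such
    as cellular density.
    """
    if not patch_labels:
        raise ValueError("at least one patch label is required")
    if weights is None:
        weights = [1.0] * len(patch_labels)
    if len(weights) != len(patch_labels):
        raise ValueError("weights and labels must have the same length")

    totals = Counter()
    for label, weight in zip(patch_labels, weights):
        totals[label] += weight
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Per-subtype fractions expose mixed (heterogeneous) compositions.
    fractions = {label: total / grand_total for label, total in totals.items()}
    winner = max(fractions, key=fractions.get)
    return winner, fractions


labels = ["luminal A"] * 7 + ["basal-like"] * 3
call, composition = slide_level_classification(labels)
```

Returning the fractions alongside the winning label keeps the minority subtype visible, which is the information a heterogeneity analysis would need.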
[0013] In some embodiments, a subset of the plurality of scaled patches may be selected for further processing, for example, by clustering the plurality of scaled patches using unsupervised clustering such as k-means clustering or random selection to define cancer-enriched areas. The subset of the plurality of scaled patches may be selected to summarize tumor content within a training SI.
[0014] In some embodiments, the plurality of scaled patches may be filtered for a minimum color variance, and each scaled patch determined to be empty space or background may be eliminated from further processing based on the filtering.
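As a hedged example of the minimum color variance filtering described above, the Python sketch below scores each patch by the mean of its per-channel pixel variances and discards patches that fall below a threshold. The specific variance measure, the threshold value, and the function names are assumptions for illustration; the source does not state how color variance is computed or what minimum is used.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0-255


def color_variance(patch: Sequence[Pixel]) -> float:
    """Mean of the per-channel population variances over all pixels."""
    channels = list(zip(*patch))  # three tuples: all R, all G, all B
    return sum(pvariance(channel) for channel in channels) / len(channels)


def drop_background(
    patches: List[Sequence[Pixel]],
    min_variance: float,
) -> List[Sequence[Pixel]]:
    """Keep only patches whose color variance reaches the threshold."""
    return [patch for patch in patches if color_variance(patch) >= min_variance]


# A uniform (empty-glass) patch has zero variance and is removed.
blank = [(240, 240, 240)] * 16
tissue = [(200, 100, 150), (100, 50, 200)] * 8
kept = drop_background([blank, tissue], min_variance=50.0)  # hypothetical threshold
```

Nearly uniform patches, such as empty glass, score close to zero, whereas stained tissue produces a spread of colors and therefore a larger variance.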
[0015] In some embodiments, the classifier model may comprise one or more of a multiclass support vector machine (SVM) including a radial basis function (RBF) kernel, a naive Bayes classifier, a decision tree, a boosted tree, a random forest classifier, a neural network, a nearest neighbor classifier, a linear classifier, and a nonlinear classifier.
[0016] In some embodiments, a test SI may be obtained. The test SI may be segmented into a plurality of scaled patches, where each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within the test SI. Each scaled patch of the plurality of scaled patches may be converted into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors. The multiscale descriptors may be processed using the trained classifier model, where the trained classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches and determine a SI-level molecular subtype classification based on the patch-level molecular subtype classifications. An indication of a selected region of interest determined to be cancer-enriched within the test SI may be obtained, e.g., from a user via a user interface, or may be selected automatically based on one or more of biological criteria, an output of a heuristic machine learning or image processing algorithm, or an output of a deep-learning convolutional algorithm. The selected region of interest may be a centroid or closed curve, and the plurality of scaled patches may comprise the selected region of interest.
[0017] Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following specification, along with the accompanying drawings in which like numerals represent like components.
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0019] FIG. 1 illustrates a graphical representation of a pathology slide image analyzed in accordance with an embodiment.
[0020] FIG. 2 illustrates a block diagram of example operations for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment.
[0021] FIG. 3 illustrates a block diagram of a system for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment.
[0022] FIG. 4 illustrates a flow diagram of example operations for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment.
[0023] FIG. 5 illustrates a flow diagram of example operations for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment.
[0024] FIG. 6 illustrates a graphical representation of exemplary scaled patches of pathology slide images in accordance with an embodiment.
[0025] FIG. 7 illustrates a graphical representation of subtyping cancer-enriched scaled patches of pathology slide images in accordance with an embodiment.
[0026] FIG. 8 illustrates graphical representations of independent evidence of heterogeneity in accordance with an embodiment.
[0027] FIG. 9 illustrates a block diagram of an exemplary client-server relationship that can be used for implementing one or more aspects of the various embodiments; and
[0028] FIG. 10 illustrates a block diagram of a distributed computer system that can be used for implementing one or more aspects of the various embodiments.
[0029] While the invention is described with reference to the above drawings, the drawings are intended to be illustrative, and other embodiments are consistent with the spirit, and within the scope, of the invention.
DETAILED DESCRIPTION
[0030] The various embodiments now will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific examples of practicing the embodiments. This specification may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that this specification will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Among other things, this specification may be embodied as methods or devices.
Accordingly, any of the various embodiments herein may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. The following specification is, therefore, not to be taken in a limiting sense.
[0031] Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise:
[0032] The phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment, though it may. Thus, as described below, various embodiments of the invention may be readily combined, without departing from the scope or spirit of the invention.
[0033] As used herein, the term “or” is an inclusive “or” operator and is equivalent to the term “and/or,” unless the context clearly dictates otherwise.
[0034] The term “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise.
[0035] As used herein, and unless the context dictates otherwise, the term “coupled to” is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms “coupled to” and “coupled with” are used synonymously. Within the context of a networked environment where two or more components or devices are able to exchange data, the terms “coupled to” and “coupled with” are also used to mean “communicatively coupled with”, possibly via one or more intermediary devices.
[0036] In addition, throughout the specification, the meaning of “a”, “an”, and “the” includes plural references, and the meaning of “in” includes “in” and “on”.
[0037] Although some of the various embodiments presented herein constitute a single combination of inventive elements, it should be appreciated that the inventive subject matter is considered to include all possible combinations of the disclosed elements. As such, if one embodiment comprises elements A, B, and C, and another embodiment comprises elements B and D, then the inventive subject matter is also considered to include other remaining combinations of A, B, C, or D, even if not explicitly discussed herein. Further, the transitional term “comprising” means to have as parts or members, or to be those parts or members. As used herein, the transitional term “comprising” is inclusive or open-ended and does not exclude additional, unrecited elements or method steps.
[0038] Throughout the following discussion, numerous references will be made regarding servers, services, interfaces, engines, modules, clients, peers, portals, platforms, or other systems formed from computing devices. It should be appreciated that the use of such terms is deemed to represent one or more computing devices having at least one processor (e.g., ASIC, FPGA, DSP, x86, ARM, ColdFire, GPU, multi-core processors, etc.) configured to execute software instructions stored on a computer readable tangible, non-transitory medium (e.g., hard drive, solid state drive, RAM, flash, ROM, etc.). For example, a server can include one or more computers operating as a web server, database server, or other type of computer server in a manner to fulfill described roles, responsibilities, or functions. One should further appreciate the disclosed computer-based algorithms, processes, methods, or other types of instruction sets can be embodied as a computer program product comprising a non-transitory, tangible computer readable medium storing the instructions that cause a processor to execute the disclosed steps. The various servers, systems, databases, or interfaces can exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods. Data exchanges can be conducted over a packet-switched network, a circuit-switched network, the Internet, LAN, WAN, VPN, or other type of network.
[0039] As used in the description herein and throughout the claims that follow, when a system, engine, server, device, module, or other computing element is described as being configured to perform or execute functions on data in a memory, the meaning of “configured to” or “programmed to” is defined as one or more processors or cores of the computing element being programmed by a set of software instructions stored in the memory of the computing element to execute the set of functions on target data or data objects stored in the memory.
[0040] It should be noted that any language directed to a computer should be read to include any suitable combination of computing devices, including servers, interfaces, systems, databases, agents, peers, engines, controllers, modules, or other types of computing devices operating individually or collectively. One should appreciate the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, FPGA, PLA, solid state drive, RAM, flash, ROM, etc.). The software instructions configure or program the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclosed apparatus. Further, the disclosed technologies can be embodied as a computer program product that includes a non-transitory computer readable medium storing the software instructions that cause a processor to execute the disclosed steps associated with implementations of computer-based algorithms, processes, methods, or other instructions. In some embodiments, the various servers, systems, databases, or interfaces exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods. Data exchanges among devices can be conducted over a packet-switched network, the Internet, LAN, WAN, VPN, or other type of packet-switched network; a circuit-switched network; a cell-switched network; or other type of network.
[0041] The focus of the disclosed inventive subject matter is to enable construction or configuration of a computing device to operate on vast quantities of digital data, beyond the capabilities of a human for purposes including determining molecular subtype classifications and detecting intratumor heterogeneity within a digitally recorded pathology whole slide image.
[0042] One should appreciate that the disclosed techniques provide many advantageous technical effects including improving the scope, accuracy, compactness, efficiency, and speed of determining molecular subtype classifications and detecting intratumor heterogeneity using pathology whole slide images. It should also be appreciated that the following specification is not intended as an extensive overview, and as such, concepts may be simplified in the interests of clarity and brevity.
[0043] As described above, cancer subtypes, e.g., breast cancer subtypes, as classified by the expression-based PAM50 assay are prognostic independent of standard clinicopathologic factors, yet the molecular testing required to elucidate these subtypes has not been routinely performed. Furthermore, intratumoral heterogeneity has been difficult to detect using less targeted approaches such as RNA sequencing.
[0044] Systems, methods, and articles of manufacture for approximating PAM50 subtyping of any indication of cancer using only scaled patches in pathology slide images (SIs), e.g., of hematoxylin and eosin (H&E)-stained biopsy tissue sections, are described herein. A classifier model is trained to identify cancer-rich patches using subtyped SIs and subsequently used to classify cancer-specific patches within a test SI into major molecular subtypes (e.g., basal-like, HER2-enriched, luminal A, luminal B, and normal-like in the case of breast cancer). In the various embodiments, a relatively minimal number of such cancer-rich SI patches may be used to classify patients into molecularly defined subtypes (i.e., PAM50), which are typically undifferentiable using pathology-slide analysis. Further, test results show that patch-level analyses as described herein can accurately identify heterogeneous tumors.
[0045] One distinct advantage of the patch-based subtyping systems and methods presented herein is that the ability to directly observe intratumor heterogeneity is retained without resorting to numerical deconvolution methods. Thus, the techniques herein can be leveraged to identify cancer patients presenting at least two molecular subtypes within the same tissue section, and to support these cases as mixed populations using independent data, including overall survival data. As such, a machine learning method has been achieved that can approximate advanced testing for molecular subtypes by using only routinely collected diagnostic pathology SIs, and possibly increase prognostic capabilities by detecting aggressive minority subclones that may become dominant in a tumor over time. While others have previously used image-based measures of heterogeneity as prognostic biomarkers, the methods herein relate to prognostic intrinsic subtype heterogeneity identified in diagnostic SIs.
[0046] FIG. 1 illustrates a graphical representation of a pathology slide image analyzed in accordance with an embodiment of the present invention. A pathology slide image (SI) 100 may be generated when a pathologist wishes to look at a biopsy of a suspected cancer or make other medical diagnoses. Typically, a whole-slide pathology image such as SI 100 may include more than two million cells. Thus, a hematoxylin and eosin stain (“H&E stain” or “HE stain”) may be used for distinguishing the various structures in the whole-slide pathology image. As shown, hematoxylin is a dark blue or violet stain that binds to various tissue/cellular regions 102 (i.e., basophilic substances such as DNA and RNA), while eosin is a red or pink stain that binds to acidophilic substances including cytoplasmic filaments in muscle cells, intracellular membranes, and extracellular fibers such as, for example, plasma region 104. In an embodiment, a scaled patch 106 of SI 100 may be selected for medical diagnosis and study based on the various distinguished tissue areas. For example, one or more scaled patches of SI 100 may be selected to determine molecular subtype classifications and detect intratumor heterogeneity.
[0047] FIG. 2 illustrates a block diagram of a system for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment. In system 200, a fixed-size scaled patch-based approach allows analysis of regions as well as capturing micro- and macroscopic characteristics of an SI simultaneously. In an embodiment, SIs (e.g., breast invasive carcinoma (BRCA) diagnostic whole-slide images of formalin-fixed paraffin-embedded (FFPE) blocks with associated PAM50 labels obtained from TCGA data sources) may be segmented or tiled into 1600 x 1600-pixel patches 202 at the 20x zoom level. The 1600 x 1600-pixel patches 202 may be filtered for a minimum color variance to eliminate empty (i.e., background) patches from further processing. Further, each 1600 x 1600-pixel patch 202 may be converted into 400 x 400-pixel patch representations 204 at, for example, one or more of 5x, 10x, 20x, and 40x magnification scales centered on a same location or point by down-sampling and cropping to the center 400 x 400 pixels.
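The tiling, background filtering, and rescaling described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: it operates on a small grayscale grid (nested lists) rather than a color whole-slide image, uses block averaging as the down-sampling method, and every function name, the toy tile size, and the variance threshold are hypothetical. The mapping of factors 1, 2, and 4 to the 20x, 10x, and 5x views is likewise assumed, and the 40x view, which would require up-sampling, is omitted.

```python
from statistics import pvariance

def tile_origins(width, height, tile):
    """Top-left corners of the non-overlapping tile x tile patches that fit."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def crop(img, x, y, size):
    """Square crop of side `size` with top-left corner (x, y)."""
    return [row[x:x + size] for row in img[y:y + size]]

def is_background(patch, min_variance):
    """Treat a patch as empty when its pixel variance is below the threshold."""
    return pvariance([p for row in patch for p in row]) < min_variance

def downsample(patch, factor):
    """Shrink a square patch by averaging factor x factor pixel blocks."""
    n = len(patch) // factor
    return [[sum(patch[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(n)]
            for r in range(n)]

def scaled_representations(patch, out_size, factors=(1, 2, 4)):
    """Down-sample by each factor, then keep the central out_size x out_size pixels.

    With a 20x source patch, factors 1, 2, and 4 would correspond to the
    20x, 10x, and 5x views centered on the same point (assumed mapping).
    """
    reps = {}
    for f in factors:
        small = downsample(patch, f)
        off = (len(small) - out_size) // 2
        reps[f] = crop(small, off, off, out_size)
    return reps

# Toy stand-in for a slide: 32 x 16 pixels, 16-pixel tiles, 4-pixel outputs
# (the text uses 1600-pixel tiles and 400-pixel outputs).
slide = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(16)]
kept = [(x, y) for x, y in tile_origins(32, 16, 16)
        if not is_background(crop(slide, x, y, 16), min_variance=1.0)]
views = scaled_representations(crop(slide, *kept[0], 16), out_size=4)
```

Because every view is cropped around the same center, the three representations in `views` cover progressively wider fields at the same output size, which is what lets a single location carry both micro- and macroscopic context.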
[0048] In an embodiment, at least one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning convolutional neural network may be used to map each 400 x 400-pixel color patch 204 to patch-level descriptors (i.e., descriptive vectors) 208 at each zoom level. For example, a logits layer of an Inception-v3 image recognition neural network 206 may be used to map each color patch 204 to patch-level descriptors 208. In some embodiments, principal component analysis (PCA) or another dimensionality reduction technique may be used to reduce dimensions of the patch-level descriptors. For example, it has been shown that PCA may be employed on 400 x 400-pixel patch representations 206 to generate reduced dimensions patch representations 208 with > 96% variance being retained. The patch-level descriptors for the one or more zoom levels (e.g., one or more of 5x, 10x, 20x, and 40x magnification) of the reduced dimension patch-level descriptors 208 may be combined (e.g., concatenated) into a multiscale descriptor 210.
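As a rough sketch of the two bookkeeping steps in this paragraph, the Python below picks the smallest number of principal components whose cumulative variance share exceeds a target, and concatenates per-zoom descriptors into one multiscale vector. It assumes the PCA itself has already been fitted elsewhere and has reported its per-component variances; the function names, the example variance spectrum, and the fixed zoom ordering are hypothetical choices for illustration only.

```python
def components_for_variance(explained_variance, target=0.96):
    """Smallest number of leading components whose variance share exceeds target.

    `explained_variance` holds per-component variances sorted from largest
    to smallest, as a fitted PCA would report them.
    """
    total = sum(explained_variance)
    running = 0.0
    for k, var in enumerate(explained_variance, start=1):
        running += var
        if running / total > target:
            return k
    return len(explained_variance)

def multiscale_descriptor(descriptors_by_zoom, zoom_order=("5x", "10x", "20x")):
    """Concatenate the per-zoom descriptors in a fixed order."""
    combined = []
    for zoom in zoom_order:
        combined.extend(descriptors_by_zoom[zoom])
    return combined

# Hypothetical variance spectrum for one zoom level.
spectrum = [6.0, 2.5, 1.0, 0.3, 0.2]
k = components_for_variance(spectrum)

# Hypothetical reduced descriptors, one per zoom level.
reduced = {"5x": [0.1, 0.2], "10x": [0.3, 0.4], "20x": [0.5, 0.6]}
descriptor = multiscale_descriptor(reduced)
```

Keeping the zoom order fixed matters because a downstream classifier treats each position in the concatenated vector as a distinct feature.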
[0049] At 212, analyzed locations may be filtered to include only cancer-enriched locations (as opposed to extracellular matrix or adjacent normal tissue) to reduce computational complexity and ensure a hygienic input to train classifier model 214, which may be one or more of a multiclass support vector machine (SVM) including a radial basis function (RBF) kernel, a naive Bayes classifier, a decision tree, a boosted tree, a random forest classifier, a neural network, a nearest neighbor classifier, a linear classifier, and a nonlinear classifier. In an embodiment, a plurality of scaled patches 204 selected (e.g., randomly) for training may be grouped using, for example, unsupervised clustering such as k-means clustering, where the number of clusters may be determined empirically. Clusters of scaled patches with sufficient cellularity may be investigated further (e.g., by a pathologist) to identify clusters enriched for tumor content. For example, for each SI, patches that fall within the cancer-rich clusters may be used for further analysis.
[0050] As noted, classifier model 214 may comprise a multiclass support vector machine, a type of model generally known to exhibit superior performance on large data sets, and may be trained to determine patch-level molecular subtype classifications 216, e.g., for multiscale descriptor 210. These patch-level molecular subtype classifications 216 may then be used to infer an SI-level molecular subtype classification 218 and detect molecular subtype heterogeneity 220.
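The step from patch-level classifications 216 to a slide-level classification 218 and a heterogeneity call 220 can be sketched as follows. This is an illustrative sketch, not the claimed method: it assumes a simple majority vote over patches and borrows the 30% basal-like and 30% luminal A criterion that the test results section below uses to flag possible heterogeneous samples. The function names are hypothetical, and the patch counts in the example are an assumption, chosen only so that the fractions match the percentages reported for example B of FIG. 7.

```python
from collections import Counter

def subtype_fractions(patch_labels):
    """Fraction of patches assigned to each molecular subtype."""
    counts = Counter(patch_labels)
    total = len(patch_labels)
    return {subtype: n / total for subtype, n in counts.items()}

def slide_level_call(patch_labels,
                     het_pair=("basal-like", "luminal A"),
                     het_min=0.30):
    """Majority subtype plus a flag for a possible two-subtype mixture.

    A slide is flagged when both subtypes in `het_pair` individually
    account for more than `het_min` of the patches.
    """
    fractions = subtype_fractions(patch_labels)
    majority = max(fractions, key=fractions.get)
    heterogeneous = all(fractions.get(s, 0.0) > het_min for s in het_pair)
    return majority, heterogeneous, fractions

# Assumed counts (2, 54, 23 of 79 patches) reproducing 2.53%, 68.35%, 29.11%.
example_b = ["basal-like"] * 2 + ["HER2-enriched"] * 54 + ["luminal A"] * 23
majority_b, het_b, fractions_b = slide_level_call(example_b)

# A hypothetical mixed slide: 40% basal-like, 60% luminal A.
mixed = ["basal-like"] * 4 + ["luminal A"] * 6
majority_m, het_m, _ = slide_level_call(mixed)
```

Returning the full fraction table alongside the majority call keeps the minority subtypes visible, which is the information a heterogeneity analysis needs.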
[0051] FIG. 3 illustrates a block diagram of a system for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment. In block diagram 300, elements for determining molecular subtype classifications based on pathology slide images include training engine 310, subtype classification engine 320, persistent storage device 330, and main memory device 340. In an embodiment, training engine 310 may be configured to obtain training SIs 1 to N 302, 304, 306 from either one or both of persistent storage device 330 and main memory device 340. Training engine 310 may then configure and train classifier model 214 (e.g., an SVM), which may be stored in either one or both of persistent storage device 330 and main memory device 340, using the training SIs 1 to N 302, 304, 306 as training inputs. For example, training engine 310 may segment each of the training SIs 1 to N 302, 304, 306 into a plurality of scaled patches 204, where each scaled patch of the plurality of scaled patches 204 comprises one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI. Training engine 310 may then convert each scaled patch of the plurality of scaled patches 204 into a multiscale descriptor using a deep-learning neural network 206 (e.g., one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning convolutional neural network) by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor 208 and combining the patch-level descriptors to generate a multiscale descriptor 210. For example, the patch-level descriptors may be one or more of concatenated, averaged, stacked, or mathematically or empirically mixed or manipulated to generate a multiscale descriptor 210. Training engine 310 may configure and train classifier model 214 to process the multiscale descriptors 210 such that, for each training SI 1 to N 302, 304, 306, classifier model 214 is operable to assign a patch-level molecular subtype classification 216 to each of the plurality of scaled patches corresponding to a training SI, and determine an SI-level molecular subtype classification 218 or heterogeneous classification 220 based on the patch-level molecular subtype classifications 216.
[0052] Training engine 310 may configure subtype classification engine 320 to use trained classifier model 314 to determine an SI-level molecular subtype classification based on a test SI 312. For example, subtype classification engine 320 may obtain test SI 312; segment test SI 312 into a plurality of scaled patches 204, where each scaled patch of the plurality of scaled patches 204 comprises one or more patch representations at one or more zoom levels that are centered at a location within test SI 312; and convert each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network 206 by, for each scaled patch, mapping each of the set of patch representations to a patch-level descriptor 208 and combining (e.g., concatenating, averaging, stacking, mathematically or empirically mixing or manipulating, etc.) the patch-level descriptors into a multiscale descriptor 210. Subtype classification engine 320 may then process the multiscale descriptors 210 using trained classifier model 314, where trained classifier model 314 is operable to assign a patch-level molecular subtype classification 216 to each of the plurality of scaled patches and determine an SI-level molecular subtype classification 218 or heterogeneous classification 220 based on the patch-level molecular subtype classifications 216.
[0053] It should be noted that the elements in FIG. 3, and the various functions attributed to each of the elements, while exemplary, are described as such solely for the purposes of ease of understanding. One skilled in the art will appreciate that one or more of the functions ascribed to the various elements may be performed by any one of the other elements, and/or by an element (not shown) configured to perform a combination of the various functions. Therefore, it should be noted that any language directed to a training engine 310, a subtype classification engine 320, a persistent storage device 330 and a main memory device 340 should be read to include any suitable combination of computing devices, including servers, interfaces, systems, databases, agents, peers, engines, controllers, modules, or other types of computing devices operating individually or collectively to perform the functions ascribed to the various elements. Further, one skilled in the art will appreciate that one or more of the functions of the system of FIG. 3 described herein may be performed within the context of a client-server relationship, such as by one or more servers, one or more client devices (e.g., one or more user devices) and/or by a combination of one or more servers and client devices.
[0054] FIG. 4 illustrates a flow diagram of example operations for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment. In flow diagram 400, a plurality of training SIs 1 to N 302, 304, 306, e.g., each corresponding to a patient, is obtained and segmented into a plurality of scaled patches at step 402. For example, each scaled patch of the plurality of scaled patches may comprise one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI.
[0055] At step 404, each scaled patch of the plurality of scaled patches is converted into a multiscale descriptor using a deep-learning neural network such as at least one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning convolutional neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors. For example, a logits layer of an Inception-v3 neural network may be configured to map each of the one or more patch representations to a patch-level descriptor. The patch-level descriptors may comprise multidimensional descriptive vectors, and principal component analysis (PCA) or another dimensionality reduction technique may be used to reduce dimensions of the multidimensional descriptive vectors. Further, combining the patch-level descriptors may comprise one or more of concatenating, averaging, stacking, or mathematically or empirically mixing or manipulating the patch-level descriptors to generate a multiscale descriptor. In some embodiments, a neural network may be used to determine or learn an optimal method of combining the patch-level descriptors to generate a multiscale descriptor.
[0056] At step 406, a classifier model (e.g., an SVM) is configured and trained to process the multiscale descriptors such that, for each training SI, the classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches corresponding to the training SI and determine an SI-level molecular subtype classification based on the patch-level molecular subtype classifications. For example, the patch-level molecular subtype classification and SI-level molecular subtype classification may be heterogeneous classifications comprising a plurality of molecular subtypes. A molecular subtyping engine is configured to use the trained classifier model to determine an SI-level molecular subtype classification for a test SI at step 408.
[0057] FIG. 5 illustrates a flow diagram of example operations for determining molecular subtype classifications based on pathology slide images in accordance with an embodiment. In flow diagram 500, a subtype classification engine, e.g., subtype classification engine 320, is configured to use the trained classifier model to determine an SI-level molecular subtype classification for a test SI. For example, a test SI is obtained at step 502. At step 504, the test SI is segmented into a plurality of scaled patches, where each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within the test SI.
[0058] At step 506, each scaled patch of the plurality of scaled patches is converted into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the set of patch representations to a patch-level descriptor and combining the patch-level descriptors. For example, combining the patch-level descriptors may comprise one or more of concatenating, averaging, stacking, or mathematically or empirically mixing or manipulating the patch-level descriptors to generate a multiscale descriptor. In some embodiments, a neural network may be used to determine or learn an optimal method of combining the patch-level descriptors to generate a multiscale descriptor.
[0059] At step 508, the multiscale descriptors are processed using the trained classifier model, where the trained classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches and determine an SI-level molecular subtype classification based on the patch-level molecular subtype classifications. In some embodiments, an indication of a selected region of interest determined to be cancer-enriched within the test SI may be obtained, e.g., from a user via a user interface, or selected automatically based on, for example, one or more of biological criteria, an output of a heuristic machine learning or image processing algorithm, or an output of a deep-learning convolutional algorithm. For example, the selected region of interest may be a centroid or closed curve, and the plurality of scaled patches may comprise the selected region of interest.
[0060] TEST RESULTS
[0061] Test results with respect to the various embodiments herein have been obtained based on 1,142 diagnostic (training) SIs from 793 breast cancer patients with associated PAM50 labels that were obtained from TCGA sources. On average, each training SI was 122,600 x 220,968 pixels at the 5x magnification level, resulting in 2,709,065 total analysis locations. After applying color filtering to remove non-tissue areas, 1,985,745 locations remained. Each location was down-sampled from the 20x zoom level to represent 20x, 10x, and 5x zoom levels centered on a same location, resulting in 5,957,235 400 x 400-pixel color patches. These two-dimensional color patches were converted to vectors of length 2048 using an Inception-v3 logits layer. Principal component analysis (PCA) was applied to 5x, 10x, and 20x patch-level descriptors (i.e., multidimensional vectors) independently and various levels of dimensionality reduction were explored, as shown in Table 1 below.
Table 1
[0062] A patch-level descriptor length of 768 was found to retain > 96% variance in each zoom level. After converting the training SIs to multiscale patch representations, the total data set size was a matrix of 1,985,745 locations x 2304 features.
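The counts quoted in paragraphs [0061] and [0062] are mutually consistent, as the short Python check below shows. The variable names are illustrative; the numbers are taken directly from the text.

```python
total_locations = 2_709_065            # analysis locations before color filtering
locations_after_filtering = 1_985_745  # locations remaining after filtering
zoom_levels = 3                        # 5x, 10x, and 20x
descriptor_length = 768                # per-zoom descriptor length after PCA

color_patches = locations_after_filtering * zoom_levels
features = zoom_levels * descriptor_length
kept_fraction = locations_after_filtering / total_locations

print(color_patches)            # 5957235 color patches, as stated in [0061]
print(features)                 # 2304 features per location, as stated in [0062]
print(round(kept_fraction, 3))  # 0.733, i.e. about 73% of locations kept
```

The 2304-feature width follows from concatenating three 768-length descriptors, one per zoom level.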
[0063] Cancer-enrichment
[0064] In total, 238,728 multiscale patch representations were randomly selected for defining cancer-enriched centroids. Twenty-four clusters were identified using k-means clustering. Fourteen of the 24 clusters were sufficiently populated with cellular structures for further analysis. A pathologist annotated 24 patches from each cluster (336 total) to determine whether the patch contained tumor tissue (see Table 2 below).
Table 2 - Breast cancer k-means clusters
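The cancer-enrichment procedure described above (cluster the descriptors, annotate a sample of patches per cluster, keep patches in tumor-rich clusters) can be sketched with a small pure-Python k-means. This is a toy illustration, not the pipeline used for the reported results: the two-dimensional points stand in for 2304-feature descriptors, the annotations are invented, and the 80% cutoff is an assumption motivated by the observation that the five leading clusters were more than 80% cancer-rich. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vec(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with seeded random initialisation."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean_vec(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def tumor_fraction(annotations):
    """Share of annotated patches marked as tumor, per cluster."""
    return {c: sum(flags) / len(flags) for c, flags in annotations.items()}

# Toy descriptors forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, _ = kmeans(points, k=2)

# Invented annotations: 24 patches per cluster, True meaning tumor tissue.
annotations = {labels[0]: [True] * 22 + [False] * 2,
               labels[3]: [True] * 5 + [False] * 19}
cancer_rich = {c for c, f in tumor_fraction(annotations).items() if f > 0.80}
selected = [i for i, lab in enumerate(labels) if lab in cancer_rich]
```

Only the patches whose cluster passes the tumor-fraction cutoff (here, the indices in `selected`) would proceed to subtype classification.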
[0065] FIG. 6 illustrates a graphical representation of exemplary scaled patches of pathology slide images in accordance with an embodiment. In representation 600, five leading clusters had mostly cancer-rich samples (>80% of patches were cancer-rich). Particularly, Cluster 3 602 was 100% cancer-rich and represented 5.51% of the patches; Cluster 5 604 was 91.67% cancer-rich and represented 4.52% of the patches; Cluster 11 606 was 87.50% cancer-rich and represented 4.03% of the patches; Cluster 16 608 was 87.50% cancer-rich and represented 4.31% of the patches; and Cluster 2 610 was 82.61% cancer-rich and represented 5.21% of the patches.
[0066] PAM50 classification
[0067] Table 3 below summarizes the accuracy of subtype classifications at the patch, SI, and patient level in held-out test samples in fivefold cross-validation of the training SI samples.
Table 3 - Subtyping accuracy across folds: sample size and performance statistics within the held-out test set across five folds of cross-validation
[0068] On average, SIs from 354 patients were used to train and 88 SIs from patients were used to test accuracy. Within the held-out test patients, individual patches were classified less accurately than when aggregated into a single SI-level classification (58.6% vs 66.1% correct). Where multiple diagnostic SIs are available for a given patient, aggregating across slides may also increase accuracy (66.1% vs 67.3% correct).
[0069] Table 4 below shows performance in two validation sets: a first, unselected group of 223 patients, and a second group containing 104 patients with low-confidence RNAseq-based PAM50 classifications.
Table 4 - Subtyping confusion and accuracy in two test settings: confusion matrices between true labels (columns) and predicted labels (rows) at the patient level for patients unselected (left) and low-confidence (right) by RNAseq-based classification
[0070] Within the group of unselected patients, classification performance was similar to the cross-validated setting (65.9% vs 66.1% correct). The largest sources of confusion were misclassifying luminal A patients as luminal B, and misclassifying basal-like patients into other categories. Within the low-confidence patients, overall patient accuracy was much lower (56.7% correct), potentially due to this population being enriched for heterogeneity.
[0071] FIG. 7 illustrates a graphical representation of subtyping cancer-enriched scaled patches of pathology slide images in accordance with an embodiment. In representation 700, patch-level subtype classification results on four SI examples are shown. Particularly, patch A 702 was determined to comprise 100% basal-like subtypes; patch B 704 was determined to have 2.53% basal-like, 68.35% HER2-enriched, and 29.11% luminal A subtypes; patch C 706 was determined to have 100% luminal A subtypes; and patch D 708 was determined to have 2.50% basal-like, 1.25% HER2-enriched, 8.75% luminal A and 87.50% luminal B subtypes.
[0072] Detecting heterogeneity
[0073] FIG. 8 illustrates graphical representations 800 of independent evidence of heterogeneity in accordance with an embodiment. In representation A 802, seventy-six SIs with > 30% of patches classified as basal-like and > 30% of patches classified as luminal A were considered as possible heterogeneous (HET) samples. These HET samples were analyzed by comparing them to pure luminal A (PLA) and pure basal-like (PBL) samples. To define pure subtypes, thresholds that maximized agreement between patch-based classifications and RNAseq-based classifications were identified using Youden analysis.
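Youden analysis selects the cutoff that maximizes J = TPR - FPR. A minimal Python sketch of that selection is given below, assuming each slide is summarized by the fraction of its patches assigned to one subtype and by a reference label (for example, the RNAseq-based call). The function name and the eight example slides are hypothetical and do not reproduce the reported data.

```python
def youden_threshold(fractions, truth):
    """Cutoff on the per-slide patch fraction that maximizes J = TPR - FPR.

    fractions: share of patches assigned to the subtype, one value per slide.
    truth:     True where the reference label is that subtype.
    Assumes at least one positive and one negative slide.
    """
    positives = sum(truth)
    negatives = len(truth) - positives
    best = None
    for t in sorted(set(fractions)):
        tp = sum(1 for f, y in zip(fractions, truth) if f >= t and y)
        fp = sum(1 for f, y in zip(fractions, truth) if f >= t and not y)
        tpr, fpr = tp / positives, fp / negatives
        j = tpr - fpr
        if best is None or j > best["j"]:
            best = {"j": j, "threshold": t, "tpr": tpr, "fpr": fpr}
    return best

# Hypothetical slides: fraction of patches called luminal A, and reference label.
fractions = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]
truth = [True, True, True, False, True, True, False, False]
best = youden_threshold(fractions, truth)
```

A slide whose fraction meets or exceeds `best["threshold"]` would be treated as pure for that subtype; the reported true-positive and false-positive rates correspond to `best["tpr"]` and `best["fpr"]` at the chosen cutoff.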
[0074] A threshold of at least 63.7% of patches classifying as luminal A was found to maximize agreement with RNAseq-based luminal A classification, with a true-positive rate (TPR) of 0.80 and false-positive rate (FPR) of 0.15. At this threshold 204 SIs were classified as PLA. Similarly, a threshold of at least 40.5% of patches classifying as basal-like maximized agreement with RNAseq-based basal-like classification, with TPR of 0.81 and FPR of 0.14. This resulted in assigning 81 SIs as PBL. Twenty-two patients had neither sufficient mixture nor sufficient purity in either subtype to be classified as HET, PBL, or PLA, and were not included in further analyses.
[0075] Evidence of heterogeneity
[0076] Overall RNAseq expression profiles were compared between pure and heterogeneous settings as defined by image-based classifications. Batch analysis on the PAM50 gene set showed low separation using the Scatter Separability Criterion (SSC) between PBL vs HET (SSC = 0.34) and PLA vs HET (SSC = 0.507) while retaining the expected significant separation between PBL vs PLA subsets (SSC = 0.987), confirming that the HET expression profile is intermediate between the PBL and PLA subtypes, as illustrated in representation B 804. Furthermore, specific HR expression for the HET subset was intermediate between PLA and PBL for the three key breast-related receptors: estrogen receptor alpha (ERα/ESR1), progesterone receptor (PR/PGR), and human epidermal growth factor receptor 2 (HER2/ERBB2), as illustrated in representation C 806. For example, mean ESR1 expression was 1.9-fold higher in PLA vs HET (p = 3.4 x 10^-7), yet mean HET ESR1 expression was 3.2-fold higher than that of PBL (p = 2.7 x 10^-5). Indeed, HET HR expression levels were significantly distinct from both pure subsets in all three receptors (p-values range from 3.4 x 10^-7 to 3.0 x 10^-3). As luminal A and basal-like subtypes have been shown to have significantly different prognoses, survival analysis was used to confirm that the HET subset has prognostic value, as illustrated in representation D 808. In Kaplan-Meier analysis, patients identified as HET were intermediate in survival characteristics between the extended overall survival (OS) of luminal A patients and the diminished OS of basal-like patients: log-rank tests for differential survival were significant between PBL and PLA patients (p = 0.027), yet neither the HET vs PBL nor the HET vs PLA comparison was significant (p = 0.297 and p = 0.411, respectively). It should be noted that while the analysis in this test case was limited to basal-like and luminal A heterogeneous samples, similar analyses could be performed using the embodiments herein for other subtype combinations such as, e.g., HER2-enriched and luminal A, luminal A and luminal B, or even three-way subtype combinations.
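The patient-level thresholds reported in paragraph [0074] (63.7% for luminal A, 40.5% for basal-like) were chosen to maximize agreement with RNAseq-based labels. The following is a minimal sketch of one way such a threshold could be selected. It assumes that "agreement" is scored as TPR minus FPR (Youden's J statistic), which the specification does not state, and it uses hypothetical toy data; the names `rates`, `best_threshold`, `luminal_a_fraction`, and `rnaseq_is_luminal_a` are illustrative only.

```python
def rates(fractions, labels, threshold):
    """Return (TPR, FPR) when a slide is called positive if its
    fraction of subtype-classified patches is >= threshold."""
    tp = sum(1 for f, y in zip(fractions, labels) if f >= threshold and y)
    fp = sum(1 for f, y in zip(fractions, labels) if f >= threshold and not y)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr


def best_threshold(fractions, labels):
    """Scan every observed fraction as a candidate threshold and keep the
    one that maximizes TPR - FPR (an assumed agreement score)."""
    best = None
    for t in sorted(set(fractions)):
        tpr, fpr = rates(fractions, labels, t)
        score = tpr - fpr
        if best is None or score > best[1]:
            best = (t, score, tpr, fpr)
    return best


# Hypothetical toy data: per-slide fraction of patches called luminal A,
# and whether the RNAseq-based subtype for that patient is luminal A.
luminal_a_fraction = [0.95, 0.80, 0.70, 0.65, 0.60, 0.40, 0.30, 0.10]
rnaseq_is_luminal_a = [True, True, True, True, False, True, False, False]

threshold, score, tpr, fpr = best_threshold(luminal_a_fraction, rnaseq_is_luminal_a)
print(threshold, tpr, fpr)
```

On real data the same scan would be run once per subtype, giving one threshold per pure category.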
[0077] As such, presented herein are systems and methods for determining molecular subtype classifications based on pathology slide images. Traditionally, such classification has been accomplished using gene expression signatures; however, the embodiments herein have been shown to achieve an overall concordance with RNAseq-based classification of 65.92% on 223 test patients. Intra-tumor heterogeneity, common in breast tumors (especially TNBC), may play a role in reducing concordance with expression-based subtyping. The embodiments herein summarize scaled patches into a patient-level classification by majority area, whereas expression profiles are summaries based on total transcript counts. Concordance with expression-based subtyping may be improved in the future by increasing the weight given to cell-dense or transcriptionally overactive patches. Yet because of this sensitivity to subclonal diversity, the classification framework presented herein has novel application as a method for detecting intratumor heterogeneity. Inspection of patients that were misclassified revealed patterns of discordant subtypes at the patch level. Further evidence that these tumors are in fact heterogeneous populations was found in hormone-receptor expression levels, transcriptomic profiles, and survival characteristics. Specifically, patients that were classified as luminal A subtype but had basal-like subclones have poorer survival compared to homogeneous luminal A patients. The ability to identify aggressive subclonal populations from diagnostic pathology images has significant prognostic implications. For example, the specific regions located by such methods could be further confirmed as molecularly distinct subclones by laser microdissection and separate characterization.
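The contrast drawn above between majority-area summarization and a weighted alternative can be illustrated with a short sketch. It assumes all patches cover equal area, so that majority area reduces to a weighted count of patch-level calls, and it uses hypothetical per-patch cell-density weights; `slide_level_call` and the sample values are illustrative and are not taken from the specification.

```python
from collections import defaultdict


def slide_level_call(patch_labels, weights=None):
    """Summarize patch-level subtype calls into one slide-level call.

    With weights=None every patch counts equally (majority area for
    equal-sized patches). Otherwise each patch contributes its weight,
    e.g. a hypothetical cell-density estimate.
    """
    if not patch_labels:
        raise ValueError("at least one patch is required")
    if weights is None:
        weights = [1.0] * len(patch_labels)
    if len(weights) != len(patch_labels):
        raise ValueError("one weight per patch is required")

    totals = defaultdict(float)
    for label, weight in zip(patch_labels, weights):
        totals[label] += weight

    grand_total = sum(totals.values())
    fractions = {label: value / grand_total for label, value in totals.items()}
    winner = max(fractions, key=fractions.get)
    return winner, fractions


# Hypothetical slide: 6 luminal A patches and 4 basal-like patches.
patches = ["luminal A"] * 6 + ["basal-like"] * 4
call, frac = slide_level_call(patches)

# Hypothetical weights: the basal-like patches are twice as cell-dense.
density = [1.0] * 6 + [2.0] * 4
weighted_call, weighted_frac = slide_level_call(patches, density)
print(call, weighted_call)
```

In this toy case the unweighted call is luminal A while the density-weighted call is basal-like, which is the kind of discordance the paragraph attributes to subclonal diversity. The per-subtype fractions returned alongside the call are also what a heterogeneity threshold would be applied to.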
[0078] Systems, apparatus, and methods described herein may be implemented using digital circuitry, or using one or more computers using well-known computer processors, memory units, storage devices, computer software, and other components. Typically, a computer includes a processor for executing instructions and one or more memories for storing instructions and data. A computer may also include, or be coupled to, one or more mass storage devices, such as one or more magnetic disks, internal hard disks and removable disks, magneto-optical disks, optical disks, etc.
[0079] Systems, apparatus, and methods described herein may be implemented using computers operating in a client-server relationship. Typically, in such a system, the client computers are located remotely from the server computers and interact via a network. The client-server relationship may be defined and controlled by computer programs running on the respective client and server computers.
[0080] A high-level block diagram of an exemplary client-server relationship that may be used to implement systems, apparatus and methods described herein is illustrated in FIG. 9. Client-server relationship 900 comprises client 910 in communication with server 920 via network 930 and illustrates one possible division of determining molecular subtype classifications based on pathology slide images between client 910 and server 920. For example, client 910, in accordance with the various embodiments described above, may obtain a test SI and send the test SI to server 920. Server 920 may, in turn, receive the test SI from client 910; segment the test SI into a plurality of scaled patches, where each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within the test SI; convert each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors; determine an SI-level molecular subtype classification for the test SI using a classifier model trained to process the multiscale descriptors such that a patch-level molecular subtype classification is assigned to each of the plurality of scaled patches, and the SI-level molecular subtype classification is determined based on the patch-level molecular subtype classifications; and send the SI-level molecular subtype classification to client 910.
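The server-side conversion of a scaled patch into a multiscale descriptor can be sketched as follows. This is an illustration under stated assumptions, not the described implementation: `embed` is a hypothetical stand-in for the deep-learning neural network (it returns a two-value summary rather than a learned feature vector), combining is done by concatenation, which is only one of the combining options the claims list, and the zoom-level keys follow the 5x, 10x, 20x, and 40x levels named in the claims.

```python
def embed(patch_pixels):
    """Hypothetical stand-in for a deep-learning neural network.

    A real system would return a learned feature vector for the patch
    representation; here a two-value summary (mean, range) is returned
    so the example stays self-contained.
    """
    mean = sum(patch_pixels) / len(patch_pixels)
    spread = max(patch_pixels) - min(patch_pixels)
    return [mean, spread]


def multiscale_descriptor(scaled_patch, zoom_order=("5x", "10x", "20x", "40x")):
    """Map each available zoom-level representation to a patch-level
    descriptor and concatenate them in a fixed zoom order."""
    descriptor = []
    for zoom in zoom_order:
        if zoom in scaled_patch:
            descriptor.extend(embed(scaled_patch[zoom]))
    return descriptor


# Hypothetical scaled patch: two representations centered at the same
# slide location, given as flat lists of pixel intensities.
scaled_patch = {"5x": [10, 20, 30], "20x": [5, 5, 8]}
vector = multiscale_descriptor(scaled_patch)
print(vector)
```

Keeping the zoom order fixed means that every scaled patch with the same set of zoom levels yields a descriptor with the same layout, which a downstream classifier model would require.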
[0081] One skilled in the art will appreciate that the exemplary client-server relationship illustrated in FIG. 9 is only one of many client-server relationships that are possible for implementing the systems, apparatus, and methods described herein. As such, the client-server relationship illustrated in FIG. 9 should not, in any way, be construed as limiting. Examples of client devices 910 can include cellular smartphones, kiosks, personal data assistants, tablets, robots, vehicles, web cameras, or other types of computing devices.
[0082] Systems, apparatus, and methods described herein may be implemented using a computer program product tangibly embodied in an information carrier, e.g., in a non-transitory machine-readable storage device, for execution by a programmable processor; and the method steps described herein, including one or more of the steps of FIGS. 4 and 5, may be implemented using one or more computer programs that are executable by such a processor. A computer program is a set of computer program instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
[0083] A high-level block diagram of an exemplary apparatus that may be used to implement systems, apparatus and methods described herein is illustrated in FIG. 10. Apparatus 1000 comprises a processor 1010 operatively coupled to a persistent storage device 1020 and a main memory device 1030. Processor 1010 controls the overall operation of apparatus 1000 by executing computer program instructions that define such operations. The computer program instructions may be stored in persistent storage device 1020, or other computer-readable medium, and loaded into main memory device 1030 when execution of the computer program instructions is desired. For example, training engine 310 and subtype classification engine 320 may comprise one or more components of apparatus 1000. Thus, the method steps of FIGS. 4 and 5 can be defined by the computer program instructions stored in main memory device 1030 and/or persistent storage device 1020 and controlled by processor 1010 executing the computer program instructions. For example, the computer program instructions can be implemented as computer executable code programmed by one skilled in the art to perform an algorithm defined by the method steps of FIGS. 4 and 5. Accordingly, by executing the computer program instructions, the processor 1010 executes an algorithm defined by the method steps of FIGS. 4 and 5. Apparatus 1000 also includes one or more network interfaces 1080 for communicating with other devices via a network. Apparatus 1000 may also include one or more input/output devices 1090 that enable user interaction with apparatus 1000 (e.g., display, keyboard, mouse, speakers, buttons, etc.).
[0084] Processor 1010 may include both general and special purpose microprocessors and may be the sole processor or one of multiple processors of apparatus 1000. Processor 1010 may comprise one or more central processing units (CPUs), and one or more graphics processing units (GPUs), which, for example, may work separately from and/or multi-task with one or more CPUs to accelerate processing, e.g., for various image processing applications described herein. Processor 1010, persistent storage device 1020, and/or main memory device 1030 may include, be supplemented by, or incorporated in, one or more application-specific integrated circuits (ASICs) and/or one or more field programmable gate arrays (FPGAs).
[0085] Persistent storage device 1020 and main memory device 1030 each comprise a tangible non-transitory computer readable storage medium. Persistent storage device 1020, and main memory device 1030, may each include high-speed random access memory, such as dynamic random access memory (DRAM), static random access memory (SRAM), double data rate synchronous dynamic random access memory (DDR RAM), or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices such as internal hard disks and removable disks, magneto-optical disk storage devices, optical disk storage devices, flash memory devices, semiconductor memory devices, such as erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), digital versatile disc read-only memory (DVD-ROM) disks, or other non-volatile solid state storage devices.
[0086] Input/output devices 1090 may include peripherals, such as a printer, scanner, display screen, etc. For example, input/output devices 1090 may include a display device such as a cathode ray tube (CRT), plasma or liquid crystal display (LCD) monitor for displaying information (e.g., a DNA accessibility prediction result) to a user, a keyboard, and a pointing device such as a mouse or a trackball by which the user can provide input to apparatus 1000.
[0087] Any or all of the systems and apparatuses discussed herein, including training engine 310 and subtype classification engine 320, may be performed by, and/or incorporated in, an apparatus such as apparatus 1000. Further, apparatus 1000 may utilize one or more neural networks or other deep-learning techniques to perform training engine 310 and subtype classification engine 320 or other systems or apparatuses discussed herein.
[0088] One skilled in the art will recognize that an implementation of an actual computer or computer system may have other structures and may contain other components as well, and that FIG. 10 is a high-level representation of some of the components of such a computer for illustrative purposes.
[0089] The foregoing specification is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the specification, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.

Claims

We claim:
1. A computerized method of determining molecular subtype classifications based on pathology slide images (SIs), comprising:
obtaining a plurality of training SIs;
segmenting each of the training SIs into a plurality of scaled patches, wherein each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI;
converting each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors;
configuring a classifier model to process the multiscale descriptors such that, for each training SI, the classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches corresponding to the training SI, and determine an SI-level molecular subtype classification based on the patch-level molecular subtype classifications;
training the classifier model using the multiscale descriptors; and
configuring a molecular subtyping engine to use the trained classifier model to determine an SI-level molecular subtype classification for a test SI.
2. The method of claim 1, wherein each of the plurality of training SIs corresponds to a patient.
3. The method of claim 1, wherein each of the scaled patches comprises relatively high-zoom level patches and relatively low-zoom level patches with respect to each other.
4. The method of claim 3, wherein each of the scaled patches comprises one or more of 5x, 10x, 20x, and 40x zoom-level patch representations.
5. The method of claim 1, wherein the patch-level descriptors comprise multidimensional descriptive vectors.
6. The method of claim 1, further comprising using principal component analysis (PCA) or another dimensionality reduction technique to reduce dimensions of the multidimensional descriptive vectors.
7. The method of claim 1, wherein the SI-level molecular subtype classification is determined based on majority area voting criteria.
8. The method of claim 1, wherein the SI-level molecular subtype classification is determined based on weighting criteria for the plurality of scaled patches.
9. The method of claim 8, wherein the weighting criteria is based on at least one of cellular density and transcriptional activity.
10. The method of claim 1, wherein the patch-level molecular subtype classification is a heterogenous classification comprising a plurality of molecular subtypes.
11. The method of claim 1, wherein the SI-level molecular subtype classification is a heterogenous classification comprising a plurality of molecular subtypes.
12. The method of claim 1, wherein the SI-level molecular subtype classification comprises at least one of a Prosigna Breast Cancer Prognostic Gene Signature Assay or PAM50 subtype classification.
13. The method of claim 12, wherein the SI-level molecular subtype classification is one of basal-like, HER2-enriched, luminal A, luminal B, and normal-like.
14. The method of claim 13, wherein the SI-level molecular subtype classification comprises a combination of different subtype classifications.
15. The method of claim 1, further comprising selecting a subset of the plurality of scaled patches for further processing.
16. The method of claim 15, wherein selecting the subset of the plurality of scaled patches comprises clustering the plurality of scaled patches using k-means clustering or another unsupervised clustering technique.
17. The method of claim 15, wherein the subset of the plurality of scaled patches is randomly selected to define cancer-enriched areas.
18. The method of claim 15, wherein the subset of the plurality of scaled patches is selected to summarize tumor content within a training SI.
19. The method of claim 1, further comprising:
filtering the plurality of scaled patches for a minimum color variance; and
eliminating each scaled patch determined to be empty space or background from further processing based on the filtering.
20. The method of claim 1, wherein the deep-learning neural network comprises at least one of an Inception-v3, resnet34, resnet152, densenet169, densenet201 or other deep-learning neural network.
21. The method of claim 20, wherein a logits layer of an Inception-v3 neural network is configured to map each of the one or more patch representations to a patch-level descriptor.
22. The method of claim 1, wherein combining the patch-level descriptors comprises one or more of concatenating, averaging, stacking, or mathematically or empirically mixing or manipulating the patch-level descriptors to generate the multiscale descriptor.
23. The method of claim 22, further comprising using a neural network to determine or optimize a method of combining the patch-level descriptors to generate the multiscale descriptor.
24. The method of claim 1, wherein the classifier model may comprise one or more of a multiclass support vector machine (SVM) including a radial basis function (RBF) kernel, a naive Bayes classifier, a decision tree, a boosted tree, a random forest classifier, a neural network, a nearest neighbor classifier, a linear classifier, and a nonlinear classifier.
25. The method of claim 1, wherein the plurality of training SIs comprises at least 1000 pathology slide images.
26. The method of claim 1, wherein the plurality of scaled patches comprises at least 200,000 patch representations.
27. The method of claim 1, wherein the plurality of training SIs comprises hematoxylin and eosin (H&E)-stained whole slide images.
28. The method of claim 1, further comprising:
obtaining the test SI;
segmenting the test SI into a plurality of scaled patches, wherein each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within the test SI;
converting each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors; and
processing the multiscale descriptors using the trained classifier model, wherein the trained classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches and determine an SI-level molecular subtype classification based on the patch-level molecular subtype classifications.
29. The method of claim 28, further comprising obtaining an indication of a selected region of interest determined to be cancer-enriched in the test SI.
30. The method of claim 29, wherein the indication of the selected region of interest is received from a user via a user interface or selected automatically based on one or more of biological criteria, an output of a heuristic machine learning or image processing algorithm, or an output of a deep-learning convolutional algorithm.
31. The method of claim 29, wherein the selected region of interest is a centroid or closed curve.
32. The method of claim 29, wherein the plurality of scaled patches comprises the selected region of interest.
33. An apparatus for determining molecular subtype classifications based on pathology slide images (SIs), the apparatus comprising:
a processor;
a memory device storing software instructions for determining molecular subtype classifications; and
a training engine executable on the processor according to software instructions stored in the memory device and configured to:
obtain a plurality of training SIs;
segment each of the training SIs into a plurality of scaled patches, wherein each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI;
convert each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors;
configure a classifier model to process the multiscale descriptors such that, for each training SI, the classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches corresponding to the training SI, and determine an SI-level molecular subtype classification based on the patch-level molecular subtype classifications;
train the classifier model using the multiscale descriptors; and
configure a molecular subtyping engine to use the trained classifier model to determine an SI-level molecular subtype classification for a test SI.
34. A non-transitory computer-readable medium having computer instructions stored thereon for determining molecular subtype classifications based on pathology slide images (SIs), which, when executed by a processor, cause the processor to perform one or more steps comprising:
obtaining a plurality of training SIs;
segmenting each of the training SIs into a plurality of scaled patches, wherein each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within a corresponding training SI;
converting each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the set of patch representations to a patch-level descriptor and combining the patch-level descriptors;
configuring a classifier model to process the multiscale descriptors such that, for each training SI, the classifier model is operable to assign a patch-level molecular subtype classification to each of the plurality of scaled patches corresponding to the training SI, and determine an SI-level molecular subtype classification based on the patch-level molecular subtype classifications;
training the classifier model using the multiscale descriptors; and
configuring a molecular subtyping engine to use the trained classifier model to determine an SI-level molecular subtype classification for a test SI.
35. An apparatus for determining molecular subtype classifications based on pathology slide images (SIs), the apparatus comprising:
a processor;
a memory device storing software instructions for determining molecular subtype classifications; and
a subtype classification engine executable on the processor according to software instructions stored in the memory device and configured to:
obtain a test SI;
segment the test SI into a plurality of scaled patches, wherein each scaled patch of the plurality of scaled patches comprises one or more patch representations at one or more zoom levels that are centered at a location within the test SI;
convert each scaled patch of the plurality of scaled patches into a multiscale descriptor using a deep-learning neural network by, for each scaled patch, mapping each of the one or more patch representations to a patch-level descriptor and combining the patch-level descriptors;
determine an SI-level molecular subtype classification for the test SI using a classifier model trained to process the multiscale descriptors such that a patch-level molecular subtype classification is assigned to each of the plurality of scaled patches, and the SI-level molecular subtype classification is determined based on the patch-level molecular subtype classifications.
36. The apparatus of claim 35, wherein the subtype classification engine comprises at least one of a cellular smartphone, kiosk, personal data assistant, tablet, robot, vehicle, web camera, or computing device.
PCT/US2018/062911 2017-11-30 2018-11-28 Detecting intratumor heterogeneity of molecular subtypes in pathology slide images using deep-learning WO2019108695A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
SG11202003330PA SG11202003330PA (en) 2017-11-30 2018-11-28 Detecting intratumor heterogeneity of molecular subtypes in pathology slide images using deep-learning
CA3079438A CA3079438A1 (en) 2017-11-30 2018-11-28 Detecting intratumor heterogeneity of molecular subtypes in pathology slide images using deep-learning
KR1020207014947A KR20200066732A (en) 2017-11-30 2018-11-28 Intratumoral heterogeneity detection of molecular subtypes in pathology slide images using deep learning
AU2018374207A AU2018374207A1 (en) 2017-11-30 2018-11-28 Detecting intratumor heterogeneity of molecular subtypes in pathology slide images using deep-learning
IL274101A IL274101A (en) 2017-11-30 2020-04-21 Detecting intratumor heterogeneity of molecular subtypes in pathology slide images using deep-learning

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762593224P 2017-11-30 2017-11-30
US62/593,224 2017-11-30
US201862656918P 2018-04-12 2018-04-12
US62/656,918 2018-04-12

Publications (1)

Publication Number Publication Date
WO2019108695A1 true WO2019108695A1 (en) 2019-06-06

Family

ID=66665290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/062911 WO2019108695A1 (en) 2017-11-30 2018-11-28 Detecting intratumor heterogeneity of molecular subtypes in pathology slide images using deep-learning

Country Status (7)

Country Link
KR (1) KR20200066732A (en)
AU (1) AU2018374207A1 (en)
CA (1) CA3079438A1 (en)
IL (1) IL274101A (en)
SG (1) SG11202003330PA (en)
TW (1) TWI689944B (en)
WO (1) WO2019108695A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246579A (en) * 2019-06-13 2019-09-17 西安九清生物科技有限公司 A kind of pathological diagnosis method and device
CN110532408A (en) * 2019-08-28 2019-12-03 广州金域医学检验中心有限公司 Pathological section management method, device, computer equipment and storage medium
WO2021154849A1 (en) * 2020-01-28 2021-08-05 PAIGE.AI, Inc. Systems and methods for processing electronic images for computational detection methods
WO2022015819A1 (en) * 2020-07-15 2022-01-20 Genentech, Inc. Assessing heterogeneity of features in digital pathology images using machine learning techniques
CN115330778A (en) * 2022-10-13 2022-11-11 浙江华是科技股份有限公司 Substation target detection network model training method and system
WO2023071406A1 (en) * 2021-10-29 2023-05-04 复旦大学附属华山医院 Classification method and system for classifier used for immune-related disease molecular typing and subtyping

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI701638B (en) * 2019-09-20 2020-08-11 國立中興大學 Learning technique of mechanical application on automatic optical inspection system
TWI744798B (en) * 2020-02-13 2021-11-01 國立陽明交通大學 Evaluation method and system of neuropsychiatric diseases based on brain imaging
CN111507381B (en) * 2020-03-31 2024-04-02 上海商汤智能科技有限公司 Image recognition method, related device and equipment
KR102437193B1 (en) 2020-07-31 2022-08-30 동국대학교 산학협력단 Apparatus and method for parallel deep neural networks trained by resized images with multiple scaling factors
KR102304370B1 (en) 2020-09-18 2021-09-24 동국대학교 산학협력단 Apparatus and method of analyzing status and change of wound area based on deep learning
JP7432182B2 (en) * 2020-10-18 2024-02-16 エイアイエックスメド・インコーポレイテッド Method and system for acquiring cytology images in cytopathology examination
TWI815057B (en) * 2020-11-11 2023-09-11 臺北醫學大學 Visualization methods for cancer lesions
US11610306B2 (en) 2020-12-16 2023-03-21 Industrial Technology Research Institute Medical image analysis method and device
TWI836280B (en) * 2020-12-16 2024-03-21 財團法人工業技術研究院 Medical image analysis method and device
TWI781027B (en) * 2021-12-22 2022-10-11 國立臺南大學 Neural network system for staining images and image staining conversion method
JP2023123987A (en) * 2022-02-25 2023-09-06 株式会社Screenホールディングス Model selection method and image processing method
KR20240076391A (en) * 2022-11-23 2024-05-30 (주) 노보믹스 Artificial intelligence software for image reading of gastric cancer pathology

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016191462A1 (en) * 2015-05-28 2016-12-01 Tokitae Llc Image analysis systems and related methods
US20170169567A1 (en) * 2014-05-23 2017-06-15 Ventana Medical Systems, Inc. Systems and methods for detection of structures and/or patterns in images
US20170193175A1 (en) * 2015-12-30 2017-07-06 Case Western Reserve University Prediction of recurrence of non-small cell lung cancer

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2575859A1 (en) * 2004-08-11 2006-02-23 Aureon Laboratories, Inc. Systems and methods for automated diagnosis and grading of tissue images
TWI399194B (en) * 2011-01-25 2013-06-21 Univ Nat Yunlin Sci & Tech Semi-automatic knee cartilage mri image segmentation based on cellular automata
WO2015189264A1 (en) * 2014-06-10 2015-12-17 Ventana Medical Systems, Inc. Predicting breast cancer recurrence directly from image features computed from digitized immunohistopathology tissue slides

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDREW JANOWCZYK ET AL.: "Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases", JOURNAL OF PATHOLOGY INFORMATICS, vol. 7, no. 1, 26 July 2016 (2016-07-26), pages 1 - 18, XP055559403 *
LE HOU ET AL.: "Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification", PROC IEEE COMPUT SOC CONF COMPUT VIS PATTERN RECOGNIT, 28 October 2016 (2016-10-28), pages 2424 - 2433, XP0055532412 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246579A (en) * 2019-06-13 2019-09-17 西安九清生物科技有限公司 A kind of pathological diagnosis method and device
CN110246579B (en) * 2019-06-13 2023-06-09 西安九清生物科技有限公司 Pathological diagnosis method and device
CN110532408A (en) * 2019-08-28 2019-12-03 广州金域医学检验中心有限公司 Pathological section management method, device, computer equipment and storage medium
WO2021154849A1 (en) * 2020-01-28 2021-08-05 PAIGE.AI, Inc. Systems and methods for processing electronic images for computational detection methods
US11176676B2 (en) 2020-01-28 2021-11-16 PAIGE.AI, Inc. Systems and methods for processing electronic images for computational detection methods
US11423547B2 (en) 2020-01-28 2022-08-23 PAIGE.AI, Inc. Systems and methods for processing electronic images for computational detection methods
US11640719B2 (en) 2020-01-28 2023-05-02 PAIGE.AI, Inc. Systems and methods for processing electronic images for computational detection methods
US11995903B2 (en) 2020-01-28 2024-05-28 PAIGE.AI, Inc. Systems and methods for processing electronic images for computational detection methods
WO2022015819A1 (en) * 2020-07-15 2022-01-20 Genentech, Inc. Assessing heterogeneity of features in digital pathology images using machine learning techniques
WO2023071406A1 (en) * 2021-10-29 2023-05-04 复旦大学附属华山医院 Classification method and system for classifier used for immune-related disease molecular typing and subtyping
CN115330778A (en) * 2022-10-13 2022-11-11 浙江华是科技股份有限公司 Substation target detection network model training method and system
CN115330778B (en) * 2022-10-13 2023-03-10 浙江华是科技股份有限公司 Substation target detection network model training method and system

Also Published As

Publication number Publication date
IL274101A (en) 2020-06-30
TWI689944B (en) 2020-04-01
AU2018374207A1 (en) 2020-04-30
CA3079438A1 (en) 2019-06-06
TW201926359A (en) 2019-07-01
KR20200066732A (en) 2020-06-10
SG11202003330PA (en) 2020-05-28

Wang et al. De-noising spatial expression profiling data based on in situ position and image information
Glas et al. MammaPrint® translating research into a diagnostic test
WO2024054073A1 (en) Biomarker for diagnosing pre-chemotherapy resistance in solid cancer patients and method for providing information for diagnosing pre-chemotherapy resistance, using same
Riccadonna et al. Supervised classification of combined copy number and gene expression data
Brück et al. Histopathological Landscape of Molecular Genetics and Clinical Determinants in MDS Patients

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18883363; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 3079438; Country of ref document: CA
ENP Entry into the national phase. Ref document number: 2018374207; Country of ref document: AU; Date of ref document: 20181128; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 20207014947; Country of ref document: KR; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 2020529294; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2018883363; Country of ref document: EP; Effective date: 20200630
122 EP: PCT application non-entry in European phase. Ref document number: 18883363; Country of ref document: EP; Kind code of ref document: A1