WO2020014477A1 - Methods, systems, and computer readable media for image analysis with deep learning to predict breast cancer classes - Google Patents


Info

Publication number
WO2020014477A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2019/041395
Other languages
French (fr)
Inventor
Charles Maurice PEROU
Heather Dunlop COUTURE
Lindsay Almquist WILLIAMS
Sarah NYANTE
James Stephen Marron
Melissa TROESTER
Marc Niethammer
Joseph Geradts
Ebonee BUTLER
Original Assignee
The University Of North Carolina At Chapel Hill
City Of Hope
Application filed by The University Of North Carolina At Chapel Hill, City Of Hope filed Critical The University Of North Carolina At Chapel Hill
Publication of WO2020014477A1

Classifications

    • G — PHYSICS
    • G16 — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H — HEALTHCARE INFORMATICS, i.e. ICT SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/40 — ICT specially adapted for the handling or processing of medical images; for processing medical images, e.g. editing
    • G16H50/20 — ICT specially adapted for medical diagnosis, medical simulation or medical data mining; for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70 — ICT specially adapted for medical diagnosis, medical simulation or medical data mining; for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the classifier had overall accuracy of 77%, but accuracy of 85% among low-intermediate grade tumors and 70% among high grade tumors.
  • ROR-PT vs. low-medium risk of recurrence score
  • Figure 2 shows four cores from a single patient, along with the class predictions over different regions of the image. While three cores are predicted ER negative and Basal-like intrinsic subtype, the fourth is predicted mostly ER negative and non-Basal-like, indicating that some intra-tumoral heterogeneity might be present between cores.
  • histologic subtype and molecular marker status could be predicted using image analysis. While we did perform grade-weighting within ER classification, there may be other image features of ER positive tumors that are not readily discernible and are driving the higher accuracy of ER positive images over ER negative. Agreement between true ER status (by IHC) vs.
  • the high accuracy may be due to the arrangement of epithelial and stromal cells characteristic of ductal and lobular tumors, whereby lobular tumors are characterized by non-cohesive single file lines of epithelial cells infiltrating the stroma and ductal tumors are characterized by sheets or nests of epithelial cells embedded in the surrounding stroma [24,25].
  • RNA-based assessment and 77% agreement for classification of Luminal A subtype [26]. Our estimates are similar, suggesting that image analysis, even without the use of special IHC stains, could be a viable option for classification of molecular breast tumor subtype and ROR-PT from H&E stained images.
  • CBCS: Carolina Breast Cancer Study, Phase 3 (2008 through September 2013)
  • Methods for CBCS have been described elsewhere [27]. Briefly, CBCS recruited participants from 44 of the 100 North Carolina counties using rapid case ascertainment via the North Carolina Central Cancer Registry. After giving informed consent, patients were enrolled under an Institutional Review Board protocol that maintains approval at the University of North Carolina. CBCS eligibility criteria included being female, a first diagnosis of invasive breast cancer, aged 20-74 years at diagnosis, and residence in specified counties. Patients provided written informed consent to access tumor tissue blocks/slides and medical records from treatment centers.
  • the training and test sets were formed by a random partition of the data.
  • Patients in the final training and test sets had information for tumor grade and histologic subtype, determined via centralized breast pathologist review within CBCS, along with biomarker data for ER status, PAM50 intrinsic breast cancer subtype, and risk of recurrence (ROR-PT) where noted.
  • the H&E images were taken from tissue microarrays constructed with 1-4 1-mm cores for each patient, resulting in 932 core images for the test set analysis presented here.
  • ER status for each TMA core was determined using a digital algorithm as described by Allott et al. (2016) [28] and was defined using a ≥ 10% positivity cut point for immunohistochemistry staining.
  • Tumor Tissue Microarray Construction. As has been described in detail by Allott et al. (2016), tumor tissue microarrays were constructed for CBCS3 participants with available paraffin-embedded tumor blocks [26]. The CBCS study pathologist marked areas of invasive breast cancer within a tumor on H&E stained whole slide images. The marked areas were selected for coring, and 1-4 tumor tissue cores per participant were used in the TMA construction at the Translational Pathology Laboratory at UNC. TMA slides were H&E stained and images were generated at 20x magnification. Cores with insufficient tumor cellularity were eliminated from the analysis.
  • Nanostring assays were carried out on a randomly sampled subset of available formalin fixed paraffin embedded (FFPE) tumor tissue cores. RNA was isolated from 1.0-mm cores from the same FFPE block using the Qiagen RNeasy FFPE kit (catalogue #73504). Nanostring assays, which use RNA counting as a measure of gene expression, were conducted. RNA-based intrinsic subtype was determined using the PAM50 gene signature described by Parker et al.
  • FFPE formalin fixed paraffin embedded
  • each tumor was categorized into one of five intrinsic subtypes (Luminal A, Luminal B, HER2, Basal-like, Normal-like), using the 50-gene PAM50 signature [27]. Categorizations were based on a previously validated risk of recurrence score, generated using PAM50 subtype, tumor proliferation, and tumor size (ROR-PT), with a cutoff for high of 64.7 from the continuous ROR-PT score [2].
  • the low level filters detect small structures such as edges and blobs. Intermediate layers capture increasingly complex properties like shape and texture.
  • the top layers of the network are able to represent object parts like faces or bicycle tires.
  • the convolution filters are learned from data, creating discriminating features at multiple levels of abstraction. There is no need to hand craft features.
  • We used the VGG16 architecture, configuration D [34], pre-trained on the ImageNet data set, which consists of 1.2 million images from 1000 categories of objects and scenes.
  • Although ImageNet contains a vastly different type of image, CNNs trained on this data set have been shown to transfer well to other data sets [35, 57-58], including those from biomedical applications [14, 59].
  • the lower layers of a CNN are fairly generic, while the upper layers are much more specialized.
  • the lower layers only capture smaller-scale features, which do not provide enough discriminating ability, while the upper layers are so specific to ImageNet that they do not generalize well to histology.
  • Intermediate layers are both generalizable and discriminative for other tasks.
  • In transferring to histology we search for the layer that transfers best to our task. Output from each set of convolutional layers, before max pooling, was extracted over each image at full resolution to form a set of features for the image.
  • Output from the fourth set of convolutional layers was chosen because it performed better than the outputs from other layers.
  • the fourth set of convolutional layers outputs features of dimension 512.
  • These lower CNN layers are convolutional, meaning that they can be run on any image size. For an image size of 2500x2500, they produce a grid of 284x284x512 features.
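  • As an illustration only (the patent does not tie the implementation to a particular framework), the sketch below shows how intermediate VGG16 features can be extracted fully convolutionally from a histology region with PyTorch/torchvision; the layer index used to stop before the fourth max-pooling layer is an assumption about torchvision's layer ordering, not part of the original disclosure.

```python
import torch
import torchvision.models as models

# VGG16 (configuration D) pre-trained on ImageNet.
vgg = models.vgg16(pretrained=True).features.eval()

# Keep the convolutional layers up to the end of the fourth conv block,
# stopping before its max-pooling layer (assumed index 23 in torchvision's
# `features` sequence), so a 512-channel feature grid is produced.
conv4 = torch.nn.Sequential(*list(vgg.children())[:23])

with torch.no_grad():
    region = torch.rand(1, 3, 800, 800)   # toy stand-in for an 800x800 H&E region
    feats = conv4(region)                 # shape: (1, 512, H/8, W/8)
print(feats.shape)
```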
  • Model Training and Training Data Sets. In training a model to predict the class or characteristic group of a tumor, such as high or low grade, we utilize patient-level labels.
  • the TMA images may be much larger than the required input to the VGG16 CNN (e.g., typically 2500x2500 pixels for TMA spots vs. 224x224 for VGG16). Further, applying the original CNN fully convolutionally would produce features that are not generalizable to histology. Thus some modifications to the VGG16 approach are necessary.
  • a new classifier may be trained to operate on the intermediate level features from VGG16. Simply taking the mean of each feature over the image would limit our insight into which parts of the image contributed to the classification.
  • the patient-level labels are weak compared to detailed patch- or pixel-level annotations used in most prior work, necessitating a different classification framework called multiple instance learning.
  • Ground truth for each class: tumor grade (pathologist determined), ER status (IHC-based), PAM50 intrinsic subtype (50-gene expression-based), ROR-PT (gene expression-based), and histologic subtype (pathologist determined).
  • a probabilistic model was formed for how likely each image region is to belong to each class, with these probabilities aggregated across all image regions to form a prediction for the tumor as a whole.
  • Image regions were generated as 800x800 pixel regions in the training images, with the mean of each CNN feature computed over the region.
  • a linear Support Vector Machine (SVM) [39] calibrated with isotonic regression [40] was used to predict the probability for each region. Isotonic regression fits a piecewise-constant non-decreasing function, transforming the distance from the separating hyperplane learned by the SVM to a probability that an image region belongs to each class.
  • each image region was labeled with the class of the tumor from which it came.
  • the data for model fitting and calibration may be disjoint, so cross-validation was used to split the training instances into five equal-sized groups, where four were used for training and the remaining one for calibration/validation (the test set remains untouched).
  • an SVM was learned on the training set and calibration was learned on the calibration set with isotonic regression, thus forming an ensemble.
  • An ensemble of size five was selected to balance the desirability of a large training set, a reasonably sized validation set, and the simultaneous desirability of limiting the computation time.
  • Predictions on the test set were made by averaging probabilities from the five models. This ensemble method also helped to soften any noise in the predictions caused by incorrect image region labels due to heterogeneity.
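  • A minimal sketch of this instance-level classifier in scikit-learn is shown below; the feature matrix and labels are synthetic placeholders, and scikit-learn's CalibratedClassifierCV with cv=5 is used as a stand-in for the five-member fit/calibrate ensemble described above.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
X_regions = rng.normal(size=(1000, 512))    # mean CNN features per 800x800 region (toy)
y_regions = rng.integers(0, 2, size=1000)   # region labels inherited from the tumor (toy)

# Linear SVM whose decision values are mapped to probabilities with isotonic
# regression; cv=5 fits five (SVM, calibrator) pairs on 4/5 of the data each
# and averages their probabilities at prediction time.
instance_clf = CalibratedClassifierCV(LinearSVC(dual=False), method="isotonic", cv=5)
instance_clf.fit(X_regions, y_regions)
region_probs = instance_clf.predict_proba(X_regions)[:, 1]   # P(class | region)
```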
  • Predictions for tumors were made by first forming a quantile function (inverse cumulative distribution) of the calibrated SVM ensemble predictions for the image regions using 16 equally spaced quantiles from images in the training set. The quantiles of the training images were used to train another linear SVM to predict the class label for the whole tumor, with sigmoid calibration transforming the SVM output into probabilities. This method allowed predictions to be made for individual image regions, while also aggregating to overall tumor predictions.
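  • The quantile-based aggregation step might look like the following sketch (again with synthetic data; the 16 equally spaced quantiles and sigmoid calibration follow the text, while everything else is illustrative).

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

def quantile_features(region_probs, n_quantiles=16):
    """Summarize a tumor's region probabilities with 16 equally spaced quantiles."""
    return np.quantile(region_probs, np.linspace(0.0, 1.0, n_quantiles))

rng = np.random.default_rng(1)
# toy data: 200 tumors, each with a variable number of region probabilities
probs_per_tumor = [rng.random(int(rng.integers(10, 40))) for _ in range(200)]
y_tumor = rng.integers(0, 2, size=200)

X_tumor = np.stack([quantile_features(p) for p in probs_per_tumor])
# Second linear SVM on the quantile features, with sigmoid (Platt) calibration.
tumor_clf = CalibratedClassifierCV(LinearSVC(dual=False), method="sigmoid", cv=5)
tumor_clf.fit(X_tumor, y_tumor)
p_class = tumor_clf.predict_proba(X_tumor)[:, 1]   # tumor-level class probability
```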
  • Sample weighting was applied so that each class, including tumor grade, ER status, and Basal-like vs. non-Basal-like intrinsic subtype, contributed equally during training.
  • Weights inversely proportional to the number of samples in each group were used, e.g., low grade/class 1, low grade/class 2, high grade/class 1, and high grade/class 2 were each weighted equally, where the classes are ER status, histologic subtype, or intrinsic subtype.
  • Prediction in Test Sets. Overlapping 800x800-pixel regions with a stride of 400 pixels were used as image regions from each TMA spot, which is typically 2500 pixels in diameter. Only image regions containing at least 50% tissue within the core image field of view (i.e., 50% tissue, 50% glass) were used.
  • the calibrated SVM ensemble predicted the class of each image region by assigning a probability of belonging to one of two classes (tumor grade 1 or 3, ER+ or ER-, Basal-like or non-Basal-like subtype, ductal or lobular histologic subtype, and low-med or high ROR-PT).
  • the probabilities computed on the image regions from all cores were aggregated into a quantile function and the second SVM was used to predict the class for the whole tumor.
  • Cut points were determined for each tumor characteristic based on the achievement of optimal sensitivity, specificity, and accuracy of each core being correctly classified relative to the pathology or biomarker data.
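  • One plausible way to pick such a cut point, sketched below with scikit-learn, is to scan the ROC curve and keep the threshold that maximizes sensitivity plus specificity (Youden's J); the exact criterion used in the study is not spelled out here, so this is only an illustration on toy data.

```python
import numpy as np
from sklearn.metrics import roc_curve

def choose_cut_point(y_true, scores):
    """Return the score threshold maximizing sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return thresholds[np.argmax(tpr - fpr)]

# toy example: noisy scores that are higher for the positive class
rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=500)
scores = 0.6 * y + 0.4 * rng.random(500)
print(choose_cut_point(y, scores))
```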
  • image analysis assigned a probability score of being a high grade vs. low grade tumor for each image.
  • a cut point of greater than 0.80 was used for high grade tumors (Figure 1a).
  • traditional pathologist scoring methods were used to classify tumors as a combined grade of low, intermediate, or high.
  • two independent pathologists’ classifications of tumor grade for the same tissue sample were assessed to compare the agreement between two pathologists to that observed for image analysis versus pathologist classification.
  • Accurate classification was defined as identical classification based on histologic image analysis and biomarker data for the same core. To determine whether any clinical characteristics were associated with an inaccurate image-based call for ER status, we estimated odds ratios (ORs) and 95% confidence intervals (95% CI) for the association between patient characteristics and the accuracy of ER status (i.e., concordant with clinical status vs. discordant with clinical status) (Supplemental Table 1). All statistical analyses were done in SAS version 9.4 (SAS Institute, Cary, NC). P-values were two-sided with an alpha of 0.05.
  • Figure 3 is a block diagram of a computing platform with a CNN-based image classifier for predicting breast cancer classes, where the classes include tumor grade, ER status, histologic subtype, and intrinsic subtype through image analysis.
  • Figure 4 is a flow chart of using the CNN-based image classifier in Figure 3 to predict breast cancer classes.
  • a test image 300 is provided as input to a CNN-based image classifier 302.
  • Test image 300 may be an H&E stained histologic image as described above.
  • CNN-based image classifier 302 may include a convolutional neural network 304, such as the VGG16 network, trained using the steps described above.
  • CNN-based image classifier 302 may be implemented on a computing platform 306 including at least one processor 308 and memory 310.
  • generating instances may include dividing test image 300 into regions of a predetermined size.
  • the predetermined size is 800x800 pixels.
  • Extracting features may include providing each image instance as input to CNN 304, which generates features as output for each image instance.
  • the CNN may be trained using multiple instance (MI) learning and/or MI aggregation.
  • MI multiple instance
  • the CNN may be trained end-to-end using MI learning.
  • the CNN may be trained using an MI aggregation technique that aggregates predictions from smaller regions of an image into an image-level classification by using the quantile function. Additional details regarding MI learning, MI aggregation, and/or related aspects are discussed below.
  • the CNN may be trained using genomic data sets and a task-based canonical correlation analysis (CCA) method.
  • CCA canonical correlation analysis
  • a CNN component may be configured to use imaging data and a related genomic data set along with a task-based CCA method (e.g., a task-optimal CCA method) to project data from the two sources to a shared space that is also discriminative.
  • the CNN component may be trained for use in improving accuracy when identifying or extracting features from image instances during run-time. Additional details regarding a task-based CCA method and/or related aspects are discussed below.
  • In step 404, image classes are predicted for each instance.
  • the CNN-generated features for each instance are input into an instance support vector machine 312, which outputs a probability as to whether the instance is a member of each class.
  • instance SVM 312 may output probabilities as to whether the instance is in one or the other of the binary classes for tumor grade (high grade or low grade), ER status (ER positive or ER negative), histologic subtype (ductal or lobular), risk of recurrence (high versus low), and intrinsic subtype (basal-like or non-basal-like).
  • In step 406, the instance predictions are aggregated to produce a prediction for the entire image.
  • the probabilities computed for each image instance by instance SVM 312 are aggregated into a quantile function 314, and an aggregation SVM 316 that operates on quantile function 314 generates probabilities of each class for the entire image.
  • Finally, the tumor class is predicted: using the probabilities generated in step 406, CNN-based image classifier 302 generates output as to whether the tumor is in one or the other of the above-described binary classes.
  • cut points or threshold values may be assigned for each class.
  • the probability computed for each class may then be compared to the cut point or threshold to determine whether or not the tumor is a member of the class. For example, as described above, the cut point for classifying a tumor as high grade may be 0.8. Accordingly, if the probability computed for tumor grade for a given image is 80% or higher, the tumor may be classified as high grade.
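  • Applying the cut points at prediction time is then a simple comparison, as in the toy example below (only the 0.80 grade cut point comes from the text; the other names and values are placeholders).

```python
# Hypothetical per-class cut points; only the high-grade value (0.80) is from the text.
cut_points = {"high_grade": 0.80, "er_positive": 0.50, "basal_like": 0.50}

def assign_classes(image_probs, cut_points):
    """Compare each predicted class probability to its cut point."""
    return {name: image_probs[name] >= cut for name, cut in cut_points.items()}

print(assign_classes({"high_grade": 0.86, "er_positive": 0.31, "basal_like": 0.72}, cut_points))
# {'high_grade': True, 'er_positive': False, 'basal_like': True}
```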
  • tumors can be classified from image analysis alone to identify the need for further diagnostic tests (such as genomic tests) and/or treatment.
  • MI learning with a convolutional neural network enables end-to-end training in the presence of weak image-level labels.
  • the quantile function provides a more complete description of the heterogeneity within each image, improving image-level classification.
  • Deep learning has become the standard solution for classification when a large set of images with detailed annotations is available for training.
  • When annotations are weaker, such as with large, heterogeneous images, we turn to MI learning.
  • the image (called a bag) is broken into smaller regions (called instances).
  • We are given a label for each bag, but the instance labels are unknown.
  • Some form of pooling aggregates instances into a bag-level classification.
  • CNN convolutional neural network
  • Figure 5 depicts MI augmentation.
  • In MI learning, each bag contains one or more instances. Labels are given for the bag, but not the instances.
  • MI augmentation is a technique to provide additional training samples by randomly selecting a cropped image region and the instances within it. When the bag label is applied to a small number of instances, it is weak because this small region may not be representative of the bag class. Applying the bag label to larger cropped regions provides a stronger label, while still providing benefit from image augmentation. Training with the whole image maximizes the opportunity for MI learning, but restricts the benefits of image augmentation. At test time, the whole image is processed and the predictions from all instances are aggregated into a bag prediction.
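  • A sketch of this augmentation step is shown below with NumPy; the crop, instance, and stride sizes mirror values mentioned elsewhere in this description but are otherwise arbitrary choices for illustration.

```python
import numpy as np

def mi_augment(image, crop=2000, instance=800, stride=400, rng=None):
    """Randomly crop a `crop` x `crop` region and return the instances inside it.
    During training, the bag (image) label is applied to this cropped sub-bag."""
    if rng is None:
        rng = np.random.default_rng()
    H, W = image.shape[:2]
    top = rng.integers(0, H - crop + 1)
    left = rng.integers(0, W - crop + 1)
    region = image[top:top + crop, left:left + crop]
    instances = [
        region[r:r + instance, c:c + instance]
        for r in range(0, crop - instance + 1, stride)
        for c in range(0, crop - instance + 1, stride)
    ]
    return np.stack(instances)

# toy 2500 x 2500 three-channel "core"
bag = mi_augment(np.zeros((2500, 2500, 3), dtype=np.float32))
print(bag.shape)   # (16, 800, 800, 3)
```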
  • A permutation-invariant pooling of instances is needed to accommodate images of different sizes, which a fully connected neural network cannot do.
  • Existing pooling approaches are very aggressive; they compute a single number rather than looking at the distribution of instance predictions.
  • Most MI applications use the maximum, which works well for problems such as cancer diagnosis where, if there is a small amount of tumor, the sample is labeled as cancerous [46, 56].
  • A smooth approximation, such as the generalized mean or noisy-OR, provides better convergence in a CNN [47, 52, 45].
  • For heterogeneous images, a majority vote, median, or mean is more appropriate. We include more of the distribution by pooling with the QF and learning a mapping to the bag class prediction, improving the classification accuracy.
  • the QF is a new general type of feature pooling that could provide an alternative to max pooling in a CNN.
  • image augmentation may be applied in training a CNN by randomly cropping large portions of each image during each epoch. At test time, the whole image is used.
  • We use MI augmentation, in which a subset of instances is randomly selected from each bag during each epoch. Instances may be the same size, but we choose how many instances to aggregate over. In selecting the number of instances, there may be two extremes: a single instance vs. the whole bag. In the former, the bag label is assigned to each instance, and this is often called single instance learning.
  • MI aggregation is incorporated while training the bag classifier as in other MI methods [41, 44]. Comparison studies have found little or no improvement from these MI methods on some data sets [54, 55]. We found MI learning to be very beneficial and show that it can be useful in dealing with heterogeneous data.
  • MI learning can be implemented with many different types of classifiers [41, 46, 54].
  • a fully convolutional network forms the instance classifier f_inst, followed by a global MI layer for instance aggregation f_agg.
  • the FCN consists of convolutional and pooling layers that downsize the representation, followed by a softmax operation to predict the probability for each class.
  • the FCN output is w_d x w_d x C.
  • An instance is defined as the receptive field from the original image used in creating a point in this w_d x w_d grid; the instances are overlapping.
  • the MI aggregation layer takes the instance probabilities and the foreground mask for the input image (downscaled to w_d x w_d), thereby aggregating over only the foreground instances.
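  • The fragment below sketches what such a fully convolutional instance classifier can look like in PyTorch; it is a toy network (the experiments described here fine-tuned AlexNet), shown only to make the instance-grid idea concrete.

```python
import torch
import torch.nn as nn

class InstanceFCN(nn.Module):
    """Toy fully convolutional instance classifier: strided convolutions downsize
    the input, and a 1x1 convolution with softmax yields a w_d x w_d grid of
    per-instance class probabilities."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(128, n_classes, kernel_size=1)

    def forward(self, x):
        logits = self.classifier(self.features(x))   # (B, C, w_d, w_d)
        return logits.softmax(dim=1)

grid = InstanceFCN()(torch.rand(1, 3, 512, 512))      # -> (1, 2, 64, 64)
print(grid.shape)
```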
  • Figure 6 illustrates an overview for an example MI learning technique.
  • a cropped region of a given size is randomly selected.
  • An FCN is applied to predict the class, producing a grid of instance predictions.
  • the instance predictions are aggregated over the foreground of the image (as indicated by the foreground mask) using quantile aggregation to predict the class of the cropped image region.
  • backpropagation learns the FCN and aggregation function weights.
  • At test time, the whole image is used.
  • Instance predictions can be used to form a bag prediction in different ways.
  • the bag prediction function should be invariant to the number and spatial arrangement of instances, so some pooling of predictions is needed.
  • Mean aggregation is well suited for global pooling as it is permutation invariant and can incorporate a foreground mask for the input image. Denoting the mask as M and its value for each instance as m_n ∈ {0, 1}, the mean aggregation function over instance predictions p_n is f_mean(P, M) = (Σ_n m_n p_n) / (Σ_n m_n).
  • Mean pooling incorporates predictions from all instances, but a lot of information may be lost in compressing to a single number.
  • a histogram is a more complete description of the probability distribution, but is dependent upon a suitable bin width.
  • QF: quantile function (inverse cumulative distribution)
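  • The two pooling choices can be written compactly in PyTorch as below; both operate on a flattened list of instance probabilities with a foreground mask, and the 16-quantile summary would then feed a small fully connected layer to predict the bag class (all shapes and values are toy placeholders).

```python
import torch

def masked_mean_pool(inst_probs, mask):
    """Mean aggregation over foreground instances only.
    inst_probs: (N, C) instance class probabilities; mask: (N,) with values in {0, 1}."""
    m = mask.unsqueeze(1)
    return (inst_probs * m).sum(dim=0) / m.sum().clamp(min=1.0)

def quantile_pool(inst_probs, mask, n_quantiles=16):
    """Quantile-function pooling: summarize the distribution of foreground
    instance probabilities with equally spaced quantiles."""
    fg = inst_probs[mask.bool()]
    qs = torch.linspace(0.0, 1.0, n_quantiles, device=inst_probs.device)
    return torch.quantile(fg, qs, dim=0).flatten()

probs = torch.rand(25, 2)                  # toy 5x5 grid of instance probabilities, 2 classes
mask = torch.ones(25)
bag_mean = masked_mean_pool(probs, mask)   # shape (2,)
bag_qf = quantile_pool(probs, mask)        # shape (32,)
```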
  • Image augmentation by random cropping is an important technique for creating extra training samples that helps to reduce over-fitting.
  • the FCN is applied to the cropped image at full resolution.
  • MI augmentation is a strategy used during training. As the MI aggregation layer is invariant to input size, the entire image and all its instances are always used at test time.
  • Classification accuracy is measured for five different tasks, some of them multiclass: 1) histologic subtype (ductal or lobular), 2) estrogen receptor (ER) status (positive or negative), 3) grade (1, 2, or 3), 4) risk of recurrence score (ROR) (low, intermediate, or high), and 5) genetic subtype (basal, luminal A, luminal B, HER2, or normal-like). Ground truth for histologic subtype and grade are from a pathologist looking at the original whole slide. ER status is determined from immunohistochemistry, genetic subtype from the PAM50 array [51], and ROR from the ROR-PT score-based method [51].
  • TMA images are intensity normalized to standardize the appearance across slides [50].
  • the hematoxylin, eosin, and residual channels are extracted from the normalization process and used as the three channel input for the rest of our algorithm.
  • a binary mask distinguishing tissue from background is also provided as input.
  • the smallest possible size is 227 × 227 (the input size for AlexNet), consisting of a single instance.
  • When the bag label is applied to each instance during training, this is called single instance learning.
  • a larger cropped region of size w x w can be selected; we test multiples of 500 up to 3500 and use mean aggregation in this experiment.
  • By assigning the bag label to this larger cropped region during training and keeping the instance size constant, we perform MI learning. Multiple random crops are obtained from each training image such that roughly the same number of pixels is sampled for each crop size; for the largest crop size, the whole image may be used without MI augmentation. Random mirroring and rotations are used for augmentation at all crop sizes. At test time, the whole image is always used, with the bag prediction formed by aggregating across all instances.
  • Figure 7 shows classification accuracy using mean aggregation as the number of instances (cropped image size) used for training is increased, while keeping instance size constant.
  • the benefits level off for larger crops. As GPU memory requirements increase for larger crop sizes, selecting an intermediate crop size provides most of the benefits of MI augmentation.
  • Table 5: Average classification accuracy for different types of MI aggregation.
  • FIG 8 shows a visualization for a sample image where the instance predictions are colored for each class.
  • the w = 2000 crop size was used for this example.
  • Figure 9 shows a plotting of the results for grade 1 vs. 3 and genetic subtype basal vs. luminal A.
  • Heterogeneity is expected for grade, as the three tumor grades are not discrete, but a continuous spectrum from low to high.
  • the level of heterogeneity to expect for genetic subtype is unknown because no studies have yet assessed genetic subtype from multiple samples within the same tumor.
  • the graph shows a continuous spectrum from basal to luminal A.
  • the luminal B, HER2, and normal samples lie mostly on the luminal A side, but with some mixing into the basal side.
  • Figures 8a-8e depict visualizations of instance predictions for a sample with ground truth labels of ductal (Figure 8a), ER positive (Figure 8b), grade 1 (Figure 8c), low ROR (Figure 8d), and luminal A (Figure 8e).
  • Figures 9a-9b depict visualizations of predicted heterogeneity for grade 1 vs. 3 ( Figure 9a) and genetic subtype basal vs luminal A ( Figure 9b).
  • the predicted proportion for each class is calculated as the proportion of instances in the sample predicted to be from each class. Test samples for all classes are plotted.
  • MI learning while training a CNN may be very useful in achieving high classification accuracy on large, heterogeneous images. Even with a small number of labeled samples, our model was successful in fine-tuning the AlexNet CNN because of the large size of the images providing plenty of opportunity for MI augmentation. The impact of MI learning indicates that accommodating image heterogeneity is essential. While aggregating instance predictions with the mean is sufficient for some tasks, quantile aggregation produces a significant improvement for others. Instance-level predictions will enable future work studying tumor heterogeneity, perhaps leading to biological insights of tumor progression.
  • CCA is a popular data analysis technique that projects two data sources into a space in which they are maximally correlated [61, 62]. It was initially used for unsupervised data analysis to gain insights into components shared by the two sources [63-65]. CCA is also used to compute a shared latent space for cross-view classification [66, 64, 67, 68], for representation learning on multiple views that are then joined for prediction [69, 70], and for classification from a single view when a second view is available during training [71]. While some of the correlated CCA features are useful for discriminative tasks, many represent properties that are of no use for classification and obscure correlated information that is beneficial.
  • Given two mean-centered data views X_1 and X_2, the first canonical directions are found via (w_1*, w_2*) = argmax_{w_1, w_2} corr(X_1 w_1, X_2 w_2), subject to the unit-variance constraints w_1^T Σ_11 w_1 = w_2^T Σ_22 w_2 = 1, where Σ_11 and Σ_22 are the within-view covariance matrices.
  • DCCA: Deep CCA adds non-linear projections to CCA by non-linearly mapping the input via a multilayer perceptron (MLP).
  • MLP multilayer perceptron
  • A_1 and A_2 are the output activations of the final layer with d_o features.
  • In Figure 10, diagram (a) shows the network structure. DCCA optimizes the same objective as CCA (see the equation above).
  • diagram (a) depicts a DCCA-based architecture that maximizes the sum correlation in projection space by optimizing an equivalent loss, the trace norm objective (TNO) [63], and diagram (b) depicts a SoftCCA-based architecture that relaxes the orthogonality constraints by regularizing with soft decorrelation (Decorr) and optimizes the L2 distance in the projection space (equivalent to sum correlation with activations normalized to unit variance) [68].
  • Our TOCCA methods add a task loss and apply CCA orthogonality constraints by regularizing in two ways: diagram (c) TOCCA-W uses whitening and diagram (d) TOCCA-SD uses Decorr. The third method that we propose, TOCCA-ND, simply removes the Decorr components of TOCCA-SD.
  • Supervised CCA methods.
  • CCA, DCCA, and SoftCCA are all unsupervised methods to learn a projection to a shared space in which the data is maximally correlated. Although these methods have shown utility for discriminative tasks, a CCA decomposition may not be optimal for classification because features that are correlated may not be discriminative. Our experiments will show that maximizing the correlation objective too much can degrade performance on discriminative tasks.
  • CCA has previously been extended to supervised settings by maximizing the total correlation between each view and the training labels in addition to each pair of views [72, 73], and by maximizing the separation of classes [66, 70]. Although these methods incorporate the class labels, they do not directly optimize for classification.
  • Dorfer et al.'s CCA Layer (CCAL) is the closest to our method. It optimizes a task loss operating on a CCA projection; however, the CCA objective itself is only optimized during pre-training, not in an end-to-end manner [75].
  • Other supervised CCA methods are linear only [73, 72, 66, 74]. Instead of computing the CCA projection within the network, as in CCAL, we optimize the non-linear mapping into the shared space together with the CCA part.
  • TOCCA: task-optimal CCA
  • DCCA optimizes the sum correlation through an equivalent loss function (TNO)
  • TNO: trace norm objective, an equivalent loss function
  • the CCA projection itself is computed only after optimization. Hence, the projections may not be usable to optimize another task simultaneously.
  • the main challenge in developing a task-optimal form of deep CCA that discriminates based on the CCA projection is in computing this projection within the network - a necessary step to enable simultaneous training of both objectives.
  • We tackle this by focusing on the two components of DCCA: maximizing the sum correlation between activations A_1 and A_2, and enforcing orthonormality constraints within A_1 and A_2.
  • We combine the CCA and task-driven objectives as a weighted sum with a hyperparameter for tuning; a sketch of such a combined loss is shown below. This model is flexible, in that the task-driven goal can be used for classification [87, 88], regression [89], clustering [90], or any other task. See Table 6 below for an overview.
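  • The sketch below illustrates one such weighted combination in PyTorch, roughly in the spirit of the soft-decorrelation variant (TOCCA-SD): a cross-entropy task loss plus an L2 correlation term between the two projections and a soft decorrelation penalty within each view. The weighting constants, layer shapes, and the use of mean squared error are assumptions for illustration, not the patent's prescribed values.

```python
import torch
import torch.nn.functional as F

def decorrelation_loss(A):
    """Soft decorrelation: penalize off-diagonal entries of the batch covariance."""
    A = A - A.mean(dim=0, keepdim=True)
    cov = (A.t() @ A) / (A.shape[0] - 1)
    return (cov - torch.diag(torch.diag(cov))).abs().sum()

def task_optimal_cca_loss(A1, A2, logits, labels, alpha=1.0, beta=0.1):
    """Weighted sum of a task loss, an L2 correlation loss between the two
    projections, and soft decorrelation within each view."""
    task = F.cross_entropy(logits, labels)
    corr = F.mse_loss(A1, A2)            # L2 distance in the shared space
    decorr = decorrelation_loss(A1) + decorrelation_loss(A2)
    return task + alpha * corr + beta * decorr

# toy projections from two view networks and logits from a classifier head
A1, A2 = torch.randn(32, 50), torch.randn(32, 50)
logits, labels = torch.randn(32, 2), torch.randint(0, 2, (32,))
print(task_optimal_cca_loss(A1, A2, logits, labels))
```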
  • Orthogonality constraints The remaining complications for mini-batch optimization are the orthogonality constraints, for which we propose three solutions, each handling the orthogonality constraints of CCA in a different way: whitening, soft decorrelation, and no decorrelation.
  • TOCCA-W Whitening
  • CCA applies orthogonality constraints to A_1 and A_2.
  • Decorrelated Batch Normalization (DBN) has previously been used to regularize deep models by decorrelating features [81] and inspired our solution.
  • ZCA Zero-phase Component Analysis
  • B_1 and B_2 are whitened outputs of A_1 and A_2, respectively.
  • TOCCA-W has a complexity of O(d_o^3), compared to O(d_o^2) for TOCCA-SD, with respect to the output dimension d_o.
  • d_o is typically small (≤ 100) and this extra computation is only performed once per batch.
  • the difference in runtime is less than 6.5% for a batch size of 100 or 9.4% for a batch size of 30 (see Table 7 below).
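  • A per-batch ZCA whitening step can be sketched as follows in PyTorch (in the method as described, the covariance is tracked as a running mean over batches rather than recomputed from a single toy batch as here).

```python
import torch

def zca_whiten(A, eps=1e-5):
    """ZCA whitening of activations A (batch x d_o): the output has approximately
    identity covariance, enforcing the CCA orthogonality constraints."""
    A = A - A.mean(dim=0, keepdim=True)
    cov = (A.t() @ A) / (A.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)
    W = eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.t()
    return A @ W

B1 = zca_whiten(torch.randn(100, 50))          # toy batch of 100 projections, d_o = 50
print((B1.t() @ B1 / 99).diagonal().mean())    # close to 1
```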
  • TOCCA-ND is the most relaxed and directly attempts to obtain identical latent representations.
  • TOCCA-W whitening
  • TOCCA-SD soft decorrelation
  • Table 6 provides a comparison of our proposed task-optimal deep CCA methods with other related ones from the literature: DCCA [63], SoftCCA [68], and CCAL-L_rank [75].
  • CCAL-L_rank uses a pairwise ranking loss with cosine similarity to identify matching and non-matching samples for image retrieval - not classification.
  • A_1 and A_2 are mean-centered outputs from two feed-forward networks.
  • The covariance Σ is computed from a single (large) batch in DCCA, and as a running mean over batches for all other methods.
  • Each layer of our network consists of a fully connected layer, followed by a ReLU activation and batch normalization [86].
  • Our implementations of DCCA, SoftCCA, and Joint DCCA/DeepLDA [70] also use ReLU activation and batch normalization.
  • We modified CCAL-L_rank [75] to use a softmax function and cross-entropy loss for classification, instead of a pairwise ranking loss for retrieval, referring to this modification as CCAL-L_ce.
  • Hidden layer sizes were 500, 200, and 1,000, and output layer sizes were 50, 50, and 112, respectively, across the data sets.
  • Figure 11 depicts classification accuracy for different CCA methods.
  • sum correlation vs. cross-view classification accuracy on MNIST is depicted across different hyperparameter settings on a training set size of 10,000 for DCCA [63], SoftCCA [68], TOCCA-W, and TOCCA-SD.
  • For DCCA and SoftCCA, large correlations do not necessarily imply good accuracy.
  • the effect of batch size on classification accuracy is depicted for each TOCCA method on MNIST (training set size of 10,000), and the effect of training set size on classification accuracy for each method.
  • Our TOCCA variants out-performed all others across all training set sizes.
  • Figure 11 (right) plots the batch size vs. classification accuracy for a training set size of 10,000.
  • SoftCCA shows similar behavior ([68] did not test such small training set sizes).
  • Figure 12 shows the CCA projection of the left view for each method. As expected, the task-driven variant produced more clearly separated classes.
  • Figure 12 shows t-SNE plots for CCA methods on an example variation of MNIST. Each method was used to compute projections for the two views (left and right sides of the images) using 10,000 training examples. The plots show a visualization of the projection for the left view with each digit colored differently.
  • TOCCA-SD and TOCCA-ND (not shown) produced similar results to TOCCA-W.
  • Table 9 shows classification accuracy for different methods of predicting Basal genomic subtype from images or grade from gene expression. Linear SVM and DNN were trained on a single view, while all other methods were trained with both views. By regularizing with the second view during training, all TOCCA variants improved classification accuracy. The standard error is in parentheses.
  • Two tasks were assessed: (1) predicting Basal vs. non-Basal genomic subtype using images, which is typically done from GE, and (2) predicting grade 1 vs. 3 from GE, typically done from images. This is not a multi-task classification setup; it is a means for one view to stabilize the representation of the other.
  • this experiment used a static set of pre-trained VGG16 image features in order to assess the utility of the method.
  • the network itself could be fine-tuned end-to-end with our TOCCA model, providing an easy opportunity for data augmentation and likely further improvements in classification accuracy.
  • Table 10 lists the classification results with a baseline that used only the original input features for LDA.
  • The deep methods (e.g., DCCA and SoftCCA) followed by a softmax classifier consistently beat LDA by a large margin.
  • TOCCA-SD and TOCCA-ND produced equivalent results as a weight of 0 on the decorrelation term performed best.
  • TOCCA-W showed the best result with an improvement of 15% over the best alternative method.
  • TOCCA can also be used in a semi-supervised manner when labels are available for only some samples.
  • Table 11 lists the results for TOCCA-W in this setting. With 0% labeled data, the result would be similar to DCCA. Notably, a large improvement over the unsupervised results in Table 10 is seen even with labels for only 10% of the training samples.
  • TOCCA-W or TOCCA-SD performed the best, depending on the data set; both include some means of decorrelation, which provides an extra regularizing effect to the model, thereby outperforming TOCCA-ND.
  • TOCCA showed large improvements over state-of-the-art in cross-view classification accuracy on MNIST and significantly increased robustness when the training set size was small.
  • TOCCA provided a regularizing effect when both views were available for training but only one at test time.
  • TOCCA also produced a large increase over state-of-the-art for multi-view representation learning on a much larger data set, XRMB.
  • XRMB data set
  • sMVCCA Supervised multi-view canonical correlation analysis


Abstract

A method for image analysis using deep learning to predict breast cancer classes, includes receiving a test image. The method further includes generating test image instances from the test image. The method further includes inputting the test image instances into a convolutional neural network (CNN), which extracts features from each of the test image instances. The method further includes applying an instance support vector machine (SVM) to the features to predict breast cancer classes for each of the instances. The method further includes aggregating outputs from the instance SVM to predict breast cancer classes for the test image.

Description

DESCRIPTION
METHODS, SYSTEMS, AND COMPUTER READABLE MEDIA FOR IMAGE ANALYSIS WITH DEEP LEARNING TO PREDICT BREAST CANCER
CLASSES
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Patent Application Serial No. 62/696,464, filed July 11, 2018, and U.S. Provisional Patent Application Serial No. 62/757,746, filed November 8, 2018; the disclosures of each of which are incorporated herein by reference in their entireties.
STATEMENT OF FEDERAL SUPPORT
This invention was made with government support under Grant Numbers CA058223, CA148761, and CA179715 awarded by the National Institutes of Health. The government has certain rights in the invention.
TECHNICAL FIELD
The subject matter described herein relates to image analysis with deep learning. More particularly, the subject matter described herein relates to methods, systems, and computer readable media for image analysis with deep learning to predict breast cancer classes, where the classes include tumor grade, estrogen receptor (ER) status, histologic subtype, risk of recurrence, and intrinsic subtype.
BACKGROUND
Predicting breast cancer grade, ER status, histologic subtype, risk of recurrence, and intrinsic subtype can involve costly immunohistochemistry, genomic tests, and/or review by a pathologist that may not be available to all patients. Using image analysis to predict breast cancer classes, such as tumor grade, ER status, histologic subtype, risk of recurrence, and intrinsic subtype, is difficult because the image features needed to perform these predictions may not be apparent in the images. Accordingly, there exists the need for improved methods for image analysis to predict breast cancer classes.
SUMMARY
RNA-based, multi-gene molecular assays are available and widely used for patients with ER-positive/HER2-negative breast cancers. However, RNA-based genomic tests can be costly and are not available in many countries. Methods for inferring molecular subtype from histologic images may identify patients most likely to benefit from further genomic testing. To identify patients who could benefit from molecular testing based on haematoxylin and eosin (H&E) stained histologic images, we developed an image analysis approach using deep learning. A training set of 571 breast tumors was used to create image-based classifiers for tumor grade, ER status, PAM50 intrinsic subtype, histologic subtype, and risk of recurrence score (ROR-PT). The resulting classifiers were applied to an independent test set (n=288), and the accuracy, sensitivity, and specificity of each were assessed on the test set. Histologic image analysis with deep learning distinguished low-intermediate vs. high tumor grade (82% accuracy), ER status (84% accuracy), Basal-like vs. non-Basal-like (77% accuracy), Ductal vs. Lobular (94% accuracy), and high vs. low-medium ROR-PT score (75% accuracy).
Sampling considerations in the training set minimized bias in the test set. Incorrect classification of ER status was significantly more common for Luminal B tumors. These data provide proof of principle that molecular marker status, including a meaningful clinical biomarker (e.g., ER status), can be predicted with accuracy >75% based on H&E features. Image-based methods could be promising for identifying patients with a greater need for further genomic testing, or in place of classically scored variables typically accomplished using human-based scoring.
According to one aspect of the subject matter described herein, a method for image analysis using deep learning to predict breast cancer classes is provided. The method includes receiving a test image. The method further includes generating test image instances from the test image. The method further includes inputting the test image instances into a convolutional neural network (CNN), which extracts features from each of the test image instances. The method further includes applying an instance support vector machine (SVM) to the features to predict breast cancer classes for each of the instances. The method further includes aggregating outputs from the instance SVM to predict breast cancer classes for the test image.
The subject matter described herein can be implemented in software in combination with hardware and/or firmware. For example, the subject matter described herein can be implemented in software executed by a processor. In one exemplary implementation, the subject matter described herein can be implemented using a non-transitory computer readable medium having stored thereon computer executable instructions that when executed by a processor of a computer control the computer to perform steps. Exemplary computer readable media suitable for implementing the subject matter described herein include non-transitory computer-readable media, such as disk memory devices, chip memory devices, programmable logic devices, and application specific integrated circuits. In addition, a computer readable medium that implements the subject matter described herein may be located on a single device or computing platform or may be distributed across multiple devices or computing platforms.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1a is a histogram for probability of high grade tumor by image analysis according to proportion of pathologist-classified low-intermediate (black) or high grade (red) in the test set. The cut point of >0.80 was selected.
Figure 1b is a bee swarm plot displaying pathologist classification of tumor grade as a function of the image grade score in the test set. Points within each grade group are adjusted horizontally to avoid overlap. The black dots indicate image analysis classified low-intermediate tumor grade and the red dots indicate image analysis classified high grade tumors.
Figure 2 includes images of four H&E cores from a single patient and heat maps indicating the class predictions over different regions of the image. Class probabilities are indicated by the intensity of red/blue color with greater intensity for higher probabilities. Uncertainty in the prediction is indicated by white. This patient was labeled as high grade, ER negative, Basal-like intrinsic subtype, ductal histologic subtype, and high ROR.
Figure 3 is a block diagram illustrating a computing platform or computing cloud that includes a convolutional neural network (CNN)-based image classifier for analyzing images to predict breast cancer classes.
Figure 4 is a flow chart illustrating an exemplary process for image analysis with deep learning to predict breast cancer classes.
Figure 5 illustrates a multiple instance (MI) augmentation technique for training MI learning methods with a convolutional neural network.
Figure 6 illustrates an overview for an example MI learning technique.
Figure 7 shows classification accuracy using mean aggregation as the number of instances increase.
Figures 8a-8e depict visualizations of instance predictions for a sample with various ground truth labels.
Figures 9a-9b depict visualizations of predicted heterogeneity for various classes.
Figure 10 depicts various deep canonical correlation analysis (CCA) based architectures.
Figure 11 depicts classification accuracy for different CCA methods.
Figure 12 shows t-SNE plots for CCA methods.
DETAILED DESCRIPTION
Image-based features of breast cancers have an important role in clinical prognostics. For example, tumor grade is strongly associated with survivorship, even among tumors with other favorable prognostic features such as estrogen receptor positivity [1]. However, major advances in prognostication over the past decade have relied predominantly on molecular methods [2-4]. These methods are costly and are not routinely performed on all clinical patients who could benefit from advanced molecular tests. Methods for identifying patients who are likely to benefit from further molecular testing are needed.
Image analysis of H&E stained images could identify patients most likely to benefit from genomic testing. Several previous studies have utilized automated processing of H&E stained breast tumors to identify image features associated with survival. These approaches have largely focused on hand-crafted, user-designed features, such as statistics of shape and color, to capture cell by cell morphology, which are difficult to adapt to new data sets [5,6]. Prior work on automated grading addresses mitotic count [7], nuclear atypia [8], and tubule formation [9] individually; however, the latter two may require a time-consuming nuclear segmentation that is also difficult to adapt to new data sets. Feature learning on small image patches to identify novel features associated with survival has shown the utility of somewhat more complex features for breast [10] and other cancers [11, 12], but the focus of that work still remains on smaller-scale properties due to their use of small image patches. None of these approaches is able to capture larger scale features, such as tissue architecture, or properties that are too complex for humans to capture. These abstract features could provide unforeseen insights into prognostics.
Deep learning is a method of learning a hierarchy of features where the higher level concepts are built on the lower level ones. Automatically learning these abstract features enables the system to learn complex functions mapping an input to an output without the need for hand-crafted features. Significant advances in this area have begun to show promise for tumor detection [13], metastatic cancer detection in lymph nodes [14], mitosis detection [7, 15], tissue segmentation [16], and segmentation and detection of a number of tissue structures [17]. However, all of the previous successes of deep learning from H&Es have focused on detecting image-based properties that pathologists can routinely assess visually. Using deep learning to predict complex properties that are not visually apparent to pathologists, such as receptor status, intrinsic subtype or even risk of recurrence, has not been previously described.
We hypothesized that a deep learning method for image analysis could be applied to classify H&E-stained breast tumor tissue microarray (TMA) images with respect to histologic and molecular features. We used TMA images from the population-based Carolina Breast Cancer Study Phase 3 (2008-2013) to perform deep learning-based image analysis aimed at capturing larger scale and more complex properties including tumor grade, histologic subtype, ER status, intrinsic breast cancer subtype and a risk of recurrence score (ROR-PT) [2].
RESULTS
Training and test sets were established from a random division of the data using TMA cores from 2/3 (n=571 ) and 1/3 (n=288) of the eligible CBCS3 patients, respectively. There were no significant differences between the training and the test sets concerning patient or tumor characteristics (Table 1 ).
Across multiple 1.0-mm cores per patient, the probability of a tumor being classified as high grade by image analysis was calculated, and Figure 1 shows that a bimodal distribution of probabilities was observed. By establishing a cut point at >0.80, high grade tumors were detected with accuracy of 82% in the test set (kappa 0.64) (Figures 1a and 1b, Table 2).
Considering low/intermediate as a group, the percent agreement with pathologist-classified tumor grade was slightly lower than the percent agreement between two breast pathologists who independently reviewed these same patients (overall 89%, kappa 0.78). Tumors with pathologist-defined intermediate grade were more likely to be misclassified as high grade tumors by image analysis (37%), while only 7% of low-grade tumors were misclassified (results not shown). When comparing the misclassification of intermediate grade and low-grade tumors as high grade between two pathologists in a subset of CBCS tumors, errors in classification of intermediate grade tumors as high grade tumors occurred <10% of the time and never occurred for low grade tumors (results not shown).
Image analysis accuracy for predicting molecular characteristics was also high. Accuracy for ER status was 84% (kappa 0.64) and both sensitivity (88%) and specificity (76%) were high (Table 3).
However, tumor grade is strongly associated with ER status in most patient populations, and we were interested in increasing accuracy among patients with low-to-intermediate grade tumors where genomic testing is most likely to influence patient care. Thus, we also employed a training strategy that weighted samples to ensure that low and intermediate grade distributions were similar between ER-positive and ER-negative tumors. This reduced accuracy among high grade tumors (from 77% to 75%) and among low-intermediate grade tumors (from 91% to 84%). Using the same weighting strategy, we trained a classifier to predict Basal-like vs. non-Basal-like (Luminal A, Luminal B, HER2, Normal-like combined) PAM50 subtype (Table 4).
The classifier had overall accuracy of 77%, but accuracy of 85% among low-intermediate grade tumors and 70% among high grade tumors.
To examine the potential clinical relevance of using this image analysis technique, we determined the sensitivity and specificity of image analysis for predicting whether a tumor is classified as having a high vs. low-medium risk of recurrence score (ROR-PT) (Table 4). ROR-PT is determined using a combination of tumor information including PAM50 subtype, tumor proliferation, and tumor size [2]. Overall, the accuracy of image analysis for ROR-PT was high at 76% (kappa 0.47). In grade-stratified analyses, accuracy for ROR-PT was higher among low-intermediate grade tumors (86%) than high grade tumors (67%).
In addition to using image analysis to predict tumor grade, we also tested this approach using histologic subtype, another visual feature of the tumor (Table 4). Image analysis was able to predict a lobular compared to ductal tumor with 94% accuracy (kappa 0.66). The accuracy was slightly lower when restricted to low grade tumors (89%), but was non-estimable among high grade tumors as there were no high grade lobular tumors in the test set.
To evaluate which clinical factors were associated with the accuracy of the image-based metrics, we evaluated predictors of accurate/inaccurate ER status calls (Supplemental Table 1 ) among patients in the test set (n=288).
Supplemental Table 1. Patient and tumor characteristics associated with inaccuracy of predicted ER status from the test set (n=288)

Variable                  Inaccurate N (%¹)   Accurate N (%¹)   OR (95% CI)         Chi-squared p-value
Age                                                                                 0.72
  <50 years               23 (30.4)           110 (27.5)        Ref
  >50 years               20 (69.7)           135 (72.4)        0.71 (0.37-1.36)
Race                                                                                0.22
  White                   25 (84.3)           125 (77.6)        Ref
  Black                   18 (15.7)           120 (22.4)        0.75 (0.39-1.45)
Grade                                                                               0.17
  Low-Intermediate        19 (55.4)           143 (68.7)        Ref
  High                    24 (44.6)           101 (31.3)        1.79 (0.93-3.44)
  Missing                 0                   1
Stage                                                                               0.49
  I, II                   39 (85.8)           220 (91.1)        Ref
  III, IV                 4 (14.2)            25 (8.9)          0.90 (0.30-2.74)
Node Status                                                                         0.53
  Negative                32 (73.8)           159 (68.2)        Ref
  Positive                11 (26.2)           86 (31.8)         0.64 (0.31-1.32)
Tumor Size                                                                          0.38
  <2 cm                   25 (60.0)           149 (68.6)        Ref
  >2 cm                   18 (40.0)           96 (31.4)         1.12 (0.58-2.16)
IHC-based ER Status                                                                 0.07
  Negative                19 (37.8)           72 (20.2)         Ref
  Positive                24 (62.2)           173 (79.8)        0.53 (0.27-1.02)
IHC-based Ki67 Status                                                               0.24
  <10%                    22 (56.1)           154 (67.8)        Ref
  >10%                    21 (43.9)           91 (32.2)         1.61 (0.84-3.10)
Mitotic Grade                                                                       0.20
  1                       14 (40.1)           116 (58.3)        Ref
  2                       9 (20.9)            33 (12.4)         2.26 (0.90-5.68)
  3                       20 (39.0)           95 (29.3)         1.74 (0.84-3.64)
  Missing                 0                   1
Intrinsic Subtype                                                                   0.41
  Luminal A               5 (25.1)            69 (50.6)         Ref
  Luminal B               8 (26.5)            25 (20.0)         4.42 (1.32-14.77)
  Basal-like              9 (32.4)            40 (19.8)         3.10 (0.97-9.91)
  HER2                    2 (12.4)            13 (4.8)          2.12 (0.37-12.14)
  Normal-like             2 (3.6)             7 (4.7)           3.94 (0.64-24.2)
  Missing                 17                  91

¹ All percentages weighted for sampling design.
Considering age, race, grade, stage, lymph node status, ER status, Ki67 status, and mitotic tumor grade, no significant differences in accuracy of image-based ER assignment were observed. However, we found that image analysis tended to inaccurately predict ER status when tumors were Luminal B [OR (95% CI): 4.42 (1.32-14.77)].
We gained further insight into the performance of our method by examining the class predictions across cores from the same patient and within each core. Figure 2 shows four cores from a single patient, along with the class predictions over different regions of the image. While three cores are predicted ER negative and Basal-like intrinsic subtype, the fourth is predicted mostly ER negative and non-Basal-like, indicating that some intra-tumoral heterogeneity might be present between cores.
DISCUSSION
In this study, we used a deep learning approach to conduct image analysis on H&E stained breast tumor tissue microarray samples from the population-based Carolina Breast Cancer Study, Phase 3 (2008-2013). Further details on the image analysis techniques are given in the Methods section. First, we found that the agreement between image analysis and the pathologist-classified grade was only slightly lower than that observed for two study pathologists, and we obtained high agreement and kappa values. Second, we found that ER status, RNA-based molecular subtype (Basal-like vs. non-Basal like), and risk of recurrence score (ROR-PT) could be predicted with approximately 75-80% accuracy. Further, we found the image analysis accuracy to be 94% for ductal vs. lobular histologic subtype.
Previous literature based on comparing two pathologists shows that image assessment is subject to some disagreement [18], particularly among the intermediate-grade tumors, as we observed between the image analysis and pathologist classification in our study. Other groups have reported inter-rater kappa statistics of 0.6-0.7 for tumor grade [18,19], in line with both our inter-pathologist agreement and image analysis vs. pathologist agreement for grade. Elsewhere in the literature lower kappa values around 0.5 have been reported between pathologists for histologic grade [20]. In light of this inherent variability in image assessment, deep learning-based image analysis performed well at predicting tumor grade as low-intermediate vs. high using H&E images.
It is particularly promising that histologic subtype and molecular marker status could be predicted using image analysis. While we did perform grade-weighting within ER classification, there may be other image features of ER positive tumors that are not readily discernible and are driving the higher accuracy of ER positive images over ER negative. Agreement between true ER status (by IHC) vs. image analysis (kappa 0.64) was slightly lower than that observed for centralized pathology and SEER classifications for ER status (kappa 0.70) [21] and is similar to reports of agreement between different IHC antibodies for ER that show substantial agreement (kappa 0.6-0.8) [22]. Previous work with CBCS phase 1 samples found that agreement between medical records and staining of tissues was also similar (kappa of 0.62) [23]. Overall, the agreement between IHC-based ER status and image analysis predictions based on H&E stained images is similar to estimates for comparing ER status classification in the literature. The high rate of agreement between pathologist-scored and image analysis based histologic subtype was also compelling (kappa 0.64). Together these results suggest that some latent features indicative of underlying tumor biology are present in H&E images and can be identified through deep learning-based approaches.
We observed high accuracy of image analysis to predict ductal versus lobular histologic subtype. The high accuracy may be due to the arrangement of epithelial and stromal cells characteristic of ductal and lobular tumors, whereby lobular tumors are characterized by non-cohesive single file lines of epithelial cells infiltrating the stroma and ductal tumors are characterized by sheets or nests of epithelial cells embedded in the surrounding stroma [24,25]. We speculate that the high contrast staining between the epithelial and stromal components resulting from H&E staining strengthens the ability of image analysis to predict this biologic feature of the tumor.
With respect to intrinsic PAM50 subtype based solely upon gene expression values, previous studies have not evaluated image-based analysis for predicting intrinsic subtype or the risk of recurrence using a score-based method, ROR-PT [2]. A few previous studies have evaluated the clinical record or a central immunohistochemistry laboratory vs. RNA-based subtyping for Basal-like vs. non-Basal-like. Even considering two molecular comparisons, agreements do not exceed 90%. For example, Allott et al. (2016) found approximately 90% agreement between Basal-like status for IHC-based vs. RNA-based assessment and 77% agreement for classification of Luminal A subtype [26]. Our estimates are similar, suggesting that image analysis, even without the use of special IHC stains, could be a viable option for classification of molecular breast tumor subtype and ROR-PT from H&E stained images.
As with other studies, our work should be viewed in light of some limitations. Our sample size was limited in our testing set to 288 patients, but this resulted in nearly 1,000 TMA cores available for use in our image analysis. Using a larger set of samples with data on RNA-based subtype to balance training for each predictor could be useful. For example, the fact that Luminal B patients had a higher error rate might suggest there are some features of Luminal B breast cancers that are distinct and image-detectable, and a larger sample size would be helpful in identifying these. Deep learning may be utilizing these features, but in our small sample set, we are unable to tune our data to specifically identify those features or to clarify what they are in intuitive language. Additionally, the use of binary classification systems for training our digital algorithms (e.g., Basal-like vs. non-Basal-like) does not allow us to differentiate among all five RNA-based intrinsic subtypes. Currently, U.S. based genomic tests provide continuous risk scores, but also suggest relevant cut points that in essence make these assays almost a binary classification; thus, binary classification may have some utility in the current clinical context. However, future work should extend these approaches to multi-class classification. Furthermore, improved results may be obtained by fine-tuning the convolutional neural network for breast cancer H&E image classification.
Image-based risk prediction has potential clinical value. Gene expression data on tumor tissue samples is not uniformly available for all patients and is costly to obtain in both a clinical and epidemiologic setting. These results suggest that tumor histology and molecular subtype along with the risk of recurrence (ROR-PT) can be predicted from H&E images alone in a high-throughput, objective, and accurate manner. These results could be used to identify patients who would benefit from further genomic testing. Furthermore, even ER testing is not routinely performed in countries with limited laboratory testing resources, and predicting ER status by morphologic features may have utility for guiding endocrine therapy in low-resource settings.
METHODS
Sample Set. The training and test sets were both comprised of participants from the Carolina Breast Cancer Study (CBCS), Phase 3 (2008-2013). Methods for CBCS have been described elsewhere [27]. Briefly, CBCS recruited participants from 44 of the 100 North Carolina counties using rapid case ascertainment via the North Carolina Central Cancer Registry. After giving informed consent, patients were enrolled under an Institutional Review Board protocol that maintains approval at the University of North Carolina. CBCS eligibility criteria included being female, a first diagnosis of invasive breast cancer, aged 20-74 years at diagnosis, and residence in specified counties. Patients provided written informed consent to access tumor tissue blocks/slides and medical records from treatment centers.
The training and test sets were formed by a random partition of the data. The total number of patients available for the training and test set from CBCS3 was 1,203. These patients were divided into a group of 2/3 (n=802) for the training set and 1/3 (n=401) for the test set. Of the 802 patients available for the training set, 571 had H&E images and biomarker data available for contribution to the training set. Of the 401 patients eligible for the test set, 288 had H&E images and biomarker data available. Patients in the final training and test sets had information for tumor grade and histologic subtype, determined via centralized breast pathologist review within CBCS, along with biomarker data for ER status, PAM50 intrinsic breast cancer subtype, and risk of recurrence (ROR-PT) where noted. The H&E images were taken from tissue microarrays constructed with 1-4 1.0-mm cores for each patient, resulting in 932 core images for the test set analysis presented here. ER status for each TMA core was determined using a digital algorithm as described by Allott et al. (2016) [28] and was defined using a ≥10% positivity cut point for immunohistochemistry staining.
Tumor Tissue Microarray Construction. As has been described in detail by Allott et al. (2016), tumor tissue microarrays were constructed for CBCS3 participants with available paraffin-embedded tumor blocks [26]. The CBCS study pathologist marked areas of invasive breast cancer within a tumor on H&E stained whole slide images. The marked areas were selected for coring and 1-4 tumor tissue cores per participant were used in the TMA construction at the Translational Pathology Laboratory at UNC. TMA slides were H&E stained and images were generated at 20x magnification. Cores with insufficient tumor cellularity were eliminated from the analysis.
Molecular Marker Data. In CBCS3, Nanostring assays were carried out on a randomly sampled subset of available formalin fixed paraffin embedded (FFPE) tumor tissue cores. RNA was isolated from two 1.0-mm cores from the same FFPE block using the Qiagen RNeasy FFPE kit (catalogue # 73504). Nanostring assays, which use RNA counting as a measure of gene expression, were conducted. RNA-based intrinsic subtype was determined using the PAM50 gene signature described by Parker et al. (2009) [2]. Based on the highest Pearson correlation with a subtype-defined centroid, each tumor was categorized into one of five intrinsic subtypes (Luminal A, Luminal B, HER2, Basal-like, Normal-like), using the 50 gene PAM50 signature [27]. Categorizations were based on a previously validated risk of recurrence score, generated using PAM50 subtype, tumor proliferation, and tumor size (ROR-PT), with a cutoff for high of 64.7 from the continuous ROR-PT score [2].
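To make the subtype assignment step concrete, the following is a minimal sketch of nearest-centroid classification by Pearson correlation, as described above; the `centroids` mapping and the function name are illustrative placeholders and not the published PAM50 implementation.

```python
# Sketch: assign an intrinsic subtype by the highest Pearson correlation between a
# tumor's 50-gene expression profile and each subtype centroid.
# `centroids` is an assumed mapping from subtype name to a 50-gene centroid vector.
import numpy as np

def assign_subtype(expression, centroids):
    corr = {name: np.corrcoef(expression, c)[0, 1] for name, c in centroids.items()}
    return max(corr, key=corr.get)   # subtype with the largest correlation
```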
Image Analysis Pre-processing and Feature Extraction. Color and intensity normalization was first applied to standardize the appearance across core images, countering effects due to different stain amounts and protocols, as well as slide fading [29]. The resulting stain intensity channels were then used as input to the rest of our algorithm. Most automated analyses of histology images use features that describe the properties of cells such as statistics of shape and color [5], [30-32]. Such features are focused on cell-by-cell morphology and do not adapt well to new data sets. We instead captured tissue properties with a Convolutional Neural Network (CNN), which has been shown to be more successful for classification tasks on histology [16,33]. These multi-layered networks consist of convolution filters applied to small patches of the image, followed by data reduction or pooling layers. Similar to human visual processing, the low level filters detect small structures such as edges and blobs. Intermediate layers capture increasingly complex properties like shape and texture. The top layers of the network are able to represent object parts like faces or bicycle tires. The convolution filters are learned from data, creating discriminating features at multiple levels of abstraction. There is no need to hand craft features. We used the VGG16 architecture (configuration D) [34] that was pre-trained on the ImageNet data set, which consists of 1.2 million images from 1000 categories of objects and scenes. Although ImageNet contains a vastly different type of image, CNNs trained on this data set have been shown to transfer well to other data sets [35, 57-58], including those from biomedical applications [14, 59]. The lower layers of a CNN are fairly generic, while the upper layers are much more specialized. The lower layers only capture smaller-scale features, which do not provide enough discriminating ability, while the upper layers are so specific to ImageNet that they do not generalize well to histology. Intermediate layers are both generalizable and discriminative for other tasks. In transferring to histology, we search for the layer that transfers best to our task. Output from each set of convolutional layers, before max pooling, was extracted over each image at full resolution to form a set of features for the image. Output from the fourth set of convolutional layers was chosen because it performed better than the outputs from other layers. The fourth set of convolutional layers outputs features of dimension 512. These lower CNN layers are convolutional, meaning that they can be run on any image size. For an image size of 2500x2500, they produce a grid of 284x284x512 features.
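As an illustration of this feature extraction step, the following sketch pulls intermediate features from an ImageNet-pretrained VGG16 using torchvision; the layer index chosen for the fourth convolutional block and the weights enum are assumptions tied to recent torchvision releases, not the study's original code.

```python
# Sketch: extract intermediate VGG16 features from a full-resolution, stain-normalized
# core image. features[:23] covers convolutional blocks 1-4, stopping before the
# fourth max-pooling layer; its output has 512 channels.
import torch
import torchvision

vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
conv4 = torch.nn.Sequential(*list(vgg.features.children())[:23]).eval()

@torch.no_grad()
def extract_features(image):              # image: (3, H, W) float tensor
    fmap = conv4(image.unsqueeze(0))      # (1, 512, ~H/8, ~W/8) feature grid
    return fmap.squeeze(0)                # one 512-d feature vector per grid location
```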
Model Training and Training Data Sets. In training a model to predict the class or characteristic group of a tumor, such as high or low grade, we utilize patient-level labels. The TMA images may be much larger than the required input to the VGG16 CNN (e.g., typically 2500x2500 pixels for TMA spots vs. 224x224 for VGG16). Further, applying the original CNN fully convolutionally would produce features that are not generalizable to histology. Thus some modifications to the VGG16 approach are necessary. A new classifier may be trained to operate on the intermediate level features from VGG16. Simply taking the mean of each feature over the image would limit our insight into which parts of the image contributed to the classification. The patient-level labels are weak compared to detailed patch- or pixel-level annotations used in most prior work, necessitating a different classification framework called multiple instance learning. In this setting, we were given a set of tumors, each containing one or more image regions. We were given a label for each tumor: tumor grade (pathologist determined), ER status (IHC-based), PAM50 intrinsic subtype (50 gene expression-based), ROR-PT (gene expression-based), or histologic subtype (pathologist determined). Due to the diverse appearance of tissue in a single image, learning the model with the patient label applied to every image region did not perform well in initial experiments. Heterogeneity of image region labels in each image is instead accounted for while training the model.
In order to account for intra-tumor heterogeneity, a probabilistic model was formed for how likely each image region is to belong to each class, with these probabilities aggregated across all image regions to form a prediction for the tumor as a whole. Image regions were generated as 800x800 pixel regions in the training images, with the mean of each CNN feature computed over the region. A linear Support Vector Machine (SVM) [39] calibrated with isotonic regression [40] was used to predict the probability for each region. Isotonic regression fits a piecewise-constant non-decreasing function, transforming the distance from the separating hyperplane learned by the SVM to a probability that an image region belongs to each class. This assumes that the SVM can rank image regions accurately and only needs the distances converted to probabilities. Each image region was labeled with the class of the tumor from which it was taken. The data for model fitting and calibration may be disjoint, so cross-validation was used to split the training instances into five equal-sized groups, where four were used for training and the remaining one for calibration/validation (the test set remains untouched). For each fold, an SVM was learned on the training set and calibration was learned on the calibration set with isotonic regression, thus forming an ensemble. An ensemble of size five was selected to balance the desirability of a large training set, a reasonably sized validation set, and the simultaneous desirability of limiting the computation time. Predictions on the test set were made by averaging probabilities from the five models. This ensemble method also helped to soften any noise in the predictions caused by incorrect image region labels due to heterogeneity. Predictions for tumors were made by first forming a quantile function (inverse cumulative distribution) of the calibrated SVM ensemble predictions for the image regions using 16 equally spaced quantiles from images in the training set. The quantiles of the training images were used to train another linear SVM to predict the class label for the whole tumor, with sigmoid calibration transforming the SVM output into probabilities. This method allowed predictions to be made for individual image regions, while also aggregating to overall tumor predictions.
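The two-stage classifier described above can be approximated with scikit-learn, as in the sketch below; the variable names (X_regions, y_regions, groups) and hyperparameters are illustrative, and CalibratedClassifierCV is used here as a stand-in for the five-fold SVM/isotonic-regression ensemble rather than the exact study code.

```python
# X_regions: (n_regions, 512) mean CNN features per 800x800 region
# y_regions: label of the tumor each region came from; groups: tumor id per region
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Stage 1: instance-level linear SVM calibrated with isotonic regression via 5-fold CV,
# which yields an ensemble of five (SVM, calibration) pairs whose probabilities are
# averaged at prediction time.
instance_clf = CalibratedClassifierCV(LinearSVC(C=1.0), method="isotonic", cv=5)
instance_clf.fit(X_regions, y_regions)

def quantile_features(probs, n_quantiles=16):
    """Summarize a tumor's region probabilities with 16 equally spaced quantiles."""
    qs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(probs, qs)

# Stage 2: tumor-level linear SVM on the quantile functions, with sigmoid calibration.
tumor_ids = np.unique(groups)
Q = np.stack([quantile_features(instance_clf.predict_proba(X_regions[groups == t])[:, 1])
              for t in tumor_ids])
y_tumor = np.array([y_regions[groups == t][0] for t in tumor_ids])
tumor_clf = CalibratedClassifierCV(LinearSVC(C=1.0), method="sigmoid", cv=5)
tumor_clf.fit(Q, y_tumor)
```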
When training the previously described SVM classifiers, we initially weighted each class, including tumor grade, ER status, and Basal-like vs. non-Basal-like intrinsic subtype, equally. To reduce the leverage of grade in predicting ER status and intrinsic subtype, sample weighting was applied using weights inversely proportional to the number of samples in the group, e.g., low grade class 1 , low grade class 2, high grade class 1 , and high grade class 2 were each weighted equally, where the classes are the ER status, histologic subtype, or intrinsic subtype.
Prediction in Test Sets. At test time, 800x800 pixel overlapping regions with a stride of 400 pixels were used as image regions from each TMA spot that is typically 2500 pixels in diameter. Only image regions containing at least 50% tissue within the core image field of view (i.e. 50% tissue, 50% glass) were used. The calibrated SVM ensemble predicted the class of each image region by assigning a probability of belonging to one of two classes (tumor grade 1 or 3, ER+ or ER-, Basal-like or non-Basal-like subtype, ductal or lobular histologic subtype, and low-med or high ROR-PT). The probabilities computed on the image regions from all cores were aggregated into a quantile function and the second SVM was used to predict the class for the whole tumor.
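A minimal sketch of this test-time tiling step follows, assuming a binary tissue mask of the same size as the core image; the function and parameter names are illustrative.

```python
# Sketch: tile a TMA core into overlapping 800x800 regions with stride 400 and keep
# only regions that are at least 50% tissue according to the foreground mask.
import numpy as np

def tile_regions(image, tissue_mask, size=800, stride=400, min_tissue=0.5):
    regions = []
    h, w = tissue_mask.shape
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            window = tissue_mask[top:top + size, left:left + size]
            if window.mean() >= min_tissue:            # at least 50% tissue, at most 50% glass
                regions.append(image[top:top + size, left:left + size])
    return regions
```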
Image-based Classification. Cut points were determined for each tumor characteristic based on the achievement of optimal sensitivity, specificity, and accuracy of each core being correctly classified relative to the pathology or biomarker data. To classify tumor grade, image analysis assigned a probability score of being a high grade vs. low grade tumor for each image. A cut point of greater than 0.80 was used for high grade tumors (Figure 1 a). Independently, traditional pathologist scoring methods were used to classify tumors as a combined grade of low, intermediate, or high. Also, two independent pathologists’ classifications of tumor grade for the same tissue sample were assessed to compare the agreement between two pathologists to that observed for image analysis versus pathologist classification. To classify patients as ER positive based on image analysis, the same principles were used as those described for tumor grade where each core was assigned a probability of ER-positivity. A probability of greater than 0.50 was classified as ER-positive by image analysis. To classify patients as ER positive based on biomarker data, samples had to have 10% or more of nuclei stained positive for ER by immunohistochemistry. For Basal-like vs. non-Basal-like RNA-based subtype, image analysis assigned a probability of each image being Basal-like and a probability cut point of >0.60 was used to classify Basal-like vs. non-Basal-like tumors. These results were compared against the PAM50-based intrinsic subtype classification methods using gene expression described previously [2] Similarly, we used image analysis to predict whether a tumor had a high or low-medium risk of recurrence. Image analysis predicted ROR-PT was based on a cut point of 0.20 for the probability of each TMA spot being classified as high ROR-PT. Histologic subtype was restricted to ductal and lobular tumors and was based on a cut point of 0.1 for the probability of each TMA spot being classified as lobular.
Prediction Accuracy and Associations with Clinical Characteristics. For core-level comparisons, image region probabilities of being a high grade tumor, ER positive, Basal-like subtype, lobular subtype, or high ROR-PT were aggregated to the core level. For each variable, sensitivity, specificity, accuracy and kappa statistics (95% confidence interval [95% CI]) were determined comparing the image analysis classification to tumor grade for the tumor tissue as a whole, IHC-based ER status for each corresponding TMA core (ER positivity is available for each core rather than just for the whole tumor tissue), PAM50 subtype for the tumor tissue as a whole, histologic subtype for the tumor tissue as a whole, and ROR-PT for the tumor tissue as a whole. Accurate classification was defined as identical classification based on histologic image analysis and biomarker data for the same core. To determine whether any clinical characteristics were associated with an inaccurate image-based call for ER status, we estimated odds ratios (ORs) and 95% confidence intervals (95% CI) for the association between patient characteristics and the accuracy of ER status (i.e., concordant with clinical status vs. discordant with clinical status) (Supplemental Table 1). All statistical analyses were done in SAS version 9.4 (SAS Institute, Cary, NC). P-values were two-sided with an alpha of 0.05.
Figure 3 is a block diagram of a computing platform with a CNN-based image classifier for predicting breast cancer classes, where the classes include tumor grade, ER status, histologic subtype, and intrinsic subtype through image analysis. Figure 4 is a flow chart of using the CNN-based image classifier in Figure 3 to predict breast cancer classes. Referring to Figure 3, a test image 300 is provided as input to a CNN-based image classifier 302. Test image 300 may be an H&E stained histologic image as described above. CNN-based image classifier 302 may include a convolutional neural network 304, such as the VGG16 network, trained using the steps described above. CNN-based image classifier 302 may be implemented on a computing platform 306 including at least one processor 308 and memory 310.
Referring to Figures 3 and 4, prior to inputting image data into CNN 304, in step 400, instances are generated from test image 300. In one example, generating instances may include dividing test image 300 into regions of a predetermined size. In the example described above, the predetermined size is 800x800 pixels.
In step 402, features are extracted from each instance. Extracting features may include providing each image instance as input to CNN 304, which generates features as output for each image instance.
In some embodiments the CNN may be trained using multiple instance (MI) learning and/or MI aggregation. For example, the CNN may be trained end-to-end using MI learning. In this example, the CNN may be trained using an MI aggregation technique that aggregates predictions from smaller regions of an image into an image-level classification by using the quantile function. Additional details regarding MI learning, MI aggregation, and/or related aspects are discussed below. In some embodiments, the CNN may be trained using genomic data sets and a task-based canonical correlation analysis (CCA) method. For example, a CNN component may be configured to use imaging data and a related genomic data set along with a task-based CCA method (e.g., a task-optimal CCA method) to project data from the two sources to a shared space that is also discriminative. In this example, the CNN component may be trained for use in improving accuracy when identifying or extracting features from image instances during run-time. Additional details regarding a task-based CCA method and/or related aspects are discussed below.
In step 404, image classes are predicted for each instance. To predict the image classes for each instance, the CNN-generated features for each instance are input into an instance support vector machine 312, which outputs a probability as to whether the instance is a member of each class. For example, for each of the image instances, instance SVM 312 may output probabilities as to whether the instance belongs to one or the other of the binary classes for tumor grade (high grade or low grade), ER status (ER positive or ER negative), histologic subtype (ductal or lobular), risk of recurrence (high versus low), and intrinsic subtype (Basal-like or non-Basal-like).
In step 406, the instance predictions are aggregated to produce predictions for the entire image. The probabilities computed for each image instance by instance SVM 312 are aggregated into a quantile function 314, and an aggregation SVM 316 that operates on quantile function 314 is used to generate probabilities of each class for the entire image.
In step 408, tumor class is predicted. Using the probabilities generated in step 406, CNN-based image classifier 302 generates output as to whether the tumor is in one or the other of the above-described binary classes. To assign the tumor to one or the other of each of the binary classes, cut points or threshold values may be assigned for each class. The probability computed for each class may then be compared to the cut point or threshold to determine whether or not the tumor is a member of the class. For example, as described above, the cut point for classifying a tumor as high grade may be 0.8. Accordingly, if the probability computed for tumor grade for a given image is 80% or higher, the tumor may be classified as high grade. Thus, using deep learning for class predictions and instance aggregations, tumors can be classified from image analysis alone to identify the need for further diagnostic tests (such as genomic tests) and/or treatment.
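For illustration, the cut points described in the Image-based Classification subsection can be collected and applied to the aggregated tumor-level probabilities as in the following sketch; the dictionary keys are hypothetical names, and the threshold values follow those reported above.

```python
# Sketch: apply per-class probability cut points to tumor-level probabilities.
CUT_POINTS = {"high_grade": 0.80, "er_positive": 0.50, "basal_like": 0.60,
              "high_ror_pt": 0.20, "lobular": 0.10}

def classify_tumor(probabilities):
    """probabilities: dict mapping class name -> aggregated tumor-level probability."""
    return {name: probabilities[name] >= cut for name, cut in CUT_POINTS.items()}
```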
MULTIPLE INSTANCE (MI) LEARNING INTRODUCTION
MI learning with a convolutional neural network enables end-to-end training in the presence of weak image-level labels. We propose a new method for aggregating predictions from smaller regions of the image into an image-level classification by using the quantile function. The quantile function provides a more complete description of the heterogeneity within each image, improving image-level classification. We also adapt image augmentation to the MI framework by randomly selecting cropped regions on which to apply MI aggregation during each epoch of training. This provides a mechanism to study the importance of MI learning. We validate our method on five different classification tasks for breast tumor histology and provide a visualization method for interpreting local image classifications that could lead to future insights into tumor heterogeneity.
Deep learning has become the standard solution for classification when a large set of images with detailed annotations is available for training. When the annotations are weaker, such as with large, heterogeneous images, we turn to MI learning. The image (called a bag) is broken into smaller regions (called instances). We are given a label for each bag, but the instance labels are unknown. Some form of pooling aggregates instances into a bag-level classification. By integrating MI learning into a convolutional neural network (CNN), we can learn an instance classifier and aggregate the predictions so the entire system is trained end-to-end [47, 52, 45].
We propose a more general approach for aggregating instance predictions that looks at the full distribution by pooling with the quantile function (QF) and learning how much heterogeneity to expect for each class. As data augmentation is especially useful in training large CNNs, we also created an augmentation technique for training MI methods with a CNN (see Figure 5). Through MI augmentation, we study the importance of the MI formulation during training. Using MI learning to make class predictions over smaller regions of the image provides insight into how different parts of the image contribute to the classification. Visualizing the instance predictions provides a method of interpretability that we demonstrate on a data set of breast tumor tissue microarray (TMA) images stained with hematoxylin and eosin (H&E) by predicting grade, receptor status, and subtype. Some of these tasks are not previously known to be achievable from H&E alone. Our quantitative results conclude that the MI component may be very useful for successful classification, demonstrating the importance of accounting for heterogeneity. This method could provide future insights into tumor heterogeneity and its connection with cancer progression [43, 49].
Figure 5 depicts MI augmentation. In MI learning, each bag contains one or more instances. Labels are given for the bag, but not the instances. MI augmentation is a technique to provide additional training samples by randomly selecting a cropped image region and the instances within it. When the bag label is applied to a small number of instances, it is weak because this small region may not be representative of the bag class. Applying the bag label to larger cropped regions provides a stronger label, while still providing benefit from image augmentation. Training with the whole image maximizes the opportunity for MI learning, but restricts the benefits of image augmentation. At test time, the whole image is processed and the predictions from all instances are aggregated into a bag prediction.
Contributions. 1) A more general MI aggregation method that uses the quantile function for pooling and learns how to aggregate instance predictions. 2) An MI augmentation technique for training MI methods. 3) Exploration of single instance and MI learning on a continuous spectrum, demonstrating the importance of MI learning on heterogeneous images. 4) Evaluation on a large data set of 1713 patient samples (5970 images), showing significant gains in classifying breast cancer TMAs. 5) A method for visualizing the predictions of each instance, providing interpretability to the method.
Aggregating Instance Predictions. A permutation invariant pooling of instances is needed to accommodate images of different sizes, which a fully connected neural network cannot. Existing pooling approaches are very aggressive; they compute a single number rather than looking at the distribution of instance predictions. Most MI applications use the maximum, which works well for problems such as cancer diagnosis where, if there is a small amount of tumor, the sample is labeled as cancerous [46, 56]. A smooth approximation, such as the generalized mean or noisy-OR, provides better convergence in a CNN [47, 52, 45]. For other tasks, a majority vote, median, or mean is more appropriate. We include more of the distribution by pooling with the QF and learning a mapping to the bag class prediction, improving the classification accuracy. Our proposed method of quantile aggregation learns how to predict the bag class from instance predictions and so could provide a solution when the most suitable aggregator is unknown. The QF is a new general type of feature pooling that could provide an alternative to max pooling in a CNN.
Training MI Methods with a CNN. In some embodiments, image augmentation may be applied in training a CNN by randomly cropping large portions of each image during each epoch. At test time, the whole image is used. We propose MI augmentation, in which a subset of instances is randomly selected from each bag during each epoch. Instances may be the same size, but we choose how many instances to aggregate over. In selecting the number of instances, there may be two extremes: a single instance vs. the whole bag. In the former, the bag label is assigned to each instance and is often called single instance learning. In the latter, MI aggregation is incorporated while training the bag classifier as in other MI methods [41, 44]. Comparison studies have found little or no improvement from these MI methods on some data sets [54, 55]. We found MI learning to be very beneficial and show that it can be useful in dealing with heterogeneous data.
MULTIPLE INSTANCE LEARNING WITH A CNN
We denote a bag by X, its label by Y ∈ {1, 2, ..., C}, and the instances it contains by x_n for n = 1, ..., N. The instance labels y_n are unknown. On a novel sample, an instance classifier f_inst predicts the probability of each class c and a function f_agg aggregates these instance probabilities into a bag probability:

P(Y = c | X) = f_agg(f_inst(x_1), ..., f_inst(x_N)).
MI learning can be implemented with many different types of classifiers [41, 46, 54]. When implemented as a CNN, a fully convolutional network (FCN) forms the instance classifier f_inst, followed by a global MI layer for instance aggregation f_agg. The FCN consists of convolutional and pooling layers that downsize the representation, followed by a softmax operation to predict the probability for each class. For an input image of size w × w × 3, the FCN output is w_d × w_d × C. An instance is defined as the receptive field from the original image used in creating a point in this w_d × w_d grid; the instances are overlapping. The MI aggregation layer takes the instance probabilities and the foreground mask for the input image (downscaled to w_d × w_d), thereby aggregating over only the foreground instances.
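A minimal PyTorch sketch of this architecture is given below, using mean aggregation over foreground instances as defined in the aggregation subsection; the backbone, class names, and tensor shapes are illustrative rather than the exact network used in the experiments.

```python
# Sketch: FCN instance classifier producing a grid of per-class probabilities,
# followed by global MI aggregation over foreground instances only.
import torch
import torch.nn as nn

class MIClassifier(nn.Module):
    def __init__(self, backbone, n_features, n_classes):
        super().__init__()
        self.backbone = backbone                               # FCN trunk (conv + pooling)
        self.instance_head = nn.Conv2d(n_features, n_classes, kernel_size=1)

    def forward(self, image, fg_mask):
        # image: (B, 3, w, w); fg_mask: (B, 1, w, w) float foreground mask
        fmap = self.backbone(image)                            # (B, F, wd, wd)
        inst_probs = self.instance_head(fmap).softmax(dim=1)   # (B, C, wd, wd)
        mask = nn.functional.interpolate(fg_mask, size=fmap.shape[-2:])  # downscale mask
        # mean aggregation over foreground instances
        bag_probs = (inst_probs * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1)
        return inst_probs, bag_probs
```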
Figure 6 illustrates an overview for an example MI learning technique. During training, a cropped region of a given size is randomly selected. An FCN is applied to predict the class, producing a grid of instance predictions. The instance predictions are aggregated over the foreground of the image (as indicated by the foreground mask) using quantile aggregation to predict the class of the cropped image region. With a cross entropy loss applied, backpropagation then learns the FCN and aggregation function weights. At test time, the whole image is used.
MULTIPLE INSTANCE AGGREGATION
Instance predictions can be used to form a bag prediction in different ways. The bag prediction function should be invariant to the number and spatial arrangement of instances, so some pooling of predictions is needed. Mean aggregation is well suited for global pooling as it is permutation invariant and can incorporate a foreground mask for the input image. Denoting the mask as M and its value for each instance as m_n ∈ {0, 1}, the mean aggregation function is

P(Y = c | X) = ( Σ_{n=1}^{N} m_n f_inst(x_n)_c ) / ( Σ_{n=1}^{N} m_n ).
Mean pooling incorporates predictions from all instances, but a lot of information may be lost in compressing to a single number. A histogram is a more complete description of the probability distribution, but is dependent upon a suitable bin width. Alternatively, the QF (inverse cumulative distribution) represents the boundary points between fractions of the population, providing a better discretization [42]. We propose quantile aggregation to provide a more complete description of the instance predictions in a bag. If the instance predictions for class c are represented by S_c = {s_{1,c}, ..., s_{N,c}}, then the q-th Q-quantile is the value z such that Pr(S_c < z) = (q − 0.5)/Q. To pool with the QF, we first sort S_c and exclude instances not in the foreground, leaving the set S̃_c = {s̃_{1,c}, ..., s̃_{Ñ,c}}. The sorted values in S̃_c are used to extract the QF vector for each class c as z_c = [z_{1,c}, ..., z_{Q,c}], where

z_{q,c} = s̃_{j,c}  with  j = ⌈ Ñ (q − 0.5)/Q ⌉.
The QF vectors for all classes are concatenated as Z = [z_1, ..., z_C]. We then use a softmax function operating on Z to predict the bag class. The QF from all classes is used in order to learn the interaction of different subtypes in a bag. Backpropagation through the QF operates in a similar manner to max pooling by passing the gradient back to the instance that achieved each quantile.
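The following sketch implements quantile aggregation as a differentiable layer; reading the final softmax as a linear layer followed by softmax is an assumption, and the class and variable names are illustrative.

```python
# Sketch: QF pooling over foreground instance predictions. Sorting and indexing pass
# gradients back to the instances that achieved each quantile, analogous to max pooling.
import torch
import torch.nn as nn

class QuantileAggregation(nn.Module):
    def __init__(self, n_classes, n_quantiles=15):
        super().__init__()
        self.n_quantiles = n_quantiles
        self.bag_head = nn.Linear(n_classes * n_quantiles, n_classes)

    def forward(self, inst_probs):
        # inst_probs: (N, C) predictions for the N foreground instances of one bag
        n = inst_probs.shape[0]
        sorted_probs, _ = torch.sort(inst_probs, dim=0)                       # ascending
        q = (torch.arange(self.n_quantiles, dtype=torch.float32) + 0.5) / self.n_quantiles
        idx = (q * n).long().clamp(max=n - 1)                                 # quantile indices
        z = sorted_probs[idx]                                                 # (Q, C) QF vectors
        return self.bag_head(z.t().reshape(1, -1)).softmax(dim=1)            # bag class probabilities
```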
TRAINING WITH MULTIPLE INSTANCE AUGMENTATION
Image augmentation by random cropping is an important technique for creating extra training samples that helps to reduce over-fitting. We propose an augmentation strategy for MI methods to increase the number of training samples by randomly selecting a different subset of instances for each epoch. We randomly crop the image to select the set of instances, such that each crop contains at least 75% foreground according to the foreground mask. It is important to note that the image is never resized and the instance size remains constant. For each crop size chosen, the FCN is applied to the cropped image at full resolution. MI augmentation is a strategy used during training. As the MI aggregation layer is invariant to input size, the entire image and all its instances are always used at test time.
MULTIPLE INSTANCE EXPERIMENTS
Data Set. Our data set consists of 1713 patient samples from the Carolina Breast Cancer Study, Phase 3 [53]. There are typically four 1.0 mm cores per patient in the TMA, with a total of 5970 cores. Each core is selected from the H&E-stained whole slide by a pathologist such that it contains a substantial amount of tumor tissue. Each image has a diameter of around 2400 pixels and a maximum of 3500 pixels. One sample core is shown in Figure 6. We use a random subset of half the patients for training and the other half for testing. Classification accuracy is measured for five different tasks, some of them multiclass: 1) histologic subtype (ductal or lobular), 2) estrogen receptor (ER) status (positive or negative), 3) grade (1, 2, or 3), 4) risk of recurrence score (ROR) (low, intermediate, or high), 5) genetic subtype (basal, luminal A, luminal B, HER2, or normal-like). Ground truth for histologic subtype and grade are from a pathologist looking at the original whole slide. ER status is determined from immunohistochemistry, genetic subtype from the PAM50 array [51], and ROR from the ROR-PT score-based method [51].
Implementation Details. The TMA images are intensity normalized to standardize the appearance across slides [50]. The hematoxylin, eosin, and residual channels are extracted from the normalization process and used as the three channel input for the rest of our algorithm. A binary mask distinguishing tissue from background is also provided as input.
We use the pre-trained CNN AlexNet [48] and fine-tune with the MI architecture shown in Figure 6. All five tasks are equally weighted in a multi-task CNN as shared features help to reduce over-fitting. For each patient, ground truth labels are available for most tasks. The cross entropy loss is adjusted to ignore patients missing a label for a particular task.
In addition to MI augmentation, we can randomly mirror and rotate each training image. To accommodate the larger cropped image sizes in GPU memory, we can reduce the batch size. A typical image with tissue of diameter 2400 pixels produces a 68 × 68 grid of instances. After applying the foreground mask, there are roughly 3600 instances. Q = 15 quantiles are used in all experiments. There are typically four core images per patient; we assign the patient label to each during training and, at test time, take the mean prediction across the images. Further MI learning could be done to address the multiple core images per patient; however, our current focus is only on MI learning within each image.
MI Augmentation and the Importance of MI Learning. We study the effect of MI learning on large images by selecting the cropped image size for training. The smallest possible size is 227 × 227 (the input size for AlexNet), consisting of a single instance. When the bag label is applied to each instance during training, this is called single instance learning. Alternatively, a larger cropped region of size w × w can be selected; we test multiples of 500 up to 3500 and use mean aggregation in this experiment. By assigning the bag label to this larger cropped region during training and keeping the instance size constant, we perform MI learning. Multiple random crops are obtained from each training image such that roughly the same number of pixels is sampled for each crop size (e.g., the whole image for the largest crop size of 3500, and roughly 3500²/w² random crops for a training crop of size w). For the largest crop size, the whole image may be used without MI augmentation. Random mirroring and rotations are used for augmentation at all crop sizes. At test time, the whole image is always used, with the bag prediction formed by aggregating across all instances.
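A minimal sketch of this crop-sampling scheme follows, assuming a binary foreground mask; the retry logic is an implementation convenience not specified in the text, and the function name is illustrative.

```python
# Sketch: MI augmentation by sampling random w x w crops that are at least 75%
# foreground, drawing roughly (max_size / w)^2 crops per image so a similar number
# of pixels is seen at every crop size.
import random

def sample_mi_crops(image, fg_mask, w, max_size=3500, min_fg=0.75, max_tries=50):
    n_crops = max(1, round((max_size / w) ** 2))
    h_img, w_img = fg_mask.shape
    crops = []
    while len(crops) < n_crops:
        for _ in range(max_tries):
            top = random.randint(0, h_img - w)
            left = random.randint(0, w_img - w)
            if fg_mask[top:top + w, left:left + w].mean() >= min_fg:
                crops.append(image[top:top + w, left:left + w])
                break
        else:
            break   # give up if no sufficiently foreground crop is found
    return crops
```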
Figure 7 shows classification accuracy using mean aggregation as the number of instances (cropped image size) used for training is increased, while keeping instance size constant. Figure 7 shows that larger crop sizes for training significantly increase classification accuracy (p < 10^-3 with McNemar's test for w=500 vs. w=1500 on all tasks). The benefits level off for larger crops. As GPU memory requirements increase for larger crop sizes, selecting an intermediate crop size provides most of the benefits of MI augmentation.
Although it should not be surprising that a larger crop size at training works better, the magnitude of improvement is very significant. If the images were homogeneous (at the scale of a single instance, w = 227), then applying the bag label to each instance should produce a classification accuracy similar to when MI aggregation over the whole image is used during training. This is clearly not the case in Figure 7. For example, ER status accuracy increases from 68.6% to 85.6% when applying MI learning over the whole image. This demonstrates the importance of MI learning and the effect of heterogeneity. Our data set consists of cores selected from a whole slide by a pathologist. MI learning may be even more crucial when classifying larger and more heterogeneous images like whole slides.
Table 5: Average classification accuracy for different types of MI aggregation.
The standard error is in brackets.
MI Aggregation. We compared aggregation methods by training our model on a crop size w = 2000 and taking the average classification accuracy over four runs. Table 5 shows that mean and quantile aggregation both significantly outperform max (p < 10^-8 with McNemar's test). While quantile aggregation performance is similar to mean for some tasks, a significant increase in performance (93.1% to 95.2%) is observed for predicting the histologic subtype of ductal vs. lobular (p < 10^-10 with McNemar's test). This improvement is due to quantile aggregation predicting the bag class from a more complete view of the instance predictions using QF pooling, thereby capturing the heterogeneity.
Heterogeneity. By computing the class predictions for each instance, we get an idea of each region's contribution to the classification. Figure 8 shows a visualization for a sample image where the instance predictions are colored for each class. The w = 2000 crop size was used for this example. With the same computation performed over the whole test set, we calculated the proportion of instances predicted to belong to each class. Figure 9 plots the results for grade 1 vs. 3 and genetic subtype basal vs. luminal A. Heterogeneity is expected for grade, as the three tumor grades are not discrete, but a continuous spectrum from low to high. On the other hand, the level of heterogeneity to expect for genetic subtype is unknown because no studies have yet assessed genetic subtype from multiple samples within the same tumor. The graph shows a continuous spectrum from basal to luminal A. The luminal B, HER2, and normal samples lie mostly on the luminal A side, but with some mixing into the basal side.
Figures 8a-8e depict visualizations of instance predictions for a sample with ground truth labels of ductal (Figure 8a), ER positive (Figure 8b), grade 1 (Figure 8c), low ROR (Figure 8d), and luminal A (Figure 8e).
Figures 9a-9b depict visualizations of predicted heterogeneity for grade 1 vs. 3 (Figure 9a) and genetic subtype basal vs luminal A (Figure 9b). The predicted proportion for each class is calculated as the proportion of instances in the sample predicted to be from each class. Test samples for all classes are plotted.
TRAINING CNN USING MI LEARNING DISCUSSION
We have shown that MI learning while training a CNN may be very useful in achieving high classification accuracy on large, heterogeneous images. Even with a small number of labeled samples, our model was successful in fine-tuning the AlexNet CNN because of the large size of the images providing plenty of opportunity for MI augmentation. The impact of MI learning indicates that accommodating image heterogeneity is essential. While aggregating instance predictions with the mean is sufficient for some tasks, quantile aggregation produces a significant improvement for others. Instance-level predictions will enable future work studying tumor heterogeneity, perhaps leading to biological insights of tumor progression.
CANONICAL CORRELATION ANALYSIS (CCA) INTRODUCTION
CCA is a popular data analysis technique that projects two data sources into a space in which they are maximally correlated [61, 62]. It was initially used for unsupervised data analysis to gain insights into components shared by the two sources [63-65]. CCA is also used to compute a shared latent space for cross-view classification [66, 64, 67, 68], for representation learning on multiple views that are then joined for prediction [69, 70], and for classification from a single view when a second view is available during training [71]. While some of the correlated CCA features are useful for discriminative tasks, many represent properties that are of no use for classification and obscure correlated information that is beneficial. This problem is magnified with recent non-linear extensions of CCA, implemented via neural networks (NNs), that make significant strides in improving correlation [63-65, 68], but often at the expense of discriminative capability (see section CCA Experiments). Therefore, we present a new deep learning technique to project the data from two views to a shared space that is also discriminative.
Some prior work that boosts the discriminative capability of CCA is linear only [72-74]. More recent work using NNs still remains limited in that it optimizes discriminative capability for an intermediate representation rather than the final CCA projection [70], or optimizes the CCA objective only during pre-training, not while training the task objective [75]. We advocate jointly optimizing CCA and a discriminative objective by computing the CCA projection within a network layer while applying a task-driven operation such as classification. Experimental results show that our method significantly improves upon previous work [70, 75] due to its focus on both the shared latent space and a task-driven objective. The latter is particularly important on small training set sizes.
While alternative approaches to multi-view learning via CCA exist, they typically focus on a reconstruction objective. That is, they transform the input into a shared space such that the input could be reconstructed - either individually, or reconstructing one view from the other. Variations of coupled dictionary learning [76-79] and autoencoders [64, 80] have been used in this context. CCA-based objectives, such as the model used in this work, instead learn a transformation to a shared space without the need for reconstructing the input. This task may be easier and sufficient in producing a representation for multi-view classification [64]. We show that the CCA objective can equivalently be expressed as an ℓ2 distance minimization in the shared space plus an orthogonality constraint. Orthogonality constraints help regularize NNs [81]; we present three techniques to accomplish this. While our method is derived from CCA, by manipulating the orthogonality constraints, we obtain deep CCA approaches that compute a shared latent space that is also discriminative.
Overall, our method enables end-to-end training via mini-batches, and we demonstrate the effectiveness of our model for three different tasks: 1) cross-view classification on a variation of MNIST [82], showing significant improvements in accuracy, 2) regularization when two views are available for training but only one at test time, on a cancer imaging and genomic data set with only 1,000 samples, and 3) semi-supervised representation learning to improve speech recognition. In addition, our approach is more robust in the small sample size regime than alternative methods. Our experiments on real data show the effectiveness of our method in learning a shared space that is more discriminative than current state-of-the-art methods for a variety of tasks.
CCA DETAILS
We present our task-driven CCA approach in section TASK-OPTIMAL CCA (TOCCA). Linear and non-linear CCA are unsupervised and find the shared signal between a pair of data sources by maximizing the sum correlation between corresponding projections. Let X1 ∈ R^(d1 × n) and X2 ∈ R^(d2 × n) be mean-centered input data from two different views with n samples and d1, d2 features, respectively.

CCA. The objective is to maximize the correlation between a1 = w1ᵀ X1 and a2 = w2ᵀ X2, where w1 and w2 are projection vectors [61]. The first canonical directions are found via

(w1*, w2*) = argmax_{w1, w2} corr(w1ᵀ X1, w2ᵀ X2)

and subsequent projections are found by maximizing the same correlation but in orthogonal directions. Combining the projection vectors into matrices W1 = [w1⁽¹⁾, ..., w1⁽ᵏ⁾] and W2 = [w2⁽¹⁾, ..., w2⁽ᵏ⁾] (with k ≤ min(d1, d2)), CCA can be reformulated as a trace maximization under orthonormality constraints on the projections, i.e.,

max_{W1, W2} tr(W1ᵀ Σ12 W2)   s.t.   W1ᵀ Σ1 W1 = W2ᵀ Σ2 W2 = I        (1)

for covariance matrices Σ1 = X1 X1ᵀ, Σ2 = X2 X2ᵀ, and cross-covariance matrix Σ12 = X1 X2ᵀ. Let T = Σ1^(-1/2) Σ12 Σ2^(-1/2) and let its singular value decomposition (SVD) be T = U1 diag(σ) U2ᵀ with singular values σ1 ≥ ... ≥ σ_min(d1,d2) in descending order. W1 and W2 are computed from the top k singular vectors of T as W1 = Σ1^(-1/2) U1^(1:k) and W2 = Σ2^(-1/2) U2^(1:k), where U^(1:k) denotes the first k columns of matrix U. The sum correlation in the projection space is equivalent to

sum_{i=1..k} corr((w1⁽ⁱ⁾)ᵀ X1, (w2⁽ⁱ⁾)ᵀ X2) = sum_{i=1..k} σ_i,        (2)

i.e., the sum of the top k singular values. A regularized variation of CCA (RCCA) ensures that the covariance matrices are positive definite by computing them as Σ1 = (1/n) X1 X1ᵀ + rI and Σ2 = (1/n) X2 X2ᵀ + rI, for regularization parameter r > 0 and identity matrix I [83].
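For illustration, a minimal NumPy sketch of the regularized linear CCA solution described above follows; the function and variable names (linear_cca, X1, X2, k, r) are illustrative and are not part of the patent disclosure.

import numpy as np

def linear_cca(X1, X2, k, r=1e-4):
    # X1: d1 x n and X2: d2 x n mean-centered data matrices.
    # Returns projection matrices W1 (d1 x k), W2 (d2 x k) and the sum correlation.
    n = X1.shape[1]
    S1 = X1 @ X1.T / n + r * np.eye(X1.shape[0])   # regularized covariance of view 1
    S2 = X2 @ X2.T / n + r * np.eye(X2.shape[0])   # regularized covariance of view 2
    S12 = X1 @ X2.T / n                            # cross-covariance

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    S1_isq, S2_isq = inv_sqrt(S1), inv_sqrt(S2)
    T = S1_isq @ S12 @ S2_isq
    U1, sigma, U2T = np.linalg.svd(T)
    W1 = S1_isq @ U1[:, :k]                        # top-k canonical directions, view 1
    W2 = S2_isq @ U2T.T[:, :k]                     # top-k canonical directions, view 2
    return W1, W2, sigma[:k].sum()                 # Eq. (2): sum of top k singular values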
DCCA. Deep CCA adds non-linear projections to CCA by non-linearly mapping the input via a multilayer perceptron (MLP). In particular, inputs X1 and X2 are mapped via non-linear functions f1 and f2, parameterized by θ1 and θ2, resulting in activations A1 = f1(X1; θ1) and A2 = f2(X2; θ2) (assumed to be mean centered) [63]. When implemented by a NN, A1 and A2 are the output activations of the final layer with d_o features. In Figure 10, diagram (a) shows the network structure. DCCA optimizes the same objective as CCA, see Eq. (1), but using activations A1 and A2. Regularized covariance matrices are computed accordingly, and the solution for W1 and W2 can be computed using the SVD just as with linear CCA. When k = d_o (i.e., the number of CCA components is equal to the number of features in A1 and A2), optimizing the sum correlation in the projection space, as in Eq. (2), is equivalent to optimizing the following matrix trace norm objective (TNO):

L_TNO(A1, A2) = -||T||_tr,   where T = Σ1^(-1/2) Σ12 Σ2^(-1/2) is computed from A1 and A2.

DCCA optimizes this objective directly, without a need to compute the CCA projection within the network. The TNO is optimized first, followed by a linear CCA operation before downstream tasks like classification are performed. Figure 10 depicts various deep CCA architectures. In Figure 10, diagram (a) depicts a DCCA-based architecture that maximizes the sum correlation in projection space by optimizing an equivalent loss, the trace norm objective (TNO) [63], and diagram (b) depicts a SoftCCA-based architecture that relaxes the orthogonality constraints by regularizing with soft decorrelation (Decorr) and optimizes the ℓ2 distance in the projection space (equivalent to sum correlation with activations normalized to unit variance) [68]. Our TOCCA methods add a task loss and apply CCA orthogonality constraints by regularizing in two ways: diagram (c) TOCCA-W uses whitening and diagram (d) TOCCA-SD uses Decorr. The third method that we propose, TOCCA-ND, simply removes the Decorr components of TOCCA-SD.
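The following sketch shows how the TNO could be evaluated from a batch of activations; it mirrors the linear computation above but operates on the network outputs A1 and A2, and the regularization constant r is an assumed detail.

import numpy as np

def trace_norm_objective(A1, A2, r=1e-4):
    # A1, A2: d0 x n mean-centered activations of the two views.
    n = A1.shape[1]
    S1 = A1 @ A1.T / n + r * np.eye(A1.shape[0])
    S2 = A2 @ A2.T / n + r * np.eye(A2.shape[0])
    S12 = A1 @ A2.T / n

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    T = inv_sqrt(S1) @ S12 @ inv_sqrt(S2)
    # The trace norm of T is the sum of its singular values; DCCA minimizes its negative.
    return -np.linalg.svd(T, compute_uv=False).sum()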
SoftCCA. While DCCA enforces orthogonality constraints on the projections W1ᵀ A1 and W2ᵀ A2, SoftCCA relaxes them using regularization [68]. The final projection matrices W1 and W2 are integrated into f1 and f2 as the top network layer. The trace objective for DCCA in Eq. (1) can be rewritten as minimizing the ℓ2 distance between the projections when each feature in A1 and A2 is normalized to unit variance [84], leading to

L_ℓ2dist(A1, A2) = ||A1 - A2||²_F.

Regularization in SoftCCA penalizes the off-diagonal elements of the covariance matrix Σ, using a running average Σ̂ computed over batches and a loss of

L_Decorr(A) = sum_{i≠j} |Σ̂_{i,j}|.

Overall, the SoftCCA loss takes the form:

L_ℓ2dist(A1, A2) + λ (L_Decorr(A1) + L_Decorr(A2)).
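A sketch of the two SoftCCA loss terms follows, with a running covariance estimate maintained across batches; the momentum value and names are illustrative, and a framework implementation would compute these on tensors so that gradients flow through them.

import numpy as np

class SoftDecorrelation:
    # Penalizes off-diagonal entries of a running covariance estimate.
    def __init__(self, dim, momentum=0.99):
        self.cov = np.zeros((dim, dim))
        self.momentum = momentum

    def __call__(self, A):
        # A: batch_size x dim activations, assumed zero-mean and unit-variance per feature.
        batch_cov = A.T @ A / A.shape[0]
        self.cov = self.momentum * self.cov + (1 - self.momentum) * batch_cov
        off_diag = self.cov - np.diag(np.diag(self.cov))
        return np.abs(off_diag).sum()

def l2_dist_loss(A1, A2):
    # Squared L2 distance between paired projections of the two views.
    return ((A1 - A2) ** 2).sum(axis=1).mean()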
Supervised CCA methods. CCA, DCCA, and SoftCCA are all unsupervised methods to learn a projection to a shared space in which the data is maximally correlated. Although these methods have shown utility for discriminative tasks, a CCA decomposition may not be optimal for classification because features that are correlated may not be discriminative. Our experiments will show that maximizing the correlation objective too much can degrade performance on discriminative tasks.
CCA has previously been extended to supervised settings by maximizing the total correlation between each view and the training labels in addition to each pair of views [72, 73], and by maximizing the separation of classes [66, 70]. Although these methods incorporate the class labels, they do not directly optimize for classification. Dorfer et al.'s CCA Layer (CCAL) is the closest to our method. It optimizes a task loss operating on a CCA projection; however, the CCA objective itself is only optimized during pre-training, not in an end-to-end manner [75]. Other supervised CCA methods are linear only [73, 72, 66, 74]. Instead of computing the CCA projection within the network, as in CCAL, we optimize the non-linear mapping into the shared space together with the CCA part.
TASK-OPTIMAL CCA (TOCCA)
To compute a shared latent space that is also discriminative, we start with the DCCA formulation and add a task-driven term to the optimization objective. The CCA component finds features that are correlated between views, while the task component ensures that they are also discriminative. This model can be used for representation learning on multiple views before joining representations for prediction [69, 70] and for classification when two views are available for training but only one at test time [71]. In the CCA EXPERIMENTS section, we demonstrate both use cases on real data. Our methods and related NN models from the literature are summarized in Table 6 below. Figure 10 shows schematic diagrams.
While DCCA optimizes the sum correlation through an equivalent loss function (TNO), the CCA projection itself is computed only after optimization. Hence, the projections may not be usable to optimize another task simultaneously. The main challenge in developing a task-optimal form of deep CCA that discriminates based on the CCA projection is in computing this projection within the network, a necessary step to enable simultaneous training of both objectives. We tackle this by focusing on the two components of DCCA: maximizing the sum correlation between activations A1 and A2 and enforcing orthonormality constraints within A1 and A2. We achieve both by transforming the CCA objective, and we present three methods that progressively relax the orthogonality constraints.
We further improve upon DCCA by enabling mini-batch computations for improved flexibility and test performance. DCCA was developed for large batches because correlation is not separable across batches. While large batch implementations of stochastic gradient optimization can increase computational efficiency via parallelism, small batch training provides more up-to-date gradient calculations, allowing a wider range of learning rates and improving test accuracy [85]. We reformulate the correlation objective as the ℓ2 distance (following SoftCCA), enabling separability across batches. We ensure normalization to unit variance via batch normalization without the scale and shift parameters [86].
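As one possible Keras realization of a single view's branch under this scheme (layer sizes are illustrative and are not the tuned settings from Table 8), batch normalization without the learned scale and shift keeps each output feature approximately zero-mean and unit-variance, which the ℓ2-distance form of the correlation objective assumes:

from keras.layers import BatchNormalization, Dense
from keras.models import Sequential

view_branch = Sequential([
    Dense(500, activation='relu', input_shape=(392,)),  # fully connected + ReLU
    BatchNormalization(center=False, scale=False),       # normalization without scale/shift
    Dense(50),                                            # projection to the shared space
    BatchNormalization(center=False, scale=False),
])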
Task-driven objective. First, we apply non-linear functions f1 and f2 (via MLPs) to each view X1 and X2, i.e., A1 = f1(X1; θ1) and A2 = f2(X2; θ2). Second, a task-specific function f_task(A; θ_task) operates on the outputs A1 and A2. In particular, f1 and f2 are optimized so that the ℓ2 distance between A1 and A2 is minimized; therefore, f_task can be trained to operate on both inputs A1 and A2. We combine the CCA and task-driven objectives as a weighted sum with a hyperparameter for tuning. This model is flexible, in that the task-driven goal can be used for classification [87, 88], regression [89], clustering [90], or any other task. See Table 6 below for an overview.
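A minimal sketch of the weighted combination is shown below; task_loss_fn, f_task, and lam are illustrative stand-ins for the task loss, the task-specific function, and the tuning hyperparameter.

def tocca_objective(task_loss_fn, f_task, A1, A2, Y, lam=1.0):
    # Task losses applied to both views' outputs plus the weighted L2 distance between them.
    task_term = task_loss_fn(f_task(A1), Y) + task_loss_fn(f_task(A2), Y)
    cca_term = ((A1 - A2) ** 2).sum(axis=1).mean()
    return task_term + lam * cca_term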
Orthogonality constraints. The remaining complications for mini-batch optimization are the orthogonality constraints, for which we propose three solutions, each handling the orthogonality constraints of CCA in a different way: whitening, soft decorrelation, and no decorrelation.
Whitening (TOCCA-W). CCA applies orthogonality constraints to A1 and A2. We accomplish this with a linear whitening transformation that transforms the activations such that their covariance becomes the identity matrix, i.e., the features are uncorrelated. Decorrelated Batch Normalization (DBN) has previously been used to regularize deep models by decorrelating features [81] and inspired our solution. In particular, we apply a transformation B = UA to make B orthonormal, i.e., BBᵀ = I.
We use a Zero-phase Component Analysis (ZCA) whitening transform composed of three steps: rotate the data to decorrelate it, rescale each axis, and rotate back to the original space. Each of these transformations is learned from the data. Any matrix U ∈ R^(d_o × d_o) satisfying UᵀU = Σ⁻¹ whitens the data, where Σ denotes the covariance matrix of A. As U is only defined up to a rotation, it is not unique. PCA whitening follows the first two steps and uses the eigendecomposition of Σ: U_PCA = Λ^(-1/2) Vᵀ for Λ = diag(λ1, ..., λ_{d_o}) and V = [v1, ..., v_{d_o}], where (λi, vi) are the eigenvalue, eigenvector pairs of Σ. As PCA whitening suffers from stochastic axis swapping, neurons are not stable between batches [81]. ZCA whitening uses the transformation U_ZCA = V Λ^(-1/2) Vᵀ, in which PCA whitening is first applied, followed by a rotation back to the original space. Adding the rotation V brings the whitened data B as close as possible to the original data A [91].
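The ZCA transform just described can be computed from a covariance matrix as in the following sketch; eps is an assumed numerical-stability constant.

import numpy as np

def zca_whitening_matrix(cov, eps=1e-5):
    # U_ZCA = V diag(lambda)^(-1/2) V^T from the eigendecomposition of the covariance.
    lam, V = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    return V @ np.diag(1.0 / np.sqrt(lam)) @ V.T

# For mean-centered activations A (d0 x m) with cov = A @ A.T / m,
# B = zca_whitening_matrix(cov) @ A satisfies B @ B.T / m ≈ I.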
Computation of U_ZCA is clearly dependent on Σ. While Huang et al. [81] used a running average of U_ZCA over batches, we apply this stochastic approximation to Σ for each view using the update Σ⁽ᵏ⁾ = α Σ⁽ᵏ⁻¹⁾ + (1 - α) Σ_b for batch k, where Σ_b is the covariance matrix of the current batch and α ∈ (0, 1) is the momentum. We then compute the ZCA transformation from Σ⁽ᵏ⁾ to do whitening as B = f_ZCA(A) = U_ZCA⁽ᵏ⁾ A. At test time, the U_ZCA from the last training batch is used. Algorithm 1 below describes ZCA whitening in greater detail. In summary, TOCCA-W integrates both the correlation and task-driven objectives, with decorrelation performed by whitening, into:

L_task(f_task(B1), Y) + L_task(f_task(B2), Y) + λ L_ℓ2dist(B1, B2),

where B1 and B2 are the whitened outputs of A1 and A2, respectively.
Algorithm 1: Whitening layer for orthogonality.
Input: activations A ∈ R^(d_o × m)
Hyperparameters: batch size m, momentum α, regularizer ε
Parameters of layer: mean μ, covariance Σ
if training then
    μ ← α μ + (1 - α) (1/m) A 1        {Update mean}
    A ← A - μ                          {Mean center data}
    Σ ← α Σ + (1 - α) (1/m) A Aᵀ       {Update covariance}
    Σ ← Σ + ε I                        {Add ε I for numerical stability}
    Λ, V ← eig(Σ)                      {Compute eigendecomposition}
    U ← V Λ^(-1/2) Vᵀ                  {Compute ZCA transformation matrix}
else
    A ← A - μ                          {Mean center data}
end if
B ← U A                                {Apply ZCA whitening transform}
return B

Soft decorrelation (TOCCA-SD). While fully independent components may be beneficial in regularizing NNs on some data sets, a softer decorrelation may be more suitable on others. In this second formulation we relax the orthogonality constraints using regularization, following the Decorr loss of SoftCCA [68]. The loss function for this formulation is:

L_task(f_task(A1), Y) + L_task(f_task(A2), Y) + λ1 L_ℓ2dist(A1, A2) + λ2 (L_Decorr(A1) + L_Decorr(A2)).
No decorrelation (TOCCA-ND). When CCA is used in an unsupervised manner, some form of orthogonality constraint or decorrelation is necessary to ensure that f1 and f2 do not simply produce multiple copies of the same feature. While this result could maximize the sum correlation, it is not helpful in capturing useful projections. In the task-driven setting, the discriminative term ensures that the features in f1 and f2 are not replicates of the same information. TOCCA-ND therefore removes the decorrelation term entirely, forming the simpler objective:

L_task(f_task(A1), Y) + L_task(f_task(A2), Y) + λ L_ℓ2dist(A1, A2).
These three models allow testing whether whitening or soft decorrelation benefits a task-driven model.
Computational complexity. Due to the eigendecomposition, TOCCA-W has a complexity of O(d_o³), compared to O(d_o²) for TOCCA-SD, with respect to the output dimension d_o. However, d_o is typically small (< 100) and this extra computation is only performed once per batch. The difference in runtime is less than 6.5% for a batch size of 100 or 9.4% for a batch size of 30 (see Table 7 below).
In summary, all three variants are motivated by adding a task-driven component to deep CCA. TOCCA-ND is the most relaxed and directly attempts to obtain identical latent representations. Experiments will show that whitening (TOCCA-W) and soft decorrelation (TOCCA-SD) provide a beneficial regularization. Further, since the ℓ2 distance that we optimize was shown to be equivalent to the sum correlation (see section CCA DETAILS, SoftCCA paragraph), all three TOCCA models maintain the goals of CCA, just with different relaxations of the orthogonality constraints. See Table 6 for an overview.
Table 6
Table 6 provides a comparison of our proposed task-optimal deep CCA methods with related ones from the literature: DCCA [63], SoftCCA [68], and CCAL-L_rank [75]. CCAL-L_rank uses a pairwise ranking loss with cosine similarity to identify matching and non-matching samples for image retrieval, not classification. A1 and A2 are mean-centered outputs from two feed-forward networks. Σ = AᵀA is computed from a single (large) batch (used in DCCA); Σ̂ is computed as a running mean over batches (for all other methods). f_task(A; θ_task) is a task-specific function with parameters θ_task, e.g., a softmax operation for classification.
CCA EXPERIMENTS
We validated our methods on three different data sets: MNIST handwritten digits, the Carolina Breast Cancer Study (CBCS) using imaging and genomic features, and speech data from the Wisconsin X-ray Microbeam Database (XRMB). Our experiments show the utility of our methods for (1 ) cross-view classification, (2) regularization with a second view during training when only one view is available at test time, and (3) representation learning on multiple views that are joined for prediction.
Data set   Batch size   Epochs   TOCCA-W   TOCCA-SD
MNIST      100          200      488 s     418 s
MNIST      30           200      1071 s    1036 s
CBCS       100          400      KB s      104 s
XRMB       50,000       100      3056 s    3446 s

Table 7

Implementation. Each layer of our network consists of a fully connected layer, followed by a ReLU activation and batch normalization [86]. We used the Nadam optimizer and tuned hyperparameters on a validation set via random search; settings and ranges are specified in Table 8. We used Keras with the Theano backend and an Nvidia GeForce GTX 1080 Ti. Our implementations of DCCA, SoftCCA, and Joint DCCA/DeepLDA [70] also use ReLU activation and batch normalization. We modified CCAL-L_rank [75] to use a softmax function and cross-entropy loss for classification, instead of a pairwise ranking loss for retrieval, referring to this modification as CCAL-L_ce.
Hyperparameter                        MNIST            CBCS     XRMB
Hidden layers                         4
Hidden layer size                     500              200      1,000
Output layer size                     50               50       112
Loss function weight λ                [10^0, 10^-4]
Momentum α                            0.99             0.99     0.99
Whitening regularizer ε
Soft decorrelation regularizer λ2                               [10^0, 10^-4]
Batch size                            32               100      50,000
Learning rate                         [10^-2, 10^-4]
Epochs                                200              400      100

Table 8
CROSS-VIEW CLASSIFICATION ON MNIST DIGITS
We formed a multi-view data set from the MNIST handwritten digit image data set [82]. Following Andrew et al. [63], we split each 28 x 28 image in half horizontally, creating left and right views that are each 14 x 28 pixels. All images were flattened into a vector with 392 features. The full data set consists of 60k training images and 10k test images. We used a random set of up to 50k for training and the remaining training images for validation. We used the full 10k image test set.
We evaluated cross-view classification accuracy by first computing the projection for each view, then training a linear SVM on one view's projection, and finally using the other view's projection at test time. While the task-driven methods presented in this work learn a classifier within the model, this test setup enables a fair comparison with the unsupervised CCA variants and validates the discriminativity of the features learned. Notably, using the built-in softmax classifier performed similarly to the SVM (not shown), as much of the power of our methods comes from the representation learning part. We do not compare with a simple supervised NN because this setup does not learn the shared space necessary for cross-view classification. We report results averaged over five randomly selected training/validation sets; the test set always remained the same.
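The evaluation protocol can be summarized by the following scikit-learn sketch; the variable names are illustrative, and the projections would come from the respective CCA model.

from sklearn.svm import LinearSVC

def cross_view_accuracy(Z_train_view1, y_train, Z_test_view2, y_test):
    # Fit a linear SVM on one view's projection and score it on the other view's projection.
    clf = LinearSVC()
    clf.fit(Z_train_view1, y_train)
    return clf.score(Z_test_view2, y_test)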
Figure 11 depicts classification accuracy for different CCA methods. In the left diagram of Figure 11, sum correlation vs. cross-view classification accuracy (on MNIST) is depicted across different hyperparameter settings on a training set size of 10,000 for DCCA [63], SoftCCA [68], TOCCA-W, and TOCCA-SD. For unsupervised methods (DCCA and SoftCCA), large correlations do not necessarily imply good accuracy. In the right diagram of Figure 11, the effect of batch size on classification accuracy is depicted for each TOCCA method on MNIST (training set size of 10,000), along with the effect of training set size on classification accuracy for each method. Our TOCCA variants outperformed all others across all training set sizes.
Correlation vs. classification accuracy. We first demonstrate the importance of adding a task-driven component to DCCA by showing that maximizing the sum correlation between views is not sufficient. Figure 11 (left) shows the sum correlation vs. cross-view classification accuracy across many different hyperparameter settings for DCCA [63], SoftCCA [68], and TOCCA. We used 50 components for each; thus, the maximum sum correlation was 50. The sum correlation was measured after applying linear CCA to ensure that components were independent. With DCCA a larger correlation tended to produce a larger classification accuracy, but there was still a large variance in classification accuracy amongst hyperparameter settings that produced a similar sum correlation. For example, for the two farthest right points in the plot (colored red), their classification accuracy differs by 10%, and they are not even the points with the best classification accuracy (colored purple). The pattern is different for SoftCCA. There was an increase in classification accuracy as sum correlation increased, but only up to a point. For higher sum correlations, the classification accuracy varied even more, from 20% to 80%. Further experiments (not shown) have indicated that when the sole objective is correlation, some of the projection directions are simply not discriminative, particularly when there are a large number of classes. Hence, optimizing for sum correlation alone does not guarantee a discriminative model. TOCCA-W and TOCCA-SD show a much greater classification accuracy across a wide range of correlations and, overall, the best accuracy when correlation is greatest.
Effect of batch size. Figure 11 (right) plots the batch size vs. classification accuracy for a training set size of 10,000. We tested batch sizes from 10 to 10,000; a batch size of 10 or 30 was best for all three variations of TOCCA. This is in line with previous work that found the best performance with a batch size between 2 and 32 [85]. We used a batch size of 32 in the remaining experiments on MNIST.
Effect of training set size. We manipulated the training set size in order to study the robustness of our methods. In particular, Figure 11 (right) shows the cross-view classification accuracy for training set sizes from n = 300 to 50,000. While we expected that performance would decrease for smaller training set sizes, some methods were more susceptible to this degradation than others. The classification accuracy with CCA dropped significantly for n = 300 and 1,000, due to overfitting and instability issues related to the covariance and cross-covariance matrices. SoftCCA shows similar behavior ([68] did not test such small training set sizes).
Across all training set sizes, our TOCCA variations consistently exhibited good performance, e.g., increasing classification accuracy from 78.3% to 86.7% for n = 1,000 with TOCCA-SD. Increases in accuracy over TOCCA-ND were small, indicating that the different decorrelation schemes have only a small effect on this data set; the task-driven component is the main reason for the success of our method. In particular, the classification accuracy with n = 1,000 was better than that of the unsupervised DCCA method with n = 10,000. Further, TOCCA with n = 300 did better than linear methods with n = 50,000, clearly showing the benefits of the proposed formulation. We also examined the CCA projections qualitatively via a 2D t-SNE embedding [92]. Figure 12 shows the CCA projection of the left view for each method. As expected, the task-driven variant produced more clearly separated classes. Figure 12 shows t-SNE plots for CCA methods on an example variation of MNIST. Each method was used to compute projections for the two views (left and right sides of the images) using 10,000 training examples. The plots show a visualization of the projection for the left view with each digit colored differently. TOCCA-SD and TOCCA-ND (not shown) produced similar results to TOCCA-W.
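The qualitative check described above can be reproduced with a sketch along the following lines; plotting details are illustrative.

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_projection_tsne(Z_left, labels):
    # Embed the left-view projection (n_samples x k) into 2D and color by digit label.
    emb = TSNE(n_components=2).fit_transform(Z_left)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=2, cmap='tab10')
    plt.show()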
Method       Training data   Test data   Task    Accuracy
Linear SVM   Image only      Image       Basal   0.777 (0.003)
NN           Image only      Image       Basal   0.808 (0.006)
CCAL-L_ce    Image+GE        Image       Basal   0.807 (0.008)
TOCCA-W      Image+GE        Image       Basal   0.830 (0.006)
TOCCA-SD     Image+GE        Image       Basal   0.818 (0.006)
TOCCA-ND     Image+GE        Image       Basal   0.816 (0.004)
Linear SVM   GE only         GE          Grade   0.832 (0.012)
NN           GE only         GE          Grade   0.830 (0.012)
CCAL-L_ce    GE+image        GE          Grade   0.804 (0.033)
TOCCA-W      GE+image        GE          Grade   0.862 (0.013)
TOCCA-SD     GE+image        GE          Grade   0.856 (0.011)
TOCCA-ND     GE+image        GE          Grade   0.856 (0.011)
Table 9

Table 9 shows classification accuracy for different methods of predicting Basal genomic subtype from images or grade from gene expression. The linear SVM and NN were trained on a single view, while all other methods were trained with both views. By regularizing with the second view during training, all TOCCA variants improved classification accuracy. The standard error is in parentheses.
REGULARIZATION FOR CANCER CLASSIFICATION
In this experiment, we address the following question: Given two views available for training but only one at test time, does the additional view help to regularize the model?
We study this question using 1,003 patient samples with image and genomic data from CBCS3 [93]. Images consisted of four cores per patient from a tissue microarray that was stained with hematoxylin and eosin. Image features were extracted using a VGG16 backbone [94], pre-trained on ImageNet, by taking the mean of the 512D output of the fourth set of conv. layers across the tissue region and further averaging across all core images for the same patient. For gene expression (GE), we used the set of 50 genes in the PAM50 array [95]. The data set was randomly split into half for training and one quarter each for validation and testing; we report the mean over eight cross-validation runs. Classification tasks included (1) predicting Basal vs. non-Basal genomic subtype using images, which is typically done from GE, and (2) predicting grade 1 vs. 3 from GE, typically done from images. This is not a multi-task classification setup; it is a means for one view to stabilize the representation of the other.
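A hedged sketch of this feature-extraction step is given below; the Keras layer name ('block4_pool') and the preprocessing call are assumptions based on the standard Keras VGG16 implementation, and masking to the tissue region is omitted for brevity.

import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Model

base = VGG16(weights='imagenet', include_top=False)
feature_model = Model(inputs=base.input, outputs=base.get_layer('block4_pool').output)

def patient_features(core_images):
    # core_images: list of (H, W, 3) arrays for one patient's TMA cores.
    feats = []
    for img in core_images:
        x = preprocess_input(np.expand_dims(img.astype('float32'), 0))
        fmap = feature_model.predict(x)[0]      # (h, w, 512) feature map
        feats.append(fmap.mean(axis=(0, 1)))    # 512-D spatial mean
    return np.mean(feats, axis=0)               # average across core images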
We tested different classifier training methods when only one view was available at test time: a) a linear SVM trained on one view, b) a deep NN trained on one view using the same architecture as the lower layers of TOCCA, c) CCAL-L_ce trained on both views, and d) TOCCA trained on both views. Table 9 lists the classification accuracy for each method and task. When predicting the Basal genomic subtype from images, all our methods showed an improvement in classification accuracy; the best result was with TOCCA-W, which produced a 2.2% improvement. For predicting grade from GE, all our methods again improved the accuracy, by up to 3.2% with TOCCA-W. These results show that having additional information during training can boost performance at test time. Notably, this experiment used a static set of pre-trained VGG16 image features in order to assess the utility of the method. The network itself could be fine-tuned end-to-end with our TOCCA model, providing an easy opportunity for data augmentation and likely further improvements in classification accuracy.
SEMI-SUPERVISED LEARNING FOR SPEECH RECOGNITION
Some additional experiments use speech data from XRMB, consisting of simultaneously recorded acoustic and articulatory measurements. Prior work has shown that CCA-based algorithms can improve phonetic recognition [57, 64, 65, 70]. The 45 speakers were split into 35 for training, 2 for validation, and 8 for testing, for a total of 1,429,236 samples for training, 85,297 for validation, and 111,314 for testing. The acoustic features are 112D and the articulatory ones are 273D. We removed the per-speaker mean and variance for both views. Samples are annotated with one of 38 phonetic labels.
Our task on this data set was representation learning for multi-view prediction. That is, using both views of data to learn a shared discriminative representation. We trained each model using both views and their labels. To test each CCA model, we followed prior work and concatenated the original input features from both views with the projections from both views. Due to the large training set size, we used a Linear Discriminant Analysis (LDA) classifier for efficiency. The same construction was used at test time. This setup was used to assess whether a task-optimal DCCA model can improve discriminative power. We tested TOCCA with a task-driven loss of LDA [88] or softmax to demonstrate the flexibility of our model.
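The multi-view evaluation described above (concatenating the original features and the learned projections from both views, then training LDA) can be sketched as follows; variable names are illustrative.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def multiview_lda_accuracy(X1_tr, X2_tr, Z1_tr, Z2_tr, y_tr,
                           X1_te, X2_te, Z1_te, Z2_te, y_te):
    train = np.hstack([X1_tr, X2_tr, Z1_tr, Z2_tr])   # original features + projections
    test = np.hstack([X1_te, X2_te, Z1_te, Z2_te])
    lda = LinearDiscriminantAnalysis()
    lda.fit(train, y_tr)
    return lda.score(test, y_te)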
Table 10
We compared the discriminability of a variety of methods to learn a shared latent representation. Table 10 lists the classification results with a baseline that used only the original input features for LDA. Although deep methods, e.g., DCCA and SoftCCA, improved upon the linear methods, all TOCCA variations significantly outperformed previous state-of-the-art techniques. Using softmax consistently beat LDA by a large margin. TOCCA-SD and TOCCA-ND produced equivalent results as a weight of 0 on the decorrelation term performed best. However, TOCCA-W showed the best result with an improvement of 15% over the best alternative method.
TOCCA can also be used in a semi-supervised manner when labels are available for only some samples. Table 11 lists the results for TOCCA-W in this setting. With 0% labeled data, the result would be similar to DCCA. Notably, a large improvement over the unsupervised results in Table 10 is seen even with labels for only 10% of the training samples.

Labeled data   Accuracy
100%           0.795
30%            0.762
10%            0.745
3%             0.684
1%             0.637

Table 11
CCA DISCUSSION
We proposed a method to find a shared latent space that is also discriminative by adding a task-driven component to deep CCA while enabling end-to-end training. This was accomplished by replacing the CCA projection with an ℓ2 distance minimization and orthogonality constraints on the activations, implemented in three different ways. TOCCA-W or TOCCA-SD performed best, depending on the data set; both include some means of decorrelation that provides an extra regularizing effect on the model, thereby outperforming TOCCA-ND.
TOCCA showed large improvements over state-of-the-art in cross-view classification accuracy on MNIST and significantly increased robustness when the training set size was small. On CBCS, TOCCA provided a regularizing effect when both views were available for training but only one at test time. TOCCA also produced a large increase over state-of-the-art for multi-view representation learning on a much larger data set, XRMB. On this data set we also demonstrated a semi-supervised approach to get a large increase in classification accuracy with only a small proportion of the labels. Using a similar technique, our method could be applied when some samples are missing a second view.
Classification tasks using a softmax operation or LDA were explored in this work. However, the formulation presented can also be used with other tasks such as regression or clustering. Another possible avenue for future work entails extracting components shared by both views as well as individual components. This approach has been developed for dictionary learning [58-60] and may be extended to deep CCA-based methods.

The disclosure of each of the following references is hereby incorporated herein by reference in its entirety.
References
1 . Dunnwald, L. K., Rossing, M. A. & Li, C. I. Hormone receptor status, tumor characteristics, and prognosis: a prospective cohort of breast cancer patients. Breast Cancer Res. 9, R6 (2007).
2. Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1 160-7 (2009).
3. Sparano, J. A. & Paik, S. Development of the 21 -gene assay and its application in clinical practice and clinical trials. J. Clin. Oncol. 26, 721- 728 (2008).
4. Carlson, J. J. & Roth, J. A. The impact of the Oncotype Dx breast cancer assay in clinical practice: a systematic review and meta-analysis. Breast Cancer Res Treat 141 , 13-22 (2013).
5. Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl. Med. 3, 108ra113 (2011).
6. Yuan, Y. et al. Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Sci. Transl. Med. 4, 157ra143 (2012).
7. Veta, M. et al. Assessment of algorithms for mitosis detection in breast cancer histopathology images. Med. Image Anal. 20, 237-48 (2015).
8. Khan, A. M., Sirinukunwattana, K. & Rajpoot, N. A Global Covariance Descriptor for Nuclear Atypia Scoring in Breast Histopathology Images. IEEE J. Biomed. Heal. Informatics 19, 1637-1647 (2015).
9. Basavanhally, A. et al. Incorporating domain knowledge for tubule detection in breast histopathology using O'Callaghan neighborhoods. Proc. SPIE 7963, 796310 (2011).
10. Popovici, V. et al. Joint analysis of histopathology image features and gene expression in breast cancer. BMC Bioinformatics 17, 209 (2016).
11. Zhou, Y., Chang, H., Barner, K., Spellman, P. & Parvin, B. Classification of Histology Sections via Multispectral Convolutional Sparse Coding in Proc. CVPR (2014).
12. Vu, T. H., Mousavi, H. S., Monga, V., Rao, A. U. & Rao, G. Histopathological Image Classification using Discriminative Feature-oriented Dictionary Learning. IEEE Trans. Med. Imaging 35, 738-751 (2015).
13. Cruz-Roa, A. A., Ovalle, J. E. A., Madabhushi, A. & Gonzalez, F. A. O. A Deep Learning Architecture for Image Representation, Visual Interpretability and Automated Basal-Cell Carcinoma Cancer Detection in Proc. MICCAI (2013).
14. Wang, D., Khosla, A., Gargeya, R., Irshad, H. & Beck, A. H. Deep Learning for Identifying Metastatic Breast Cancer (2016). Preprint at http://arxiv.Org/abs/1606.05718
15. Cireşan, D. C., Giusti, A., Gambardella, L. M. & Schmidhuber, J. Mitosis detection in breast cancer histology images with deep neural networks. Proc. MICCAI (2013).
16. Xu, J., Luo, X., Wang, G., Gilmore, H. & Madabhushi, A. A Deep Convolutional Neural Network for Segmenting and Classifying Epithelial and Stromal Regions in Histopathological Images. Neurocomputing 191 , 214-223 (2016).
17. Janowczyk, A. & Madabhushi, A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J. Pathol. Inform. 7, 29 (2016).
18. Longacre, T. A. et al. Interobserver agreement and reproducibility in classification of invasive breast carcinoma: an NCI breast cancer family registry study. Mod. Pathol. 19, 195-207 (2006).
19. Salles, M., Sanches, F. & Perez AA, G. Importance of a second opinion in breast surgical pathology and therapeutic implications. Rev. Bras. Ginecol. Obstet. 30, 602-608 (2008).
20. Boiesen, P. et al. Histologic grading in breast cancer - reproducibility between seven pathologic departments. South Sweden Breast Cancer Group. Acta Oncol. (Madr). 39, 41-45 (2000).
21. Ma, H. et al. Breast cancer receptor status: do results from a centralized pathology laboratory agree with SEER registry reports? Cancer Epidemiol. Biomarkers Prev. 18, 2214-20 (2009).
22. Prat, A., Ellis, M. J. & Perou, C. M. Practical implications of gene-expression-based assays for breast oncologists. Nat. Rev. Clin. Oncol. 9, 48-57 (2011).
23. Carey, L. A. et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA 295, 2492-502 (2006).
24. Rosen, P. P. Rosen’s Breast Pathology. (2009).
25. Makki, J. Diversity of Breast Carcinoma: Histological Subtypes and Clinical Relevance. Clin. Med. Insights Pathol. 8, 23-31 (2015).
26. Allott, E. H. et al. Performance of three biomarker immunohistochemistry for intrinsic breast cancer subtyping in the AMBER consortium. Cancer Epidemiol. Biomarkers Prev. 1-28 (2015).
27. Troester, M. A. et al. Racial Differences in PAM50 Subtypes in the Carolina Breast Cancer Study. J. Natl. Cancer Inst. 110, (2018).
28. Allott, E. H. et al. Performance of Three-Biomarker Immunohistochemistry for Intrinsic Breast Cancer Subtyping in the AMBER Consortium. Cancer Epidemiol. Biomarkers Prev. 25, 470-478 (2016).
29. Niethammer, M. et al. Appearance Normalization of Histology Slides in MICCAI, International Workshop Machine Learning in Medical Imaging 58-66 (2010).
30. Miedema, J. et al. Image and statistical analysis of melanocytic histology. Histopathology 61 , 436-44 (2012).
31. Cooper, L. A. D. et al. Integrated morphologic analysis for the identification and characterization of disease subtypes. J. Am. Med. Informatics Assoc. 19, 317-23 (2012).
32. Chang, H. et al. Morphometric analysis of TCGA glioblastoma multiforme. BMC Bioinformatics 12, 484 (2011).
33. Hou, L. et al. Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification in Proc. CVPR (2016).
34. Simonyan, K. & Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition in International Conference on Learning Representations (2015).
35. Oquab, M., Bottou, L, Laptev, I. & Sivic, J. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks in Proc. CVPR (2014).
36. Razavian, A. S., Azizpour, H., Sullivan, J. & Carlsson, S. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition in Proc. CVPR (2014).
37. Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. How transferable are features in deep neural networks? in Proc. NIPS (2014).
38. Tajbakhsh, N. et al. Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? IEEE Trans. Med. Imaging 35, 1299-1312 (2016).
39. Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. & Lin, C.-J. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res. 9, 1871-1874 (2008).
40. Zadrozny, B. & Elkan, C. Transforming classifier scores into accurate multiclass probability estimates. Proc. Int. Conf. Knowl. Discov. Data Min. 694-699 (2002).
41 . Andrews, S., Tsochantaridis, I., Hofmann, T.: Support vector machines for multiple-instance learning. In: NIPS. pp. 561-568 (2002)
42. Broadhurst, R.E.: Compact appearance in object populations using quantile function based distribution families. Ph.D. thesis, The University of North Carolina at Chapel Hill (2008)
43. Hiley, C.T., Swanton, C.: Spatial and temporal cancer evolution: causes and consequences of tumour diversity. Clinical Medicine 14(Suppl 6), s33-s37 (Dec 2014)
44. Hou, L., Samaras, D., et al.: Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification. In: CVPR (2016) (same as reference 33)
45. Jia, Z., Huang, X., Chang, E. I.C., Xu, Y.: Constrained Deep Weak Supervision for Histopathology Image Segmentation. arXiv preprint: 1701.00794 (2017)
46. Kandemir, M., Hamprecht, F.A.F.: Computer-aided diagnosis from weak supervision: A benchmarking study. Computerized Medical Imaging and Graphics (2014)
47. Kraus, O.Z., Ba, J. L., Frey, B.J.: Classifying and segmenting microscopy images with deep multiple instance learning. Bioinformatics 32(12), i52-i59 (Jun 2016)
48. Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. In: NIPS. pp. 1106-1114 (2012)
49. McGranahan, N., Swanton, C.: Biological and Therapeutic Impact of Intratumor Heterogeneity in Cancer Evolution. Cancer Cell 27(1 ), 15-26 (Jan 2015)
50. Niethammer, M., Borland, D., Marron, J., Woolsey, J., Thomas, N.: Appearance normalization of histology slides. In: MICCAI Workshop on Machine Learning in Medical Imaging (2010) (same as reference 29)
51 . Parker, J.S., Mullins, M., et al.: Supervised risk predictor of breast cancer based on intrinsic subtypes. Journal of clinical oncology 27(8), 1 160-1 167 (2009) (same as reference 2)
52. Sun, M., Han, T.X., Liu, M.C., Khodayari-Rostamabad, A.: Multiple Instance Learning Convolutional Neural Networks for Object Recognition. In: ICPR (2016)
53. Troester, M., Sun, X., et al.: Racial differences in PAM50 subtypes in the Carolina Breast Cancer Study. Journal of the National Cancer Institute (2018) (same as reference 27)
54. Vanwinckelen, G., Tragante do O, V., Fierens, D., Blockeel, H.: Instance-level accuracy versus bag-level accuracy in multi-instance learning. Data Mining and Knowledge Discovery 30(2), 313-341 (mar 2016)
55. Wang, X., Yan, Y., Tang, P., Bai, X., Liu, W.: Revisiting Multiple Instance Neural Networks. Pattern Recognition 74, 15-24 (2018)
56. Xu, Y., Zhu, J.Y., et al.: Weakly supervised histopathology cancer image segmentation and classification. Medical Image Analysis 18(3), 591-604 (Apr 2014)
57. Weiran Wang, Raman Arora, Karen Livescu, and Jeff A. Bilmes. Unsupervised learning of acoustic features via deep canonical correlation analysis. In Proc. ICASSP, 2015.
58. Eric F Lock, Katherine A Hoadley, J S Marron, and Andrew B Nobel. Joint and Individual Variation Explained (JIVE) for Integrated Analysis of Multiple Data Types. The Annals of Applied Statistics, 7(1):523-542, Mar 2013.
59. Priyadip Ray, Lingling Zheng, Joseph Lucas, and Lawrence Carin. Bayesian joint analysis of heterogeneous genomics data. Bioinformatics, 30(10): 1370-6, may 2014.
60. Qing Feng, Meilei Jiang, Jan Hannig, and JS Marron. Angle-based joint and individual variation explained. Journal of Multivariate Analysis, 166:241-265, 2018.
61. Harold Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321-377, Dec 1936.
62. Tijl De Bie, Nello Cristianini, and Roman Rosipal. Eigenproblems in pattern recognition. In Handbook of Geometric Computing, pages 129-167. Springer Berlin Heidelberg, 2005.
63. Galen Andrew, Raman Arora, Jeff Bilmes, and Karen Livescu. Deep Canonical Correlation Analysis. In Proc. ICML, 2013.
64. Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. On deep multi-view representation learning. In Proc. ICML, 2015.
65. Weiran Wang, Raman Arora, Karen Livescu, and Nathan Srebro. Stochastic optimization for deep CCA via nonlinear orthogonal iterations. In Proc. Allerton Conference on Communication, Control, and Computing, 2016.
66. Meina Kan, Shiguang Shan, Haihong Zhang, Shihong Lao, and Xilin Chen. Multi-view Discriminant Analysis. IEEE PAMI, 2015.
67. Sarath Chandar, Mitesh M. Khapra, Hugo Larochelle, and Balaraman Ravindran. Correlational Neural Networks. Neural Computation, 28(2):257-285, Feb 2016.
68. Xiaobin Chang, Tao Xiang, and Timothy M. Hospedales. Scalable and Effective Deep CCA via Soft Decorrelation. In Proc. CVPR, 2018.
69. Mehmet Emre Sargin, Yücel Yemez, Engin Erzin, and A. Murat Tekalp. Audiovisual synchronization and fusion using canonical correlation analysis. IEEE Transactions on Multimedia, 9(7):1396-1403, 2007.
70. Matthias Dorfer and Gerhard Widmer. Towards Deep and Discriminative Canonical Correlation Analysis. In Proc. ICML Workshop on Multi-view Representation Learning, 2016.
71 . Raman Arora and Karen Livescu. Kernel cca for multi-view learning of acoustic features using articulatory measurements. In Symposium on Machine Learning in Speech and Language Processing, 2012.
72. George Lee, Asha Singanamalli, Haibo Wang, Michael D Feldman, Stephen R Master, Natalie N C Shih, Elaine Spangler, Timothy Rebbeck, John E Tomaszewski, and Anant Madabhushi. Supervised multi-view canonical correlation analysis (sMVCCA): integrating histologic and proteomic features for predicting recurrent prostate cancer. IEEE Transactions on Medical Imaging, 34(1):284-97, Jan 2015.
73. Asha Singanamalli, Haibo Wang, George Lee, Natalie Shih, Mark Rosen, Stephen Master, John Tomaszewski, Michael Feldman, and Anant Madabhushi. Supervised multi-view canonical correlation analysis: fused multimodal prediction of disease diagnosis and prognosis. In Proc. SPIE Medical Imaging, 2014.
74. Kanghong Duan, Hongxin Zhang, and Jim Jing Yan Wang. Joint learning of cross-modal classifier and factor analysis for multimedia data classification. Neural Computing and Applications, 27(2):459-468, Feb 2016.
75. Matthias Dorfer, Jan Schlüter, Andreu Vall, Filip Korzeniowski, and Gerhard Widmer. End-to-end cross modality retrieval with CCA projections and pairwise ranking loss. International Journal of Multimedia Information Retrieval, 7(2):117-128, Jun 2018.
76. Sumit Shekhar, Vishal M Patel, Nasser M Nasrabadi, and Rama Chellappa. Joint sparse representation for robust multimodal biometrics recognition. IEEE PAMI, 36(1):113-26, Jan 2014.
77. Xing Xu, Atsushi Shimada, Rin-ichiro Taniguchi, and Li He. Coupled dictionary learning and feature mapping for cross-modal retrieval. In Proc. International Conference on Multimedia and Expo, 2015.
78. Miriam Cha, Youngjune Gwon, and H. T. Kung. Multimodal sparse representation learning and applications. arXiv preprint: 1511.06238, 2015.
79. Soheil Bahrampour, Nasser M. Nasrabadi, Asok Ray, and W. Kenneth Jenkins. Multimodal Task-Driven Dictionary Learning for Image Classification. arXiv preprint: 1502.01094, 2015.
80. Gaurav Bhatt, Piyush Jha, and Balasubramanian Raman. Common Representation Learning Using Step-based Correlation Multi-Modal CNN. arXiv preprint: 1711.00003, 2017.
81. Lei Huang, Dawei Yang, Bo Lang, and Jia Deng. Decorrelated Batch Normalization. In Proc. CVPR, 2018.
82. Yann LeCun. The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
83. Natalia Y. Bilenko and Jack L. Gallant. Pyrcca: regularized kernel canonical correlation analysis in Python and its applications to neuroimaging. Frontiers in Neuroinformatics, 10, nov 2016.
84. Dongge Li, Nevenka Dimitrova, Mingkun Li, and Ishwar K. Sethi. Multimedia content processing through cross-modal association. In Proc. ACM International Conference on Multimedia, 2003.
85. Dominic Masters and Carlo Luschi. Revisiting Small Batch Training for Deep Neural Networks. arXiv preprint: 1804.07612, 2018.
86. Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proc. ICML, 2015.
87. Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1106-1114, 2012.
88. Matthias Dorfer, Rainer Kelz, and Gerhard Widmer. Deep linear discriminant analysis. In Proc. ICLR, 2016.
89. Jared Katzman, Uri Shaham, Alexander Cloninger, Jonathan Bates, Tingting Jiang, and Yuval Kluger. Deep Survival: A Deep Cox Proportional Hazards Network. arXiv preprint: 1606.00931, 2016.
90. Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In Proc. ECCV, 2018.
91 . Agnan Kessy, Alex Lewin, and Korbinian Strimmer. Optimal whitening and decorrelation. arXiv preprint: 1512.00809, 2015.
92. L. van der Maaten and G. Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.
93. MA Troester, Xuezheng Sun, Emma H. Allott, Joseph Geradts, Stephanie M Cohen, Chui Kit Tse, Erin L. Kirk, Leigh B Thorne, Michelle Matthews, Yan Li, Zhiyuan Hu, Whitney R. Robinson, Katherine A. Hoadley, Olufunmilayo I. Olopade, Katherine E. Reeder-Hayes, H. Shelton Earp, Andrew F. Olshan, LA Carey, and Charles M. Perou. Racial differences in PAM50 subtypes in the Carolina Breast Cancer Study. Journal of the National Cancer Institute, 2018.
94. Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proc. ICLR, 2015.
95. Joel S Parker, Michael Mullins, Maggie CU Cheang, Samuel Leung, David Voduc, Tammi Vickery, Sherri Davies, Christiane Fauron, Xiaping He, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. Journal of Clinical Oncology, 27(8):1160-1167, 2009.
It will be understood that various details of the presently disclosed subject matter may be changed without departing from the scope of the presently disclosed subject matter. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation.

Claims

CLAIMS
What is claimed is:
1. A method for image analysis using deep learning to predict breast cancer classes, the method comprising:
receiving a test image;
generating test image instances from the test image;
inputting the test image instances into a convolutional neural network (CNN), which extracts features from each of the test image instances;
applying an instance support vector machine (SVM) to the features to predict breast cancer classes for each of the instances; and
aggregating outputs from the instance SVM to predict breast cancer classes for the test image.
2. The method of claim 1 wherein the CNN is trained using multiple instance (MI) learning and/or MI aggregation.
3. The method of claim 1 wherein the CNN is trained using genomic data sets and a task-optimal canonical correlation analysis (TOCCA) method.
4. The method of claim 1 wherein the test image comprises a hematoxylin and eosin (H&E) stained histologic image and wherein generating the test image instances includes dividing the test image into regions of a predetermined pixel size.
5. The method of claim 1 wherein inputting the test image instances into a CNN includes inputting the test image instances into a VGG16 CNN trained on an ImageNet data set.
6. The method of claim 1 wherein the instance SVM outputs a probability for each of the classes that an image instance belongs to the class.
7. The method of claim 6 wherein the breast cancer classes include tumor grade, estrogen receptor (ER) status, intrinsic subtype, risk of reoccurrence, and histologic subtype.
8. The method of claim 7 wherein aggregating the outputs from the instance SVM includes aggregating, for each class, probabilities computed for each test image instance into a quantile function and using an aggregation SVM that operates on the quantile function to generate probabilities for each of the breast cancer classes for the test image.
9. The method of claim 8 wherein the breast cancer classes comprise binary classes for each of tumor grade, ER status, intrinsic subtype, risk of reoccurrence, and histologic subtype, and wherein predicting the breast cancer classes includes assigning the test image to one of the binary classes for each of tumor grade, ER status, intrinsic subtype, risk of reoccurrence, and histologic subtype by comparing the probabilities to threshold values for the binary classes.
10. A system for image analysis using deep learning to predict breast cancer classes, the system comprising:
a computing platform including at least one processor;
a convolutional neural network (CNN)-based image classifier implemented on the computing platform for receiving a test image and generating test image instances from the test image, the CNN-based image classifier including:
a CNN for extracting features from each of the test image instances;
an instance support vector machine (SVM) for predicting, from the features, breast cancer classes for each of the instances; and
an aggregation SVM for aggregating outputs from the instance SVM to predict breast cancer classes for the test image.
11. The system of claim 10 wherein the CNN is trained using multiple instance (MI) learning and/or MI aggregation.
12. The system of claim 10 wherein the CNN is trained using genomic data sets and a task-optimal canonical correlation analysis (TOCCA) method.
13. The system of claim 10 wherein the test image comprises a hematoxylin and eosin (H&E) stained histologic image.
14. The system of claim 10 wherein the instances include regions of the test image that are of a predetermined pixel size.
15. The system of claim 10 wherein the CNN comprises a VGG16 CNN trained on an ImageNet data set.
16. The system of claim 10 wherein the instance SVM outputs a probability for each of the classes that an image instance belongs to the class.
17. The system of claim 16 wherein the breast cancer classes include tumor grade, estrogen receptor (ER) status, intrinsic subtype, risk of reoccurrence, and histologic subtype.
18. The system of claim 17 wherein the instance SVM inputs probabilities computed for each test image instance into a quantile function and the aggregation SVM operates on the quantile function to generate probabilities for each of the breast cancer classes for the test image.
19. The system of claim 18 wherein the breast cancer classes comprise binary classes for each of tumor grade, ER status, intrinsic subtype, risk of reoccurrence, and histologic subtype, and wherein the CNN-based image classifier assigns the test image to one of the binary classes for each of tumor grade, ER status, intrinsic subtype, risk of reoccurrence, and histologic subtype by comparing the probabilities to threshold values for the binary classes.
20. A non-transitory computer readable medium having stored thereon executable instructions that when executed by a processor of a computer control the computer to perform steps comprising:
receiving a test image;
generating test image instances from the test image;
inputting the test image instances into a convolutional neural network (CNN), which extracts features from each of the test image instances;
applying an instance support vector machine (SVM) to the features to predict breast cancer classes for each of the instances; and
aggregating outputs from the instance SVM to predict breast cancer classes for the test image.
PCT/US2019/041395 2018-07-11 2019-07-11 Methods, systems, and computer readable media for image analysis with deep learning to predict breast cancer classes WO2020014477A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201862696464P 2018-07-11 2018-07-11
US62/696,464 2018-07-11
US201862757746P 2018-11-08 2018-11-08
US62/757,746 2018-11-08

Publications (1)

Publication Number Publication Date
WO2020014477A1 true WO2020014477A1 (en) 2020-01-16

Family

ID=69142033

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/041395 WO2020014477A1 (en) 2018-07-11 2019-07-11 Methods, systems, and computer readable media for image analysis with deep learning to predict breast cancer classes

Country Status (1)

Country Link
WO (1) WO2020014477A1 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325282A (en) * 2020-03-05 2020-06-23 北京深睿博联科技有限责任公司 Mammary gland X-ray image identification method and device suitable for multiple models
CN111626989A (en) * 2020-05-06 2020-09-04 杭州迪英加科技有限公司 High-precision detection network training method for lack-of-label pathological image
CN111653355A (en) * 2020-04-16 2020-09-11 中山大学附属第六医院 Artificial intelligent prediction model for intestinal cancer peritoneal metastasis and construction method of model
EP3754611A1 (en) * 2019-06-17 2020-12-23 NVIDIA Corporation Cell image synthesis using one or more neural networks
CN112419324A (en) * 2020-11-24 2021-02-26 山西三友和智慧信息技术股份有限公司 Medical image data expansion method based on semi-supervised task driving
CN112801939A (en) * 2020-12-31 2021-05-14 杭州迪英加科技有限公司 Method for improving index accuracy of pathological image KI67
CN112820071A (en) * 2021-02-25 2021-05-18 泰康保险集团股份有限公司 Behavior identification method and device
CN112991263A (en) * 2021-02-06 2021-06-18 杭州迪英加科技有限公司 Method and equipment for improving calculation accuracy of TPS (acute respiratory syndrome) of PD-L1 immunohistochemical pathological section
CN113096079A (en) * 2021-03-30 2021-07-09 四川大学华西第二医院 Image analysis system and construction method thereof
CN113808735A (en) * 2021-09-08 2021-12-17 山西大学 Mental disease assessment method based on brain image
CN114027794A (en) * 2021-11-09 2022-02-11 新乡医学院 Pathological image breast cancer region detection method and system based on DenseNet network
WO2022061083A1 (en) * 2020-09-18 2022-03-24 Proscia Inc. Training end-to-end weakly supervised networks at the specimen (supra-image) level
US20220148178A1 (en) * 2020-11-11 2022-05-12 Agendia NV Methods of assessing diseases using image classifiers
CN114648509A (en) * 2022-03-25 2022-06-21 中国医学科学院肿瘤医院 Thyroid cancer detection system based on multi-classification task
US11423678B2 (en) 2019-09-23 2022-08-23 Proscia Inc. Automated whole-slide image classification using deep learning
WO2023052367A1 (en) * 2021-09-28 2023-04-06 Stratipath Ab System for cancer progression risk determination
CN115984622A (en) * 2023-01-10 2023-04-18 深圳大学 Classification method based on multi-mode and multi-example learning, prediction method and related device
CN116405100A (en) * 2023-05-29 2023-07-07 武汉能钠智能装备技术股份有限公司 Distortion signal restoration method based on priori knowledge
CN116881725A (en) * 2023-09-07 2023-10-13 之江实验室 Cancer prognosis prediction model training device, medium and electronic equipment
US11861881B2 (en) 2020-09-23 2024-01-02 Proscia Inc. Critical component detection using deep learning and attention
US12039720B2 (en) 2021-06-15 2024-07-16 Sony Group Corporation Automatic estimation of tumor cellularity using a DPI AI platform

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767946A (en) * 2017-09-26 2018-03-06 浙江工业大学 Breast cancer diagnosis system based on PCA (principal component analysis) and PSO-KE (particle swarm optimization-Key) L M (model-based regression) models
US20180157940A1 (en) * 2016-10-10 2018-06-07 Gyrfalcon Technology Inc. Convolution Layers Used Directly For Feature Extraction With A CNN Based Integrated Circuit

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180157940A1 (en) * 2016-10-10 2018-06-07 Gyrfalcon Technology Inc. Convolution Layers Used Directly For Feature Extraction With A CNN Based Integrated Circuit
CN107767946A (en) * 2017-09-26 2018-03-06 浙江工业大学 Breast cancer diagnosis system based on PCA (principal component analysis) and PSO-KE (particle swarm optimization-Key) L M (model-based regression) models

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ILIAS MAGLOGIANNIS ET AL.: "An intelligent system for automated breast cancer diagnosis and prognosis using SVM based classifiers", APPLIED INTELLIGENCE, vol. 30, no. 1, 12 July 2007 (2007-07-12), pages 24 - 36, XP019639005 *
VAISHNAVI SUBRAMANIAN ET AL.: "Correlating cellular features with gene expression using CCA", 2018 IEEE 15TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2018), 24 May 2018 (2018-05-24), pages 805 - 808, XP081218485 *
Y.IREANEUS ANNA REJANI ET AL.: "Early Detection of Breast Cancer using SVM Classifier Technique", INTERNATIONAL JOURNAL ON COMPUTER SCIENCE AND ENGINEERING, vol. 1, no. 3, 2009, pages 127 - 130, XP055676413, ISSN: 0975-3397 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3754611A1 (en) * 2019-06-17 2020-12-23 NVIDIA Corporation Cell image synthesis using one or more neural networks
US11462032B2 (en) 2019-09-23 2022-10-04 Proscia Inc. Stain normalization for automated whole-slide image classification
US11423678B2 (en) 2019-09-23 2022-08-23 Proscia Inc. Automated whole-slide image classification using deep learning
CN111325282B (en) * 2020-03-05 2023-10-27 北京深睿博联科技有限责任公司 Mammary gland X-ray image identification method and device adapting to multiple models
CN111325282A (en) * 2020-03-05 2020-06-23 北京深睿博联科技有限责任公司 Mammary gland X-ray image identification method and device suitable for multiple models
CN111653355A (en) * 2020-04-16 2020-09-11 中山大学附属第六医院 Artificial intelligent prediction model for intestinal cancer peritoneal metastasis and construction method of model
CN111653355B (en) * 2020-04-16 2023-12-26 中山大学附属第六医院 Intestinal cancer peritoneal metastasis artificial intelligent prediction model and construction method thereof
CN111626989A (en) * 2020-05-06 2020-09-04 杭州迪英加科技有限公司 High-precision detection network training method for lack-of-label pathological image
CN111626989B (en) * 2020-05-06 2022-07-22 杭州迪英加科技有限公司 High-precision detection network training method for lack-of-label pathological image
WO2022061083A1 (en) * 2020-09-18 2022-03-24 Proscia Inc. Training end-to-end weakly supervised networks at the specimen (supra-image) level
US11861881B2 (en) 2020-09-23 2024-01-02 Proscia Inc. Critical component detection using deep learning and attention
US11954859B2 (en) * 2020-11-11 2024-04-09 Agendia NV Methods of assessing diseases using image classifiers
WO2022101672A3 (en) * 2020-11-11 2022-08-11 Agendia NV Method of assessing diseases using image classifiers
US20220148178A1 (en) * 2020-11-11 2022-05-12 Agendia NV Methods of assessing diseases using image classifiers
CN112419324A (en) * 2020-11-24 2021-02-26 山西三友和智慧信息技术股份有限公司 Medical image data expansion method based on semi-supervised task driving
CN112419324B (en) * 2020-11-24 2022-04-19 山西三友和智慧信息技术股份有限公司 Medical image data expansion method based on semi-supervised task driving
CN112801939A (en) * 2020-12-31 2021-05-14 杭州迪英加科技有限公司 Method for improving index accuracy of pathological image KI67
CN112801939B (en) * 2020-12-31 2022-07-22 杭州迪英加科技有限公司 Method for improving index accuracy of pathological image KI67
CN112991263B (en) * 2021-02-06 2022-07-22 杭州迪英加科技有限公司 Method and equipment for improving TPS (tumor proportion score) calculation accuracy of PD-L1 immunohistochemical pathological section
CN112991263A (en) * 2021-02-06 2021-06-18 杭州迪英加科技有限公司 Method and equipment for improving calculation accuracy of TPS (tumor proportion score) of PD-L1 immunohistochemical pathological section
CN112820071A (en) * 2021-02-25 2021-05-18 泰康保险集团股份有限公司 Behavior identification method and device
CN113096079A (en) * 2021-03-30 2021-07-09 四川大学华西第二医院 Image analysis system and construction method thereof
CN113096079B (en) * 2021-03-30 2023-12-29 四川大学华西第二医院 Image analysis system and construction method thereof
US12039720B2 (en) 2021-06-15 2024-07-16 Sony Group Corporation Automatic estimation of tumor cellularity using a DPI AI platform
CN113808735A (en) * 2021-09-08 2021-12-17 山西大学 Mental disease assessment method based on brain image
CN113808735B (en) * 2021-09-08 2024-03-12 山西大学 Mental disease assessment method based on brain image
WO2023052367A1 (en) * 2021-09-28 2023-04-06 Stratipath Ab System for cancer progression risk determination
CN114027794A (en) * 2021-11-09 2022-02-11 新乡医学院 Pathological image breast cancer region detection method and system based on DenseNet network
CN114648509A (en) * 2022-03-25 2022-06-21 中国医学科学院肿瘤医院 Thyroid cancer detection system based on multi-classification task
CN115984622A (en) * 2023-01-10 2023-04-18 深圳大学 Classification method based on multimodal multi-instance learning, prediction method and related device
CN115984622B (en) * 2023-01-10 2023-12-29 深圳大学 Multimodal multi-instance learning classification method, prediction method and related device
CN116405100A (en) * 2023-05-29 2023-07-07 武汉能钠智能装备技术股份有限公司 Distortion signal restoration method based on priori knowledge
CN116405100B (en) * 2023-05-29 2023-08-22 武汉能钠智能装备技术股份有限公司 Distortion signal restoration method based on priori knowledge
CN116881725A (en) * 2023-09-07 2023-10-13 之江实验室 Cancer prognosis prediction model training device, medium and electronic equipment
CN116881725B (en) * 2023-09-07 2024-01-09 之江实验室 Cancer prognosis prediction model training device, medium and electronic equipment

Similar Documents

Publication Publication Date Title
WO2020014477A1 (en) Methods, systems, and computer readable media for image analysis with deep learning to predict breast cancer classes
Couture et al. Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype
US20240069026A1 (en) Machine learning for digital pathology
Cui et al. A deep learning algorithm for one-step contour aware nuclei segmentation of histopathology images
US11449985B2 (en) Computer vision for cancerous tissue recognition
Kumar et al. Convolutional neural networks for prostate cancer recurrence prediction
Song et al. Low dimensional representation of fisher vectors for microscopy image classification
Bai et al. NHL Pathological Image Classification Based on Hierarchical Local Information and GoogLeNet‐Based Representations
Yin et al. Histopathological distinction of non-invasive and invasive bladder cancers using machine learning approaches
Popovici et al. Joint analysis of histopathology image features and gene expression in breast cancer
US11544851B2 (en) Systems and methods for mesothelioma feature detection and enhanced prognosis or response to treatment
JP2023543044A (en) Method of processing images of tissue and system for processing images of tissue
Khan et al. Gene transformer: Transformers for the gene expression-based classification of lung cancer subtypes
US20230070874A1 (en) Learning representations of nuclei in histopathology images with contrastive loss
US20240054639A1 (en) Quantification of conditions on biomedical images across staining modalities using a multi-task deep learning framework
Prezja et al. Improved accuracy in colorectal cancer tissue decomposition through refinement of established deep learning solutions
Singh et al. STRAMPN: Histopathological image dataset for ovarian cancer detection incorporating AI-based methods
Lin et al. SGCL: Spatial guided contrastive learning on whole-slide pathological images
Zamanitajeddin et al. Social network analysis of cell networks improves deep learning for prediction of molecular pathways and key mutations in colorectal cancer
Zhang et al. Multi‐feature fusion of deep networks for mitosis segmentation in histological images
Thapa et al. Deep learning for breast cancer classification: Enhanced tangent function
WO2021041342A1 (en) Semantic image retrieval for whole slide images
Nimitha et al. An improved deep convolutional neural network architecture for chromosome abnormality detection using hybrid optimization model
Song et al. Visual feature representation in microscopy image classification
Graziani et al. Attention-based interpretable regression of gene expression in histology

Legal Events

Code Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19834643; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in the European phase (Ref document number: 19834643; Country of ref document: EP; Kind code of ref document: A1)