WO2019089393A1 - Temozolomide response predictor and methods - Google Patents

Temozolomide response predictor and methods

Info

Publication number
WO2019089393A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
mgmt
methylation
protein
tumor
Prior art date
Application number
PCT/US2018/057843
Other languages
French (fr)
Inventor
Christopher W. SZETO
Saihitha VEERAPANENI
Steven BENZ
Original Assignee
Nantomics, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantomics, Llc filed Critical Nantomics, Llc
Priority to AU2018362347A priority Critical patent/AU2018362347A1/en
Priority to CA3080342A priority patent/CA3080342A1/en
Priority to JP2020524174A priority patent/JP2021501422A/en
Priority to CN201880081292.2A priority patent/CN111492435A/en
Priority to KR1020207015366A priority patent/KR20200079524A/en
Publication of WO2019089393A1 publication Critical patent/WO2019089393A1/en

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Definitions

  • the field of the invention is systems and methods of predicting drug response of a patient to temozolomide, and especially where the patient is diagnosed with cancer.
  • Temozolomide is a chemotherapeutic agent that is used as standard treatment for glioblastoma and melanoma, and has recently shown limited but encouraging activity in patients with metastatic colorectal cancer (mCRC).
  • TMZ is an agent with alkylating/methylating activity at the N-7 or O-6 positions of guanine residues in DNA, often triggering cell death in sensitive cells.
  • various DNA damage repair enzymes, and especially the O-6-methylguanine-DNA methyltransferase (MGMT), may counteract the effect of temozolomide in at least some of the tumor cells.
  • MGMT is considered a resistance marker for TMZ.
  • PCR digital polymerase chain reaction
  • MB methyl-BEAMing
  • MS mass spectrometry
  • proteomic analysis can objectively quantify the MGMT protein and other actionable protein biomarkers in formalin-fixed, paraffin-embedded (FFPE) tissue sections.
  • MGMT protein cutoff of 200 amol/ug is predictive of benefit in mCRC patients treated with TMZ.
  • MGMT protein quantity may also correlate with MGMT methylation status.
  • Quantitative proteomics objectively measured MGMT protein in FFPE tumor samples and retrospectively identified 9 of 9 responders to TMZ.
  • Digital PCR methylation assay retrospectively identified 7 of 8 responders to TMZ. The investigators therefore concluded that quantitative proteomic analysis of MGMT could potentially be used to select mCRC patients for TMZ therapy.
  • the inventive subject matter is directed to various devices, systems, and methods for treatment response prediction for temozolomide in the treatment of a solid tumor of a patient.
  • the inventors contemplate a method of predicting treatment response to temozolomide in a patient that includes a step of providing RNAseq information, protein quantitative information, and methylation information from a tumor of the patient, and another step of calculating, by a response prediction model, a response prediction to temozolomide, wherein the response prediction model uses the RNAseq information, protein quantitative information, and methylation information.
  • the response prediction model uses a K-nearest-neighbors approach, and the RNAseq information, protein quantitative information, and methylation information are sub-grouped.
  • the RNAseq information may be sub-grouped using a log2(TPM+1) cutoff value of 3.5
  • the protein quantitative information may be sub-grouped using a cutoff value of 200 amol/mL
  • the methylation information is sub-grouped using a cutoff value of 60% promoter CpG methylation.
  • the response prediction model has a prediction accuracy of at least 80% or of at least 85%.
  • the RNAseq information, protein quantitative information, and methylation information are provided from an FFPE sample or a fresh tumor sample, and the tumor is a solid tumor (e.g., metastatic colon cancer, glioblastoma, or melanoma).
  • Another aspect of the inventive subject matter includes a method of treating a patient having a tumor.
  • This method includes a step of providing RNAseq information, protein quantitative information, and methylation information from a tumor (e.g., a solid tumor) of the patient, and a step of calculating, by a response prediction model, a response probability to temozolomide, wherein the response prediction model uses the RNAseq information, protein quantitative information, and methylation information. Then, the method continues with a step of administering temozolomide to a patient having the temozolomide response probability of >0.5.
  • the response prediction model uses a K-nearest-neighbors approach, and/or the response prediction model has a prediction accuracy of at least 85%.
  • the RNAseq information, protein quantitative information, and methylation information are sub-grouped.
  • the RNAseq information is sub-grouped using a log2(TPM+1) cutoff value of 3.5, and/or the protein quantitative information is sub-grouped using a cutoff value of 200 amol/mL, and/or the methylation information is sub-grouped using a cutoff value of 60% promoter CpG methylation.
  • the RNAseq information, protein quantitative information, and methylation information are provided from a FFPE sample or a fresh tumor sample.
  • the tumor is metastatic colon cancer, glioblastoma, or melanoma.
  • Figure 1 depicts the samples and assays used in the experimental studies.
  • Figure 3 depicts a graph of percent change in tumor volume (from baseline) among TMZ-treated patients.
  • Figure 4A and 4B depict graphs of progression free survival (PFS, 4A) and overall survival (OS, 4B) of TMZ-treated patients with metastatic colorectal cancer, by MGMT protein expression level.
  • Figure 5A and 5B depict graphs of PFS (5A) and OS (5B) of TMZ-treated patients with metastatic colorectal cancer, stratified by MGMT methylation status.
  • Figure 6A and 6B depict graphs of PFS (6A) and OS (6B) of TMZ-treated patients with metastatic colorectal cancer, by RNA-seq analysis.
  • Figure 7A and 7B depict graphs of TMZ-treated patients with metastatic colorectal cancer, by MGMT protein expression level.
  • Figures 8A and 8B are graphs schematically illustrating cut-off values for RNAseq data.
  • Figures 9A and 9B are graphs schematically illustrating agreement between RNAseq and proteomic values.
  • Figure 9C depicts a bar graph of optimal threshold in MGMT RNA seq.
  • Figure 9D depicts bar graphs of agreement between MGMT protein quantity and MGMT methylation.
  • Figures 10A and 10B are graphs depicting PFS (progression free survival) and OS (overall survival) versus MGMT protein level sub-groups.
  • Figure 11 is a graph depicting PFS (progression free survival) versus MGMT RNAseq level sub-groups.
  • Figure 12 is a graph depicting PFS (progression free survival) versus one MGMT subgroup combination.
  • Figures 13A and 13B are graphs depicting PFS (progression free survival) and OS (overall survival) versus another MGMT sub-group combination.
  • Figure 14 is a graph depicting temozolomide response prediction accuracies based on various input variables for various classifiers.
  • Figure 15 depicts a graph of average accuracy of predictive models per leave-pair-out cross-validation in Example 2.
  • Figure 16 depicts a graph of average predictive accuracy in unseen samples for 58 predictive modeling strategies, by MGMT assessment method group. Groups are ordered left-to- right by average accuracy in Example 2.
  • Figure 17 depicts a schematic diagram of machine learning of drug response prediction from various types of data.
  • Figure 18 depicts a flowchart of regression and classification pipeline of the predictive model.
  • Figure 19 depicts a graph of relationships between all expression and MGMT protein determined by various regression models.
  • Figure 20 depicts a graph of relationships between MGMT protein and MGMT gene determined by various regression models.
  • Figure 21 depicts a graph of accuracy values using methylation values.
  • Figure 22 depicts another graph of accuracy values using methylation values.
  • Figure 23 depicts a heatmap of the training and test data sets of the predictive model.
  • Detailed Description
  • RNAseq information that is based on a combination of (preferably sub-grouped) RNAseq information, protein quantitative information, and promoter methylation information of a tumor.
  • the model is based on a K-nearest-neighbors approach.
  • RNAseq information particularly as measured in TPM
  • protein quantitative information particularly as measured by mass spectroscopy
  • methylation information particularly as measured in the MGMT promoter region from a tumor of the patient.
  • the inventor contemplates a method of predicting the treatment response to temozolomide in a patient having a tumor that includes a step of providing RNAseq information, protein quantitative information, and methylation information from a tumor of the patient.
  • the response prediction to temozolomide is then established using a response prediction model that takes into account the RNAseq information, protein quantitative information, and methylation information.
  • the type of information will at least in some degree determine the nature of the sample.
  • tumor refers to, and is interchangeably used with one or more cancer cells, cancer tissues, malignant tumor cells, or malignant tumor tissue, that can be placed or found in one or more anatomical locations in a human body.
  • patient includes both individuals that are diagnosed with a condition (e.g., cancer) as well as individuals undergoing examination and/or testing for the purpose of detecting or identifying a condition.
  • a patient having a tumor refers to both individuals that are diagnosed with a cancer as well as individuals that are suspected to have a cancer.
  • the term “provide” or “providing” refers to and includes any acts of manufacturing, generating, placing, enabling to use, transferring, or making ready to use.
  • a tumor sample will be used to obtain all relevant information. Any suitable methods of obtaining a tumor sample (tumor cells or tumor tissue) from the patient (or healthy tissue from a patient or a healthy individual as a comparison) are contemplated. Most typically, a tumor sample can be obtained from the patient via a biopsy (including liquid biopsy, or obtained via tissue excision during a surgery or an independent biopsy procedure, etc.), which can be fresh or processed (e.g., frozen, formalin-fixed paraffin-embedded (FFPE) samples, etc.) until further processing for obtaining omics data from the tissue. For example, the tumor cells or tumor tissue may be fresh or frozen.
  • the tumor cells or tumor tissues may be in a form of cell/tissue extracts.
  • the tumor samples may be obtained from a single or multiple different tissues or anatomical regions.
  • a metastatic breast cancer tissue can be obtained from the patient's breast as well as other organs (e.g., liver, brain, lymph node, blood, lung, etc.) for metastasized breast cancer tissues.
  • a healthy tissue of the patient or matched normal tissue e.g., patient's non-cancerous breast tissue
  • a healthy tissue from a healthy individual can be also obtained via a similar manner as a comparison.
  • tumor samples can be obtained from the patient in multiple time points in order to determine any changes in the tumor samples over a relevant time period.
  • the tumor samples (or suspected tumor samples) may be obtained during the progress of the tumor upon identifying new metastasized tissues or cells.
  • DNA e.g., genomic DNA
  • RNA e.g., mRNA, miRNA, siRNA, shRNA, etc.
  • proteins e.g., membrane protein, cytosolic protein, nucleic protein, etc.
  • a step of obtaining omics data may include receiving omics data from a database that stores omics information of one or more patients and/or healthy individuals.
  • omics data of the patient's tumor may be obtained from isolated DNA, RNA, and/or proteins from the patient's tumor tissue, and the obtained omics data may be stored in a database (e.g., cloud database, a server, etc.) with other omics data sets of other patients having the same type of tumor or different types of tumor.
  • Omics data obtained from the healthy individual or the matched normal tissue (or healthy tissue) of the patient can be also stored in the database such that the relevant data set can be retrieved from the database upon analysis.
  • protein data may also include protein activity, especially where the protein has enzymatic activity (e.g., polymerase, kinase, hydrolase, lyase, ligase, oxidoreductase, etc.).
  • genomics data includes but is not limited to information related to genomics, proteomics, and transcriptomics, as well as specific gene expression or transcript analysis, and other characteristics and biological functions of a cell.
  • suitable genomics data includes DNA sequence analysis information that can be obtained by whole genome sequencing and/or exome sequencing (typically at a coverage depth of at least 10x, more typically at least 20x) of both tumor and matched normal sample.
  • DNA data may also be provided from an already established sequence record (e.g., SAM, BAM, FASTA, FASTQ, or VCF file) from a prior sequence determination.
  • data sets may include unprocessed or processed data sets, and exemplary data sets include those having BAM format, SAM format, FASTQ format, or FASTA format.
  • BAM format or as BAMBAM diff objects (e.g., US2012/0059670A1 and US2012/0066001A1).
  • Omics data can be derived from whole genome sequencing, exome sequencing, transcriptome sequencing (e.g., RNA-seq), or from gene specific analyses (e.g., PCR, qPCR, hybridization, LCR, etc.).
  • computational analysis of the sequence data may be performed in numerous manners.
  • analysis is performed in silico by location-guided synchronous alignment of tumor and normal samples as, for example, disclosed in US 2012/0059670A1 and US 2012/0066001A1 using BAM files and BAM servers.
  • Such analysis advantageously reduces false positive neoepitopes and significantly reduces demands on memory and computational resources.
  • the relevant information is directly obtained from the tumor
  • one or more of the data may also be obtained from a database.
  • the relevant information may be provided from a database or sequencing center as best suitable.
  • Proteomics analysis may be performed from an FFPE sample using laser microdissection, and mass spectroscopic analysis can be performed using such samples.
  • the source of information need not necessarily be derived from a single source, but may be assembled from various sources.
  • contemplated analyses may employ data from different points in time, for example, pre-surgery and pre-administration of temozolomide, or post-surgery and pre-administration of temozolomide, etc.
  • suitable genomic information includes whole genome sequencing or exome sequencing that may, for example, identify MGMT gene mutations, duplications, or deletions, and RNA sequence information and particularly RNAseq information of MGMT to provide quantitative information of transcription (and splice variants or other mutations where present).
  • quantitative information may also be obtained by hybridization and/or other PCR based methods.
  • protein information is preferably obtained using mass spectroscopic methods, including selected reaction monitoring methods, antibody-based information, and/or staining methods.
  • the DNA-damaging alkylating agent temozolomide (TMZ) is approved in the treatment of glioblastoma, melanoma and lymphoma.
  • the MGMT enzyme is involved in repairing damage from alkylating agents.
  • MGMT epigenetic silencing is associated with TMZ resistance in melanoma studies, and occurs in about one third of colorectal cancers (CRCs).
  • CRCs colorectal cancers
  • suitable types of datasets include DNA copy number data, DNA mutation data, RNA splice variant data, RNA expression level data, promoter methylation data, epigenetic modification data, protein data, and protein activity data. Most typically, such data are readily available and/or can be inferred from various pathway models (e.g., PARADIGM). It is also contemplated that where more than one type of dataset is used, at least three different types of datasets will be employed.
  • cutoff values may be predetermined, or independently learned using further machine learning.
  • RNAseq information may be sub-grouped by a TPM (transcript per million) threshold
  • protein quantitative information may be sub-grouped by detection threshold or specific value such as 200 amol
  • methylation information may be sub-grouped by a threshold value as determined by methyl-BEAMing (e.g., 60% methylated MGMT promoter sequence).
  • one or more threshold values can be used to train a prediction model and validate the accuracy of the prediction model.
  • the response prediction model it should be noted that there are numerous manners of building models known in the art, and contemplated models may use one or more of the RNAseq information, protein quantitative information, and methylation information, grouped or ungrouped, and in any combination thereof. However, it is preferred that the model will use sub-grouped RNAseq information, protein quantitative information, and methylation information as further described in more detail below.
  • classifiers include extra tree classifier, KNN classifier, RBF or linear support vector classifier, Decision Tree classifier, Naive Bayes classifier, Quadratic Discriminant classifier, Ridge classifier, Gaussian Process classifier, Random Forest classifier, and AdaBoost classifiers using either Random Forest or Decision Tree base-estimators.
  • various univariate classification algorithms for the prediction task known in the art, and an example is finding the optimal classifying threshold using Youden analysis.
  • such algorithms will provide different accuracy metrics, and it is generally preferred the classifier with the highest accuracy (or accuracy gain) will be used for generation of the response prediction model.
  • contemplated methods allowed a prediction accuracy of at least 70%, at least 80%, or at least 85% when validated in unseen cancer patients (e.g., mCRC patients, etc.), depending on the type of classifier used. Most preferably, where the K-nearest-neighbor classifier was used, accuracies of about 86% were achieved.
  • Archived FFPE tissue sections were obtained from 41 patients with metastatic colorectal cancer who had received TMZ in one of 3 Phase II clinical trials from the FELDSPAR cohort.
  • Table 1 tumor samples from 41 TMZ-treated patients were available for analysis. These patients had a median age of 69 years and had received a median of 3 chemotherapeutic regimens prior to TMZ. The majority of patients had an ECOG status of 0 or 1 (85%); and at least 2 metastatic sites (56%), with liver as the most frequent site. As expected in mCRC, all patients eventually progressed on TMZ. ORR was as follows: 26 patients (63%) had progressive disease; 9 (22%) had partial response; 6 (15%) had stable disease. As shown in Figure 1, of these 41 samples, 39 successfully passed quality control standards for RNAseq sequencing, and 35 successfully passed quality control standards for MethylBEAMing (digital MB). The following is a short analysis of this selection of samples.
  • RECIST Response Evaluation Criteria in Solid Tumors
  • ECOG Eastern Cooperative Oncology Group
  • PR partial response
  • SD stable disease
  • PD progressive disease.
  • An RNAseq cutoff of 3.5 log2(TPM+1), as established in TCGA COAD/READ data, was used to define the subgroups as shown in Figure 11.
  • This provided a log-rank test between RNAseq classes of p ⁇ 0.1731, and Cox proportional hazards are shown in Table 7.
  • the RNAseq classes were not as prognostic as the proteomic subgroups, and did not achieve significance with this cohort size.
  • OS as the survival metric did not achieve significance. (Table columns: coef, exp(coef), se(coef), z, p, lower 0.95, upper 0.95.)
  • MGMT high Methylation low and either RNA high or Protein high
  • MGMT low Methylation high or either of RNA low or Protein low.
  • Figure 13A shows exemplary results of such analysis.
  • Temozolomide response prediction Example I The inventor evaluated multiple methods for building a predictive model of temozolomide response based on MGMT -omics values. More specifically, the inventor built predictive models of temozolomide response using each of the MGMT assays: RNAseq expression TPMs, protein amol/mL, and methylation percentage, and combinations and sub-combinations thereof. Further models were built using both the raw continuous values for each of these features as well as their sub-grouped values (3.5
  • LOCV leave-pair-out cross-validation
  • the highest-performing modeling strategy uses a K-nearest-neighbors approach utilizing all three features (RNA, protein, and methylation) in their sub-grouped transformations.
  • This approach makes Temozolomide response predictions on novel samples as follows: 1. Define MGMT mRNA expression status, protein level, and promoter methylation status, using the predefined cut-offs described above, 2. calculate the pairwise Minkowski distance between each of the training instances and novel samples to be predicted using all three MGMT-related features (i.e. brute tree), 3. for each novel sample, identify the five closest matches, and 4. assign the novel sample the response class of the majority of the closest training samples.
  • a final model is proposed in this application that uses all available samples for training, with the strong belief that predictive performance in novel samples will be similar to that in the cross-validated setting. Due to being trained on three binary features, the final model describes the probability of temozolomide sensitivity in 8 distinct states (Table 11). Novel samples may be subgrouped using the same cutoffs as described above and assigned to one of these 8 states. A sensitivity prediction probability of >0.5 suggests that state will be sensitive to temozolomide with ~87% accuracy. Conversely, a temozolomide response probability of <0.5 is associated with resistance to temozolomide. (Table 11 columns: expression status, protein status, methylation status, P(sensitive).)
  • Temozolomide response prediction Example II The inventors sought to train a robust predictive model of TMZ response based on 3 separate quantitative MGMT assays (promoter quantitative methylation, RNA expression, and protein abundance) and validate its accuracy in unseen mCRC patients. Viewed from a different perspective, rather than identifying a single type of predictor, the inventors set out to identify multiple predictors in a machine learning setting to integrate various variables and so arrive at a prediction model with high sensitivity and accuracy.
  • TMZ safety trials (INT Study n.20/13; INT Study 20/13 & EudraCT 2012-002766-13) were used to train models. Response to TMZ was defined by RECIST v.1.1 criteria. MGMT status was assessed by 3 methods: digital PCR/methyl-BEAMing (MB), RNAseq, and liquid chromatography mass-spec. Several multivariate modeling strategies (kNN, SVM, decision trees, etc.) were evaluated using cross-validation (CV) within the training set.
  • CV cross-validation
  • TMZ response in refractory mCRC is approximately predictable. Combining predicted methylation, transcript levels, and protein abundance, yields the most accurate and robust method of predicting response (82% - 87% accurate).
  • the inventors investigated the training cohort prediction performance for MGMT protein (as measured by LC-MS), MGMT expression (as measured by TPM), and MGMT promoter methylation (as measured by digital PCR/methyl-BEAMing (MB)). More specifically, to evaluate the ability of the predefined cutoffs to predict response to TMZ, we used the leave-pair-out cross-validation strategy. Predefined and exploratory cutoffs were assessed in unseen samples 330, 308, and 250 times in LC-MS, RNAseq, and MB data respectively.
  • TMZ studies were used as training data to build 10 candidate models (+3 predefined cutoffs) and replaced measured methylation with 'predicted methylation' based on whole RNAseq and using a regression model. Performance was then tested in an unseen testing cohort (TEMIRI) as is exemplarily shown in Figure 17.
  • the training dataset was a TMZ cohort and included 41 mCRC patients treated with TMZ from 3 phase II studies. Continuous MGMT protein levels by mass spec were available for all of the patients, as well as RNA expression data by RNA seq and continuous MGMT methylation percentage data. Drug response was noted as binary drug response data.
  • the testing dataset comprised 32 mCRC patients treated with TMZ + irinotecan. Binary drug response data were missing for 3 patients, gene expression values were available for 14 patients, and MGMT protein expression data were available for 21 patients. See Table 13.
  • Figure 18 shows a regression and classification pipeline for building the regression model.
  • the RMSE (square root of the variance of the residuals), indicating the absolute fit of the model to the data, i.e., how close the observed data points are to
  • Figure 19 shows a mean accuracy value of various regressor models when all expression (expression levels of all RNA) and MGMT protein expression level are used as data sets, which is also summarized in Table 15.
  • Figure 20 shows a mean RMSE value of various regressor models when MGMT gene expression and MGMT protein expression level are used as data sets, which is also summarized in Table 16.
  • Figures 21 and 22 show mean accuracy values of various regressor models when the predicted methylation values were used as a data set, which is also summarized in Table 17 and Table 18, respectively.
  • Figure 23 depicts a heat map with exemplary results for the response predictions on the 1,000 most variable genes across 44 samples using preset thresholds as noted, and Table 14 is a listing of exemplary classification algorithms used on selected datasets and combination of datasets.
  • use of MGMT RNAseq, MGMT protein, and MGMT promoter methylation provided superb training and testing accuracy for response prediction for temozolomide.
  • sensitivity, specificity, and F1 score were all substantially increased over other classifiers and individual datasets.
  • Sample level predictions for the best model of Table 19 are listed in Table 20, indicating that models that simultaneously consider protein, MGMT methylation and mRNA performed better when compared to other models.
  • DNA mismatch repair in TMZ-treated mCRC is impaired where TMZ responders switched from microsatellite stable to microsatellite instable (MSI), thus rendering them eligible for therapy with immune checkpoint inhibitors.
  • MSI microsatellite instable
  • a patient can be administered immune therapy (e.g., checkpoint inhibitor, a cancer vaccine, etc.) where the response prediction model predicts that the patient is no longer responsive to TMZ or has substantially reduced responsiveness to TMZ (e.g., reduced at least 30%, at least 50%, or at least 70% compared to pre-treatment of TMZ, or compared to another individual who has a similar prognosis of cancer, etc.).
  • administering a drug or a cancer treatment refers to both direct and indirect administration of the drug or the cancer treatment.
  • Direct administration of the drug or the cancer treatment is typically performed by a health care professional (e.g., physician, nurse, etc.), whereas indirect administration includes a step of providing or making available the drug or the cancer treatment to the health care professional for direct administration (e.g., via injection, oral consumption, topical application, etc.).
  • indirect administration includes a step of providing or making available the drug or the cancer treatment to the health care professional for direct administration (e.g., via injection, oral consumption, topical application, etc.).
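The four-step K-nearest-neighbors procedure described in the definitions above (sub-group the three MGMT features, compute pairwise Minkowski distances by brute force, take the five closest training samples, assign the majority class) can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the cutoff values are those quoted in the text, while the 0/1 encoding, the Minkowski order p=2, the function names, and the training examples are all hypothetical.

```python
from collections import Counter

# Cutoffs quoted in the text. Encoding 1 = above cutoff is an assumption.
RNA_CUTOFF = 3.5            # log2(TPM+1)
PROTEIN_CUTOFF = 200.0      # amol, in the units used by the text
METHYLATION_CUTOFF = 60.0   # percent promoter CpG methylation


def subgroup(rna_log2_tpm, protein_amol, methylation_pct):
    """Step 1: convert the three continuous MGMT values into binary features."""
    return (
        int(rna_log2_tpm > RNA_CUTOFF),
        int(protein_amol > PROTEIN_CUTOFF),
        int(methylation_pct > METHYLATION_CUTOFF),
    )


def minkowski(a, b, p=2):
    """Minkowski distance of order p (p=2 is Euclidean; the order is an assumption)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def predict(training, sample, k=5, p=2):
    """Steps 2-4: brute-force distances, k closest samples, majority vote.

    training is a list of (features, label) pairs.
    """
    ranked = sorted(training, key=lambda item: minkowski(item[0], sample, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set, invented for illustration only.
TRAINING = [
    ((0, 0, 1), "sensitive"),
    ((0, 0, 1), "sensitive"),
    ((0, 1, 1), "sensitive"),
    ((1, 1, 0), "resistant"),
    ((1, 1, 0), "resistant"),
    ((1, 0, 0), "resistant"),
    ((1, 1, 1), "resistant"),
]

novel = subgroup(rna_log2_tpm=2.1, protein_amol=150.0, methylation_pct=75.0)
print(novel, predict(TRAINING, novel))
```

With an odd k and two response classes, the majority vote cannot tie. Because the features are binary, only 8 distinct feature states exist, which matches the 8-state description of the final model.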

Abstract

Contemplated systems and methods use a response prediction model for temozolomide that is based on RNAseq information, protein quantitative information, and methylation information and has a prediction accuracy of at least 85%.

Description

TEMOZOLOMIDE RESPONSE PREDICTOR AND METHODS
[0001] This application claims priority to our co-pending US provisional applications with the serial number 62/579,127, filed October 30, 2017, and with the serial number 62/727,245, filed September 5, 2018, both of which are incorporated in their entireties herein.
Field of the Invention
[0002] The field of the invention is systems and methods of predicting drug response of a patient to temozolomide, and especially where the patient is diagnosed with cancer.
Background
[0003] The background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
[0004] All publications and patent applications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
[0005] Temozolomide (TMZ) is a chemotherapeutic agent that is used as standard treatment for glioblastoma and melanoma, and has recently shown limited but encouraging activity in patients with metastatic colorectal cancer (mCRC). TMZ is an agent with alkylating/methylating activity at the N-7 or O-6 positions of guanine residues in DNA, often triggering cell death in sensitive cells. However, various DNA damage repair enzymes, and especially the O-6-methylguanine-DNA methyltransferase (MGMT), may counteract the effect of temozolomide in at least some of the tumor cells. More recently, epigenetic silencing of the MGMT gene was reported, and tumor cells with epigenetic silencing of the MGMT gene were found to be more sensitive to killing by TMZ.
[0006] Consequently, MGMT is considered a resistance marker for TMZ. Commonly, MGMT expression status of tumors can be assessed by a digital polymerase chain reaction (PCR) method known as methyl-BEAMing (MB), and a cutoff of >60% MGMT methylation predicts benefit from TMZ. In another approach, mass spectrometry (MS) proteomic analysis can objectively quantify the MGMT protein and other actionable protein biomarkers in formalin-fixed, paraffin-embedded (FFPE) tissue sections. Here, an MGMT protein cutoff of 200 amol/µg (predefined based on the assay's limit of detection) is predictive of benefit in mCRC patients treated with TMZ. MGMT protein quantity may also correlate with MGMT methylation status.
[0007] Among TMZ-treated patients with mCRC, those whose tumors expressed low or undetectable levels of MGMT protein had a longer mPFS than their counterparts with higher MGMT protein levels. As was presented in Abstract # 11601 at the ASCO Annual Meeting, Chicago, Illinois (June 2-6, 2017), a correlation of 80% was observed between MGMT protein expression as quantified by mass spectrometry and MGMT methylation status by MB.
Quantitative proteomics objectively measured MGMT protein in FFPE tumor samples and retrospectively identified 9 of 9 responders to TMZ. Digital PCR methylation assay (methyl-BEAMing) retrospectively identified 7 of 8 responders to TMZ. The investigators therefore concluded that quantitative proteomic analysis of MGMT could potentially be used to select mCRC patients for TMZ therapy.
[0008] However, such an approach considered proteomic and methylation analysis only in a retrospective manner. Moreover, detection limits of mass spectroscopic analysis and possible shortfalls of methylation detection potentially further reduce the accuracy of the responder analysis. Indeed, it was observed that there was only about 80% agreement in response association with MGMT between the mass spectroscopic analysis and the methylation analysis. Moreover, the authors did not present any analytic option that would support or hint at a response prediction with clinically useful accuracy.
[0009] Therefore, even though various systems and methods for prediction of specific drug response are known in the art, there remains a need for systems and methods that allow for simple and robust treatment prediction for a drug with high confidence, and that also allow prediction of the treatment response in a patient specific manner.
Summary of The Invention
[0010] The inventive subject matter is directed to various devices, systems, and methods for treatment response prediction for temozolomide in the treatment of a solid tumor of a patient. In one aspect of the inventive subject matter, the inventors contemplate a method of predicting treatment response to temozolomide in a patient that includes a step of providing RNAseq information, protein quantitative information, and methylation information from a tumor of the patient, and another step of calculating, by a response prediction model, a response prediction to temozolomide, wherein the response prediction model uses the RNAseq information, protein quantitative information, and methylation information.
[0011] Most preferably, the response prediction model uses a K-nearest-neighbors approach, and the RNAseq information, protein quantitative information, and methylation information are sub-grouped. For example, the RNAseq information may be sub-grouped using a log2(TPM+1) cutoff value of 3.5, the protein quantitative information may be sub-grouped using a cutoff value of 200 amol/mL, and/or the methylation information is sub-grouped using a cutoff value of 60% promoter CpG methylation. Most typically, the response prediction model has a prediction accuracy of at least 80% or of at least 85%.
[0012] It is further contemplated that the RNAseq information, protein quantitative information, and methylation information are provided from a FFPE sample or a fresh tumor sample, and that the tumor is a solid tumor (e.g., metastatic colon cancer, glioblastoma, or melanoma).
[0013] Another aspect of the inventive subject matter includes a method of treating a patient having a tumor. This method includes a step of providing RNAseq information, protein quantitative information, and methylation information from a tumor (e.g., a solid tumor) of the patient, and a step of calculating, by a response prediction model, a response probability to temozolomide, wherein the response prediction model uses the RNAseq information, protein quantitative information, and methylation information. Then, the method continues with a step of administering temozolomide to a patient having the temozolomide response probability of >0.5. Preferably, the response prediction model uses a K-nearest-neighbors approach, and/or the response prediction model has a prediction accuracy of at least 85%.
[0014] In some embodiments, the RNAseq information, protein quantitative information, and methylation information are sub-grouped. Preferably, the RNAseq information is sub-grouped using a log2(TPM+1) cutoff value of 3.5, and/or the protein quantitative information is sub-grouped using a cutoff value of 200 amol/mL, and/or the methylation information is sub-grouped using a cutoff value of 60% promoter CpG methylation.
[0015] In some embodiments, the RNAseq information, protein quantitative information, and methylation information are provided from a FFPE sample or a fresh tumor sample. In some embodiments, the tumor is metastatic colon cancer, glioblastoma, or melanoma.
[0016] Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures.
Brief Description of the Drawing
[0017] Figure 1 depicts the samples and assays used in the experimental studies.
[0018] Figures 2A and 2B depict graphs of percent change in tumor volume (from baseline) among TMZ-treated patients (n=41) by (2A) MGMT protein status and (2B) MGMT promoter hypermethylation status.
[0019] Figure 3 depicts a graph of percent change in tumor volume (from baseline) among TMZ-treated patients.
[0020] Figure 4A and 4B depict graphs of progression free survival (PFS, 4A) and overall survival (OS, 4B) of TMZ-treated patients with metastatic colorectal cancer, by MGMT protein expression level.
[0021] Figure 5A and 5B depict graphs of PFS (5A) and OS (5B) of TMZ-treated patients with metastatic colorectal cancer, stratified by MGMT methylation status.
[0022] Figure 6A and 6B depict graphs of PFS (6A) and OS (6B) of TMZ-treated patients with metastatic colorectal cancer, by RNA-seq analysis.
[0023] Figure 7A and 7B depict graphs of TMZ-treated patients with metastatic colorectal cancer, by MGMT protein expression level.
[0024] Figures 8A and 8B are graphs schematically illustrating cut-off values for RNAseq data.
[0025] Figures 9A and 9B are graphs schematically illustrating agreement between RNAseq and proteomic values.
[0026] Figure 9C depicts a bar graph of optimal threshold in MGMT RNA seq.
[0027] Figure 9D depicts bar graphs of agreement between MGMT protein quantity and MGMT methylation.
[0028] Figures 10A and 10B are graphs depicting PFS (progression free survival) and OS (overall survival), respectively, versus MGMT protein level sub-groups.
[0029] Figure 11 is a graph depicting PFS (progression free survival) versus MGMT RNAseq level sub-groups.
[0030] Figure 12 is a graph depicting PFS (progression free survival) versus one MGMT subgroup combination.
[0031] Figures 13A and 13B are graphs depicting PFS (progression free survival) and OS (overall survival) versus another MGMT sub-group combination.
[0032] Figure 14 is a graph depicting temozolomide response prediction accuracies based on various input variables for various classifiers.
[0033] Figure 15 depicts a graph of average accuracy of predictive models per leave-pair-out cross-validation in Example 2.
[0034] Figure 16 depicts a graph of average predictive accuracy in unseen samples for 58 predictive modeling strategies, by MGMT assessment method group. Groups are ordered left-to-right by average accuracy in Example 2.
[0035] Figure 17 depicts a schematic diagram of machine learning of drug response prediction from various types of data.
[0036] Figure 18 depicts a flowchart of regression and classification pipeline of the predictive model.
[0037] Figure 19 depicts a graph of relationships between all expression and MGMT protein determined by various regression models.
[0038] Figure 20 depicts a graph of relationships between MGMT protein and MGMT gene determined by various regression models.
[0039] Figure 21 depicts a graph of accuracy values using methylation values.
[0040] Figure 22 depicts another graph of accuracy values using methylation values.
[0041] Figure 23 depicts a heatmap of training and test data sets of the predictive model.
Detailed Description
[0042] The inventor has now discovered that clinically useful temozolomide prediction models can be established with unexpectedly high accuracy using a model that is based on a combination of (preferably sub-grouped) RNAseq information, protein quantitative information, and promoter methylation information of a tumor. In particularly preferred aspects, the model is based on a K-nearest-neighbors approach.
[0043] While post hoc association between MGMT protein levels or promoter methylation and temozolomide response has generally been known, it should be appreciated that such association is not necessarily predictive, let alone predictive with a high degree of accuracy. The inventor has now observed that a highly accurate and predictive model can be established in which patient treatment response is predicted on a combination of quantified and preferably sub-grouped parameters: (1) RNAseq information, particularly as measured in TPM, (2) protein quantitative information, particularly as measured by mass spectroscopy, and (3) methylation information, particularly as measured in the MGMT promoter region from a tumor of the patient.
[0044] Therefore, the inventor contemplates a method of predicting the treatment response to temozolomide in a patient having a tumor that includes a step of providing RNAseq information, protein quantitative information, and methylation information from a tumor of the patient. The response prediction to temozolomide is then established using a response prediction model that takes into account the RNAseq information, protein quantitative information, and methylation information. As will be readily appreciated, the type of information will at least to some degree determine the nature of the sample.
[0045] As used herein, the term "tumor" refers to, and is interchangeably used with one or more cancer cells, cancer tissues, malignant tumor cells, or malignant tumor tissue, that can be placed or found in one or more anatomical locations in a human body. It should be noted that the term "patient" as used herein includes both individuals that are diagnosed with a condition (e.g., cancer) as well as individuals undergoing examination and/or testing for the purpose of detecting or identifying a condition. Thus, a patient having a tumor refers to both individuals that are diagnosed with a cancer as well as individuals that are suspected to have a cancer. As used herein, the term "provide" or "providing" refers to and includes any acts of manufacturing, generating, placing, enabling to use, transferring, or making ready to use.
[0046] Thus, in most aspects of the inventive subject matter, a tumor sample will be used to obtain all relevant information. Any suitable methods of obtaining a tumor sample (tumor cells or tumor tissue) from the patient (or healthy tissue from a patient or a healthy individual as a comparison) are contemplated. Most typically, a tumor sample can be obtained from the patient via a biopsy (including liquid biopsy, or obtained via tissue excision during a surgery or an independent biopsy procedure, etc.), which can be fresh or processed (e.g., frozen, formalin-fixed paraffin-embedded (FFPE) samples, etc.) until further processing for obtaining omics data from the tissue. For example, the tumor cells or tumor tissue may be fresh or frozen. As another example, the tumor cells or tumor tissues may be in the form of cell/tissue extracts. In some embodiments, the tumor samples may be obtained from a single or multiple different tissues or anatomical regions. For example, a metastatic breast cancer tissue can be obtained from the patient's breast as well as other organs (e.g., liver, brain, lymph node, blood, lung, etc.) for metastasized breast cancer tissues. Preferably, a healthy tissue of the patient or matched normal tissue (e.g., patient's non-cancerous breast tissue) can be obtained, or a healthy tissue from a healthy individual (other than the patient) can also be obtained in a similar manner as a comparison.
[0047] In some embodiments, tumor samples can be obtained from the patient at multiple time points in order to determine any changes in the tumor samples over a relevant time period. For example, tumor samples (or suspected tumor samples) may be obtained before and after the samples are determined or diagnosed as cancerous. In another example, tumor samples (or suspected tumor samples) may be obtained before, during, and/or after (e.g., upon completion, etc.) a one-time or a series of anti-tumor treatments (e.g., radiotherapy, chemotherapy, immunotherapy, etc.). In still another example, the tumor samples (or suspected tumor samples) may be obtained during the progress of the tumor upon identifying new metastasized tissues or cells.
[0048] From the obtained tumor cells or tumor tissue, DNA (e.g., genomic DNA, extrachromosomal DNA, etc.), RNA (e.g., mRNA, miRNA, siRNA, shRNA, etc.), and/or proteins (e.g., membrane protein, cytosolic protein, nuclear protein, etc.) can be isolated and further analyzed to obtain omics data. Alternatively and/or additionally, a step of obtaining omics data may include receiving omics data from a database that stores omics information of one or more patients and/or healthy individuals. For example, omics data of the patient's tumor may be obtained from isolated DNA, RNA, and/or proteins from the patient's tumor tissue, and the obtained omics data may be stored in a database (e.g., cloud database, a server, etc.) with other omics data sets of other patients having the same type of tumor or different types of tumor. Omics data obtained from the healthy individual or the matched normal tissue (or healthy tissue) of the patient can also be stored in the database such that the relevant data set can be retrieved from the database upon analysis. Likewise, where protein data are obtained, these data may also include protein activity, especially where the protein has enzymatic activity (e.g., polymerase, kinase, hydrolase, lyase, ligase, oxidoreductase, etc.).
[0049] As used herein, omics data includes but is not limited to information related to genomics, proteomics, and transcriptomics, as well as specific gene expression or transcript analysis, and other characteristics and biological functions of a cell. With respect to genomics data, suitable genomics data includes DNA sequence analysis information that can be obtained by whole genome sequencing and/or exome sequencing (typically at a coverage depth of at least 10x, more typically at least 20x) of both tumor and matched normal sample. Alternatively, DNA data may also be provided from an already established sequence record (e.g., SAM, BAM, FASTA, FASTQ, or VCF file) from a prior sequence determination. Therefore, data sets may include unprocessed or processed data sets, and exemplary data sets include those having BAM format, SAM format, FASTQ format, or FASTA format. However, it is especially preferred that the data sets are provided in BAM format or as BAMBAM diff objects (e.g., US2012/0059670A1 and US2012/0066001A1). Omics data can be derived from whole genome sequencing, exome sequencing, transcriptome sequencing (e.g., RNA-seq), or from gene specific analyses (e.g., PCR, qPCR, hybridization, LCR, etc.). Likewise, computational analysis of the sequence data may be performed in numerous manners. In most preferred methods, however, analysis is performed in silico by location-guided synchronous alignment of tumor and normal samples as, for example, disclosed in US 2012/0059670A1 and US 2012/0066001A1 using BAM files and BAM servers. Such analysis advantageously reduces false positive neoepitopes and significantly reduces demands on memory and computational resources.
[0050] Alternatively, or additionally, while it is preferred that the relevant information is directly obtained from the tumor, one or more of the data may also be obtained from a database. For example, where a fresh tumor sample was used to obtain whole genome sequencing and RNA analysis, the relevant information may be provided from a database or sequencing center as best suitable. Proteomics analysis may be performed from an FFPE sample using laser microdissection, and mass spectroscopic analysis can be performed using such samples. Thus, the information need not necessarily be derived from a single source, but may be assembled from various sources. Likewise, contemplated analyses may employ data from different points in time, for example, pre-surgery and pre-administration of temozolomide, or post-surgery and pre-administration of temozolomide, etc.
[0051] Thus, preferably, suitable genomic information includes whole genome sequencing or exome sequencing that may, for example, identify MGMT gene mutations, duplications, or deletions, and RNA sequence information and particularly RNAseq information of MGMT to provide quantitative information of transcription (and splice variants or other mutations where present). Alternatively, quantitative information may also be obtained by hybridization and/or other PCR based methods. Likewise, protein information is preferably obtained using mass spectroscopic methods, including selected reaction monitoring methods, antibody-based information, and/or staining methods.
[0052] The DNA-damaging alkylating agent temozolomide (TMZ) is approved in the treatment of glioblastoma, melanoma and lymphoma. The MGMT enzyme is involved in repairing damage from alkylating agents. MGMT epigenetic silencing is associated with TMZ response in melanoma studies, and occurs in about one third of colorectal cancers (CRCs). Perplexingly, however, previous studies demonstrated that low MGMT protein expression may predict TMZ response in mCRC patients better than MGMT methylation does. The relationship between various MGMT assays and outcomes remains unclear.
[0053] Consequently, the inventors contemplate use of more than one type of dataset for a single molecular entity in the construction of a prediction model. Advantageously, suitable types of datasets include DNA copy number data, DNA mutation data, RNA splice variant data, RNA expression level data, promoter methylation data, epigenetic modification data, protein data, and protein activity data. Most typically, such data are readily available and/or can be inferred from various pathway models (e.g., PARADIGM). It is also contemplated that where more than one type of dataset is used, at least three different types of datasets will be employed. As will be readily appreciated, the choice of classification algorithm will be at least to some degree a function of the type of dataset, and the PHOSITA will be able to determine appropriate classification algorithm(s) for a given dataset. Moreover, and as is also exemplarily noted below, cutoff values may be predetermined, or independently learned using further machine learning.
[0054] It is contemplated that, regardless of the particular method used to obtain the quantitative metric, it is generally preferred that one or more of the RNAseq information, protein quantitative information, and methylation information are sub-grouped using one or more threshold values. For example, and as explained in more detail below, the RNAseq information may be sub-grouped by a TPM (transcripts per million) threshold, protein quantitative information may be sub-grouped by a detection threshold or a specific value such as 200 amol, and methylation information may be sub-grouped by a threshold value as determined by methyl-BEAMing (e.g., 60% methylated MGMT promoter sequence).
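By way of illustration only, the sub-grouping described above can be sketched in Python as follows. The cutoff values are those quoted in the text; the function name, field names, and the direction in which each flag is coded are assumptions made for this sketch and are not part of the disclosure.

```python
import math

# Cutoffs as quoted in the text; all names below are hypothetical.
RNA_LOG2_TPM_CUTOFF = 3.5    # log2(TPM + 1)
PROTEIN_CUTOFF = 200.0       # amol, in the units reported by the assay
METHYLATION_CUTOFF = 60.0    # percent methylated MGMT promoter (methyl-BEAMing)


def subgroup(tpm, protein_amol, methylation_pct):
    """Return binary sub-group flags for one tumor sample.

    Each flag is 1 when the measurement falls on the side of its cutoff
    that the text associates with low MGMT activity.
    """
    log_tpm = math.log2(tpm + 1.0)
    return {
        "rna_low": int(log_tpm < RNA_LOG2_TPM_CUTOFF),
        "protein_low": int(protein_amol < PROTEIN_CUTOFF),
        "methylated": int(methylation_pct > METHYLATION_CUTOFF),
    }
```

The three binary flags produced in this manner may then serve as input features to a classifier, in place of or in addition to the raw quantitative values.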
[0055] The inventors further contemplate that one or more of the RNAseq information, protein quantitative information, and methylation information, and/or such information sub-grouped by one or more threshold values, can be used to train a prediction model and validate the accuracy of the prediction model. With respect to the response prediction model, it should be noted that there are numerous manners of building models known in the art, and contemplated models may use one or more of the RNAseq information, protein quantitative information, and methylation information, grouped or ungrouped, and in any combination thereof. However, it is preferred that the model will use sub-grouped RNAseq information, protein quantitative information, and methylation information as further described in more detail below.
[0056] Likewise, there are various multivariate classification algorithms for the prediction task known in the art, and exemplary classifiers include extra tree classifier, KNN classifier, RBF or linear support vector classifier, Decision Tree classifier, Naive Bayes classifier, Quad Discriminant classifier, Ridge classifier, Gaussian Process classifier, Random Forest classifier, and AdaBoost classifiers using either Random Forest or Decision Tree base-estimators. Similarly, there are various univariate classification algorithms for the prediction task known in the art, and an example is finding the optimal classifying threshold using Youden analysis. As is depicted below, such algorithms will provide different accuracy metrics, and it is generally preferred that the classifier with the highest accuracy (or accuracy gain) will be used for generation of the response prediction model. Notably, and among other relatively high accuracies, contemplated methods allowed a prediction accuracy of at least 70%, at least 80%, or at least 85% when validated in unseen cancer patients (e.g., mCRC patients, etc.), depending on the type of classifier used. Most preferably, where the K-nearest-neighbor classifier was used, accuracies of about 86% were achieved.
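As a non-limiting sketch of how a K-nearest-neighbors classifier may operate on sub-grouped features, the following Python example implements a majority vote over the k closest training samples and estimates accuracy by holding out one sample at a time. The Hamming distance, the value k=3, the leave-one-out scheme, and the toy data are all assumptions chosen for illustration; they are not the inventors' implementation or data.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    ranked = sorted(train, key=lambda item: hamming(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample in turn and predict it from the remaining ones."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(samples)


# Hypothetical toy data: (rna_low, protein_low, methylated) -> responder flag.
toy = [
    ((1, 1, 1), 1), ((1, 1, 1), 1), ((1, 1, 0), 1), ((0, 1, 1), 1),
    ((0, 0, 0), 0), ((0, 0, 0), 0), ((1, 0, 0), 0), ((0, 0, 1), 0),
]
```

Validating on samples that were withheld from training, as in `leave_one_out_accuracy`, is what distinguishes a predictive accuracy estimate from a post hoc association.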
Examples
Archived FFPE tissue sections were obtained from 41 patients with metastatic colorectal cancer who had received TMZ in one of 3 Phase II clinical trials from the FELDSPAR cohort. As shown in Table 1, tumor samples from 41 TMZ-treated patients were available for analysis. These patients had a median age of 69 years and had received a median of 3 chemotherapeutic regimens prior to TMZ. The majority of patients had an ECOG status of 0 or 1 (85%); and at least 2 metastatic sites (56%), with liver as the most frequent site. As expected in mCRC, all patients eventually progressed on TMZ. ORR was as follows: 26 patients (63%) had progressive disease; 9 (22%) had partial response; 6 (15%) had stable disease. As shown in Figure 1, of these 41 samples, 39 successfully passed quality control standards for RNAseq sequencing, and 35 successfully passed quality control standards for MethylBEAMing (digital MB). The following is a short analysis of this selection of samples.
[Table 1 appears as an image in the original document. Last legible row: PD, 26 (63%).]
RECIST, Response Evaluation Criteria in Solid Tumors; ECOG, Eastern Cooperative Oncology Group; PR, partial response; SD, stable disease; PD, progressive disease.
Table 1
[0057] All 41 archived samples were evaluable by LC-MS, 35 were analyzed by digital MB, and 39 were of sufficient quality for MGMT assessment by RNA-seq (Figure 1, Table 1). Of patients assessed by LC-MS-based proteomics, 18 (44%) tested MGMT protein negative (<200 amol/µg of tumor protein) and were therefore considered likely to respond to TMZ. The remainder (n=23) were MGMT protein positive and deemed unlikely to respond to TMZ therapy. In this molecularly enriched population of clinical trial participants, the rate of MGMT negativity was high at 44%. By comparison, a 16% prevalence of MGMT negativity was found among all of the CRC patient samples submitted to our laboratory for proteomic testing over the course of a year (n=114). By MB, MGMT promoter methylation above the 60% cutoff was observed in 12 (34%) patients; the remainder had unmethylated MGMT status (Table 2). In the 35 tumors analyzed by both MB and LC-MS, the agreement rate between methods was 77%; p=0.004.
[Table 2 appears as an image in the original document.]
List of abbreviations: MGMT, O6-methylguanine-DNA-methyltransferase; TPM, transcripts per million. *p: two-tailed Fisher's exact test.
Table 2
[0058] Ability of MGMT assays to predict response and survival: Quantitative proteomics retrospectively identified 9 of 9 RECIST-defined responders to TMZ; all 9 responders had negative MGMT protein expression by LC-MS. An additional 9 patients with MGMT negative protein did not have RECIST-defined response on TMZ (ORR, MGMT-negative patients: 50%). None of the patients with positive MGMT protein expression responded to TMZ (ORR, MGMT-positive patients: 0%; p=0.0001; Table 2, Figure 2A).
[0059] Among patients analyzed by MB (n=35), MGMT hypermethylation status retrospectively identified 6 of the 8 responders to TMZ; an additional 6 patients with MGMT hypermethylation by MB were non-responders (ORR, MGMT-hypermethylated patients: 50%). Two patients with negative methylation status responded to TMZ (ORR: 9%; p=0.011; Table 2; Figure 2B).
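For illustration, the two-tailed Fisher's exact p-values reported above can be recomputed from the responder counts stated in the text. The sketch below uses only the Python standard library; the function name is hypothetical, and the count of 21 unmethylated non-responders is derived here from the stated totals (35 analyzed, 12 hypermethylated, 2 unmethylated responders).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Rows: sub-group; columns: responders, non-responders (counts from the text).
p_protein = fisher_exact_two_sided(9, 9, 0, 23)   # MGMT protein negative vs positive
p_methyl = fisher_exact_two_sided(6, 6, 2, 21)    # hypermethylated vs unmethylated
```

Rounded to the precision used in the text, these values agree with the reported p=0.0001 for the proteomic classes and p=0.011 for the methylation classes.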
Figure 3 shows a graph combining the data of methylation status and MGMT protein expression levels: percent change in tumor volume (from baseline) for patients with MGMT protein < 200 amol/µg (n = 18; dark blue) and MGMT protein > 200 amol/µg (n = 23; light blue), together with MGMT methylation status by MB (red bars; n = 35), with positive status defined as > 60% (red line).
[0060] In survival analyses, patients with negative MGMT protein expression by LC-MS had longer median PFS (mPFS) than proteomically positive patients (3.7 months vs 1.8 months; HR: 0.504 [95% CI 0.27-0.94]; p = 0.014) (Figure 4A). MGMT levels remained a statistically significant predictor of PFS when paired with each of 12 potential confounders (BRAF and KRAS mutation status, gender, ECOG, number of previous treatments, LDH baseline level, number of metastatic sites, neutrophil to lymphocyte ratio, peritoneal disease, primary tumor location, site of the archived tissue and age) in bivariate Cox proportional hazards models.
[0061] Although some of these clinical covariates were associated with outcomes (e.g., LDH), MGMT protein expression was the most statistically significant predictor of PFS (Table 3). Differences in OS by MGMT protein expression were similar to PFS differences but did not reach statistical significance (8.7 vs 7.4 months, HR: 0.593 [95% CI: 0.32 to 1.12]; p = 0.078) (Figure 4B). There were no statistically significant differences in PFS or OS among patients stratified by MGMT MB (Figures 5A and 5B).
[Table 3 appears as an image in the original document.]
Table 3
[0062] MGMT by RNA sequencing: Using the experimental cutpoint for mRNA expression fit to the data (< 3.5 log2[TPM+1]), low MGMT mRNA expression was observed in the majority of samples (n=23; 59%) (Table 2). Patients whose tumors had low MGMT RNA expression by RNA-seq had a nonsignificantly higher ORR than higher mRNA expressors (35% vs 6%; p=0.115; Table 2). There were no statistically significant survival differences among patients stratified by MGMT mRNA expression (Figure 6A and Figure 6B). Figure 7A and Figure 7B depict further graphs showing that among TMZ-treated patients (n = 41), those with MGMT protein levels <200 amol/µg (n = 18) had longer median PFS (mPFS) than patients with higher MGMT protein levels. All patients eventually progressed on TMZ (Figure 7A). Progression was redefined by the RECIST criteria to reflect clinical response: patients with partial response or stable disease for >6 months were defined as responders (n=18) or non-responders (n=23). Results for overall survival were consistent and nearly statistically significant (8.7 vs 7.4 months, HR=0.6, p=0.077) (Figure 7B).
[0063] Correlation between cutoff in RNAseq to MS-proteomic cutoff: In an attempt to identify a corresponding cutoff value in RNAseq for the observed/practical MS-proteomics cutoff value, the inventor obtained the MGMT expression levels for all COAD and READ TCGA samples, and looked for a natural break in the expression pattern that would match with the 200 amol cutoff suggested by prior proteomic analysis. As can be seen from Figures 8A and 8B, the distribution of MGMT TPMs appears bimodal with a natural break around 3.5 log2(TPM+1) (indicated by the vertical line). At the cutoff of 3.5 log2(TPM+1), a good agreement between the 200 amol/mL MS proteomics cutoff and the RNAseq classes was observed, as is also evident from Table 4 below. The Fisher's Exact p-value for this level of association between protein and RNA thresholds is p<0.00082.
[Table 4 appears as an image in the original document.]
Table 4
[0064] In a follow-up analysis, the inventor found that the chosen cut-off (3.5 log2[TPM+1]) was optimal for agreement between RNAseq and proteomic values in this cohort, according to Youden analysis, as can be taken from Figures 9A and 9B. At this cut-point of 3.5, a TPR of 0.71 and a FPR of 0.11 are obtained in this cohort when compared to proteomic MGMT classes.
Figure 9C shows a bar graph of the RNAseq values in non-responders and responders to TMZ (statistic = -1.04, p value = 0.305). Figure 9D shows a bar graph indicating that the agreement rate between the MGMT proteomic assay and MB was 80%; p=0.0011 Fisher's test (35 tumors analyzed by MB).
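For illustration, the Youden J statistic is J = TPR - FPR, so the reported TPR of 0.71 and FPR of 0.11 correspond to J of about 0.60. The following is a minimal sketch of a cut-point search by Youden's J; the function names and the convention that a sample is called positive when its value is at or above the cut-point are assumptions made for this example only.

```python
def youden_j(tpr: float, fpr: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1 = TPR - FPR."""
    return tpr - fpr


def best_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J over the observed values.

    labels: True for the positive (e.g. protein-high) class.
    Assumed rule: a sample is called positive when value >= cutoff.
    Requires at least one positive and one negative label.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y)
        fp = sum(1 for v, y in zip(values, labels) if v >= cutoff and not y)
        j = youden_j(tp / n_pos, fp / n_neg)
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best
```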
[0065] Defining MGMT subgroups associated with outcomes: In a first analysis, the inventor recreated previous work demonstrating a PFS benefit to patients with <200 amol/mL MGMT protein, and a typical result is shown in Figure 10A. The log-rank test between proteomic classes had a p<=0.0186, and the Cox proportional hazards results are shown in Table 5.
             coef      exp(coef)  se(coef)  z         p         lower 0.95  upper 0.95
MGMT_class   0.40874   1.50492    0.177652  2.300787  0.021404  0.060471    0.757008
Table 5
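The columns of the Cox summary tables are related by standard Wald formulas: exp(coef) is the hazard ratio, z = coef / se(coef), p is the two-sided normal tail probability of z, and the 95% bounds are coef ± about 1.96 × se(coef). The sketch below recomputes these quantities from the coefficient and standard error reported in Table 5. The function name is hypothetical, and small differences in the last digits are expected because the inputs are rounded.

```python
import math


def wald_summary(coef: float, se: float, z_crit: float = 1.959964) -> dict:
    """Recompute Wald statistics for a single Cox regression coefficient."""
    z = coef / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "hazard_ratio": math.exp(coef),
        "z": z,
        "p": p,
        "lower_0.95": coef - z_crit * se,
        "upper_0.95": coef + z_crit * se,
    }


# Values taken from Table 5 (coef and se(coef) for the MGMT class).
table5 = wald_summary(0.40874, 0.177652)
```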
[0066] It should be noted that while the log-rank test gives p<0.0186 for these two subgroups (as shown in earlier work), the Cox proportional hazards model gives p<0.021. The Cox model is slightly more conservative, and takes imbalance in the arms more into consideration. Both statistics will be discussed in the following results. It should also be noted that this split loses p<0.05 significance when using OS as the survival metric and death as the endpoint, as can be seen from Figure 10B. Here, the log-rank test between proteomic classes was p<=0.0879, and the Cox proportional hazards results were as shown in Table 6.
coef exp(coef) se(coef) z p lower 0.95 upper 0.95
MSMT .Gfass 0,31:315? t.36?73? 0.188014 1.S83Si2 0.882276 "RO&1SQ4 &S37819
Table 6
[0067] Next, the inventor investigated the associations as observed using RNAseq subgroups. Here, an RNAseq cutoff of 3.5 log2(TPM+1) as established in TCGA COAD/READ data was used to define the subgroups, as is shown in Figure 11. This provided a log-rank test between RNAseq classes of p<=0.1731, and Cox proportional hazards results are shown in Table 7. As can be readily seen, the RNAseq classes were not as prognostic as the proteomic subgroups, and did not achieve significance with this cohort size. Similarly, using OS as the survival metric did not achieve significance.
             coef      exp(coef)  se(coef)  z         p         lower 0.95  upper 0.95
MGMT_class   0.229918  1.258496   0.168385  1.365431  0.172118  -0.100183   0.560018
Table 7
[0068] Then, the inventor investigated associations as observed using a combination of RNAseq subgroups and proteomic classes. If a sample had high MGMT in either RNAseq or protein, it was considered MGMT high. A typical result for such analysis is shown in Figure 12. Here, the log-rank test between combination RNA+protein classes was p<=0.0350, and the Cox proportional hazards results are shown in Table 8. Here, it should be noted that while the differential survival was significant and had improved on RNAseq alone, it was not as significant as the proteomic 200 amol split alone.
coef exp(coef) se(coef) z p lower 0.95 upper 0.95
MGilffF..0f»ss 0.36:2188 1,436469 0,1:7832 2.MTM8 CS387S 3,018688 a.:?«S688
Table 8
[0069] In a further analysis, the inventor investigated associations as observed using the above combination plus MGMT promoter CpG methylation subgroups. More specifically, samples with >60% methylation were expected to have inhibited MGMT expression, and the optimal combinations were as follows: MGMT high: methylation low and either RNA high or protein high; and MGMT low: methylation high or either of RNA low or protein low.
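The combination rules can be sketched as simple boolean functions. The function names are hypothetical. For the three-way rule, this sketch assumes that every sample not meeting the stated "MGMT high" condition is classed as "MGMT low", which is one reading of the wording above; the text does not spell out how overlapping cases are resolved.

```python
METHYLATION_CUTOFF = 60.0  # percent promoter CpG methylation, from the text


def two_way_class(rna_high: bool, protein_high: bool) -> str:
    """RNA + protein rule: high if either assay is high."""
    return "high" if (rna_high or protein_high) else "low"


def three_way_class(rna_high: bool, protein_high: bool,
                    methylation_pct: float) -> str:
    """RNA + protein + methylation rule.

    MGMT high: methylation low AND (RNA high OR protein high).
    Assumption: all remaining samples are treated as MGMT low.
    """
    methylation_high = methylation_pct > METHYLATION_CUTOFF
    if not methylation_high and (rna_high or protein_high):
        return "high"
    return "low"
```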
[0070] Figure 13A shows exemplary results of such analysis. Here, the log-rank test between 3-way combination classes was p<=0.0378, and the Cox proportional hazards results are shown in Table 9.
coef exp(coef) se(coef) z p lower 0.95 upper 0.95
MQMTjjtess 8.3SS918 1.427® 9,173032 2,Q S3 8.G 072? 0,014842 0.S8W
Table 9
[0071] This separation was not quite as distinct as that of RNA+protein or protein alone; however, it stayed significant in both log-rank and Cox PH tests when OS was the survival metric, as is shown in Figure 13B. There, the log-rank test between 3-way combination classes was p<=0.0419, and the Cox proportional hazards results are shown in Table 10.
coef exp(coef) se(coef) z p lower 0.95 upper 0.95
MGMT..CtitS& CL38 31S 1. 33S2? 0,18324? 5 ί O.0 87SS 8,(30SO?S 0,723551
Table 10
[0072] Temozolomide response prediction Example I: The inventor evaluated multiple methods for building a predictive model of temozolomide response based on MGMT-omics values. More specifically, the inventor built predictive models of temozolomide response using each of the MGMT assays: RNAseq expression TPMs, protein amol/mL, and methylation percentage, and combinations and sub-combinations thereof. Further models were built using both the raw continuous values for each of these features as well as their sub-grouped values (3.5 log2(TPM+1), 200 amol/mL, and 60% CpG methylation, respectively). In combination this resulted in 10 different 'datasets':
1. Expression alone
2. Protein alone
3. Methylation alone
4. Expression + Protein
5. Expression + Protein + Methylation
6. Expression (subgrouped) + Protein
7. Expression (subgrouped) + Protein (subgrouped)
8. Expression (subgrouped) + Protein(subgrouped) + Methylation (subgrouped)
9. Expression + Protein + Methylation (subgrouped)
10. Expression + Protein (subgrouped) + Methylation (subgrouped)
[0073] To evaluate predictive performance, the inventor used leave-pair-out cross-validation (LPOCV). This validation method requires building a predictive model in 37/39 samples, then testing the predictive performance in one unseen positive sample and one unseen negative sample. This is repeated for all possible combinations of positive and negative samples, resulting in 308 evaluations of performance in this cohort. The average performance over these 308 unseen test sets is the reported accuracy for a given predictive algorithm.
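The leave-pair-out scheme can be sketched as follows. The function name is hypothetical. The split of 11 positive and 28 negative samples is not stated in the text; it is inferred here as the only split of 39 samples whose product is 308, and should be read as an assumption.

```python
from itertools import product


def leave_pair_out_splits(labels):
    """Yield (train_indices, test_pair) for every positive/negative pair.

    labels: sequence of booleans, True for a positive (responder) sample.
    Each test pair holds out one positive and one negative sample; the
    model would be trained on all remaining samples.
    """
    positives = [i for i, y in enumerate(labels) if y]
    negatives = [i for i, y in enumerate(labels) if not y]
    for pos, neg in product(positives, negatives):
        train = [i for i in range(len(labels)) if i not in (pos, neg)]
        yield train, (pos, neg)


# Assumed cohort composition: 11 positives + 28 negatives = 39 samples.
labels = [True] * 11 + [False] * 28
splits = list(leave_pair_out_splits(labels))
```

With this composition, `len(splits)` is 11 × 28 = 308 and every training set holds 37 samples, matching the counts given in the text.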
[0074] For all multi-feature datasets (4-10 above) the inventor evaluated 13 diverse classification algorithms for this prediction task. For the single-feature datasets (1, 2, and 3 above) a new optimal cutoff was established in the training samples using the Youden J statistic, and the new cutoff's performance was tested in the pair of unseen samples.
[0075] These 10 datasets and 14 classification algorithms combine into 140 different modeling strategies. Using LPOCV to evaluate the predictive performance of these 140 strategies in the unseen samples required building an additional 2772 unique predictive sub-models. Figure 14 depicts the average accuracies in unseen samples for each of these modeling strategies.
[0076] As can be seen from the calculations and Figure 14, the best modeling strategy overall was 87% accurate in predicting temozolomide response in the unseen samples. It should be appreciated that this performance (87% accuracy) is considerably better than the majority classification strategy (i.e. assuming all samples are resistant: 71%), and improves over using protein values alone (80%). This modeling strategy was selected to propose a final predictive model.
[0077] The highest-performing modeling strategy uses a K-nearest-neighbors approach utilizing all three features (RNA, protein, and methylation) in their sub-grouped transformations. This approach makes temozolomide response predictions on novel samples as follows:
1. define MGMT mRNA expression status, protein level, and promoter methylation status, using the predefined cut-offs described above,
2. calculate the pairwise Minkowski distance between each of the training instances and novel samples to be predicted using all three MGMT-related features (i.e. brute tree),
3. for each novel sample, identify the five closest matches, and
4. assign the novel sample the response class of the majority of the closest training samples.
[0078] A final model is proposed in this application that uses all available samples for training, with the strong belief that predictive performance in novel samples will be similar to that in the cross-validated setting. Due to being trained on three binary features, the final model describes the probability of temozolomide sensitivity in 8 distinct states (Table 11). Novel samples may be subgrouped using the same cutoffs as described above and assigned to one of these 8 states. A sensitivity prediction probability of >0.5 suggests that state will be sensitive to temozolomide with approximately 87% accuracy. Conversely, a temozolomide response probability of <0.5 is associated with resistance to temozolomide.
Expression status  Protein status  Methylation status  P(sensitive)
0 0.0 9.0 β,β S.e
1 1 .0 0.0 C,0 3.2
2 8,0 0,0 LB 9,6
3 1.0 0,0 i .e as
4 Q 1 ,0 6.© 0,2
5 1,0 1 ,0 β.β Q Q
6 &8 1.0 1.0 0-2
7 1.8 1.0 1.ES 0.2
Table 11
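The four prediction steps of the K-nearest-neighbors approach can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, the Minkowski exponent p = 2 is an assumption (the text does not state it), and the training data below are invented toy values, not the values of the cohort or of Table 11.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_sensitivity(train_x, train_y, sample, k=5, p=2):
    """Return (predicted class, fraction of the k neighbors that are sensitive).

    Brute-force search: distances to all training samples are computed,
    the k closest are kept, and the majority class is assigned.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: minkowski(train_x[i], sample, p))
    nearest = [train_y[i] for i in order[:k]]
    prob = nearest.count("sensitive") / k
    return ("sensitive" if prob > 0.5 else "resistant"), prob


# Hypothetical toy training set: (expression, protein, methylation) states,
# each coded 0 or 1 after applying the predefined cut-offs.
TRAIN_X = [(0, 0, 1)] * 3 + [(0, 0, 0)] * 2 + [(1, 1, 0)] * 3
TRAIN_Y = ["sensitive"] * 3 + ["resistant"] * 5
```

Because each feature is binary, any novel sample falls into one of 8 states, and with k = 5 the estimated probability is always a multiple of 0.2.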
[0079] Temozolomide response prediction Example II: The inventors sought to train a robust predictive model of TMZ response based on 3 separate quantitative MGMT assays (promoter quantitative methylation, RNA expression, and protein abundance) and validate its accuracy in unseen mCRC patients. Viewed from a different perspective, rather than identifying a single type of predictor, the inventors set out to identify multiple predictors in a machine learning setting to integrate various variables and thereby arrive at a prediction model with high sensitivity and accuracy.
[0080] In one example, 41 archived tumor samples from 3 TMZ safety trials (INT Study n.20/13; INT Study 20/13 & EudraCT 2012-002766-13) were used to train models. Response to TMZ was defined by RECIST v.1.1 criteria. MGMT status was assessed by 3 methods: digital PCR/methyl-BEAMing (MB), RNAseq, and liquid chromatography mass-spec. Several multivariate modeling strategies (kNN, SVM, decision trees, etc.) were evaluated using cross-validation (CV) within the training set. Due to a lack of clinical-grade methylation testing, models that first predict MGMT methylation (based on whole RNAseq) then use predicted methylation to classify TMZ response were also explored. The most accurate model in CV was validated in 14 unseen tumor samples from a follow-up study that were similarly assayed. Predefined thresholds in each MGMT assay were used as the basis for comparison.
[0081] As can be seen from the exemplary results in Table 12 below, when multiple variables were used to train and validate, response prediction to temozolomide significantly improved as compared to single variables (i.e., methylation or protein or expression used individually). Indeed, based on the integration of multiple variables, TMZ response in refractory mCRC is approximately predictable. Combining predicted methylation, transcript levels, and protein abundance yields the most accurate and robust method of predicting response (82% - 87% accurate).
Figure imgf000024_0001
Table 12
[0082] In one set of experiments, the inventors investigated the training cohort prediction performance for MGMT protein (as measured by LC-MS), MGMT expression (as measured by TPM), and MGMT promoter methylation (as measured by digital PCR/methyl-BEAMing (MB)). More specifically, to evaluate the ability of the predefined cutoffs to predict response to TMZ, the inventors used the leave-pair-out cross-validation strategy. Predefined and exploratory cutoffs were assessed in unseen samples 330, 308, and 250 times in LC-MS, RNAseq, and MB data, respectively. The predefined cutoffs in LC-MS and RNA-seq showed better mean predictive performance (82.1% and 72.2%, respectively) than the MB model (68.0%), and a typical result is depicted in Figure 15.
[0083] To further investigate the influence of various classification algorithms and training data (i.e., single variable versus multiple variables), the inventors performed several learning approaches in which protein, RNA, and methylation data were used, alone or in combination, and with or without a predefined threshold. As can be seen from Figure 16, protein-based models had relatively high prediction accuracy, which was exceeded by a model that used all three variables. In still further attempts to improve accuracy and simplify clinical or sample requirements, the inventors used previous TMZ studies as training data to build 10 candidate models (+3 predefined cutoffs) and replaced measured methylation with 'predicted methylation' based on whole RNAseq and using a regression model. Performance was then tested in an unseen testing cohort (TEMIRI), as is exemplarily shown in Figure 17. Here, the training dataset was a TMZ cohort and included 41 mCRC patients treated with TMZ from 3 phase II studies. Continuous MGMT protein levels by mass spec were available for all of the patients, as well as RNA expression data by RNA seq and continuous MGMT methylation percentage data. Drug response was recorded as a binary variable. The testing dataset comprised 32 mCRC patients treated with TMZ + irinotecan. Binary drug response data were missing for 3 patients, gene expression values were available for 14 patients, and MGMT protein expression data were available for 21 patients. See Table 13.
Figure imgf000025_0001
[0084] The inventors further contemplate that a machine learning regression model that explains methylation and/or uses predicted methylation values can be built with higher accuracy. Figure 18 shows a regression and classification pipeline for building the regression model. As shown in Table 14, the RMSE (the square root of the variance of the residuals, indicating the absolute fit of the model to the data, i.e., how close the observed data points are to the model's predicted values) of the regressor models is lower (better fit) when the MGMT protein expression level and MGMT RNAseq are both used as data sets.
Dataset                                     Regressor                        RMSE
MGMT protein + MGMT gene                    Theil-Sen                        0.30
All expression + MGMT protein               Bayesian Ridge                   0.34
All expression                              Elastic Net                      0.422
All expression, then include MGMT protein   Bayesian Ridge + Bayesian Ridge  0.38
Table 14
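As a brief illustration of the RMSE metric referred to above, the following sketch computes the root-mean-square error between observed and predicted values. The function name is hypothetical. Note that RMSE equals the square root of the residual variance only when the residuals have zero mean.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower value indicates that the regressor's predicted methylation values lie closer to the measured ones.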
[0085] Figure 19 shows a mean accuracy value of various regressor models when all expression (expression levels of all RNA) and MGMT protein expression level are used as data sets, which is also summarized in Table 15.
[Table 15: per-regressor values; the table body is not legible in the source text. Legible row labels include Elastic Net and Lasso.]
Table 15
[0086] Figure 20 shows a mean RMSE value of various regressor models when MGMT gene expression and MGMT protein expression level are used as data sets, which is also summarized in Table 16.
Figure imgf000027_0001
pas siv e a¾gsr sss ve, jr&gsrssses- &:3?&s≥-5 ssess-;
Table 16
[0087] Next, the inventors built a regressor model using predicted methylation values. Figures 21 and 22 show mean accuracy values of various regressor models when the predicted methylation values were used as a data set, which is also summarized in Table 17 and Table 18, respectively.
classifier
K 3,883217 3,862848
^ndeKBFomst Q,?4?828 8.716601
LB§islle!¾ef ®iSS¾sf5 0-.8t7391 & 891-502
:Efesc6S3sntre®s 0,?434'?8 .67S099
Figure imgf000027_0002
Table 17
classifier
MSMSS ssess
ha s iffi!ie^sssi a® S C*
Table 18
[0088] Figure 23 depicts a heat map with exemplary results for the response predictions on the 1,000 most variable genes across 44 samples using preset thresholds as noted, and Table 19 is a listing of exemplary classification algorithms used on selected datasets and combinations of datasets. As can be seen once more, use of MGMT RNAseq, MGMT protein, and MGMT promoter methylation provided superb training and testing accuracy for response prediction for temozolomide. Likewise, sensitivity, specificity, and F1 score were all substantially increased over other classifiers and individual datasets. Sample-level predictions for the best model of Table 19 are listed in Table 20, indicating that models that simultaneously consider protein, MGMT methylation, and mRNA performed better when compared to other models.
Figure imgf000029_0001
Table 19
Figure imgf000029_0002
Table 20
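The performance measures named above (sensitivity, specificity, F1 score, and accuracy) can be computed from binary predictions as in the following minimal sketch. The function name and the coding of responders as 1 and non-responders as 0 are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, F1, and accuracy for binary labels.

    Labels are coded 1 (responder / positive) and 0 (non-responder).
    Ratios with an empty denominator are reported as 0.0.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, len(y_true)),
    }
```

For instance, a test that detects half of the responders and never calls a non-responder positive has a sensitivity of 0.5 and a specificity of 1.0.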
[0089] In this retrospective analysis of patients with refractory mCRC treated with TMZ, an MS-based test for MGMT protein had a sensitivity of 50% and a specificity of 100%. In this small cohort of patients, the proteomic test outperformed both digital MB and RNA-seq in predicting response to TMZ. Moreover, MGMT protein expression below a predefined threshold (200 amol/ug) was associated with a 2-fold increase in mPFS, and this association was independent of 12 prognostic variables. Of interest, patients with positive MGMT protein expression by LC-MS had similar PFS to that reported for mCRC patients who participated in clinical trials of TMZ. The disappointing results of such trials may reflect the limited ability of standard MGMT assessment methods such as MSP to select the optimal candidates for TMZ. The present study lends support for assessment of alternative MGMT platforms, such as LC-MS, in prospective studies.
[0090] The accuracy and robustness of MGMT assays as predictors of TMZ response was stringently tested in a cross-validated setting. In this exercise, the proteomic MGMT test outperformed the other testing platforms, with an average accuracy of 82.1%.
[0091] It is noted that the importance of identifying potential responders to TMZ is recognized because DNA mismatch repair is impaired in TMZ-treated mCRC, where TMZ responders switched from microsatellite stable to microsatellite instable (MSI), thus rendering them eligible for therapy with immune checkpoint inhibitors. In other words, patients who relapse on TMZ could begin immunotherapy.
[0092] The importance of identifying potential responders to TMZ was emphasized by recently published findings on impairment of DNA mismatch repair in TMZ-treated mCRC, where TMZ responders switched from microsatellite stable to microsatellite instable (MSI), thus rendering them eligible for therapy with immune checkpoint inhibitors. This suggests that patients who relapse on TMZ could begin immunotherapy. Thus, the inventors further contemplate that a cancer treatment can be recommended or updated for a patient based on the prediction results. For example, TMZ can be administered to a patient in a dose and schedule effective to treat the tumor where the response prediction model predicts that the patient is responsive to TMZ. In another example, immune therapy (e.g., a checkpoint inhibitor, a cancer vaccine, etc.) can be administered to a patient where the response prediction model predicts that the patient is no longer responsive to TMZ or has substantially reduced responsiveness to TMZ (e.g., reduced by at least 30%, at least 50%, or at least 70% compared to before TMZ treatment, or compared to another individual who has a similar cancer prognosis, etc.).
[0093] As used herein, the term "administering" a drug or a cancer treatment refers to both direct and indirect administration of the drug or the cancer treatment. Direct administration of the drug or the cancer treatment is typically performed by a health care professional (e.g., physician, nurse, etc.), whereas indirect administration includes a step of providing or making available the drug or the cancer treatment to the health care professional for direct administration (e.g., via injection, oral consumption, topical application, etc.).
[0094] As used in the description herein and throughout the claims that follow, the meaning of "a," "an," and "the" includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise. As also used herein, and unless the context dictates otherwise, the term "coupled to" is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms "coupled to" and "coupled with" are used synonymously. Finally, and unless the context dictates the contrary, all ranges set forth herein should be interpreted as being inclusive of their endpoints, and open-ended ranges should be interpreted to include commercially practical values. Similarly, all lists of values should be considered as inclusive of intermediate values unless the context indicates the contrary.
[0095] It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the scope of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms "comprises" and "comprising" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C . . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.

Claims

CLAIMS What is claimed is:
1. A method of predicting treatment response to temozolomide in a patient, comprising:
providing RNAseq information, protein quantitative information, and methylation information from a tumor of the patient;
calculating, by a response prediction model, a response prediction to temozolomide, wherein the response prediction model uses the RNAseq information, protein quantitative information, and methylation information.
2. The method of claim 1, wherein the response prediction model uses a K-nearest-neighbors approach.
3. The method of any one of claims 1-2, wherein the RNAseq information, protein quantitative information, and methylation information are sub-grouped.
4. The method of claim 3 wherein the RNAseq information is sub-grouped using a log2(TPM+1) cutoff value of 3.5.
5. The method of claim 3 wherein the protein quantitative information is sub-grouped using a cutoff value of 200 amol/mL.
6. The method of claim 3 wherein the methylation information is sub-grouped using a cutoff value of 60% promoter CpG methylation.
7. The method of any one of claims 1-6, wherein the RNAseq information, protein quantitative information, and methylation information are provided from a FFPE sample or a fresh tumor sample.
8. The method of any one of claims 1-7, wherein the tumor is a solid tumor.
9. The method of any one of claims 1-8, wherein the tumor is metastatic colon cancer, glioblastoma, or melanoma.
10. The method of any one of claims 1-9, wherein the response prediction model has a prediction accuracy of at least 85%.
11. The method of claim 1, wherein the RNAseq information, protein quantitative information, and methylation information are sub-grouped.
12. The method of claim 11, wherein the RNAseq information is sub-grouped using a log2(TPM+1) cutoff value of 3.5.
13. The method of claim 11, wherein the protein quantitative information is sub-grouped using a cutoff value of 200 amol/mL.
14. The method of claim 11, wherein the methylation information is sub-grouped using a cutoff value of 60% promoter CpG methylation.
15. The method of claim 1, wherein the RNAseq information, protein quantitative information, and methylation information are provided from a FFPE sample or a fresh tumor sample.
16. The method of claim 1, wherein the tumor is a solid tumor.
17. The method of claim 1, wherein the tumor is metastatic colon cancer, glioblastoma, or melanoma.
18. The method of claim 1, wherein the response prediction model has a prediction accuracy of at least 85%.
PCT/US2018/057843 2017-10-30 2018-10-26 Temozolomide response predictor and methods WO2019089393A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
AU2018362347A AU2018362347A1 (en) 2017-10-30 2018-10-26 Temozolomide response predictor and methods
CA3080342A CA3080342A1 (en) 2017-10-30 2018-10-26 Temozolomide response predictor and methods
JP2020524174A JP2021501422A (en) 2017-10-30 2018-10-26 Temozolomide reaction predictors and methods
CN201880081292.2A CN111492435A (en) 2017-10-30 2018-10-26 Temozolomide reaction predictor and method
KR1020207015366A KR20200079524A (en) 2017-10-30 2018-10-26 TEMOZOLOMIDE RESPONSE PREDICTOR AND METHODS

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762579127P 2017-10-30 2017-10-30
US62/579,127 2017-10-30
US201862727245P 2018-09-05 2018-09-05
US62/727,245 2018-09-05

Publications (1)

Publication Number Publication Date
WO2019089393A1 true WO2019089393A1 (en) 2019-05-09

Family

ID=66333325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/057843 WO2019089393A1 (en) 2017-10-30 2018-10-26 Temozolomide response predictor and methods

Country Status (7)

Country Link
JP (1) JP2021501422A (en)
KR (1) KR20200079524A (en)
CN (1) CN111492435A (en)
AU (1) AU2018362347A1 (en)
CA (1) CA3080342A1 (en)
TW (1) TW201923635A (en)
WO (1) WO2019089393A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5731304A (en) * 1982-08-23 1998-03-24 Cancer Research Campaign Technology Potentiation of temozolomide in human tumour cells
WO2012009382A2 (en) * 2010-07-12 2012-01-19 The Regents Of The University Of Colorado Molecular indicators of bladder cancer prognosis and prediction of treatment response
KR20130138779A (en) * 2010-09-23 2013-12-19 카운슬 오브 사이언티픽 앤드 인더스트리얼 리서치 Top2a inhibition by temozolomide useful for predicting gbm patient's survival
WO2016118527A1 (en) * 2015-01-20 2016-07-28 Nantomics, Llc Systems and methods for response prediction to chemotherapy in high grade bladder cancer

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NIZ, CARLOS DE ET AL.: "Algorithms for drug sensitivity prediction", ALGORITHMS, vol. 9, 77, 2016, pages 1 - 25, XP055614617 *
RIVERA, A. L. ET AL.: "MGMT promoter methylation is predictive of response to radiotherapy and prognostic in the absence of adjuvant alkylating chemotherapy for glioblastoma", NEURO-ONCOLOGY, vol. 12, no. 2, 2010, pages 116 - 121, XP055614619 *
UNO, MIYUKI ET AL.: "Correlation of MGMT promoter methylation status with gene and protein expression levels in glioblastoma", CLINICS, vol. 66, no. 10, 2011, pages 1747 - 1755, XP055614618 *

Also Published As

Publication number Publication date
TW201923635A (en) 2019-06-16
KR20200079524A (en) 2020-07-03
CN111492435A (en) 2020-08-04
JP2021501422A (en) 2021-01-14
AU2018362347A1 (en) 2020-05-14
CA3080342A1 (en) 2019-05-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18873768

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3080342

Country of ref document: CA

ENP Entry into the national phase

Ref document number: 2020524174

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018362347

Country of ref document: AU

Date of ref document: 20181026

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20207015366

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2018873768

Country of ref document: EP

Effective date: 20200602

122 Ep: pct application non-entry in european phase

Ref document number: 18873768

Country of ref document: EP

Kind code of ref document: A1