WO2012066451A1 - Prognostic and predictive gene signature for colon cancer - Google Patents

Prognostic and predictive gene signature for colon cancer

Publication number
WO2012066451A1
Authority
WO
WIPO (PCT)
Prior art keywords
braf
genes
subject
gene
value
Application number
PCT/IB2011/054962
Other languages
French (fr)
Inventor
Eva Budinska
Mauro Claudio Delorenzi
Adam Pavlicek
Vlad Calin Popovici
Sabine Tejpar
Scott Lawrence Weinrich
Original Assignee
Pfizer Inc.
Centre Hospitalier Universitaire Vaudois
Swiss Institute Of Bioinformatics
Application filed by Pfizer Inc., Centre Hospitalier Universitaire Vaudois, Swiss Institute Of Bioinformatics

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57419 Specifically defined cancers of colon
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/52 Predicting or monitoring the response to treatment, e.g. for selection of therapy based on assay results in personalised medicine; Prognosis

Definitions

  • the application relates to compositions and methods for prognosing and classifying colon cancer and for determining the benefit of adjuvant chemotherapy.
  • CRC colorectal cancer
  • CRC that is confined within the wall of the colon (TNM (tumor-node-metastasis) stages I and II) is typically curable with surgery. However, if left untreated, such tumors may spread to regional lymph nodes (stage III), where up to 73% are curable by surgery and chemotherapy. Once CRC metastasizes to distant sites within the body (stage IV), the disease is typically not curable, although chemotherapy can extend survival.
  • VEGF vascular endothelial growth factor
  • EGFR epidermal growth factor receptor
  • monoclonal antibodies that target EGFR (e.g., cetuximab and panitumumab) or VEGF (e.g., bevacizumab)
  • the constitutive activation of the mitogen-activated protein kinase (MAPK) pathway is a key driver of CRC tumorigenesis.
  • the extracellular signal-regulated kinase (ERK) pathway plays a key role in cell proliferation and its aberrant activation is often due to oncogenic mutations in KRAS or BRAF genes (Fang and Richardson, Lancet Oncol. 6:322-327 (2005); Tejpar et al., Oncologist 15:390-404 (2010)).
  • RAF is a serine-threonine-specific protein kinase that is activated downstream of the small G-protein RAS and which activates the MAP kinase (MEK) pathway, which in turn activates ERK.
  • MEK MAP kinase
  • BRAF is one of the three highly conserved RAF genes in mammals (the other two being ARAF and CRAF) and its somatic mutations have been reported in approximately 7% of human cancers (Davies et al., Nature 417:949-954 (2002); Dhomen & Marais, Curr. Opin. Genet. Dev. 17:31-39 (2007)). In CRC, the BRAF mutations occur in 8-10% of sporadic cancers and generally are markers of poor prognosis.
  • the V600E mutation in BRAF is believed to be associated with microsatellite instability (MSI), and may confer resistance to anti-EGFR therapy (Richman et al. J. Clin. Oncol. 27(35):5931-5937 (2009)).
  • KRAS mutations are known to lead to EGFR-independent activation of the MAPK pathway, suggesting that therapies targeting EGFR will not be effective in patients with KRAS mutations (Benvenuti et al. Cancer Res 67: 2643-2648 (2007); Di Fiore et al., Br. J. Cancer 96:1166-1169 (2007)). Accordingly, there is an ongoing need to develop biomarkers that can effectively identify CRC patients that are best suited for certain therapeutic modalities.
  • the present disclosure relates to the identification, from historical CRC patient data, of several gene signatures that identify a subpopulation of patients that may be sensitive to novel targeted treatments.
  • the present disclosure provides several gene signatures that are characteristic of BRAF mutated CRC tumors.
  • the present disclosure provides methods and kits useful for obtaining and utilizing expression information for the genes identified herein, to obtain prognostic and diagnostic information for patients with CRC.
  • the methods of the present disclosure generally involve obtaining relative expression data from a patient, at the DNA, messenger RNA (mRNA), or protein level, for each of the genes identified herein, processing the data and comparing the resulting information to one or more reference values.
  • Relative expression levels are expression data normalized according to techniques known to those skilled in the art. Expression data may be normalized with respect to one or more genes with invariant expression, such as "housekeeping" genes. In some embodiments, expression data may be processed using standard techniques, such as transformation to a z-score, and/or software tools, such as RMAexpress v0.3.
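The two normalization steps mentioned above (normalizing against invariant "housekeeping" genes and transforming to a z-score) can be sketched as follows. This is a minimal illustration, not the procedure used in the disclosure: it assumes log-scale expression values, and the gene names (`GENE_A`, `HK1`, `HK2`) and function names are hypothetical.

```python
from statistics import mean, pstdev


def normalize_to_housekeeping(expression, housekeeping_genes):
    """Subtract the mean housekeeping expression from every gene.

    Assumes log-scale values, so subtraction corresponds to a ratio
    on the raw scale. Gene names are hypothetical placeholders.
    """
    baseline = mean(expression[g] for g in housekeeping_genes)
    return {gene: value - baseline for gene, value in expression.items()}


def z_scores(values):
    """Transform a list of expression values to z-scores (population SD)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot z-score values with zero variance")
    return [(v - mu) / sigma for v in values]


sample = {"GENE_A": 8.0, "HK1": 5.0, "HK2": 7.0}
relative = normalize_to_housekeeping(sample, ["HK1", "HK2"])
scaled = z_scores([2.0, 4.0, 6.0])
```

Here `relative["GENE_A"]` is the expression of the hypothetical gene relative to the housekeeping baseline, and `scaled` is centered on zero.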
  • a multi-gene signature for prognosing or classifying patients with CRC.
  • a 39-gene pair signature is provided, comprising reference values for each of 39 pairs of different genes based on relative expression data for each gene from a historical data set with a known outcome, such as good or poor survival, and/or known treatment, such as adjuvant chemotherapy.
  • relative expression data from a patient are combined with the gene-specific reference values on a gene-by-gene basis for each of the genes identified herein, to generate a test value which allows prognosis or therapy recommendation.
  • relative expression data are subjected to an algorithm that yields a single test value, or combined score, which is then compared to a control value obtained from the historical expression data for a patient or pool of patients.
  • the control value is a numerical threshold for predicting outcomes, for example good and poor outcome, or making therapy recommendations, for example adjuvant therapy in addition to surgical resection or surgical resection alone.
  • a test value or combined score greater than the control value is predictive, for example, of high risk (poor outcome) or benefit from adjuvant therapy, whereas a combined score falling below the control value is predictive, for example, of low risk (good outcome) or lack of benefit from adjuvant therapy.
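The comparison of a combined score with a control value can be sketched as below. The text does not specify the algorithm that produces the combined score, so the weighted sum used here is an assumption made only for illustration, as are the function names and the gene names. The text also does not say how a score exactly equal to the control value is handled, so that case is reported separately.

```python
def combined_score(expression, reference_values):
    """Combine relative expression with gene-specific reference values.

    ASSUMPTION: a simple weighted sum, gene by gene. The source only says
    the data are combined on a gene-by-gene basis into a single test value.
    """
    return sum(reference_values[g] * expression[g] for g in reference_values)


def classify_risk(score, control_value):
    """Compare a test value with the control value (a numerical threshold)."""
    if score > control_value:
        return "high risk"  # poor outcome / benefit from adjuvant therapy
    if score < control_value:
        return "low risk"   # good outcome / lack of benefit
    return "indeterminate"  # tie handling is not stated in the text


# Hypothetical genes and coefficients
expression = {"GENE_A": 1.5, "GENE_B": -0.5}
reference_values = {"GENE_A": 2.0, "GENE_B": 1.0}
score = combined_score(expression, reference_values)
label = classify_risk(score, control_value=1.0)
```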
  • the present disclosure provides gene signatures that are prognostic for survival as well as predictive for benefit from adjuvant chemotherapy.
  • the disclosure provides methods that can be used to select or identify subjects who might benefit from adjuvant chemotherapy as opposed to subjects who are not likely to benefit from such adjuvant chemotherapy.
  • the disclosure provides a method of prognosing or classifying a subject with CRC comprising: a) analyzing at least one of the gene pairs shown in Table 3.1.1 or Table 3.1.2 according to the top scoring pair method; and b) classifying the subject into a BRAF mutant-like group or a wild-type group.
  • at least 10 of the gene pairs shown in Table 3.1.1 or 3.1.2 are analyzed according to the top scoring pair method.
  • at least 30 of the gene pairs shown in Table 3.1.1 or Table 3.1.2 are analyzed according to the top scoring pair method.
  • the 39 gene pairs shown in Table 3.1.1 are analyzed according to the top scoring pair method.
  • the top scoring pair method is carried out by comparing the average value of the relative expression levels of all Gene1 genes used in the analysis with the average value of relative expression levels of all Gene2 genes used in the analysis, wherein if the average Gene1 value is less than the average Gene2 value, then the subject is classified as BRAF mutant-like. In a further embodiment, the top scoring pair method is carried out as described above, wherein if the average Gene1 value is greater than or equal to the average Gene2 value, then the subject is classified as wild-type. In some embodiments, the top scoring pair method uses the 39 pairs of genes shown in Table 3.1.1. In some embodiments, the top scoring pair method uses the 32 pairs of genes shown in Table 3.1.2.
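The averaging rule described above can be sketched in a few lines. The gene names below are hypothetical placeholders, not entries from Table 3.1.1 or Table 3.1.2, and the function name is an assumption; only the decision rule (mean of the first genes of the pairs less than mean of the second genes of the pairs gives BRAF mutant-like, otherwise wild-type) is taken from the text.

```python
from statistics import mean


def classify_by_gene_pairs(expression, gene_pairs):
    """Classify a sample from (gene1, gene2) pairs of relative expression.

    expression: dict mapping gene name -> relative expression level
    gene_pairs: list of (gene1, gene2) tuples; any subset of a signature
                may be passed (e.g. at least 10 or at least 30 pairs)
    """
    if not gene_pairs:
        raise ValueError("at least one gene pair is required")
    mean_gene1 = mean(expression[g1] for g1, _ in gene_pairs)
    mean_gene2 = mean(expression[g2] for _, g2 in gene_pairs)
    # Strictly less -> BRAF mutant-like; greater than or equal -> wild-type
    return "BRAF mutant-like" if mean_gene1 < mean_gene2 else "wild-type"


# Hypothetical pairs and expression values
pairs = [("A1", "B1"), ("A2", "B2")]
sample = {"A1": 1.0, "A2": 2.0, "B1": 3.0, "B2": 4.0}
call = classify_by_gene_pairs(sample, pairs)
```

Because the comparison uses only the ordering of two averages within one sample, the call does not depend on a threshold estimated from other samples.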
  • the disclosure provides a method of prognosing or classifying a subject with CRC comprising: a) calculating a score using the AdaBoost method as described in Example 3, using the relative expression values of the genes shown in Table 3.2; and b) classifying the subject into a BRAF mutant-like group or a wild-type group. For example, in one embodiment the subject is classified as wild-type if the calculated score is less than 0.5, and the subject is classified as BRAF mutant-like if the score is 0.5 or greater.
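The 0.5 cut-off on the calculated score can be written as a one-line rule. This sketch covers only the thresholding step of the embodiment above; how the score itself is computed (the AdaBoost model of Example 3) is not reproduced here, and the function name is an assumption.

```python
def classify_braf_score(score, threshold=0.5):
    """Binarize a classifier score: >= threshold is BRAF mutant-like."""
    return "BRAF mutant-like" if score >= threshold else "wild-type"


calls = [classify_braf_score(s) for s in (0.12, 0.5, 0.87)]
```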
  • the disclosure provides a method of prognosing or classifying a subject with CRC by using the CCP2 gene signature as described in Example 3.
  • the relative expression levels of the genes noted in Example 3.2.2 can be determined and the CCP2 method carried out as described in Example 3.1.2.
  • The CCP2 gene signature as described in Example 3 can be used to classify or prognose a subject with CRC as either BRAF mutant-like or wild-type.
  • the present disclosure provides a method for selecting therapy comprising the steps of classifying or prognosing a subject with CRC using any of the methods described herein, and further comprising selecting adjuvant chemotherapy for a subject classified as wild-type, or selecting no adjuvant chemotherapy for a subject classified as BRAF mutant-like.
  • the present disclosure provides a method for selecting therapy comprising the steps of classifying or prognosing a subject with CRC using any of the methods described herein, and further comprising selecting adjuvant chemotherapy for a subject classified as wild-type, or selecting a treatment regimen comprising a BRAF mutant-specific inhibitor for a subject classified as BRAF mutant-like.
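The two therapy-selection embodiments above differ only in what is selected for a subject classified as BRAF mutant-like. A minimal lookup-table sketch follows; the embodiment keys (`withhold_adjuvant`, `braf_inhibitor`) and the function name are hypothetical labels introduced for illustration only.

```python
ADJUVANT = "adjuvant chemotherapy"
NO_ADJUVANT = "no adjuvant chemotherapy"
BRAF_INHIBITOR = "regimen comprising a BRAF mutant-specific inhibitor"

# One mapping per embodiment; keys are hypothetical labels.
THERAPY_RULES = {
    "withhold_adjuvant": {"wild-type": ADJUVANT, "BRAF mutant-like": NO_ADJUVANT},
    "braf_inhibitor": {"wild-type": ADJUVANT, "BRAF mutant-like": BRAF_INHIBITOR},
}


def select_therapy(classification, embodiment="withhold_adjuvant"):
    """Map a classification to the therapy named in the chosen embodiment."""
    try:
        return THERAPY_RULES[embodiment][classification]
    except KeyError:
        raise ValueError(
            f"unknown embodiment or classification: {embodiment!r}, {classification!r}"
        ) from None


choice = select_therapy("BRAF mutant-like", embodiment="braf_inhibitor")
```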
  • the present disclosure provides a method of treating a subject with CRC comprising administering a BRAF mutant-specific inhibitor to said subject, wherein said subject is classified as BRAF mutant-like according to any of the methods described herein.
  • the present disclosure provides any of the methods described herein, wherein said subject is a human.
  • the present disclosure provides a CRC prognosticator comprising a mechanism for determining relative expression levels in a CRC tumor sample of the genes listed in Table 3.1.1, Table 3.1.2, Table 3.2, or those listed in Example 3.2.2.
  • the mechanism comprises a microarray.
  • the mechanism comprises an assay of reverse transcription polymerase chain reaction.
  • kits used to prognose or classify a subject with CRC into a good survival group or a poor survival group or for selecting therapy for a subject with CRC that includes detection agents that can detect the expression products of the biomarkers described herein, for example the gene pairs shown in Table 3.1.1, Table 3.1.2, or the genes listed in Table 3.2, or those listed in Example 3.2.2.
  • the present disclosure provides a kit for classifying a subject with CRC comprising detection agents capable of detecting the expression products of at least one gene pair shown in Table 3.1.1, or Table 3.1.2, or of the genes shown in Table 3.1.1 or Table 3.1.2, or in Example 3.2.2.
  • said agents are capable of detecting the expression products of at least 5, at least 10, at least 20, at least 30, at least 35, or the 39 gene pairs shown in Table 3.1.1.
  • said agents are capable of detecting the expression products of at least 5, at least 10, at least 20, at least 30, or the 32 gene pairs shown in Table 3.1.2.
  • any of the kits described above comprise an addressable array that comprises probes for the expression products of the at least one, at least 5, at least 10, at least 20, at least 30, at least 35, or the 39 gene pairs of Table 3.1.1.
  • any of the kits described above comprise an addressable array that comprises probes for the expression products of the at least one, at least 5, at least 10, at least 20, at least 30, or the 32 gene pairs of Table 3.1.2.
  • the detection agents comprise primers capable of hybridizing to the expression products of the gene pairs.
  • kits described herein further comprising a computer implemented product for comparing: a) the relative expression level values for Gene1 genes in Table 3.1.1 or Table 3.1.2 for a subject to b) the relative expression level values for Gene2 genes in Table 3.1.1 or Table 3.1.2 for said subject.
  • the average value of the relative expression levels of all Gene1 genes used in the analysis is compared with the average value of relative expression levels of all Gene2 genes used in the analysis.
  • the 39 gene pairs in Table 3.1.1 are used in the analysis.
  • the 32 gene pairs in Table 3.1.2 are used in the analysis.
  • the present disclosure provides probes for detecting the biomarkers described herein, for example the genes disclosed in Table 3.1.1, Table 3.1.2, Table 3.2, and those disclosed in Example 3.2.2.
  • Exemplary probes include mRNA oligonucleotides, cDNA oligonucleotides, and PCR primers.
  • the probes are capable of detecting or hybridizing to, each of the 39 pairs or 32 pairs of genes described in Example 3.
  • kits useful for carrying out the diagnostic and prognostic tests described herein generally comprise reagents and compositions for obtaining relative expression data for the 39 gene pairs or 32 gene pairs described in Table 3.1.1 or Table 3.1.2, the genes shown in Table 3.2, or the genes noted in Example 3.2.2.
  • the kits typically comprise probes for detecting the 39 gene pairs.
  • the present disclosure also provides antibodies capable of specifically binding to the protein products of the biomarkers described herein. As will be recognized by skilled artisans, the contents of the kits will depend upon the means used to obtain the relative expression information.
  • Kits may comprise a labeled compound or agent capable of detecting protein product(s) or nucleic acid sequence(s) in a sample and means for determining the amount of the protein or mRNA in the sample (e.g., an antibody which binds the protein or a fragment thereof, or an oligonucleotide probe which binds to DNA or mRNA encoding the protein). Kits can also include instructions for interpreting the results obtained using the kit.
  • kits are oligonucleotide-based kits, which may comprise, for example: (1) an oligonucleotide, e.g., a detectably labeled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a marker protein or (2) a pair of primers useful for amplifying a marker nucleic acid molecule. Kits may also comprise, e.g., a buffering agent, a preservative, or a protein stabilizing agent. The kits can further comprise components necessary for detecting the detectable label (e.g., an enzyme or a substrate).
  • kits can also contain a control sample or a series of control samples which can be assayed and compared to the test sample.
  • Each component of a kit can be enclosed within an individual container and all of the various containers can be within a single package, along with instructions for interpreting the results of the assays performed using the kit.
  • kits are antibody-based kits, which may comprise, for example: (1) a first antibody (e.g., attached to a solid support) which binds to a marker protein; and, optionally, (2) a second, different antibody which binds to either the protein or the first antibody and is conjugated to a detectable label.
  • a further aspect provides computer implemented products, computer readable mediums and computer systems that are useful for the methods described herein.
  • Figures 1A and B AUC and error rates when the model is built on the phase 1 data and validated on phase 2 data, for increasing model size.
  • Figures 2A and B AUC and error rates when the model is built on the phase 2 data and validated on phase 1 data, for increasing model size.
  • FIG. 3 (AdaBoost) Distribution of BRAF scores: all scores above 0.5 (grey vertical line) indicate the "BRAF-like" samples.
  • the small hash lines at the bottom right show the score of the BRAFmut samples and the small hash lines along the top are those of KRASmut samples.
  • FIG. 5 (grey vertical line) indicate the "BRAF-like" samples.
  • the small hash lines along the bottom show the score of the BRAFmut samples and the small hash lines along the top are those of KRASmut samples.
  • Figure 5 Classifiers agreement: The diagrams show the number of samples that are predicted to be either BRAF-like or WT2-like by the three classifiers. For some samples, the three classifiers agree on their predictions, while for others there is no agreement.
  • Figure 7 KRASmut samples stratified by the mTSP signature in BRAF-like samples (dashed-line - BRAF high) and non-BRAF-like samples (solid line - BRAF low).
  • FIG. 8 (PETACC3 data/mTSP) Overall survival: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
  • FIG. 9 (PETACC3 data/mTSP) Relapse-free survival: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
  • FIG. 10 (PETACC3 data/mTSP) Survival after relapse: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
  • Figure 11 Kaplan-Meier plots for the BRAF-like group predicted by mTSP (A and
  • Figure 12 (Overall survival) KRASmut samples stratified by the mTSP signature in BRAF-like samples (BRAF high) and non-BRAF-like samples (BRAF low) in the CETUX data set.
  • Figure 13 Kaplan-Meier plots for the BRAF-like group predicted by CCP2 and the OS, RFS and SAR endpoints, on the PETACC3 data set.
  • FIG. 14 (PETACC3 data/CCP2) Overall survival: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
  • FIG. 15 (PETACC3 data/CCP2) Relapse-free survival: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
  • Figure 16 (PETACC3 data/CCP2) Survival after relapse: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
  • Figure 17 Kaplan-Meier plots for the BRAF-like group predicted by CCP2 and the OS and PFS on the CETUX data set.
  • Figure 18 Overall survival: Population stratification by binarized BRAF score.
  • Figure 19 Overall survival: BRAFhi and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
  • Figure 20 Relapse-free survival: Population stratification by binarized BRAF score.
  • Figure 21 Survival after relapse: Population stratification by binarized BRAF score.
  • Figure 22 Survival after relapse: BRAFhi and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
  • the present disclosure provides several gene signatures that can be used to predict BRAFmut status, and provides methods, compositions, computer implemented products, detection agents and kits for prognosing or classifying a subject with CRC and for determining the benefit of adjuvant chemotherapy.
  • biomarker refers to a gene that is differentially expressed in individuals with CRC according to prognosis and is predictive of different survival outcomes and of the benefit of adjuvant chemotherapy.
  • a 39-gene pair signature comprises 39 gene pairs listed in Table 3.1.1.
  • a 32-gene pair signature comprises 32 gene pairs listed in Table 3.1.2.
  • the term "reference expression profile" refers to the expression of the 39 gene pairs listed in Table 3.1.1 associated with a clinical outcome in a CRC patient.
  • the reference expression profile comprises 78 values, each value representing the expression level of a biomarker, wherein each biomarker corresponds to one gene in Table 3.1.1.
  • the reference expression profile is identified using one or more samples comprising tumor wherein the expression is similar between related samples defining an outcome class or group such as poor survival or good survival and is different to unrelated samples defining a different outcome class such that the reference expression profile is associated with a particular clinical outcome.
  • the reference expression profile is accordingly a reference profile of the expression of the 78 genes in Table 3.1.1, to which the subject expression levels of the corresponding genes in a patient sample are compared in methods for determining or predicting clinical outcome.
  • a reference expression profile can also refer to the 32 gene pairs listed in Table 3.1.2.
  • control refers to a specific value or dataset that can be used to prognose or classify the value, e.g., expression level or reference expression profile obtained from the test sample associated with an outcome class.
  • a dataset may be obtained from samples from a group of subjects known to have CRC and good survival outcome or known to have CRC and have poor survival outcome or known to have CRC and have benefited from adjuvant chemotherapy or known to have CRC and not have benefited from adjuvant chemotherapy.
  • the expression data of the biomarkers in the dataset can be used to create a "control value" that is used in testing samples from new patients.
  • a control value is obtained from the historical expression data for a patient or pool of patients with a known outcome.
  • the control value is a numerical threshold for predicting outcomes, for example good and poor outcome, or making therapy recommendations, for example adjuvant therapy in addition to surgical resection or surgical resection alone.
  • the "control" is a predetermined value for the set of 78 biomarkers obtained from CRC patients whose biomarker expression values and survival times are known.
  • the "control" is a predetermined reference profile for the set of 78 biomarkers obtained from CRC patients whose survival times are known. Using values from known samples allows one to develop an algorithm for classifying new patient samples into good and poor survival groups as described in the Examples.
  • control is a sample from a subject known to have CRC and good survival outcome.
  • control is a sample from a subject known to have CRC and poor survival outcome.
  • the comparison between the expression of the biomarkers in the test sample and the expression of the biomarkers in the control will depend on the control used. For example, if the control is from a subject known to have CRC and poor survival, and there is a difference in expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a good survival group. If the control is from a subject known to have CRC and good survival, and there is a difference in expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a poor survival group.
  • If the control is from a subject known to have CRC and poor survival, and there is a similarity in expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a poor survival group.
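The dependence of the call on the type of control can be summarized as a small decision function. This is a sketch only: how similarity or difference in expression is established is left to the caller, the function name is hypothetical, and the case of a good-survival control with similar expression is filled in by symmetry as an assumption.

```python
OPPOSITE_OUTCOME = {"good": "poor", "poor": "good"}


def prognose_from_control(control_outcome, is_similar):
    """Return the survival group implied by comparison with a control.

    control_outcome: "good" or "poor" (known outcome of the control subject)
    is_similar: True if expression in the test sample is similar to the
                control, False if it differs
    """
    if control_outcome not in OPPOSITE_OUTCOME:
        raise ValueError("control_outcome must be 'good' or 'poor'")
    # Similar to the control -> same group; different -> the other group.
    # The (good control, similar) case is inferred by symmetry (assumption).
    return control_outcome if is_similar else OPPOSITE_OUTCOME[control_outcome]


group = prognose_from_control("poor", is_similar=False)
```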
  • a "reference value" refers to a gene-specific coefficient derived from historical expression data.
  • the multi-gene signatures of the present disclosure comprise gene-specific reference values.
  • the multi-gene signature comprises one reference value for each gene in the signature.
  • the multi-gene signature comprises four reference values for each gene in the signature.
  • the reference values are the first four components derived from principal component analysis for each gene in the signature.
  • the term “differentially expressed” or “differential expression” as used herein refers to a difference in the level of expression of the biomarkers that can be assayed by measuring the level of expression of the products of the biomarkers, such as the difference in level of messenger RNA transcript expressed or proteins expressed of the biomarkers. In a preferred embodiment, the difference is statistically significant.
  • the term “difference in the level of expression” refers to an increase or decrease in the measurable expression level of a given biomarker as measured by the amount of messenger RNA transcript and/or the amount of protein in a sample as compared with the measurable expression level of a given biomarker in a control.
  • the differential expression can be compared using the ratio of the level of expression of a given biomarker or biomarkers as compared with the expression level of the given biomarker or biomarkers of a control, wherein the ratio is not equal to 1.0.
  • an RNA or protein is differentially expressed if the ratio of the level of expression in a first sample as compared with a second sample is greater than or less than 1.0.
  • a ratio of greater than 1, 1.2, 1.5, 1.7, 2, 3, 5, 10, 15, 20 or more or a ratio less than 1, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05, 0.001 or less.
  • the differential expression is measured using p-value.
  • when using p-value, a biomarker is identified as being differentially expressed as between a first sample and a second sample when the p-value is less than 0.1, preferably less than 0.05, more preferably less than 0.01, even more preferably less than 0.005, and most preferably less than 0.001.
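The ratio and p-value criteria above can be expressed as two small helpers. This is an illustrative sketch: the function names and the tier labels are hypothetical, the ratio is assumed to be computed on raw (not log-transformed) expression levels, and the statistical test that produces the p-value is outside the scope of the example.

```python
def is_differential_by_ratio(level_first, level_second, upper=1.0, lower=1.0):
    """True if the expression ratio exceeds `upper` or falls below `lower`.

    With the defaults, any ratio not equal to 1.0 counts as differential;
    stricter cut-offs (e.g. upper=1.5, lower=0.6) may be passed instead.
    """
    if level_second == 0:
        raise ValueError("second expression level must be non-zero")
    ratio = level_first / level_second
    return ratio > upper or ratio < lower


# Cut-offs from the text, strictest first; labels are hypothetical.
P_VALUE_TIERS = [
    (0.001, "most preferred"),
    (0.005, "even more preferred"),
    (0.01, "more preferred"),
    (0.05, "preferred"),
    (0.1, "base"),
]


def p_value_tier(p_value):
    """Return the strictest tier met by the p-value, or None if p >= 0.1."""
    for cutoff, label in P_VALUE_TIERS:
        if p_value < cutoff:
            return label
    return None


flag = is_differential_by_ratio(3.0, 2.0, upper=1.2, lower=0.8)
tier = p_value_tier(0.03)
```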
  • similarity in expression means that there is no or little difference in the level of expression of the biomarkers between the test sample and the control or reference profile. For example, similarity can refer to a fold difference compared to a control. In a preferred embodiment, there is no statistically significant difference in the level of expression of the biomarkers.
  • most similar in the context of a reference profile refers to a reference profile that is associated with a clinical outcome that shows the greatest number of identities and/or degree of changes with the subject profile.
  • prognosis refers to a clinical outcome group such as a poor survival group or a good survival group associated with a disease subtype which is reflected by a reference profile such as a biomarker reference expression profile or reflected by an expression level of the biomarkers disclosed herein.
  • the prognosis provides an indication of disease progression and includes an indication of likelihood of death due to CRC.
  • the clinical outcome class includes a good survival group and a poor survival group.
  • prognosing or classifying means predicting or identifying the clinical outcome group that a subject belongs to according to the subject's similarity to a reference profile or biomarker expression level associated with the prognosis.
  • prognosing or classifying comprises a method or process of determining whether an individual with CRC has a good or poor survival outcome, or grouping an individual with CRC into a good survival group or a poor survival group.
  • good survival refers to an increased chance of survival as compared to patients in the "poor survival" group.
  • the biomarkers of the application can prognose or classify patients into a "good survival group.” These patients are at a lower risk of death after surgery.
  • poor survival refers to an increased risk of death as compared to patients in the "good survival” group.
  • biomarkers or genes of the application can prognose or classify patients into a "poor survival group." These patients are at greater risk of death after surgery.
  • the biomarker reference expression profile comprises a poor survival group. In another embodiment, the biomarker reference expression profile comprises a good survival group.
  • subject refers to any member of the animal kingdom, preferably a human being, that has CRC or that is suspected of having CRC.
  • CRC patients are classified into stages, which are used to determine therapy.
  • Staging classification testing may include any or all of history, physical examination, routine laboratory evaluations, x-rays, and computed tomography scans or positron emission tomography scans with infusion of contrast materials.
  • BRAF mutant-specific inhibitor refers to a substance that decreases the activity and/or expression of a BRAF mutant protein, but that does not substantially decrease the activity and/or expression of wild type BRAF.
  • Such inhibitors include small molecules, antibodies, and antisense molecules.
  • BRAF mutant proteins include those with mutations as compared with the wild type sequence.
  • a DNA missense mutation leading to a valine to glutamic acid amino acid substitution (V600E) is the most frequent BRAF mutation observed, and functionally the most important involved in the aberrant activation of the MEK-ERK pathway and CRC carcinogenesis.
  • BRAF mutations include R461I, I462S, G463E, G463V, G465A, G465E, G465V, G468A, G468E, N580S, E585K, D593V, F594L, G595R, L596V, T598I, V599D, V599E, V599K, V599R, K600E, A727V. Most of such mutations are clustered in two regions: the glycine-rich P loop of the N lobe, and the activation segment and flanking regions.
  • BRAF mutant-specific inhibitors currently in development include, without limitation, compounds such as PLX-4720 (Plexxikon), PLX-4032 (Plexxikon), XL-281 (Exelixis), GSK-2118436 (GlaxoSmithKline).
  • BRAF mutant-like refers to a classification of subjects with CRC as predicted by the gene signatures disclosed herein, where subjects with CRC that are classified as "BRAF mutant-like" are those expected to possess at least one BRAF mutation, and/or are expected to respond to adjuvant chemotherapy in a manner that is similar to subjects with CRC who have BRAF mutations and/or possess mutations that result in the aberrant activation of the MEK-ERK pathway and are thus expected to exhibit poor survival when treated with adjuvant chemotherapy.
  • subjects with CRC that have at least one BRAF mutation are generally expected to show a poor response to adjuvant chemotherapy.
  • subjects with CRC that are BRAF mutant-like have a poor survival outcome.
  • WT2 refers to a classification of subjects with CRC as predicted by the gene signatures disclosed herein, where subjects with CRC that are classified as "WT2" or "wild-type" are those expected to be wild type for both BRAF and KRAS genes (i.e. have no mutations in either BRAF or KRAS genes), and/or are expected to respond to adjuvant chemotherapy in a manner that is similar to subjects with CRC who are wild type for both BRAF and KRAS genes. Subjects with CRC that are wild type for both BRAF and KRAS genes are generally expected to show a good response to adjuvant chemotherapy and have a good survival outcome.
  • a multi-gene signature is prognostic of patient outcome and/or response to adjuvant chemotherapy.
  • the present disclosure provides prognostic signatures that are stage-independent classifiers.
  • a 39 gene pair or 32 gene pair signature is provided as described herein.
  • the signature comprises reference values for each of the 39 gene pairs listed in Table 3.1.1, or the 32 gene pairs listed in Table 3.1.2.
  • this gene signature is prognostic of patient outcome and/or response to adjuvant chemotherapy.
  • the gene pairs listed in Table 3.1.1 or Table 3.1.2 are used in a "top scoring pair" algorithm/method to predict whether or not a patient is classified as "BRAF mutant-like".
  • Table 3.1.1 and Table 3.1.2 list pairs of genes, where the first gene in the pair is the "Gene1" gene, and the second gene in the pair is the "Gene2" gene.
  • a single gene pair can be analyzed according to the top scoring pair method by comparing the relative gene expression value of a Gene1 gene in Table 3.1.1 or Table 3.1.2 with the relative gene expression value of the second gene in the pair (i.e. Gene2). If the Gene1 value of this gene pair is less than the Gene2 value, then the method predicts BRAF mutant-like status. If the Gene1 value of this gene pair is greater than or equal to the Gene2 value, then the method predicts wild-type ("WT2") status.
  • the average value of all the Gene1 values can be compared to the average value of all the Gene2 values. Accordingly, if the average Gene1 value is less than the average Gene2 value, then the method predicts BRAF mutant-like status. For example, as described in Example 3, when using all 39 gene pairs, the average relative expression value of all the Gene1 genes in Table 3.1.1 can be compared to the average relative expression value of all the Gene2 genes in Table 3.1.1. If the average Gene1 value is less than the average Gene2 value, then the top scoring pair method predicts BRAF mutant-like.
  • this method could be applied, for example, using relative expression levels of any number of the gene pairs from Table 3.1.1, for example, less than 39 pairs, less than 30 pairs, less than 25 pairs, less than 20 pairs, less than 15 pairs, less than 10 pairs, less than 5 pairs, or less than 4, less than 3, or less than 2 pairs.
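The top scoring pair comparison described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the function names and the example expression values are hypothetical, and returning "WT2" when the average Gene1 value is not less than the average Gene2 value is an assumption made by analogy with the single-pair rule.

```python
from statistics import mean


def classify_pair(gene1_value, gene2_value):
    """Single-pair rule: Gene1 < Gene2 predicts BRAF mutant-like, otherwise WT2."""
    return "BRAF mutant-like" if gene1_value < gene2_value else "WT2"


def classify_by_average(pairs):
    """Averaged rule: compare the mean Gene1 value with the mean Gene2 value.

    `pairs` is a list of (gene1_value, gene2_value) relative expression values.
    Treating the non-mutant-like case as WT2 is an assumption (see lead-in).
    """
    gene1_mean = mean(g1 for g1, _ in pairs)
    gene2_mean = mean(g2 for _, g2 in pairs)
    return classify_pair(gene1_mean, gene2_mean)


# Hypothetical relative expression values for three (Gene1, Gene2) pairs.
example_pairs = [(2.0, 3.5), (4.0, 3.0), (1.0, 2.5)]
print(classify_by_average(example_pairs))
```

Because only the ordering of the two values matters, the same function applies unchanged to any subset of the gene pairs.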
  • test sample refers to any cancer-affected fluid, cell or tissue sample from a subject which can be assayed for biomarker expression products and/or a reference expression profile, e.g., genes differentially expressed in subjects with CRC according to survival outcome.
  • RNA includes mRNA transcripts, and/or specific spliced variants of mRNA.
  • RNA product of the biomarker refers to RNA transcripts transcribed from the biomarkers and/or specific spliced variants.
  • protein refers to proteins translated from the RNA transcripts transcribed from the biomarkers.
  • protein product of the biomarker or "biomarker protein" refers to proteins translated from RNA products of the biomarkers.
  • RNA products of the biomarkers within a sample, including arrays, such as microarrays, RT-PCR (including quantitative PCR), nuclease protection assays and Northern blot analyses.
  • Any analytical procedure capable of permitting specific and quantifiable (or semi-quantifiable) detection of the genes described here and, optionally, additional biomarkers may be used in the methods herein presented, such as the microarray methods set forth herein, and methods known to those skilled in the art.
  • the biomarker expression levels are determined using arrays, optionally microarrays, RT-PCR, optionally quantitative RT-PCR, nuclease protection assays or Northern blot analyses.
  • the biomarker expression levels are determined by using an array.
  • cDNA microarrays consist of multiple (usually thousands of) different cDNA probes spotted (usually using a robotic spotting device) onto known locations on a solid support, such as a glass microscope slide.
  • Microarrays for use in the methods described herein comprise a solid substrate onto which the probes are covalently or non-covalently attached.
  • the cDNAs are typically obtained by PCR amplification of plasmid library inserts using primers complementary to the vector backbone portion of the plasmid or to the gene itself for genes where sequence is known.
  • PCR products suitable for production of microarrays are typically between 0.5 and 2.5 kb in length.
  • RNA either total RNA or poly A RNA
  • labeling is usually performed during reverse transcription by incorporating a labeled nucleotide in the reaction mixture.
  • a microarray is then hybridized with labeled RNA, and relative expression levels calculated based on the relative concentrations of cDNA molecules that hybridized to the cDNAs represented on the microarray.
  • Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using Affymetrix GeneChip technology, Agilent Technologies cDNA microarrays, Illumina Whole-Genome DASL array assays, or any other comparable microarray technology.
  • probes capable of hybridizing to one or more biomarker RNAs or cDNAs are attached to the substrate at a defined location ("addressable array"). Probes can be attached to the substrate in a wide variety of ways, as will be appreciated by those in the art. In some embodiments, the probes are synthesized first and subsequently attached to the substrate. In other embodiments, the probes are synthesized on the substrate. In some embodiments, probes are synthesized on the substrate surface using techniques such as photo-polymerization and photolithography.
  • microarrays are utilized in a RNA-primed, Array-based Klenow Enzyme ("RAKE") assay.
  • the DNA probes comprise a base sequence that is complementary to a target RNA of interest, such as one or more biomarker RNAs capable of specifically hybridizing to a nucleic acid comprising a sequence that is identically present in one of the genes described in Example 3 under standard hybridization conditions.
  • the addressable array comprises DNA probes for no more than the 78 genes listed in Table 3.1.1, or the 64 genes listed in Table 3.1.2, or the genes listed in Table 3.2, or those listed in Example 3.2.2. In some embodiments, the addressable array comprises DNA probes for each of the 78 genes listed in Table 3.1.1, or each of the 64 genes listed in Table 3.1.2, or each of the genes listed in Table 3.2, or each of the genes listed in Example 3.2.2.
  • the addressable array comprises DNA probes for each of the 78 genes listed in Table 3.1.1, or for each of the 64 genes listed in Table 3.1.2, or the genes listed in Table 3.2, or those listed in Example 3.2.2.
  • expression data are pre-processed to correct for variations in sample preparation or other non-experimental variables affecting expression measurements.
  • background adjustment, quantile adjustment, and summarization may be performed on microarray data, using standard software programs such as RMAexpress v0.3, followed by centering of the data to the mean and scaling to the standard deviation.
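The centering and scaling step mentioned above can be illustrated with a short sketch. It assumes the standardization is applied to one vector of already summarized expression values at a time and uses the sample standard deviation; the text does not say whether the operation is per gene or per sample, so both choices are assumptions, and the function name is hypothetical.

```python
from statistics import mean, stdev


def center_and_scale(values):
    """Center values to their mean and scale to their (sample) standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation; an assumption
    if s == 0:
        raise ValueError("cannot scale values with zero standard deviation")
    return [(v - m) / s for v in values]


# Hypothetical log-scale expression values for one gene across three samples.
print(center_and_scale([1.0, 2.0, 3.0]))
```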
  • after the sample is hybridized to the array, it is exposed to exonuclease I to digest any unhybridized probes.
  • the Klenow fragment of DNA polymerase I is then applied along with biotinylated dATP, allowing the hybridized biomarker RNAs to act as primers for the enzyme with the DNA probe as template.
  • the slide is then washed and a streptavidin-conjugated fluorophore is applied to detect and quantitate the spots on the array containing hybridized and Klenow-extended biomarker RNAs from the sample.
  • the RNA sample is reverse transcribed using a biotin/poly-dA random octamer primer.
  • the RNA template is digested and the biotin-containing cDNA is hybridized to an addressable microarray with bound probes that permit specific detection of biomarker RNAs.
  • the microarray includes at least one probe comprising at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, even at least 20, 21, 22, 23, or 24 contiguous nucleotides identically present in each of the genes listed in Table 3.1.1 or Table 3.1.2, or each of the genes listed in Table 3.2, or each of the genes listed in Example 3.2.2.
  • a streptavidin-bound detectable marker such as a fluorescent dye
  • the array is a U133A chip from Affymetrix.
  • a plurality of nucleic acid probes that are complementary or hybridizable to an expression product of the genes listed in Table 3.1.1, or Table 3.1.2, or the genes listed in Table 3.2, or the genes listed in Example 3.2.2, are used on the array.
  • a plurality of nucleic acid probes that are complementary or hybridizable to an expression product of some or all the genes described in Example 3 are used on the array.
  • nucleic acid includes DNA and RNA and can be either double stranded or single stranded.
  • hybridize or "hybridizable" refers to the sequence specific non-covalent binding interaction with a complementary nucleic acid.
  • the hybridization is under high stringency conditions. Appropriate stringency conditions which promote hybridization are known to those skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, 6.0X sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0X SSC at 50°C may be employed.
  • probe refers to a nucleic acid sequence that will hybridize to a nucleic acid target sequence.
  • the probe hybridizes to an RNA product of the biomarker or a nucleic acid sequence complementary thereof.
  • the length of probe depends on the hybridization conditions and the sequences of the probe and nucleic acid target sequence. In one embodiment, the probe is at least 8, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 400, 500 or more nucleotides in length.
  • compositions that comprise at least one biomarker or target RNA-specific probe.
  • target RNA-specific probe encompasses probes that have a region of contiguous nucleotides having a sequence that is either (i) identically present in one of the genes described in Example 3, or (ii) complementary to the sequence of a region of contiguous nucleotides found in one of the genes described in Example 3, where "region" can comprise the full length sequence of any one of the genes described in Example 3, a complementary sequence of the full length sequence of any one of the genes described in Example 3, or a subsequence thereof.
  • target RNA-specific probes consist of deoxyribonucleotides. In other embodiments, target RNA-specific probes consist of both deoxyribonucleotides and nucleotide analogs. In some embodiments, biomarker RNA-specific probes comprise at least one nucleotide analog which increases the hybridization binding energy. In some embodiments, a target RNA-specific probe in the compositions described herein binds to one biomarker RNA in the sample.
  • more than one probe specific for a single biomarker RNA is present in the compositions, the probes capable of binding to overlapping or spatially separated regions of the biomarker RNA.
  • the compositions described herein are designed to hybridize to cDNAs reverse transcribed from biomarker RNAs
  • the composition comprises at least one target RNA-specific probe comprising a sequence that is identically present in a biomarker RNA (or a subsequence thereof).
  • a biomarker RNA is capable of specifically hybridizing to at least one probe comprising a base sequence that is identically present in one of the genes described in Example 3. In some embodiments, a biomarker RNA is capable of specifically hybridizing to at least one nucleic acid probe comprising a sequence that is identically present in one of the genes described in Example 3.
  • the composition comprises a plurality of target or biomarker RNA-specific probes each comprising a region of contiguous nucleotides comprising a base sequence that is identically present in one or more of the genes described in Example 3, or in a subsequence thereof.
  • the terms "complementary" or "partially complementary" to a biomarker or target RNA (or target region thereof), and the percentage of "complementarity" of the probe sequence to that of the biomarker RNA sequence is the percentage "identity" to the reverse complement of the sequence of the biomarker RNA.
  • the degree of “complementarity” is expressed as the percentage identity between the sequence of the probe (or region thereof) and the reverse complement of the sequence of the biomarker RNA that best aligns therewith. The percentage is calculated by counting the number of aligned bases that are identical as between the two sequences, dividing by the total number of contiguous nucleotides in the probe, and multiplying by 100.
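The percentage calculation described above can be sketched as follows. This is an illustrative sketch only: it assumes the probe and the best-aligned target region have already been aligned without gaps and have equal length, it treats U and T as equivalent so that a DNA probe can be compared with an RNA target, and the function names and sequences are hypothetical.

```python
_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Return the reverse complement in DNA letters (U is read as T)."""
    dna = sequence.upper().replace("U", "T")
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(dna))


def percent_complementarity(probe, target_region):
    """Percent identity between the probe and the reverse complement of the target.

    Identical aligned bases are counted, divided by the number of contiguous
    nucleotides in the probe, and multiplied by 100.
    """
    probe_dna = probe.upper().replace("U", "T")
    target_rc = reverse_complement(target_region)
    if len(probe_dna) != len(target_rc):
        raise ValueError("this sketch assumes an ungapped, equal-length alignment")
    identical = sum(p == t for p, t in zip(probe_dna, target_rc))
    return 100.0 * identical / len(probe_dna)


# Hypothetical 6-nucleotide RNA target region and two candidate DNA probes.
print(percent_complementarity("AGCCAT", "AUGGCU"))  # fully complementary probe
print(percent_complementarity("AGCCAA", "AUGGCU"))  # one mismatch in six bases
```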
  • the microarray comprises probes comprising a region with a base sequence that is fully complementary to a target region of a biomarker RNA. In other embodiments, the microarray comprises probes comprising a region with a base sequence that comprises one or more base mismatches when compared to the sequence of the best-aligned target region of a biomarker RNA.
  • a "region" of a probe or biomarker RNA may comprise or consist of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or more contiguous nucleotides from a particular gene or a complementary sequence thereof.
  • the region is of the same length as the probe or the biomarker RNA. In other embodiments, the region is shorter than the length of the probe or the biomarker RNA.
  • the microarray comprises 78 probes each comprising a region of at least 10 contiguous nucleotides, such as at least 11 contiguous nucleotides, such as at least 13 contiguous nucleotides, such as at least 14 contiguous nucleotides, such as at least 15 contiguous nucleotides, such as at least 16 contiguous nucleotides, such as at least 17 contiguous nucleotides, such as at least 18 contiguous nucleotides, such as at least 19 contiguous nucleotides, such as at least 20 contiguous nucleotides, such as at least 21 contiguous nucleotides, such as at least 22 contiguous nucleotides, such as at least 23 contiguous nucleotides, such as at least 24 contiguous nucleotides, such as at least 25 contiguous nucleotides with a base sequence that is identically present in one of the genes described in Table 3.1.1, or
  • the biomarker expression levels are determined by using quantitative RT-PCR.
  • RT-PCR is one of the most sensitive, flexible, and quantitative methods for measuring expression levels.
  • the first step is the isolation of mRNA from a target sample.
  • the starting material is typically total RNA isolated from human tumors or tumor cell lines.
  • General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995).
  • RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions.
  • Qiagen RNeasy mini-columns. Numerous RNA isolation kits are commercially available.
  • the primers used for quantitative RT-PCR comprise a forward and reverse primer for each gene listed in Table 3.1.1, or Table 3.1.2.
  • the analytical method used for detecting at least one biomarker RNA in the methods set forth herein includes real-time quantitative RT-PCR.
  • although PCR can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5' proofreading exonuclease activity.
  • RT-PCR is done using a TaqManTM assay sold by Applied Biosystems, Inc. In a first step, total RNA is isolated from the sample.
  • the assay can be used to analyze about 10 ng of total RNA input sample, such as about 9 ng of input sample, such as about 8 ng of input sample, such as about 7 ng of input sample, such as about 6 ng of input sample, such as about 5 ng of input sample, such as about 4 ng of input sample, such as about 3 ng of input sample, such as about 2 ng of input sample, and even as little as about 1 ng of input sample containing RNA.
  • the TaqManTM assay utilizes a stem-loop primer that is specifically complementary to the 3'-end of a biomarker RNA.
  • the step of hybridizing the stem-loop primer to the biomarker RNA is followed by reverse transcription of the biomarker RNA template, resulting in extension of the 3' end of the primer.
  • the result of the reverse transcription step is a chimeric (DNA) amplicon with the stem-loop primer sequence at the 5' end of the amplicon and the cDNA of the biomarker RNA at the 3' end.
  • Quantitation of the biomarker RNA is achieved by RT-PCR using a universal reverse primer comprising a sequence that is complementary to a sequence at the 5' end of all stem-loop biomarker RNA primers, a biomarker RNA-specific forward primer, and a biomarker RNA sequence-specific TaqManTM probe.
  • the assay uses fluorescence resonance energy transfer ("FRET") to detect and quantitate the synthesized PCR product.
  • the TaqManTM probe comprises a fluorescent dye molecule coupled to the 5'-end and a quencher molecule coupled to the 3'-end, such that the dye and the quencher are in close proximity, allowing the quencher to suppress the fluorescence signal of the dye via FRET.
  • the polymerase replicates the chimeric amplicon template to which the TaqManTM probe is bound
  • the 5'-nuclease of the polymerase cleaves the probe, decoupling the dye and the quencher so that FRET is abolished and a fluorescence signal is generated. Fluorescence increases with each RT-PCR cycle proportionally to the amount of probe that is cleaved.
  • quantitation of the results of RT-PCR assays is done by constructing a standard curve from a nucleic acid of known concentration and then extrapolating quantitative information for biomarker RNAs of unknown concentration.
  • the nucleic acid used for generating a standard curve is an RNA of known concentration.
  • the nucleic acid used for generating a standard curve is a purified double-stranded plasmid DNA or a single-stranded DNA generated in vitro.
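The standard-curve quantitation described above can be sketched as follows. The sketch assumes the usual working model in which Ct is linear in the base-10 logarithm of the starting concentration; that model, the function names, and the standard values are assumptions for illustration and are not taken from the text.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ct_values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def estimate_concentration(ct, slope, intercept):
    """Invert the fitted curve to estimate the concentration of an unknown."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series of a nucleic acid of known concentration.
standard_concentrations = [1e3, 1e4, 1e5, 1e6]
standard_cts = [30.04, 26.72, 23.40, 20.08]
slope, intercept = fit_standard_curve(standard_concentrations, standard_cts)
print(estimate_concentration(25.0, slope, intercept))
```

Estimates are most trustworthy for Ct values that fall within the range spanned by the standards.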
  • Ct (cycle threshold) refers to, e.g., the number of PCR cycles required for the fluorescence signal to rise above background.
  • Ct values are inversely proportional to the amount of nucleic acid target in a sample.
  • Ct values of the target RNA of interest can be compared with a control or calibrator, such as RNA from normal tissue.
  • the Ct values of the calibrator and the target RNA samples of interest are normalized to an appropriate endogenous housekeeping gene (see above).
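One common way to combine housekeeping-gene normalization with a calibrator comparison is the 2^(-ΔΔCt) calculation, sketched below. The text does not name this calculation, so its use here is an assumption; it also assumes near-perfect amplification efficiency (product doubling each cycle). The function names and Ct values are hypothetical.

```python
def delta_ct(target_ct, housekeeping_ct):
    """Normalize a target Ct to an endogenous housekeeping gene Ct."""
    return target_ct - housekeeping_ct


def relative_expression(sample_target_ct, sample_housekeeping_ct,
                        calibrator_target_ct, calibrator_housekeeping_ct):
    """Fold change of the sample relative to the calibrator (2^-ddCt)."""
    sample_dct = delta_ct(sample_target_ct, sample_housekeeping_ct)
    calibrator_dct = delta_ct(calibrator_target_ct, calibrator_housekeeping_ct)
    ddct = sample_dct - calibrator_dct
    return 2.0 ** (-ddct)


# Hypothetical Ct values: tumor sample versus normal-tissue calibrator.
print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

A lower normalized Ct in the sample than in the calibrator yields a fold change above 1, reflecting the inverse relationship between Ct and target amount.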
  • RT-PCR chemistries useful for detecting and quantitating PCR products in the methods presented herein include, but are not limited to, Molecular Beacons, Scorpion probes and SYBR Green detection.
  • Molecular Beacons can be used to detect and quantitate PCR products. Like TaqManTM probes, Molecular Beacons use FRET to detect and quantitate a PCR product via a probe comprising a fluorescent dye and a quencher attached at the ends of the probe. Unlike TaqManTM probes, Molecular Beacons remain intact during the PCR cycles. Molecular Beacon probes form a stem-loop structure when free in solution, thereby allowing the dye and quencher to be in close enough proximity to cause fluorescence quenching. When the Molecular Beacon hybridizes to a target, the stem-loop structure is abolished so that the dye and the quencher become separated in space and the dye fluoresces. Molecular Beacons are available, e.g., from Gene LinkTM.
  • Scorpion probes can be used as both sequence-specific primers and for PCR product detection and quantitation. Like Molecular Beacons, Scorpion probes form a stem-loop structure when not hybridized to a target nucleic acid. However, unlike Molecular Beacons, a Scorpion probe achieves both sequence-specific priming and PCR product detection. A fluorescent dye molecule is attached to the 5'-end of the Scorpion probe, and a quencher is attached to the 3'-end. The 3' portion of the probe is complementary to the extension product of the PCR primer, and this complementary portion is linked to the 5'-end of the probe by a non-amplifiable moiety.
  • Scorpion probes are available from, e.g., Premier Biosoft International.
  • RT-PCR detection is performed specifically to detect and quantify the expression of a single biomarker RNA.
  • the biomarker RNA in typical embodiments, is selected from a biomarker RNA capable of specifically hybridizing to a nucleic acid comprising a sequence that is identically present in one of the genes described in Example 3.
  • the biomarker RNA specifically hybridizes to a nucleic acid comprising a sequence that is identically present in at least one of the genes in Table 3.1.1, or Table 3.1.2.
  • the biomarker RNA specifically hybridizes to a nucleic acid comprising a sequence that is identically present in at least one of the genes in Table 3.2 or in Example 3.2.2.
  • RT-PCR detection is utilized to detect, in a single multiplex reaction, each of 78 biomarker RNAs.
  • the biomarker RNAs, in some embodiments, are capable of specifically hybridizing to a nucleic acid comprising a sequence that is identically present in one of the 78 genes listed in Table 3.1.1, or Table 3.1.2.
  • RT-PCR detection is utilized to detect, in a single multiplex reaction, RNAs corresponding to each of the biomarkers listed in Table 3.2, or in Example 3.2.2.
  • a plurality of probes such as TaqManTM probes, each specific for a different RNA target, is used.
  • each target RNA-specific probe is spectrally distinguishable from the other probes used in the same multiplex reaction.
  • quantitation of RT-PCR products is accomplished using a dye that binds to double-stranded DNA products, such as SYBR Green.
  • the assay is the QuantiTect SYBR Green PCR assay from Qiagen.
  • total RNA is first isolated from a sample.
  • Total RNA is subsequently poly-adenylated at the 3'-end and reverse transcribed using a universal primer with poly-dT at the 5'-end.
  • a single reverse transcription reaction is sufficient to assay multiple biomarker RNAs.
  • RT-PCR is then accomplished using biomarker RNA-specific primers and a miScript Universal Primer, which comprises a poly-dT sequence at the 5'-end.
  • SYBR Green dye binds non-specifically to double-stranded DNA and upon excitation, emits light.
  • buffer conditions that promote highly-specific annealing of primers to the PCR template e.g., available in the QuantiTect SYBR Green PCR Kit from Qiagen
  • the signal from SYBR green increases, allowing quantitation of specific products.
  • RT-PCR is performed using any RT-PCR instrumentation available in the art.
  • instrumentation used in real-time RT-PCR data collection and analysis comprises a thermal cycler, optics for fluorescence excitation and emission collection, and optionally a computer and data acquisition and analysis software.
  • the method of detectably quantifying one or more biomarker RNAs includes the steps of: (a) isolating total RNA; (b) reverse transcribing a biomarker RNA to produce a cDNA that is complementary to the biomarker RNA; (c) amplifying the cDNA from step (b); and (d) detecting the amount of a biomarker RNA with RT-PCR.
  • the RT-PCR detection is performed using a FRET probe, which includes, but is not limited to, a TaqManTM probe, a Molecular beacon probe and a Scorpion probe.
  • the RT-PCR detection and quantification is performed with a TaqManTM probe, i.e., a linear probe that typically has a fluorescent dye covalently bound at one end of the DNA and a quencher molecule covalently bound at the other end of the DNA.
  • the FRET probe comprises a base sequence that is complementary to a region of the cDNA such that, when the FRET probe is hybridized to the cDNA, the dye fluorescence is quenched, and when the probe is digested during amplification of the cDNA, the dye is released from the probe and produces a fluorescence signal.
  • the amount of biomarker RNA in the sample is proportional to the amount of fluorescence measured during cDNA amplification.
  • the TaqManTM probe typically comprises a region of contiguous nucleotides comprising a base sequence that is complementary to a region of a biomarker RNA or its complementary cDNA that is reverse transcribed from the biomarker RNA template (i.e., the sequence of the probe region is complementary to or identically present in the biomarker RNA to be detected) such that the probe is specifically hybridizable to the resulting PCR amplicon.
  • the probe comprises a region of at least 6 contiguous nucleotides having a base sequence that is fully complementary to or identically present in a region of a cDNA that has been reverse transcribed from a biomarker RNA template, such as comprising a region of at least 8 contiguous nucleotides, or comprising a region of at least 10 contiguous nucleotides, or comprising a region of at least 12 contiguous nucleotides, or comprising a region of at least 14 contiguous nucleotides, or even comprising a region of at least 16 contiguous nucleotides having a base sequence that is complementary to or identically present in a region of a cDNA reverse transcribed from a biomarker RNA to be detected.
  • the region of the cDNA that has a sequence that is complementary to the TaqManTM probe sequence is at or near the center of the cDNA molecule.
  • all biomarker RNAs are detected in a single multiplex reaction.
  • each TaqManTM probe that is targeted to a unique cDNA is spectrally distinguishable when released from the probe.
  • each biomarker RNA is detected by a unique fluorescence signal.
  • expression levels may be represented by gene transcript numbers per nanogram of cDNA.
  • RT-PCR data can be subjected to standardization and normalization against one or more housekeeping genes as has been previously described. See, e.g., Rubie et al., Mol. Cell. Probes 19(2):101-9 (2005).
  • Appropriate genes for normalization in the methods described herein include those as to which the quantity of the product does not vary between different cell types, cell lines or under different growth and sample preparation conditions.
  • endogenous housekeeping genes useful as normalization controls in the methods described herein include, but are not limited to, ACTB, BAT1, B2M, TBP, U6 snRNA, RNU44, RNU48, and U47.
  • the at least one endogenous housekeeping gene for use in normalizing the measured quantity of RNA is selected from ACTB, BAT1, B2M, TBP, U6 snRNA, RNU44, RNU48, and U47.
  • normalization to the geometric mean of two, three, four or more housekeeping genes is performed.
  • one housekeeping gene is used for normalization.
  • two, three, four or more housekeeping genes are used for normalization.
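Normalization to the geometric mean of several housekeeping genes, as mentioned above, can be sketched as follows. The sketch assumes the inputs are positive linear-scale quantities (for example, transcript numbers) rather than Ct values; the function names and the example quantities are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed on the log scale."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_to_housekeeping(target_quantity, housekeeping_quantities):
    """Divide a target quantity by the geometric mean of housekeeping quantities."""
    return target_quantity / geometric_mean(housekeeping_quantities)


# Hypothetical quantities for one biomarker and two housekeeping genes.
housekeeping = {"ACTB": 2.0, "B2M": 8.0}
print(normalize_to_housekeeping(12.0, list(housekeeping.values())))
```

With a single housekeeping gene the geometric mean reduces to that gene's own quantity, so the same function covers the one-gene case.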
  • labels that can be used on the FRET probes include colorimetric and fluorescent labels such as Alexa Fluor dyes, BODIPY dyes, such as BODIPY FL; Cascade Blue; Cascade Yellow; coumarin and its derivatives, such as 7-amino-4-methylcoumarin, aminocoumarin and hydroxycoumarin; cyanine dyes, such as Cy3 and Cy5; eosins and erythrosins; fluorescein and its derivatives, such as fluorescein isothiocyanate; macrocyclic chelates of lanthanide ions, such as Quantum DyeTM; Marina Blue; Oregon Green; rhodamine dyes, such as rhodamine red, tetramethylrhodamine and rhodamine 6G; Texas Red; fluorescent energy transfer dyes, such as thiazole orange-ethidium heterodimer; and, TOTAB.
  • dyes include, but are not limited to, those identified above and the following: Alexa Fluor 350, Alexa Fluor 405, Alexa Fluor 430, Alexa Fluor 488, Alexa Fluor 500, Alexa Fluor 514, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 610, Alexa Fluor 633, Alexa Fluor 647, Alexa Fluor 660, Alexa Fluor 680, Alexa Fluor 700, and Alexa Fluor 750; amine-reactive BODIPY dyes, such as BODIPY 493/503, BODIPY 530/550, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/655, BODIPY FL, BODIPY R6G, BODIPY TMR, and, BOD
  • fluorescently labeled ribonucleotides useful in the preparation of RT-PCR probes for use in some embodiments of the methods described herein are available from Molecular Probes (Invitrogen), and these include, Alexa Fluor 488-5-UTP, Fluorescein-12-UTP, BODIPY FL-14-UTP, BODIPY TMR-14-UTP, Tetramethylrhodamine-6-UTP, Alexa Fluor 546-14-UTP, Texas Red-5-UTP, and BODIPY TR-14-UTP.
  • Other fluorescent ribonucleotides are available from Amersham Biosciences (GE Healthcare), such as Cy3-UTP and Cy5-UTP.
  • Examples of fluorescently labeled deoxyribonucleotides useful in the preparation of RT-PCR probes for use in the methods described herein include Dinitrophenyl (DNP)- r-dUTP, Cascade Blue-7-dUTP, Alexa Fluor 488-5-dUTP, Fluorescein-12-dUTP, Oregon Green 488-5-dUTP, BODIPY FL-14-dUTP, Rhodamine Green-5-dUTP, Alexa Fluor 532-5-dUTP, BODIPY TMR-14-dUTP, Tetramethylrhodamine-6-dUTP, Alexa Fluor 546-14-dUTP, Alexa Fluor 568-5-dUTP, Texas Red-12-dUTP, Texas Red-5-dUTP, BODIPY TR-14-dUTP, Alexa Fluor 594-5-dUTP, BODIPY 630/650-14-dUTP, BODIPY 650/665-14-dUTP; Alexa Fluor 488-7
  • dyes and other moieties are introduced into nucleic acids used in the methods described herein, such as FRET probes, via modified nucleotides.
  • a "modified nucleotide" refers to a nucleotide that has been chemically modified, but still functions as a nucleotide.
  • the modified nucleotide has a chemical moiety, such as a dye or quencher, covalently attached, and can be introduced into an oligonucleotide, for example, by way of solid phase synthesis of the oligonucleotide.
  • the modified nucleotide includes one or more reactive groups that can react with a dye or quencher before, during, or after incorporation of the modified nucleotide into the nucleic acid.
  • the modified nucleotide is an amine-modified nucleotide, i.e., a nucleotide that has been modified to have a reactive amine group.
  • the modified nucleotide comprises a modified base moiety, such as uridine, adenosine, guanosine, and/or cytosine.
  • the amine-modified nucleotide is selected from 5-(3-aminoallyl)-UTP; 8-[(4-amino)butyl]-amino-ATP and 8-[(6-amino)butyl]-amino-ATP; N6-(4-amino)butyl-ATP, N6-(6-amino)butyl-ATP, N4-[2,2-oxy-bis-(ethylamine)]-CTP; N6-(6-Amino)hexyl-ATP; 8-[(6-Amino)hexyl]-amino-ATP; 5-propargylamino-CTP, 5-propargylamino-UTP.
  • nucleotides with different nucleobase moieties are similarly modified, for example, 5-(3-aminoallyl)-GTP instead of 5-(3-aminoallyl)-UTP.
  • Many amine modified nucleotides are commercially available from, e.g., Applied Biosystems, Sigma, Jena Bioscience and TriLink.
  • the methods of detecting at least one biomarker RNA described herein employ one or more modified oligonucleotides, such as oligonucleotides comprising one or more affinity-enhancing nucleotides.
  • modified oligonucleotides useful in the methods described herein include primers for reverse transcription, PCR amplification primers, and probes.
  • the incorporation of affinity-enhancing nucleotides increases the binding affinity and specificity of an oligonucleotide for its target nucleic acid as compared to oligonucleotides that contain only deoxyribonucleotides, and allows for the use of shorter oligonucleotides or for shorter regions of complementarity between the oligonucleotide and the target nucleic acid.
  • affinity-enhancing nucleotides include nucleotides comprising one or more base modifications, sugar modifications and/or backbone modifications.
  • modified bases for use in affinity-enhancing nucleotides include 5-methylcytosine, isocytosine, pseudoisocytosine, 5-bromouracil, 5-propynyluracil, 6-aminopurine, 2-aminopurine, inosine, diaminopurine, 2-chloro-6-aminopurine, xanthine and hypoxanthine.
  • affinity-enhancing modifications include nucleotides having modified sugars such as 2'-substituted sugars, such as 2'-O-alkyl-ribose sugars, 2'-amino-deoxyribose sugars, 2'-fluoro-deoxyribose sugars, 2'-fluoro-arabinose sugars, and 2'-O-methoxyethyl-ribose (2'MOE) sugars.
  • modified sugars are arabinose sugars, or d-arabino-hexitol sugars.
  • affinity-enhancing modifications include backbone modifications such as the use of peptide nucleic acids (e.g., an oligomer including nucleobases linked together by an amino acid backbone).
  • backbone modifications include phosphorothioate linkages, phosphodiester modified nucleic acids, combinations of phosphodiester and phosphorothioate nucleic acid, methylphosphonate, alkylphosphonates, phosphate esters, alkylphosphonothioates, phosphoramidates, carbamates, carbonates, phosphate triesters, acetamidates, carboxymethyl esters, methylphosphorothioate, phosphorodithioate, p-ethoxy, and combinations thereof.
  • the oligomer includes at least one affinity-enhancing nucleotide that has a modified base, at least one nucleotide (which may be the same nucleotide) that has a modified sugar and at least one internucleotide linkage that is non-naturally occurring.
  • the affinity-enhancing nucleotide contains a locked nucleic acid ("LNA") sugar, which is a bicyclic sugar.
  • an oligonucleotide for use in the methods described herein comprises one or more nucleotides having an LNA sugar.
  • the oligonucleotide contains one or more regions consisting of nucleotides with LNA sugars.
  • the oligonucleotide contains nucleotides with LNA sugars interspersed with deoxyribonucleotides.
  • primer refers to a nucleic acid sequence, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced (e.g., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH).
  • the primer must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent.
  • the exact length of the primer will depend upon factors, including temperature, sequences of the primer and the methods used.
  • a primer typically contains 15-25 or more nucleotides, although it can contain fewer. The factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art.
  • a person skilled in the art will appreciate that a number of methods can be used to determine the amount of a protein product of the biomarker of the disclosure, including immunoassays such as Western blots, ELISA, and immunoprecipitation followed by SDS-PAGE and immunocytochemistry.
  • an antibody is used to detect the polypeptide products of the 78 biomarkers listed in Table 3.1.1, or Table 3.1.2.
  • the sample comprises a tissue sample.
  • the tissue sample is suitable for immunohistochemistry.
  • antibody as used herein is intended to include monoclonal antibodies, polyclonal antibodies, and chimeric antibodies. The antibody may be from recombinant sources and/or produced in transgenic animals.
  • antibody fragment as used herein is intended to include Fab, Fab', F(ab')2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, and multimers thereof and bispecific antibody fragments.
  • Antibodies can be fragmented using conventional techniques. For example, F(ab')2 fragments can be generated by treating the antibody with pepsin. The resulting F(ab')2 fragment can be treated to reduce disulfide bridges to produce Fab' fragments.
  • Papain digestion can lead to the formation of Fab fragments.
  • Fab, Fab' and F(ab')2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, bispecific antibody fragments and other fragments can also be synthesized by recombinant techniques.
  • antibodies having specificity for a specific protein may be prepared by conventional methods.
  • a mammal (e.g., a mouse, hamster, or rabbit) can be immunized with an immunogenic form of the peptide which elicits an antibody response in the mammal.
  • Techniques for conferring immunogenicity on a peptide include conjugation to carriers or other techniques well known in the art.
  • the peptide can be administered in the presence of adjuvant.
  • the progress of immunization can be monitored by detection of antibody titers in plasma or serum. Standard ELISA or other immunoassay procedures can be used with the immunogen as antigen to assess the levels of antibodies.
  • antisera can be obtained and, if desired, polyclonal antibodies isolated from the sera.
  • antibody producing cells can be harvested from an immunized animal and fused with myeloma cells by standard somatic cell fusion procedures thus immortalizing these cells and yielding hybridoma cells.
  • hybridoma cells can be screened immunochemically for production of antibodies specifically reactive with the peptide and the monoclonal antibodies can be isolated.
  • recombinant antibodies are provided that specifically bind protein products of the genes described in Example 3.
  • Recombinant antibodies include, but are not limited to, chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, single-chain antibodies and multi-specific antibodies.
  • a chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a murine monoclonal antibody (mAb) and a human immunoglobulin constant region.
  • Single-chain antibodies have an antigen binding site and consist of single polypeptides. They can be produced by techniques known in the art.
  • Multi-specific antibodies are antibody molecules having at least two antigen-binding sites that specifically bind different antigens. Such molecules can be produced by techniques known in the art.
  • Monoclonal antibodies directed against any of the expression products of the genes described in Example 3 can be identified and isolated by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with the polypeptide(s) of interest.
  • Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP Phage Display Kit, Catalog No. 240612).
  • Humanized antibodies are antibody molecules from non-human species having one or more complementarity determining regions (CDRs) from the non-human species and a framework region from a human immunoglobulin molecule.
  • Humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art.
  • humanized antibodies can be produced, for example, using transgenic mice which are incapable of expressing endogenous immunoglobulin heavy and light chains genes, but which can express human heavy and light chain genes.
  • the transgenic mice are immunized in the normal fashion with a selected antigen, e.g., all or a portion of a polypeptide corresponding to a protein product.
  • Monoclonal antibodies directed against the antigen can be obtained using conventional hybridoma technology.
  • the human immunoglobulin transgenes harbored by the transgenic mice rearrange during B cell differentiation, and subsequently undergo class switching and somatic mutation. Thus, using such a technique, it is possible to produce therapeutically useful IgG, IgA and IgE antibodies.
  • Antibodies may be isolated after production (e.g., from the blood or serum of the subject) or synthesis and further purified by well-known techniques. For example, IgG antibodies can be purified using protein A chromatography. Antibodies specific for a protein can be selected (e.g., partially purified) or purified by, e.g., affinity chromatography. For example, a recombinantly expressed and purified (or partially purified) expression product may be produced, and covalently or non-covalently coupled to a solid support such as, for example, a chromatography column.
  • the column can then be used to affinity purify antibodies specific for the protein products of the genes described in Example 3 from a sample containing antibodies directed against a large number of different epitopes, thereby generating a substantially purified antibody composition, i.e., one that is substantially free of contaminating antibodies.
  • By a substantially purified antibody composition it is meant, in this context, that the antibody sample contains at most only 30% (by dry weight) of contaminating antibodies directed against epitopes other than those of the protein products of the genes described in Example 3, and preferably at most 20%, yet more preferably at most 10%, and most preferably at most 5% (by dry weight) of the sample is contaminating antibodies.
  • a purified antibody composition means that at least 99% of the antibodies in the composition are directed against the desired protein.
  • substantially purified antibodies may specifically bind to a signal peptide, a secreted sequence, an extracellular domain, a transmembrane or a cytoplasmic domain or cytoplasmic membrane of a protein product of one of the genes described in Example 3.
  • antibodies directed against a protein product of one of the genes described in Example 3 can be used to detect the protein products or fragment thereof (e.g., in a cellular lysate or cell supernatant) in order to evaluate the level and pattern of expression of the protein. Detection can be facilitated by the use of an antibody derivative, which comprises an antibody coupled to a detectable substance.
  • detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials.
  • suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase;
  • suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin;
  • suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin;
  • an example of a luminescent material includes luminol;
  • examples of bioluminescent materials include luciferase, luciferin, and aequorin, and examples of suitable radioactive material include 125I, 131I, 35S or 3H.
  • a variety of techniques can be employed to measure expression levels of each of the products from the 78 genes shown in Table 3.1.1 or the 64 genes shown in Table 3.1.2 given a sample that contains protein products that bind to a given antibody.
  • Examples of such formats include, but are not limited to, enzyme immunoassay (EIA), radioimmunoassay (RIA), Western blot analysis and enzyme-linked immunosorbent assay (ELISA).
  • antibodies, or antibody fragments or derivatives can be used in methods such as Western blots or immunofluorescence techniques to detect the expressed proteins.
  • either the antibodies or proteins are immobilized on a solid support.
  • Suitable solid phase supports or carriers include any support capable of binding an antigen or an antibody.
  • Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros, and magnetite.
  • the support can then be washed with suitable buffers followed by treatment with the detectably labeled antibody.
  • the solid phase support can then be washed with the buffer a second time to remove unbound antibody.
  • the amount of bound label on the solid support can then be detected by conventional means.
  • Immunohistochemistry methods are also suitable for detecting the expression levels of the prognostic markers.
  • antibodies or antisera including polyclonal antisera, and monoclonal antibodies specific for each marker may be used to detect expression.
  • the antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase.
  • unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Immunohistochemistry protocols and kits are well known in the art and are commercially available.
  • Immunological methods for detecting and measuring complex formation as a measure of protein expression using either specific polyclonal or monoclonal antibodies are known in the art. Examples of such techniques include enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), fluorescence-activated cell sorting (FACS) and antibody arrays. Such immunoassays typically involve the measurement of complex formation between the protein and its specific antibody.
  • Radioisotopes such as 35S, 14C, 125I, 3H, and 131I.
  • the antibody variant can be labeled with the radioisotope using the techniques described in Current
  • Fluorescent labels such as rare earth chelates (europium chelates) or fluorescein and its derivatives, rhodamine and its derivatives, dansyl, Lissamine, phycoerythrin and Texas Red are available.
  • the fluorescent labels can be conjugated to the antibody variant using techniques well known in the art. Fluorescence can be quantified using a fluorimeter;
  • Various enzyme-substrate labels are available and well known to those skilled in the art.
  • the enzyme generally catalyzes a chemical alteration of the chromogenic substrate which can be measured using various techniques.
  • the enzyme may catalyze a color change in a substrate, which can be measured spectrophotometrically.
  • the enzyme may alter the fluorescence or chemiluminescence of the substrate. Techniques for quantifying a change in fluorescence are described above.
  • the chemiluminescent substrate becomes electronically excited by a chemical reaction and may then emit light which can be measured (using a chemiluminometer, for example) or donates energy to a fluorescent acceptor.
  • enzymatic labels include luciferases (e.g., firefly luciferase and bacterial luciferase), luciferin, 2,3-dihydrophthalazinediones, malate dehydrogenase, urease, peroxidase such as horseradish peroxidase (HRPO), alkaline phosphatase, β-galactosidase, glucoamylase, lysozyme, saccharide oxidases (e.g., glucose oxidase, galactose oxidase, and glucose-6-phosphate dehydrogenase), heterocyclic oxidases (such as uricase and xanthine oxidase), lactoperoxidase, microperoxidase, and the like.
  • a detection label is indirectly conjugated with the antibody.
  • the antibody can be conjugated with biotin and any of the three broad categories of labels mentioned above can be conjugated with avidin, or vice versa. Biotin binds selectively to avidin and thus, the label can be conjugated with the antibody in this indirect manner.
  • the antibody is conjugated with a small hapten (e.g., digoxin) and one of the different types of labels mentioned above is conjugated with an anti-hapten antibody (e.g., anti-digoxin antibody).
  • the antibody need not be labeled, and the presence thereof can be detected using a labeled antibody, which binds to the antibody.
  • the 39 gene pair signature described herein can be used to select treatment for CRC patients.
  • the biomarkers can classify patients with CRC into a poor survival group or a good survival group and into groups that might benefit from adjuvant chemotherapy or not.
  • adjuvant chemotherapy means treatment of cancer with standard chemotherapeutic agents after surgery where all detectable disease has been removed, but where there still remains a risk of small amounts of remaining cancer.
  • Typical chemotherapeutic agents include cisplatin, carboplatin, vinorelbine, gemcitabine, docetaxel, paclitaxel and navelbine.
  • Chemotherapeutic agents that are typically used to treat CRC such as 5-fluorouracil, leucovorin, bevacizumab, cetuximab, panitumumab, and oxaliplatin are known to those in the art.
  • kits used to prognose or classify a subject with CRC into a good survival group or a poor survival group or to select a therapy for a subject with CRC that includes detection agents that can detect the expression products of the biomarkers described herein.
  • kits are provided containing antibodies to each of the protein products of the genes described in Example 3, conjugated to a detectable substance, and instructions for use.
  • the kits comprise antibodies to the protein products of the 78 genes (39 gene pairs) listed in Table 3.1.1, or the 64 genes listed in Table 3.1.2.
  • Kits may comprise an antibody, an antibody derivative, or an antibody fragment, which binds specifically with a marker protein, or a fragment of the protein.
  • Such kits may also comprise a plurality of antibodies, antibody derivatives, or antibody fragments wherein the plurality of such antibody agents binds specifically with a marker protein, or a fragment of the protein.
  • kits may comprise antibodies such as a labeled or labelable antibody and a compound or agent for detecting protein in a biological sample; means for determining the amount of protein in the sample; means for comparing the amount of protein in the sample with a standard; and instructions for use.
  • kits can be supplied to detect a single protein or epitope or can be configured to detect one of a multitude of epitopes, such as in an antibody detection array. Arrays are described in detail herein for nucleic acid arrays and similar methods have been developed for antibody arrays.
  • a multi-gene signature is provided for prognosis or classifying patients with CRC.
  • a 39-gene pair signature is provided as described in Example 3, comprising reference values for each of the 78 genes based on relative expression data from a historical data set with a known outcome, such as good or poor survival, and/or known treatment, such as adjuvant chemotherapy.
  • relative expression data from a patient are combined with the gene-specific reference values on a gene-by-gene basis for each of the genes being assessed, to generate a test value which allows prognosis or therapy recommendation.
  • relative expression data are subjected to an algorithm that yields a single test value, or combined score, which is then compared to a control value obtained from the historical expression data for a patient or pool of patients.
  • control value is a numerical threshold for predicting outcomes, for example good and poor outcome, or making therapy recommendations for a subject, for example adjuvant chemotherapy in addition to surgical resection or surgical resection alone.
  • a test value or combined score greater than the control value is predictive, for example, of a good outcome or benefit from adjuvant chemotherapy, whereas a combined score falling below the control value is predictive, for example, of a poor outcome or lack of benefit from adjuvant chemotherapy for a subject.
  • the test value or combined score can be used to predict BRAF mutant-like status, as described herein.
  • the application provides computer programs and computer implemented products for carrying out the methods described herein. Accordingly, in one embodiment, the application provides a computer program product for use in conjunction with a computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the methods described herein.
  • the application provides a computer implemented product for predicting a prognosis or classifying a subject with CRC comprising:
  • a. a means for receiving values corresponding to a subject expression profile in a subject sample
  • a database comprising a reference expression profile associated with a prognosis, wherein the subject biomarker expression profile and the biomarker reference profile each has 78 values, each value representing the expression level of a biomarker, wherein each biomarker corresponds to one gene in Table 3.1.1;
  • the computer implemented product selects the biomarker reference expression profile most similar to the subject biomarker expression profile, to thereby predict a prognosis or classify the subject.
  • the application provides a computer implemented product for determining therapy for a subject with CRC comprising:
  • a. a means for receiving values corresponding to a subject expression profile in a subject sample
  • a database comprising a reference expression profile associated with a therapy, wherein the subject biomarker expression profile and the biomarker reference profile each has 78 values, each value representing the expression level of a biomarker, wherein each biomarker corresponds to one gene in Table 3.1.1;
  • the computer implemented product selects the biomarker reference expression profile most similar to the subject biomarker expression profile, to thereby predict the therapy.
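The selection of the "most similar" reference profile can be sketched as below. The text does not state which similarity measure is used, so Pearson correlation is an assumption here, as are the function names, the labels, and the toy four-value profiles (the profiles described above would each hold 78 values).

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("profiles must not be constant")
    return cov / math.sqrt(var_x * var_y)

def most_similar_reference(subject_profile, reference_profiles):
    """Return (label, similarity) of the reference profile that is
    most correlated with the subject profile."""
    for label, profile in reference_profiles.items():
        if len(profile) != len(subject_profile):
            raise ValueError(f"profile length mismatch for {label!r}")
    similarities = {
        label: pearson(subject_profile, profile)
        for label, profile in reference_profiles.items()
    }
    best = max(similarities, key=similarities.get)
    return best, similarities[best]

# Toy reference database: one profile per outcome (hypothetical values).
references = {
    "good prognosis": [1.0, 2.0, 3.0, 4.0],
    "poor prognosis": [4.0, 3.0, 2.0, 1.0],
}
subject = [1.1, 1.9, 3.2, 3.8]
label, similarity = most_similar_reference(subject, references)
print(label, round(similarity, 3))
```

The same lookup would serve the therapy embodiment if the reference labels were therapies rather than prognoses.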
  • Another aspect relates to computer readable mediums such as CD-ROMs.
  • the application provides computer readable medium having stored thereon a data structure for storing a computer implemented product described herein.
  • the data structure is capable of configuring a computer to respond to queries based on records belonging to the data structure, each of the records comprising:
  • the application provides a computer system comprising a. a database including records comprising a biomarker reference expression profile of the 78 genes in Table 3.1.1 associated with a prognosis or therapy;
  • a user interface capable of receiving a selection of gene expression levels of the 78 genes in Table 3.1.1 for use in comparing to the biomarker reference expression profile in the database;
  • the application provides a computer implemented product comprising
  • a. a means for receiving values corresponding to relative expression levels in a subject, of the 39 gene pairs in Table 3.1.1;
  • BRAFhi is an indicator variable (∈ {0, 1}) that is obtained by binarizing the BRAF score at the 0.5 level;
  • BRAF-like refers to samples with a high BRAF score (≥ 0.5);
  • BRAFmut refers to samples with mutation of BRAF, as determined by RT-PCR; it is also the indicator variable for the BRAF mutation status;
  • BRAF score is the score produced by the classifier, which can be interpreted as an a posteriori probability (∈ [0, 1]);
  • HR is a hazard ratio;
  • KM means Kaplan-Meier;
  • KRASmut means samples with mutation of KRAS, as determined by RT-PCR; it is also the indicator variable for the KRAS mutation status;
  • MSI/MSS means microsatellite instable/stable;
  • OS means overall survival;
  • RFS means relapse-free survival;
  • Example 1 Data used to generate the models
  • the BRAF signatures described herein were built by modeling a binary classification problem (BRAF mutants vs. non-BRAF and non-KRAS mutants, i.e. WT2) using three different classification algorithms: (multiple) top scoring pairs, compound covariate predictor and AdaBoost. While the signatures were derived from a dataset consisting solely of BRAF mutants and WT2 samples, they have been applied to the full population of patients, which departs from the usual modeling paradigm requiring a representative data set for classifier training. Nevertheless, this exercise allows the identification of a larger subpopulation of patients with a consistent gene expression pattern, which is generically called a "BRAF-like" subpopulation.
  • the modeling set consisted of gene expression data from tumor samples from phase 1 and 2 of the PETACC3 study; the samples were either BRAFmut (all V600E mutants) or WT2.
  • the PETACC3 study was an international, randomized clinical study that involved comparison of infused irinotecan + 5-fluorouracil/folinic acid (5-FU/FA) versus 5-FU/FA in patients with stage II and stage III colon cancer.
  • One important feature of the PETACC3 study was the coordinated collection of formalin fixed, paraffin embedded (FFPE) colon cancer tumor samples.
  • RNA from 1378 FFPE colon cancer samples was extracted for expression profiling on the Affymetrix-based platform Colorectal Cancer Disease Specific Array (DSA™) developed by Almac Diagnostics.
  • the KRASmut were discarded from the modeling phase. To reduce batch effects as much as possible, the data from the two phases was aligned using the 45 bridging samples.
  • Table 1.1 Sample sizes for the training and validation sets, on the modeling data.
  • sensitivity, also called the true positive fraction, gives the proportion of "positive" samples (BRAFmut in the present case) that are correctly classified;
  • AUC is the area under the ROC curve
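These performance measures can be computed as in the sketch below. It is only an illustration: the labels and scores are made up, specificity is added here as the usual companion of sensitivity, and the AUC is computed through its rank interpretation (the fraction of positive/negative pairs in which the positive sample has the higher score, ties counting one half). The 0.5 cutoff mirrors the binarization of the BRAF score.

```python
def sensitivity(y_true, y_pred):
    """Proportion of positive samples (label 1) predicted as positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(1 for p in positives if p == 1) / len(positives)

def specificity(y_true, y_pred):
    """Proportion of negative samples (label 0) predicted as negative."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(1 for p in negatives if p == 0) / len(negatives)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = BRAFmut, 0 = WT2, scores in [0, 1].
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(sensitivity(y_true, y_pred))
print(specificity(y_true, y_pred))
print(auc(y_true, scores))
```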
  • the differential gene expression was assessed using a multivariate linear model and the limma R package (Gordon, Stat. App. Gen. and Mol. Biol. 3 (2004); Smyth et al., Bioinform. 21:2067-2075 (2005)).
  • the linear model used adjusted for the effects of KRAS mutation status and for the known interactions with MSI status, and had the form:
  • ARID3A ADXCRAD_BP389511_at -1.48 6.12E-011
  • AIFM3 ADXCRAG_BC032485_s_at -2.21 3.68E-009
  • TIMM8AP ADXCRAD_AI092511_at 0.58 1.18E-007
  • PABPC1L ADXCRPD.4612.C1_s_at -1.25 1.43E-007
  • Let X be a data matrix with variables by columns (in the present case a gene expression matrix, with genes by columns, samples by rows).
  • the top scoring pairs (TSPs) method seeks a pair of variables i, j such that $X_{ki} < X_{kj}$ for all samples k labeled as "1" (positive class) and $X_{ki} > X_{kj}$ for all samples k labeled as "0". In practice, no pair of variables provides a perfect classification, so the method ranks the pairs according to the proportion of erroneous predictions they make. The top ranking pairs are usually retained for making the predictions.
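A minimal sketch of the top scoring pairs idea is given below, under stated assumptions: the function names and the toy matrix are hypothetical, every ordered pair is scored exhaustively (feasible only for small numbers of genes), and several pairs are combined by a simple majority vote, which is one possible way of using multiple pairs.

```python
from itertools import combinations

def pair_error(X, y, i, j):
    """Fraction of samples misclassified by the rule
    'predict 1 if X[k][i] < X[k][j], else predict 0'."""
    errors = sum(
        1 for row, label in zip(X, y)
        if (1 if row[i] < row[j] else 0) != label
    )
    return errors / len(y)

def rank_pairs(X, y):
    """Return all ordered pairs (i, j) sorted by increasing error."""
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        scored.append((pair_error(X, y, i, j), (i, j)))
        scored.append((pair_error(X, y, j, i), (j, i)))
    return sorted(scored)

def predict(X, pairs):
    """Majority vote over the selected pairs (ties go to class 0)."""
    predictions = []
    for row in X:
        votes = sum(1 for i, j in pairs if row[i] < row[j])
        predictions.append(1 if 2 * votes > len(pairs) else 0)
    return predictions

# Toy expression matrix: rows are samples, columns are genes.
X = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 9.0],
    [0.5, 4.0, 1.0],
    [7.0, 3.0, 8.0],
    [6.0, 2.0, 1.0],
    [9.0, 1.0, 5.0],
]
y = [1, 1, 1, 0, 0, 0]

ranked = rank_pairs(X, y)
top_pairs = [pair for _, pair in ranked[:3]]
print(ranked[0], predict(X, top_pairs))
```

Because the rule depends only on the ordering of two values within a sample, it is unaffected by any monotone per-sample transformation of the data.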
  • Compound covariate predictor (CCP - Radmacher et al., J. Comput. Biol. 9:505-511 (2002)) is another simple classification rule that, in contrast with TSPs, builds a score which is used for making the final prediction.
  • the score for the sample k has the form
  • CCP proposes to use the t-statistic to rank all variables (genes), and use the corresponding statistic as the coefficient in the sum above. Only the top m variables are used in the sum, with m to be tuned via some cross-validation process, for example.
  • a threshold Co must be chosen. The simplest choice is to take
  • variable selection is done with adjustment for MSI status and tumor site.
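The compound covariate predictor can be sketched as follows: the score of sample k is the sum, over the m top-ranked genes, of the gene's t-statistic multiplied by the gene's expression in that sample. Several choices in this sketch are assumptions rather than details taken from the text: a pooled-variance two-sample t-statistic is used, genes are ranked by absolute t-statistic without the adjustment for MSI status and tumor site, and the threshold is set to the midpoint between the mean scores of the two classes. All names and data are hypothetical.

```python
import statistics

def t_statistic(pos, neg):
    """Pooled-variance two-sample t-statistic (positive minus negative)."""
    n1, n0 = len(pos), len(neg)
    pooled = ((n1 - 1) * statistics.variance(pos)
              + (n0 - 1) * statistics.variance(neg)) / (n1 + n0 - 2)
    standard_error = (pooled * (1 / n1 + 1 / n0)) ** 0.5
    return (statistics.mean(pos) - statistics.mean(neg)) / standard_error

def score(row, weights):
    """Compound covariate: weighted sum over the selected genes."""
    return sum(w * row[i] for i, w in weights.items())

def fit_ccp(X, y, m):
    """Select the m genes with the largest |t| and derive a threshold."""
    t = []
    for i in range(len(X[0])):
        pos = [row[i] for row, label in zip(X, y) if label == 1]
        neg = [row[i] for row, label in zip(X, y) if label == 0]
        t.append(t_statistic(pos, neg))
    top = sorted(range(len(t)), key=lambda i: abs(t[i]), reverse=True)[:m]
    weights = {i: t[i] for i in top}
    pos_scores = [score(r, weights) for r, label in zip(X, y) if label == 1]
    neg_scores = [score(r, weights) for r, label in zip(X, y) if label == 0]
    # Assumed threshold: midpoint of the two class mean scores.
    threshold = (statistics.mean(pos_scores) + statistics.mean(neg_scores)) / 2
    return weights, threshold

def predict(row, weights, threshold):
    return 1 if score(row, weights) > threshold else 0

# Toy data: rows are samples, columns are genes.
X = [
    [5.0, 1.0, 3.0],
    [6.0, 2.0, 2.9],
    [5.5, 1.5, 3.2],
    [2.0, 4.0, 3.1],
    [1.0, 5.0, 3.0],
    [1.5, 4.5, 2.8],
]
y = [1, 1, 1, 0, 0, 0]

weights, threshold = fit_ccp(X, y, m=2)
print(sorted(weights), [predict(row, weights, threshold) for row in X])
```

In practice m would be tuned by cross-validation, as the text notes; here it is fixed to keep the example short.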
  • CCP2 uses the linear model for gene ranking (with adjustment for MSI status and tumor site) and takes the averages of positively and negatively associated genes separately:
  • Boosting refers to a general class of methods that produce accurate decision rules by combining rough and slightly better than chance base rules (weak learners). Boosting proceeds by repeatedly training the weak learners on different distributions over the training set. For a given sample, the final prediction is obtained by combining the predictions of the individual weak learners. Different combination approaches can be attempted, but usually a simple weighted majority voting scheme is adopted. Even though the early versions of the boosting algorithm were provably converging to an improved classification rule (with respect to the performance of any of the weak learners), they suffered from serious practical drawbacks. The first practically usable version of boosting was AdaBoost, introduced in 1995 by Freund and Schapire.
  • AdaBoost used in developing the BRAF-gene signature fits a generalized linear model using the boosting algorithm based on univariate linear models as weak learners (Buhlmann and Yu, J. Amer. Stat. Assoc. 98:324-339 (2003)). This algorithm is implemented in the R package mboost available from http://stat.ethz.ch/CRAN/. There are a number of advantages in using AdaBoost, particularly the version
  • the algorithm produces a sparse classifier - in the sense that the number of variables (genes in the present case) in the final model is small when compared with the initial dimensionality of the feature space;
  • AdaBoost will implicitly perform a variable selection as well (selecting those genes that contribute most to the discrimination between classes);
  • AdaBoost is resistant to overfitting, meaning that there is a high probability that the training performance will be reproduced on other independent data sets.
  • the model produced is minimalistic, in the sense that not all genes that could be included in the model are considered. Rather, the minimal set of genes that lead to a good classifier is selected. This means that other genes that are correlated with those in the model could also be considered. However, this strategy would not lead to an improved classification performance and the model would become redundant.
  • the individual TSP predicts "BRAFm" when Gene1 < Gene2. From all individual TSPs with a score above 0.6, a number of TSPs were selected such that each gene appears only once. These unique TSPs are averaged (all Gene1 and all Gene2 values are averaged separately) and the final prediction is: if the average of all Gene1 values is less than the average of all Gene2 values, then predict BRAFm.
  • Table 3.1.1 lists the pairs of genes that were obtained from the modeling set (all PETACC3 samples, pooled), as the final model. Table 3.1.1 The 39 TSPs making up the meta-TSP, as obtained from the modeling set.
  • CCP2 takes the difference between the average of the genes positively associated with BRAFm and the average of the genes negatively associated with BRAFm, from a linear model (see Example 3.1.2).
  • The AUC and error rates obtained are presented in Figures 1 A and B and 2 A and B.
  • the final model contains 100 genes, which are provided below:
  • the AdaBoost signature contains 29 genes which are combined through a weighted mean. Table 3.2 lists these genes and the corresponding coefficients.
  • the signature development process has been validated in two stages, using one data batch as a training/modeling set and the other one as an independent validation set.
  • the 45 bridging samples were always considered in the training set (to keep the number of BRAFmut samples at a reasonable level), and their replicates have been removed from the validation set.
  • the performance of the classifier has been estimated by repeated (10 times) stratified 5-fold cross validation.
  • the same performance parameters (area under the ROC curve - AUC, sensitivity, specificity and error rate) were measured on the validation sets. Table 3.3 lists these performance measures.
  • the main criterion for judging the performance of the classifiers was the AUC as it is independent of the classifier threshold and of the prevalence of BRAF mutations. Note that this is only a subset of the full PETACC3 data set, which contains only BRAFmut and WT2, the KRASmut being discarded.
  • Table 3.3 Estimated and validation performance of the BRAF classifiers. For the estimated parameters, the standard deviations of the estimates are given in parentheses. T - train, V - validation; Ph. 1 - phase 1 data, Ph. 2 - phase 2 data. The pooled estimates correspond to the results of repeated cross-validation on the pooled data.
  • KIM data set contains only KRASmut and BRAFmut; 2 out of 11 (18.18%) BRAFmut are not V600E mutants (as were those in the modeling set) and they are always classified as non-BRAFmut;
  • CETUX data set originates from an Almac platform, as the one which generated the modeling set - that is why the AdaBoost classifier could be applied as well;
  • CCP2 uses a threshold that is tuned on the modeling set; this threshold is not portable across platforms and that is why only AUC is given for this classifier
  • the scores produced by the AdaBoost classifier can be interpreted as the a posteriori probability that a sample belongs to the category "BRAF mutants", so a score of at least 0.5 can be considered as predicting the "BRAF mutants" class or, as it will be called later on, "BRAF-like samples". While the models have been constructed without taking into account the KRAS mutants, they were applied to the whole population, including the KRAS mutants.
  • Figure 3 shows the distribution of BRAF scores as well as the scores for KRASmut (small hashes along the top) and BRAFmut (small hashes along the bottom) samples. Note that all the BRAFmut samples have a high BRAF score (≥ 0.5). Also, 96 of the 248 KRASmut samples have a high BRAF score (see Table 3.5 for details). Table 3.5: AdaBoost Stratification of BRAF scores by mutation status.
  • CCP2 does not produce a posteriori probabilities, but a real value that is to be thresholded to produce the final label.
  • This real value (the difference between the average expression level of positively and negatively associated genes, respectively) can be used as a surrogate for a score. The distribution of these values is shown in Figure 4 along with the scores of the BRAF mutants and KRAS mutants.
  • The diagrams in Figure 5 show the overlap between the predictions (agreement of classifiers), for both the BRAF-like samples (those predicted to be BRAF mutants) and WT2-like samples (those predicted to be WT2). Note that the figures do not necessarily add up to those in the clinical table, because of the missing values (even if the BRAF/KRAS status is missing in the clinical table, the sample's status was predicted).
  • Table 3.7 shows such a stratification for the common predictions and classifier-specific predictions. For example, "intersection of all three classifiers" stands for the common predictions made by the three classifiers (the intersection of the three sets in Figure 5). Taking the row BRAF-like/intersection of all three as an example, one can see that out of the 126 samples that were predicted to be "BRAF-like" by all three classifiers (Figure 5), 25 are actually WT2, 36 are BRAF mutants and 56 are KRAS mutants, respectively (9 have missing values).
  • the row BRAF-like/mTSP shows that out of 14 samples that are predicted to be BRAF-like solely by mTSP, 4 are actually WT2 and 10 are KRAS mutants respectively, and so forth.
  • Table 3.7 Stratification of the predictions made by the three classifiers: common and classifier-specific predictions.
  • The univariate analyses for the 78 genes (39 pairs) in the mTSP signature are given in Table 4.1.
  • Table 4.1 Hazard ratios (HR) and p-values for the 78 genes in the mTSP signature.
  • APCDD1 1.011976 0.814317 1.046970 0.289266 0.946524 0.223520
  • CDX2 0.839908 0.000669 0.934687 0.164696 0.778608 0.000001
  • EPDR1 0.958408 0.505291 1.023628 0.667766 0.886263 0.046942
  • TBC1D8 1.139950 0.228368 1.039978 0.671127 1.229105 0.037039
  • TPK1 0.816910 0.180659 0.821094 0.122846 0.865062 0.305323
  • VAV3 0.823752 0.008269 0.967883 0.589143 0.760513 0.000076
  • Table 4.2 Hazard ratios (HR) and p-values for the 39 pairs in the mTSP signature. From each pair, a new variable is constructed as the difference between the two genes.
  • TPK1-AMACR 1.1577333 0.0254243 1.0431116 0.4407091 1.1938752 0.0084244
  • AIFM3 0.904398 0.015891 0.937492 0.065816 0.896340 0.005167
  • APCDD1 1.011976 0.814317 1.046970 0.289266 0.946524 0.223520
  • CD109 1.255953 0.004716 1.161989 0.032736 1.212516 0.004030
  • CDX2 0.839908 0.000669 0.934687 0.164696 0.778608 0.000001
  • DCBLD2 1.377086 0.000231 1.192181 0.023270 1.226188 0.007232
  • DNAH2 1.121460 0.135174 1.059443 0.384370 1.137699 0.057982
  • EPDR1 0.958408 0.505291 1.023628 0.667766 0.886263 0.046942
  • EPHA4 1.349624 0.051575 1.282245 0.063399 1.160280 0.184862
  • EPHB6 1.196603 0.094872 1.213457 0.032779 1.055901 0.565234
  • H2AFY2 0.794496 0.028053 0.917900 0.331243 0.939388 0.470710
  • PIWIL1 1.051252 0.337307 1.040166 0.376955 1.017717 0.686420
  • TIMM8AP1 1.316610 0.037070 1.135399 0.278472 1.179795 0.218924
  • VAV3 0.823752 0.008269 0.967883 0.589143 0.760513 0.000076
  • VNN1 1.303692 0.000575 1.069727 0.347998 1.416119 0.000001
  • NPTX2 0.969961 0.362040 0.999669 0.990423 0.958906 0.184604
  • NTSR1 1.252817 0.051561 1.057497 0.632747 1.174101 0.092968
  • Table 4.5 (PETACC3 data/mTSP) Hazard ratios (HR) and p-values for predicted and assessed BRAFmut status, produced by Cox proportional hazards model.
  • CETUX samples represent metastatic patients (stage IV) and two endpoints are considered: overall survival (OS) and progression-free survival (PFS).
  • OS overall survival
  • PFS progression-free survival
  • In Table 4.8, the results of the Cox proportional hazards model analyses are given for the predicted BRAFmut status and for the gold standard (BRAFmut by PCR).
  • the Kaplan-Meier curves for the two endpoints (OS and PFS) are given in Figure 17. Note that the p-values given in the figures correspond to the likelihood ratio test for the differences between the two groups.
  • RFS relapse-free survival
  • SAR survival after relapse
  • AdaBoost BRAFmut status vs. predicted BRAFmut
  • Figure 18 shows the KM curves for the subpopulations identified by BRAFhi and BRAFmut indicator variables, in the whole patient population.
  • BRAF score and BRAFhi were always significant, with BRAFmut being redundant.
  • Another way of assessing the predictive power of a variable/score is to use the time-dependent ROC curves (Heagerty et al., Biometrics 56:337-344 (2000)). These are a generalization of the usual ROC curves and give an indication of the dichotomization power of the variable/score at a given time point. Nevertheless, the BRAF score and BRAFhi indicator are always better than the BRAFmut status - and they work also in WT2 and KRASmut subgroups.
  • Figure 20 shows the KM curves for the subpopulations identified by BRAFhi and BRAFmut indicator variables, in the whole patient population. 4.4.3 Survival after relapse. Univariate analysis: BRAF score vs. BRAFhi vs. BRAFmut
  • Figure 21 shows the KM curves for the subpopulations identified by BRAFhi and BRAFmut indicator variables, in the whole patient population.
  • the AUCs at 3 years are better than in the case of OS, but they remain below 0.7.
  • Multivariate models Starting with a full model including all the variables (BRAFscore, age, grade, tstage, nstage, site, MSI, KRASm) and their pairwise interactions, and using automatic stepwise variable elimination (with AIC criterion) led to the following model:
  • BRAFhi and MSI status within different subpopulations • whole population: there is a clear difference in SAR between BRAF-high and BRAF-low groups within MSS (p-value 0.000331), but not within MSI (however, there are not many MSIs) (see Figure 22A).

Abstract

The application provides methods of prognosing and classifying colon cancer patients into poor survival groups or good survival groups and for determining the benefit of adjuvant chemotherapy by way of a multigene signature. The application also includes kits and computer products for use in the methods of the application.

Description

Prognostic and Predictive Gene Signature for Colon Cancer
This application claims priority to U.S. Provisional Application No. 61/413,806 filed on November 15, 2010, and to U.S. Provisional Application No. 61/470,381 filed on March 31, 2011, both of which are incorporated herein by reference in their entireties.
Field
The application relates to compositions and methods for prognosing and classifying colon cancer and for determining the benefit of adjuvant chemotherapy.
Background
Colorectal cancer (CRC) is the third most common form of cancer, with over 1 million new cases diagnosed worldwide each year. Despite significant advances in detection, surgery, and chemotherapeutic treatment, CRC is the fourth most common cause of cancer death worldwide, and the second most common cause of cancer death in the United States (Tenesa & Dunlop, Nat Rev Genet 10:353-358 (2009); Jemal et al., Methods Mol. Biol. 471:3-29 (2009)).
CRC that is confined within the wall of the colon (TNM (tumor-node-metastasis) stages I and II) is typically curable with surgery. However, if left untreated, such tumors may spread to regional lymph nodes (stage III), where up to 73% are curable by surgery and chemotherapy. Once CRC metastasizes to distant sites within the body (stage IV), the disease is typically not curable, although chemotherapy can extend survival. Clinical benefit in CRC patients has recently been observed with drugs that target vascular endothelial growth factor (VEGF) or epidermal growth factor receptor (EGFR). In particular, monoclonal antibodies that target EGFR (e.g. cetuximab and panitumumab) and VEGF (bevacizumab) are approved for clinical use to treat CRC.
The constitutive activation of the mitogen-activated protein kinase (MAPK) pathway is a key driver of CRC tumorigenesis. The extracellular signal-regulated kinase (ERK) pathway plays a key role in cell proliferation and its aberrant activation is often due to oncogenic mutations in KRAS or BRAF genes (Fang and Richardson, Lancet Oncol. 6:322-327 (2005); Tejpar et al., Oncologist 15:390-404 (2010)).
RAF is a serine-threonine-specific protein kinase that is activated downstream of the small G-protein RAS and which activates the MAP kinase (MEK) pathway, which in turn activates ERK. BRAF is one of the three highly conserved RAF genes in mammals (the other two being ARAF and CRAF) and its somatic mutations have been reported in approximately 7% of human cancers (Davies et al., Nature 417:949-954 (2002); Dhomen & Marais, Curr. Opin. Genet. Dev. 17:31-39 (2007)). In CRC, BRAF mutations occur in 8-10% of sporadic cancers and generally are markers of poor prognosis. For example, the V600E mutation in BRAF is believed to be associated with microsatellite instability (MSI), and may confer resistance to anti-EGFR therapy (Richman et al. J. Clin. Oncol. 27(35):5931-5937 (2009)). Furthermore, KRAS mutations are known to lead to EGFR-independent activation of the MAPK pathway, suggesting that therapies targeting EGFR will not be effective in patients with KRAS mutations (Benvenuti et al. Cancer Res 67: 2643-2648 (2007); Di Fiore et al., Br. J. Cancer 96:1166-1169 (2007)). Accordingly, there is an ongoing need to develop biomarkers that can effectively identify CRC patients who are best suited for certain therapeutic modalities.
Summary
As will be discussed in more detail herein, the present disclosure relates to the identification, from historical CRC patient data, of several gene signatures that identify a subpopulation of patients that may be sensitive to novel targeted treatments. In particular, the present disclosure provides several gene signatures that are characteristic of BRAF mutated CRC tumors. The present disclosure provides methods and kits useful for obtaining and utilizing expression information for the genes identified herein, to obtain prognostic and diagnostic information for patients with CRC.
The methods of the present disclosure generally involve obtaining relative expression data from a patient, at the DNA, messenger RNA (mRNA), or protein level, for each of the genes identified herein, processing the data and comparing the resulting information to one or more reference values. Relative expression levels are expression data normalized according to techniques known to those skilled in the art. Expression data may be normalized with respect to one or more genes with invariant expression, such as "housekeeping" genes. In some embodiments, expression data may be processed using standard techniques, such as transformation to a z-score, and/or software tools, such as RMAexpress v0.3.
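The normalization steps mentioned above can be illustrated with a minimal Python sketch. It assumes that expression values are already on a log scale, so that normalizing against invariant ("housekeeping") genes reduces to a subtraction; the function names and the gene labels used to call them are hypothetical and are not taken from the disclosure.

```python
from statistics import mean, pstdev


def normalize_to_reference(expr: dict[str, float],
                           reference_genes: list[str]) -> dict[str, float]:
    """Express each gene relative to the mean of the reference genes.

    Assumption: values are log-scale intensities, so the ratio to the
    housekeeping level becomes a difference.
    """
    ref_level = mean(expr[g] for g in reference_genes)
    return {gene: value - ref_level for gene, value in expr.items()}


def z_scores(values: list[float]) -> list[float]:
    """Transform one gene's values across samples to z-scores."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A gene with no variation carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Whether the population or the sample standard deviation is used, and which genes serve as references, are choices the text leaves open; the sketch picks one option for each only to be concrete.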
In one aspect, a multi-gene signature is provided for prognosing or classifying patients with CRC. In some embodiments, a 39-gene pair signature is provided, comprising reference values for each of 39 pairs of different genes based on relative expression data for each gene from a historical data set with a known outcome, such as good or poor survival, and/or known treatment, such as adjuvant chemotherapy.
In one aspect, relative expression data from a patient are combined with the gene-specific reference values on a gene-by-gene basis for each of the genes identified herein, to generate a test value which allows prognosis or therapy recommendation. In some embodiments, relative expression data are subjected to an algorithm that yields a single test value, or combined score, which is then compared to a control value obtained from the historical expression data for a patient or pool of patients. In some embodiments, the control value is a numerical threshold for predicting outcomes, for example good and poor outcome, or making therapy recommendations, for example adjuvant therapy in addition to surgical resection or surgical resection alone. In some embodiments, a test value or combined score greater than the control value is predictive, for example, of high risk (poor outcome) or benefit from adjuvant therapy, whereas a combined score falling below the control value is predictive, for example, of low risk (good outcome) or lack of benefit from adjuvant therapy.
The present disclosure provides gene signatures that are prognostic for survival as well as predictive for benefit from adjuvant chemotherapy. For example, the disclosure provides methods that can be used to select or identify subjects who might benefit from adjuvant chemotherapy as opposed to subjects who are not likely to benefit from such adjuvant chemotherapy.
Accordingly, in one embodiment, the disclosure provides a method of prognosing or classifying a subject with CRC comprising: a) analyzing at least one of the gene pairs shown in Table 3.1.1 or Table 3.1.2 according to the top scoring pair method; and b) classifying the subject into a BRAF mutant-like group or a wild-type group. For example, in one embodiment at least 10 of the gene pairs shown in Table 3.1.1 or 3.1.2 are analyzed according to the top scoring pair method. In a further embodiment at least 30 of the gene pairs shown in Table 3.1.1 or Table 3.1.2 are analyzed according to the top scoring pair method. In a further embodiment, the 39 gene pairs shown in Table 3.1.1 are analyzed according to the top scoring pair method. In a further embodiment, the 32 gene pairs shown in Table 3.1.2 are analyzed according to the top scoring pair method.
In a further embodiment, the top scoring pair method is carried out by comparing the average value of the relative expression levels of all Gene1 genes used in the analysis with the average value of relative expression levels of all Gene2 genes used in the analysis, wherein if the average Gene1 value is less than the average Gene2 value, then the subject is classified as BRAF mutant-like. In a further embodiment, the top scoring pair method is carried out as described above, wherein if the average Gene1 value is greater than or equal to the average Gene2 value, then the subject is classified as wild-type. In some embodiments, the top scoring pair method uses the 39 pairs of genes shown in Table 3.1.1. In some embodiments, the top scoring pair method uses the 32 pairs of genes shown in Table 3.1.2.
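The averaged top scoring pair rule described above can be sketched in a few lines of Python. The function name and the gene labels in any call are hypothetical placeholders; the actual pairs are those of Table 3.1.1 or Table 3.1.2, which are not reproduced here.

```python
from statistics import mean


def classify_mtsp(expr: dict[str, float],
                  pairs: list[tuple[str, str]]) -> str:
    """Classify one sample from (Gene1, Gene2) pairs.

    The Gene1 and Gene2 relative expression values are averaged
    separately; the sample is called BRAF mutant-like only when the
    Gene1 average is strictly below the Gene2 average.
    """
    gene1_avg = mean(expr[g1] for g1, _ in pairs)
    gene2_avg = mean(expr[g2] for _, g2 in pairs)
    return "BRAF mutant-like" if gene1_avg < gene2_avg else "wild-type"
```

Because the rule only compares two averages computed within the same sample, it needs no externally tuned threshold, which is one reason such rank-style rules tend to transfer between measurement platforms more easily than score thresholds do.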
In a further embodiment, the disclosure provides a method of prognosing or classifying a subject with CRC comprising: a) calculating a score using the AdaBoost method as described in Example 3, using the relative expression values of the genes shown in Table 3.2; and b) classifying the subject into a BRAF mutant-like group or a wild-type group. For example, in one embodiment the subject is classified as wild-type if the calculated score is less than 0.5, and the subject is classified as BRAF mutant-like if the score is 0.5 or greater.
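As a rough illustration of the 0.5 cut-off, the following Python sketch turns a weighted combination of gene expression values into a score between 0 and 1 and then applies the threshold. The logistic link, the intercept, and the coefficients are assumptions made for the example only; the actual coefficients are those of Table 3.2 and the exact scoring function is the one described in Example 3.

```python
import math


def braf_score(expr: dict[str, float],
               coefficients: dict[str, float],
               intercept: float = 0.0) -> float:
    """Map a weighted sum of expression values to the interval (0, 1).

    Assumption: a plain logistic link is used to obtain a
    probability-like score; coefficients here are hypothetical.
    """
    linear = intercept + sum(coef * expr[gene]
                             for gene, coef in coefficients.items())
    return 1.0 / (1.0 + math.exp(-linear))


def classify_by_score(score: float, threshold: float = 0.5) -> str:
    """Scores of 0.5 or greater are called BRAF mutant-like."""
    return "BRAF mutant-like" if score >= threshold else "wild-type"
```

Note that the boundary case follows the text: a score of exactly 0.5 is classified as BRAF mutant-like.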
In a further embodiment, the disclosure provides a method of prognosing or classifying a subject with CRC by using the CCP2 gene signature as described in Example 3. For example, the relative expression levels of the genes noted in Example 3.2.2 can be determined and the CCP2 method carried out as described in Example 3.1.2. The CCP2 gene signature as described in Example 3 can be used to classify or prognose a subject with CRC as either BRAF mutant-like, or wild-type.
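A minimal Python sketch of a CCP2-style score follows, assuming the score is the difference between the mean expression of the genes positively associated with BRAF mutation and the mean expression of the negatively associated genes, with scores above a threshold called BRAF mutant-like. The gene lists and the threshold value are hypothetical inputs; in the disclosure the threshold is tuned on the modeling set.

```python
from statistics import mean


def ccp2_score(expr: dict[str, float],
               positive_genes: list[str],
               negative_genes: list[str]) -> float:
    """Mean of positively associated genes minus mean of negatively
    associated genes, for a single sample."""
    pos_avg = mean(expr[g] for g in positive_genes)
    neg_avg = mean(expr[g] for g in negative_genes)
    return pos_avg - neg_avg


def classify_ccp2(score: float, threshold: float) -> str:
    """Scores above the tuned threshold are called BRAF mutant-like.

    Assumption: the threshold comes from the modeling data and may not
    transfer to a different expression platform.
    """
    return "BRAF mutant-like" if score > threshold else "wild-type"
```

Unlike the 0.5 cut-off on a probability-like score, the value returned here is on the scale of the expression data, so any threshold has to be re-derived when the measurement platform changes.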
In a further aspect, the present disclosure provides a method for selecting therapy comprising the steps of classifying or prognosing a subject with CRC using any of the methods described herein, and further comprising selecting adjuvant chemotherapy for a subject classified as wild-type, or selecting no adjuvant chemotherapy for a subject classified as BRAF mutant-like.
In one embodiment, the present disclosure provides a method for selecting therapy comprising the steps of classifying or prognosing a subject with CRC using any of the methods described herein, and further comprising selecting adjuvant chemotherapy for a subject classified as wild-type, or selecting a treatment regimen comprising a BRAF mutant-specific inhibitor for a subject classified as BRAF mutant-like. In a further aspect, the present disclosure provides a method of treating a subject with CRC comprising administering a BRAF mutant-specific inhibitor to said subject, wherein said subject is classified as BRAF mutant-like according to any of the methods described herein. In one embodiment, the present disclosure provides any of the methods described herein, wherein said subject is a human.
In a further aspect, the present disclosure provides a CRC prognosticator comprising a mechanism for determining relative expression levels in a CRC tumor sample of the genes listed in Table 3.1.1, Table 3.1.2, Table 3.2, or those listed in Example 3.2.2. In one embodiment, the mechanism comprises a microarray. In a further embodiment, the mechanism comprises a reverse transcription polymerase chain reaction assay.
The application also provides for kits used to prognose or classify a subject with CRC into a good survival group or a poor survival group or for selecting therapy for a subject with CRC that includes detection agents that can detect the expression products of the biomarkers described herein, for example the gene pairs shown in Table 3.1.1, Table 3.1.2, or the genes listed in Table 3.2, or those listed in Example 3.2.2.
Accordingly, in a further aspect the present disclosure provides a kit for classifying a subject with CRC comprising detection agents capable of detecting the expression products of at least one gene pair shown in Table 3.1.1, or Table 3.1.2, or of the genes shown in Table 3.1.1 or Table 3.1.2, or in Example 3.2.2. In some embodiments, said agents are capable of detecting the expression products of at least 5, at least 10, at least 20, at least 30, at least 35, or the 39 gene pairs shown in Table 3.1.1. In some embodiments, said agents are capable of detecting the expression products of at least 5, at least 10, at least 20, at least 30, or the 32 gene pairs shown in Table 3.1.2. In a further embodiment, any of the kits described above comprise an addressable array that comprises probes for the expression products of the at least one, at least 5, at least 10, at least 20, at least 30, at least 35, or the 39 gene pairs of Table 3.1.1. In a further embodiment, any of the kits described above comprise an addressable array that comprises probes for the expression products of the at least one, at least 5, at least 10, at least 20, at least 30, or the 32 gene pairs of Table 3.1.2. In a further embodiment, the detection agents comprise primers capable of hybridizing to the expression products of the gene pairs. In a further embodiment, the present disclosure provides any of the kits described herein, further comprising a computer implemented product for comparing: a) the relative expression level values for Gene1 genes in Table 3.1.1 or Table 3.1.2 for a subject to b) the relative expression level values for Gene2 genes in Table 3.1.1 or Table 3.1.2 for said subject. In one embodiment, the average value of the relative expression levels of all Gene1 genes used in the analysis is compared with the average value of relative expression levels of all Gene2 genes used in the analysis. In one embodiment, the 39 gene pairs in Table 3.1.1 are used in the analysis. In one embodiment, the 32 gene pairs in Table 3.1.2 are used in the analysis.
The present disclosure provides probes for detecting the biomarkers described herein, for example the genes disclosed in Table 3.1.1, Table 3.1.2, Table 3.2, and those disclosed in Example 3.2.2. Exemplary probes include mRNA oligonucleotides, cDNA oligonucleotides, and PCR primers. The probes are capable of detecting or hybridizing to each of the 39 pairs or 32 pairs of genes described in Example 3.
In one aspect, the present disclosure provides kits useful for carrying out the diagnostic and prognostic tests described herein. The kits generally comprise reagents and compositions for obtaining relative expression data for the 39 gene pairs or 32 gene pairs, described in Table 3.1.1 or Table 3.1.2, the genes shown in Table 3.2, or the genes noted in Example 3.2.2. The kits typically comprise probes for detecting the 39 gene pairs. The present disclosure also provides antibodies capable of specifically binding to the protein products of the biomarkers described herein. As will be recognized by skilled artisans, the contents of the kits will depend upon the means used to obtain the relative expression information.
Kits may comprise a labeled compound or agent capable of detecting protein product(s) or nucleic acid sequence(s) in a sample and means for determining the amount of the protein or mRNA in the sample (e.g., an antibody which binds the protein or a fragment thereof, or an oligonucleotide probe which binds to DNA or mRNA encoding the protein). Kits can also include instructions for interpreting the results obtained using the kit.
In some embodiments, the kits are oligonucleotide-based kits, which may comprise, for example: (1) an oligonucleotide, e.g., a detectably labeled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a marker protein or (2) a pair of primers useful for amplifying a marker nucleic acid molecule. Kits may also comprise, e.g., a buffering agent, a preservative, or a protein stabilizing agent. The kits can further comprise components necessary for detecting the detectable label (e.g., an enzyme or a substrate). The kits can also contain a control sample or a series of control samples which can be assayed and compared to the test sample. Each component of a kit can be enclosed within an individual container and all of the various containers can be within a single package, along with instructions for interpreting the results of the assays performed using the kit.
In some embodiments, the kits are antibody-based kits, which may comprise, for example: (1) a first antibody (e.g., attached to a solid support) which binds to a marker protein; and, optionally, (2) a second, different antibody which binds to either the protein or the first antibody and is conjugated to a detectable label.
A further aspect provides computer implemented products, computer readable mediums and computer systems that are useful for the methods described herein.
Other features and advantages of the present disclosure will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples while indicating preferred embodiments of the disclosure are given by way of illustration only, since various changes and modifications within the spirit and scope of the disclosure will become apparent to those skilled in the art from this detailed description.
Brief Description of the Drawings
Figures 1 A and B: AUC and error rates when the model is built on the phase 1 data and validated on phase 2 data, for increasing model size.
Figures 2 A and B: AUC and error rates when the model is built on the phase 2 data and validated on phase 1 data, for increasing model size.
Figure 3: (AdaBoost) Distribution of BRAF scores: all scores above 0.5 (grey vertical line) indicate the "BRAF-like" samples. The small hash lines at the bottom right show the score of the BRAFmut samples and the small hash lines along the top are those of KRASmut samples.
Figure 4: (CCP2) Distribution of BRAF scores: all scores above the threshold (grey vertical line) indicate the "BRAF-like" samples. The small hash lines along the bottom show the score of the BRAFmut samples and the small hash lines along the top are those of KRASmut samples.
Figure 5: Classifiers agreement: The diagrams show the number of samples that are predicted to be either BRAF-like or WT2-like by the three classifiers. For some samples, the three classifiers agree on their predictions, while for others there is no agreement.
Figure 6: Kaplan-Meier plots for the BRAF-like group predicted by mTSP (Figures 6 A, C, and E) and BRAFmut status (Figures 6 B, D, and F) and the OS, RFS and SAR endpoints, on the PETACC3 data set.
Figure 7: KRASmut samples stratified by the mTSP signature in BRAF-like samples (dashed-line - BRAF high) and non-BRAF-like samples (solid line - BRAF low).
Figure 8: (PETACC3 data/mTSP) Overall survival: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
Figure 9: (PETACC3 data/mTSP) Relapse-free survival: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
Figure 10: (PETACC3 data/mTSP) Survival after relapse: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
Figure 11: Kaplan-Meier plots for the BRAF-like group predicted by mTSP (A and C) and BRAFmut status (B and D) and the OS and PFS, on the CETUX data set.
Figure 12: (Overall survival) KRASmut samples stratified by the mTSP signature in BRAF-like samples (BRAF high) and non-BRAF-like samples (BRAF low) in the CETUX data set.
Figure 13: Kaplan-Meier plots for the BRAF-like group predicted by CCP2 and the OS, RFS and SAR endpoints, on the PETACC3 data set.
Figure 14: (PETACC3 data/CCP2) Overall survival: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
Figure 15: (PETACC3 data/CCP2) Relapse-free survival: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
Figure 16: (PETACC3 data/CCP2) Survival after relapse: BRAF.hi (predicted) and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
Figure 17: Kaplan-Meier plots for the BRAF-like group predicted by CCP2 and the OS and PFS on the CETUX data set.
Figure 18: Overall survival: Population stratification by binarized BRAF score.
Figure 19: Overall survival: BRAFhi and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
Figure 20: Relapse-free survival: Population stratification by binarized BRAF score.
Figure 21: Survival after relapse: Population stratification by binarized BRAF score.
Figure 22: Survival after relapse: BRAFhi and MSI status interaction within different subpopulations (A - whole population; B - no BRAFmut; C - only BRAFmut and KRASmut; D - only KRASmut; E - only WT2).
Detailed Description
The present disclosure provides several gene signatures that can be used to predict BRAFmut status, and provides methods, compositions, computer implemented products, detection agents and kits for prognosing or classifying a subject with CRC and for determining the benefit of adjuvant chemotherapy.
The term "biomarker" as used herein refers to a gene that is differentially expressed in individuals with CRC according to prognosis and is predictive of different survival outcomes and of the benefit of adjuvant chemotherapy. In some embodiments, a 39-gene pair signature comprises 39 gene pairs listed in Table 3.1.1. In some embodiments, a 32-gene pair signature comprises 32 gene pairs listed in Table 3.1.2.
As used herein the following terms have the following meanings. The term "reference expression profile" refers to the expression of the 39 gene pairs listed in Table 3.1.1 associated with a clinical outcome in a CRC patient. The reference expression profile comprises 78 values, each value representing the expression level of a biomarker, wherein each biomarker corresponds to one gene in Table 3.1.1. The reference expression profile is identified using one or more samples comprising tumor wherein the expression is similar between related samples defining an outcome class or group such as poor survival or good survival and is different from unrelated samples defining a different outcome class such that the reference expression profile is associated with a particular clinical outcome. The reference expression profile is accordingly a reference profile of the expression of the 78 genes in Table 3.1.1, to which the subject expression levels of the corresponding genes in a patient sample are compared in methods for determining or predicting clinical outcome. Similarly, such a reference expression profile can also refer to the 32 gene pairs listed in Table 3.1.2.
As used herein, the term "control" refers to a specific value or dataset that can be used to prognose or classify the value, e.g., expression level or reference expression profile obtained from the test sample associated with an outcome class. In one embodiment, a dataset may be obtained from samples from a group of subjects known to have CRC and good survival outcome or known to have CRC and have poor survival outcome or known to have CRC and have benefited from adjuvant chemotherapy or known to have CRC and not have benefited from adjuvant chemotherapy. The expression data of the biomarkers in the dataset can be used to create a "control value" that is used in testing samples from new patients. A control value is obtained from the historical expression data for a patient or pool of patients with a known outcome. In some embodiments, the control value is a numerical threshold for predicting outcomes, for example good and poor outcome, or making therapy recommendations, for example adjuvant therapy in addition to surgical resection or surgical resection alone.
In some embodiments, the "control" is a predetermined value for the set of 78 biomarkers obtained from CRC patients whose biomarker expression values and survival times are known. Alternatively, the "control" is a predetermined reference profile for the set of 78 biomarkers obtained from CRC patients whose survival times are known. Using values from known samples allows one to develop an algorithm for classifying new patient samples into good and poor survival groups as described in the Examples.
Accordingly, in one embodiment, the control is a sample from a subject known to have CRC and good survival outcome. In another embodiment, the control is a sample from a subject known to have CRC and poor survival outcome.
A person skilled in the art will appreciate that the comparison between the expression of the biomarkers in the test sample and the expression of the biomarkers in the control will depend on the control used. For example, if the control is from a subject known to have CRC and poor survival, and there is a difference in expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a good survival group. If the control is from a subject known to have CRC and good survival, and there is a difference in expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a poor survival group. For example, if the control is from a subject known to have CRC and good survival, and there is a similarity in expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a good survival group. For example, if the control is from a subject known to have CRC and poor survival, and there is a similarity in expression of the biomarkers between the control and test sample, then the subject can be prognosed or classified in a poor survival group.
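The four cases above reduce to a simple rule: a test sample that is similar to the control inherits the control's outcome group, and a sample that differs is assigned to the opposite group. The following is a minimal sketch of that rule only; the function name and the string labels "good" and "poor" are illustrative assumptions, and how similarity is actually established is left to the methods described elsewhere herein.

```python
def classify_by_control(control_outcome: str, similar_to_control: bool) -> str:
    """Assign a survival group from the control's known outcome.

    control_outcome: "good" or "poor" (hypothetical labels for this sketch).
    similar_to_control: True if biomarker expression in the test sample is
    similar to the control, False if it differs.
    """
    if control_outcome not in ("good", "poor"):
        raise ValueError("control_outcome must be 'good' or 'poor'")
    if similar_to_control:
        # Similarity to the control: same outcome group as the control.
        return control_outcome
    # Difference from the control: the opposite outcome group.
    return "poor" if control_outcome == "good" else "good"


if __name__ == "__main__":
    print(classify_by_control("poor", similar_to_control=False))
    print(classify_by_control("good", similar_to_control=True))
```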
As used herein, a "reference value" refers to a gene-specific coefficient derived from historical expression data. The multi-gene signatures of the present disclosure comprise gene-specific reference values. In some embodiments, the multi-gene signature comprises one reference value for each gene in the signature. In some embodiments, the multi-gene signature comprises four reference values for each gene in the signature. In some embodiments, the reference values are the first four components derived from principal component analysis for each gene in the signature.
The term "differentially expressed" or "differential expression" as used herein refers to a difference in the level of expression of the biomarkers that can be assayed by measuring the level of expression of the products of the biomarkers, such as the difference in level of messenger RNA transcript expressed or proteins expressed of the biomarkers. In a preferred embodiment, the difference is statistically significant. The term "difference in the level of expression" refers to an increase or decrease in the measurable expression level of a given biomarker as measured by the amount of messenger RNA transcript and/or the amount of protein in a sample as compared with the measurable expression level of a given biomarker in a control. In one embodiment, the differential expression can be compared using the ratio of the level of expression of a given biomarker or biomarkers as compared with the expression level of the given biomarker or biomarkers of a control, wherein the ratio is not equal to 1.0. For example, an RNA or protein is differentially expressed if the ratio of the level of expression in a first sample as compared with a second sample is greater than or less than 1.0. For example, the ratio can be greater than 1, 1.2, 1.5, 1.7, 2, 3, 5, 10, 15, 20 or more, or less than 1, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05, 0.001 or less. In another embodiment, the differential expression is assessed using a p-value. For instance, when using a p-value, a biomarker is identified as being differentially expressed as between a first sample and a second sample when the p-value is less than 0.1, preferably less than 0.05, more preferably less than 0.01, even more preferably less than 0.005, most preferably less than 0.001.
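As an illustration of the ratio-based criterion, the sketch below computes the expression ratio of a test sample to a control and flags the biomarker when the ratio exceeds an upper cutoff or falls below a lower cutoff. The function name and the default cutoffs (1.5 and 0.6, two of the example values listed above) are assumptions chosen for illustration; the expression levels are assumed to be positive and on a linear (not log) scale.

```python
def is_differentially_expressed(test_level: float,
                                control_level: float,
                                upper_cutoff: float = 1.5,
                                lower_cutoff: float = 0.6) -> bool:
    """Return True if the test/control expression ratio passes either cutoff.

    The cutoffs are illustrative; any of the ratio thresholds named in the
    text could be substituted.
    """
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    ratio = test_level / control_level
    return ratio > upper_cutoff or ratio < lower_cutoff


if __name__ == "__main__":
    print(is_differentially_expressed(300.0, 100.0))  # ratio 3.0
    print(is_differentially_expressed(110.0, 100.0))  # ratio 1.1
```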
The term "similarity in expression" as used herein means that there is no or little difference in the level of expression of the biomarkers between the test sample and the control or reference profile. For example, similarity can refer to a fold difference compared to a control. In a preferred embodiment, there is no statistically significant difference in the level of expression of the biomarkers.
The term "most similar" in the context of a reference profile refers to a reference profile that is associated with a clinical outcome that shows the greatest number of identities and/or degree of changes with the subject profile.
The term "prognosis" as used herein refers to a clinical outcome group such as a poor survival group or a good survival group associated with a disease subtype which is reflected by a reference profile such as a biomarker reference expression profile or reflected by an expression level of the biomarkers disclosed herein. The prognosis provides an indication of disease progression and includes an indication of likelihood of death due to CRC. In one embodiment the clinical outcome class includes a good survival group and a poor survival group.
The term "prognosing or classifying" as used herein means predicting or identifying the clinical outcome group that a subject belongs to according to the subject's similarity to a reference profile or biomarker expression level associated with the prognosis. For example, prognosing or classifying comprises a method or process of determining whether an individual with CRC has a good or poor survival outcome, or grouping an individual with CRC into a good survival group or a poor survival group.
The term "good survival" as used herein refers to an increased chance of survival as compared to patients in the "poor survival" group. For example, the biomarkers of the application can prognose or classify patients into a "good survival group." These patients are at a lower risk of death after surgery. The term "poor survival" as used herein refers to an increased risk of death as compared to patients in the "good survival" group. For example, biomarkers or genes of the application can prognose or classify patients into a "poor survival group." These patients are at greater risk of death after surgery.
Accordingly, in one embodiment, the biomarker reference expression profile comprises a poor survival group. In another embodiment, the biomarker reference expression profile comprises a good survival group.
The term "subject" as used herein refers to any member of the animal kingdom, preferably a human being, that has CRC or that is suspected of having CRC.
CRC patients are classified into stages, which are used to determine therapy.
Staging classification testing may include any or all of history, physical examination, routine laboratory evaluations, x-rays, and computed tomography scans or positron emission tomography scans with infusion of contrast materials.
As used herein, the term "BRAF mutant-specific inhibitor" refers to a substance that decreases the activity and/or expression of a BRAF mutant protein, but that does not substantially decrease the activity and/or expression of wild type BRAF. Such inhibitors include small molecules, antibodies, and antisense molecules. BRAF mutant proteins include those with mutations as compared with the wild type sequence. A DNA missense mutation leading to a valine to glutamic acid amino acid substitution (V600E) is the most frequent BRAF mutation observed, and functionally the most important involved in the aberrant activation of the MEK-ERK pathway and CRC carcinogenesis. Other known BRAF mutations include R461I, I462S, G463E, G463V, G465A, G465E, G465V, G468A, G468E, N580S, E585K, D593V, F594L, G595R, L596V, T598I, V599D, V599E, V599K, V599R, K600E, A727V. Most such mutations are clustered in two regions: the glycine-rich P loop of the N lobe, and the activation segment and flanking regions. BRAF mutant-specific inhibitors currently in development include, without limitation, compounds such as PLX-4720 (Plexxikon), PLX-4032 (Plexxikon), XL-281 (Exelixis), and GSK-2118436 (GlaxoSmithKline).
As used herein, the term "BRAF mutant-like" refers to a classification of subjects with CRC as predicted by the gene signatures disclosed herein, where subjects with CRC that are classified as "BRAF mutant-like" are those expected to possess at least one BRAF mutation, and/or are expected to respond to adjuvant chemotherapy in a manner that is similar to subjects with CRC who have BRAF mutations and/or possess mutations that result in the aberrant activation of the MEK-ERK pathway and are thus expected to exhibit poor survival when treated with adjuvant chemotherapy. For example, subjects with CRC that have at least one BRAF mutation are generally expected to show a poor response to adjuvant chemotherapy. Furthermore, subjects with CRC that are BRAF mutant-like have a poor survival outcome.
As used herein, the term "WT2", or "wild-type" refers to a classification of subjects with CRC as predicted by the gene signatures disclosed herein, where subjects with CRC that are classified as "WT2" or "wild-type" are those expected to be wild type for both BRAF and KRAS genes (i.e. have no mutations in either BRAF or KRAS genes), and/or are expected to respond to adjuvant chemotherapy in a manner that is similar to subjects with CRC who are wild type for both BRAF and KRAS genes. Subjects with CRC that are wild type for both BRAF and KRAS genes are generally expected to show a good response to adjuvant chemotherapy and have a good survival outcome.
In one aspect, a multi-gene signature is prognostic of patient outcome and/or response to adjuvant chemotherapy. The present disclosure provides prognostic signatures that are stage-independent classifiers. In some embodiments, a 39 gene pair or 32 gene pair signature is provided as described herein. In one embodiment, the signature comprises reference values for each of the 39 gene pairs listed in Table 3.1.1, or the 32 gene pairs listed in Table 3.1.2. In some embodiments, this gene signature is prognostic of patient outcome and/or response to adjuvant chemotherapy. In some embodiments, the gene pairs listed in Table 3.1.1 or Table 3.1.2 are used in a "top scoring pair" algorithm/method to predict whether or not a patient is classified as "BRAF mutant-like". Table 3.1.1 and Table 3.1.2 list pairs of genes, where the first gene in the pair is the "Gene1" gene, and the second gene in the pair is the "Gene2" gene. In one embodiment, a single gene pair can be analyzed according to the top scoring pair method by comparing the relative gene expression value of a Gene1 gene in Table 3.1.1 or Table 3.1.2 with the relative gene expression value of the second gene in the pair (i.e. Gene2). If the Gene1 value of this gene pair is less than the Gene2 value, then the method predicts BRAF mutant-like status. If the Gene1 value of this gene pair is greater than or equal to the Gene2 value, then the method predicts wild-type ("WT2") status.
Several of the gene pairs shown in Table 3.1.1 or Table 3.1.2 can be used in a similar way, where each of the gene pairs that predict BRAF mutant-like status count as a "vote" for BRAF mutant-like status, so that if there are more "votes" for BRAF mutant-like status, then the method would predict BRAF mutant-like status overall. This could be applied, for example, using any number of the gene pairs from Table 3.1.1, or Table 3.1.2, for example, less than 39 pairs, less than 30 pairs, less than 25 pairs, less than 20 pairs, less than 15 pairs, less than 10 pairs, less than 5 pairs, or less than 4, 3, or 2 pairs of the genes in Table 3.1.1. Further, for example, for a given number of gene pairs from Table 3.1.1 or Table 3.1.2 that will be used in the prediction method, the average value of all the Gene1 values can be compared to the average value of all the Gene2 values. Accordingly, if the average Gene1 value is less than the average Gene2 value, then the method predicts BRAF mutant-like status. For example, as described in Example 3, when using all 39 gene pairs, the average relative expression value of all the Gene1 genes in Table 3.1.1 can be compared to the average relative expression value of all the Gene2 genes in Table 3.1.1. If the average Gene1 value is less than the average Gene2 value, then the top scoring pair method predicts BRAF mutant-like. Those of skill in the art will recognize that in other embodiments, this method could be applied, for example, using relative expression levels of any number of the gene pairs from Table 3.1.1, for example, less than 39 pairs, less than 30 pairs, less than 25 pairs, less than 20 pairs, less than 15 pairs, less than 10 pairs, less than 5 pairs, or less than 4, less than 3, or less than 2 pairs.
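The pairwise comparison, the vote count, and the comparison of averages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the Examples: the function names are hypothetical, each pair is represented as a (Gene1 value, Gene2 value) tuple of relative expression values, the numbers in the usage example are invented, and a tied vote is resolved as "WT2" (the text only specifies the outcome when BRAF mutant-like votes are in the majority).

```python
from statistics import mean

BRAF_LIKE = "BRAF mutant-like"
WT2 = "WT2"


def pair_votes_braf_like(gene1_value: float, gene2_value: float) -> bool:
    """Single-pair rule: Gene1 < Gene2 is a vote for BRAF mutant-like."""
    return gene1_value < gene2_value


def predict_by_votes(pairs: list[tuple[float, float]]) -> str:
    """Majority vote over the supplied gene pairs."""
    braf_votes = sum(pair_votes_braf_like(g1, g2) for g1, g2 in pairs)
    wt2_votes = len(pairs) - braf_votes
    # Assumption: a tie is not a majority for BRAF mutant-like, so return WT2.
    return BRAF_LIKE if braf_votes > wt2_votes else WT2


def predict_by_means(pairs: list[tuple[float, float]]) -> str:
    """Compare the average Gene1 value with the average Gene2 value."""
    mean_gene1 = mean(g1 for g1, _ in pairs)
    mean_gene2 = mean(g2 for _, g2 in pairs)
    return BRAF_LIKE if mean_gene1 < mean_gene2 else WT2


if __name__ == "__main__":
    # Invented relative expression values for three hypothetical gene pairs.
    sample = [(1.0, 2.0), (3.0, 2.5), (0.5, 0.9)]
    print(predict_by_votes(sample))
    print(predict_by_means(sample))
```

The two rules need not agree on every sample: the vote depends only on the ordering within each pair, whereas the comparison of averages is also sensitive to the size of the differences.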
The term "test sample" as used herein refers to any cancer-affected fluid, cell or tissue sample from a subject which can be assayed for biomarker expression products and/or a reference expression profile, e.g., genes differentially expressed in subjects with CRC according to survival outcome.
The phrase "determining the expression of biomarkers" as used herein refers to determining or quantifying RNA or proteins expressed by the biomarkers. The term "RNA" includes mRNA transcripts, and/or specific spliced variants of mRNA. The terms "RNA product of the biomarker," "biomarker RNA," or "target RNA" as used herein refers to RNA transcripts transcribed from the biomarkers and/or specific spliced variants. In the case of "protein," it refers to proteins translated from the RNA transcripts transcribed from the biomarkers. The term "protein product of the biomarker" or "biomarker protein" refers to proteins translated from RNA products of the biomarkers.
A person skilled in the art will appreciate that a number of methods can be used to detect or quantify the level of RNA products of the biomarkers within a sample, including arrays, such as microarrays, RT-PCR (including quantitative PCR), nuclease protection assays and Northern blot analyses. Any analytical procedure capable of permitting specific and quantifiable (or semi-quantifiable) detection of the genes described here and, optionally, additional biomarkers may be used in the methods herein presented, such as the microarray methods set forth herein, and methods known to those skilled in the art.
Accordingly, in one embodiment, the biomarker expression levels are determined using arrays, optionally microarrays, RT-PCR, optionally quantitative RT-PCR, nuclease protection assays or Northern blot analyses.
In some embodiments, the biomarker expression levels are determined by using an array. cDNA microarrays consist of multiple (usually thousands of) different cDNA probes spotted (usually using a robotic spotting device) onto known locations on a solid support, such as a glass microscope slide. Microarrays for use in the methods described herein comprise a solid substrate onto which the probes are covalently or non-covalently attached. The cDNAs are typically obtained by PCR amplification of plasmid library inserts using primers complementary to the vector backbone portion of the plasmid or to the gene itself for genes where sequence is known. PCR products suitable for production of microarrays are typically between 0.5 and 2.5 kb in length. In a typical microarray experiment, RNA (either total RNA or poly A RNA) is isolated from cells or tissues of interest and is reverse transcribed to yield cDNA. Labeling is usually performed during reverse transcription by incorporating a labeled nucleotide in the reaction mixture. A microarray is then hybridized with the labeled cDNA, and relative expression levels are calculated based on the relative concentrations of cDNA molecules that hybridized to the cDNAs represented on the microarray. Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using Affymetrix GeneChip technology, Agilent Technologies cDNA microarrays, Illumina Whole-Genome DASL array assays, or any other comparable microarray technology.
In some embodiments, probes capable of hybridizing to one or more biomarker RNAs or cDNAs are attached to the substrate at a defined location ("addressable array"). Probes can be attached to the substrate in a wide variety of ways, as will be appreciated by those in the art. In some embodiments, the probes are synthesized first and subsequently attached to the substrate. In other embodiments, the probes are synthesized on the substrate. In some embodiments, probes are synthesized on the substrate surface using techniques such as photo-polymerization and photolithography.
In some embodiments, microarrays are utilized in a RNA-primed, Array-based Klenow Enzyme ("RAKE") assay. See Nelson, P. T. et al. (2004) Nature Methods 1(2):1-7; Nelson, P. T. et al. (2006) RNA 12(2):1-5, each of which is incorporated herein by reference in its entirety. In these embodiments, total RNA is isolated from a sample. Optionally, small RNAs can be further purified from the total RNA sample. The RNA sample is then hybridized to DNA probes immobilized at the 5'-end on an addressable array. The DNA probes comprise a base sequence that is complementary to a target RNA of interest, such as one or more biomarker RNAs capable of specifically hybridizing to a nucleic acid comprising a sequence that is identically present in one of the genes described in Example 3 under standard hybridization conditions.
In some embodiments, the addressable array comprises DNA probes for no more than the 78 genes listed in Table 3.1.1, or the 64 genes listed in Table 3.1.2, or the genes listed in Table 3.2, or those listed in Example 3.2.2. In some embodiments, the addressable array comprises DNA probes for each of the 78 genes listed in Table 3.1.1, or each of the 64 genes listed in Table 3.1.2, or each of the genes listed in Table 3.2, or each of the genes listed in Example 3.2.2.
In some embodiments, quantitation of biomarker RNA expression levels requires assumptions to be made about the total RNA per cell and the extent of sample loss during sample preparation. In some embodiments, the addressable array comprises DNA probes for each of the 78 genes listed in Table 3.1.1, or for each of the 64 genes listed in Table 3.1.2, or the genes listed in Table 3.2, or those listed in Example 3.2.2.
In some embodiments, expression data are pre-processed to correct for variations in sample preparation or other non-experimental variables affecting expression measurements. For example, background adjustment, quantile adjustment, and summarization may be performed on microarray data, using standard software programs such as RMAexpress v0.3, followed by centering of the data to the mean and scaling to the standard deviation.
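The final step named above, centering to the mean and scaling to the standard deviation, can be sketched as below. This covers only that step (background adjustment, quantile adjustment, and summarization are left to the microarray software). The function name is hypothetical, and two details are assumptions because the text does not specify them: the transformation is applied to one vector of expression values at a time (for example, one gene across samples), and the sample standard deviation (n - 1 denominator) is used.

```python
from statistics import mean, stdev


def center_and_scale(values: list[float]) -> list[float]:
    """Subtract the mean and divide by the sample standard deviation."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to estimate the standard deviation")
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (assumption)
    if spread == 0:
        raise ValueError("values are constant; cannot scale to the standard deviation")
    return [(v - center) / spread for v in values]


if __name__ == "__main__":
    # Invented expression values for one gene across three samples.
    print(center_and_scale([1.0, 2.0, 3.0]))
```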
After the sample is hybridized to the array, it is exposed to exonuclease I to digest any unhybridized probes. The Klenow fragment of DNA polymerase I is then applied along with biotinylated dATP, allowing the hybridized biomarker RNAs to act as primers for the enzyme with the DNA probe as template. The slide is then washed and a streptavidin-conjugated fluorophore is applied to detect and quantitate the spots on the array containing hybridized and Klenow-extended biomarker RNAs from the sample.
In some embodiments, the RNA sample is reverse transcribed using a biotin/poly-dA random octamer primer. The RNA template is digested and the biotin-containing cDNA is hybridized to an addressable microarray with bound probes that permit specific detection of biomarker RNAs. In typical embodiments, the microarray includes at least one probe comprising at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, even at least 20, 21, 22, 23, or 24 contiguous nucleotides identically present in each of the genes listed in Table 3.1.1 or Table 3.1.2, or each of the genes listed in Table 3.2, or each of the genes listed in Example 3.2.2. After hybridization of the cDNA to the microarray, the microarray is exposed to a streptavidin-bound detectable marker, such as a fluorescent dye, and the bound cDNA is detected.
In one embodiment, the array is a U133A chip from Affymetrix. In another embodiment, a plurality of nucleic acid probes that are complementary or hybridizable to an expression product of the genes listed in Table 3.1.1, or Table 3.1.2, or the genes listed in Table 3.2, or the genes listed in Example 3.2.2, are used on the array. In some embodiments, a plurality of nucleic acid probes that are complementary or hybridizable to an expression product of some or all the genes described in Example 3 are used on the array.
The term "nucleic acid" includes DNA and RNA and can be either double stranded or single stranded.
The term "hybridize" or "hybridizable" refers to the sequence specific non-covalent binding interaction with a complementary nucleic acid. In a preferred embodiment, the hybridization is under high stringency conditions. Appropriate stringency conditions which promote hybridization are known to those skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, 6.0X sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0X SSC at 50°C may be employed.
The term "probe" as used herein refers to a nucleic acid sequence that will hybridize to a nucleic acid target sequence. In one example, the probe hybridizes to an RNA product of the biomarker or a nucleic acid sequence complementary thereof. The length of probe depends on the hybridization conditions and the sequences of the probe and nucleic acid target sequence. In one embodiment, the probe is at least 8, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 400, 500 or more nucleotides in length.
In some embodiments, compositions are provided that comprise at least one biomarker or target RNA-specific probe. The term "target RNA-specific probe" encompasses probes that have a region of contiguous nucleotides having a sequence that is either (i) identically present in one of the genes described in Example 3, or (ii) complementary to the sequence of a region of contiguous nucleotides found in one of the genes described in Example 3, where "region" can comprise the full length sequence of any one of the genes described in Example 3, a complementary sequence of the full length sequence of any one of the genes described in Example 3, or a subsequence thereof.
In some embodiments, target RNA-specific probes consist of deoxyribonucleotides. In other embodiments, target RNA-specific probes consist of both deoxyribonucleotides and nucleotide analogs. In some embodiments, biomarker RNA-specific probes comprise at least one nucleotide analog which increases the hybridization binding energy. In some embodiments, a target RNA-specific probe in the compositions described herein binds to one biomarker RNA in the sample.
In some embodiments, more than one probe specific for a single biomarker RNA is present in the compositions, the probes capable of binding to overlapping or spatially separated regions of the biomarker RNA.
It will be understood that in some embodiments in which the compositions described herein are designed to hybridize to cDNAs reverse transcribed from biomarker RNAs, the composition comprises at least one target RNA-specific probe comprising a sequence that is identically present in a biomarker RNA (or a subsequence thereof).
In some embodiments, a biomarker RNA is capable of specifically hybridizing to at least one probe comprising a base sequence that is identically present in one of the genes described in Example 3. In some embodiments, a biomarker RNA is capable of specifically hybridizing to at least one nucleic acid probe comprising a sequence that is identically present in one of the genes described in Example 3.
In some embodiments, the composition comprises a plurality of target or biomarker RNA-specific probes each comprising a region of contiguous nucleotides comprising a base sequence that is identically present in one or more of the genes described in Example 3, or in a subsequence thereof.
As used herein, the terms "complementary" or "partially complementary" are used with respect to a biomarker or target RNA (or target region thereof), and the percentage of "complementarity" of the probe sequence to that of the biomarker RNA sequence is the percentage "identity" to the reverse complement of the sequence of the biomarker RNA. In determining the degree of "complementarity" between probes used in the compositions described herein (or regions thereof) and a biomarker RNA, such as those disclosed herein, the degree of "complementarity" is expressed as the percentage identity between the sequence of the probe (or region thereof) and the reverse complement of the sequence of the biomarker RNA that best aligns therewith. The percentage is calculated by counting the number of aligned bases that are identical as between the two sequences, dividing by the total number of contiguous nucleotides in the probe, and multiplying by 100.
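The percentage calculation described above can be sketched as follows. Several simplifications are assumptions of this sketch rather than part of the definition: the probe is a DNA sequence, the "best alignment" is found by sliding the probe without gaps along the reverse complement of the target, and the function names and example sequences are invented for illustration.

```python
# Complement of each target base, written as DNA so it can be compared
# directly with a DNA probe ("T" is accepted in case the target is given as cDNA).
_COMPLEMENT_AS_DNA = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def reverse_complement_as_dna(target: str) -> str:
    """Reverse complement of an RNA (or cDNA) target, in the DNA alphabet."""
    return "".join(_COMPLEMENT_AS_DNA[base] for base in reversed(target.upper()))


def percent_complementarity(probe: str, target_rna: str) -> float:
    """Percent identity between the probe and the best-aligned window of the
    target's reverse complement (ungapped alignment; a simplifying assumption)."""
    probe = probe.upper()
    revcomp = reverse_complement_as_dna(target_rna)
    if not probe or len(probe) > len(revcomp):
        raise ValueError("probe must be non-empty and no longer than the target")
    best_matches = 0
    for start in range(len(revcomp) - len(probe) + 1):
        window = revcomp[start:start + len(probe)]
        matches = sum(p == w for p, w in zip(probe, window))
        best_matches = max(best_matches, matches)
    # Identical aligned bases / contiguous nucleotides in the probe * 100.
    return 100.0 * best_matches / len(probe)


if __name__ == "__main__":
    target = "AUGGCUAC"  # invented target; reverse complement is GTAGCCAT
    print(percent_complementarity("GTAGCCAT", target))  # fully complementary
    print(percent_complementarity("GTAGCCAA", target))  # one mismatch in eight
```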
In some embodiments, the microarray comprises probes comprising a region with a base sequence that is fully complementary to a target region of a biomarker RNA. In other embodiments, the microarray comprises probes comprising a region with a base sequence that comprises one or more base mismatches when compared to the sequence of the best-aligned target region of a biomarker RNA.
As noted above, a "region" of a probe or biomarker RNA, as used herein, may comprise or consist of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or more contiguous nucleotides from a particular gene or a complementary sequence thereof. In some embodiments, the region is of the same length as the probe or the biomarker RNA. In other embodiments, the region is shorter than the length of the probe or the biomarker RNA.
In some embodiments, the microarray comprises 78 probes each comprising a region of at least 10 contiguous nucleotides, such as at least 11 contiguous nucleotides, such as at least 13 contiguous nucleotides, such as at least 14 contiguous nucleotides, such as at least 15 contiguous nucleotides, such as at least 16 contiguous nucleotides, such as at least 17 contiguous nucleotides, such as at least 18 contiguous nucleotides, such as at least 19 contiguous nucleotides, such as at least 20 contiguous nucleotides, such as at least 21 contiguous nucleotides, such as at least 22 contiguous nucleotides, such as at least 23 contiguous nucleotides, such as at least 24 contiguous nucleotides, such as at least 25 contiguous nucleotides with a base sequence that is identically present in one of the genes described in Table 3.1.1, or Table 3.1.2.
In another embodiment, the biomarker expression levels are determined by using quantitative RT-PCR. RT-PCR is one of the most sensitive, flexible, and quantitative methods for measuring expression levels. The first step is the isolation of mRNA from a target sample. The starting material is typically total RNA isolated from human tumors or tumor cell lines. General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995). In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Numerous RNA isolation kits are commercially available.
In some embodiments, the primers used for quantitative RT-PCR comprise a forward and reverse primer for each gene listed in Table 3.1.1, or Table 3.1.2.
In some embodiments, the analytical method used for detecting at least one biomarker RNA in the methods set forth herein includes real-time quantitative RT-PCR. Although PCR can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5' proofreading exonuclease activity. In some embodiments, RT-PCR is done using a TaqMan™ assay sold by Applied Biosystems, Inc. In a first step, total RNA is isolated from the sample. In some embodiments, the assay can be used to analyze about 10 ng of total RNA input sample, such as about 9 ng of input sample, such as about 8 ng of input sample, such as about 7 ng of input sample, such as about 6 ng of input sample, such as about 5 ng of input sample, such as about 4 ng of input sample, such as about 3 ng of input sample, such as about 2 ng of input sample, and even as little as about 1 ng of input sample containing RNA.
The TaqMan™ assay utilizes a stem-loop primer that is specifically complementary to the 3'-end of a biomarker RNA. The step of hybridizing the stem-loop primer to the biomarker RNA is followed by reverse transcription of the biomarker RNA template, resulting in extension of the 3' end of the primer. The result of the reverse transcription step is a chimeric (DNA) amplicon with the stem-loop primer sequence at the 5' end of the amplicon and the cDNA of the biomarker RNA at the 3' end. Quantitation of the biomarker RNA is achieved by RT-PCR using a universal reverse primer comprising a sequence that is complementary to a sequence at the 5' end of all stem-loop biomarker RNA primers, a biomarker RNA-specific forward primer, and a biomarker RNA sequence-specific TaqMan™ probe.
The assay uses fluorescence resonance energy transfer ("FRET") to detect and quantitate the synthesized PCR product. Typically, the TaqMan™ probe comprises a fluorescent dye molecule coupled to the 5'-end and a quencher molecule coupled to the 3'-end, such that the dye and the quencher are in close proximity, allowing the quencher to suppress the fluorescence signal of the dye via FRET. When the polymerase replicates the chimeric amplicon template to which the TaqMan™ probe is bound, the 5'-nuclease of the polymerase cleaves the probe, decoupling the dye and the quencher so that FRET is abolished and a fluorescence signal is generated. Fluorescence increases with each RT-PCR cycle proportionally to the amount of probe that is cleaved.
In some embodiments, quantitation of the results of RT-PCR assays is done by constructing a standard curve from a nucleic acid of known concentration and then extrapolating quantitative information for biomarker RNAs of unknown concentration. In some embodiments, the nucleic acid used for generating a standard curve is an RNA of known concentration. In some embodiments, the nucleic acid used for generating a standard curve is a purified double-stranded plasmid DNA or a single-stranded DNA generated in vitro.
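By way of illustration only, the standard-curve approach can be sketched as a least-squares fit of Ct against the base-10 logarithm of the known quantities, followed by reading the quantity of an unknown sample off the fitted line. The dilution series, the Ct values and the function names below are hypothetical and are not data from this disclosure.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate the quantity for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series (copies per reaction) and Ct values.
standards = [1e5, 1e4, 1e3, 1e2, 1e1]
standard_cts = [18.00, 21.32, 24.64, 27.96, 31.28]
slope, intercept = fit_standard_curve(standards, standard_cts)
unknown_copies = quantity_from_ct(24.64, slope, intercept)
```

Estimates are most reliable for Ct values that fall within the range spanned by the standards; values outside that range rely on the line holding beyond the measured points.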
In some embodiments, where the amplification efficiencies of the biomarker nucleic acids and the endogenous reference are approximately equal, quantitation is accomplished by the comparative Ct (cycle threshold, i.e., the number of PCR cycles required for the fluorescence signal to rise above background) method. Ct values are inversely related to the amount of nucleic acid target in a sample: the more target that is present, the fewer cycles are required to cross the threshold. In some embodiments, Ct values of the target RNA of interest can be compared with a control or calibrator, such as RNA from normal tissue. In some embodiments, the Ct values of the calibrator and the target RNA samples of interest are normalized to an appropriate endogenous housekeeping gene (see above).
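A minimal sketch of the comparative Ct calculation follows. It uses the widely used 2^(-ΔΔCt) form, which additionally assumes that the product roughly doubles each cycle; that assumption, the function name and the Ct values are illustrative and are not specified in this disclosure.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of the target in the sample relative to the calibrator.

    Each Ct is first normalized to the endogenous reference (delta Ct),
    then the calibrator delta Ct is subtracted (delta-delta Ct).
    Assumes approximately equal, near-100% amplification efficiencies.
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: tumor sample versus a normal-tissue calibrator.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_reference_sample=18.0,
    ct_target_calibrator=26.0, ct_reference_calibrator=18.0,
)
```

With these hypothetical values the target crosses the threshold two cycles earlier in the sample than in the calibrator after normalization, which corresponds to a four-fold higher relative level.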
In addition to the TaqMan™ assays, other RT-PCR chemistries useful for detecting and quantitating PCR products in the methods presented herein include, but are not limited to, Molecular Beacons, Scorpion probes and SYBR Green detection.
In some embodiments, Molecular Beacons can be used to detect and quantitate PCR products. Like TaqMan™ probes, Molecular Beacons use FRET to detect and quantitate a PCR product via a probe comprising a fluorescent dye and a quencher attached at the ends of the probe. Unlike TaqMan™ probes, Molecular Beacons remain intact during the PCR cycles. Molecular Beacon probes form a stem-loop structure when free in solution, thereby allowing the dye and quencher to be in close enough proximity to cause fluorescence quenching. When the Molecular Beacon hybridizes to a target, the stem-loop structure is abolished so that the dye and the quencher become separated in space and the dye fluoresces. Molecular Beacons are available, e.g., from Gene Link™.
In some embodiments, Scorpion probes can be used as both sequence-specific primers and for PCR product detection and quantitation. Like Molecular Beacons, Scorpion probes form a stem-loop structure when not hybridized to a target nucleic acid. However, unlike Molecular Beacons, a Scorpion probe achieves both sequence-specific priming and PCR product detection. A fluorescent dye molecule is attached to the 5'-end of the Scorpion probe, and a quencher is attached to the 3'-end. The 3' portion of the probe is complementary to the extension product of the PCR primer, and this complementary portion is linked to the 5'-end of the probe by a non-amplifiable moiety. After the Scorpion primer is extended, the target-specific sequence of the probe binds to its complement within the extended amplicon, thus opening up the stem-loop structure and allowing the dye on the 5'-end to fluoresce and generate a signal. Scorpion probes are available from, e.g., Premier Biosoft International.
In some embodiments, RT-PCR detection is performed specifically to detect and quantify the expression of a single biomarker RNA. The biomarker RNA, in typical embodiments, is selected from a biomarker RNA capable of specifically hybridizing to a nucleic acid comprising a sequence that is identically present in one of the genes described in Example 3. In some embodiments, the biomarker RNA specifically hybridizes to a nucleic acid comprising a sequence that is identically present in at least one of the genes in Table 3.1.1, or Table 3.1.2. In other embodiments, the biomarker RNA specifically hybridizes to a nucleic acid comprising a sequence that is identically present in at least one of the genes in Table 3.2 or in Example 3.2.2. In various other embodiments, RT-PCR detection is utilized to detect, in a single multiplex reaction, each of 78 biomarker RNAs. The biomarker RNAs, in some embodiments, are capable of specifically hybridizing to a nucleic acid comprising a sequence that is identically present in one of the 78 genes listed in Table 3.1.1, or Table 3.1.2. In various other embodiments, RT-PCR detection is utilized to detect, in a single multiplex reaction, RNAs corresponding to each of the biomarkers listed in Table 3.2, or in Example 3.2.2.
In some multiplex embodiments, a plurality of probes, such as TaqMan™ probes, each specific for a different RNA target, is used. In typical embodiments, each target RNA-specific probe is spectrally distinguishable from the other probes used in the same multiplex reaction.
In some embodiments, quantitation of RT-PCR products is accomplished using a dye that binds to double-stranded DNA products, such as SYBR Green. In some embodiments, the assay is the QuantiTect SYBR Green PCR assay from Qiagen. In this assay, total RNA is first isolated from a sample. Total RNA is subsequently polyadenylated at the 3'-end and reverse transcribed using a universal primer with poly-dT at the 5'-end. In some embodiments, a single reverse transcription reaction is sufficient to assay multiple biomarker RNAs. RT-PCR is then accomplished using biomarker RNA-specific primers and a miScript Universal Primer, which comprises a poly-dT sequence at the 5'-end. SYBR Green dye binds non-specifically to double-stranded DNA and, upon excitation, emits light. In some embodiments, buffer conditions that promote highly specific annealing of primers to the PCR template (e.g., available in the QuantiTect SYBR Green PCR Kit from Qiagen) can be used to avoid the formation of non-specific DNA duplexes and primer dimers that will bind SYBR Green and negatively affect quantitation. Thus, as PCR product accumulates, the signal from SYBR Green increases, allowing quantitation of specific products.
RT-PCR is performed using any RT-PCR instrumentation available in the art. Typically, instrumentation used in real-time RT-PCR data collection and analysis comprises a thermal cycler, optics for fluorescence excitation and emission collection, and optionally a computer and data acquisition and analysis software.
In some embodiments, the method of detectably quantifying one or more biomarker RNAs includes the steps of: (a) isolating total RNA; (b) reverse transcribing a biomarker RNA to produce a cDNA that is complementary to the biomarker RNA; (c) amplifying the cDNA from step (b); and (d) detecting the amount of a biomarker RNA with RT-PCR.
As described above, in some embodiments, the RT-PCR detection is performed using a FRET probe, which includes, but is not limited to, a TaqMan™ probe, a Molecular beacon probe and a Scorpion probe. In some embodiments, the RT-PCR detection and quantification is performed with a TaqMan™ probe, i.e., a linear probe that typically has a fluorescent dye covalently bound at one end of the DNA and a quencher molecule covalently bound at the other end of the DNA. The FRET probe comprises a base sequence that is complementary to a region of the cDNA such that, when the FRET probe is hybridized to the cDNA, the dye fluorescence is quenched, and when the probe is digested during amplification of the cDNA, the dye is released from the probe and produces a fluorescence signal. In such embodiments, the amount of biomarker RNA in the sample is proportional to the amount of fluorescence measured during cDNA amplification.
The TaqMan™ probe typically comprises a region of contiguous nucleotides comprising a base sequence that is complementary to a region of a biomarker RNA or its complementary cDNA that is reverse transcribed from the biomarker RNA template (i.e., the sequence of the probe region is complementary to or identically present in the biomarker RNA to be detected) such that the probe is specifically hybridizable to the resulting PCR amplicon. In some embodiments, the probe comprises a region of at least 6 contiguous nucleotides having a base sequence that is fully complementary to or identically present in a region of a cDNA that has been reverse transcribed from a biomarker RNA template, such as comprising a region of at least 8 contiguous nucleotides, or comprising a region of at least 10 contiguous nucleotides, or comprising a region of at least 12 contiguous nucleotides, or comprising a region of at least 14 contiguous nucleotides, or even comprising a region of at least 16 contiguous nucleotides having a base sequence that is complementary to or identically present in a region of a cDNA reverse transcribed from a biomarker RNA to be detected.
Preferably, the region of the cDNA that has a sequence that is complementary to the TaqMan™ probe sequence is at or near the center of the cDNA molecule. In some embodiments, there are independently at least 2 nucleotides, such as at least 3 nucleotides, such as at least 4 nucleotides, such as at least 5 nucleotides of the cDNA at the 5'-end and at the 3'-end of the region of complementarity. In typical embodiments, all biomarker RNAs are detected in a single multiplex reaction. In these embodiments, each TaqMan™ probe that is targeted to a unique cDNA is spectrally distinguishable when released from the probe. Thus, each biomarker RNA is detected by a unique fluorescence signal.
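The placement preference described above can be checked with a short sketch that counts the cDNA nucleotides on either side of the region of complementarity. The function names, the toy cDNA sequence and the default of 2 flanking nucleotides are assumptions made for illustration; the site is given as the cDNA sequence to which the probe is complementary.

```python
def flanking_lengths(cdna: str, probe_site: str):
    """Return (5' flank length, 3' flank length) around the first occurrence
    of probe_site within cdna, or None if the site is absent."""
    start = cdna.upper().find(probe_site.upper())
    if start == -1:
        return None
    return start, len(cdna) - (start + len(probe_site))


def site_is_internal(cdna: str, probe_site: str, min_flank: int = 2) -> bool:
    """True if at least min_flank nucleotides lie on each side of the site."""
    flanks = flanking_lengths(cdna, probe_site)
    return flanks is not None and min(flanks) >= min_flank


# Hypothetical 20-nucleotide cDNA and a 10-nucleotide site within it.
toy_cdna = "GATTACAGGCTTACGATCCA"
toy_site = toy_cdna[4:14]
```

Because the two flanks are evaluated independently, a site that sits flush with either end of the cDNA fails the check regardless of how long the other flank is.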
In some embodiments, expression levels may be represented by gene transcript numbers per nanogram of cDNA. To control for variability in cDNA quantity, integrity and the overall transcriptional efficiency of individual primers, RT-PCR data can be subjected to standardization and normalization against one or more housekeeping genes as has been previously described. See, e.g., Rubie et al., Mol. Cell. Probes 19(2):101-9 (2005).
Appropriate genes for normalization in the methods described herein include those as to which the quantity of the product does not vary between different cell types, cell lines or under different growth and sample preparation conditions. In some embodiments, endogenous housekeeping genes useful as normalization controls in the methods described herein include, but are not limited to, ACTB, BAT1, B2M, TBP, U6 snRNA, RNU44, RNU48, and U47. In typical embodiments, the at least one endogenous housekeeping gene for use in normalizing the measured quantity of RNA is selected from ACTB, BAT1, B2M, TBP, U6 snRNA, RNU44, RNU48, and U47. In some embodiments, normalization to the geometric mean of two, three, four or more housekeeping genes is performed. In some embodiments, one housekeeping gene is used for normalization. In some embodiments, two, three, four or more housekeeping genes are used for normalization.
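Normalization to the geometric mean of several housekeeping genes can be sketched as follows. The function names and the quantities are hypothetical; the quantities are taken to be positive, linear-scale expression values (for example, values derived from a standard curve).

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_expression(target_quantity, housekeeping_quantities):
    """Express a target quantity relative to the geometric mean of the
    housekeeping-gene quantities measured in the same sample."""
    return target_quantity / geometric_mean(housekeeping_quantities)


# Hypothetical linear-scale quantities for one sample.
normalized = normalize_expression(12.0, [2.0, 8.0])
```

On the Ct scale, and assuming equal amplification efficiencies, dividing by the geometric mean of the reference quantities corresponds to subtracting the arithmetic mean of the reference Ct values. The geometric mean is also less sensitive than the arithmetic mean to a single highly abundant reference gene.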
In some embodiments, labels that can be used on the FRET probes include colorimetric and fluorescent labels such as Alexa Fluor dyes; BODIPY dyes, such as BODIPY FL; Cascade Blue; Cascade Yellow; coumarin and its derivatives, such as 7-amino-4-methylcoumarin, aminocoumarin and hydroxycoumarin; cyanine dyes, such as Cy3 and Cy5; eosins and erythrosins; fluorescein and its derivatives, such as fluorescein isothiocyanate; macrocyclic chelates of lanthanide ions, such as Quantum Dye™; Marina Blue; Oregon Green; rhodamine dyes, such as rhodamine red, tetramethylrhodamine and rhodamine 6G; Texas Red; fluorescent energy transfer dyes, such as thiazole orange-ethidium heterodimer; and TOTAB.
Specific examples of dyes include, but are not limited to, those identified above and the following: Alexa Fluor 350, Alexa Fluor 405, Alexa Fluor 430, Alexa Fluor 488, Alexa Fluor 500, Alexa Fluor 514, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 610, Alexa Fluor 633, Alexa Fluor 647, Alexa Fluor 660, Alexa Fluor 680, Alexa Fluor 700, and Alexa Fluor 750; amine-reactive BODIPY dyes, such as BODIPY 493/503, BODIPY 530/550, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665, BODIPY FL, BODIPY R6G, BODIPY TMR, and BODIPY-TR; Cy3, Cy5, 6-FAM, Fluorescein Isothiocyanate, HEX, 6-JOE, Oregon Green 488, Oregon Green 500, Oregon Green 514, Pacific Blue, REG, Rhodamine Green, Rhodamine Red, Renographin, ROX, SYPRO, TAMRA, 2',4',5',7'-Tetrabromosulfonefluorescein, and TET.
Specific examples of fluorescently labeled ribonucleotides useful in the preparation of RT-PCR probes for use in some embodiments of the methods described herein are available from Molecular Probes (Invitrogen), and these include, Alexa Fluor 488-5-UTP, Fluorescein-12-UTP, BODIPY FL-14-UTP, BODIPY TMR-14-UTP, Tetramethylrhodamine-6-UTP, Alexa Fluor 546-14-UTP, Texas Red-5-UTP, and BODIPY TR-14-UTP. Other fluorescent ribonucleotides are available from Amersham Biosciences (GE Healthcare), such as Cy3-UTP and Cy5-UTP.
Examples of fluorescently labeled deoxyribonucleotides useful in the preparation of RT-PCR probes for use in the methods described herein include Dinitrophenyl (DNP)- r-dUTP, Cascade Blue-7-dUTP, Alexa Fluor 488-5-dUTP, Fluorescein-12-dUTP, Oregon Green 488-5-dUTP, BODIPY FL-14-dUTP, Rhodamine Green-5-dUTP, Alexa Fluor 532-5-dUTP, BODIPY TMR-14-dUTP, Tetramethylrhodamine-6-dUTP, Alexa Fluor 546-14-dUTP, Alexa Fluor 568-5-dUTP, Texas Red-12-dUTP, Texas Red-5-dUTP, BODIPY TR-14-dUTP, Alexa Fluor 594-5-dUTP, BODIPY 630/650-14-dUTP, BODIPY 650/665-14-dUTP; Alexa Fluor 488-7-OBEA-dCTP, Alexa Fluor 546-16-OBEA-dCTP, Alexa Fluor 594-7-OBEA-dCTP, Alexa Fluor 647-12-OBEA-dCTP. Fluorescently labeled nucleotides are commercially available and can be purchased from, e.g., Invitrogen.
In some embodiments, dyes and other moieties, such as quenchers, are introduced into nucleic acids used in the methods described herein, such as FRET probes, via modified nucleotides. A "modified nucleotide" refers to a nucleotide that has been chemically modified, but still functions as a nucleotide. In some embodiments, the modified nucleotide has a chemical moiety, such as a dye or quencher, covalently attached, and can be introduced into an oligonucleotide, for example, by way of solid phase synthesis of the oligonucleotide. In other embodiments, the modified nucleotide includes one or more reactive groups that can react with a dye or quencher before, during, or after incorporation of the modified nucleotide into the nucleic acid. In specific embodiments, the modified nucleotide is an amine-modified nucleotide, i.e., a nucleotide that has been modified to have a reactive amine group. In some embodiments, the modified nucleotide comprises a modified base moiety, such as uridine, adenosine, guanosine, and/or cytosine. In specific embodiments, the amine-modified nucleotide is selected from 5-(3-aminoallyl)-UTP; 8-[(4-amino)butyl]-amino-ATP and 8-[(6-amino)butyl]-amino-ATP; N6-(4-amino)butyl-ATP, N6-(6-amino)butyl-ATP, N4-[2,2-oxy-bis-(ethylamine)]-CTP; N6-(6-Amino)hexyl-ATP; 8-[(6-Amino)hexyl]-amino-ATP; 5-propargylamino-CTP, 5-propargylamino-UTP. In some embodiments, nucleotides with different nucleobase moieties are similarly modified, for example, 5-(3-aminoallyl)-GTP instead of 5-(3-aminoallyl)-UTP. Many amine-modified nucleotides are commercially available from, e.g., Applied Biosystems, Sigma, Jena Bioscience and TriLink.
In some embodiments, the methods of detecting at least one biomarker RNA described herein employ one or more modified oligonucleotides, such as oligonucleotides comprising one or more affinity-enhancing nucleotides. Modified oligonucleotides useful in the methods described herein include primers for reverse transcription, PCR amplification primers, and probes. In some embodiments, the incorporation of affinity-enhancing nucleotides increases the binding affinity and specificity of an oligonucleotide for its target nucleic acid as compared to oligonucleotides that contain only deoxyribonucleotides, and allows for the use of shorter oligonucleotides or for shorter regions of complementarity between the oligonucleotide and the target nucleic acid.
In some embodiments, affinity-enhancing nucleotides include nucleotides comprising one or more base modifications, sugar modifications and/or backbone modifications.
In some embodiments, modified bases for use in affinity-enhancing nucleotides include 5-methylcytosine, isocytosine, pseudoisocytosine, 5-bromouracil, 5-propynyluracil, 6-aminopurine, 2-aminopurine, inosine, diaminopurine, 2-chloro-6-aminopurine, xanthine and hypoxanthine.
In some embodiments, affinity-enhancing modifications include nucleotides having modified sugars such as 2'-substituted sugars, such as 2'-O-alkyl-ribose sugars, 2'-amino-deoxyribose sugars, 2'-fluoro-deoxyribose sugars, 2'-fluoro-arabinose sugars, and 2'-O-methoxyethyl-ribose (2'MOE) sugars. In some embodiments, modified sugars are arabinose sugars, or d-arabino-hexitol sugars.
In some embodiments, affinity-enhancing modifications include backbone modifications such as the use of peptide nucleic acids (e.g., an oligomer including nucleobases linked together by an amino acid backbone). Other backbone modifications include phosphorothioate linkages, phosphodiester modified nucleic acids, combinations of phosphodiester and phosphorothioate nucleic acid, methylphosphonate, alkylphosphonates, phosphate esters, alkylphosphonothioates, phosphoramidates, carbamates, carbonates, phosphate triesters, acetamidates, carboxymethyl esters, methylphosphorothioate, phosphorodithioate, p-ethoxy, and combinations thereof.
In some embodiments, the oligomer includes at least one affinity-enhancing nucleotide that has a modified base, at least one nucleotide (which may be the same nucleotide) that has a modified sugar and at least one internucleotide linkage that is non-naturally occurring.
In some embodiments, the affinity-enhancing nucleotide contains a locked nucleic acid ("LNA") sugar, which is a bicyclic sugar. In some embodiments, an oligonucleotide for use in the methods described herein comprises one or more nucleotides having an LNA sugar. In some embodiments, the oligonucleotide contains one or more regions consisting of nucleotides with LNA sugars. In other embodiments, the oligonucleotide contains nucleotides with LNA sugars interspersed with deoxyribonucleotides.
The term "primer" as used herein refers to a nucleic acid sequence, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced (e.g., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon factors including temperature, sequences of the primer and the methods used. A primer typically contains 15-25 or more nucleotides, although it can contain fewer. The factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art.
In addition, a person skilled in the art will appreciate that a number of methods can be used to determine the amount of a protein product of the biomarker of the disclosure, including immunoassays such as Western blots, ELISA, and immunoprecipitation followed by SDS-PAGE and immunocytochemistry.
Accordingly, in another embodiment, an antibody is used to detect the polypeptide products of the 78 biomarkers listed in Table 3.1.1, or Table 3.1.2. In another embodiment, the sample comprises a tissue sample. In a further embodiment, the tissue sample is suitable for immunohistochemistry.
The term "antibody" as used herein is intended to include monoclonal antibodies, polyclonal antibodies, and chimeric antibodies. The antibody may be from recombinant sources and/or produced in transgenic animals. The term "antibody fragment" as used herein is intended to include Fab, Fab', F(ab')2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, and multimers thereof and bispecific antibody fragments. Antibodies can be fragmented using conventional techniques. For example, F(ab')2 fragments can be generated by treating the antibody with pepsin. The resulting F(ab')2 fragment can be treated to reduce disulfide bridges to produce Fab' fragments. Papain digestion can lead to the formation of Fab fragments. Fab, Fab' and F(ab')2, scFv, dsFv, ds-scFv, dimers, minibodies, diabodies, bispecific antibody fragments and other fragments can also be synthesized by recombinant techniques.
Conventional techniques of molecular biology, microbiology and recombinant DNA techniques are within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch & Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second Edition; Oligonucleotide Synthesis (M. J. Gait, ed., 1984); Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins, eds., 1984); A Practical Guide to Molecular Cloning (B. Perbal, 1984); and a series, Methods in Enzymology (Academic Press, Inc.); Short Protocols In Molecular Biology, (Ausubel et al., ed., 1995).
For example, antibodies having specificity for a specific protein, such as the protein product of a biomarker, may be prepared by conventional methods. A mammal, (e.g., a mouse, hamster, or rabbit) can be immunized with an immunogenic form of the peptide which elicits an antibody response in the mammal. Techniques for conferring immunogenicity on a peptide include conjugation to carriers or other techniques well known in the art. For example, the peptide can be administered in the presence of adjuvant. The progress of immunization can be monitored by detection of antibody titers in plasma or serum. Standard ELISA or other immunoassay procedures can be used with the immunogen as antigen to assess the levels of antibodies. Following immunization, antisera can be obtained and, if desired, polyclonal antibodies isolated from the sera.
To produce monoclonal antibodies, antibody producing cells (lymphocytes) can be harvested from an immunized animal and fused with myeloma cells by standard somatic cell fusion procedures thus immortalizing these cells and yielding hybridoma cells. Such techniques are well known in the art, as well as other techniques such as the human B-cell hybridoma technique. Hybridoma cells can be screened immunochemically for production of antibodies specifically reactive with the peptide and the monoclonal antibodies can be isolated.
In some embodiments, recombinant antibodies are provided that specifically bind protein products of the genes described in Example 3. Recombinant antibodies include, but are not limited to, chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, single-chain antibodies and multi-specific antibodies. A chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a murine monoclonal antibody (mAb) and a human immunoglobulin constant region. Single-chain antibodies have an antigen binding site and consist of single polypeptides. They can be produced by techniques known in the art. Multi-specific antibodies are antibody molecules having at least two antigen-binding sites that specifically bind different antigens. Such molecules can be produced by techniques known in the art.
Monoclonal antibodies directed against any of the expression products of the genes described in Example 3 can be identified and isolated by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with the polypeptide(s) of interest. Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP Phage Display Kit, Catalog No. 240612).
Humanized antibodies are antibody molecules from non-human species having one or more complementarity determining regions (CDRs) from the non-human species and a framework region from a human immunoglobulin molecule. Humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art.
In some embodiments, humanized antibodies can be produced, for example, using transgenic mice which are incapable of expressing endogenous immunoglobulin heavy and light chains genes, but which can express human heavy and light chain genes. The transgenic mice are immunized in the normal fashion with a selected antigen, e.g., all or a portion of a polypeptide corresponding to a protein product. Monoclonal antibodies directed against the antigen can be obtained using conventional hybridoma technology. The human immunoglobulin transgenes harbored by the transgenic mice rearrange during B cell differentiation, and subsequently undergo class switching and somatic mutation. Thus, using such a technique, it is possible to produce therapeutically useful IgG, IgA and IgE antibodies.
Antibodies may be isolated after production (e.g., from the blood or serum of the subject) or synthesis and further purified by well-known techniques. For example, IgG antibodies can be purified using protein A chromatography. Antibodies specific for a protein can be selected (e.g., partially purified) or purified by, e.g., affinity chromatography. For example, a recombinantly expressed and purified (or partially purified) expression product may be produced, and covalently or non-covalently coupled to a solid support such as, for example, a chromatography column. The column can then be used to affinity purify antibodies specific for the protein products of the genes described in Example 3 from a sample containing antibodies directed against a large number of different epitopes, thereby generating a substantially purified antibody composition, i.e., one that is substantially free of contaminating antibodies. By a substantially purified antibody composition it is meant, in this context, that the antibody sample contains at most only 30% (by dry weight) of contaminating antibodies directed against epitopes other than those of the protein products of the genes described in Example 3, and preferably at most 20%, yet more preferably at most 10%, and most preferably at most 5% (by dry weight) of the sample is contaminating antibodies. A purified antibody composition means that at least 99% of the antibodies in the composition are directed against the desired protein.
In some embodiments, substantially purified antibodies may specifically bind to a signal peptide, a secreted sequence, an extracellular domain, a transmembrane or a cytoplasmic domain or cytoplasmic membrane of a protein product of one of the genes described in Example 3.
In some embodiments, antibodies directed against a protein product of one of the genes described in Example 3 can be used to detect the protein products or fragments thereof (e.g., in a cellular lysate or cell supernatant) in order to evaluate the level and pattern of expression of the protein. Detection can be facilitated by the use of an antibody derivative, which comprises an antibody coupled to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive materials include 125I, 131I, 35S or 3H.
A variety of techniques can be employed to measure expression levels of each of the products from the 78 genes shown in Table 3.1.1 or the 64 genes shown in Table 3.1.2 given a sample that contains protein products that bind to a given antibody. Examples of such formats include, but are not limited to, enzyme immunoassay (EIA), radioimmunoassay (RIA), Western blot analysis and enzyme-linked immunosorbent assay (ELISA). A skilled artisan can readily adapt known protein/antibody detection methods for use in determining protein expression levels of the products of the genes described in Example 3.
In one embodiment, antibodies, or antibody fragments or derivatives, can be used in methods such as Western blots or immunofluorescence techniques to detect the expressed proteins. In some embodiments, either the antibodies or proteins are immobilized on a solid support. Suitable solid phase supports or carriers include any support capable of binding an antigen or an antibody. Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amyloses, natural and modified celluloses, polyacrylamides, gabbros, and magnetite.
One skilled in the art will know many other suitable carriers for binding antibody or antigen, and will be able to adapt such support for use with the present disclosure. The support can then be washed with suitable buffers followed by treatment with the detectably labeled antibody. The solid phase support can then be washed with the buffer a second time to remove unbound antibody. The amount of bound label on the solid support can then be detected by conventional means.
Immunohistochemistry methods are also suitable for detecting the expression levels of the prognostic markers. In some embodiments, antibodies or antisera, including polyclonal antisera and monoclonal antibodies specific for each marker, may be used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Immunohistochemistry protocols and kits are well known in the art and are commercially available.
Immunological methods for detecting and measuring complex formation as a measure of protein expression using either specific polyclonal or monoclonal antibodies are known in the art. Examples of such techniques include enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), fluorescence-activated cell sorting (FACS) and antibody arrays. Such immunoassays typically involve the measurement of complex formation between the protein and its specific antibody.
Numerous labels are available which can be generally grouped into the following categories:
a. Radioisotopes, such as 35S, 14C, 125I, 3H, and 131I. The antibody variant can be labeled with the radioisotope using the techniques described in Current Protocols in Immunology, Vol. 1-2, Coligan et al., Ed., Wiley-Interscience, New York, Pubs. (1991), for example, and radioactivity can be measured using scintillation counting.
b. Fluorescent labels such as rare earth chelates (europium chelates) or fluorescein and its derivatives, rhodamine and its derivatives, dansyl, Lissamine, phycoerythrin and Texas Red are available. The fluorescent labels can be conjugated to the antibody variant using techniques well known in the art. Fluorescence can be quantified using a fluorimeter.
c. Various enzyme-substrate labels are available and well known to those skilled in the art. The enzyme generally catalyzes a chemical alteration of the chromogenic substrate which can be measured using various techniques. For example, the enzyme may catalyze a color change in a substrate, which can be measured spectrophotometrically. Alternatively, the enzyme may alter the fluorescence or chemiluminescence of the substrate. Techniques for quantifying a change in fluorescence are described above. The chemiluminescent substrate becomes electronically excited by a chemical reaction and may then emit light which can be measured (using a chemiluminometer, for example) or donate energy to a fluorescent acceptor. Examples of enzymatic labels include luciferases (e.g., firefly luciferase and bacterial luciferase), luciferin, 2,3-dihydrophthalazinediones, malate dehydrogenase, urease, peroxidase such as horseradish peroxidase (HRPO), alkaline phosphatase, β-galactosidase, glucoamylase, lysozyme, saccharide oxidases (e.g., glucose oxidase, galactose oxidase, and glucose-6-phosphate dehydrogenase), heterocyclic oxidases (such as uricase and xanthine oxidase), lactoperoxidase, microperoxidase, and the like. Techniques for conjugating enzymes to antibodies are well known in the art.
In some embodiments, a detection label is indirectly conjugated with the antibody. The skilled artisan will be aware of various techniques for achieving this. For example, the antibody can be conjugated with biotin and any of the three broad categories of labels mentioned above can be conjugated with avidin, or vice versa. Biotin binds selectively to avidin and thus, the label can be conjugated with the antibody in this indirect manner. Alternatively, to achieve indirect conjugation of the label with the antibody, the antibody is conjugated with a small hapten (e.g., digoxin) and one of the different types of labels mentioned above is conjugated with an anti-hapten antibody (e.g., anti-digoxin antibody). In some embodiments, the antibody need not be labeled, and the presence thereof can be detected using a labeled antibody, which binds to the antibody.
The 39 gene pair signature described herein can be used to select treatment for CRC patients. As explained herein, the biomarkers can classify patients with CRC into a poor survival group or a good survival group and into groups that might benefit from adjuvant chemotherapy or not.
The term "adjuvant chemotherapy" as used herein means treatment of cancer with standard chemotherapeutic agents after surgery where all detectable disease has been removed, but where there still remains a risk of small amounts of remaining cancer. Typical chemotherapeutic agents include cisplatin, carboplatin, vinorelbine, gemcitabine, docetaxel, paclitaxel and navelbine. Chemotherapeutic agents that are typically used to treat CRC, such as 5-fluorouracil, leucovorin, bevacizumab, cetuximab, panitumumab, and oxaliplatin, are known to those in the art.
In yet another aspect, the application also provides for kits used to prognose or classify a subject with CRC into a good survival group or a poor survival group or to select a therapy for a subject with CRC that includes detection agents that can detect the expression products of the biomarkers described herein.
In some embodiments, kits are provided containing antibodies to each of the protein products of the genes described in Example 3, conjugated to a detectable substance, and instructions for use. In some embodiments, the kits comprise antibodies to the protein products of the 78 genes (39 gene pairs) listed in Table 3.1.1, or the 64 genes listed in Table 3.1.2. Kits may comprise an antibody, an antibody derivative, or an antibody fragment, which binds specifically with a marker protein, or a fragment of the protein. Such kits may also comprise a plurality of antibodies, antibody derivatives, or antibody fragments wherein the plurality of such antibody agents binds specifically with a marker protein, or a fragment of the protein.
In some embodiments, kits may comprise antibodies such as a labeled or labelable antibody and a compound or agent for detecting protein in a biological sample; means for determining the amount of protein in the sample; means for comparing the amount of protein in the sample with a standard; and instructions for use. Such kits can be supplied to detect a single protein or epitope or can be configured to detect one of a multitude of epitopes, such as in an antibody detection array. Arrays are described in detail herein for nucleic acid arrays and similar methods have been developed for antibody arrays.
In some aspects, a multi-gene signature is provided for prognosis or classifying patients with CRC. In some embodiments, a 39-gene pair signature is provided as described in Example 3, comprising reference values for each of the 78 genes based on relative expression data from a historical data set with a known outcome, such as good or poor survival, and/or known treatment, such as adjuvant chemotherapy.
In one aspect, relative expression data from a patient are combined with the gene-specific reference values on a gene-by-gene basis for each of the genes being assessed, to generate a test value which allows prognosis or therapy recommendation. In some embodiments, relative expression data are subjected to an algorithm that yields a single test value, or combined score, which is then compared to a control value obtained from the historical expression data for a patient or pool of patients.
In some embodiments, the control value is a numerical threshold for predicting outcomes, for example good and poor outcome, or making therapy recommendations for a subject, for example adjuvant chemotherapy in addition to surgical resection or surgical resection alone. In some embodiments, a test value or combined score greater than the control value is predictive, for example, of a good outcome or benefit from adjuvant chemotherapy, whereas a combined score falling below the control value is predictive, for example, of a poor outcome or lack of benefit from adjuvant chemotherapy for a subject. In another embodiment, the test value or combined score can be used to predict BRAF mutant-like status, as described herein.
In a further aspect, the application provides computer programs and computer implemented products for carrying out the methods described herein. Accordingly, in one embodiment, the application provides a computer program product for use in conjunction with a computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer program mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the methods described herein.
In another embodiment, the application provides a computer implemented product for predicting a prognosis or classifying a subject with CRC comprising:
a. a means for receiving values corresponding to a subject expression profile in a subject sample; and
b. a database comprising a reference expression profile associated with a prognosis, wherein the subject biomarker expression profile and the biomarker reference profile each has 78 values, each value representing the expression level of a biomarker, wherein each biomarker corresponds to one gene in Table 3.1.1;
wherein the computer implemented product selects the biomarker reference expression profile most similar to the subject biomarker expression profile, to thereby predict a prognosis or classify the subject.
In yet another embodiment, the application provides a computer implemented product for determining therapy for a subject with CRC comprising:
a. a means for receiving values corresponding to a subject expression profile in a subject sample; and
b. a database comprising a reference expression profile associated with a therapy, wherein the subject biomarker expression profile and the biomarker reference profile each has 78 values, each value representing the expression level of a biomarker, wherein each biomarker corresponds to one gene in Table 3.1.1;
wherein the computer implemented product selects the biomarker reference expression profile most similar to the subject biomarker expression profile, to thereby predict the therapy.
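The "most similar reference profile" selection described above can be illustrated with a minimal sketch. The similarity measure is not specified in the text; Pearson correlation is used here purely as an assumption, and the function name, labels and four-value toy profiles (instead of 78 values) are hypothetical.

```python
import numpy as np

def most_similar_profile(subject, references):
    """Return (label, similarity) of the reference profile closest to the subject.

    subject: 1-D sequence of expression values (78 values in the text).
    references: dict mapping a label (e.g. a prognosis) to a profile of the
    same length. Pearson correlation is an assumption of this sketch.
    """
    subject = np.asarray(subject, dtype=float)
    best_label, best_r = None, -np.inf
    for label, profile in references.items():
        r = np.corrcoef(subject, np.asarray(profile, dtype=float))[0, 1]
        if r > best_r:
            best_label, best_r = label, r
    return best_label, best_r

# Toy example with illustrative numbers only.
refs = {"good": [1.0, 2.0, 3.0, 4.0], "poor": [4.0, 3.0, 2.0, 1.0]}
label, r = most_similar_profile([1.1, 1.9, 3.2, 3.9], refs)
```

Other similarity or distance measures could be substituted without changing the overall structure.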
Another aspect relates to computer readable media such as CD-ROMs. In one embodiment, the application provides a computer readable medium having stored thereon a data structure for storing a computer implemented product described herein.
In one embodiment, the data structure is capable of configuring a computer to respond to queries based on records belonging to the data structure, each of the records comprising:
a. a value that identifies a biomarker reference expression profile of the 78 genes in Table 3.1.1;
b. a value that identifies the probability of a prognosis associated with the biomarker reference expression profile.
In another aspect, the application provides a computer system comprising
a. a database including records comprising a biomarker reference expression profile of the 78 genes in Table 3.1.1 associated with a prognosis or therapy;
b. a user interface capable of receiving a selection of gene expression levels of the 78 genes in Table 3.1.1 for use in comparing to the biomarker reference expression profile in the database; and
c. an output that displays a prediction of prognosis or therapy according to the biomarker reference expression profile most similar to the expression levels of the 78 genes.
In some embodiments, the application provides a computer implemented product comprising
a. a means for receiving values corresponding to relative expression levels in a subject, of the 39 gene pairs in Table 3.1.1;
b. an algorithm for calculating the top scoring pair method as described herein using the relative expression levels of the 39 gene pairs as shown in Table 3.1.1;
c. an output that displays the combined score; and, optionally,
d. an output that displays a prognosis or therapy recommendation based on the combined score.
A more complete understanding of the present disclosure can be obtained by reference to the following specific examples. These examples are described solely for the purpose of illustration and are not intended to limit the scope of the overall disclosure. Changes in form and substitution of equivalents are contemplated as circumstances might suggest or render expedient. Although specific terms have been employed herein, such terms are intended in a descriptive sense and not for purposes of limitation.
In the Examples below, and as used in other parts of the specification, the following terms have the following meanings: "BRAFhi" is an indicator variable (∈ {0, 1}) that is obtained by binarizing the BRAF score at the 0.5 level; "BRAFhi.t" is an indicator variable (∈ {0, 1}) that is obtained by binarizing the BRAF score at the t/100 level, for example, BRAFhi.80 = 1 if and only if BRAF score ≥ 0.80; "BRAF-like" refers to samples with a high BRAF score (≥ 0.5); "BRAFmut" refers to samples with mutation of BRAF, as determined by RT-PCR; it is also the indicator variable for the BRAF mutation status; "BRAF score" is the score produced by the classifier, which can be interpreted as an a posteriori probability (∈ [0, 1]); "HR" is a hazard ratio; "KM" means Kaplan-Meier; "KRASmut" means samples with mutation of KRAS, as determined by RT-PCR; it is also the indicator variable for the KRAS mutation status; "MSI/MSS" means microsatellite instable/stable; "OS" means overall survival; "RFS" means relapse-free survival; "SAR" means survival after relapse; and "WT2" means double wild-type, i.e. neither BRAFmut nor KRASmut. Furthermore, the names used to identify specific genes referenced herein (e.g. in the Examples below, and throughout the specification) follow the Hugo Gene nomenclature system (i.e. as stored in the Hugo Gene Nomenclature Committee database - see www.genenames.org), and will be well understood by those of skill in the art. In some instances, GeneID numbers are also provided, which is the gene identification method of the Entrez Gene database (operated by the National Center for Biotechnology Information - see www.ncbi.nlm.nih.gov), and is also well understood by those of skill in the art.
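The binarization that defines the BRAFhi and BRAFhi.t indicators can be written as a short sketch. The function name and the example scores are hypothetical, and the sketch assumes the BRAF score is a probability between 0 and 1.

```python
def braf_hi(score, t=50):
    """Indicator BRAFhi.t: 1 if the BRAF score is >= t/100, otherwise 0.

    With the default t=50 this is the plain BRAFhi indicator.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("BRAF score is expected to lie between 0 and 1")
    return 1 if score >= t / 100 else 0

# Illustrative scores only (not taken from the study).
scores = [0.12, 0.50, 0.79, 0.80, 0.93]
hi_50 = [braf_hi(s) for s in scores]        # BRAFhi
hi_80 = [braf_hi(s, t=80) for s in scores]  # BRAFhi.80
```

Under these definitions a "BRAF-like" sample is simply one for which `braf_hi(score)` equals 1.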
EXAMPLES
Example 1 : Data used to generate the models
The BRAF signatures described herein were built by modeling a binary classification problem (BRAF mutants vs. non-BRAF and non-KRAS mutants, i.e. WT2) using three different classification algorithms: (multiple) top scoring pairs, compound covariate predictor and AdaBoost. While the signatures were derived from a dataset consisting solely of BRAF mutants and WT2 samples, they have been applied to the full population of patients, which departs somewhat from the usual modeling paradigm requiring a representative data set for classifier training. Nevertheless, this exercise allows the identification of a larger subpopulation of patients with a consistent gene expression pattern, which is generically called a "BRAF-like" subpopulation.
The modeling set consisted of gene expression data from tumor samples from phase 1 and 2 of the PETACC3 study, and were either BRAFmut (all V600E mutants) or WT2. The PETACC3 study was an international, randomized clinical study that involved comparison of infused irinotecan + 5-fluorouracil/folinic acid (5-FU/FA) versus 5-FU/FA in patients with stage II and stage III colon cancer. One important feature of the PETACC3 study was the coordinated collection of formalin fixed, paraffin embedded (FFPE) colon cancer tumor samples. RNA from 1378 FFPE colon cancer samples was extracted for expression profiling on the Affymetrix-based platform Colorectal Cancer Disease Specific Array (DSA™) developed by Almac Diagnostics. The KRASmut were discarded from the modeling phase. To reduce batch effects as much as possible, the data from the two phases was aligned using the 45 bridging samples.
All models were assessed in two steps on the modeling set: in a first step one of the phases (either one or two, called the training set) was used to estimate the performance parameters and their variance (by repeated 5-fold cross-validation) and the other phase (called the validation set) was used as an independent validation set on which the performance parameters were simply measured. Then the two data sets were swapped and the process was repeated. We label the two processes as Phase 1 vs 2 (phase 1 for training, phase 2 for independent validation) and Phase 2 vs 1 (phase 2 for training, phase 1 for independent validation), respectively. To avoid biasing the performance estimation, the WT2 and BRAFmut bridging samples are always placed in the training set, and not in the validation set (this also enriches the number of BRAFmut samples in the training set). Table 1.1 shows the sample sizes for the training and validation sets used in the modeling phase.
Table 1.1 : Sample sizes for the training and validation sets, on the modeling data.
Figure imgf000042_0001
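The two-step assessment (repeated 5-fold cross-validation on one phase, independent validation on the other, then swapping) can be sketched as follows. This is only an illustration under stated assumptions: the nearest-centroid classifier is a placeholder rather than one of the signatures, the toy data are synthetic, the number of repeats is arbitrary, and the special handling of bridging samples is not reproduced.

```python
import numpy as np

def repeated_kfold(n, k=5, repeats=10, seed=0):
    """Yield (train, test) index arrays for repeated k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            yield train, test

def fit_centroids(X, y):
    """Placeholder classifier: one mean expression profile per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroids(model, X):
    labels = np.array(list(model))
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in labels])
    return labels[dists.argmin(axis=0)]

def error(X_tr, y_tr, X_te, y_te):
    """Error rate on (X_te, y_te) of a model trained on (X_tr, y_tr)."""
    pred = predict_centroids(fit_centroids(X_tr, y_tr), X_te)
    return float(np.mean(pred != y_te))

def assess(X_train, y_train, X_valid, y_valid):
    """CV error (mean, sd) on the training phase, then error on the other phase."""
    cv = [error(X_train[tr], y_train[tr], X_train[te], y_train[te])
          for tr, te in repeated_kfold(len(y_train))]
    return np.mean(cv), np.std(cv), error(X_train, y_train, X_valid, y_valid)

def toy_phase(seed, n_per_class=20, n_genes=3):
    """Synthetic, well-separated two-class data standing in for one phase."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    X = 5.0 * y[:, None] + rng.uniform(-1, 1, size=(2 * n_per_class, n_genes))
    return X, y

X1, y1 = toy_phase(1)
X2, y2 = toy_phase(2)
phase_1_vs_2 = assess(X1, y1, X2, y2)  # train on phase 1, validate on phase 2
phase_2_vs_1 = assess(X2, y2, X1, y1)  # train on phase 2, validate on phase 1
```

In practice the folds would usually be stratified by class, since BRAFmut samples are comparatively rare.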
Different independent data sets were used for external independent validation of the BRAF signatures. However, not all models can be assessed on totally different platforms. For example, the AdaBoost model (see Example 3.1.3) cannot be applied to data that originates from a different platform than the one used for creating the model, due to its sensitivity to exact numerical values. On the other hand, mTSP (see Example 3.1.1) can be applied to any data set for which it is reasonable to assume that the relative ordering of the gene expression values does not change from the modeling set.
The following data sets were used for external validation:
1. Cetuximab data set (CETUX):
2. Kim data set (KIM - Kim et al., Carcinogenesis 27:392-404 (2006)), which consists of 20 tissue samples of CRC of which 9 are KRASmut and 11 are BRAFmut (9 V600E mutants, one D594G mutant, and one G464V mutant). We considered 2 versions of this data set: one containing all the samples (KIM (all)) and one from which the non-V600E BRAF mutants are discarded (KIM (V600E)), because those mutants were considered to be more 'KRAS-like' by the authors of the paper.
Performance parameters
For a binary classification problem, the following parameters are used to describe the performance:
• sensitivity (Se), also called the true positive fraction, gives the proportion of "positive" samples (BRAFmut in the present case) that are correctly classified;
• specificity (Sp) gives the proportion of negative samples (WT2 in the present case) that are correctly classified;
• error rate (Err) gives the proportion of samples that are misclassified;
• area under the ROC curve (AUC) - this parameter is the most descriptive and summarizes the performance of the classifier across all possible values of the threshold and is independent of the prevalence of any of the classes in the population. However, a proper estimation needs a classifier that outputs a score and not simply a binary decision. Nevertheless, even for the binary decisions, the AUC can be approximated by
AUC = 0.5 × (Se + Sp)
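The performance parameters defined above can be computed from binary labels and predictions as in the following sketch. The function name and the example vectors are hypothetical; labels are coded 1 for BRAFmut and 0 for WT2.

```python
def binary_performance(y_true, y_pred):
    """Return Se, Sp, Err and the AUC approximation 0.5 * (Se + Sp)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    err = (fp + fn) / len(pairs)
    return {"Se": se, "Sp": sp, "Err": err, "AUC": 0.5 * (se + sp)}

# Illustrative labels: 4 BRAFmut and 6 WT2 samples, two misclassified.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
perf = binary_performance(y_true, y_pred)
```

The sketch assumes both classes are present in `y_true`; otherwise Se or Sp is undefined.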
Example 2: Differentially Expressed Genes
The differential gene expression (BRAF mutants vs. WT2) was assessed using a multivariate linear model and the limma R package (Gordon, Stat. App. Gen. and Mol. Biol. 3 (2004); Smyth et al., Bioinform. 21:2067-2075 (2005)). The linear model adjusted for the effects of KRAS mutation status and for the known interactions with MSI status, and had the form:
gene expression ~ BRAF +KRAS+MSI+BRAF*MSI+KRAS*MSI
In this model, a gene was called differentially expressed if the adjusted p-value associated with the BRAF variable was below the preselected false discovery rate. The analysis was carried out on the full set of samples (pooled data), at probeset level, not gene-symbol level. While there are roughly 1,000 probesets differentially expressed at FDR = 0.001, Table 2.1.1 provides only the top 100 differentially expressed genes (ordered by the corresponding adjusted p-value).
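As a rough illustration of this analysis, the sketch below builds the design matrix for the model above, estimates the BRAF coefficient for one probeset by ordinary least squares, and adjusts a set of p-values with the Benjamini-Hochberg procedure. These are assumptions of the sketch: limma itself uses moderated t-statistics, which are not reproduced here, the text does not state which adjustment method was used, and all function names and data are hypothetical.

```python
import numpy as np

def design_matrix(braf, kras, msi):
    """Columns: intercept, BRAF, KRAS, MSI, BRAF*MSI, KRAS*MSI (0/1 indicators)."""
    braf, kras, msi = (np.asarray(v, dtype=float) for v in (braf, kras, msi))
    return np.column_stack(
        [np.ones_like(braf), braf, kras, msi, braf * msi, kras * msi]
    )

def braf_effect(expr, X):
    """Least-squares estimate of the BRAF coefficient for one probeset."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(expr, dtype=float), rcond=None)
    return coef[1]

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed adjustment method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Toy data: every combination of mutation group and MSI status appears once.
braf = [0, 0, 1, 1, 0, 0]
kras = [0, 0, 0, 0, 1, 1]
msi = [0, 1, 0, 1, 0, 1]
X = design_matrix(braf, kras, msi)
expr = 2.0 + 3.0 * np.asarray(braf, dtype=float)  # log-fold change of 3 for BRAF
effect = braf_effect(expr, X)
adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

A probeset would then be called differentially expressed when its adjusted p-value for the BRAF term falls below the chosen false discovery rate.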
Table 2.1.1 Top 100 differentially expressed genes
Gene symbol Probeset ID Log-fold change Adjusted P-value
AQP5 ADXCRPD.7995.C1_x_at 4.07 2.27E-038
F5 ADXCRAG_M14335_s_at 1.47 4.95E-018
REG4 ADXCRIH.384.C1_s_at 3.21 5.32E-017
HSF5 ADXCRSS.Hs#S2988180_at -2.34 1.68E-016
CTSE ADXCRAG_AJ250717_s_at 3.57 9.55E-016
GGH ADXCRIH.546.C1_at -2.25 2.01E-015
TM4SF4 ADXCRAD_BM825250_s_at 1.26 5.89E-015
CDX2 ADXCRAG_BC014461_x_at -1.80 7.03E-015
LYZ ADXCRIH.1305.C1_s_at 2.36 9.13E-015
RNF43 ADXCRPDRC.4289.C1_at -1.63 8.31E-014
TFCP2L1 ADXCRPDRC.8321.C1_s_at -1.63 7.96E-013
MIRN142 ADXCRPD.15182.C1_at -1.28 1.11E-012
VNN1 ADXCRAD_BP299698_s_at 1.26 2.65E-012
C13orf18 ADXCRAD_NM_025113_s_at -2.11 2.97E-012
RPUSD1 ADXCRPD.7300.C1_s_at 1.88 4.93E-012
ANXA10 ADXCRAD_CK823169_at 1.08 4.99E-012
SYT13 ADXCRPD.12823.C1_s_at 0.96 5.07E-012
SATB2 ADXCRPD.10016.C1_at -2.31 8.90E-012
PIWIL1 ADXCRAG_BC028581_x_at 1.97 9.60E-012
SOX8 ADXCRAG_AK024491_s_at 2.31 1.90E-011
EPDR1 ADXCRPD.4010.C1_s_at -1.52 1.97E-011
VAV3 ADXCRAD_NM_006113_s_at -1.71 2.38E-011
AMACR ADXCRAG_AF047020_s_at -1.27 3.10E-011
PLAGL2 ADXCRPD.16547.C1_at -1.03 5.00E-011
ARID3A ADXCRAD_BP389511_at -1.48 6.12E-011
DPP4 ADXCRAD_BX110831_s_at 0.79 7.18E-011
TSPAN6 ADXCRIH.1064.C1_at -1.42 7.20E-011
INPP5J ADXCRAG_U45975_s_at -1.30 1.80E-010
LOC100130716 ADXCRAD_XM_168585_s_at -2.32 2.21E-010
SPINK1 ADXCRIH.4080.C1_s_at -2.54 2.34E-010
KLK10 ADXCRPD.7217.C1_at 1.44 2.73E-010
C1orf67 ADXCRAD_AI076810_s_at -1.44 2.91E-010
SLC14A1 ADXCRAD_BU664688_s_at 1.21 3.58E-010
CCDC113 ADXCRAD_CN256031_s_at -1.51 3.61E-010
GRM8 ADXCRAD_BG198589_at -1.76 4.25E-010
MUC12 ADXCRAD_XM_168585_at -1.86 4.64E-010
PPP1R14C ADXCRAG_AF407165_at -2.53 9.85E-010
AGRN ADXCRAG_XM_372195_s_at 0.56 9.96E-010
C1orf225 ADXCRAD_BM718216_s_at 1.52 1.15E-009
CEACAM5 ADXCRPD.11630.C1_at -1.33 1.30E-009
KIAA0802 ADXCRAG_XM_031357_s_at 1.02 1.73E-009
C10orf99 ADXCRIH.1562.C1_s_at -1.49 1.99E-009
PLLP ADXCRPD.6652.C1_s_at 0.91 2.26E-009
SPRED1 ADXCRPD.3092.C1_at 0.84 3.31E-009
OSBPL1A ADXCRAD_CN366211_s_at 0.64 3.34E-009
AIFM3 ADXCRAG_BC032485_s_at -2.21 3.68E-009
LOC100134361 ADXCRAD_XM_086879_at -1.34 4.05E-009
TRPM6 ADXCRAG_NM_017662_s_at -1.54 4.55E-009
TNNC2 ADXCRAG_M33772_s_at -1.87 6.25E-009
DPYSL2 ADXCRPD.15073.C1_s_at 0.98 6.32E-009
POFUT1 ADXCRPDRC.5238.C1_at -0.92 6.33E-009
AXIN2 ADXCRPDRC.1943.C1_at -1.49 6.34E-009
B3GALT5 ADXCRAG_NM_006057_s_at 0.94 7.26E-009
TP53RK ADXCRAD_BX460824_s_at -0.92 1.02E-008
ANTXR2 ADXCRPD.11175.C1_x_at 0.79 1.41E-008
MUC3B ADXCRPD.9207.C1_s_at -1.53 1.44E-008
PCLO ADXCRAD_XM_374484_at -2.08 1.47E-008
SLC26A2 ADXCRIH.2831.C1_s_at -1.70 1.78E-008
C11orf9 ADXCRAD_CK005805_s_at 1.40 2.42E-008
GDA ADXCRAG_NM_004293_s_at 1.18 2.59E-008
AK3L1 ADXCRPDRC.963.C1_at -1.21 3.14E-008
CELP ADXCRAD_NM_001808_s_at -1.93 3.40E-008
MYRIP ADXCRAG_AF396687_s_at -1.39 3.49E-008
XKR9 ADXCRSS.Hs#S5955802_at 0.68 3.98E-008
ATP5EP2 ADXCRPD.7790.C1_at -0.68 4.20E-008
TSPAN8 1_RDCR049_C08_at 0.69 4.73E-008
DPEP1 ADXCRPD.6962.C1_x_at -1.27 6.94E-008
ACSF2 ADXCRPD.6142.C1_at -1.33 7.16E-008
MEGF6 ADXCRSS.Hs#S3733570_at 1.01 7.20E-008
TMEM56 ADXCRIH.578.C1_s_at -0.96 7.85E-008
MLPH ADXCRPD.1115.C1_s_at 1.54 8.13E-008
TFAP2A ADXCRAD_BM719170_at 1.28 8.46E-008
WDR18 ADXCRPD.470.C1_x_at -1.43 9.22E-008
CRIP1 ADXCRPD.5224.C1_x_at 1.17 9.32E-008
DUSP4 ADXCRAD_BM852899_at 1.23 1.00E-007
TIMM8AP ADXCRAD_AI092511_at 0.58 1.18E-007
CABLES1 ADXCRAD_BX108451_s_at 0.80 1.29E-007
PABPC1L ADXCRPD.4612.C1_s_at -1.25 1.43E-007
CBFA2T2 ADXCRAD_BE218916_s_at -0.84 2.37E-007
FLJ32063 ADXCRAD_BX119160_s_at -1.03 2.44E-007
SLC19A3 ADXCRAG_AF283317_s_at -0.77 2.93E-007
ASAP2 ADXCRIH.1590.C1_s_at 0.59 3.43E-007
RASGRF2 ADXCRPD.15964.C1_at 1.12 3.50E-007
ZNRF3 ADXCRAG_BC021570_at -0.88 3.82E-007
GPR143 ADXCRPD.3734.C1_s_at -0.80 4.05E-007
EPHA4 ADXCRAG_BC016981_s_at 0.52 4.17E-007
C20orf112 ADXCRPDRC.3468.C1_s_at -0.86 4.22E-007
PPP1R14D ADXCRPD.7272.C1_s_at -0.87 4.25E-007
SLC5A6 ADXCRAG_BC015631_s_at -1.05 4.30E-007
BST2 ADXCRIH.286.C1_s_at 0.82 4.49E-007
TM9SF4 ADXCRAG_BC022850_s_at -0.80 4.53E-007
NT5C3L ADXCRPD.10192.C1_s_at -0.90 4.62E-007
TRERF1 ADXCRAD_CV572327_x_at 0.67 4.75E-007
GNG4 ADXCRPD.1216.C1_s_at -0.91 4.75E-007
KLK7 ADXCRAG_NM_005046_s_at 1.64 4.80E-007
KLK11 ADXCRAD_NM_144947_s_at 0.91 4.92E-007
GCNT3 ADXCRPD.15673.C1_at 1.67 4.93E-007
PIPOX ADXCRAD_BM690859_at -1.55 4.97E-007
PLCB4 ADXCRPDRC.13802.C1_at -1.17 5.03E-007
CLDN2 ADXCRAG_NM_020384_at 1.78 5.38E-007
Further analyses, using some additional samples not included previously, resulted in the differentially expressed genes shown in Table 2.1.2.
Table 2.1.2
Gene Gene name Log-fold change Adjusted P-value
AQP5 Aquaporin 5 5.5 9.7E-21
CTSE cathepsin E 4.8 5.43E-11
SOX8 SRY (sex determining region Y)-box 8 2.6 0.000217
REG4 Regenerating islet-derived family, member 4 3.7 1.68E-07
PIWIL1 piwi-like 1 2.5 9.43E-05
AXIN2 Axin 2 -2.1 5.65E-06
CDX2 caudal type homeobox 2 -2.3 1.16E-10
HSF5 heat shock transcription factor family member 5 -3.1 5.79E-10
TFCP2L1 transcription factor CP2-like 1 -2.3 4.5E-08
GGH gamma-glutamyl hydrolase -3.0 8.54E-08
RNF43 ring finger protein 43 -2.2 1.35E-07
PTPRO protein tyrosine phosphatase, receptor type, O -3.2 5.65E-06
SPINK1 serine peptidase inhibitor, Kazal type 1 -2.9 9.73E-06
SATB2 SATB homeobox 2 -2.5 1.17E-05
DPEP1 -2.4 2.76E-05
TNNC2 -2.2 9.21E-05
PCLO -2.5 0.00017
Example 3: BRAF Signatures
3.1 Introduction
3.1.1 Top Scoring Pairs - and extensions
Let X be a data matrix with variables by columns (in the present case a gene expression matrix, with genes by columns and samples by rows). The top scoring pairs (TSPs) method (Geman et al., Stat. Appl. Genet. Mol. Biol. 3, Article 19 (2004)) seeks a pair of variables i, j such that $X_{k,i} < X_{k,j}$ for all samples k labeled as "+1" (positive class) and $X_{k,i} > X_{k,j}$ for all samples k labeled as "0". In a real-life situation no pair of variables provides a perfect classification, so the method ranks the pairs according to the proportion of erroneous predictions they make. The top ranking pairs are usually considered for making the predictions. However, practice shows that the top scoring pairs share many variables, so their predictions are correlated. To increase the prediction power of the classifier, we propose to combine several TSPs (or their decisions). To this end, we filter the list of TSPs in such a way that each variable (gene) appears only once in the list, on either side of the inequality, and then we average the values of these variables to produce a new prediction rule, as follows. From a list of TSPs of the form
$X_{\cdot,i_1} < X_{\cdot,i_2} \rightarrow$ class +1
$X_{\cdot,i_3} < X_{\cdot,i_4} \rightarrow$ class +1
we produce the rule
$\sum_{k\ \mathrm{odd}} X_{\cdot,i_k} < \sum_{l\ \mathrm{even}} X_{\cdot,i_l} \rightarrow$ class +1
(both sums have the same number of terms, so comparing the sums is equivalent to comparing the averages).
This rule is referred to as meta-TSP (mTSP) herein. The main advantage of using (m)TSP is its applicability across platforms and technologies. Its drawbacks are that it does not produce a score and that its performance is somewhat reduced, depending on the particular problem.
3.1.2 Compound Covariate Predictor - and extensions
Compound covariate predictor (CCP - Radmacher et al., J. Comput. Biol. 9:505-511 (2002)) is another simple classification rule that, in contrast with TSPs, builds a score which is used for making the final prediction. The score for the sample k has the form
$c_k = \sum_i t_i X_{k,i}$
where $t_i$ is some coefficient. In its original form, CCP uses the t-statistic to rank all variables (genes) and uses the corresponding statistic as the coefficient in the sum above. Only the top m variables are used in the sum, with m to be tuned via some cross-validation process, for example. For making a prediction, a threshold $c_0$ must be chosen. The simplest choice is to take
Figure imgf000048_0001
where $I_0$ and $I_{+1}$ are the index sets of the samples in class "0" and "+1", respectively. While this version works fairly well in practice, several extensions can be imagined to either improve robustness (for cross-platform applicability) or to improve the variable (gene) ranking/selection process. By convention, we divide the variables (genes) into two groups: those positively associated with class "+1" (they have a positive sign for the ranking statistics), and those negatively associated with class "+1". Then, the following extensions are immediate:
• instead of using a coefficient specific to each variable, just average all variables - either globally or within the groups of positively and negatively associated variables. The sign is, nevertheless, preserved.
• instead of using t-statistics to rank the genes, use a linear model of the form
$X_{\cdot,i} \sim$ class indicator + other covariates
and rank the genes by the p-values corresponding to the class indicator. Again, for the coefficient one can choose any of the options above. For example, in the case of BRAFm prediction, the linear model used was
$X_{\cdot,i} \sim$ BRAFm + MSI + site,
so the variable selection is done with adjustment for MSI status and tumor site.
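The original CCP described above (t-statistic coefficients restricted to the top m genes) can be sketched as follows. Two points are assumptions of this sketch rather than statements from the text: a pooled-variance two-sample t-statistic is used, and the threshold is taken as the midpoint between the mean scores of the two classes, since the threshold formula is only available as an image in the source. Function names and data are hypothetical.

```python
import numpy as np

def fit_ccp(X, y, m):
    """Fit a compound covariate predictor using the top m genes by |t|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X1, X0 = X[y == 1], X[y == 0]
    n1, n0 = len(X1), len(X0)
    # Pooled-variance two-sample t-statistic for every gene (assumed form).
    pooled = ((n1 - 1) * X1.var(axis=0, ddof=1)
              + (n0 - 1) * X0.var(axis=0, ddof=1)) / (n1 + n0 - 2)
    t = (X1.mean(axis=0) - X0.mean(axis=0)) / np.sqrt(pooled * (1 / n1 + 1 / n0))
    top = np.argsort(-np.abs(t))[:m]
    scores = X[:, top] @ t[top]
    # Assumed threshold: midpoint between the two class means of the score.
    c0 = 0.5 * (scores[y == 1].mean() + scores[y == 0].mean())
    return top, t[top], c0

def predict_ccp(model, X):
    """Predict class 1 when the compound score exceeds the threshold."""
    top, coef, c0 = model
    return (np.asarray(X, dtype=float)[:, top] @ coef > c0).astype(int)

# Toy data: gene 0 is higher in class 1, gene 1 is lower, gene 2 is noise.
X = [[5.0, 1.0, 3.0], [6.0, 2.0, 3.5], [5.5, 1.5, 2.5],
     [1.0, 5.0, 3.1], [2.0, 6.0, 2.9], [1.5, 5.5, 3.3]]
y = [1, 1, 1, 0, 0, 0]
model = fit_ccp(X, y, m=2)
pred = predict_ccp(model, X)
```

The sketch assumes every gene has non-zero variance within the classes; the value of m would normally be tuned by cross-validation, as noted above.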
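The version of CCP reported herein, called CCP2, uses the linear model for gene ranking (with adjustment for MSI status and tumor site) and takes the averages of positively and negatively associated genes separately:
$c_k = \frac{1}{n_+} \sum_{i \in \text{positively associated genes}} X_{k,i} - \frac{1}{n_-} \sum_{i \in \text{negatively associated genes}} X_{k,i}$
where $n_+$ and $n_-$ are the numbers of positively and negatively associated genes, respectively.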
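A minimal sketch of the CCP2 score follows, assuming the positively and negatively associated genes have already been selected (the linear-model ranking step is not reproduced). The function name, column indices and expression values are hypothetical.

```python
import numpy as np

def ccp2_score(X, pos_idx, neg_idx):
    """Mean of positively associated genes minus mean of negatively associated genes.

    X: samples-by-genes expression matrix; pos_idx and neg_idx are column
    indices of the two gene groups.
    """
    X = np.asarray(X, dtype=float)
    return X[:, pos_idx].mean(axis=1) - X[:, neg_idx].mean(axis=1)

# Toy matrix: 2 samples, 4 genes (illustrative numbers only).
X = [[8.0, 6.0, 2.0, 1.0],
     [3.0, 2.0, 7.0, 9.0]]
scores = ccp2_score(X, pos_idx=[0, 1], neg_idx=[2, 3])
```

Because the score is a difference of within-sample averages, it is less sensitive to a global shift in expression values than a weighted sum with gene-specific coefficients.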
3.1.3 AdaBoost method
Boosting refers to a general class of methods that produce accurate decision rules by combining rough base rules that are only slightly better than chance (weak learners). Boosting proceeds by repeatedly training the weak learners on different distributions over the training set. For a given sample, the final prediction is obtained by combining the predictions of the individual weak learners. Different combination approaches can be attempted, but usually a simple weighted majority voting scheme is adopted. Even though the early versions of the boosting algorithm provably converged to an improved classification rule (with respect to the performance of any of the weak learners), they suffered from serious practical drawbacks. The first practically usable version of boosting was AdaBoost, introduced in 1995 by Freund and Schapire. The version of AdaBoost used in developing the BRAF-gene signature fits a generalized linear model using the boosting algorithm based on univariate linear models as weak learners (Bühlmann and Yu, J. Amer. Stat. Assoc. 98:324-339 (2003)). This algorithm is implemented in the R package mboost available from http://stat.ethz.ch/CRAN/. There are a number of advantages in using AdaBoost, particularly the version mentioned above:
• the algorithm produces a sparse classifier - in the sense that the number of variables (genes in the present case) in the final model is small when compared with the initial dimensionality of the feature space;
• the classification rule is robust - this is a consequence of the fact that AdaBoost algorithms converge to a large margin classifier (maximize the separation between classes);
• by adopting univariate weak learners, AdaBoost will implicitly perform a variable selection as well (selecting those genes that contribute most to the discrimination between classes);
• AdaBoost is resistant to overfitting, meaning that there is a high probability that the training performance will be reproduced on other independent data sets.
One thing must be stressed: the model produced is minimalistic, in the sense that not all genes that could be included in the model are considered. Rather, the minimal set of genes that lead to a good classifier is selected. This means that other genes that are correlated with those in the model could also be considered. However, this strategy would not lead to an improved classification performance and the model would become redundant.
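To make the idea of boosting with univariate linear weak learners concrete, the following is a simplified sketch of componentwise boosting. It is not the mboost implementation: it uses squared-error loss for simplicity (the loss used for the actual signature is not stated in this section), and the function name, shrinkage value and toy data are assumptions.

```python
import numpy as np

def componentwise_boost(X, y, n_iter=100, nu=0.1):
    """Boost univariate least-squares learners; returns (intercept, coefficients).

    At each step every gene is fitted alone to the current residuals, the
    gene giving the largest reduction in squared error is selected, and
    only its coefficient is updated (shrunk by nu). Assumes no constant columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu
    ss = (Xc ** 2).sum(axis=0)       # per-gene sum of squares
    intercept = y.mean()
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        resid = y - (intercept + Xc @ coef)
        beta = Xc.T @ resid / ss     # univariate least-squares slopes
        gain = beta ** 2 * ss        # squared-error reduction per gene
        j = int(np.argmax(gain))
        coef[j] += nu * beta[j]
    return intercept - mu @ coef, coef  # intercept on the original scale

# Toy data: labels coded +1/-1; only gene 2 tracks the class.
X = np.array([[0, 1, 2.0, 0], [1, 1, 2.5, 0], [0, 0, 1.5, 1], [1, 0, 2.0, 1],
              [0, 1, -2.0, 1], [1, 1, -1.5, 1], [0, 0, -2.5, 0], [1, 0, -2.0, 0]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
b0, coef = componentwise_boost(X, y)
pred = np.sign(b0 + X @ coef)
```

Stopping after fewer iterations leaves more coefficients at exactly zero, which is how the implicit variable selection and sparsity mentioned above arise.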
3.2 Signatures
3.2.1 mTSP
The individual TSP predicts "BRAFm" when Gene1 < Gene2. From all individual TSPs with a score above 0.6, a number of TSPs were selected such that each gene appears only once. These unique TSPs are averaged (all Gene1 and all Gene2 values are averaged separately) and the final prediction is: if the average of all Gene1 values is less than the average of all Gene2 values, then predict BRAFm. Table 3.1.1 lists the pairs of genes that were obtained from the modeling set (all PETACC3 samples, pooled), as the final model.
Table 3.1.1 The 39 TSPs making up the meta-TSP, as obtained from the modeling set.
Gene1 Gene2
1 GGH LYPLA1
2 PTPRO AQP5
3 CCDC113 REG4
4 PPP1R14D CTSE
5 HSF5 LMO4
6 GRM8 DUSP4
7 TSPAN6 RBBP8
8 SATB2 RASSF6
9 TNNC2 CRIP1
10 MUC12 MLPH
11 CDX2 MALL
12 VAV3 TBC1D8
13 RNF43 S100A16
14 EPDR1 ANXA1
15 CELP LSM7
16 PIN4 LYZ
17 POFUT1 SPRED1
18 ACOX1 KIAA0802
19 SPINK1 PLK2
20 LDLRAD3 TRNP1
21 PPP1R14C CD55
22 PCTP SMCHD1
23 AMACR TPK1
24 GUCY2C AGR2
25 MYRIP PLLP
26 APCDD1 FSCN1
27 GNG4 TM4SF4
28 ARID3A RAB26
29 AK3L1 FUT8
30 CYP4F2 XKR9
31 IFT52 SOX13
32 TP53RK ANP32E
33 DPEP1 QSOX1
34 GPR160 FLJ23867
35 CCDC56 PKM2
36 FAM84A MEGF6
37 ZNF518B ASPHD2
38 SEMA5A MAP3K5
39 HEPH DPP4
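By way of illustration only, the meta-TSP decision rule described above can be sketched as follows. The sketch assumes that expression values are supplied as a mapping from gene symbol to a numeric value and that pairs with a missing gene are simply dropped; the function name `mtsp_predict`, the label "non-BRAFm" and the sample values are hypothetical.

```python
from statistics import mean

# First three pairs of Table 3.1.1 as (Gene1, Gene2); the full model uses all 39.
PAIRS = [("GGH", "LYPLA1"), ("PTPRO", "AQP5"), ("CCDC113", "REG4")]

def mtsp_predict(expression, pairs=PAIRS):
    """Predict "BRAFm" when the average of Gene1 is below the average of Gene2."""
    # Assumption: a pair is used only if both of its genes were measured.
    usable = [(g1, g2) for g1, g2 in pairs if g1 in expression and g2 in expression]
    if not usable:
        raise ValueError("no complete gene pair available")
    avg_gene1 = mean(expression[g1] for g1, _ in usable)
    avg_gene2 = mean(expression[g2] for _, g2 in usable)
    return "BRAFm" if avg_gene1 < avg_gene2 else "non-BRAFm"

# Hypothetical expression values, for illustration only.
sample = {"GGH": 5.1, "LYPLA1": 7.3, "PTPRO": 4.0,
          "AQP5": 6.2, "CCDC113": 5.5, "REG4": 8.0}
label = mtsp_predict(sample)
```

Because the rule only compares averages within a sample, no platform-specific threshold is needed.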
Further analyses, using some additional samples not included previously, resulted in a 32 gene pair meta-TSP signature as shown below in Table 3.1.2.
Table 3.1.2
Gene1 Gene2
1 C13orf18 CTSE
2 DDC AQP5
3 PPP1R14D REG4
4 HSF5 RSBN1L
5 SATB2 RASSF6
6 TNNC2 CRIP1
7 GGH PPPDE2
8 SPINK1 PLK2
9 PTPRO TM4SF4
10 ZSWIM1 MLPH
11 RNF43 RBM8A
12 CELP SOX8
13 CBFA2T2 PIWIL1
14 PTPRD LOC388199
15 CDX2 S100A16
16 TSPAN6 RBBP8
17 VAV3 OSBP2
18 CFTR KLK10
19 PHYH DUSP4
20 PLCB4 HOXD3
21 ZNF141 C11orf9
22 PPP1R14C CD55
23 FLJ32063 TRNP1
24 APCDD1 FSCN1
25 ACOX1 KIAA0802
26 C10orf99 PLLP
27 MIR142 IRX3
28 ARID3A SLC25A37
29 C20orf111 PIK3AP1
30 AMACR TPK1
31 AIFM3 ZIC2
32 CTTNBP2 SERPINB5
3.2.2 CCP2
As mentioned before, CCP2 takes the difference between the average of the genes positively associated with BRAFm and the average of the genes negatively associated with BRAFm, as determined from a linear model (see Example 3.1.2). One has to choose the number of genes to be included in the model. A sensitivity analysis on the modeling set was performed using phase 1 data as training and phase 2 as validation and then swapping the two data sets, and varying the number of selected genes from 10 to 300, in increments of 10. While the performance varied slightly with the number of selected genes, this variability remained limited. The AUC and error rates obtained are presented in Figures 1A, 1B, 2A and 2B. The final model contains 100 genes, which are provided below:
• positively associated genes: ABLIM3, ANXA1, ANXA10, AP1S3, AQP5, B3GALT2, B3GALT5, BST2, CABLES1, CD109, CRIP1, CTSE, DCBLD2, DNAH2, DPP4, DPYSL2, DUSP4, EPHA4, EPHB6, F5, GABRE, GDA, GPR126, HCRP1, HOXB2, INSM1, KIAA0802, KLK11, KLK6, KLK7, LYZ, MLPH, NT5E, PFKP, PIWIL1, PKM2, PLLP, PMAIP1, PON3, PRDM16, REG4, SERPINB5, SLC14A1, SLC1A1, SMCHD1, SOX13, SOX2, SOX8, SPRR1A, SPRR1B, STS, TFAP2A, TIMM8AP1, TM4SF4, TRERF1, TRNP1, UBASH3B, VNN1, XKR9
• negatively associated genes: AIFM3, AK3L1, AMACR, APCDD1, ARID3A, AXIN2, CCDC113, CDHR1, CDX2, DDC, DPEP1, EPDR1, FAM84A, GGH, GPR143, GPR160, GRM8, GUCY2C, H2AFY2, HSF5, INPP5D, INPP5J, LDLRAD3, MUC12, MUC3B, MYRIP, PARM1, PCLO, POFUT1, PPP1R14C, PPP1R14D, PTPRO, RNF43, SATB2, SEMA5A, SPINK1, SUPT4H1, TM9SF4, TSPAN6, VAV3, ZNF518B
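By way of illustration only, the CCP2 value can be sketched as the difference of two averages. The sketch uses three genes from each of the lists above rather than the full 100; the function names, the sample values and the threshold are hypothetical, and the direction of the thresholding (a value above the threshold predicts BRAFm) is an assumption, the threshold itself being tuned on the modeling set.

```python
from statistics import mean

# Small subsets of the two gene lists above; the full model uses all 100 genes.
POSITIVE = ["AQP5", "REG4", "CTSE"]
NEGATIVE = ["CDX2", "RNF43", "SATB2"]

def ccp2_score(expression, positive=POSITIVE, negative=NEGATIVE):
    """Average of positively associated genes minus average of negatively associated genes."""
    return mean(expression[g] for g in positive) - mean(expression[g] for g in negative)

def ccp2_label(score, threshold):
    """Assumption: scores above the (platform-dependent) threshold are called BRAFm."""
    return "BRAFm" if score > threshold else "non-BRAFm"

# Hypothetical expression values and threshold, for illustration only.
sample = {"AQP5": 7.0, "REG4": 8.0, "CTSE": 6.0,
          "CDX2": 5.0, "RNF43": 4.0, "SATB2": 6.0}
score = ccp2_score(sample)
label = ccp2_label(score, threshold=0.0)
```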
3.2.3 AdaBoost
The AdaBoost signature contains 29 genes which are combined through a weighted mean. Table 3.2 lists these genes and the corresponding coefficients.
Table 3.2 The AdaBoost model.
GeneID Symbol Coeff
1 6690 SPINK1 -0.1989
2 92211 CDHR1 -0.1559
3 83998 REG4 0.1191
4 362 AQP5 0.1131
5 6698 SPRR1A 0.0855
6 4885 NPTX2 -0.0852
7 30812 SOX8 0.0549
8 5473 PPBP -0.0538
9 2918 GRM8 -0.0528
10 412 STS 0.0489
11 145447 ABHD12B -0.0488
12 1811 SLC26A3 -0.0462
13 7546 ZIC2 0.0459
14 146336 LOC146336 0.0415
15 6699 SPRR1B 0.0413
16 4923 NTSR1 0.0340
17 55328 RNLS -0.0311
18 54749 EPDR1 -0.0250
19 57415 C3orf14 -0.0219
20 100134361 LOC100134361 -0.0182
21 5650 KLK7 0.0170
22 51268 PIPOX -0.0157
23 3223 HOXC6 0.0094
24 84959 UBASH3B 0.0081
25 5366 PMAIP1 0.0074
26 5800 PTPRO -0.0058
27 4291 MLF1 0.0039
28 1670 DEFA5 -0.0034
29 5968 REG1B 0.0024
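By way of illustration only, the weighted combination of genes can be sketched as a coefficient-weighted sum. The sketch uses only the five coefficients of largest magnitude from Table 3.2; the intercept, any normalization and the transformation of this value into a score between 0 and 1 are not given in the table and are therefore omitted. The function name `linear_score` is hypothetical.

```python
# Five largest-magnitude coefficients from Table 3.2 (the full model has 29 genes).
COEFFS = {
    "SPINK1": -0.1989,
    "CDHR1": -0.1559,
    "REG4": 0.1191,
    "AQP5": 0.1131,
    "SPRR1A": 0.0855,
}

def linear_score(expression, coeffs=COEFFS):
    """Coefficient-weighted sum of expression values (no intercept, no link function)."""
    return sum(weight * expression[gene] for gene, weight in coeffs.items())

# Hypothetical input in which every gene has expression 1.0.
flat_sample = {gene: 1.0 for gene in COEFFS}
value = linear_score(flat_sample)
```

Genes with positive coefficients push the value towards the BRAF mutant class and genes with negative coefficients push it away.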
3.3 Internal validation of signatures
The signature development process has been validated in two stages, using one data batch as a training/modeling set and the other one as an independent validation set. The 45 bridging samples were always considered in the training set (to keep the number of BRAFmut samples at a reasonable level), and their replicates have been removed from the validation set. On the training set, the performance of the classifier has been estimated by repeated (10 times) stratified 5-fold cross validation. The same performance parameters (area under the ROC curve - AUC, sensitivity, specificity and error rate) were measured on the validation sets. Table 3.3 lists these performance measures. The main criterion for judging the performance of the classifiers was the AUC as it is independent of the classifier threshold and of the prevalence of BRAF mutations. Note that this is only a subset of the full PETACC3 data set, which contains only BRAFmut and WT2, the KRASmut being discarded.
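By way of illustration only, the threshold independence of the AUC can be seen from a rank-based computation: the AUC is the proportion of (mutant, non-mutant) sample pairs in which the mutant sample receives the higher score. The function name `auc` and the scores below are hypothetical.

```python
def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores; label 1 stands for BRAFmut, 0 for WT2.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
base = auc(scores, labels)

# Shifting every score changes any fixed-threshold labels but not the ranking.
shifted = auc([s + 1.0 for s in scores], labels)
```

Only the ordering of the scores enters the computation, so no threshold needs to be chosen.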
Table 3.3: Estimated and validation performance of the BRAF classifiers. For the estimated parameters, the standard deviations of the estimates are given in parentheses. T - train, V - validation; Ph. 1 - phase 1 data, Ph. 2 - phase 2 data. The pooled estimates correspond to the results of repeated cross-validation on the pooled data.
Signature Train/Valid Row AUC Sensitivity Specificity Error rate
mTSP T: Ph. 1, V: Ph. 2 Estimated 0.869 88.40% (14.07) 85.36% (7.86) 14.12% (6.78)
mTSP T: Ph. 1, V: Ph. 2 Validation 0.948 100.0% 89.67% 9.69%
mTSP T: Ph. 2, V: Ph. 1 Estimated 0.890 88.40% (15.00) 89.69% (4.76) 10.44% (4.59)
mTSP T: Ph. 2, V: Ph. 1 Validation 0.892 92.86% 85.45% 13.71%
mTSP Pooled Estimated 0.901 92.50% (8.94) 87.77% (4.12) 11.74% (3.71)
Figure imgf000056_0001
3.4 External validation of signatures
Only the CCP2 and mTSP signatures can be expected to work on platforms other than those on which they were built. The AdaBoost signature is bound to the platform on which the model was produced. Although CCP2 requires a threshold to produce the labels (and this threshold is platform-dependent), it can still be applied on various other platforms, but its performance must be judged only by AUC (which is threshold-independent). The following points should be kept in mind when interpreting the results:
• the models were built to discriminate between BRAFmut and WT2; there were no KRASmut in the modeling set;
• KIM data set contains only KRASmut and BRAFmut; 2 out of 11 (18.18%) BRAFmut are not V600E mutants (as were those in the modeling set) and they are always classified as non-BRAFmut;
• CETUX data set originates from an Almac platform, the same as the one that generated the modeling set, which is why the AdaBoost classifier could be applied as well;
• CCP2 uses a threshold that is tuned on the modeling set; this threshold is not portable across platforms and that is why only AUC is given for this classifier;
• mTSP predictions were made with only 13 pairs (out of 39) due to missing genes on the KIM data set.
Table 3.4: External validation of the different models. The results marked with a star are approximations of the real AUC.
Figure imgf000057_0001
3.5 BRAF score distributions
The scores produced by the AdaBoost classifier can be interpreted as the a posteriori probability that a sample belongs to the category "BRAF mutants", so a score of at least 0.5 can be considered as predicting the "BRAF mutants" class or, as it will be called later on, "BRAF-like samples". While the models have been constructed without taking into account the KRAS mutants, they were applied to the whole population, including the KRAS mutants.
Figure 3 shows the distribution of BRAF scores as well as the scores for KRASmut (small hashes along the top) and BRAFmut (small hashes along the bottom) samples. Note that all the BRAFmut samples have a high BRAF score (≥ 0.5). Also, 96 of the 248 KRASmut samples have a high BRAF score (see Table 3.5 for details).
Table 3.5: AdaBoost stratification of BRAF scores by mutation status.
BRAF score BRAFmut KRASmut WT2
BRAF high 41 96 42
BRAF low 0 152 298
On the other hand, CCP2 does not produce a posteriori probabilities, but a real value that is to be thresholded to produce the final label. This real value (the difference between the average expression level of positively and negatively associated genes, respectively) can be used as a surrogate for a score. The distribution of these values is shown in Figure 4 along with the scores of the BRAF mutants and KRAS mutants.
There are two BRAF mutant samples that are misclassified by CCP2, one of them harboring a K601E mutation and the other one the 'classical' V600E mutation. See also Table 3.6 for details.
Table 3.6: CCP2 stratification of BRAF scores by mutation status.
Figure imgf000058_0001
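By way of illustration only, the AdaBoost counts of Table 3.5 can be converted into the fraction of each mutation group that receives a high BRAF score. The variable and function names are hypothetical; the counts are those of Table 3.5.

```python
# Counts from Table 3.5 (AdaBoost stratification of BRAF scores).
TABLE_3_5 = {
    "BRAFmut": {"high": 41, "low": 0},
    "KRASmut": {"high": 96, "low": 152},
    "WT2": {"high": 42, "low": 298},
}

def fraction_high(counts):
    """Fraction of a mutation group with a BRAF score of at least 0.5."""
    return counts["high"] / (counts["high"] + counts["low"])

fractions = {group: fraction_high(c) for group, c in TABLE_3_5.items()}
```

With these counts, all BRAFmut samples, roughly 39% of the KRASmut samples and roughly 12% of the WT2 samples are BRAF-like.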
3.6 Comparison of predictions by the three classifiers
While the performance parameters of the three classifiers are roughly equivalent, each of them has its own strengths and weaknesses. The Venn diagrams in Figure 5 show the overlap between the predictions (agreement of classifiers), for both the BRAF-like samples (those predicted to be BRAF mutants) and WT2-like samples (those predicted to be WT2). Note that the figures do not necessarily add up to those in the clinical table, because of the missing values (even if the BRAF/KRAS status is missing in the clinical table, the sample's status was predicted).
We can further stratify the samples by their BRAF/KRAS status to see each classifier's preferences. Table 3.7 shows such a stratification for the common predictions and classifier-specific predictions. For example, "intersection of all three" stands for the common predictions made by the three classifiers (the intersection of the three sets in Figure 5). Taking the row BRAF-like/intersection of all three as an example, one can see that out of the 126 samples that were predicted to be "BRAF-like" by all three classifiers (Figure 5), 25 are actually WT2, 36 are BRAF mutants and 56 are KRAS mutants, respectively (9 have missing values). Similarly, the row BRAF-like/mTSP shows that out of 14 samples that are predicted to be BRAF-like solely by mTSP, 4 are actually WT2 and 10 are KRAS mutants, and so forth.
Table 3.7: Stratification of the predictions made by the three classifiers: common and classifier-specific predictions.
Figure imgf000059_0002
Example 4: Survival Analysis
4.1 Gene-wise univariate analyses
In this section we simply assess the statistical significance of each of the genes selected in the signatures with respect to their capacity to model overall survival (OS), relapse-free survival (RFS) and survival after relapse (SAR). Cox proportional hazards models are employed and the p-values and hazard ratios are reported for each of the genes. The p-values are not adjusted for multiple testing. Each signature is analyzed separately and all its corresponding genes are listed, even though some genes appear in more than one signature.
4.1.1 mTSP signature
The univariate analyses for the 78 genes (39 pairs) in the mTSP signature are given in Table 4.1.
Table 4.1: Hazard ratios (HR) and p-values for the 78 genes in the mTSP signature.
Figure imgf000059_0001
OS RFS SAR
Gene HR p-value HR p-value HR p-value
AMACR 0.759074 0.000722 0.889180 0.077870 0.813820 0.002669
ANP32E 0.993829 0.953706 0.929135 0.422184 1.107437 0.248244
ANXA1 1.194339 0.019506 1.108505 0.118909 1.187781 0.008244
APCDD1 1.011976 0.814317 1.046970 0.289266 0.946524 0.223520
AQP5 1.065522 0.138348 0.979870 0.625296 1.133315 0.002084
ARID3A 0.976514 0.712791 1.032744 0.552308 0.847283 0.007934
ASPHD2 0.913907 0.457890 0.815255 0.051926 1.044158 0.701053
CCDC113 0.759774 0.000012 0.838217 0.001040 0.867007 0.008989
CCDC56 0.655098 0.000037 0.736718 0.000744 0.710177 0.000748
CD55 1.138029 0.047873 1.044060 0.432369 1.145611 0.027358
CDX2 0.839908 0.000669 0.934687 0.164696 0.778608 0.000001
CELP 0.934963 0.137288 1.011415 0.761537 0.882329 0.001031
CRIP1 1.081349 0.258736 1.014100 0.815023 1.175387 0.015704
CTSE 1.040962 0.198035 0.993135 0.804060 1.074130 0.011275
CYP4F2 0.970872 0.576600 1.040118 0.357507 0.867677 0.001681
DPEP1 0.963414 0.574187 1.078197 0.170482 0.894459 0.045753
DPP4 1.063181 0.138558 1.033895 0.346706 1.033349 0.335143
DUSP4 1.055373 0.383004 1.001065 0.983946 1.095415 0.095239
EPDR1 0.958408 0.505291 1.023628 0.667766 0.886263 0.046942
FAM84A 0.873787 0.084929 0.956789 0.511838 0.884573 0.068570
FLJ23867 1.037426 0.634071 1.116663 0.085763 0.968377 0.596619
FSCN1 1.067861 0.336202 0.996680 0.953727 1.220631 0.001411
FUT8 0.911963 0.273613 0.850633 0.023629 1.220007 0.011093
GGH 0.833575 0.000466 0.897208 0.013533 0.887023 0.007099
GNG4 0.965531 0.683367 1.070530 0.343121 0.893757 0.121575
GPR160 0.906843 0.128825 0.979221 0.702616 0.909414 0.105212
GRM8 0.849902 0.022081 0.914437 0.123485 0.813905 0.002349
GUCY2C 0.948688 0.385435 0.997317 0.959904 0.889945 0.037220
HEPH 0.907534 0.206448 0.985275 0.824159 0.853810 0.030337
HSF5 0.880988 0.006229 0.971788 0.490682 0.820950 0.000002
IFT52 0.869604 0.154256 0.940783 0.459983 0.849186 0.060800
KIAA0802 1.155620 0.112572 1.021792 0.784838 1.124320 0.145340
LDLRAD3 0.888478 0.123067 0.939502 0.338006 0.915010 0.225120
LMO4 1.189495 0.060034 1.055665 0.491373 1.243096 0.010835
LSM7 0.983847 0.882237 0.935364 0.478460 1.054322 0.604924
LYPLA1 0.870614 0.104542 0.940632 0.398505 0.913033 0.199888
LYZ 1.037584 0.373932 1.000349 0.992142 1.122158 0.003911
MALL 1.029591 0.710195 1.036597 0.588273 1.011614 0.878504
MAP3K5 1.003794 0.963268 0.936950 0.341428 1.186521 0.025234
MEGF6 0.996246 0.943752 0.972325 0.537979 1.063554 0.177216
MLPH 1.130788 0.017551 1.081469 0.075207 1.134625 0.006565
MUC12 0.877017 0.009283 0.934127 0.114455 0.882433 0.006190
MYRIP 0.854320 0.013111 0.965635 0.514853 0.809667 0.000303
PCTP 0.743664 0.021626 0.881145 0.238765 0.787532 0.031600
PIN4 0.972783 0.652297 1.076756 0.160011 0.863862 0.004064
PKM2 1.103719 0.480483 0.996628 0.977193 1.263841 0.065897
PLK2 1.104036 0.198052 1.056188 0.398619 1.196036 0.011191
PLLP 1.123741 0.245703 0.981086 0.831081 1.255523 0.006585
POFUT1 0.879065 0.089546 0.968652 0.612847 0.885235 0.061416
PPP1R14C 0.933526 0.045321 0.992239 0.797096 0.906457 0.002887
PPP1R14D 0.895082 0.218894 0.984250 0.836757 0.856302 0.033034
PTPRO 0.871313 0.000546 0.916871 0.009726 0.920625 0.022089
QSOX1 1.100855 0.308311 1.071244 0.390655 1.162947 0.062719
RAB26 1.036615 0.723135 1.027328 0.756324 1.036250 0.656771
RASSF6 1.204860 0.024594 1.122079 0.102738 1.182388 0.014732
RBBP8 0.973107 0.805185 0.827032 0.048632 1.219093 0.055071
REG4 1.048030 0.181121 0.993976 0.847632 1.098196 0.004958
RNF43 0.803797 0.000192 0.897388 0.042545 0.856799 0.001527
S100A16 1.011868 0.902990 0.999442 0.994562 1.090512 0.329960
SATB2 0.879355 0.001637 0.943131 0.107627 0.876323 0.000283
SEMA5A 0.885409 0.024470 0.955283 0.327092 0.881056 0.009810
SMCHD1 0.885614 0.490233 0.855678 0.292335 1.310511 0.089404
SOX13 1.126294 0.334717 1.070240 0.516693 1.258041 0.056515
SPINK1 0.936081 0.078244 0.976927 0.473389 0.932000 0.023642
SPRED1 1.059666 0.596464 0.994338 0.951386 1.237983 0.029843
TBC1D8 1.139950 0.228368 1.039978 0.671127 1.229105 0.037039
TM4SF4 1.189333 0.000259 1.104730 0.023585 1.126832 0.007163
TNNC2 0.955299 0.261048 1.030227 0.396237 0.884317 0.000464
TP53RK 0.866625 0.132189 1.015147 0.854809 0.806361 0.005695
TPK1 0.816910 0.180659 0.821094 0.122846 0.865062 0.305323
TRNP1 1.181406 0.004930 1.098287 0.068505 1.144291 0.016540
TSPAN6 0.848330 0.012430 0.971464 0.608883 0.816825 0.000158
VAV3 0.823752 0.008269 0.967883 0.589143 0.760513 0.000076
XKR9 1.095569 0.313464 0.948245 0.527779 1.158396 0.076050
ZNF518B 0.880308 0.184318 1.058794 0.467954 0.808457 0.006073
The predictive power of each individual pair of genes was also estimated by constructing a new set of variables of the form d = Gene2 - Gene1. The results are shown in Table 4.2.
Table 4.2: Hazard ratios (HR) and p-values for the 39 pairs in the mTSP signature. From each pair, a new variable is constructed as the difference between the two genes.
OS RFS SAR
Difference HR p-value HR p-value HR p-value
LYPLA1-GGH 1.2534459 0.0010314 1.1590868 0.0106207 1.2252445 0.0027639
AQP5-PTPRO 1.0919755 0.0007171 1.0361712 0.1211483 1.1402320 0.0000042
REG4-CCDC113 1.0837008 0.0025540 1.0314297 0.1897915 1.1139664 0.0001082
CTSE-PPP1R14D 1.0414463 0.1333025 0.9962422 0.8758005 1.1026721 0.0003238
LMO4-HSF5 1.1181697 0.0028084 1.0283001 0.3987142 1.2185076 0.0000002
DUSP4-GRM8 1.0842092 0.0470370 1.0332763 0.3408591 1.1507371 0.0013195
RBBP8-TSPAN6 1.1297166 0.0359132 0.9675698 0.5184169 1.3759971 0.0000003
RASSF6-SATB2 1.1523124 0.0000895 1.0740050 0.0261665 1.1919828 0.0000018
CRIP1-TNNC2 1.0465923 0.1582336 0.9843564 0.5704062 1.1659913 0.0000174
MLPH-MUC12 1.1202864 0.0007898 1.0680848 0.0230445 1.1387998 0.0003660
MALL-CDX2 1.1347794 0.0044204 1.0570062 0.1579294 1.2234772 0.0000058
TBC1D8-VAV3 1.1708764 0.0058146 1.0311399 0.5293330 1.3360842 0.0000028
S100A16-RNF43 1.1482330 0.0037658 1.0678105 0.1199634 1.1536007 0.0012956
ANXA1-EPDR1 1.1122661 0.0366551 1.0294241 0.5118602 1.2039764 0.0005607
LSM7-CELP 1.0560865 0.1877028 0.9809036 0.5768485 1.1566158 0.0007974
LYZ-PIN4 1.0332211 0.3357549 0.9779082 0.4393917 1.1341411 0.0003624
SPRED1-POFUT1 1.1248048 0.0737450 1.0220826 0.6914117 1.2298911 0.0016650
KIAA0802-ACOX1 1.2239129 0.0024448 1.0755311 0.1977104 1.2861612 0.0002181
PLK2-SPINK1 1.0709370 0.0367193 1.0284068 0.3237235 1.1133511 0.0008255
TRNP1-LDLRAD3 1.1411827 0.0027688 1.0755800 0.0544150 1.1574420 0.0022457
CD55-PPP1R14C 1.0777298 0.0089514 1.0149708 0.5580232 1.1332316 0.0000358
SMCHD1-PCTP 1.1605628 0.1425769 1.0272059 0.7539837 1.3997527 0.0023022
TPK1-AMACR 1.1577333 0.0254243 1.0431116 0.4407091 1.1938752 0.0084244
AGR2-GUCY2C 1.0032258 0.9382656 0.9729313 0.4459302 1.0694452 0.1289237
PLLP-MYRIP 1.1160892 0.0177857 1.0151075 0.7093447 1.1937869 0.0001954
FSCN1-APCDD1 1.0151403 0.7085034 0.9697070 0.3692627 1.1220012 0.0051660
TM4SF4-GNG4 1.1257204 0.0031938 1.0450995 0.2254462 1.1473762 0.0003651
RAB26-ARID3A 1.0232534 0.6438379 0.9871875 0.7606257 1.0977248 0.0634740
FUT8-AK3L1 1.0255514 0.6085967 0.9656173 0.4127725 1.1404701 0.0099008
XKR9-CYP4F2 1.0440546 0.3382601 0.9620422 0.2939290 1.1750613 0.0006332
SOX13-IFT52 1.1754513 0.0580465 1.0801871 0.2847481 1.2781237 0.0077750
ANP32E-TP53RK 1.0831204 0.2750790 0.9556197 0.4700689 1.2596667 0.0011774
QSOX1-DPEP1 1.0550599 0.3143480 0.9719515 0.5247499 1.1937187 0.0015984
FLJ23867-GPR160 1.0741366 0.1446725 1.0593436 0.1687958 1.0532819 0.2709038
PKM2-CCDC56 1.3787118 0.0002362 1.2224768 0.0086323 1.3727521 0.0002984
MEGF6-FAM84A 1.0399873 0.3704800 0.9954619 0.9031479 1.1035178 0.0266460
ASPHD2-ZNF518B 1.0408367 0.5722131 0.8993359 0.0824427 1.2408419 0.0023261
MAP3K5-SEMA5A 1.1167793 0.0320731 1.0133378 0.7654574 1.2751488 0.0000172
DPP4-HEPH 1.0817938 0.0429931 1.0335324 0.3249376 1.0938932 0.0174591
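By way of illustration only, the construction of the pair-difference variables d = Gene2 - Gene1 and the reading of a hazard ratio per unit of d can be sketched as follows. The function names and expression values are hypothetical; the hazard ratio used is the OS value reported for LYPLA1-GGH in Table 4.2, and the scaling rule assumes the usual log-linear form of a Cox proportional hazards model.

```python
def pair_differences(expression, pairs):
    """Return {"Gene2-Gene1": d} with d = Gene2 - Gene1 for each (Gene1, Gene2) pair."""
    return {f"{g2}-{g1}": expression[g2] - expression[g1] for g1, g2 in pairs}

def hazard_ratio_for_change(hr_per_unit, delta):
    """Hazard ratio implied by a Cox model for a change of delta units in the covariate."""
    return hr_per_unit ** delta

# Hypothetical expression values, for illustration only.
sample = {"GGH": 5.0, "LYPLA1": 7.5}
d = pair_differences(sample, [("GGH", "LYPLA1")])

# OS hazard ratio per unit of LYPLA1-GGH from Table 4.2, applied to a 2-unit change.
hr_two_units = hazard_ratio_for_change(1.2534459, 2.0)
```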
4.1.2 CCP2 signature
The univariate analyses for the 100 genes in the CCP2 signature are shown in Table 4.3.
Table 4.3: Hazard ratios (HR) and p-values for the 100 genes in the CCP2 signature.
OS RFS SAR
Gene HR p-value HR p-value HR p-value
ABLIM3 1.497490 0.000047 1.303227 0.002872 1.384103 0.000431
AIFM3 0.904398 0.015891 0.937492 0.065816 0.896340 0.005167
AK3L1 0.898902 0.107229 0.960577 0.483179 0.922714 0.179480
AMACR 0.759074 0.000722 0.889180 0.077870 0.813820 0.002669
ANXA1 1.194339 0.019506 1.108505 0.118909 1.187781 0.008244
ANXA10 1.134066 0.134587 0.974327 0.771200 1.465642 0.000033
AP1S3 1.113324 0.147471 1.037104 0.572433 1.124993 0.071436
APCDD1 1.011976 0.814317 1.046970 0.289266 0.946524 0.223520
AQP5 1.065522 0.138348 0.979870 0.625296 1.133315 0.002084
ARID3A 0.976514 0.712791 1.032744 0.552308 0.847283 0.007934
AXIN2 0.942664 0.307607 1.037246 0.462772 0.836303 0.001240
B3GALT2 1.011082 0.921719 1.084372 0.376361 0.925781 0.420204
B3GALT5 1.309501 0.003539 1.072206 0.412556 1.330909 0.000872
BST2 1.049824 0.618907 1.002713 0.974331 1.128929 0.152963
CABLES1 1.233657 0.047150 1.141701 0.145670 1.136693 0.118884
CCDC113 0.759774 0.000012 0.838217 0.001040 0.867007 0.008989
CD109 1.255953 0.004716 1.161989 0.032736 1.212516 0.004030
CDHR1 0.882343 0.015251 0.944431 0.172553 0.903573 0.015955
CDX2 0.839908 0.000669 0.934687 0.164696 0.778608 0.000001
CRIP1 1.081349 0.258736 1.014100 0.815023 1.175387 0.015704
CTSE 1.040962 0.198035 0.993135 0.804060 1.074130 0.011275
DCBLD2 1.377086 0.000231 1.192181 0.023270 1.226188 0.007232
DDC 0.873088 0.007056 0.914186 0.040853 0.912274 0.037127
DNAH2 1.121460 0.135174 1.059443 0.384370 1.137699 0.057982
DPEP1 0.963414 0.574187 1.078197 0.170482 0.894459 0.045753
DPP4 1.063181 0.138558 1.033895 0.346706 1.033349 0.335143
DPYSL2 1.305706 0.005342 1.206497 0.020454 1.156984 0.060817
DUSP4 1.055373 0.383004 1.001065 0.983946 1.095415 0.095239
EPDR1 0.958408 0.505291 1.023628 0.667766 0.886263 0.046942
EPHA4 1.349624 0.051575 1.282245 0.063399 1.160280 0.184862
EPHB6 1.196603 0.094872 1.213457 0.032779 1.055901 0.565234
F5 1.269111 0.002292 1.160289 0.034510 1.244789 0.001560
FAM84A 0.873787 0.084929 0.956789 0.511838 0.884573 0.068570
GABRE 1.196660 0.041797 1.139479 0.082907 1.132824 0.126430
GDA 1.100078 0.204991 1.010734 0.869262 1.154380 0.031994
GGH 0.833575 0.000466 0.897208 0.013533 0.887023 0.007099
GPR126 1.079345 0.452165 0.948483 0.564600 1.210875 0.064309
GPR143 0.830655 0.076539 0.964521 0.674131 0.739189 0.001583
GPR160 0.906843 0.128825 0.979221 0.702616 0.909414 0.105212
GRM8 0.849902 0.022081 0.914437 0.123485 0.813905 0.002349
GUCY2C 0.948688 0.385435 0.997317 0.959904 0.889945 0.037220
H2AFY2 0.794496 0.028053 0.917900 0.331243 0.939388 0.470710
HCRP1 1.444374 0.007168 1.145086 0.259057 1.279577 0.024342
HOXB2 1.276536 0.018571 1.116701 0.219808 1.234947 0.017446
HSF5 0.880988 0.006229 0.971788 0.490682 0.820950 0.000002
INPP5D 0.898273 0.250249 0.974600 0.752102 0.894546 0.158873
INPP5J 0.757369 0.000435 0.878427 0.050404 0.733601 0.000006
INSM1 1.090011 0.395273 0.938331 0.548180 1.284272 0.007707
KIAA0802 1.155620 0.112572 1.021792 0.784838 1.124320 0.145340
KLK11 1.175413 0.049403 1.014710 0.847447 1.277231 0.000735
KLK6 1.245111 0.000038 1.211314 0.000045 1.063122 0.169059
KLK7 1.166322 0.000290 1.116986 0.003296 1.132638 0.001365
LDLRAD3 0.888478 0.123067 0.939502 0.338006 0.915010 0.225120
LYZ 1.037584 0.373932 1.000349 0.992142 1.122158 0.003911
MLPH 1.130788 0.017551 1.081469 0.075207 1.134625 0.006565
MUC12 0.877017 0.009283 0.934127 0.114455 0.882433 0.006190
MUC3B 0.926527 0.179849 0.997926 0.967174 0.828397 0.001302
MYRIP 0.854320 0.013111 0.965635 0.514853 0.809667 0.000303
NT5E 0.991194 0.881716 0.960145 0.410964 1.134883 0.026849
PARM1 0.932707 0.201808 1.004359 0.928395 0.868928 0.015853
PCLO 0.940934 0.126817 0.953603 0.164105 0.950175 0.143515
PFKP 1.163223 0.147553 1.018043 0.842848 1.216293 0.051877
PIWIL1 1.051252 0.337307 1.040166 0.376955 1.017717 0.686420
PKM2 1.103719 0.480483 0.996628 0.977193 1.263841 0.065897
PLLP 1.123741 0.245703 0.981086 0.831081 1.255523 0.006585
PMAIP1 0.997173 0.979369 0.916793 0.367122 1.070145 0.451713
POFUT1 0.879065 0.089546 0.968652 0.612847 0.885235 0.061416
PON3 1.134636 0.224728 1.117593 0.216849 1.166422 0.060832
PPP1R14C 0.933526 0.045321 0.992239 0.797096 0.906457 0.002887
PPP1R14D 0.895082 0.218894 0.984250 0.836757 0.856302 0.033034
PRDM16 1.058510 0.746084 0.962358 0.798247 1.086322 0.577336
PTPRO 0.871313 0.000546 0.916871 0.009726 0.920625 0.022089
REG4 1.048030 0.181121 0.993976 0.847632 1.098196 0.004958
RNF43 0.803797 0.000192 0.897388 0.042545 0.856799 0.001527
SATB2 0.879355 0.001637 0.943131 0.107627 0.876323 0.000283
SEMA5A 0.885409 0.024470 0.955283 0.327092 0.881056 0.009810
SERPINB5 1.028221 0.720547 0.981598 0.780908 1.080777 0.324920
SLC14A1 1.084213 0.312425 1.055312 0.444103 1.051996 0.495986
SLC1A1 1.012414 0.905613 0.999092 0.991971 1.017134 0.834070
SMCHD1 0.885614 0.490233 0.855678 0.292335 1.310511 0.089404
SOX13 1.126294 0.334717 1.070240 0.516693 1.258041 0.056515
SOX2 1.141217 0.000045 1.079902 0.006649 1.112738 0.000222
SOX8 1.068053 0.119300 1.024612 0.517469 1.094492 0.013953
SPINK1 0.936081 0.078244 0.976927 0.473389 0.932000 0.023642
SPRR1A 1.000828 0.992821 0.941992 0.470270 1.175592 0.075792
SPRR1B 1.076602 0.280513 1.004812 0.939228 1.181609 0.016949
STS 1.065145 0.656022 0.910328 0.460252 1.332961 0.015319
SUPT4H1 0.765986 0.024948 0.844335 0.105817 0.806960 0.029800
TFAP2A 1.072181 0.229260 1.006607 0.895913 1.162769 0.009146
TIMM8AP1 1.316610 0.037070 1.135399 0.278472 1.179795 0.218924
TM4SF4 1.189333 0.000259 1.104730 0.023585 1.126832 0.007163
TM9SF4 0.799885 0.022205 0.882396 0.130886 0.841469 0.040131
TRERF1 1.152120 0.223466 0.993471 0.948759 1.430096 0.000356
TRNP1 1.181406 0.004930 1.098287 0.068505 1.144291 0.016540
TSPAN6 0.848330 0.012430 0.971464 0.608883 0.816825 0.000158
UBASH3B 1.076483 0.425824 0.984391 0.847303 1.290780 0.000973
VAV3 0.823752 0.008269 0.967883 0.589143 0.760513 0.000076
VNN1 1.303692 0.000575 1.069727 0.347998 1.416119 0.000001
XKR9 1.095569 0.313464 0.948245 0.527779 1.158396 0.076050
ZNF518B 0.880308 0.184318 1.058794 0.467954 0.808457 0.006073
4.1.3 AdaBoost signature
The univariate analyses for the 29 genes in the AdaBoost signature are provided in Table 4.4.
Table 4.4: Hazard ratios (HR) and p-values for the 29 genes in the AdaBoost signature.
OS RFS SAR
Gene HR p-value HR p-value HR p-value
ABHD12B 0.874514 0.072398 0.914256 0.139512 0.987078 0.836374
AQP5 1.065522 0.138348 0.979870 0.625296 1.133315 0.002084
C3orf14 0.934716 0.377755 1.009220 0.884522 0.939038 0.306504
CDHR1 0.882343 0.015251 0.944431 0.172553 0.903573 0.015955
DEFA5 0.961939 0.332530 0.949784 0.137190 1.010263 0.773808
EPDR1 0.958408 0.505291 1.023628 0.667766 0.886263 0.046942
GRM8 0.849902 0.022081 0.914437 0.123485 0.813905 0.002349
HOXC6 1.050021 0.245995 1.022963 0.528481 1.163585 0.000115
KLK7 1.166322 0.000290 1.116986 0.003296 1.132638 0.001365
LOC100134361 0.851882 0.018675 0.910384 0.102897 0.895754 0.066705
LOC146336 1.008758 0.905163 0.910496 0.190946 1.197185 0.006389
MLF1 1.102477 0.715277 1.027213 0.910235 0.888758 0.610615
NPTX2 0.969961 0.362040 0.999669 0.990423 0.958906 0.184604
NTSR1 1.252817 0.051561 1.057497 0.632747 1.174101 0.092968
PIPOX 0.859552 0.007045 0.929001 0.106368 0.859023 0.001155
PMAIP1 0.997173 0.979369 0.916793 0.367122 1.070145 0.451713
PPBP 0.914964 0.080723 0.905417 0.024926 0.948058 0.237915
PTPRO 0.871313 0.000546 0.916871 0.009726 0.920625 0.022089
REG1B 0.968052 0.324659 0.968596 0.252862 1.004730 0.866556
REG4 1.048030 0.181121 0.993976 0.847632 1.098196 0.004958
RNLS 0.977795 0.766893 1.004652 0.942227 0.899600 0.101564
SLC26A3 0.979071 0.530018 1.016791 0.558492 0.939665 0.024468
SOX8 1.068053 0.119300 1.024612 0.517469 1.094492 0.013953
SPINK1 0.936081 0.078244 0.976927 0.473389 0.932000 0.023642
SPRR1A 1.000828 0.992821 0.941992 0.470270 1.175592 0.075792
SPRR1B 1.076602 0.280513 1.004812 0.939228 1.181609 0.016949
STS 1.065145 0.656022 0.910328 0.460252 1.332961 0.015319
UBASH3B 1.076483 0.425824 0.984391 0.847303 1.290780 0.000973
ZIC2 0.958494 0.123379 0.953836 0.042714 1.032116 0.188028
4.2 mTSP: BRAFmut status vs. predicted BRAF-mut
4.2.1 PETACC3 samples
On the PETACC3 data, the following endpoints were considered: overall survival (OS), relapse-free survival (RFS) and survival after relapse (SAR). For each of the endpoints the BRAFmut status as predicted by mTSP is compared with the BRAFmut status given by PCR.
In Table 4.5, the results of the Cox proportional hazards model analysis are given for the predicted BRAFmut status and for the gold standard (BRAFmut by PCR).
Table 4.5: (PETACC3 data/mTSP) Hazard ratios (HR) and p-values for predicted and assessed BRAFmut status, produced by the Cox proportional hazards model.
Figure imgf000068_0001
The Kaplan-Meier curves for the same three endpoints are shown in Figure 6.
Note that the p-values given in the figures correspond to the likelihood ratio test for the differences between the two groups.
The effect of the stratification induced by the mTSP signature was also studied within the group of KRASmut samples, without taking into account the MSI status. While there is no statistically significant difference in survival experience for the OS and RFS endpoints at the 0.05 level, there is a significant difference for the SAR endpoint (p-value=0.04, HR=1.58); see Figure 7.
The interactions between MSI status and predicted BRAF mutation status were also studied within different subpopulations of the PETACC3 data set for all three endpoints. The results are given in Figures 8, 9, and 10.
Within the MSS subpopulation there is a statistically significant difference in survival experience between BRAF.hi and BRAF.lo groups (predicted by mTSP), for OS and SAR, in all stratifications. For the RFS endpoint, the only stratifications with significant differences are the whole population and all but BRAF mutants (see Figures 9A and 9B).
Within the MSI subpopulation there is no statistically significant difference in survival experience, in any stratification and for all endpoints.
Table 4.6: CETUX data/mTSP - Hazard ratios (HR) and p-values for predicted and assessed BRAFmut status, produced by the Cox proportional hazards model.
Figure imgf000069_0001
In the PETACC3 data set, within WT2 samples, there is no statistically significant difference in survival experience between samples classified as BRAF-like and BRAFmut-like, by the mTSP, for any of the three endpoints.
4.2.2 CETUX samples
All CETUX samples represent metastatic patients (stage IV) and two endpoints are considered: overall survival (OS) and progression-free survival (PFS). In Table 4.8, the results of the Cox proportional hazards model analyses are given for the predicted BRAFmut status and for the gold standard (BRAFmut by PCR). The Kaplan-Meier curves for the two endpoints (OS and PFS) are given in Figure 17. Note that the p-values given in the figures correspond to the likelihood ratio test for the differences between the two groups.
In this smaller data set, only two KRASmut samples are classified as BRAF-like by the mTSP, so any conclusion about the separation between BRAF-like and non-BRAF-like samples within the KRASmut group is speculative. Nevertheless, for the sake of completeness, we mention that there is a significant difference between the two groups for both OS and PFS (see Figure 12).
The test within WT2 samples cannot be performed in the CETUX data set because a single WT2 sample is misclassified, meaning that the BRAF-like group is too small.
4.3 CCP2: BRAFmut status vs. predicted BRAF-mut
4.3.1 PETACC3 samples
On the PETACC3 data, the following endpoints were considered: overall survival (OS), relapse-free survival (RFS) and survival after relapse (SAR). For each of the endpoints the BRAFmut status as predicted by CCP2 is compared with the BRAFmut status given by PCR.
In Table 4.7, the results of the Cox proportional hazards model analyses are given for the predicted BRAFmut status and for the gold standard (BRAFmut by PCR). The Kaplan-Meier curves for the same three endpoints are given in Figure 13.
Table 4.7: (PETACC3 data/CCP2) Hazard ratios (HR) and p-values for predicted and assessed BRAFmut status, produced by the Cox proportional hazards model.
Figure imgf000070_0001
Note that the p-values given in the figures correspond to the likelihood ratio test for the differences between the two groups.
The effect of the stratification induced by the CCP2 signature was also studied within the group of KRASmut samples, without taking the MSI status into account. There is no statistically significant difference in survival experience between the two groups defined by the CCP2 classifier for any of the three endpoints.
The interactions between MSI status and predicted BRAF mutation status were also studied within different subpopulations of the PETACC3 data set, for all three endpoints. The results are given in Figures 14, 15, and 16.
Within the MSS subpopulation there are statistically significant differences in survival experience between the BRAF.hi and BRAF.lo groups (as predicted by CCP2) for various combinations of endpoint and stratification:
• OS endpoint: all stratifications except within KRAS mutants show a significant difference between BRAF.lo and BRAF.hi
• RFS endpoint: the only significant differences are within the whole population and within all but BRAF mutants
• SAR endpoint: all stratifications except within KRAS mutants show a significant difference between BRAF.lo and BRAF.hi
Within the MSI subpopulation there is no statistically significant difference in survival experience, in any stratification and for all endpoints.
4.3.2 CETUX samples
All CETUX samples come from metastatic (stage IV) patients, and two endpoints are considered: overall survival (OS) and progression-free survival (PFS).
Table 4.8 gives the results of the Cox proportional hazards analyses for the predicted BRAFmut status and for the gold standard (BRAFmut by PCR). The Kaplan-Meier curves for the two endpoints (OS and PFS) are given in Figure 17. Note that the p-values given in the figures correspond to the likelihood ratio test for the difference between the two groups.
Table 4.8: (CETUX data/CCP2) Hazard ratios (HR) and p-values for predicted (by CCP2) and assessed BRAFmut status, produced by Cox proportional hazards model.
          mTSP prediction      BRAFmut status
Endpoint  HR     p-value       HR     p-value
OS        7.42   1.3e-05       9.43   1.1e-06
PFS       13.8   2.7e-07       5.80   2.0e-05
4.4 AdaBoost: BRAFmut status vs. predicted BRAFmut
4.4.1 Overall survival. Univariate analysis: BRAF score vs. BRAFhi vs. BRAFmut
The continuous BRAF score is predictive for overall survival (p-value=0.0045, HR=1.91). Its binarized version, BRAFhi, is also predictive for OS (p-value=0.0013, HR=1.62). In contrast, the BRAFmut variable is only marginally significant for OS prediction (p-value=0.059, HR=1.64). Figure 18 shows the KM curves for the subpopulations identified by the BRAFhi and BRAFmut indicator variables, in the whole patient population.
In a bivariate model including BRAFmut together with either the BRAF score or BRAFhi, the BRAF score and BRAFhi were always significant, with BRAFmut being redundant. Another way of assessing the predictive power of a variable/score is to use time-dependent ROC curves (Heagerty et al., Biometrics 56:337-344 (2000)). These are a generalization of the usual ROC curves and give an indication of the dichotomization power of the variable/score at a given time point. Nevertheless, the BRAF score and the BRAFhi indicator are always better than the BRAFmut status, and they also work in the WT2 and KRASmut subgroups.
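To make the idea of a time-dependent ROC concrete, the following is a minimal sketch in Python, not the estimator of Heagerty et al. It assumes a deliberately naive treatment of censoring: at a horizon t, subjects with an observed event at or before t are cases, subjects still under follow-up after t are controls, and subjects censored before t are simply dropped. The function name, the toy follow-up times and the toy scores are all hypothetical.

```python
def cumulative_dynamic_auc(times, events, scores, t):
    """Naive AUC of a risk score at time horizon t.

    Cases: subjects with an observed event at or before t.
    Controls: subjects whose follow-up extends beyond t.
    Subjects censored before t are dropped. This is a simplification:
    Heagerty et al. use survival estimates to account for them.
    """
    cases = [s for tm, ev, s in zip(times, events, scores) if ev and tm <= t]
    controls = [s for tm, s in zip(times, scores) if tm > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    # Probability that a random case scores higher than a random control,
    # counting ties as one half.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: follow-up in years, event flag, risk score.
times = [1.0, 2.0, 2.5, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.5, 0.1]
auc_3y = cumulative_dynamic_auc(times, events, scores, t=3.0)
```

Evaluating the function at several horizons gives one AUC per time point, which is how a statement such as "the AUC at 3 years" can be read. Dropping early-censored subjects can bias the estimate, so a real analysis should rely on an estimator that handles censoring properly.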
Multivariate models
Starting from a full model including all the variables (BRAFscore, age, grade, tstage, nstage, site, MSI, KRASm) and their pairwise interactions, automatic stepwise variable elimination (with the AIC criterion) led to the following model:
coxph(formula = Surv(os_time, os_event) ~ BRAF.score + age + grade + tstage + nstage + site + MSI + KRASm.any + BRAF.score:MSI + tstage:site + tstage:KRASm.any + BRAF.score:tstage + site:KRASm.any + age:tstage + nstage:site)
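The elimination procedure can be illustrated with a small, generic sketch in Python. It is an assumption-laden simplification and not the routine used to obtain the model above: it only drops terms (never re-adds them), it ignores the hierarchy between main effects and their interactions, and it relies on a hypothetical user-supplied function `aic_of` that returns the AIC (2k - 2 log L, with k the number of parameters) of the model fitted with a given set of terms.

```python
def backward_eliminate(terms, aic_of):
    """Greedy backward elimination driven by AIC.

    terms  : iterable of model term names
    aic_of : callable mapping a frozenset of terms to the AIC of the model
             fitted with exactly those terms (hypothetical, user supplied)
    """
    current = frozenset(terms)
    best = aic_of(current)
    while current:
        # Score every model obtained by dropping a single term.
        candidates = [(aic_of(current - {t}), t) for t in sorted(current)]
        aic, term = min(candidates)
        if aic >= best:  # no single deletion lowers the AIC: stop
            break
        current, best = current - {term}, aic
    return current, best


# Toy stand-in for a model fit: each term adds a fixed amount to the
# log-likelihood, so AIC = 2 * (number of terms) - 2 * (sum of gains).
gains = {"a": 5.0, "b": 0.3, "c": 0.1}

def toy_aic(term_set):
    return 2 * len(term_set) - 2 * sum(gains[t] for t in term_set)

selected, best_aic = backward_eliminate(gains, toy_aic)
```

In the toy example the terms "b" and "c" improve the log-likelihood by less than the penalty of one unit each, so both are dropped and only "a" is retained.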
To further reduce the model, this step was followed by manual variable selection: those variables (or interaction terms) with non-significant (at 0.05 level) p-values were removed and the models re-assessed. The final model was:
coxph(formula = Surv(os_time, os_event) ~ BRAF.score + grade + tstage + KRASm.any)
with the following HRs and p-values:
              coef    HR     p-value
BRAF.score    0.544   1.72   0.0270
gradeG-34     0.444   1.56   0.0470
tstageT3      0.744   2.10   0.0760
tstageT4      1.506   4.51   0.0005
KRASm.any     0.270   1.31   0.0230
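In a Cox proportional hazards model, the hazard ratio associated with a one-unit increase in a covariate is the exponential of its coefficient, so the HR column can be recomputed from the coef column. The short Python check below uses the coefficients reported above; the variable names are illustrative only.

```python
import math

# Coefficients as reported for the final overall-survival model.
coefs = {
    "BRAF.score": 0.544,
    "gradeG-34": 0.444,
    "tstageT3": 0.744,
    "tstageT4": 1.506,
    "KRASm.any": 0.270,
}

# Hazard ratio for a one-unit increase in each covariate: exp(coef).
hazard_ratios = {name: round(math.exp(b), 2) for name, b in coefs.items()}
```

The recomputed values agree with the reported HRs to two decimal places, which also serves as a consistency check on the transcription of the table.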
Interestingly, MSI status does not seem to be significant in a model with KRASmut and BRAF score, with or without interaction between BRAF score and MSI:
coxph(formula = Surv(os_time, os_event) ~ BRAF.score + grade + tstage + MSI + KRASm.any + BRAF.score*MSI)
                       coef     HR      p-value
BRAF.score             2.003    7.408   0.01100
gradeG-34              0.626    1.870   0.00660
tstageT3               0.832    2.298   0.04700
tstageT4               1.593    4.917   0.00023
MSIMSI-H               0.629    1.875   0.49000
MSIMSS                 1.126    3.083   0.03800
KRASm.any              0.318    1.374   0.02700
BRAF.score:MSIMSI-H   -2.260    0.104   0.08900
BRAF.score:MSIMSS     -1.276    0.279   0.12000
and
coxph(formula = Surv(os_time, os_event) ~ BRAF.score + grade + tstage + MSI + KRASm.any)
              coef     HR      p-value
BRAF.score    0.813    2.254   0.00180
gradeG-34     0.632    1.882   0.00600
tstageT3      0.821    2.273   0.05000
tstageT4      1.592    4.914   0.00024
MSIMSI-H     -0.690    0.502   0.12000
MSIMSS        0.491    1.633   0.09600
KRASm.any     0.306    1.357   0.03200
BRAFhi and MSI status within different subpopulations
• whole population: there is a clear difference in OS between BRAF-high and BRAF-low groups within MSS (p-value=0.00036), but not within MSI. Also, within BRAF-high there is a significant difference between MSI and MSS patients (p-value=0.016), but not within BRAF-low patients (see Figure 19A).
• all but BRAFmut: within MSS, BRAF-high has a worse prognosis than BRAF-low (p-value=0.0066); however, there is no difference between BRAF-high and BRAF-low within MSI. On the other hand, there is no significant difference between MSI and MSS within the BRAF-high or BRAF-low subpopulations (see Figure 19B).
• only BRAFmut and KRASmut: the only significant difference is between BRAF-low and BRAF-high within the MSI subpopulation (p-value=0.013) (see Figure 19C).
• only KRASmut: the only marginally significant difference (p-value=0.07) is between MSS and MSI in the BRAF-high group of KRASmuts (see Figure 19D).
• only WT2: there is no significant difference between the various subgroups (see Figure 19E).
4.4.2 Relapse-free survival. Univariate analysis: BRAF score vs. BRAFhi vs. BRAFmut
The continuous BRAF score is not predictive for RFS (p-value=0.36). However, its binarized version, BRAFhi, is marginally predictive for RFS (p-value=0.07, HR=1.27). The BRAFmut variable is not predictive for RFS either (p-value=0.63). Figure 20 shows the KM curves for the subpopulations identified by the BRAFhi and BRAFmut indicator variables, in the whole patient population.
4.4.3 Survival after relapse. Univariate analysis: BRAF score vs. BRAFhi vs. BRAFmut
The continuous BRAF score is strongly predictive for SAR (p-value=0.00000023, HR=3.56), as is its binarized version, BRAFhi (p-value=0.000053, HR=1.85). The BRAFmut variable is also significant for SAR prediction (p-value=0.00006, HR=2.91). Figure 21 shows the KM curves for the subpopulations identified by the BRAFhi and BRAFmut indicator variables, in the whole patient population. The AUCs at 3 years are better than in the case of OS, but they remain below 0.7.
Multivariate models
Starting from a full model including all the variables (BRAFscore, age, grade, tstage, nstage, site, MSI, KRASm) and their pairwise interactions, automatic stepwise variable elimination (with the AIC criterion) led to the following model:
coxph(formula = Surv(os_time - rfs_time, os_event) ~ BRAF.score + grade + tstage + site + MSI + tstage:site + BRAF.score:grade, data = D)
                        coef      HR      p-value
BRAF.score              1.0011    2.721   0.00044
gradeG-34              -0.5918    0.553   0.29000
tstageT3                0.6093    1.839   0.30000
tstageT4                0.6837    1.981   0.26000
siteright               0.9637    2.621   0.24000
MSIMSI-H               -1.1436    0.319   0.00530
MSIMSS                 -0.0277    0.973   0.91000
tstageT3:siteright     -0.8073    0.446   0.34000
tstageT4:siteright      0.4503    1.569   0.61000
BRAF.score:gradeG-34    2.0818    8.019   0.01400
To further reduce the model, this step was followed by manual variable selection: those variables (or interaction terms) with non-significant (at 0.05 level) p-values were removed and the models re-assessed. The final model was:
coxph(formula = Surv(os_time - rfs_time, os_event) ~ BRAF.score + grade + tstage + site + MSI, data = D)
              coef        HR      p-value
BRAF.score    1.23e+00    3.419   4.6e-06
gradeG-34     4.80e-01    1.615   3.3e-02
tstageT3      3.32e-01    1.393   4.3e-01
tstageT4      8.48e-01    2.336   5.2e-02
siteright     3.95e-01    1.485   1.7e-02
MSIMSI-H     -8.93e-01    0.409   2.7e-02
MSIMSS       -9.20e-06    1.000   1.0e+00
BRAFhi and MSI status within different subpopulations
• whole population: there is a clear difference in SAR between BRAF-high and BRAF-low groups within MSS (p-value=0.000331), but not within MSI (however, there are not many MSIs) (see Figure 22A).
• all but BRAFmut: there is a significant (p-value=0.015) difference between BRAF-high and BRAF-low patients. Also, within MSS, BRAF-high has a worse prognosis than BRAF-low (p-value=0.019); however, there is no difference between BRAF-high and BRAF-low within MSI. There is no significant difference between MSI and MSS within the BRAF-high or BRAF-low subpopulations (see Figure 22B).
• only BRAFmut and KRASmut: there is no significant difference between subgroups (see Figure 22C).
• only KRASmut: there is no significant difference between subgroups (see Figure 22D).
• only WT2: there are only a few MSIs, so most of the effect is due to MSSs. There is a marginally significant difference between BRAF-low and BRAF-high subgroups both within all WT2 and within MSS (p=0.04) (see Figure 22E).

Claims

1. A method of classifying a subject with CRC comprising: a. analyzing at least one of the gene pairs shown in Table 3.1.1 according to the top scoring pair method; and b. classifying the subject into a BRAF mutant-like group or a wild-type group.
2. The method according to claim 1, wherein at least 10 of the gene pairs shown in Table 3.1.1 are analyzed according to the top scoring pair method.
3. The method according to claim 1, wherein at least 30 of the gene pairs shown in Table 3.1.1 are analyzed according to the top scoring pair method.
4. The method according to claim 1, wherein the 39 gene pairs shown in Table 3.1.1 are analyzed according to the top scoring pair method.
5. The method of claim 1, wherein the top scoring pair method is carried out by comparing the average value of the relative expression levels of all Gene1 genes used in the analysis with the average value of relative expression levels of all Gene2 genes used in the analysis, wherein if the average Gene1 value is less than the average Gene2 value, then the subject is classified as BRAF mutant-like.
6. The method of claim 6, wherein if the average Gene1 value is greater than or equal to the average Gene2 value, then the subject is classified as wild-type.
7. The method of claim 5 or 6, wherein the analysis uses the 39 pairs of genes shown in Table 3.1.1.
8. A method for selecting therapy comprising the steps of any one of claims 1-7, and further comprising selecting adjuvant chemotherapy for a subject classified as wild-type, or selecting no adjuvant chemotherapy for a subject classified as BRAF mutant-like.
9. A method for selecting therapy comprising the steps of any one of claims 1-7, and further comprising selecting adjuvant chemotherapy for a subject classified as wild-type, or selecting a treatment regimen comprising a BRAF mutant-specific inhibitor for a subject classified as BRAF mutant-like.
10. A method of treating a subject with CRC comprising administering a BRAF mutant-specific inhibitor to said subject, wherein said subject is classified as BRAF mutant-like according to any of the methods of claims 1-7.
11. The method according to claim 8 or 9, wherein said subject is a human.
12. A CRC prognosticator comprising a mechanism for determining relative expression levels in a CRC tumor sample of the genes listed in Table 3.1.1.
13. The CRC prognosticator of claim 12, wherein the mechanism comprises a microarray.
14. The CRC prognosticator of claim 12, wherein the mechanism comprises an assay of reverse transcription polymerase chain reaction.
15. A kit for classifying a subject with CRC comprising detection agents capable of detecting the expression products of at least one gene pair shown in Table 3.1.1.
16. The kit of claim 15, further comprising an addressable array comprising probes for the expression products of the at least one gene pair.
17. The kit of claim 15, wherein the detection agents comprise primers capable of hybridizing to the expression products of the at least one gene pair.
18. The kit of claim 15, comprising detection agents capable of detecting the expression products of at least 10 gene pairs shown in Table 3.1.1.
19. The kit of claim 15, comprising detection agents capable of detecting the expression products of at least 20 gene pairs shown in Table 3.1.1.
20. The kit of claim 15, comprising detection agents capable of detecting the expression products of at least 30 gene pairs shown in Table 3.1.1.
21. The kit of claim 15, comprising detection agents capable of detecting the expression products of the 39 gene pairs shown in Table 3.1.1.
22. A kit according to claim 15, further comprising a computer implemented product for comparing a) the relative expression level values for Gene1 genes in Table 3.1.1 for a subject to b) the relative expression level values for Gene2 genes in Table 3.1.1 for said subject.
23. The kit according to claim 22, wherein the average value of the relative expression levels of all Gene1 genes used in the analysis is compared with the average value of relative expression levels of all Gene2 genes used in the analysis.
24. The kit according to claim 23, wherein the 39 gene pairs in Table 3.1.1 are used in the analysis.
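
The averaging rule recited in claims 5 and 6 can be sketched in a few lines of Python. This is an illustration only and forms no part of the claims: the function name, the gene identifiers and the expression values below are hypothetical placeholders, and the actual gene pairs are those of Table 3.1.1.

```python
from statistics import mean

def classify_by_gene_pairs(expression, gene_pairs):
    """Classify one subject from relative expression levels.

    expression : dict mapping gene name -> relative expression level
    gene_pairs : list of (gene1, gene2) tuples used in the analysis
    """
    avg_gene1 = mean(expression[g1] for g1, _ in gene_pairs)
    avg_gene2 = mean(expression[g2] for _, g2 in gene_pairs)
    # Average Gene1 below average Gene2 -> BRAF mutant-like;
    # greater than or equal -> wild-type.
    return "BRAF mutant-like" if avg_gene1 < avg_gene2 else "wild-type"


# Hypothetical placeholder genes and values, for illustration only.
pairs = [("geneA", "geneB"), ("geneC", "geneD")]
subject = {"geneA": 2.0, "geneB": 5.0, "geneC": 3.0, "geneD": 4.0}
label = classify_by_gene_pairs(subject, pairs)
```

Because the rule compares two averages from the same sample, it depends only on relative expression within that sample.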
PCT/IB2011/054962 2010-11-15 2011-11-07 Prognostic and predictive gene signature for colon cancer WO2012066451A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US41380610P 2010-11-15 2010-11-15
US61/413,806 2010-11-15
US201161470381P 2011-03-31 2011-03-31
US61/470,381 2011-03-31

Publications (1)

Publication Number Publication Date
WO2012066451A1 true WO2012066451A1 (en) 2012-05-24

Family

ID=45420693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2011/054962 WO2012066451A1 (en) 2010-11-15 2011-11-07 Prognostic and predictive gene signature for colon cancer

Country Status (1)

Country Link
WO (1) WO2012066451A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014062845A1 (en) * 2012-10-16 2014-04-24 University Of Utah Research Foundation Compositions and methods for detecting sessile serrated adenomas/polyps
WO2015175705A1 (en) * 2014-05-13 2015-11-19 Board Of Regents, The University Of Texas System Gene mutations and copy number alterations of egfr, kras and met
US9598731B2 (en) 2012-09-04 2017-03-21 Guardant Health, Inc. Systems and methods to detect rare mutations and copy number variation
US9902992B2 (en) 2012-09-04 2018-02-27 Guardant Helath, Inc. Systems and methods to detect rare mutations and copy number variation
US9920366B2 (en) 2013-12-28 2018-03-20 Guardant Health, Inc. Methods and systems for detecting genetic variants
WO2020056162A1 (en) * 2018-09-12 2020-03-19 Oregon Health & Science University Detecting and/or subtyping circulating hybrid cells that correlate with stage and survival
US10704086B2 (en) 2014-03-05 2020-07-07 Guardant Health, Inc. Systems and methods to detect rare mutations and copy number variation
CN111383716A (en) * 2020-03-20 2020-07-07 广州市妇女儿童医疗中心(广州市妇幼保健院、广州市儿童医院、广州市妇婴医院、广州市妇幼保健计划生育服务中心) Method and device for screening gene pairs, computer equipment and storage medium
WO2020197820A1 (en) * 2019-03-28 2020-10-01 Board Of Regents Of The University Of Texas System Computerized system and method for antigen-independent de novo prediction of cancer-associated tcr repertoire
WO2020225426A1 (en) * 2019-05-08 2020-11-12 Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts Colorectal cancer screening examination and early detection method
RU2740576C1 (en) * 2019-11-06 2021-01-15 федеральное государственное бюджетное учреждение "Национальный медицинский исследовательский центр онкологии" Министерства здравоохранения Российской Федерации Minimally invasive method for detecting sensitivity of rectal tumour to radiation therapy based on change in abundance of n2ax and rbbp8 genes
US11242569B2 (en) 2015-12-17 2022-02-08 Guardant Health, Inc. Methods to determine tumor gene copy number by analysis of cell-free DNA
WO2023282916A1 (en) 2021-07-09 2023-01-12 Guardant Health, Inc. Methods of detecting genomic rearrangements using cell free nucleic acids
US11913065B2 (en) 2012-09-04 2024-02-27 Guardent Health, Inc. Systems and methods to detect rare mutations and copy number variation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007084992A2 (en) * 2006-01-19 2007-07-26 The University Of Chicago Prognosis and therapy predictive markers and methods of use
WO2007100859A2 (en) * 2006-02-28 2007-09-07 Pfizer Products Inc. Gene predictors of response to metastatic colorectal chemotherapy
WO2010006225A1 (en) * 2008-07-11 2010-01-14 Novartis Ag Combination of (a) a phosphoinositide 3-kinase inhibitor and (b) a modulator of ras/raf/mek pathway

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AIK CHOON TAN ET AL: "Simple decision rules for classifying human cancers from gene expression profiles", BIOINFORMATICS, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 21, no. 20, 1 January 2005 (2005-01-01), pages 3896 - 3904, XP002545348, ISSN: 1367-4803, [retrieved on 20050816], DOI: 10.1093/BIOINFORMATICS/BTI631 *
J. J. ARCAROLI ET AL: "Gene Array and Fluorescence In situ Hybridization Biomarkers of Activity of Saracatinib (AZD0530), a Src Inhibitor, in a Preclinical Model of Colorectal Cancer", CLINICAL CANCER RESEARCH, vol. 16, no. 16, 15 August 2010 (2010-08-15), pages 4165 - 4177, XP055019523, ISSN: 1078-0432, DOI: 10.1158/1078-0432.CCR-10-0066 *
J. J. TENTLER ET AL: "Identification of Predictive Markers of Response to the MEK1/2 Inhibitor Selumetinib (AZD6244) in K-ras-Mutated Colorectal Cancer", MOLECULAR CANCER THERAPEUTICS, vol. 9, no. 12, 5 October 2010 (2010-10-05), pages 3351 - 3362, XP055019521, ISSN: 1535-7163, DOI: 10.1158/1535-7163.MCT-10-0376 *
T. M. PITTS ET AL: "Development of an Integrated Genomic Classifier for a Novel Agent in Colorectal Cancer: Approach to Individualized Therapy in Early Development", CLINICAL CANCER RESEARCH, vol. 16, no. 12, 15 June 2010 (2010-06-15), pages 3193 - 3204, XP055019519, ISSN: 1078-0432, DOI: 10.1158/1078-0432.CCR-09-3191 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11802529

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11802529

Country of ref document: EP

Kind code of ref document: A1