WO2013079188A1 - Methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer - Google Patents

Info

Publication number
WO2013079188A1
Authority
WO
WIPO (PCT)
Prior art keywords
genes
grade
gene
determining
subject
Application number
PCT/EP2012/004895
Other languages
French (fr)
Inventor
Sabrina Carpentier
Virgine FASOLO
Original Assignee
Ipsogen
Universite Libre De Bruxelles
Application filed by Ipsogen, Universite Libre De Bruxelles
Publication of WO2013079188A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C12Q2600/118: Prognosis of disease development
    • C12Q2600/158: Expression markers

Definitions

  • breast cancer is the most common cancer in women and a leading cause of cancer death worldwide.
  • the variability of breast cancers depends on morphological appearances, molecular features, behavior and response to therapy.
  • histological grade is based on the degree of differentiation of the tumor tissue and is determined by evaluating three parameters which are the frequency of cell mitosis (i.e. the rate of cell division), the tubule formation (i.e. the percentage of cancer composed of tubular structures), and the nuclear pleomorphism (i.e. the change in cell size and uniformity).
  • the histological grade is also a factor capable of indicating prognosis of said cancers and survival of patients by predicting tumor behavior: indeed low-grade (Grade 1) breast cancers tend to show a very good outcome whereas high-grade (Grade 3) breast cancers tend to recur and metastasize early following the diagnosis. Those indications are decisive for identifying the treatment which should be administered to the patient. Therefore, a patient having a Grade 3 breast cancer will benefit from a much more aggressive treatment than a patient with a Grade 1 breast cancer having a good prognosis.
  • histological grade for the determination of the prognosis and treatment of breast cancer therefore appears not to be sufficient in the case of Grade 2 breast cancer.
  • the histological grading system is considered to lack reproducibility, especially due to its dependence on tissue handling, fixation and preparation. Said lack of reproducibility may also be caused by variation between the practitioners performing the histological grading, despite the introduction of guidelines for standardization of pre-analytical parameters such as the preparation of the tissue (Rakha et al., Breast Cancer Research, 2010, vol. 12, pp:207).
  • GGI Genomic Grade Index
  • ER estrogen receptor
  • the 97-gene GGI enables classifying breast cancers into two classes, Genomic Grade 1 and Genomic Grade 3, with low and high risk of recurrence respectively, instead of the three grades 1, 2 and 3 of the histological grading system, with a low, intermediate and high risk of recurrence respectively. It can therefore safely spare adjuvant chemotherapy for breast cancer patients presenting an intermediate histological Grade 2 tumor that behaves like a Grade 1 breast cancer.
  • histological Grade 1 and Grade 3 breast cancers are distinct, whereas histological Grade 2 tumors have heterogeneous gene expression profiles ranging from those for histological Grade 1 to those for histological Grade 3.
  • PCR-GGI In order to improve and facilitate the clinical applicability of the known Genomic Grade Index, another test, the PCR-GGI, called herein the “4-gene PCR-GGI” or “Toussaint GGI-PCR”, has been developed by Toussaint et al. for transposing the 97-gene GGI onto a real-time quantitative Reverse Transcription Polymerase Chain Reaction (qRT-PCR) assay based on a reduced set of genes compared to the 97 genes of the GGI, said reduced set comprising 4 genes representative of the GGI and 4 reference genes.
  • qRT-PCR real-time quantitative Reverse Transcription Polymerase Chain Reaction
  • the “4-gene PCR-GGI” is capable of reproducing in a reasonably accurate and reproducible manner the grading and prognostic performance of the “97-gene GGI” for estrogen receptor (ER)-positive breast cancers using both frozen and paraffin-embedded (Formalin Fixed Paraffin Embedded, FFPE) tumor samples, said samples being more widely available than the fresh-frozen samples used in the “97-gene GGI”.
  • the “4-gene PCR-GGI” test also makes it possible to predict benefit from adjuvant tamoxifen treatment in early breast cancer patients or from first-line tamoxifen in advanced breast cancer patients (Toussaint et al., BMC Genomics, 2009, vol. 10, pp:424).
  • GGI of the invention a new alternative Genomic Grade Index test, herein called “GGI of the invention”, “new GGI” or “new GGI-PCR”, based on a minimal set of genes and that could recapitulate in an accurate and reproducible manner the grading, diagnosis and prognostic performance of the 97-gene GGI using both frozen and paraffin-embedded tumor samples, to facilitate its use in clinical practice for the diagnosis, grading of a solid tumor and prognosis of a subject suffering from cancer, preferably breast cancer.
  • the inventors have selected a set of 24 genes, three of which are in common with the 4-gene PCR-GGI, and demonstrated that new combinations of a reduced number of genes, from two genes to 24 genes selected from said set, enable the diagnosis, grading of a solid tumor and prognosis of a subject suffering from cancer, preferably breast cancer, with comparable performance or even better overall efficiency and practicability compared to the 4-gene and the 97-gene GGI-based methods.
  • the inventors have selected a set of 24 genes to meet some performance criteria.
  • the first performance criterion is a good correlation to the 97-gene GGI.
  • the feature selection consists of finding the combination of genes that best portrays the GGI. Initially, a stepwise linear model selection was performed, but the best combination was the full 97-gene set. A combination of 10 genes was then sought. To avoid selection bias, the bootstrap method was used: 100 selections were performed on datasets resampled with replacement. Genes were then ordered by selection frequency.
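The bootstrap step described above (repeated selection on resampled data, then ranking genes by how often they are selected) can be sketched as follows. This is only an illustration under stated assumptions: the per-resample selector here is a simple ranking by absolute Pearson correlation with the target score, swapped in for the stepwise linear model selection named in the text, and the function names, gene labels and parameters are hypothetical.

```python
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def bootstrap_selection_frequency(expr, target, k, n_boot=100, seed=0):
    """Rank genes by how often they are selected across bootstrap resamples.

    expr   : dict mapping gene name -> list of expression values (one per sample)
    target : list of target scores per sample (e.g. a reference GGI score)
    k      : number of genes kept in each resample
    NOTE: the selector (top-k by |correlation|) is an assumption for
    illustration, not the stepwise procedure of the source.
    """
    rng = random.Random(seed)
    n = len(target)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the samples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [target[i] for i in idx]
        scores = {g: abs(pearson([v[i] for i in idx], y)) for g, v in expr.items()}
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        counts.update(top)
    # Genes ordered by selection frequency, most frequent first.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

Genes that are picked in most resamples are the stable candidates; genes picked only occasionally are more likely to reflect sampling noise.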
  • the second performance criterion was a good prediction of Genomic Grade.
  • the feature selection consists of finding the combination of genes that best predicts the GG (Genomic Grade), a binary variable: GG1 or GG3. Different methodologies have been tested, including a Bayesian approach and a forward stepwise selection combined with a probit mixed model. Gene lists and scores have been compared between these different methods (intersection and differences).
  • the third performance criterion was a good prediction of histological grade.
  • the feature selection consists of finding the combination of genes that best predicts the HG (Histological Grade), a binary variable: HG1 or HG3. The same methodologies have been tested. Gene lists and scores have been compared between these methods (intersection and differences).
  • prognostic value (MFS at 5 years and RFS at 10 years).
  • Said prognostic value can be evaluated with two approaches: prognostic value at defined time, e.g. 5-year MFS or 10-year RFS, or instantaneous risk evaluation.
  • the first step was to censor data which did not have enough follow-up and to define categories: event before T years versus event after T years or no event.
  • the variable to explain was binary and the different techniques of feature selection described above were applied.
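As a minimal sketch of the fixed-horizon approach just described, the function below turns a follow-up time and an event flag into the binary variable for a horizon of T years. The function name, argument names and the use of `None` for subjects with insufficient follow-up are assumptions made for illustration.

```python
def binarize_outcome(time_years, event, horizon):
    """Binary outcome at a fixed horizon (e.g. 5-year MFS or 10-year RFS).

    time_years : follow-up time, or time to event, in years
    event      : True if the event (metastasis, relapse) was observed
    horizon    : T, in years
    Returns 1 for an event before T years, 0 for an event after T years
    or no event with at least T years of follow-up, and None when the
    follow-up is too short to decide (such subjects are excluded).
    """
    if event and time_years < horizon:
        return 1
    if time_years >= horizon:
        return 0
    return None  # no event observed, but follow-up shorter than T


# Example: three subjects evaluated at a 5-year horizon.
labels = [binarize_outcome(t, e, 5) for t, e in [(2.0, True), (7.5, True), (3.0, False)]]
```

The third subject is dropped rather than counted as event-free, since 3 years of follow-up says nothing about years 3 to 5.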
  • prognostic value has been evaluated using a Cox model (stratified by dataset and ER status), adjusted where necessary for age, tumor size, nodal involvement and HER2 status, with a forward stepwise algorithm.
  • Another improvement characterizing the new Genomic Grade Index test according to the invention is the use of a reduced number of reference genes for the normalization of the expression levels of the genes to be analyzed, especially compared to the 4-gene PCR-GGI from Toussaint et al.
  • the expression level of the genes can be normalized in order to adjust and improve the accuracy and reliability of expression levels of genes according to the invention.
  • the normalization can be carried out with one, two or up to three reference genes displaying uniform expression during various phases of development, across different tissue types, and under different environmental and experimental conditions.
  • the reference genes which can be used according to the invention are selected from the group comprising the genes GUS, TBP and RPLP0, whereas the 4-gene PCR-GGI of Toussaint et al. used a normalization with a set of 4 reference genes comprising the three reference genes previously cited and another reference gene, TFRC, which the inventors found to be correlated with the grade of breast cancer and to alter the results. Furthermore, the inventors normalized primer and probe variability using plasmids and standard curves.
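One common way to normalize qRT-PCR data against reference genes is to subtract each target gene's cycle threshold (Ct) from the mean Ct of the reference genes. The sketch below assumes that convention; the text does not state the exact normalization formula, and the plasmid standard-curve correction it mentions is not modelled. The function name and the example Ct values are hypothetical.

```python
from statistics import mean

# Reference genes named in the text; TFRC is deliberately left out.
REFERENCE_GENES = ("GUS", "TBP", "RPLP0")


def normalize_ct(ct, reference_genes=REFERENCE_GENES):
    """Return reference-normalized values for every non-reference gene.

    ct : dict mapping gene name -> raw Ct value for one sample
    Uses whichever of the reference genes are present (one to three).
    ASSUMPTION: value = mean(reference Ct) - target Ct, so that a
    higher value means higher relative expression.
    """
    refs = [ct[g] for g in reference_genes if g in ct]
    if not refs:
        raise ValueError("at least one reference gene is required")
    ref_mean = mean(refs)
    return {g: ref_mean - v for g, v in ct.items() if g not in reference_genes}


# Hypothetical Ct values for one sample.
sample = {"GUS": 24.0, "TBP": 26.0, "RPLP0": 22.0, "TROAP": 27.0}
normalized = normalize_ct(sample)
```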
  • the new reduced GGI according to the invention can be used either with qRT-PCR methods on fresh-frozen or paraffin-embedded (for example Formalin Fixed Paraffin Embedded, FFPE) samples, or with microarray methods on fresh-frozen samples, with accuracy and concordance comparable to or even better than the 97-gene GGI performance, as with the 4-gene PCR-GGI.
  • the new GGI according to the invention recapitulates in an accurate and reproducible manner the diagnosis, grading and prognostic power of GGI derived from micro-array and the GGI applied to qRT-PCR of Toussaint et al by using both fresh-frozen and paraffin-embedded tumor samples when applied to either a microarray analysis or a qRT-PCR analysis.
  • a Genomic Grade Index based on the determination of the expression level of a combination of genes selected in a group of 24 genes therefore supplies the practitioner with reduced signatures providing similar to better efficiencies for the diagnosis, grading and/or prognosis of breast cancer associated with costs and time reduction for performing the analysis compared to the molecular tools from the prior art.
  • the invention therefore relates to methods for the diagnosis, determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer, preferably breast cancer based on the analysis of the expression level of genes in a biological sample from a subject and the determination of said diagnosis, grading or prognosis by using algorithms as described hereunder.
  • Figure 3 Comparison of the 5-year MFS between a 3-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 patients.
  • Figure 4 Comparison of the 5-year MFS between a 3-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
  • Figure 5 Comparison of the 10-year MFS between a 3-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 patients.
  • Figure 6 Comparison of the 10-year MFS between a 3-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
  • Figure 7 Comparison of the 5-year MFS between a 6-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 patients.
  • Figure 8 Comparison of the 5-year MFS between a 6-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
  • Figure 9 Comparison of the 10-year MFS between a 6-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 patients.
  • Figure 10 Comparison of the 10-year MFS between a 6-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
  • Figure 11 Comparison of the 5-year MFS between a 24-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 patients.
  • Figure 12 Comparison of the 5-year MFS between a 24-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
  • Figure 13 Comparison of the 10-year MFS between a 24-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 patients.
  • Figure 14 Comparison of the 10-year MFS between a 24-gene signature according to the invention and the 4-gene PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
  • a subject in the context of the invention refers to a mammal, including but not limited to rodents, felines, primates, cows, horses and canines.
  • a subject according to the invention is a human.
  • a subject according to the invention is a human having a solid tumor cancer.
  • a subject according to the invention is a human having a breast cancer.
  • a subject according to the invention is a human having an estrogen-receptor (ER)-positive and node-negative breast cancer.
  • a subject can also be one who has not been previously diagnosed as having an estrogen-receptor (ER)-positive and node-negative breast cancer.
  • cancer and “cancerous” refer to or describe the physiological condition in mammals that is typically characterized by unregulated cell growth.
  • estrogen-receptor (ER)-positive breast cancer is used herein to refer to cancer that has receptors for the estrogen hormone.
  • node negative breast cancer is used herein to refer to cancer with a maximal number of three invaded lymph nodes.
  • a “biological sample” is a biological sample isolated from a subject and can include, by way of example and not limitation, a tissue sample, a fluid sample such as for example lymphatic fluid, ascites fluid, interstitial fluid, bone marrow, cerebrospinal fluid (CSF), saliva, mucous, sputum, sweat, urine, or any other secretion, excretion, or other bodily fluids, or a cell sample such as blood cells, endothelial cells, a blood sample referring to whole blood or any fraction thereof, including blood cells, serum and plasma and the like from said subject, preferably from the breast of said subject.
  • a biological sample is a tissue sample or a cell sample from the breast.
  • a biological sample according to the invention is a solid tumor biological sample.
  • a breast tumor biological sample is a breast tumor biopsy or a postoperative sample.
  • a breast tumor biological sample is a fresh, fresh-frozen or paraffin-embedded sample.
  • the sample can be a biopsy specimen (e.g., tumor, polyp, mass (solid, cell)), aspirate or smear sample or a blood sample.
  • the sample is a tissue from a breast that has a tumor (e.g., cancerous growth) and/or tumor cells.
  • a tumor biopsy can be obtained in an open biopsy, a procedure in which an entire (excisional biopsy) or partial (incisional biopsy) mass is removed from a target area.
  • a tumor sample can be obtained through a percutaneous biopsy, a procedure performed with a needle-like instrument through a small incision or puncture (with or without the aid of an imaging device) to obtain individual cells or clusters of cells (e.g., a fine needle aspiration (FNA)) or a core or fragment of tissues (core biopsy).
  • the biopsy samples can be examined cytologically (e.g., smear), histologically (e.g., frozen or paraffin section) or using any other suitable method (e.g., molecular diagnostic methods).
  • a tumor sample can also be obtained by in vitro harvest of cultured human cells derived from an individual's tissue.
  • Biological samples can, if desired, be stored before analysis by suitable storage means that preserve a sample's protein and/or nucleic acid in an analyzable condition, such as quick freezing, or a controlled freezing regime. If desired, freezing can be performed in the presence of a cryoprotectant, for example, dimethyl sulfoxide (DMSO), glycerol, or propanediol-sucrose.
  • Biological samples can also be fixed using a chemical fixative in order to be embedded in paraffin for further analysis. Fixatives which can be used for this purpose include, without limitation, formalin (i.e.
  • Biological samples can be pooled, as appropriate, before or after storage for purposes of analysis.
  • the term “tumor” refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.
  • the terms “at least two,” “at least three,” etc. in reference to the genes listed in any particular gene set means any one or any and all combinations of the genes listed.
  • the term “genes” refers to a polynucleotide sequence, e.g., isolated, such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The sequences of the genes may be the sequences as listed in Table A and Table B or any complement sequence.
  • This sequence may be the complete sequence of the gene, or a fragment of the gene which would also be suitable to perform the method of the analysis according to the invention.
  • a person skilled in the art may choose the position and length of the gene by applying routine experiments.
  • the term should also be understood to include, as equivalents, analogs of RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides.
  • ESTs, chromosomes, cDNAs, mRNAs and rRNAs are representative examples of molecules that may be referred to as nucleic acids.
  • DNA may be obtained from said nucleic acids sample and RNA may be obtained by transcription of said DNA.
  • mRNA may be isolated from said nucleic acids sample and cDNA may be obtained by reverse transcription of said mRNA.
  • Genes according to the invention can be selected from a group consisting of the 24 genes listed in Table A and the 3 reference genes listed in Table B.
  • CX3CR1: chemokine (C-X3-C motif) receptor 1; NM_001337 (SEQ ID N°21); NP_001328 (SEQ ID N°22)
  • RACGAP1: Rac GTPase activating protein 1; NM_001126103 (SEQ ID N°37); NP_001119575 (SEQ ID N°38)
  • TPT1: tumor protein, translationally-controlled 1; NM_003295 (SEQ ID N°39); NP_003286 (SEQ ID N°40)
  • TROAP: trophinin associated protein (tastin); NM_005480 (SEQ ID N°43); NP_005471 (SEQ ID N°44)
  • gene expression refers to the translation of information encoded in a gene into a gene product (e.g., RNA, protein). Expressed genes include genes that are transcribed into RNA that is subsequently translated into protein, as well as genes that are transcribed into non-coding functional RNA molecules that are not translated into protein (e.g., transfer RNA (tRNA), ribosomal RNA (rRNA), microRNA, ribozymes). Gene expression can be monitored by measuring the levels of either the entire RNA or protein products of the gene or fragments thereof. For the methods according to the invention, gene expression can be assessed in a biological sample from a subject.
  • tRNA transfer RNA
  • rRNA ribosomal RNA
  • Level of expression or “expression level” refers to the level (e.g. the amount) of one or more products (e.g., mRNA, protein) encoded by a given gene in a sample or reference standard.
  • differentially expressed gene refers to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically breast cancer, relative to its expression in a normal or control subject.
  • the terms also include genes whose expression is activated to a higher or lower level at different stages or different grades of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example.
  • Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disease, specifically breast cancer, or between various stages of the same disease.
  • Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages.
  • the term "over-expression" in reference to a gene occurs when the transcription and/or the translation of the gene leads to an expression level in a biological sample that is at least 10% superior to the level of expression of said gene in a control sample, preferably at least 50% superior to the level of expression of said gene in the control sample, and most preferably at least 100% superior to the level of expression of said gene in the control sample.
  • under-expression in reference to a gene occurs when the transcription and/or the translation of the gene leads to an expression level in a biological sample that is at least 10% inferior to the level of expression of said gene in a control sample, preferably at least 50% inferior to the level of expression of said gene in the control sample, and most preferably at least 100% inferior to the level of expression of said gene in the control sample.
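The over- and under-expression thresholds defined above amount to a relative-change test against the control level. A minimal sketch follows, assuming positive expression levels on a linear scale; the function name and the "unchanged" label are hypothetical, and the default threshold of 0.10 corresponds to the "at least 10%" criterion (0.50 and 1.00 give the preferred variants).

```python
def expression_status(sample_level, control_level, threshold=0.10):
    """Classify a gene as over-expressed, under-expressed or unchanged.

    sample_level  : expression level in the biological sample
    control_level : expression level in the control sample (must be > 0)
    threshold     : minimum relative difference, e.g. 0.10 for 10%
    """
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    change = (sample_level - control_level) / control_level
    if change >= threshold:
        return "over-expressed"
    if change <= -threshold:
        return "under-expressed"
    return "unchanged"
```

For example, a sample level of 1.5 against a control of 1.0 is a 50% increase, which satisfies both the 10% and the 50% criteria for over-expression.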
  • a "control" as used herein corresponds to one or more biological samples from a cell, a tissue sample or a biopsy from a tissue, preferably from the breast.
  • said control comprises non-tumoral cells, still preferably normal breast tissues.
  • Said control may be obtained from the same subject as the one to be tested or from another subject, preferably of the same species, or from a population of subjects, preferably of the same species, that may be the same as or different from the test subject.
  • said control may correspond to a biological sample from a cell line, a tissue sample or a biopsy from a solid tumor, preferably from breast cancer, and can be referred to as a reference sample.
  • the over- or under-expression of a specific gene can be validated by comparing the expression of said gene in the biological sample to that in one or more grade 1 and/or grade 3 reference samples. Said over- or under-expression is confirmed if the corresponding expression level of said gene differs by less than 50%, preferably less than 25% and most preferably less than 10% from the corresponding over- or under-expression level of said gene in said grade 1 or grade 3 reference sample.
  • the "grade" of a cancer is a system used to classify cancer cells. By indicating the aggressiveness of a tumor, it helps define the long-term prognosis and the treatment process.
  • the grade according to the invention is a genomic grade (GG), i.e. said grade is determined on the basis of the expression levels of the genes.
  • the grade determined according to the invention can be assigned as Grade 1 (GG1), i.e. a "low grade", or Grade 3 (GG3), i.e. a "high grade".
  • GG1 Genomic Grade 1
  • GG3 Genomic Grade 3
  • a Genomic Grade 1 is indicative of a "good-prognosis" whereas a Genomic Grade 3 is indicative of a "poor-prognosis".
  • the term "prognosis” relates to an individual assessment of the malignancy of a tumor, i.e. the prediction of the likelihood of cancer-attributable death or progression of a cancer including the risk of recurrence, metastatic spread or drug resistance, or to the expected survival rate of the subject such as the overall survival (OS), the disease free survival (DFS), the metastasis-free survival (MFS), the relapse-free survival (RFS) or the Distant Recurrence-Free Interval (DRFI) as defined in Hudis et al, Journal of Clinical Oncology, vol. 25, n°15, 2007.
  • OS overall survival
  • DFS disease free survival
  • MFS metastasis-free survival
  • RFS relapse-free survival
  • DRFI Distant Recurrence-Free Interval
  • DRFI refers to the time from random assignment or registration until invasive recurrence at a distant site, or death from breast cancer.
  • Metastasis refers to cancer cells that have spread from the original (i.e. primary) tumor to distant organs or distant lymph nodes.
  • a "relapse” refers to the development of a new breast tumor after the remission of the cancer, preferably the breast cancer.
  • a "high-risk” of recurrence means the subject is expected to have a cancer, preferably a breast cancer, relapse or metastasis in less than 10 years, preferably in less than 5 years.
  • a "low-risk” of recurrence means the subject is expected to have no cancer, preferably breast cancer, relapse or metastasis within 5 years, preferably within 10 years.
  • a "good-prognosis” according to the invention indicates that the patient afflicted with cancer, preferably breast cancer, is expected to have no distant metastases within 5 years, preferably 10 years, of initial diagnosis of cancer, i.e. a metastasis-free survival (MFS) or a relapse-free survival (RFS) superior to 5 years, preferably superior to 10 years.
  • a "good-prognosis” according to the invention corresponds to a Metastasis-Free survival (MFS) superior to 5 years, preferably 10 years, or a long-term survival.
  • a "poor-prognosis” according to the invention indicates that the patient afflicted with cancer, preferably breast cancer, is expected to have some distant metastases within 10 years, preferably within 5 years, of initial diagnosis of cancer, i.e. a metastasis-free survival (MFS) or a relapse-free survival (RFS) inferior to 10 years, preferably 5 years.
  • a “poor-prognosis” according to the invention corresponds to an MFS inferior to 10 years, preferably 5 years, or to the absence of long-term survival.
  • long-term survival is used herein to refer to survival for at least 5 years, preferably for at least 8 years, most preferably for at least 10 years following surgery or other treatment.
  • the terms "formula,” “classifier” and “model” are used interchangeably for any mathematical equation, algorithmic, analytical or programmed process, or statistical technique that takes one or more continuous or categorical inputs, also called parameter, explanatory variable or predictor characteristic, and calculates an output value, sometimes referred to as an index, an index value, a categorical response associated or not with a belonging probability and/or the predicted class of the sample.
  • PCA Principal Component Analysis
  • Logistic Regression
  • LDA Linear Discriminant Analysis
  • ELDA Eigengene Linear Discriminant Analysis
  • SVM Support Vector Machines
  • RF Random Forest
  • RPART Recursive Partitioning Tree
  • SC Shrunken Centroids
  • KNN Kth-Nearest Neighbor
  • Boosting, Decision Trees, Neural Networks, Bayesian Networks
  • Hidden Markov Models, Linear Regression or classification algorithms, Nonlinear Regression or classification algorithms, analysis of variance (ANOVA), hierarchical analysis or clustering algorithms; hierarchical algorithms using decision trees; kernel-based machine algorithms such as kernel partial least squares algorithms, kernel matching pursuit algorithms, kernel Fisher's discriminant analysis algorithms, or kernel principal components analysis algorithms, among others.
  • the resulting predictive models may be validated in other studies, or cross-validated.
  • a "GGI Formula” is a formula developed as described herein and used to calculate an output from inputs comprising the results from analysis of a biological sample comprising determining the expression level of genes as described herein.
  • a GGI Formula is the preferred means for calculating an output according to the invention.
  • the term “Agreement” or “concordance” is defined as the percentage of well predicted samples.
  • HG1 Histological Grade 1
  • GG1 Genomic Grade 1
  • agG1 The Histological Grade 1 (HG1)/Genomic Grade 1 (GG1) agreement (agG1) corresponds to the percentage of HG1 samples that are correctly identified as GG1. It is calculated as $\mathrm{agG1} = \frac{a_1}{a_1 + a_3} \times 100$,
  • wherein $a_1$ corresponds to the number of HG1 samples that are identified as GG1, i.e. samples that have been correctly classified; and wherein $a_3$ corresponds to the number of HG1 samples that are identified as GG3, i.e. samples that have been incorrectly classified.
  • the Histological Grade 3 (HG3)/Genomic Grade 3 (GG3) agreement (agG3) corresponds to the percentage of HG3 samples that are identified as GG3.
  • the agG3 is calculated as follows: $\mathrm{agG3} = \frac{b_3}{b_1 + b_3} \times 100$,
  • wherein $b_1$ corresponds to the number of HG3 samples that are identified as GG1, i.e. samples that have been incorrectly classified, and wherein $b_3$ corresponds to the number of HG3 samples that are identified as GG3, i.e. samples that have been correctly classified.
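  • As a minimal sketch of the agreement calculation defined above, the snippet below computes agG1 and agG3 from the four counts. The function name `agreement` and the counts are hypothetical and chosen only for illustration; they are not data from this disclosure.

```python
def agreement(n_correct: int, n_incorrect: int) -> float:
    """Percentage of samples of one histological grade assigned the matching genomic grade."""
    total = n_correct + n_incorrect
    if total == 0:
        raise ValueError("no samples of this histological grade")
    return 100.0 * n_correct / total


# Hypothetical counts, for illustration only.
a1, a3 = 45, 5   # HG1 samples identified as GG1 / as GG3
b1, b3 = 8, 72   # HG3 samples identified as GG1 / as GG3

ag_g1 = agreement(a1, a3)  # a1 / (a1 + a3) * 100
ag_g3 = agreement(b3, b1)  # b3 / (b1 + b3) * 100
print(f"agG1 = {ag_g1:.1f}%, agG3 = {ag_g3:.1f}%")
```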
  • Performance is a term that relates to the overall usefulness and quality of a diagnostic or prognostic test, including, among others, clinical and analytical accuracy, other analytical and process characteristics, such as use characteristics (e.g., stability, ease of use), health economic value, and relative costs of components of the test. Any of these factors may be the source of superior performance and thus usefulness of the test.
  • By "statistically significant" it is meant that the alteration is greater than what might be expected to happen by chance alone. Statistical significance can be determined by any method known in the art. Commonly used measures of significance include the p-value, which indicates the probability that an observation has arisen by chance alone.
  • "Clinical parameters" encompasses all non-sample or non-analyte indicators of subject health status or other characteristics, such as, without limitation, age (AGE), race or ethnicity (RACE), gender (SEX) and family history (FX).
  • DNA arrays consist of large numbers of DNA molecules or DNA fragments, herein designated “probes”, spotted in a systematic order on a solid support or substrate such as a nylon membrane, glass slide, glass beads or a silicon or ceramic chip.
  • polynucleotide refers to a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases.
  • a polynucleotide in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.
  • the polynucleotide sample isolated from the subject and obtained at step (a) is RNA, preferably mRNA. Said polynucleotide sample isolated from said subject can also correspond to cDNA obtained by Reverse Transcription of the mRNA, or a product of ligation after specific hybridization of specific probes to mRNA or cDNA.
  • the term "immobilized on a support" means bound directly or indirectly thereto, including attachment by covalent binding, hydrogen bonding, ionic interaction, hydrophobic interaction or otherwise.
  • the term "kit" refers to any delivery system for delivering materials.
  • delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., oligonucleotides, enzymes, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay etc.) from one location to another.
  • enclosures e.g., boxes
  • the term "fragmented kit" refers to delivery systems comprising two or more separate containers that each contain a subportion of the total kit components.
  • the containers may be delivered to the intended recipient together or separately.
  • a first container may contain an enzyme for use in an assay, while a second container contains oligonucleotides.
  • the term "fragmented kit" is intended to encompass kits containing Analyte Specific Reagents (ASRs) regulated under section 520(e) of the Federal Food, Drug, and Cosmetic Act, but is not limited thereto. Indeed, any delivery system comprising two or more separate containers that each contain a subportion of the total kit components is included in the term "fragmented kit."
  • a "combined kit" refers to a delivery system containing all of the components of a reaction assay in a single container (e.g., in a single box housing each of the desired components).
  • kit includes both fragmented and combined kits.
  • a “diagnostic system” is any system capable of carrying out the methods of the invention. Computing systems, environments, and/or configurations that may be suitable for use with the methods or system of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • a machine-readable storage medium can comprise a data storage material encoded with machine readable data or data arrays which, when using a machine programmed with instructions for using said data, is capable of use for a variety of purposes, such as, without limitation, subject information relating to breast cancer or in response to breast cancer drug therapies, drug discovery, and the like.
  • Measurements of the expression levels of genes of the invention and/or the resulting diagnosis or prognosis from those genes can be implemented in computer programs executing on programmable computers, comprising, inter alia, a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • Program code can be applied to input data to perform the functions described above and generate output information.
  • the output information can be applied to one or more output devices, according to methods known in the art.
  • the computer may be, for example, a personal computer, microcomputer, or workstation of conventional design.
  • Each program can be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. The language can be a compiled or interpreted language.
  • Each such computer program can be stored on a storage medium or device (e.g., ROM or magnetic diskette or others as defined elsewhere in this disclosure) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein.
  • the health-related data management system of the invention may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform various functions described herein. Expression levels of genes can then be determined and compared to a reference value, e.g. a control subject or population whose breast cancerous state is known or an index value or baseline value.
  • the reference sample or index value or baseline value may be taken or derived from one or more subjects who have been diagnosed with breast cancer, one or more subjects whose breast cancer has been histologically graded, whose prognosis has been determined and/or who have been exposed to a treatment.
  • the reference sample or index value or baseline value may be taken or derived from one or more subjects who have not been exposed to the treatment.
  • a reference value can also comprise a value derived from algorithms or computed indices from population studies such as those disclosed herein.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the methods and apparatus may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including memory storage devices.
  • the present invention provides methods for the diagnosis, determination of the grade of a solid tumor, and for the prognosis of a subject suffering from a cancer by using an algorithmic analysis of genes in a biological sample from the subject.
  • Algorithms are typically deterministic functions that map a multi-dimensional vector of biological measurements such as the expression level of genes to a binary (or n-ary) outcome variable that encodes the absence or existence of a clinically-relevant class, phenotype, distinct physiological state or distinct state of disease.
  • Such algorithms include any of a variety of statistical analyses used to determine relationships between variables.
  • the process of building or learning a classifier involves two steps: (1) selecting a family of functions that can approximate the system's response, and (2) using a finite sample of observations (training data) to select, from that family, the function that best approximates the system's response by minimizing the discrepancy or expected loss between the system's response and the function's predictions at any given point.
  • the combination of the different data can take place before or after feature selection.
  • the combined data is then used as input to train and validate the classifier.
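  • The two-step process of building a classifier described above can be sketched as follows. This is a toy illustration under stated assumptions: the family of functions is a set of one-dimensional threshold rules, the loss is the empirical 0-1 loss on the training data (a stand-in for the expected loss), and the data, `zero_one_loss` and `training` are hypothetical.

```python
def zero_one_loss(threshold, samples):
    """Fraction of samples misclassified by the rule 'score > threshold means class 1'."""
    errors = sum((score > threshold) != bool(label) for score, label in samples)
    return errors / len(samples)


# Hypothetical training data: (expression-derived score, class label 0 or 1).
training = [(0.2, 0), (0.5, 0), (0.9, 0), (1.1, 1), (1.4, 1), (0.8, 1)]

# Step 1: choose a family of functions, here one threshold rule per observed score.
candidates = sorted(score for score, _ in training)

# Step 2: select the member of the family with the lowest loss on the training data.
best_threshold = min(candidates, key=lambda t: zero_one_loss(t, training))
best_loss = zero_one_loss(best_threshold, training)
print(best_threshold, best_loss)
```

  • When several thresholds reach the same loss, `min` keeps the first one in sorted order; a real model selection would also check the chosen rule on data not used for training.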
  • PCA Principal Components Analysis
  • LogReg Logistic Regression
  • LDA Linear Discriminant Analysis
  • ELDA Eigengene Linear Discriminant Analysis
  • SVM Support Vector Machines
  • RF Random Forest
  • RPART Recursive Partitioning Tree
  • SC Shrunken Centroids
  • K-NN K-nearest neighbor classifiers
  • Boosting, Bagging, Decision Trees, Neural Networks, Bayesian Networks, and Hidden Markov Models, Linear Regression or classification algorithms, Nonlinear Regression or classification algorithms, analysis of variance (ANOVA), generalized partial least squares (GPLS), hierarchical analysis or clustering algorithms; hierarchical algorithms
  • model and formula types beyond those mentioned herein and in the definitions above are well known to one skilled in the art.
  • the actual model type or formula used may itself be selected from the field of potential models based on the performance and diagnostic accuracy characteristics of its results in a training population.
  • the specifics of the formula itself may commonly be derived from the histological grade results, the clinical parameters and/or the expression level of genes in the relevant training population.
  • such formula may be intended to map the feature space derived from the expression level of genes inputs to a set of subject classes (e.g. useful in predicting class membership of subjects as normal or subject having a breast cancer, etc), to derive an estimation of a probability function of risk using a Bayesian approach, or to estimate the class-conditional probabilities, then use Bayes' rule to produce the class probability function as in the previous case.
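  • The last option mentioned above, estimating class-conditional probabilities and then applying Bayes' rule, can be illustrated with a short sketch. The priors and likelihoods below are hypothetical numbers and `class_posteriors` is an illustrative helper, not part of the disclosed method.

```python
def class_posteriors(priors, likelihoods):
    """Bayes' rule: P(class | x) = P(x | class) * P(class) / sum over all classes."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under every class")
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical values, for illustration only.
priors = {"normal": 0.7, "cancer": 0.3}       # P(class)
likelihoods = {"normal": 0.1, "cancer": 0.6}  # P(observed expression profile | class)

posteriors = class_posteriors(priors, likelihoods)
predicted_class = max(posteriors, key=posteriors.get)
print(posteriors, predicted_class)
```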
  • Preferred formulas include the broad class of statistical classification algorithms, and in particular the use of discriminant analysis.
  • the goal of discriminant analysis is to predict class membership from a previously identified set of features.
  • LDA linear discriminant analysis
  • features can be identified for LDA using an eigengene based approach with different thresholds (ELDA) or a stepping algorithm based on a multivariate analysis of variance (MANOVA). Forward, backward, and stepwise algorithms can be performed that minimize the probability of no separation based on the Hotelling-Lawley statistic.
  • Eigengene-based Linear Discriminant Analysis is a feature selection technique developed by Shen et al. (2006). The formula selects features (e.g. the expression level of genes) in a multivariate framework using a modified eigen analysis to identify features associated with the most important eigenvectors. "Important" is defined as those eigenvectors that explain the most variance in the differences among the samples to be classified, relative to some threshold.
  • a support vector machine is a classification formula that attempts to find a hyperplane that separates two classes.
  • This hyperplane is defined by support vectors, i.e. the data points that lie exactly the margin distance away from the hyperplane.
  • the dimensionality is expanded greatly by projecting the data into larger dimensions by taking non-linear functions of the original variables (Venables and Ripley, 2002).
  • filtering of features for SVM often improves prediction.
  • Features e.g., expression level of genes
  • KW Kruskal-Wallis
  • Support vector machines are a set of related supervised learning techniques used for classification and regression and are described, e.g., in Cristianini et al., "An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods," Cambridge University Press (2000). Support vector machine analysis can be performed, e.g., using the SVMlight software developed by Thorsten Joachims (Cornell University) or using the LIBSVM software developed by Chih-Chung Chang and Chih-Jen Lin (National Taiwan University).
  • Random forests are learning statistical classifier systems that are constructed using an algorithm developed by Leo Breiman and Adele Cutler. Random forests use a large number of individual decision trees and decide the class by choosing the mode (i.e., most frequently occurring) of the classes as determined by the individual trees. Random forest analysis can be performed, e.g., using the RandomForests software available from Salford Systems (San Diego, CA). See, e.g., Breiman, Machine Learning, 45:5-32 (2001); and http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm, for a description of random forests.
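  • The voting step described above, in which the forest returns the mode of the classes predicted by the individual trees, can be sketched as follows. Only the vote is shown; tree construction is omitted, and `forest_vote` and the example votes are hypothetical.

```python
from collections import Counter


def forest_vote(tree_predictions):
    """Return the most frequently predicted class among the individual trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for one sample.
votes = ["GG3", "GG1", "GG3", "GG3", "GG1"]
predicted_class = forest_vote(votes)
print(predicted_class)
```

  • In this sketch a tie is resolved in favor of the class encountered first; using an odd number of trees avoids ties in the two-class case.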
  • Classification and regression trees represent a computer intensive alternative to fitting classical regression models and are typically used to determine the best possible model for a categorical or continuous response of interest based upon one or more predictors.
  • Classification and regression tree analysis can be performed, e.g., using the CART software available from Salford Systems or the Statistica data analysis software available from StatSoft, Inc. (Tulsa, OK).
  • a description of classification and regression trees is found, e.g., in Breiman et al. "Classification and Regression Trees," Chapman and Hall, New York (1984); and Steinberg et al, "CART: Tree-Structured Non-Parametric Data Analysis,” Salford Systems, San Diego, (1995).
  • the learning statistical classifier systems described herein can be trained using a cohort of samples from healthy individuals, cancer patients, cancer cell lines, and the like as training data containing instances labeled according to classes, e.g. HG1 and HG3 or healthy and diseased, and then tested on at least one test data set which includes novel instances not used for the training.
  • the training data can be obtained from a selected population of individuals where historical information is available regarding the histological grade of their breast cancers, the values of expression level of genes as described hereunder in the population and/or their clinical outcomes.
  • Samples from patients diagnosed by a physician, preferably an oncologist, as having cancer are suitable for use in training and testing the learning statistical classifier systems of the present invention.
  • Samples from healthy individuals can include those that were not identified as having cancer.
  • samples from cancer cell lines can be used in training and testing the learning statistical classifier systems described herein.
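  • The train-then-test scheme described above requires that test instances are not used for training. A minimal sketch of such a hold-out split is given below; the 25% test fraction, the fixed seed, and the sample labels are assumptions made for illustration only.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle labeled samples reproducibly and hold out a fraction for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled cohort: (sample identifier, histological grade).
labelled = [(f"sample_{i}", "HG1" if i % 2 == 0 else "HG3") for i in range(8)]
train, test = train_test_split(labelled)
print(len(train), len(test))
```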
  • Any formula may be used to combine results into indices herein called "output" useful in the practice of the invention.
  • An output from an algorithm of the invention can be a score, i.e.
  • an output from an algorithm of the invention can also be a status, such as the predicted class of the sample, for example the presence or absence of a breast cancer in said subject.
  • indices may indicate, among the various other indications, the probability, likelihood, prognosis, long-term survival, Metastasis-Free survival in the diagnosis of breast cancer, the diagnosis of the grade of a breast tumor or the prognosis of breast cancer.
  • An expected output is an index or index value, a probability and/or the predicted class of the sample.
  • the expression levels of genes according to the invention are obtained from one or more samples collected from the subject and used as input data (inputs into a Formula fitted to the actual historical data obtained from the selected population of individuals).
  • the numeric result of a classifier formula itself may be transformed post-processing by its reference to an actual clinical population and study results and observed endpoints, in order to calibrate to absolute risk and provide confidence intervals for varying numeric results of the classifier or formula.
  • An example of this is the presentation of absolute risk, and confidence intervals for that risk, derived using an actual clinical study, chosen with reference to the output of the recurrence score formula in the Oncotype Dx product of Genomic Health, Inc. (Redwood City, Calif.).
  • a further modification is to adjust for smaller sub-populations of the study based on the output of the classifier or risk formula and defined and selected by their Clinical Parameters, such as age or sex.
  • the output of the invention is calculated automatically.
  • the output of the invention can be calculated by a computer, a calculator, a programmable calculator, or any other device capable of computing, and can be communicated to the individual by a health care practitioner, including, but not limited to, a physician, nurse, nurse practitioner, pharmacist, pharmacist's assistant, physician's assistant, laboratory technician, or by an organization such as a health maintenance organization, a hospital, a clinic, an insurance company, a health care company, or a national, federal, state, provincial, municipal, or local health care agency or health care system, or automatically, for example, by a computer, microprocessor, or dedicated device for delivering such advice.
  • the algorithms of the present invention can use a quantile measurement of a particular profile, i.e. the expression level of genes according to the invention, within a given population as a variable.
  • Quantiles are a set of "cut points" that divide a sample of data into groups containing (as far as possible) equal numbers of observations. For example, quartiles are values that divide a sample of data into four groups containing (as far as possible) equal numbers of observations. The lower quartile is the data value a quarter way up through the ordered data set; the upper quartile is the data value a quarter way down through the ordered data set.
  • Quintiles are values that divide a sample of data into five groups containing (as far as possible) equal numbers of observations.
  • the present invention can also include the use of percentile ranges of profiles (e.g., tertiles, quartiles, quintiles, etc.), or their cumulative indices (e.g., quartile sums of profiles, etc.) as variables in the algorithms (just as with continuous variables).
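  • As a sketch of how a quantile measurement could be used as a variable, the snippet below derives quartile cut points from a reference population with Python's standard `statistics.quantiles` and maps a new value to its quartile group. The population values and the helper `quartile_group` are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical expression values of one gene across a reference population.
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Three cut points divide the ordered data into four groups (quartiles).
cut_points = quantiles(population, n=4)


def quartile_group(value, cuts):
    """Return the group (1 to 4) a value falls into, given the three cut points."""
    return bisect_right(cuts, value) + 1


group = quartile_group(7.5, cut_points)
print(cut_points, group)
```

  • Passing `n=3` or `n=5` to `quantiles` gives tertile or quintile cut points in the same way.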
  • cut-off values can be determined and independently adjusted for each of a number of genes to observe the effects of the adjustments on clinical parameters.
  • Design of Experiments (DOE) methodology can be used to simultaneously vary the cut-off values and to determine the effects on the resulting clinical parameters.
  • DOE methodology is advantageous in that variables are tested in a nested array requiring fewer runs and cooperative interactions among the cut-off variables can be identified.
  • Optimization software such as DOE Keep It Simple Statistically (KISS) can be obtained from Air Academy Associates (Colorado Springs, CO) and can be used to assign experimental runs and perform the simultaneous equation calculations. Using the DOE KISS program, an optimized set of cut-off values for a given clinical parameter and a given set of biomarkers can be calculated.
  • ECHIP optimization software available from ECHIP, Inc. (Hockessin, DE), and Statgraphics optimization software, available from STSC, Inc. (Rockville, MD), are also useful for determining cut-off values for a given set of genes.
  • cut-off values can be determined using Receiver Operating Characteristic (ROC) curves and adjusted to achieve the desired clinical parameter values.
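  • A minimal sketch of choosing a cut-off from ROC points is shown below. The selection criterion used here, Youden's J (sensitivity + specificity - 1), is an assumption made for illustration; the text only states that the cut-off is adjusted to reach the desired clinical parameter values. Scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for each distinct score used as cut-off.

    A sample is called positive when its score is greater than or equal to the cut-off.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


# Hypothetical classifier scores and true classes (1 = positive, 0 = negative).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]

points = roc_points(scores, labels)
# Assumed criterion: maximize Youden's J = sensitivity + specificity - 1.
best_cutoff, sensitivity, specificity = max(points, key=lambda p: p[1] + p[2] - 1)
print(best_cutoff, sensitivity, specificity)
```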
  • any of the aforementioned Clinical Parameters may be used in the practice of the invention as an input to a formula or as a pre-selection criteria defining a relevant population to be measured using a particular formula.
  • Clinical Parameters may also be useful in gene normalization and pre-processing, in formula type selection and derivation, and in formula result post-processing.
  • One embodiment of the invention is to tailor formulas to the population and endpoint or use that is intended.
  • the breast cancer endpoints of the invention include, among others, the Overall Survival (OS), the Recurrence-Free Survival (RFS), the Distant Relapse-Free Survival (DRFS), the Metastasis-Free Survival (MFS) and the Distant Recurrence-Free Interval (DRFI), as defined by Hudis CA et al., J Clin Oncol. 2007 May 20;25(15):2127-32.
  • the genes and formulas may be used for assessment of subjects for primary prevention and diagnosis and for secondary prevention and management.
  • the genes and formulas may be used for prediction and risk stratification for conditions and for the diagnosis of breast cancer.
  • the genes and formulas may be used for prognosis of breast cancer.
  • the genes and formulas may be used for clinical decision support, such as determining whether to defer intervention to next visit, to recommend normal preventive check-ups, to recommend increased visit frequency, to recommend increased testing and to recommend therapeutic intervention.
  • the genes and formulas may also be useful for intervention in subjects with breast cancer, such as therapeutic selection and response, adjustment and dosing of therapy, monitoring of ongoing therapeutic efficacy and indication for change in therapeutic intervention.
  • a biological sample can be provided from a non-treated subject or from a subject undergoing treatment regimens or therapeutic interventions, e.g., drug treatments, for breast cancer.
  • treatment regimens or therapeutic interventions can include, but are not limited to, surgical intervention, administration of pharmaceuticals, and treatment with therapeutics or prophylactics used in subjects diagnosed with breast cancer.
  • biological samples are obtained from the subject at various time points before, during, or after treatment.
  • a test sample from the subject can also be exposed to a therapeutic agent or a drug, and the expression level of genes can be determined.
  • the expression level of genes can be compared to samples derived from the subject before and after treatment or exposure to a therapeutic agent or a drug, or can be compared to samples derived from one or more subjects who have shown improvements as a result of such treatment or exposure.
  • the invention provides improved diagnosis, determination of the grade of a solid tumor, and prognosis of a subject suffering from cancer by measuring the expression level of genes according to the invention and utilizing mathematical algorithms, classifiers or formulas in order to combine information from results into a single output enabling such a diagnosis or prognosis.
  • a first aspect of the invention concerns a method for determining the GGI score of a solid tumor in a subject having a cancer, said method comprising a step a) of analyzing a biological sample from said subject by determining the expression level of a combination of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20, and a further step b) of determining the GGI score of said solid tumor from said subject suffering from cancer, wherein a GGI Formula is executed based on inputs comprising the expression level of said genes from said subject as determined in step a) and wherein said GGI Formula is:
  • Xi is the expression level of the i-th gene
  • Another aspect of the invention concerns a method for the diagnosis of a cancer, said method comprising a step a) of analyzing a biological sample from said subject by determining the expression level of a combination of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20, and a step b) of determining the diagnosis of said subject on the basis of the expression level of said genes as determined in step a).
  • Another aspect of the invention concerns a method for determining the genomic grade of a solid tumor in a subject suffering from cancer, said method comprising the following steps :
  • iii) an overexpression of a gene selected in the group consisting of ASPM, AURKA, BIRC5, CCNA2, CCNB2, CDC2, CDC20, CDCA3, CENPA, CEP55, KIF11, KPNA2, MCM10, MELK, PTTG1, RACGAP1, TPX2, TROAP, TUBA1B, UBE2C is associated with a Genomic Grade 3.
  • a combination of at least 2 genes up to 24 genes corresponds to a combination of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 or 24 genes selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20.
  • the genomic grade is determined on the basis of an output from an algorithm, said algorithm being executed on the basis of inputs comprising the expression level of said genes from said subject as determined in step a).
  • said algorithm is selected in the group comprising but not limited to Decision learning trees (CART, Recursive partitioning tree/RPART), hierarchical clustering, Random Forest (RF).
  • said output is a Genomic Grade Index (GGI) score indicating the genomic grade from said tumor, and wherein said GGI score is calculated from the following GGI formula :
  • Xi is the expression level of the i-th gene
  • ai is the coefficient assigned to the i-th gene
  • the coefficient ai, i.e. the coefficient assigned to the i-th gene, makes it possible to balance the expression of the genes.
  • the coefficient "b" is used to adjust the cut-off separating GG1 from GG3.
  • it makes it possible to set the cut-off at the value 0, so that samples with a GGI value below 0 are considered GG1 and samples with a GGI value above 0 are considered GG3.
  • the coefficient "a" is a multiplicative coefficient which is used to "normalize" the GGI. In a preferred embodiment, it can be used to map the GG1 samples to a GGI value of -1 and the GG3 samples to a GGI value of 1.
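  • The roles of the coefficients ai, "a" and "b" described above can be illustrated with a short sketch. The GGI formula itself is not reproduced in this text, so the form used here, GGI = a * (sum of ai * Xi - b), is an assumption chosen to match those roles. The coefficient values are placeholders (not the Table D values), and the handling of a score of exactly 0 is also an assumption.

```python
def ggi_score(expression, gene_coefficients, a=1.0, b=0.0):
    """Assumed form: GGI = a * (sum_i a_i * X_i - b).

    expression        -- dict gene -> expression level X_i
    gene_coefficients -- dict gene -> per-gene coefficient a_i
    a                 -- multiplicative scaling coefficient
    b                 -- offset that moves the GG1/GG3 cut-off to 0
    """
    missing = set(gene_coefficients) - set(expression)
    if missing:
        raise KeyError(f"missing expression levels for: {sorted(missing)}")
    weighted_sum = sum(coef * expression[gene] for gene, coef in gene_coefficients.items())
    return a * (weighted_sum - b)


def genomic_grade(score):
    """Below 0 is GG1, above 0 is GG3; exactly 0 is left undetermined (assumption)."""
    if score < 0:
        return "GG1"
    if score > 0:
        return "GG3"
    return "undetermined"


# Placeholder coefficients and expression levels, for illustration only.
coefficients = {"ASPM": 1.0, "MCM10": 1.0, "CX3CR1": -1.0}
sample = {"ASPM": 2.0, "MCM10": 1.5, "CX3CR1": 0.5}

score = ggi_score(sample, coefficients, a=0.5, b=1.0)
print(score, genomic_grade(score))
```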
  • said algorithm is selected in the group comprising but not limited to Support Vector Machine (SVM) with radial or linear kernel (SVMr or SVMl), Sum of gene expressions, Probit model, Logistic model, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Principal component analysis (PCA), preferably Principal component analysis (PCA).
  • said method of determining the grade of a solid tumor in a subject suffering from cancer is a method for determining the prognosis of said tumor, and wherein a tumor identified as having a Genomic Grade 1 is indicative of a "good-prognosis", whereas a tumor identified as having a Genomic Grade 3 is indicative of a "poor-prognosis”.
  • a "good-prognosis" is a Metastasis-Free Survival (MFS) of more than 5 years, preferably more than 10 years, or a long-term survival, and a "poor-prognosis" is an MFS of less than 10 years, preferably less than 5 years, or the absence of long-term survival.
  • said tumor according to the invention has been previously identified as a Histological Grade 2 tumor.
  • Said method according to the invention can also be used for determining the treatment of said solid tumor from said subject suffering from cancer.
  • the grade of said solid tumor is indicative of the aggressiveness of the treatment which will be needed for said subject.
  • Solid tumors having a Genomic Grade 3 will require more aggressive treatments (for example chemotherapy with adjuvant) than solid tumors having a Genomic Grade 1.
  • said method of determining the grade of a solid tumor in a subject suffering from cancer according to the invention further comprises a step a') of normalizing the expression levels of said genes as determined in step a) with at least one, preferably two or three reference genes selected in the group comprising the genes GUS, TBP and RPLP0.
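  • A minimal sketch of normalization step a') is given below. The text does not specify the normalization arithmetic, so the delta-Ct convention used here (target Ct minus the mean Ct of the available reference genes) is an assumption, as are the Ct values and the helper `normalize_ct`.

```python
from statistics import mean

REFERENCE_GENES = ("GUS", "TBP", "RPLP0")


def normalize_ct(ct_values, reference_genes=REFERENCE_GENES):
    """Return delta-Ct per target gene: Ct(target) - mean Ct of the measured reference genes."""
    refs = [ct_values[g] for g in reference_genes if g in ct_values]
    if not refs:
        raise ValueError("at least one reference gene is required")
    baseline = mean(refs)
    return {g: ct - baseline for g, ct in ct_values.items() if g not in reference_genes}


# Hypothetical qRT-PCR cycle thresholds, for illustration only.
ct = {"ASPM": 27.0, "MCM10": 29.5, "GUS": 24.0, "TBP": 26.0, "RPLP0": 19.0}
delta_ct = normalize_ct(ct)
print(delta_ct)
```

  • The helper accepts one, two or three of the reference genes, mirroring the "at least one, preferably two or three" wording above.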
  • the method for determining the grade of a solid tumor in a subject suffering from cancer according to the invention is characterized in that the following combinations of genes are excluded :
  • the method for determining the grade of a solid tumor in a subject suffering from cancer comprises the step a) of analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of genes comprising at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 7 genes selected in a group consisting of PTTGl, CCNB2, ASPM, TPT1, CX3CR1, MCM10, FRY, CCNA2, CDC2 and CDCA3.
  • the method for determining the grade of a solid tumor in a subject suffering from cancer comprises the step a) of analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of genes comprising at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 4 genes selected in a group consisting PTTGl, CCNB2, ASPM, TPT1, CX3CR1, MCM10 and FRY.
  • the method for determining the grade of a solid tumor in a subject suffering from cancer comprises the step a) of analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of the 6 genes consisting of PTTGl, CCNB2, ASPM, CX3CR1, MCM10 and FRY.
  • said coefficients a_i assigned to the i-th genes, corresponding to the 6 genes PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY, are selected from Table D when the algorithm chosen to determine the grade of a solid tumor from a subject suffering from cancer is selected from the probit, logit or sum algorithms.
  • Table D: Preferred coefficients for the GGI, using the 6-gene signature consisting of ASPM, CCNB2, CX3CR1, FRY, MCM10 and PTTG1, according to the algorithm used (probit, logit, sum or PCA) and the method used to describe the gene expression level (Ctnorm, NCN, ΔΔCt2 or ΔΔCt3)
  • the value of said coefficients a_i assigned to the i-th genes, corresponding to the 6 genes PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY as defined in Table D, can vary by 0.1%, 10%, 20%, 30%, 40% or up to 50%.
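As an illustration of how per-gene coefficients might enter an index, the following minimal sketch assumes that a "sum"-type index is a plain weighted sum of normalized expression values. That form is an assumption, since the algorithms themselves are not defined in this passage. The coefficient values are placeholders and not the Table D values, and `weighted_index` and `scale_coeffs` are hypothetical helper names.

```python
from typing import Dict

# Genes of the 6-gene signature named in the text.
SIGNATURE = ("PTTG1", "CCNB2", "ASPM", "CX3CR1", "MCM10", "FRY")

# Placeholder coefficients a_i: illustrative only, NOT the Table D values.
PLACEHOLDER_COEFFS: Dict[str, float] = {
    "PTTG1": 1.0, "CCNB2": 1.0, "ASPM": 1.0,
    "CX3CR1": -1.0, "MCM10": 1.0, "FRY": -1.0,
}


def weighted_index(expression: Dict[str, float], coeffs: Dict[str, float]) -> float:
    """Return sum_i a_i * x_i over the signature genes (assumed form of the index)."""
    missing = [g for g in SIGNATURE if g not in expression or g not in coeffs]
    if missing:
        raise KeyError(f"missing genes: {missing}")
    return sum(coeffs[g] * expression[g] for g in SIGNATURE)


def scale_coeffs(coeffs: Dict[str, float], fraction: float) -> Dict[str, float]:
    """Scale every coefficient by (1 + fraction), e.g. fraction=0.10 for +10%."""
    return {g: a * (1.0 + fraction) for g, a in coeffs.items()}
```

Calling `weighted_index(expr, scale_coeffs(PLACEHOLDER_COEFFS, 0.10))` shows how a 10% variation of the coefficients would propagate to the index under this assumed linear form.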
  • a subject according to the invention is a mammal, preferably a human.
  • a cancer according to the invention is a breast cancer.
  • a biological sample according to the invention is a tissue sample, a fluid sample, a cell sample or a blood sample of said subject, preferably from the breast of said subject.
  • said biological sample is a fresh/frozen or a paraffin-embedded biological sample, preferably a paraffin-embedded biological sample.
  • the determination of the expression level of genes according to the invention is performed on nucleic acids from a biological sample as disclosed previously.
  • said step of determining the expression level of genes according to the invention is performed by Reverse-Transcription Polymerase Chain Reaction (RT-PCR), preferably by real-time Reverse-Transcription Polymerase Chain Reaction (qRT-PCR).
  • said step of determining the expression level of genes according to the invention is performed on DNA microarrays.
  • the step of determining the expression level according to the invention is performed by determining the amount of proteins in a biological sample.
  • said method for determining the grade of a solid tumor in a subject suffering from cancer further comprises generating a printed report of some or all the conclusions drawn from the data, or of a score or comparison between the results obtained for said subject.
  • Another aspect of the invention relates to a polynucleotide library comprising or corresponding to polynucleotide sequences allowing the detection of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20 listed in Table A.
  • polynucleotide sequences allowing the detection of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20 listed in Table A according to the invention can be any sequence between the 3' and 5' ends of the polynucleotide sequences of the corresponding genes as defined in Table A allowing a complete detection of the implicated genes.
  • the polynucleotide library of the invention may comprise or may consist of the polynucleotide sequences as defined in the examples or derivatives thereof, listed in Table A.
  • the polynucleotide library of the invention may comprise or may consist of the polynucleotide sequences listed in Table A or derivatives thereof.
  • the polynucleotide library according to the invention comprises or corresponds to polynucleotide sequences allowing the detection of a combination of at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 7 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10, FRY, CCNA2, CDC2 and CDCA3 listed in Table A.
  • the polynucleotide library according to the invention comprises or corresponds to polynucleotide sequences allowing the detection of a combination of genes comprising at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 4 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10 and FRY.
  • the polynucleotide library according to the invention comprises or corresponds to polynucleotide sequences allowing the detection of the combination of the 6 genes PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY.
  • the polynucleotide library according to the invention does not comprise more than 500 polynucleotide sequences, preferably not more than 200 polynucleotide sequences, and most preferably not more than 100 polynucleotide sequences.
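The constraints on the gene content of a library (between 2 and 24 genes drawn from the Table A list, optionally including the three core genes ASPM, CX3CR1 and MCM10) can be sketched as a simple validation routine. This is an illustrative sketch only: `check_panel` is a hypothetical helper, and the gene symbols are written in de-garbled form, including the assumed reading FLJ21062 for the symbol printed as FU21062.

```python
from typing import Iterable, List

# The 24 genes of Table A as enumerated in the text (symbols normalized; FLJ21062 is an assumed reading).
TABLE_A_GENES = frozenset({
    "BIRC5", "CEP55", "AURKA", "RACGAP1", "MELK", "CX3CR1", "PTTG1", "CCNA2",
    "CCNB2", "ASPM", "FRY", "CENPA", "FLJ21062", "TPT1", "KIF11", "TROAP",
    "TUBA1B", "CDCA3", "UBE2C", "TPX2", "MCM10", "KPNA2", "CDC2", "CDC20",
})

# Core genes required by the narrower embodiments.
CORE_GENES = frozenset({"ASPM", "CX3CR1", "MCM10"})


def check_panel(panel: Iterable[str], require_core: bool = False) -> List[str]:
    """Validate a gene panel and return it as a sorted list of unique symbols."""
    genes = set(panel)
    unknown = genes - TABLE_A_GENES
    if unknown:
        raise ValueError(f"genes not in Table A: {sorted(unknown)}")
    if not 2 <= len(genes) <= 24:
        raise ValueError("a panel must contain at least 2 and at most 24 genes")
    if require_core and not CORE_GENES <= genes:
        raise ValueError("panel must contain ASPM, CX3CR1 and MCM10")
    return sorted(genes)
```

For example, `check_panel(["PTTG1", "CCNB2", "ASPM", "CX3CR1", "MCM10", "FRY"], require_core=True)` accepts the 6-gene signature, whereas a single-gene panel is rejected.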
  • the expression level of genes can be determined in a biological sample and compared to the "normal control level", utilizing techniques such as reference limits, discrimination limits, or risk defining thresholds to define cutoff points and abnormal values for breast cancer.
  • Such normal control level and cutoff points may vary based on whether a gene is used alone or in a formula combining with other genes into an index.
  • the methods according to the invention for the diagnosis of breast cancer, the diagnosis of the grade of a breast tumor and the prognosis of breast cancer in a subject are intended to provide accuracy in clinical diagnosis and prognosis.
  • the accuracy of a diagnostic or prognostic test, assay, or method concerns its ability to identify subjects having breast cancer, based on whether the subjects have an "effective amount" or a "significant alteration" in the levels of one or more genes.
  • by an "effective amount" or a "significant alteration", it is meant that the measured expression level of a gene differs from the predetermined cut-off point (or threshold value) for that gene and therefore indicates that the subject has a breast cancer for which the gene is a determinant.
  • the difference in the level of genes between normal and abnormal is preferably statistically significant and may be an increase in gene expression level or a decrease in gene expression level.
  • achieving statistical significance, and thus the preferred analytical and clinical accuracy, generally but not always requires that several genes be used together in panels and combined with mathematical algorithms in order to achieve a statistically significant genomic grade index.
  • an "acceptable degree of diagnostic reliability" is herein defined as a test or assay (such as the test of the invention for determining the clinically significant expression level of genes, which thereby indicates the diagnosis or prognosis of breast cancer) in which the AUC (area under the ROC curve for the test or assay) is at least 0.60, desirably at least 0.65, more desirably at least 0.70, preferably at least 0.75, more preferably at least 0.80, and most preferably at least 0.85.
  • by a "very high degree of diagnostic reliability" is meant a test or assay in which the AUC (area under the ROC curve for the test or assay) is at least 0.80, desirably at least 0.85, more desirably at least 0.875, preferably at least 0.90, more preferably at least 0.925, and most preferably at least 0.95.
  • the methods of the invention for the diagnosis of a breast cancer in a subject, for diagnosing the grade of a breast tumor in a subject having a breast cancer and for the prognosis of a breast cancer in a subject make it possible to obtain an AUC (area under the ROC curve) of at least 0.60, desirably at least 0.65, more desirably at least 0.70, preferably at least 0.75, more preferably at least 0.80, and most preferably at least 0.85.
  • the methods of the invention for the diagnosis of a breast cancer in a subject, for diagnosing the grade of a breast tumor in a subject having a breast cancer and for the prognosis of a breast cancer in a subject make it possible to obtain an AUC (area under the ROC curve) of at least 0.80, desirably at least 0.85, more desirably at least 0.875, preferably at least 0.90, more preferably at least 0.925, and most preferably at least 0.95.
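The AUC thresholds quoted above can be checked against data with a short routine. The sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, ties counting one half) and maps it to the two reliability levels defined in the text. The function names `roc_auc` and `reliability` are hypothetical, and only the lowest threshold of each level (0.60 and 0.80) is used.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives (1) and negatives (0)."""
    pos: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    neg: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))


def reliability(auc: float) -> str:
    """Map an AUC to the reliability levels defined in the text (minimum thresholds only)."""
    if auc >= 0.80:
        return "very high"
    if auc >= 0.60:
        return "acceptable"
    return "below acceptable"
```

The quadratic pairwise loop is chosen for readability; for large cohorts a rank-based implementation would be preferable.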
  • the gene expression level values or learning statistical classifier algorithms can be selected in the methods of the invention for the diagnosis of a breast cancer in a subject, for diagnosing the grade of a subject having a breast cancer and for the prognosis of a breast cancer in a subject such that the agreement HG3/GG3 (agG3) is at least about 60%, and can be, for example, at least about 65%, 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
  • the gene expression level values or learning statistical classifier algorithms can be selected in the methods of the invention for the diagnosis of a breast cancer in a subject, for diagnosing the grade of a subject having a breast cancer and for the prognosis of a breast cancer in a subject such that the agreement HG1/GG1 (agG1) is at least about 60%, and can be, for example, at least about 65%, 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
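The agreement figures can be illustrated with a small calculation. This passage does not define the agreement formally, so the sketch below assumes that the agreement for grade g is the fraction of tumors of histological grade g (HG) that are also assigned genomic grade g (GG). `grade_agreement` is a hypothetical helper name.

```python
from typing import Sequence


def grade_agreement(histological: Sequence[int], genomic: Sequence[int], grade: int) -> float:
    """Fraction of tumors with histological grade `grade` that also receive genomic grade `grade`.

    Assumed definition of agG1 (grade=1) and agG3 (grade=3).
    """
    if len(histological) != len(genomic):
        raise ValueError("grade lists must have the same length")
    calls = [g for h, g in zip(histological, genomic) if h == grade]
    if not calls:
        raise ValueError(f"no tumor of histological grade {grade}")
    return sum(1 for g in calls if g == grade) / len(calls)
```

With histological grades `[3, 3, 3, 3, 1, 1]` and genomic grades `[3, 3, 3, 1, 1, 3]`, three of the four HG3 tumors are called GG3 and one of the two HG1 tumors is called GG1.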
  • step (a) obtaining nucleic acids from a biological sample from a subject;
  • step (b) reacting said nucleic acids obtained in step (a) with a polynucleotide library as defined previously;
  • step (c) detecting the reaction product of step (b).
  • by "reacting nucleic acids with the polynucleotide library" in the sense of the invention is meant contacting the nucleic acids of the sample with polynucleotide sequences under conditions allowing the hybridization of the total cDNA or mRNA sequence of the gene, of cDNA or mRNA subsequences, or of primers of the gene with polynucleotide sequences of the library. The reaction step according to the invention is therefore performed by hybridizing the nucleic acids with a polynucleotide library as defined previously.
  • the nucleic acids from said biological sample can be labeled, e.g., before reaction step (b), and the label of the nucleic acid sample can be selected from the group consisting of radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent or fluorescent labels.
  • the polynucleotide libraries of the invention can be immobilized on a solid support to form an array.
  • the solid support can, for example, be selected from the group consisting of nylon membrane, nitrocellulose membrane, glass slide, glass beads, and membranes on glass support or a silicon chip.
  • the method of the invention for determining the expression level of genes further comprises :
  • the determination of the expression levels of genes in the methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject is performed on nucleic acids from a biological sample from said subject, and most preferably is based on the measurement of the level of transcription.
  • This measurement can be performed by various methods known per se, including, without limitation, quantitative methods such as Reverse-Transcriptase PCR (RT-PCR) or real-time quantitative Reverse-Transcriptase PCR (qRT-PCR), and methods involving the use of DNA arrays (macroarrays or microarrays).
  • the first step in gene expression profiling, i.e. determining the expression level of genes according to the invention, by RT-PCR or qRT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction, preferably a real-time PCR, also called quantitative PCR (qPCR).
  • the relative number of gene transcripts in a sample is thus determined by reverse transcription of gene transcripts (e.g., mRNA), followed by amplification of the reverse-transcribed products by a polymerase chain reaction (PCR), preferably a real-time PCR also called a quantitative PCR (qPCR).
  • the relative number of gene transcripts in a sample is determined by a Reverse-Transcriptase Polymerase Chain Reaction (e.g., RT-PCR).
  • the gene expression level is assessed by using real-time quantitative Reverse-Transcriptase PCR (qRT-PCR).
  • the expression level of genes in the methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer according to the invention is determined by Reverse-Transcriptase Polymerase Chain Reaction (RT-PCR), more preferably by a real-time quantitative Reverse-Transcriptase PCR (qRT-PCR).
  • the RNA to be reverse-transcribed is first isolated from a biological sample.
  • the starting material is typically total RNA isolated from human tumors or tumor cell lines, and corresponding normal tissues or cell lines, respectively.
  • RNA can be isolated from a variety of primary breast tumors or tumor cell lines, with pooled DNA from healthy donors. If the source of mRNA is a primary tumor, mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples.
  • RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions.
  • RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns.
  • Other commercially available RNA isolation kits include MasterPure(TM) Complete DNA and RNA Purification Kit (EPICENTRE(R), Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.).
  • Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test).
  • RNA prepared from a tumor can be isolated, for example, by cesium chloride density gradient centrifugation. Extracted RNA can be reverse-transcribed into cDNA by using a reverse transcriptase according to techniques known in the art.
  • Examples of products for reverse-transcribing RNA into cDNA include, without limitation, the SuperScript III/VILO Test from INVITROGEN.
  • the derived cDNA can then be used as a template in the subsequent PCR reaction.
  • the subsequent PCR reaction relies on thermal cycling, i.e. repeated cycles of heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA. Said replication of DNA is performed with primers (short DNA fragments) containing sequences complementary to the target region, along with a DNA polymerase. Any person skilled in the art would be able to find instructions for carrying out such a PCR reaction, as it is a common laboratory technique well known in the art.
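The exponential character of PCR amplification can be made concrete with the textbook idealized model, in which the number of copies after n cycles is N0 · (1 + E)^n, where E is the per-cycle efficiency (E = 1 corresponds to perfect doubling). This model is a standard simplification and is not taken from the present description; `amplicon_copies` is a hypothetical helper name.

```python
def amplicon_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles: N0 * (1 + E) ** n.

    `efficiency` is the per-cycle amplification efficiency E, between 0 and 1.
    Real reactions plateau in late cycles, which this model ignores.
    """
    if initial_copies < 0 or cycles < 0:
        raise ValueError("initial_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

With perfect efficiency, 100 starting copies become 102,400 copies after 10 cycles, which is why small differences in starting template translate into measurable shifts in the cycle at which signal is detected.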
  • Real time PCR is compatible both with quantitative competitive PCR, where internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization or reference gene contained within the sample, or a housekeeping gene for RT-PCR.
  • Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5' proofreading exonuclease activity.
  • TaqMan(R) PCR typically utilizes the 5'-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5' nuclease activity can be used.
  • Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction.
  • a third oligonucleotide, or probe, is designed to detect a nucleotide sequence located between the two PCR primers.
  • the probe is non-extendible by the Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye, such as FAM or VIC, and a quencher fluorescent dye.
  • any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe.
  • the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner.
  • the resultant probe fragments disassociate in solution, and signal from the released reporter dye is free from the quenching effect of the quencher.
  • One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
  • 5'-Nuclease assay data are initially expressed as Ct, or the threshold cycle. Fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
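The notion of a threshold cycle can be sketched as follows. The text only states that Ct is the point at which the fluorescent signal first becomes statistically significant. The concrete rule used here (signal exceeding the baseline mean plus 10 standard deviations, with the baseline taken from the first cycles) is a common convention adopted as an assumption, and `threshold_cycle` is a hypothetical helper that returns a whole cycle number rather than an interpolated value.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def threshold_cycle(fluorescence: Sequence[float],
                    baseline_cycles: int = 5,
                    n_sd: float = 10.0) -> Optional[int]:
    """Return the first cycle (1-based) whose signal exceeds the baseline threshold.

    Threshold = mean(baseline) + n_sd * stdev(baseline), an assumed convention.
    Returns None if the signal never crosses the threshold.
    """
    if baseline_cycles < 2 or len(fluorescence) <= baseline_cycles:
        raise ValueError("need at least 2 baseline cycles and more readings than baseline cycles")
    baseline = fluorescence[:baseline_cycles]
    threshold = mean(baseline) + n_sd * stdev(baseline)
    for cycle, value in enumerate(fluorescence, start=1):
        if cycle > baseline_cycles and value > threshold:
            return cycle
    return None
```

A lower Ct indicates that more template was present at the start of the reaction, since fewer cycles were needed to reach the threshold.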
  • TaqMan(R) qRT-PCR can be performed using commercially available equipment, such as, for example, ABI PRISM 7700(TM) Sequence Detection System(TM) (PERKIN-ELMER-APPLIED BIOSYSTEMS, Foster City, Calif., USA), Lightcycler (ROCHE MOLECULAR BIOCHEMICALS, Mannheim, Germany) or Rotorgene (QIAGEN).
  • the 5' nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7700(TM) Sequence Detection System(TM).
  • the system consists of a thermocycler, a laser, a charge-coupled device (CCD) camera and a computer.
  • the system includes software for running the instrument and for analyzing the data.
  • the Reverse-Transcriptase PCR (RT-PCR) or real-time RT-PCR (qRT-PCR) according to the invention can be performed either as a two-step PCR comprising a first step of reverse transcription of the RNA and a second step of PCR, or as a one-step RT-PCR or qRT-PCR wherein both reverse transcription and PCR are performed together.
  • the qRT-PCR according to the invention is performed in a one-step manner.
  • polynucleotide sequences to be used for the determination of the expression levels of genes in the methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject according to the invention correspond to the polynucleotide library as defined previously.
  • the steps of a representative protocol for profiling gene expression using fixed, paraffin-embedded tissues as the RNA source, including mRNA isolation, purification, primer extension and amplification, are given in various published journal articles (for example: T. E. Godfrey et al., J. Molec. Diagnostics 2: 84-91 [2000]; K. Specht et al., Am. J. Pathol. 158: 419-29 [2001]).
  • the expression level of genes in the methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer according to the invention is determined by using a DNA array, i.e. the microarray technique.
  • a method of determining the expression level of genes by DNA array involves the following steps :
  • step (b) Reacting the nucleic acids sample obtained in step (a) with a polynucleotide library immobilized on a solid support,
  • step (c) Detecting the reaction product of step (b).
  • the microarray technique consists in combining complementary, single-stranded nucleic acids or nucleotide analogues into a single double-stranded molecule.
  • the polynucleotide library immobilized on the solid support is exposed to a sample. If complementary nucleic acids exist in the sample, these will hybridize to the library and can thus be detected.
  • DNA arrays can be categorized as microarrays (each DNA spot has a diameter of less than 250 microns) and macroarrays (spot diameter greater than 300 microns). When the solid support used is small in size, arrays are also referred to as DNA chips. Depending on the spotting technique used, the number of spots on a glass microarray can range from hundreds to thousands.
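The size-based categorization above leaves spot diameters between 250 and 300 microns unassigned. The short sketch below encodes the two stated limits and makes that gap explicit; `classify_array` is a hypothetical helper, and the label "unspecified" is an assumption for the range the text does not cover.

```python
def classify_array(spot_diameter_um: float) -> str:
    """Classify a DNA array by spot diameter in microns, following the limits stated in the text."""
    if spot_diameter_um <= 0:
        raise ValueError("spot diameter must be positive")
    if spot_diameter_um < 250:
        return "microarray"
    if spot_diameter_um > 300:
        return "macroarray"
    return "unspecified"  # 250 to 300 microns: not covered by the stated definitions
```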
  • the expression profile of breast cancer-associated genes can be measured in fresh, frozen or paraffin-embedded tumor tissues. Using microarray technology, fresh or frozen samples are preferred.
  • the source of polynucleotides typically is total RNA isolated from human tumors or tumor cell lines, and corresponding normal tissues or cell lines.
  • polynucleotides can be isolated from a variety of primary tumors or tumor cell lines.
  • mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples, which are routinely prepared and preserved in everyday clinical practice.
  • the nucleic acids sample obtained from said subject at step (a) is labeled before its reaction at step (b) with the polynucleotide library immobilized on a solid support.
  • Such labeling is well known to one of skill in the art and includes, but is not limited to, radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent, electrochemical or fluorescent labeling; labeled nucleic acids may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest.
  • the labeled nucleic acid sample is incubated with the DNA array, under conditions allowing selective hybridization between the polynucleotides (cDNA) and the corresponding probes affixed to the array. After the incubation, non-hybridized polynucleotides (cDNA) are removed by washing.
  • the probe-polynucleotide (cDNA) hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets to determine the relative abundance of nucleic acid sequences in the target.
  • the signal produced by the labeled polynucleotides (cDNA) hybridized at their corresponding probe locations is measured.
  • the intensity of this signal is proportional to the quantity of labeled polynucleotides (cDNA) hybridized to the probe, and thus to the quantity of the corresponding mRNA expressed in the sample.
  • the miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes.
  • Gene expression on an array or gene chip can be assessed using an appropriate algorithm (e.g., statistical algorithm). Suitable software applications for assessing gene expression levels using a microarray or gene chip are known in the art. In a particular embodiment, gene expression on a microarray is assessed using Affymetrix Microarray Analysis Suite (MAS) 5.0 software and/or DNA Chip Analyzer (dChip) software, for example, as described herein in Example 1.
  • Microarray analysis can be performed with commercially available equipment, following the manufacturer's protocols, for example by using the AFFYMETRIX GENECHIP technology, INCYTE's microarray technology or AGILENT technology.
  • RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by any experimental treatment.
  • invariant endogenous control i.e. reference gene or normalizer
  • Any variation in the normalizer will obscure real changes and produce artifactual changes.
  • Usual reference genes are expressed at a constant level among different tissues, and are unaffected by the experimental treatment.
  • the expression level of genes in the methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer according to the invention can be normalized with at least one, preferably two or three, reference genes selected from the group comprising the genes GUS, TBP and RPLP0. Said genes are described in Table B.
  • the TFRC gene is excluded from the reference genes which can be used for the normalization of the expression levels of genes according to the invention.
  • Expression levels may be normalized with respect to the expression level of one or more reference genes using global normalization methods. Those skilled in the art will recognize that numerous methods of normalization are known, and can be applied for use in the methods of the present disclosure.
  • the determination of the expression level of genes in the methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject may be performed by determining the amount of proteins expressed from said genes.
  • the determination of the expression levels of genes according to the invention can therefore comprise the step of:
  • the proteins can be obtained directly from the sample; e.g., by standard extraction or isolation techniques or can be obtained by translation of mRNA obtained from the samples.
  • Detection of protein levels may be performed by for example, immunoassays including ELISA, Western Blot or sandwich immunoassays using antibodies capable of binding specifically to any one or more of the proteins encoded by the genes of interest. Immunohistochemistry methods are also suitable for detecting the expression levels of the genes of the present invention.
  • antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies, specific for each gene can be used to detect expression.
  • the antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase.
  • alternatively, an unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody.
  • This method normalizes the threshold cycle of the target gene (Ct target) using the mean of the threshold cycles of several reference genes (mean(Ct reference)). Therefore, the normalized Ct is calculated as follows:
  • $Ct_{norm,target} = Ct_{target} - \mathrm{mean}(Ct_{reference})$
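The Ct normalization just described can be sketched as follows, assuming the reference genes are GUS, TBP and RPLP0 as named elsewhere in the description. `normalize_ct` is a hypothetical helper, and the Ct values in the usage note are made up for illustration.

```python
from statistics import mean
from typing import Dict, Sequence

# Reference genes named in the description.
REFERENCE_GENES = ("GUS", "TBP", "RPLP0")


def normalize_ct(ct: Dict[str, float],
                 references: Sequence[str] = REFERENCE_GENES) -> Dict[str, float]:
    """Return Ct_norm = Ct_target - mean(Ct_reference) for every non-reference gene."""
    missing = [g for g in references if g not in ct]
    if not references or missing:
        raise ValueError(f"missing reference Ct values: {missing}")
    ref_mean = mean(ct[g] for g in references)
    return {g: value - ref_mean for g, value in ct.items() if g not in references}
```

For a sample with Ct values ASPM = 28, MCM10 = 30, GUS = 24, TBP = 26 and RPLP0 = 25, the reference mean is 25, giving normalized Ct values of 3 for ASPM and 5 for MCM10.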
  • NCN normalized copy number
  • the normalized copy number, representing the expression level of a given target gene, is :
  • $\mathrm{N.log\,CN}_{target} = \log CN_{target} - \mathrm{mean}(\log CN_{reference})$
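A minimal sketch of the normalized copy number calculation follows. It assumes that copy numbers (CN) have already been obtained for the target and reference genes and that the logarithm is base 10; neither point is specified in this passage. `normalized_log_copy_number` is a hypothetical helper name.

```python
import math
from typing import Sequence


def normalized_log_copy_number(cn_target: float, cn_references: Sequence[float]) -> float:
    """Return log10(CN_target) - mean(log10(CN_reference)). Base 10 is an assumption."""
    if cn_target <= 0 or not cn_references or any(c <= 0 for c in cn_references):
        raise ValueError("copy numbers must be positive and at least one reference is required")
    ref_logs = [math.log10(c) for c in cn_references]
    return math.log10(cn_target) - sum(ref_logs) / len(ref_logs)
```

Averaging the logarithms of the reference copy numbers amounts to normalizing by their geometric mean, which limits the influence of a single highly expressed reference gene.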
  • the ΔΔCt2 method normalizes the Ct corresponding to each gene i (target or reference), using only one plasmid sample containing 10^2 copies of the gene.
  • $\Delta\Delta Ct2_{target} = \Delta Ct2_{target} - \mathrm{mean}(\Delta Ct2_{reference})$
  • the ΔΔCt3 method: the ΔΔCt3 method normalizes the Ct corresponding to each gene i (target or reference), using only one plasmid sample containing 10^3 copies of the gene. $\Delta Ct3_{i} = Ct_{i} - Ct_{i}(\text{plasmid } 10^{3})$
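The two plasmid-calibrated methods can be sketched with one routine, since they differ only in the plasmid sample used for calibration (10^2 or 10^3 copies). The sketch assumes that each gene's Ct is first calibrated as ΔCt_i = Ct_i − Ct_i(plasmid), and that the target value is then normalized by the mean of the reference ΔCt values, the ΔΔCt3 case being taken by analogy with the ΔΔCt2 formula. `delta_delta_ct` is a hypothetical helper and the Ct values in the usage note are illustrative.

```python
from statistics import mean
from typing import Dict, Sequence


def delta_delta_ct(sample_ct: Dict[str, float],
                   plasmid_ct: Dict[str, float],
                   target: str,
                   references: Sequence[str]) -> float:
    """Return dCt_target - mean(dCt_reference), with dCt_i = Ct_i(sample) - Ct_i(plasmid).

    `plasmid_ct` holds the Ct measured for each gene on the calibration plasmid
    (10^2 copies for the ddCt2 method, 10^3 copies for the ddCt3 method).
    """
    if not references:
        raise ValueError("at least one reference gene is required")
    genes = [target, *references]
    missing = [g for g in genes if g not in sample_ct or g not in plasmid_ct]
    if missing:
        raise ValueError(f"missing Ct values: {missing}")
    d_ct = {g: sample_ct[g] - plasmid_ct[g] for g in genes}
    return d_ct[target] - mean(d_ct[g] for g in references)
```

With sample Ct values ASPM = 28, GUS = 24, TBP = 26 and plasmid Ct values ASPM = 30, GUS = 29, TBP = 31, the calibrated values are −2 for the target and −5 for each reference, so the normalized value is 3.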
  • the method used for evaluating the expression level of genes according to the invention is the NCN method.
  • the methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject may involve a previous step of obtaining at least one biological sample from the subject.
  • Such methods of sampling are well known to one of skill in the art; surgery is one example.
  • Other examples of biological sampling are defined hereabove within the definition of "biological sample”.
  • the analysis of a biological sample for determining the expression level of genes according to the invention may be performed before any surgical removal of the tumor, or following surgical removal of the tumor.
  • the provided methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject may also correspond to an in vitro method, which does not include such a step of sampling.
  • kits for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject according to the invention comprise at least one primer, at least one probe or at least one antibody, which can be used in a method as defined in the present invention, for analyzing a biological sample from a subject by determining the expression level of genes as defined previously.
  • the kit according to the invention comprises means and reagents for RT-PCR analysis as described hereabove, and more preferably for qRT-PCR.
  • the kit comprises means and reagents for a microarray analysis as described hereabove.
  • the kit comprises a polynucleotide library as described previously.
  • the kit further comprises a DNA array for the determination of the expression level of genes according to the invention by microarray.
  • the present kits can also include one or more reagents, buffers, hybridization media, nucleic acids, primers, nucleotides, probes, molecular weight markers, enzymes, solid supports, databases, computer programs for calculating dispensation orders and/or disposable lab equipment, such as multi-well plates, in order to readily facilitate implementation of the present methods.
  • Enzymes that can be included in the present kits include reverse transcriptases, nucleotide polymerases and the like.
  • Solid supports can include beads and the like whereas molecular weight markers can include conjugatable markers, for example biotin and streptavidin or the like.
  • the kit further includes an analysis tool for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject from the expression level of genes according to the invention from a biological sample from a subject.
  • the kit includes instructions for carrying out the method described herein for the diagnosis, determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer.
  • the instructions can be provided in any intelligible form through a tangible medium, such as printed on paper, computer readable media, or the like.
  • Still a further aspect of the present invention refers to the use, for the diagnosis, determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer, of the abovementioned kit comprising at least one primer, at least one probe or at least one antibody, which can be used in a method as defined for analyzing the expression of the genes as defined previously.
  • the invention embraces a diagnostic test system comprising (1) means for obtaining test results comprising the expression level of genes in a biological sample; (2) means for collecting and tracking test results for one or more individual biological samples; (3) means for calculating an output from inputs using an algorithm as described hereabove, wherein said inputs comprise the expression level of said genes; and (4) means for reporting said index value.
  • said output is a score; the score can be calculated according to any of the methods described herein.
  • the means for collecting and tracking test results for one or more individuals can comprise a data structure or database.
  • the means for calculating a score can comprise a computer, microprocessor, programmable calculator, dedicated device, or any other device capable of calculating the GGI score.
  • the means for reporting the score can comprise a visible display, an audio output, a link to a data structure or database, or a printer.
  • the means for collecting and tracking data representing test results for one or more individuals comprises a data structure or database.
  • the means for computing a score comprises a computer or microprocessor.
  • the means for reporting the score comprises a visible display, an audio output, a link to a data structure or database, or a printer.
  • a related embodiment of the invention is a medical diagnostic test system for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer, the system comprising: a data collection tool adapted to collect expression level of genes' data representative of the expression level of genes in at least one biological sample from a subject; and an analysis tool comprising a statistical analysis engine adapted to generate a representation of a correlation between the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer and the expression level of said genes, wherein the representation of the correlation is adapted to be executed to generate a result; and an index computation tool adapted to analyze the result to determine the subject's diagnosis, the grade of the solid tumor and/or prognosis of said subject suffering from cancer and represent the result as an output; wherein said genes are defined as described hereabove.
  • the analysis tool comprises a first analysis tool comprising a first statistical analysis engine, the system further comprising a second analysis tool comprising a second statistical analysis engine adapted to select the representation of the correlation between the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer in a subject and the expression level of genes from among a plurality of representations capable of representing the correlation.
  • the system further comprising a reporting tool adapted to generate a report comprising the index value.
  • Still another embodiment of the invention is a computer readable medium having computer executable instructions for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer.
  • the computer readable medium comprising: a routine, stored on the computer readable medium and adapted to be executed by a processor, to store gene expression level data; and a routine stored on the computer readable medium and adapted to be executed by a processor to analyze the gene expression level data for diagnosing a breast cancer, for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer.
  • Another aspect of the invention further relates to a recording computer program comprising instructions for performing methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer.
  • Any of the provided methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer can further involve generating a printed report, for instance a report of some or all the data, of some or all the conclusions drawn from the data, or of a score or comparison between the results of a subject or individual and other individuals or a control or baseline.
  • HG Histological grade
  • SBR Scarff-Bloom-Richardson
  • Estrogen Receptor (ER) status: ER status is usually measured by immunohistochemistry. It can be positive or negative.
  • Lymph Node (LN) status: a number of invaded lymph nodes of 0 to 3 is considered a negative status.
  • Log-Rank (LR): a hypothesis test used to compare the survival distributions of two groups (for instance, GG1 and GG3). The two groups have significantly different distributions if the p-value is less than 0.05.
  • Hazard Ratio (HR): the hazard ratio of the GG (and the associated p-value) is determined by a multivariate Cox model fitted on the cohort, with Lymph Node status as an additional variable kept in the model (the hypothesis of hazard proportionality has been verified).
  • An HR equal to x and significantly greater than 1, i.e. with an associated p-value < 0.05, means that the risk for a GG3 sample to have the considered event is x times the risk for a GG1 sample.
  • Histological grade was based on the Elston-Ellis or the SBR grading system.
  • HG1 + HG3 breast cancer samples with ER+ status were split into a training set (60% of the samples, 370 genomic profiles) and a validation set (40% of the samples, 258 genomic profiles). All 273 HG2 samples with available prognostic data were used to validate the prognostic performances of the genomic grade.
  • Training set ("ER+ HG1/HG3 training set” or “training set")
  • the training set included a total of 370 genomic profiles from HG1 and HG3 patients, all having positive ER status (Table 2).
  • the training set was used to learn the model, which was then applied to various validation sets, in order to assess the performances of the different combinations of genes.
  • Tab. 2 Composition of the training set.
  • the ER+ HG1/HG3 validation set included a total of 258 genomic profiles from HG1 and HG3 patients, all having positive ER status (Table 3).
  • Metastasis-free survival was used as an endpoint to validate the relevance of HG2 reclassification into GG1 and GG3.
  • MFS status was available for IPC, PACS, TGEN, LOI, BORDET, HaibeKains, OXFU, OXFT.
  • Time was truncated at 5 or 10 years.
  • MFS was available for a total of 273 patients with HG2 breast tumors (Tab. 4). These samples (“HG2 validation cohort” or "HG2 validation dataset”) were used to assess the prognostic value of the new GGI. Within this group, 40 patients had relapsed during the first five years after surgery, and 58 during the first ten years after surgery.
  • thirty-two patients had relapsed during the first five years after surgery, and 47 during the first ten years.
  • 22 patients had relapsed during the first five years after surgery, and 34 during the first ten years.
  • Tab. 4 Distribution of HG2 samples (regardless of ER and LN status) having MFS data, by dataset.
  • the first dataset used consisted of 91 ER+ samples (45 HG1 + 46 HG3) from IJB (Institut Jules Bordet, Brussels, Belgium) and Blackjack hospital ("IJB/Mercy dataset” or "IJB/Mercy cohort”).
  • the expression levels of 25 target genes (BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FU21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, CDC2, CDC20, KPNA2 and MYBL2) and 4 control genes (TFRC, GUS, TBP and RPLP0) were assessed by qRT-PCR.
  • a second dataset included 86 HG1 and 60 HG3 ER+ samples from IJB ("IJB dataset” or "IJB cohort”).
  • 3 endogenous genes: TBP, GUS and RPLP0
  • the "6-gene signature” encompasses the genes PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY.
  • the "3-gene signature" includes ASPM, CX3CR1 and MCM10.
  • SVM Support Vector Machine
  • SVMr or SVMl: SVM with radial or linear kernel
  • Random Forest. Other known classifiers not tested here, including, but not limited to, shrunken centroids, k-nearest neighbours, QLDA, ELDA, DQDA and neural networks, are expected to lead to similar performances.
  • a rule of classification needs to be learnt on a training set.
  • the rule is then applied to new data (validation sets), in order to assess performances.
  • the expected output is an index and/or a probability, and the predicted class (GG1 or GG3) of the sample. All analyses were done using R (http://cran.r-project.org/).
  • the logrank test is a hypothesis test used to compare the survival distributions of two groups (GG1 and GG3).
  • the two groups have significantly different distributions if the p-value is less than 0.05.
  • hazard ratio of the GG is determined by a multivariate Cox fitted on the cohort, with Lymph Node status as an additional variable kept in the model (the hypothesis of hazard proportionality has been verified).
  • An HR equal to x and significantly greater than 1 means that the risk for a GG3 sample to have the considered event is x times the risk for a GG1 sample.
  • EXAMPLE 5A MATERIAL AND METHODS FOR MICROARRAYS EXPERIMENTS
  • the experiments are conducted in Clinpath Advisors facility in Costa Mesa (CA) according to the present protocol.
  • the study will start upon receipt of the samples shipped by Ipsogen.
  • the samples will be ready to use.
  • the set-up of the qRT-PCR reaction will be done in 384-well microplates with a Qiagility robotic platform.
  • the qRT-PCR reactions will be run on an ABI 7900 HT instrument.
  • Step 1: Extraction. RNA will be extracted partially on a QIACUBE instrument with the QIAGEN kit reference 73504: "RNeasy FFPE Kit” September 2010 version, according to the SOP L002.DRAFT4 "RNA Extraction from FFPE” (based on QIAGEN RNeasy FFPE kit manual).
  • Step 2: Evaluation of gDNA contamination. Contamination by genomic DNA (gDNA) was assessed on an aliquot of the sample, in absence of reverse transcriptase (RT), with at least one set of primers and probes with designs at risk of amplification of gDNA.
  • gDNA genomic DNA
  • Run in absence of reverse transcriptase (RT), an aliquot of each sample with at least one set of primers and probes with designs at risk of amplification of genomic DNA.
  • RT reverse transcriptase
  • Step 3 RNA clean-up (optional)
  • Q-PCR primers preparation: primers are prepared and stored at 25x (10 µM forward and reverse primers, 5 µM probe) in TE pH 8.0 plus 300 µg/ml salmon DNA as buffer (1x: 400 nM primers, 200 nM probe).
  • Ct threshold was fixed to 0.1.
  • Primer 62 AURKA_R2 AGGCTCCAGAGATCCACCTT 1511 20 60,2
  • Primer 80 CCNB2_R1 GCTGAGGGTTCTCCCAATCT 536 20 60,6
  • Primer 104 TUBA1B_R1 ATCTTTGGGAACCACGTCAC 1202 20 59,8
  • EXAMPLE 6 NUMBER OF REFERENCE GENES IN qRT-PCR It is common practice in qRT-PCR to use reference genes to normalize the raw data. Using several genes increases precision, without decreasing performance of the test.
  • Data were normalized using 1, 2 or 3 reference genes selected in the group comprising the reference genes GUS, TBP and RPLP0.
  • EXAMPLE 7 THE NEW GGI SHOWS IMPROVED PERFORMANCES. COMPARED TO EXISTING

Abstract

The invention relates to methods for determining the genomic grade of a solid tumor in a subject suffering from cancer, said method comprising the steps of a) analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of at least 2 genes and at most 24 genes, said genes being selected in a set of 24 genes, and b) determining the genomic grade of said tumor in said subject on the basis of the expression level of said genes as determined in step a). The determination of the genomic grade of a solid tumor in a subject suffering from cancer according to the invention is based on the calculation of a Genomic Grade Index (GGI) score. The invention also relates to methods for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject. Finally, the present invention relates to a polynucleotide library and a kit thereof.

Description

METHODS FOR THE DIAGNOSIS, THE DETERMINATION OF THE GRADE OF A SOLID TUMOR AND THE PROGNOSIS OF A SUBJECT SUFFERING FROM CANCER
The present patent application claims priority from the provisional patent application US 61/563,931 filed on November 28, 2011, the whole content of which is herein incorporated by reference.
BACKGROUND
[0001] Breast cancer is the most common cancer in women and a leading cause of cancer death worldwide. The variability of breast cancers depends on morphological appearances, molecular features, behavior and response to therapy.
[0002] One of the frequently used factors for characterizing cancers, and especially breast cancers, and for helping to determine the patient's prognosis is the histological grade (Rakha et al., Breast Cancer Research, 2010, vol. 12, pp:207). Said histological grade is based on the degree of differentiation of the tumor tissue and is determined by evaluating three parameters: the frequency of cell mitosis (i.e. the rate of cell division), the tubule formation (i.e. the percentage of cancer composed of tubular structures), and the nuclear pleomorphism (i.e. the change in cell size and uniformity). In addition to characterizing the degree of differentiation of the tumor tissue, the histological grade is also a factor capable of indicating prognosis of said cancers and survival of patients by predicting tumor behavior: indeed, low-grade (Grade 1) breast cancers tend to show a very good outcome whereas high-grade (Grade 3) breast cancers tend to recur and metastasize early following the diagnosis. Those indications are determinant for the identification of the treatment which should be administered to the patient. Therefore, a patient having a Grade 3 breast cancer will benefit from a much more aggressive treatment than a patient with a Grade 1 breast cancer having a good prognosis. An example of a test run in order to determine the histological grade of a breast cancer is the Nottingham Modification of the Bloom-Richardson system, also known as the Nottingham Grading System (NGS), which has been highly recommended by various professional bodies internationally and which grades breast carcinomas by adding up scores for the three parameters defined previously, classifying breast cancers into three grades: Grade 1, Grade 3 and an intermediate Grade 2.
[0003] However, it appears that 30% to 60% of the tumors are classified as an intermediate histological Grade 2. Said Grade 2 tumors tend to present an intermediate risk of recurrence, and the grade is thus not sufficiently informative for clinical decision making regarding the treatment to assign to the patient. It has been observed that some Grade 2 breast tumors behave like Grade 1 breast tumors and therefore should not need adjuvant chemotherapy, whereas other Grade 2 breast tumors behave like Grade 3 breast tumors needing a much more aggressive treatment. Therefore, in the case of a Grade 2 breast tumor, some patients might receive an aggressive treatment although their breast cancer will behave like a Grade 1 breast tumor, which would not have needed such an aggressive treatment. The use of histological grade for the determination of the prognosis and treatment of breast cancer therefore appears not to be sufficient in the case of Grade 2 breast cancer. In addition to classifying a large proportion of patients in an intermediate grade associated with an intermediate risk of recurrence, the histological grading system is considered to lack reproducibility, especially due to its dependence on tissue handling, fixation and preparation. Said lack of reproducibility may also be caused by variation among the practitioners performing the histological grading, despite the introduction of guidelines for standardization of pre-analytical parameters such as the preparation of the tissue (Rakha et al., Breast Cancer Research, 2010, vol. 12, pp:207).
[0004] In order to increase the prognostic value of tumor grading, new molecular tools have been developed, such as tools based on gene expression profiling combined with the use of algorithms enabling the computation of indices for assessing the diagnosis, grading or prognosis of breast cancer with reference to a historical cohort. The use of such predictive mathematical algorithms and computed indices has increasingly been incorporated into guidelines for diagnostic testing and treatment, and encompasses indices obtained from and validated with, inter alia, multi-stage, stratified samples from a representative population. Moreover, based on the analysis of the expression of selected genes, they make it possible to personalize medicine for the patient and adapt the treatment to the disease and profile of said patient. A personalized treatment based on the patient's genomic profile also makes it possible to avoid useless and expensive treatments which could be ineffective against the patient's disease, and thus to avoid important side effects of such treatments for the patients.
[0005] An example of such a molecular tool is the Gene expression Grade Index, also known as Genomic Grade Index (GGI) and referred to in the present application as "97-gene GGI" or "GGI97", a microarray-based gene expression profiling test providing classification and prognosis of breast cancers, especially estrogen receptor (ER)-positive and node-negative breast cancers, by calculating an index score on the basis of the expression levels of 97 genes. The 97-gene GGI enables classifying breast cancers into two classes, Genomic Grade 1 and Genomic Grade 3, respectively with low and high risk of recurrence, instead of three grades 1, 2 and 3 with a low, intermediate and high risk of recurrence, respectively, in the case of the histological grading system, and therefore can safely spare adjuvant chemotherapy to breast cancer patients presenting an intermediate histological Grade 2 behaving like a Grade 1 breast cancer. Indeed, it has been shown that the gene expression profiles of histological Grade 1 and Grade 3 breast cancers are distinct, whereas histological Grade 2 tumors have heterogeneous gene expression profiles ranging from those for histological Grade 1 to those for histological Grade 3. Intermediate histological Grade 2 tumors can be assigned to two groups with high versus low risks of recurrence similar to those of histological Grade 3 and Grade 1 respectively. It has also been shown by a multivariable analysis that the association between relapse-free survival and gene expression grade was stronger than the association between relapse-free survival and histological grade. The histological grading system based on a three-category classification can therefore be replaced with a two-category gene expression grading system providing a more accurate and medically useful prognosis. (Ignatiadis M et al, Pathobiology. 2008;75(2):104-11. Epub 2008 Jun 10) (Sotiriou et al, February 15, 2006, Journal of the National Cancer Institute, Vol. 98, No. 4, pp: 262-272).
[0006] In order to improve and facilitate the clinical applicability of the known Genomic Grade Index, another test, the PCR-GGI, called herein the "4-gene PCR-GGI" or "Toussaint GGI-PCR", has been developed by Toussaint et al for transposing the 97-gene GGI onto a real-time quantitative Reverse Transcription Polymerase Chain Reaction (qRT-PCR) assay based on a reduced set of genes compared to the 97 genes from GGI, said reduced set of genes comprising 4 genes representative of the GGI and 4 reference genes. The "4-gene PCR-GGI" is capable of reproducing in a reasonably accurate and reproducible manner the grading and prognostic performance of the "97-gene GGI" for estrogen receptor (ER)-positive breast cancers using both frozen and paraffin-embedded (Formalin Fixed Paraffin Embedded, FFPE) tumor samples, said samples being more widely available than the fresh-frozen samples used in the "97-gene GGI". The "4-gene PCR-GGI" has limited prognostic performances compared to the 97-gene GGI (p-value = 0.07 for HG2 (Toussaint et al, BMC Genomics, 2009, vol. 10, pp:424)). The "4-gene PCR-GGI" test also makes it possible to predict the benefit from adjuvant tamoxifen treatment in early breast cancer patients or from first-line tamoxifen in advanced breast cancer patients (Toussaint et al, BMC Genomics, 2009, vol. 10, pp:424).
[0007] Despite the numerous studies and algorithms that have been used to assess the diagnosis, grading and prognosis of breast cancer based on a molecular analysis, a need exists for alternative and accurate methods of determination of such diagnosis, grading or prognosis. Clearly, there remains a need for more practical methods of assessing the diagnosis, grading and prognosis of breast cancer with high prognostic performances. Despite the development of the Genomic Grade Index in 2006 and the improvement of said test with the 4-gene PCR-GGI from Toussaint et al in 2009, there is still a need for a reduced GGI-based signature providing efficient diagnosis, grading and prognosing of breast cancer in a more practical routine lab test.
SUMMARY
[0008] The inventors have developed a new alternative Genomic Grade Index test, herein called "GGI of the invention", "new GGI" or "new GGI-PCR", based on a minimal set of genes and that could recapitulate in an accurate and reproducible manner the grading, diagnosis and prognostic performance of the 97-gene GGI using both frozen and paraffin-embedded tumor samples, to facilitate its use in clinical practice for the diagnosis, grading of a solid tumor and prognosis of a subject suffering from cancer, preferably breast cancer.
[0009] Among the 97 genes from the 97-gene GGI test, the inventors have selected a set of 24 genes, three of which are in common with the 4-gene PCR-GGI, and demonstrated that new combinations of a reduced number of genes, from two genes to 24 genes selected from said set of genes, enable the diagnosis, grading of a solid tumor and prognosis of a subject suffering from cancer, preferably breast cancer, with comparable performances or even a better overall efficiency and practicability compared to the 4-gene and the 97-gene GGI-based methods.
[00010] The inventors have selected a set of 24 genes to meet some performance criteria. The first performance criterion is a good correlation to the 97-gene GGI. The feature selection consists of finding the combination of genes that best portrays the GGI. Initially, a stepwise linear model selection was performed but the best combination was the 97-gene combination. A combination of 10 genes was then looked for. To avoid selection bias, the bootstrap method has been used: 100 selections were done on datasets resampled with replacement. Genes were ordered by selection frequency.
[00011] The second performance criterion was a good prediction of Genomic Grade. The feature selection consists of finding the combination of genes that best predicts the GG (Genomic Grade), a binary variable: GG1 or GG3. Different methodologies have been tested, including a Bayesian approach and stepwise forward selection combined with a probit mixed model. Gene lists and scores have been compared between these different methods (intersection and differences).
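The bootstrap step of paragraph [00010] can be sketched as follows. This is only an illustration under stated assumptions: the per-resample selection rule used here (keeping the genes with the largest absolute correlation to the target score) is a simple stand-in and not the stepwise linear model selection of the source, and the function name, parameters and synthetic data are hypothetical.

```python
import numpy as np

def bootstrap_selection_frequency(X, y, gene_names, n_select=10, n_boot=100, seed=0):
    """Rank genes by how often they are selected across bootstrap resamples.

    X: (samples x genes) expression matrix; y: target score per sample
    (standing in for the 97-gene GGI score).
    Assumption: in each resample, the n_select genes with the largest
    absolute Pearson correlation to y are "selected". The source used a
    stepwise linear model selection instead.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    counts = np.zeros(n_genes, dtype=int)
    for _ in range(n_boot):
        # Resample the samples with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        Xb, yb = X[idx], y[idx]
        Xc = Xb - Xb.mean(axis=0)
        yc = yb - yb.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        corr = np.zeros(n_genes)
        np.divide(Xc.T @ yc, denom, out=corr, where=denom > 0)
        top = np.argsort(-np.abs(corr))[:n_select]
        counts[top] += 1
    # Order genes by selection frequency, most frequent first.
    order = np.argsort(-counts, kind="stable")
    return [(gene_names[i], counts[i] / n_boot) for i in order]

if __name__ == "__main__":
    # Synthetic data: only the first two hypothetical genes drive the score.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 20))
    y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200)
    names = [f"g{i}" for i in range(20)]
    for name, freq in bootstrap_selection_frequency(X, y, names, n_select=2)[:5]:
        print(name, freq)
```

Ordering by selection frequency, rather than trusting a single selection run, is what limits the selection bias mentioned in the text: a gene picked only because of a few particular samples will rarely be picked again once those samples are resampled.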
[00012] The third performance criterion was a good prediction of histological grade. The feature selection consists of finding the combination of genes that best predicts the HG (Histological Grade), a binary variable: HG1 or HG3. The same methodologies have been tested. Gene lists and scores have been compared between these methods (intersection and differences).
[00013] Finally, the fourth performance criterion that was analyzed for the selection of the set of genes is prognostic value (MFS at 5 years and RFS at 10 years). Said prognostic value can be evaluated with two approaches: prognostic value at a defined time, e.g. 5-year MFS or 10-year RFS, or instantaneous risk evaluation. In the first case, the first step was to censor data which did not have enough follow-up and to define categories: event before T years versus event after T years or no event. The variable to explain was binary and the different techniques of feature selection described above were applied. In the second approach, prognostic value has been evaluated using a Cox model (stratified by datasets and ER status) adjusted for age, tumor size, nodal involvement and HER2 status if necessary, with a stepwise forward algorithm.
[00014] These 4 steps have been made in parallel and the selection of genes was finally based on frequency of selections and intersections between the different obtained gene lists.
[00015] In addition to the bio-statistical development of an alternative new GGI of the invention with a large number of combinations of genes selected in a reduced set of genes, the inventors demonstrated in independent cohorts that said new tests enable the genomic grading of breast cancer in concordance with the histological grade in histological grade 1 and 3, enable the classification of intermediate histological grade 2 breast cancer into two groups (Grade 1 and Grade 3) and enable the prognosis of breast cancer.
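The first approach of paragraph [00013], turning follow-up data into a binary variable at a defined time T, can be illustrated with a minimal sketch. The function name and the treatment of an event occurring exactly at T (counted here as "not before T") are assumptions; the source does not specify them.

```python
def binary_endpoint(time_years, event, horizon):
    """Binary T-year endpoint from follow-up data.

    Returns 1 if the event occurred before `horizon` years, 0 if the
    subject was followed for at least `horizon` years without an earlier
    event (event after T years, or no event), and None if the subject had
    no event and insufficient follow-up (excluded from the analysis).
    Assumption: an event exactly at `horizon` is not counted as "before".
    """
    if event and time_years < horizon:
        return 1
    if time_years >= horizon:
        return 0
    return None

if __name__ == "__main__":
    # Hypothetical follow-up records: (time in years, event observed?)
    records = [(3.2, True), (7.0, True), (6.0, False), (2.0, False)]
    for t, e in records:
        print(t, e, binary_endpoint(t, e, horizon=5))
```

Subjects returned as None are the ones with too little follow-up; dropping them before feature selection keeps the binary variable well defined for every remaining sample.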
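The combination step of paragraph [00014], based on selection frequency and on intersections between gene lists, might look like the sketch below. The four example lists are invented for illustration (only the gene symbols come from the document), and the function name is hypothetical.

```python
from collections import Counter

def combine_gene_lists(gene_lists):
    """Summarize several gene lists obtained from different criteria.

    Returns (ranked, counts, common): genes ranked by the number of lists
    they appear in (ties broken alphabetically), the per-gene counts, and
    the set of genes present in every list.
    """
    sets = [set(genes) for genes in gene_lists]
    counts = Counter(gene for s in sets for gene in s)
    common = set.intersection(*sets)
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked, counts, common

if __name__ == "__main__":
    # Hypothetical lists, one per performance criterion.
    lists = [
        ["ASPM", "MCM10", "FRY"],
        ["ASPM", "CX3CR1", "MCM10"],
        ["ASPM", "CCNB2"],
        ["ASPM", "MCM10", "PTTG1"],
    ]
    ranked, counts, common = combine_gene_lists(lists)
    print(ranked, dict(counts), common)
```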
[00016] Another improvement characterizing the new Genomic Grade Index test according to the invention is the use of a reduced number of reference genes for the normalization of the expression levels of the genes to be analyzed, especially compared to the 4-gene PCR-GGI from Toussaint et al. The expression level of the genes can be normalized in order to adjust and improve the accuracy and reliability of expression levels of genes according to the invention. The normalization can be realized with one, two or up to three reference genes displaying uniform expression during various phases of development, across different tissue types, and under different environmental and experimental conditions. The reference genes which can be used according to the invention are selected in the group comprising the genes GUS, TBP and RPLP0, whereas the 4-gene PCR-GGI of Toussaint et al used a normalization with a set of 4 reference genes comprising the three reference genes previously cited and another reference gene, TFRC, which the inventors found to be correlated to the grade of breast cancer and to alter the results. Furthermore, the inventors normalized the primers and probes variability using plasmids and standard curves.
[00017] In addition to the reduced number of genes analyzed in the methods according to the invention, the ability of the practitioner to use a combination of genes of the invention with a Genomic Grade Index test supplies an easier and practical method for diagnosing, grading and prognosing breast cancer in a routine lab test. Indeed, contrary to the 97-gene GGI method, which is based on the use of microarrays on fresh-frozen samples and which is therefore less applicable to routine testing in most hospitals and pathology laboratories due to the process of freezing the sample before testing, the new reduced GGI according to the invention applies to the use of either qRT-PCR methods on fresh-frozen or paraffin-embedded (for example Formalin Fixed Paraffin Embedded, FFPE) samples, or microarray methods on fresh-frozen samples, with accuracy and concordance comparable to, or even improving on, the 97-gene GGI performance, as with the 4-gene PCR-GGI.
[00018] The inventors have therefore demonstrated that the new GGI according to the invention recapitulates in an accurate and reproducible manner the diagnosis, grading and prognostic power of the GGI derived from microarray and of the GGI applied to qRT-PCR of Toussaint et al, by using both fresh-frozen and paraffin-embedded tumor samples, when applied to either a microarray analysis or a qRT-PCR analysis.
[00019] A Genomic Grade Index based on the determination of the expression level of a combination of genes selected in a group of 24 genes therefore supplies the practitioner with reduced signatures providing similar to better efficiencies for the diagnosis, grading and/or prognosis of breast cancer, associated with cost and time reduction for performing the analysis compared to the molecular tools from the prior art.
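As an illustration of normalization with one to three reference genes chosen among GUS, TBP and RPLP0 (paragraph [00016]), a minimal sketch is given below. The document does not state the normalization formula at this point; the delta-Ct convention used here (mean reference Ct minus target Ct, so that a larger value means higher expression) is an assumption, the Ct values are invented, and the plasmid and standard-curve correction mentioned in the text is not modeled.

```python
from statistics import mean

ALLOWED_REFERENCES = ("GUS", "TBP", "RPLP0")

def normalize_ct(ct_values, targets, references=ALLOWED_REFERENCES):
    """Normalize target-gene Ct values against 1 to 3 reference genes.

    Assumption: normalized value = mean(reference Ct) - target Ct.
    This is a common qRT-PCR convention, not a formula taken from the source.
    """
    if not 1 <= len(references) <= 3:
        raise ValueError("use one, two or three reference genes")
    unknown = set(references) - set(ALLOWED_REFERENCES)
    if unknown:
        raise ValueError(f"unexpected reference genes: {sorted(unknown)}")
    ref_ct = mean(ct_values[gene] for gene in references)
    return {gene: ref_ct - ct_values[gene] for gene in targets}

if __name__ == "__main__":
    # Hypothetical raw Ct values for the 3-gene signature and the references.
    raw = {"ASPM": 28.0, "CX3CR1": 30.5, "MCM10": 29.0,
           "GUS": 25.0, "TBP": 27.0, "RPLP0": 20.0}
    print(normalize_ct(raw, ["ASPM", "CX3CR1", "MCM10"]))
    print(normalize_ct(raw, ["ASPM"], references=("TBP",)))
```

TFRC is deliberately absent from the allowed references, reflecting the statement that it was found to be correlated to the grade of breast cancer.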
[00020] The invention therefore relates to methods for the diagnosis, determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer, preferably breast cancer based on the analysis of the expression level of genes in a biological sample from a subject and the determination of said diagnosis, grading or prognosis by using algorithms as described hereunder.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1. Kaplan-Meier estimates of 5-years MFS, according to genomic grade, as determined using the 24-genes, 6-genes and 3-genes signatures, in comparison with GGI97 (MQDX)
Figure 2. Kaplan-Meier estimates of 10-years MFS, according to genomic grade, as determined using the 24-genes, 6-genes and 3-genes signatures, in comparison with GGI97 (MQDX)
Figure 3. Comparison of the 5-years MFS between a 3-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 patients.
Figure 4. Comparison of the 5-years MFS between a 3-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
Figure 5. Comparison of the 10-years MFS between a 3-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 patients.
Figure 6. Comparison of the 10-years MFS between a 3-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
Figure 7. Comparison of the 5-years MFS between a 6-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 patients.
Figure 8. Comparison of the 5-years MFS between a 6-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
Figure 9. Comparison of the 10-years MFS between a 6-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 patients.
Figure 10. Comparison of the 10-years MFS between a 6-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
Figure 11. Comparison of the 5-years MFS between a 24-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 patients.
Figure 12. Comparison of the 5-years MFS between a 24-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
Figure 13. Comparison of the 10-years MFS between a 24-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 patients.
Figure 14. Comparison of the 10-years MFS between a 24-genes signature according to the invention and the 4-genes PCR-GGI (OLD GGI-PCR) in HG2 ER+ patients.
Figure 15. 5-years MFS using the classifiers SVMr, SVMl and Sum for the 24-genes signature
Figure 16. 10-years MFS using the classifiers SVMr, SVMl and Sum for the 24-genes signature
Figure 17. 5-years MFS using the classifiers SVMr, SVMl and Sum for the 6-genes signature
Figure 18. 10-years MFS using the classifiers SVMr, SVMl and Sum for the 6-genes signature
Figure 19. 5-years MFS using the classifiers SVMr, SVMl and Sum for the 3-genes signature
Figure 20. 10-years MFS using the classifiers SVMr, SVMl and Sum for the 3-genes signature
Figure 21. Effect of the classification of HG2 patients in GG1, GG3 or Equivocal class.
Figure 22. Comparison of DRFI for HG1, HG3 and HG2 classified in GG1, GG3 and Equivocal by Kaplan Meier - all pts
Figure 23. Comparison of DRFI for HG1, HG3 and HG2 classified in GG1, GG3 and Equivocal by Kaplan Meier - N0 pts
Figure 24. Comparison of DRFI for HG1, HG3 and HG2 classified in GG1, GG3 and Equivocal by Kaplan Meier - N0/N1-3 pts
DETAILED DESCRIPTION OF THE INVENTION
I- Definitions
[00021] As used herein, a "subject" in the context of the invention refers to a mammal, therefore including rodents, felines, primates, cows, horses or canines, but is not limited to these examples. In a preferred embodiment, a subject according to the invention is a human. In a preferred embodiment, a subject according to the invention is a human having a solid tumor cancer. In a preferred embodiment, a subject according to the invention is a human having a breast cancer. In a still preferred embodiment, a subject according to the invention is a human having an estrogen-receptor (ER)-positive and node-negative breast cancer. Alternatively, a subject can also be one who has not been previously diagnosed as having an estrogen-receptor (ER)-positive and node-negative breast cancer.
[00022] The terms "cancer" and "cancerous" refer to or describe the physiological condition in mammals that is typically characterized by unregulated cell growth.
[00023] The term "estrogen-receptor (ER)-positive" breast cancer is used herein to refer to cancer that has receptors for the estrogen hormone.
[00024] The term "node negative" breast cancer is used herein to refer to cancer with a maximal number of three invaded lymph nodes.
[00025] A "biological sample" according to the invention is a biological sample isolated from a subject and can include, by way of example and not limitation, a tissue sample, a fluid sample such as for example lymphatic fluid, ascites fluid, interstitial fluid, bone marrow, cerebrospinal fluid (CSF), saliva, mucous, sputum, sweat, urine, or any other secretion, excretion, or other bodily fluids, or a cell sample such as blood cells, endothelial cells, a blood sample referring to whole blood or any fraction thereof, including blood cells, serum and plasma and the like from said subject, preferably from the breast of said subject. In a preferred embodiment, a biological sample is a tissue sample or a cell sample from the breast. In the case of the determination of the grade of a solid tumor in a subject having a cancer, a biological sample according to the invention is a solid tumor biological sample. In a preferred embodiment, a breast tumor biological sample is a breast tumor biopsy or a postoperative sample. In a still preferred embodiment, a breast tumor biological sample is a fresh, fresh-frozen or paraffin-embedded sample.
[00026] Any means of sampling from a subject, for example, by tissue smear or scrape, or tissue biopsy can be used to obtain a sample. Thus, the sample can be a biopsy specimen (e.g., tumor, polyp, mass (solid, cell)), aspirate or smear sample or a blood sample. In a preferred embodiment, the sample is a tissue from a breast that has a tumor (e.g., cancerous growth) and/or tumor cells. For example, a tumor biopsy can be obtained in an open biopsy, a procedure in which an entire (excisional biopsy) or partial (incisional biopsy) mass is removed from a target area. Alternatively, a tumor sample can be obtained through a percutaneous biopsy, a procedure performed with a needle-like instrument through a small incision or puncture (with or without the aid of an imaging device) to obtain individual cells or clusters of cells (e.g., a fine needle aspiration (FNA)) or a core or fragment of tissues (core biopsy). The biopsy samples can be examined cytologically (e.g., smear), histologically (e.g., frozen or paraffin section) or using any other suitable method (e.g., molecular diagnostic methods). A tumor sample can also be obtained by in vitro harvest of cultured human cells derived from an individual's tissue.
[00027] Biological samples can, if desired, be stored before analysis by suitable storage means that preserve a sample's protein and/or nucleic acid in an analyzable condition, such as quick freezing, or a controlled freezing regime. If desired, freezing can be performed in the presence of a cryoprotectant, for example, dimethyl sulfoxide (DMSO), glycerol, or propanediol-sucrose. Biological samples can also be fixed by using a chemical fixative in order to be embedded in paraffin for further analysis. Fixatives which can be used for such purpose include, without limitation, formalin (i.e. a formaldehyde solution), buffered formalin, formalin-alcohol, alcohol-formalin-acetic acid solution (AFA) or formalin-sodium acetate solution. Biological samples can be pooled, as appropriate, before or after storage for purposes of analysis.
[00028] As used herein, the term "tumor" refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.
[00029] In the context of the present invention, the terms "at least two," "at least three," etc. in reference to the genes listed in any particular gene set mean any one or any and all combinations of the genes listed.
[00030] The term "genes" refers to a polynucleotide sequence, e.g., isolated, such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The sequences of the genes may be the sequences as listed in Table A and Table B or any complement sequence. This sequence may be the complete sequence of the gene, or a fragment of the gene which would also be suitable to perform the method of the analysis according to the invention. A person skilled in the art may choose the position and length of the gene by applying routine experiments. The term should also be understood to include, as equivalents, analogs of RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides. ESTs, chromosomes, cDNAs, mRNAs and rRNAs are representative examples of molecules that may be referred to as nucleic acids. DNA may be obtained from said nucleic acids sample and RNA may be obtained by transcription of said DNA. In addition, mRNA may be isolated from said nucleic acids sample and cDNA may be obtained by reverse transcription of said mRNA.
[00031] Genes according to the invention can be selected from a group consisting of the 24 genes listed in Table A and the 3 reference genes listed in Table B.
[00032] Table A. 24 genes
| Gene symbol | Gene name | Accession number (nucleic acid sequence) | Nucleic acid sequence (SEQ ID N°) | Accession number (amino acid sequence) | Amino acid sequence (SEQ ID N°) |
|---|---|---|---|---|---|
| ASPM | asp (abnormal spindle) homolog, microcephaly associated (Drosophila) | NM_018136 | SEQ ID N°1 | NP_060606 | SEQ ID N°2 |
| AURKA | aurora kinase A | NM_198433 | SEQ ID N°3 | NP_940835 | SEQ ID N°4 |
| BIRC5 | baculoviral IAP repeat containing protein 5 | NM_001012271 | SEQ ID N°5 | NP_001012271 | SEQ ID N°6 |
| CCNA2 | cyclin A2 | NM_001237 | SEQ ID N°7 | NP_001228 | SEQ ID N°8 |
| CCNB2 | cyclin B2 | NM_004701 | SEQ ID N°9 | NP_004692 | SEQ ID N°10 |
| CDC2 | cyclin-dependent kinase 1 | NM_001786 | SEQ ID N°11 | NP_001777 | SEQ ID N°12 |
| CDC20 | cell division cycle 20 homolog (S. cerevisiae) | NM_001255 | SEQ ID N°13 | NP_001246 | SEQ ID N°14 |
| CDCA3 | cell division cycle associated 3 | NM_031299 | SEQ ID N°15 | NP_112589 | SEQ ID N°16 |
| CENPA | centromere protein A | NM_001809 | SEQ ID N°17 | NP_001800 | SEQ ID N°18 |
| CEP55 | centrosomal protein 55kDa | NM_018131 | SEQ ID N°19 | NP_060601 | SEQ ID N°20 |
| CX3CR1 | chemokine (C-X3-C motif) receptor 1 | NM_001337 | SEQ ID N°21 | NP_001328 | SEQ ID N°22 |
| FLJ21062 | chromosome 7 open reading frame 63 | NM_001039706 | SEQ ID N°23 | NP_001034795 | SEQ ID N°24 |
| FRY | furry homolog (Drosophila) | NM_023037 | SEQ ID N°25 | NP_075463 | SEQ ID N°26 |
| KIF11 | kinesin family member 11 | NM_004523 | SEQ ID N°27 | NP_004514 | SEQ ID N°28 |
| KPNA2 | karyopherin alpha 2 (RAG cohort 1, importin alpha 1) | NM_002266 | SEQ ID N°29 | NP_002257 | SEQ ID N°30 |
| MCM10 | minichromosome maintenance complex component 10 | NM_182751 | SEQ ID N°31 | NP_877428 | SEQ ID N°32 |
| MELK | maternal embryonic leucine zipper kinase | NM_014791 | SEQ ID N°33 | NP_055606 | SEQ ID N°34 |
| PTTG1 | pituitary tumor-transforming 1 | NM_004219 | SEQ ID N°35 | NP_004210 | SEQ ID N°36 |
| RACGAP1 | Rac GTPase activating protein 1 | NM_001126103 | SEQ ID N°37 | NP_001119575 | SEQ ID N°38 |
| TPT1 | tumor protein, translationally-controlled 1 | NM_003295 | SEQ ID N°39 | NP_003286 | SEQ ID N°40 |
| TPX2 | TPX2, microtubule-associated, homolog (Xenopus laevis) | NM_012112 | SEQ ID N°41 | NP_036244 | SEQ ID N°42 |
| TROAP | trophinin associated protein (tastin) | NM_005480 | SEQ ID N°43 | NP_005471 | SEQ ID N°44 |
| TUBA1B | tubulin, alpha 1b | NM_006082 | SEQ ID N°45 | NP_006073 | SEQ ID N°46 |
| UBE2C | ubiquitin-conjugating enzyme E2C | NM_181802 | SEQ ID N°47 | NP_861518 | SEQ ID N°48 |

[00033] Table B. 3 reference genes
[Table B is reproduced as an image in the original publication (imgf000012_0001).]
[00034] As used herein, "gene expression" refers to the translation of information encoded in a gene into a gene product (e.g., RNA, protein). Expressed genes include genes that are transcribed into RNA that is subsequently translated into protein, as well as genes that are transcribed into non-coding functional RNA molecules that are not translated into protein (e.g., transfer RNA (tRNA), ribosomal RNA (rRNA), microRNA, ribozymes). Gene expression can be monitored by measuring the levels of either the entire RNA or protein products of the gene or fragments thereof. For the methods according to the invention, gene expression can be assessed in a biological sample from a subject.
[00035] "Level of expression" or "expression level" refers to the level (e.g. the amount) of one or more products (e.g., mRNA, protein) encoded by a given gene in a sample or reference standard.
[00036] The terms "differentially expressed gene", "differential gene expression" and their synonyms, which are used interchangeably, refer to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically breast cancer, relative to its expression in a normal or control subject. The terms also include genes whose expression is activated to a higher or lower level at different stages or different grades of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disease, specifically breast cancer, or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages.
[00037] As used herein, the term "over-expression" in reference to a gene occurs when the transcription and/or the translation of the gene leads to an expression level in a biological sample that is at least 10% superior to the level of expression of said gene in a control sample, preferably at least 50% superior to the level of expression of said gene in the control sample, and most preferably at least 100% superior to the level of expression of said gene in the control sample.
[00038] As used herein, the term "under-expression" in reference to a gene occurs when the transcription and/or the translation of the gene leads to an expression level in a biological sample that is at least 10% inferior to the level of expression of said gene in a control sample, preferably at least 50% inferior to the level of expression of said gene in the control sample, and most preferably at least 100% inferior to the level of expression of said gene in the control sample.
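The over- and under-expression definitions above amount to a relative-difference test against a control level. The following is a minimal sketch of that test, not the method of the invention: the function name `classify_expression`, the "unchanged" label and the example values are hypothetical, and it assumes expression levels are positive quantities on a linear scale.

```python
def classify_expression(sample_level, control_level, threshold_pct=10.0):
    """Label a gene relative to a control using a percentage threshold.

    threshold_pct is the minimal relative difference in percent; the text
    names 10% (minimum), 50% (preferred) and 100% (most preferred).
    Assumption: levels are positive and on a linear (non-log) scale.
    """
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    # Relative difference of the sample versus the control, in percent.
    change_pct = (sample_level - control_level) / control_level * 100.0
    if change_pct >= threshold_pct:
        return "over-expressed"
    if change_pct <= -threshold_pct:
        return "under-expressed"
    return "unchanged"


# Hypothetical values for illustration only.
print(classify_expression(1.6, 1.0, threshold_pct=50.0))
print(classify_expression(0.85, 1.0))
```

The same function covers the three tiers by changing `threshold_pct`; data on a log scale would need to be converted back first.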
[00039] A "control" as used herein corresponds to one or more biological samples from a cell, a tissue sample or a biopsy from a tissue, preferably from the breast. Preferably, said control comprises non-tumoral cells, still preferably normal breast tissues. Said control may be obtained from the same subject as the one to be tested or from another subject, preferably from the same species, or from a population of subjects, preferably from the same species, that may be the same as or different from the test subject. In another embodiment, said control may correspond to a biological sample from a cell line, a tissue sample or a biopsy from a solid tumor, preferably from breast cancer, and can be referred to as a reference sample.
[00040] According to a specific embodiment, the over- or under-expression of a specific gene can be validated by comparing the expression of said gene in the biological sample to a grade 1 and/or grade 3 reference sample(s). Said over- or under-expression is confirmed if the corresponding expression level of said gene differs by less than 50%, preferably less than 25% and most preferably less than 10% from the corresponding over- or under-expression level of said gene in said grade 1 or grade 3 reference sample.
[00041] As used herein, the "grade" of a cancer is a system used to classify cancer cells. By informing on the aggressiveness of a tumor, it contributes to defining the long-term prognosis and the treatment process. The grade according to the invention is a genomic grade (GG), i.e. said grade is determined on the basis of the expression levels of the genes. The grade determined according to the invention can be assigned as Grade 1 (GG1), i.e. a "low grade", or Grade 3 (GG3), i.e. a "high grade". A Genomic Grade 1 is indicative of a "good-prognosis", whereas a Genomic Grade 3 is indicative of a "poor-prognosis".
[00042] The term "prognosis" relates to an individual assessment of the malignancy of a tumor, i.e. the prediction of the likelihood of cancer-attributable death or progression of a cancer including the risk of recurrence, metastatic spread or drug resistance, or to the expected survival rate of the subject such as the overall survival (OS), the disease free survival (DFS), the metastasis-free survival (MFS), the relapse-free survival (RFS) or the Distant Recurrence-Free Interval (DRFI) as defined in Hudis et al, Journal of Clinical Oncology, vol. 25, n°15, 2007. Herein, the term DRFI refers to the time from random assignment or registration until invasive recurrence at a distant site, or death from breast cancer.
[00043] "Metastasis" refers to cancer cells that have spread from the original (i.e. primary) tumor to distant organs or distant lymph nodes.
[00044] A "relapse" refers to the development of a new breast tumor after the remission of the cancer, preferably the breast cancer.
[00045] A "high-risk" of recurrence means the subject is expected to have a cancer, preferably a breast cancer, relapse or metastasis in less than 10 years, preferably in less than 5 years.
[00046] A "low-risk" of recurrence means the subject is expected to have no cancer, preferably breast cancer, relapse or metastasis within 5 years, preferably within 10 years.
[00047] A "good-prognosis" according to the invention indicates that the patient afflicted with cancer, preferably breast cancer, is expected to have no distant metastases within 5 years, preferably 10 years, of initial diagnosis of cancer, i.e. a metastasis-free survival (MFS) or a relapse-free survival (RFS) superior to 5 years, preferably superior to 10 years. A "good-prognosis" according to the invention corresponds to a Metastasis-Free survival (MFS) superior to 5 years, preferably 10 years, or a long-term survival.
[00048] A "poor-prognosis" according to the invention indicates that the patient afflicted with cancer, preferably breast cancer, is expected to have some distant metastases within 10 years, preferably within 5 years, of initial diagnosis of cancer, i.e. a metastasis-free survival (MFS) or a relapse-free survival (RFS) inferior to 10 years, preferably 5 years. A "poor-prognosis" according to the invention corresponds to a MFS inferior to 10 years, preferably 5 years, or to the absence of a long-term survival.
[00049] The term "long-term survival" is used herein to refer to survival for at least 5 years, preferably for at least 8 years, most preferably for at least 10 years following surgery or other treatment.
[00050] The terms "formula," "classifier" and "model" are used interchangeably for any mathematical equation, algorithmic, analytical or programmed process, or statistical technique that takes one or more continuous or categorical inputs, also called parameters, explanatory variables or predictor characteristics, and calculates an output value, sometimes referred to as an index, an index value, a categorical response associated or not with a belonging probability and/or the predicted class of the sample.
[00051] The number of algorithm families (techniques) is very large. Non-limiting examples of "formulas" include sums, ratios, and regression operators, such as coefficients or exponents, gene value transformations and normalizations, rules and guidelines, statistical classification models, and neural networks trained on historical populations. Of particular use for the expression level of genes are linear and non-linear equations and statistical classification analyses to determine the relationship between expression levels of genes detected in a subject sample and the subject's diagnosis or prognosis.
[00052] Of particular interest are structural and syntactic statistical classification algorithms, and methods of risk index construction, utilizing pattern recognition features, including established techniques such as Principal Component Analysis (PCA), factor rotation, Logistic Regression (LogReg), Linear Discriminant Analysis (LDA), Eigengene Linear Discriminant Analysis (ELDA), Support Vector Machines (SVM), Random Forest (RF), Recursive Partitioning Tree (RPART), as well as other related decision tree classification techniques, Shrunken Centroids (SC), Kth-Nearest Neighbor, Boosting, Decision Trees, Neural Networks, Bayesian Networks, and Hidden Markov Models, Linear Regression or classification algorithms, Nonlinear Regression or classification algorithms, analysis of variance (ANOVA), hierarchical analysis or clustering algorithms; hierarchical algorithms using decision trees; kernel based machine algorithms such as kernel partial least squares algorithms, kernel matching pursuit algorithms, kernel Fisher's discriminant analysis algorithms, or kernel principal components analysis algorithms, among others. The resulting predictive models may be validated in other studies, or cross-validated in the study they were originally trained in, using such techniques as Leave-One-Out (LOO) and 10-Fold cross-validation (10-Fold CV).
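The two cross-validation schemes named above differ only in how the samples are partitioned into training and test sets. A minimal sketch of that partitioning follows; the function names are hypothetical, samples are referred to by index, and no shuffling or stratification is applied, which a real study would likely add.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Folds are contiguous and differ in size by at most one sample.
    Assumption: the caller has already randomised the sample order.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_splits(n_samples):
    """Leave-One-Out is k-fold cross-validation with k equal to n_samples."""
    return k_fold_splits(n_samples, n_samples)


# Example: 10-fold CV on 25 samples; each sample is tested exactly once.
for train, test in k_fold_splits(25, 10):
    print(len(train), len(test))
```

Each model is trained on `train` and scored on `test`; averaging the per-fold scores gives the cross-validated estimate.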
[00053] A "GGI Formula" is a formula developed as described herein and used to calculate an output from inputs comprising the results from analysis of a biological sample comprising determining the expression level of genes as described herein. A GGI Formula is the preferred means for calculating an output according to the invention.
[00054] As used herein, the term "Agreement" or "concordance" is defined as the percentage of well predicted samples.
[00055] The Histological Grade 1 (HG1)/Genomic Grade 1 (GG1) agreement (agG1) corresponds to the percentage of HG1 samples that are correctly identified as GG1. The agG1 is calculated as follows:
$$\mathrm{agG1}(\%) = \frac{a1}{a1 + a3} \times 100$$
wherein a1 corresponds to the number of HG1 samples that are identified as GG1, i.e. samples that have been correctly classified; and wherein a3 corresponds to the number of HG1 samples that are identified as GG3, i.e. samples that have been incorrectly classified.
[00056] The Histological Grade 3 (HG3)/Genomic Grade 3 (GG3) agreement (agG3) corresponds to the percentage of Histological Grade 3 samples that are identified as Genomic Grade 3. The agG3 is calculated as follows:
$$\mathrm{agG3}(\%) = \frac{b3}{b1 + b3} \times 100$$
wherein b1 corresponds to the number of HG3 samples that are identified as GG1, i.e. samples that have been incorrectly classified, and wherein b3 corresponds to the number of HG3 samples that are identified as GG3, i.e. samples that have been correctly classified.
Table C. Variables for the calculation of the Agreement
| | Identified as GG1 | Identified as GG3 |
|---|---|---|
| HG1 samples | a1 | a3 |
| HG3 samples | b1 | b3 |
[00057] As used herein, the Overall agreement corresponds to the general performance of the model. It is calculated as follows:
$$\text{Overall agreement}(\%) = \frac{a1 + b3}{a1 + a3 + b1 + b3} \times 100$$
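The three agreement measures can be computed directly from the four counts a1, a3, b1 and b3 defined above. The sketch below follows those definitions: agG1 is the share of HG1 samples called GG1, agG3 the share of HG3 samples called GG3, and the overall agreement the share of all samples classified in line with their histological grade. The function name and the example counts are hypothetical.

```python
def agreement_metrics(a1, a3, b1, b3):
    """Return (agG1, agG3, overall agreement) in percent.

    a1: HG1 samples identified as GG1 (correct)
    a3: HG1 samples identified as GG3 (incorrect)
    b1: HG3 samples identified as GG1 (incorrect)
    b3: HG3 samples identified as GG3 (correct)
    """
    if a1 + a3 == 0 or b1 + b3 == 0:
        raise ValueError("both histological grades need at least one sample")
    ag_g1 = 100.0 * a1 / (a1 + a3)
    ag_g3 = 100.0 * b3 / (b1 + b3)
    overall = 100.0 * (a1 + b3) / (a1 + a3 + b1 + b3)
    return ag_g1, ag_g3, overall


# Hypothetical counts, for illustration only.
ag_g1, ag_g3, overall = agreement_metrics(a1=45, a3=5, b1=10, b3=40)
print(ag_g1, ag_g3, overall)  # 90.0 80.0 85.0
```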
[00058] "Performance" is a term that relates to the overall usefulness and quality of a diagnostic or prognostic test, including, among others, clinical and analytical accuracy, other analytical and process characteristics, such as use characteristics (e.g., stability, ease of use), health economic value, and relative costs of components of the test. Any of these factors may be the source of superior performance and thus usefulness of the test.
[00059] By "statistically significant", it is meant that the alteration is greater than what might be expected to happen by chance alone. Statistical significance can be determined by any method known in the art. Commonly used measures of significance include the p-value, which indicates the probability that an observation has arisen by chance alone. A result is often considered highly significant at a p-value of 0.05 or less, representing a 5% or less chance that the observation of interest arose by chance. Such p-values depend significantly on the power of the study performed.
[00060] "Clinical parameters" or "CPs" encompasses all non-sample or non-analyte expression levels of genes of subject health status or other characteristics, such as, without limitation, age (AGE), race or ethnicity (RACE), gender (SEX), family history (FX).
[00061] "DNA arrays" consist of large numbers of DNA molecules or DNA fragments, herein designated "probes", spotted in a systematic order on a solid support or substrate such as a nylon membrane, glass slide, glass beads or a silicon or ceramic chip.
[00062] In the present invention, the term "polynucleotide" refers to a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. A polynucleotide in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA. The polynucleotide sample isolated from the subject and obtained at step (a) is RNA, preferably mRNA. Said polynucleotide sample isolated from said subject can also correspond to cDNA obtained by Reverse Transcription of the mRNA, or a product of ligation after specific hybridization of specific probes to mRNA or cDNA.
[00063] According to the invention, the term "immobilized on a support" means bound directly or indirectly thereto including attachment by covalent binding, hydrogen binding, ionic interaction, hydrophobic interaction or otherwise.
[00064] As used herein, the term "kit" refers to any delivery system for delivering materials. In the context of reaction assays, such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., oligonucleotides, enzymes, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials.
[00065] As used herein, the term "fragmented kit" refers to delivery systems comprising two or more separate containers that each contain a subportion of the total kit components. The containers may be delivered to the intended recipient together or separately. For example, a first container may contain an enzyme for use in an assay, while a second container contains oligonucleotides. The term "fragmented kit" is intended to encompass kits containing analyte specific reagents (ASRs) regulated under section 520(e) of the Federal Food, Drug, and Cosmetic Act, but is not limited thereto. Indeed, any delivery system comprising two or more separate containers that each contain a subportion of the total kit components is included in the term "fragmented kit."
[00066] In contrast, a "combined kit" refers to a delivery system containing all of the components of a reaction assay in a single container (e.g., in a single box housing each of the desired components). The term "kit" includes both fragmented and combined kits.
[00067] As used herein, a "diagnostic system" is any system capable of carrying out the methods of the invention. Computing systems, environments, and/or configurations that may be suitable for use with the methods or system of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
[00068] A machine-readable storage medium can comprise a data storage material encoded with machine readable data or data arrays which, when using a machine programmed with instructions for using said data, is capable of use for a variety of purposes, such as, without limitation, subject information relating to breast cancer or in response to breast cancer drug therapies, drug discovery, and the like. Measurements of the expression levels of genes of the invention and/or the resulting diagnosis or prognosis from those genes can be implemented in computer programs executing on programmable computers, comprising, inter alia, a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code can be applied to input data to perform the functions described above and generate output information. The output information can be applied to one or more output devices, according to methods known in the art. The computer may be, for example, a personal computer, microcomputer, or workstation of conventional design.
[00069] Each program can be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. The language can be a compiled or interpreted language. Each such computer program can be stored on a storage medium or device (e.g., ROM or magnetic diskette or others as defined elsewhere in this disclosure) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein.
The health-related data management system of the invention may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform various functions described herein. Expression levels of genes can then be determined and compared to a reference value, e.g. a control subject or population whose breast cancerous state is known or an index value or baseline value.
[00070] The reference sample or index value or baseline value may be taken or derived from one or more subjects who have been diagnosed with breast cancer, one or more subjects whose breast cancer has been histologically graded, whose prognosis has been determined and/or who have been exposed to a treatment. Alternatively, the reference sample or index value or baseline value may be taken or derived from one or more subjects who have not been exposed to the treatment. A reference value can also comprise a value derived from algorithms or computed indices from population studies such as those disclosed herein.
[00071] The steps of the methods and systems according to the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the methods or system of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like, including those systems, environments, configurations and means described elsewhere within this disclosure.
[00072] The steps of the methods and systems according to the invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The methods and apparatus may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In both integrated and distributed computing environments, program modules may be located in both local and remote computer storage media including memory storage devices.
II- The invention
[0185] The present invention provides methods for the diagnosis, determination of the grade of a solid tumor, and for the prognosis of a subject suffering from a cancer by using an algorithmic analysis of genes in a biological sample from the subject.
[00073] Algorithms are typically deterministic functions that map a multi-dimensional vector of biological measurements such as the expression level of genes to a binary (or n-ary) outcome variable that encodes the absence or existence of a clinically-relevant class, phenotype, distinct physiological state or distinct state of disease. Such algorithms include any of a variety of statistical analyses used to determine relationships between variables.
[00074] The process of building or learning a classifier involves two steps: (1) selection of a family of functions that can approximate the system's response, and (2) using a finite sample of observations (training data) to select a function from the family of functions that best approximates the system's response by minimizing the discrepancy or expected loss between the system's response and the function predictions at any given point. Depending on the chosen feature selection strategy, the combination of the different data (clinical data, mRNA, microRNA, metabolites, proteins) can take place before or after feature selection. The combined data is then used as input to train and validate the classifier. However, it is also possible to train several different classifiers for the different data separately and then combine the classifiers into the predictive signature. As the data types may be very different, from qualitative/categorical to quantitative/numerical, not all classifiers may work for such multilevel data; e.g., some classifiers accept only quantitative data. Hence, depending on the data types, one has to choose a class of functions for classification which has an appropriate domain. Numerous feature selection strategies for classification have been proposed; for a comprehensive survey see e.g. M. A. Hall and G. Holmes, Benchmarking Attribute Selection Techniques for Discrete Class Data Mining.
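The two-step view of learning described above can be illustrated with the simplest possible family of functions: single cut-off rules on one score. The sketch below is illustrative only and is not a classifier of the invention. The family "predict class 1 when score > t" is fixed first (step 1), then the cut-off t with the fewest training errors is selected (step 2). The function name, the 0-1 loss and the example data are assumptions.

```python
def fit_threshold(scores, labels):
    """Select the cut-off t minimising training errors for 'score > t -> 1'.

    scores: one numeric score per sample (e.g. a sum of expression levels)
    labels: 0 or 1 per sample (e.g. 0 for grade 1, 1 for grade 3)
    """
    values = sorted(set(scores))
    # Candidate cut-offs: one below all scores, the midpoints between
    # consecutive distinct scores, and one above all scores.
    candidates = [values[0] - 1.0]
    candidates += [(lo + hi) / 2.0 for lo, hi in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    def training_errors(t):
        # 0-1 loss: count samples whose prediction disagrees with the label.
        return sum((s > t) != bool(y) for s, y in zip(scores, labels))

    return min(candidates, key=training_errors)


# Hypothetical training data, for illustration only.
scores = [0.2, 0.5, 0.9, 1.4, 1.8, 2.3]
labels = [0, 0, 0, 1, 1, 1]
t = fit_threshold(scores, labels)
print(t)  # a value between 0.9 and 1.4
```

Richer families (linear combinations, kernels, trees) follow the same pattern with a larger search space and usually a smoother loss.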
[00075] To achieve this classification, various well-known methods such as, but not limited to, cross-, Principal Components Analysis (PCA), factor rotation, Logistic Regression (LogReg), (diagonal) linear or quadratic discriminant analysis (Linear Discriminant Analysis (LDA), QDA, DLDA, DQDA), Eigengene Linear Discriminant Analysis (ELDA), Support Vector Machines (SVM), Random Forest (RF), Recursive Partitioning Tree (RPART), related decision tree classification techniques, partitioning around medoids (PAM), self organizing maps (SOM), perceptron, Shrunken Centroids (SC), Kth-Nearest Neighbor, K-nearest neighbor classifiers (K-NN), Boosting, Bagging, Decision Trees, Neural Networks, Bayesian Networks, and Hidden Markov Models, Linear Regression or classification algorithms, Nonlinear Regression or classification algorithms, analysis of variance (ANOVA), generalized partial least squares (GPLS), hierarchical analysis or clustering algorithms; hierarchical algorithms using decision trees; kernel based machine algorithms such as kernel partial least squares algorithms, kernel matching pursuit algorithms, kernel Fisher's discriminant analysis algorithms, or kernel principal components analysis algorithms, or other mathematical and statistical methods can be used to develop a Formula for calculation of an output correlated with or indicating the diagnosis or prognosis of breast cancer.
[00076] Although various preferred formulas are described here, several other model and formula types beyond those mentioned herein and in the definitions above are well known to one skilled in the art. The actual model type or formula used may itself be selected from the field of potential models based on the performance and diagnostic accuracy characteristics of its results in a training population.
[00077] The specifics of the formula itself may commonly be derived from the histological grade results, the clinical parameters and/or the expression level of genes in the relevant training population.
[00078] Amongst other uses, such formula may be intended to map the feature space derived from the expression level of genes inputs to a set of subject classes (e.g. useful in predicting class membership of subjects as normal or subject having a breast cancer, etc), to derive an estimation of a probability function of risk using a Bayesian approach, or to estimate the class-conditional probabilities, then use Bayes' rule to produce the class probability function as in the previous case.
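The second route mentioned above (estimate class-conditional probabilities, then apply Bayes' rule) can be sketched as follows. This is only an illustration under stated assumptions: a single gene, Gaussian class-conditional densities, and the means, standard deviations and priors are all hypothetical values that do not come from the invention.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Normal density, used here as an assumed class-conditional model."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def class_posteriors(x, class_models, priors):
    """Bayes' rule: P(class | x) is proportional to p(x | class) * P(class)."""
    joint = {c: gaussian_pdf(x, *class_models[c]) * priors[c] for c in class_models}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

# Hypothetical (mean, sd) of one gene's expression level in each class.
models = {"normal": (0.0, 1.0), "breast_cancer": (2.0, 1.0)}
priors = {"normal": 0.5, "breast_cancer": 0.5}

posteriors_mid = class_posteriors(1.0, models, priors)   # halfway between the class means
posteriors_high = class_posteriors(2.0, models, priors)  # at the cancer-class mean
```

With equal priors and equal spreads, a value halfway between the two class means yields equal class probabilities, and the probability of the cancer class rises as the expression level moves toward that class's mean.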
[00079] Preferred formulas include the broad class of statistical classification algorithms, and in particular the use of discriminant analysis. The goal of discriminant analysis is to predict class membership from a previously identified set of features. In the case of linear discriminant analysis (LDA), the linear combination of features is identified that maximizes the separation among groups by some criteria. Features can be identified for LDA using an eigengene based approach with different thresholds (ELDA) or a stepping algorithm based on a multivariate analysis of variance (MANOVA). Forward, backward, and stepwise algorithms can be performed that minimize the probability of no separation based on the Hotelling-Lawley statistic.
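As an illustration of the linear combination that LDA identifies, the following minimal sketch fits a two-class Fisher discriminant on two features. It is not the procedure of the invention: the data are made up, only two features are handled so that the 2x2 matrix can be inverted by hand, and the midpoint threshold assumes equal class priors.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the class mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_lda(class0, class1):
    """Fisher direction w = Sw^-1 (m1 - m0) and a midpoint threshold."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    threshold = 0.5 * (dot(w, m0) + dot(w, m1))  # assumes equal priors
    return w, threshold

def predict(x, w, threshold):
    return 1 if dot(w, x) > threshold else 0

# Hypothetical expression levels of two genes in two groups of samples.
group0 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
group1 = [(5.0, 6.0), (6.0, 5.0), (6.0, 7.0), (7.0, 6.0)]
w, threshold = fit_lda(group0, group1)
```

A new sample is then assigned to a group by projecting it on `w` and comparing the projection with `threshold`.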
[00080] Eigengene-based Linear Discriminant Analysis (ELDA) is a feature selection technique developed by Shen et al. (2006). The formula selects features (e.g. the expression level of genes) in a multivariate framework using a modified eigen analysis to identify features associated with the most important eigenvectors. "Important" is defined as those eigenvectors that explain the most variance in the differences among samples that are trying to be classified relative to some threshold.
[00081] A support vector machine (SVM) is a classification formula that attempts to find a hyperplane that separates two classes. This hyperplane is defined by support vectors, data points that are exactly the margin distance away from the hyperplane. In the likely event that no separating hyperplane exists in the current dimensions of the data, the dimensionality is expanded greatly by projecting the data into larger dimensions by taking non-linear functions of the original variables (Venables and Ripley, 2002). Although not required, filtering of features for SVM often improves prediction. Features (e.g., expression level of genes) can be identified for a support vector machine using a non-parametric Kruskal-Wallis (KW) test to select the best univariate features. Support vector machines are a set of related supervised learning techniques used for classification and regression and are described, e.g., in Cristianini et al., "An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods," Cambridge University Press (2000). Support vector machine analysis can be performed, e.g., using the SVMlight software developed by Thorsten Joachims (Cornell University) or using the LIBSVM software developed by Chih-Chung Chang and Chih-Jen Lin (National Taiwan University).
[00082] Random forests are learning statistical classifier systems that are constructed using an algorithm developed by Leo Breiman and Adele Cutler. Random forests use a large number of individual decision trees and decide the class by choosing the mode (i.e., most frequently occurring) of the classes as determined by the individual trees. Random forest analysis can be performed, e.g., using the RandomForests software available from Salford Systems (San Diego, CA). See, e.g., Breiman, Machine Learning, 45:5-32 (2001); and http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm, for a description of random forests.
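The voting step described above (choosing the mode of the classes returned by the individual trees) can be sketched as below. Only the aggregation is shown: the three "trees" are hypothetical one-rule stumps with made-up thresholds, whereas a real random forest grows each tree on a bootstrap sample with random feature subsets.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Return the most frequent class among the individual tree predictions."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stumps; gene names are from the document, thresholds are invented.
trees = [
    lambda s: "GG3" if s["AURKA"] > 1.0 else "GG1",
    lambda s: "GG3" if s["BIRC5"] > 1.0 else "GG1",
    lambda s: "GG1" if s["FRY"] > 1.0 else "GG3",
]

sample = {"AURKA": 2.5, "BIRC5": 0.4, "FRY": 1.9}
predicted = forest_predict(trees, sample)
```

Here one stump votes for GG3 and two vote for GG1, so the forest returns GG1. Using an odd number of trees avoids ties in a two-class problem.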
[00083] Classification and regression trees represent a computer intensive alternative to fitting classical regression models and are typically used to determine the best possible model for a categorical or continuous response of interest based upon one or more predictors. Classification and regression tree analysis can be performed, e.g., using the CART software available from Salford Systems or the Statistica data analysis software available from StatSoft, Inc. (Tulsa, OK). A description of classification and regression trees is found, e.g., in Breiman et al. "Classification and Regression Trees," Chapman and Hall, New York (1984); and Steinberg et al, "CART: Tree-Structured Non-Parametric Data Analysis," Salford Systems, San Diego, (1995).
[00084] Other formulas may be used in order to pre-process the results of individual expression level of genes according to the invention into more valuable forms of information, prior to their presentation to the predictive formula. Most notably, normalization of gene expression level results, whether by common mathematical transformations such as logarithmic or logistic functions, by expression as positions in a normal or other distribution, or by reference to a population's mean values, etc., is well known to those skilled in the art.
[00085] In addition to the individual parameter values of one subject potentially being normalized, an overall predictive formula for all subjects, or any known class of subjects, may itself be recalibrated or otherwise adjusted based on adjustment for a population's expected prevalence and mean expression level of genes' parameter values, according to the technique outlined in D'Agostino et al. (2001) JAMA 286:180-187, or other similar normalization and recalibration techniques. Such epidemiological adjustment statistics may be captured, confirmed, improved and updated continuously through a registry of past data presented to the model, which may be machine readable or otherwise, or occasionally through the retrospective query of stored samples or reference to historical studies of such parameters and statistics. Additional examples that may be the subject of formula recalibration or other adjustments include statistics used in studies by Pepe, M. S. et al., 2004 on the limitations of odds ratios; Cook, N. R., 2007 relating to ROC curves; and Vasan, R. S., 2006 regarding biomarkers of cardiovascular disease.
[00086] The learning statistical classifier systems described herein can be trained and tested using a cohort of samples from healthy individuals, cancer patients, cancer cell lines, and the like as training data containing instances labeled according to classes, e.g. HG1 and HG3 or healthy and diseased, and then tested on at least one test data set which includes novel instances not used for the training. For example, the training data can be obtained from a selected population of individuals where historical information is available regarding the histological grade of their breast cancers, the values of expression level of genes as described hereunder in the population and/or their clinical outcomes. Samples from patients diagnosed by a physician, and preferably by an oncologist, as having cancer are suitable for use in training and testing the learning statistical classifier systems of the present invention. Samples from healthy individuals can include those that were not identified as having cancer. In certain embodiments, samples from cancer cell lines can be used in training and testing the learning statistical classifier systems described herein. One skilled in the art will know of additional techniques and diagnostic criteria for obtaining a cohort of samples that can be used in training and testing the learning statistical classifier systems of the present invention.
[00087] Any formula may be used to combine results into indices herein called "output" useful in the practice of the invention. An output from an algorithm of the invention can be a score, i.e. a number for a subject that is determined using an algorithm according to the methods of the present invention, such as an index, an index value or a probability. An output from an algorithm of the invention can also be a status, such as the predicted class of the sample, for example the presence or absence of a breast cancer in said subject. 
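The pre-processing by logarithmic transformation and by reference to a population's mean values can be sketched as follows. This is an illustrative assumption about how such steps might be chained (log2 transform, then z-score against a reference population); the numbers are hypothetical and the document does not prescribe this particular sequence.

```python
import math
import statistics

def log2_transform(values):
    """Logarithmic transformation of raw expression levels."""
    return [math.log2(v) for v in values]

def z_scores(values, reference):
    """Express values as positions relative to a reference population."""
    mu = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(v - mu) / sd for v in values]

# Hypothetical raw expression levels of one gene.
reference_raw = [1.0, 2.0, 4.0, 8.0, 16.0]   # reference population
subject_raw = [4.0, 32.0]                    # two subjects to normalize

reference_log = log2_transform(reference_raw)
subject_z = z_scores(log2_transform(subject_raw), reference_log)
```

The first subject sits exactly at the reference mean after transformation (z-score 0), while the second lies well above it.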
As mentioned hereabove, such indices, without limitation, may indicate, among the various other indications, the probability, likelihood, prognosis, long-term survival, Metastasis-Free survival in the diagnosis of breast cancer, the diagnosis of the grade of a breast tumor or the prognosis of breast cancer. An expected output is an index or index value, a probability and/or the predicted class of the sample. To calculate an output for a given subject according to the invention, the expression level of genes according to the invention are obtained from one or more samples collected from the subject and used as input data (inputs into a Formula fitted to the actual historical data obtained from the selected population of individuals). Finally, the numeric result of a classifier formula itself may be transformed post-processing by its reference to an actual clinical population and study results and observed endpoints, in order to calibrate to absolute risk and provide confidence intervals for varying numeric results of the classifier or formula. An example of this is the presentation of absolute risk, and confidence intervals for that risk, derived using an actual clinical study, chosen with reference to the output of the recurrence score formula in the Oncotype Dx product of Genomic Health, Inc. (Redwood City, Calif.). A further modification is to adjust for smaller sub-populations of the study based on the output of the classifier or risk formula and defined and selected by their Clinical Parameters, such as age or sex.
[00088] In some embodiments of the invention, the output of the invention is calculated automatically. The output of the invention can be calculated by a computer, a calculator, a programmable calculator, or any other device capable of computing, and can be communicated to the individual by a health care practitioner, including, but not limited to, a physician, nurse, nurse practitioner, pharmacist, pharmacist's assistant, physician's assistant, laboratory technician, or by an organization such as a health maintenance organization, a hospital, a clinic, an insurance company, a health care company, or a national, federal, state, provincial, municipal, or local health care agency or health care system, or automatically, for example, by a computer, microprocessor, or dedicated device for delivering such advice.
[00089] In certain instances, the algorithms of the present invention can use a quantile measurement of a particular profile, i.e. the expression level of genes according to the invention, within a given population as a variable. Quantiles are a set of "cut points" that divide a sample of data into groups containing (as far as possible) equal numbers of observations. For example, quartiles are values that divide a sample of data into four groups containing (as far as possible) equal numbers of observations. The lower quartile is the data value a quarter way up through the ordered data set; the upper quartile is the data value a quarter way down through the ordered data set. Quintiles are values that divide a sample of data into five groups containing (as far as possible) equal numbers of observations. The present invention can also include the use of percentile ranges of profiles (e.g., tertiles, quartiles, quintiles, etc.), or their cumulative indices (e.g., quartile sums of profiles, etc.) as variables in the algorithms (just as with continuous variables).
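The use of quantiles as variables can be sketched as follows: each subject's expression level is replaced by the index of the quantile group it falls into within the population. The helper name `quantile_groups` and the expression values are hypothetical; the cut points themselves are obtained with the standard-library function `statistics.quantiles`.

```python
import statistics

def quantile_groups(values, n_groups):
    """Return, for each value, the 0-based index of its quantile group by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(values)
    return groups

# Hypothetical expression levels of one gene in a population of 8 subjects.
expression = [5.1, 2.0, 7.3, 3.3, 9.9, 1.2, 6.4, 4.8]

quartile_index = quantile_groups(expression, 4)      # variable usable by an algorithm
cut_points = statistics.quantiles(expression, n=4)   # the three quartile cut points
```

With eight observations, each quartile group receives exactly two subjects; the resulting group index can be fed to a formula in place of the continuous value.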
[00090] In certain instances, cut-off values can be determined and independently adjusted for each of a number of genes to observe the effects of the adjustments on clinical parameters. In particular, Design of Experiments (DOE) methodology can be used to simultaneously vary the cut-off values and to determine the effects on the resulting clinical parameters. The DOE methodology is advantageous in that variables are tested in a nested array requiring fewer runs and cooperative interactions among the cut-off variables can be identified. Optimization software such as DOE Keep It Simple Statistically (KISS) can be obtained from Air Academy Associates (Colorado Springs, CO) and can be used to assign experimental runs and perform the simultaneous equation calculations. Using the DOE KISS program, an optimized set of cut-off values for a given clinical parameter and a given set of biomarkers can be calculated. ECHIP optimization software, available from ECHIP, Inc. (Hockessin, DE), and Statgraphics optimization software, available from STSC, Inc. (Rockville, MD), are also useful for determining cut-off values for a given set of genes. Alternatively, cut-off values can be determined using Receiver Operating Characteristic (ROC) curves and adjusted to achieve the desired clinical parameter values.
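The ROC-based alternative mentioned at the end of the paragraph above can be sketched as follows. The selection criterion used here, Youden's J statistic (sensitivity + specificity - 1), is an assumption: the document does not say which criterion is applied, and the scores and labels are hypothetical.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    sensitivity = sum(1 for s in pos if s >= cutoff) / len(pos)
    specificity = sum(1 for s in neg if s < cutoff) / len(neg)
    return sensitivity, specificity

def best_cutoff(scores, labels):
    """Candidate cut-off maximizing Youden's J over the observed scores."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: sum(sens_spec(scores, labels, c)) - 1.0)

# Hypothetical gene-expression scores; label 1 = condition present.
scores = [0.1, 0.2, 0.25, 0.3, 0.45, 0.4, 0.6, 0.7, 0.8]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

cutoff = best_cutoff(scores, labels)
sensitivity, specificity = sens_spec(scores, labels, cutoff)
```

In practice the cut-off would then be shifted along the ROC curve until the desired clinical parameter values (for example a minimum sensitivity) are reached.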
[00091] Moreover, any of the aforementioned Clinical Parameters may be used in the practice of the invention as an input to a formula or as a pre-selection criterion defining a relevant population to be measured using a particular formula. As noted above, Clinical Parameters may also be useful in gene normalization and pre-processing, or in formula type selection and derivation, and formula result post-processing.
[00092] One embodiment of the invention is to tailor formulas to the population and endpoint or use that is intended. The breast cancer endpoints of the invention include, among others, the Overall Survival (OS), the Recurrence-Free Survival (RFS), the Distant Relapse-Free Survival (DRFS), the Metastasis-Free Survival (MFS) and the Distant Recurrence Free Interval (DRFI), as defined by Hudis CA et al., J Clin Oncol. 2007 May 20;25(15):2127-32. For example, the genes and formulas may be used for assessment of subjects for primary prevention and diagnosis and for secondary prevention and management. For the primary assessment, the genes and formulas may be used for prediction and risk stratification for conditions and for the diagnosis of breast cancer. For secondary prevention and management, the genes and formulas may be used for prognosis of breast cancer. The genes and formulas may be used for clinical decision support, such as determining whether to defer intervention to next visit, to recommend normal preventive check-ups, to recommend increased visit frequency, to recommend increased testing and to recommend therapeutic intervention. The genes and formulas may also be useful for intervention in subjects with breast cancer, such as therapeutic selection and response, adjustment and dosing of therapy, monitoring ongoing therapeutic efficiency and indication for change in therapeutic intervention.
[00093] Finally, methods according to the invention can be used for enhancing performance for use also in subjects undergoing therapeutic interventions. Identifying the breast cancer subject and their genomic grade enables the selection and initiation of various therapeutic interventions or treatment regimens in order to treat such breast cancer. In this method, a biological sample can be provided from a non-treated subject or from a subject undergoing treatment regimens or therapeutic interventions, e.g., drug treatments, for breast cancer. Such treatment regimens or therapeutic interventions can include, but are not limited to, surgical intervention, administration of pharmaceuticals, and treatment with therapeutics or prophylactics used in subjects diagnosed with breast cancer. If desired, biological samples are obtained from the subject at various time points before, during, or after treatment. To identify therapeutics or drugs that are appropriate for a specific subject, a test sample from the subject can also be exposed to a therapeutic agent or a drug, and the expression level of genes can be determined. The expression level of genes can be compared to samples derived from the subject before and after treatment or exposure to a therapeutic agent or a drug, or can be compared to samples derived from one or more subjects who have shown improvements as a result of such treatment or exposure.
[00094] The invention provides improved diagnosis, determination of the grade of a solid tumor, and prognosis of a subject suffering from cancer by measuring the expression level of genes according to the invention and utilizing mathematical algorithms, classifiers or formula in order to combine information from results into a single output enabling such a diagnosis or prognosis.
[00095] Therefore, a first aspect of the invention concerns a method for determining the GGI score of a solid tumor in a subject having a cancer, said method comprising a step a) of analyzing a biological sample from said subject by determining the expression level of a combination of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FU21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20, and a further step b) of determining the GGI score of said solid tumor from said subject suffering from cancer, wherein a GGI formula is executed based on inputs comprising the expression level of said genes from said subject as determined in step a) and wherein said GGI formula is:
$$\mathrm{GGI} = a \times \left( \sum_{i=1}^{n} \alpha_i x_i - b \right)$$
where:
a, b ∈ ℝ (set of real numbers)
x_i is the expression level of the ith gene
α_i is the coefficient assigned to the ith gene
(n being the number of genes in the combination)
[00096] Another aspect of the invention concerns a method for the diagnosis of a cancer, said method comprising a step a) of analyzing a biological sample from said subject by determining the expression level of a combination of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FU21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20, and a step b) of determining the diagnosis of said subject on the basis of the expression level of said genes as determined in step a).
[00097] Another aspect of the invention concerns a method for determining the genomic grade of a solid tumor in a subject suffering from cancer, said method comprising the following steps :
a) analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FU21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20, and
b) determining the genomic grade of said tumor in said subject on the basis of the expression level of said genes as determined in step a), wherein
i) an overexpression of a gene selected in the group consisting of CX3CR1, FU21062, FRY and TPT1 is associated with a Genomic Grade 1;
ii) an underexpression of a gene selected in the group consisting of CX3CR1, FU21062, FRY and TPT1 is associated with a Genomic Grade 3;
iii) an overexpression of a gene selected in the group consisting of ASPM, AURKA, BIRC5, CCNA2, CCNB2, CDC2, CDC20, CDCA3, CENPA, CEP55, KIF11, KPNA2, MCM10, MELK, PTTG1, RACGAP1, TPX2, TROAP, TUBA1B, UBE2C is associated with a Genomic Grade 3;
iv) an underexpression of a gene selected in the group consisting of ASPM, AURKA, BIRC5, CCNA2, CCNB2, CDC2, CDC20, CDCA3, CENPA, CEP55, KIF11, KPNA2, MCM10, MELK, PTTG1, RACGAP1, TPX2, TROAP, TUBA1B, UBE2C is associated with a Genomic Grade 1.
[00098] In a preferred embodiment, a combination of at least 2 genes up to 24 genes corresponds to a combination of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 or 24 genes selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FU21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20.
[00099] In a still preferred embodiment, the genomic grade is determined on the basis of an output from an algorithm, said algorithm being executed on the basis of inputs comprising the expression level of said genes from said subject as determined in step a).
[000100] In a still preferred embodiment, said algorithm is selected in the group comprising but not limited to decision tree learning (CART, Recursive Partitioning Tree/RPART), hierarchical clustering, Random Forest (RF).
[000101] In another still preferred embodiment, said output is a Genomic Grade Index (GGI) score indicating the genomic grade from said tumor, and wherein said GGI score is calculated from the following GGI formula :
$$\mathrm{GGI} = a \times \left( \sum_{i=1}^{n} \alpha_i x_i - b \right)$$
where:
a, b ∈ ℝ (set of real numbers)
x_i is the expression level of the ith gene
α_i is the coefficient assigned to the ith gene
(n being the number of genes in the combination)
[000102] According to the invention, α_i, i.e. the coefficient assigned to the ith gene, balances the expression of the genes.
[000103] According to the invention, the coefficient "b" is used to adjust the cut-off of separation between GG1 and GG3. In a preferred embodiment, it sets the cut-off at the value 0, so that samples having a GGI value inferior to 0 are considered GG1 and samples having a GGI value superior to 0 are considered GG3.
[000104] According to the invention, the coefficient "a" is a multiplicative coefficient which is used to "normalize" the GGI. In a preferred embodiment, it can be used in order to match the GG1 samples to a GGI value of -1, and the GG3 samples to a GGI value of 1.
[000105] In another preferred embodiment, said algorithm is selected in the group comprising but not limited to Support Vector Machine (SVM) with radial or linear kernel (SVMr or SVMl), Sum of gene expressions, Probit model, Logistic model, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Principal component analysis (PCA), preferably Principal component analysis (PCA).
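The roles of "a" and "b" described above can be illustrated with a short sketch. It assumes one plausible way of choosing them, namely from the mean weighted sums of GG1 and GG3 training samples, so that the GG1 mean maps to -1, the GG3 mean to +1 and the cut-off sits at 0; the document does not state that the coefficients are derived this way. The function names, the two-gene coefficient set and all numbers are hypothetical, and expression is taken on a scale where larger means more expressed.

```python
def weighted_sum(expression, coefficients):
    """Sum over genes of (coefficient of the gene) * (expression level of the gene)."""
    return sum(coefficients[g] * expression[g] for g in coefficients)

def calibrate(gg1_sums, gg3_sums):
    """Choose a and b so that mean GG1 -> -1, mean GG3 -> +1, cut-off -> 0."""
    m1 = sum(gg1_sums) / len(gg1_sums)
    m3 = sum(gg3_sums) / len(gg3_sums)
    b = (m1 + m3) / 2.0      # shifts the midpoint of the two groups to 0
    a = 2.0 / (m3 - m1)      # rescales the two group means to -1 and +1
    return a, b

def ggi(expression, coefficients, a, b):
    """GGI = a * (sum_i alpha_i * x_i - b)."""
    return a * (weighted_sum(expression, coefficients) - b)

# Hypothetical weighted sums observed in training samples of each grade.
a, b = calibrate(gg1_sums=[1.0, 2.0, 3.0], gg3_sums=[5.0, 6.0, 7.0])

# Hypothetical coefficients and expression levels for a new sample.
coefficients = {"AURKA": 1.0, "FRY": -1.0}
score = ggi({"AURKA": 7.0, "FRY": 1.0}, coefficients, a, b)
```

A positive score places the sample on the GG3 side of the cut-off and a negative score on the GG1 side.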
[000106] In a still preferred embodiment, said method of determining the grade of a solid tumor in a subject suffering from cancer is a method for determining the prognosis of said tumor, and wherein a tumor identified as having a Genomic Grade 1 is indicative of a "good-prognosis", whereas a tumor identified as having a Genomic Grade 3 is indicative of a "poor-prognosis".
[000107] In another preferred embodiment, a "good-prognosis" is a Metastasis-Free survival (MFS) superior to 5 years, preferably 10 years, or a long-term survival, and a "poor-prognosis" is an MFS inferior to 10 years, preferably 5 years, or not a long-term survival.
[000108] Therefore, in a still preferred embodiment, said tumor according to the invention has been previously identified as a Histological Grade 2 tumor.
[000109] Indeed, the use of an output determined according to the invention for the reclassification of Histological Grade 2 tumors into Genomic Grade 1 tumors and Genomic Grade 3 tumors ranging respectively from the Grade 1 and Grade 3 tumors' values has been demonstrated by the inventors.
[000110] Said method according to the invention can also be used for determining the treatment of said solid tumor from said subject suffering from cancer. Indeed, the grade of said solid tumor is indicative of the aggressiveness of the treatment which will be needed by said subject. Solid tumors having a Genomic Grade 3 will require more aggressive treatments (for example chemotherapy with adjuvant) than solid tumors having a Genomic Grade 1.
[000111] In another preferred embodiment, said method of determining the grade of a solid tumor in a subject suffering from cancer according to the invention further comprises a step a') of normalizing the expression levels of said genes as determined in step a) with at least one, preferably two or three reference genes selected in the group comprising the genes GUS, TBP and RPLP0.
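Step a') can be illustrated with a common qRT-PCR convention: subtracting the mean cycle threshold (Ct) of the reference genes from the Ct of each target gene. This delta-Ct scheme is an assumption used only for illustration; it is not necessarily the normalization used by the invention, and the Ct values below are hypothetical.

```python
def delta_ct(target_ct, reference_ct):
    """Normalize target-gene Ct values by the mean Ct of the reference genes."""
    ref_mean = sum(reference_ct.values()) / len(reference_ct)
    return {gene: ct - ref_mean for gene, ct in target_ct.items()}

# Hypothetical Ct values measured in one sample.
reference = {"GUS": 24.0, "TBP": 26.0, "RPLP0": 22.0}
targets = {"ASPM": 28.5, "FRY": 25.0}

normalized = delta_ct(targets, reference)
```

Because a lower Ct means more transcript, a smaller normalized value here corresponds to a higher expression level relative to the reference genes.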
[000112] In a preferred embodiment, the method for determining the grade of a solid tumor in a subject suffering from cancer according to the invention is characterized in that the following combinations of genes are excluded :
- CCNA2, CDC2, KPNA2, CDC20
- CCNA2, CDC2, KPNA2, AURKA
- CCNA2, CDC2, CDC20, AURKA
- CCNA2, KPNA2, CDC20, AURKA
- CDC2, KPNA2, CDC20, AURKA
- CCNA2, CDC2, KPNA2, CDC20, AURKA
- BIRC5, PTTG1
- BIRC5, TUBA1B
- RACGAP1, FRY
- RACGAP1, KIF11
- PTTG1, TUBA1B
- RACGAP1, CCNA2, KIF11
- RACGAP1, CCNA2, CDC2
- BIRC5, RACGAP1, PTTG1, TUBA1B
- PTTG1, ASPM, MCM10, CDC20
[000113] In a preferred embodiment, the method for determining the grade of a solid tumor in a subject suffering from cancer according to the invention comprises the step a) of analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of genes comprising at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 7 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10, FRY, CCNA2, CDC2 and CDCA3.
[000114] In a still preferred embodiment, the method for determining the grade of a solid tumor in a subject suffering from cancer according to the invention comprises the step a) of analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of genes comprising at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 4 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10 and FRY.
[000115] In a still preferred embodiment, the method for determining the grade of a solid tumor in a subject suffering from cancer according to the invention comprises the step a) of analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of the 6 genes consisting of PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY.
[000116] In a still most preferred embodiment, said coefficients α_i assigned to the ith genes corresponding to the 6 genes consisting of PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY are selected from Table D when the algorithm chosen to determine the grade of a solid tumor from a subject suffering from cancer is the probit, logit or sum algorithm.
[000117] Table D. Preferred coefficients for the GGI, using the 6-gene signature consisting of ASPM, CCNB2, CX3CR1, FRY, MCM10 and PTTG1, according to the algorithm used (probit, logit, sum or PCA) and the method used to describe the gene expression level (CTnorm, NCN, ΔΔCt2 or ΔΔCt3, written DDCt2 and DDCt3 in the table)
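Algorithm | Method for gene expression | a | b | α_ASPM | α_CCNB2 | α_CX3CR1 | α_FRY | α_MCM10 | α_PTTG1
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Probit | CTnorm | 1 | 4.95 | -0.87 | -0.13 | 0.80 | 0.30 | -0.17 | -0.67
Probit | NCN | 1 | 3.22 | 7.32 | -1.18 | -5.43 | -0.87 | 0.06 | 3.01
Probit | DDCt2 | 1 | 2.11 | -0.43 | -0.22 | 0.68 | 0.51 | -0.31 | -0.44
Probit | DDCt3 | 1 | 3.31 | -0.97 | 0.08 | 0.63 | 0.56 | -0.33 | -0.50
Logit | CTnorm | 1 | 8.51 | -1.56 | -0.19 | 1.45 | 0.47 | -0.25 | -1.24
Logit | NCN | 1 | 6.23 | 12.63 | -1.89 | -9.24 | -1.08 | 0.25 | 5.14
Logit | DDCt2 | 1 | 3.97 | -0.76 | -0.39 | 1.17 | 0.85 | -0.54 | -0.79
Logit | DDCt3 | 1 | 5.92 | -1.68 | 0.15 | 1.08 | 0.91 | -0.61 | -0.87
Sum | CTnorm | -0.18 | 13.46 | 1 | 1 | -1 | -1 | 1 | 1
Sum | NCN | 0.59 | -3.17 | 1 | 1 | -1 | -1 | 1 | 1
Sum | DDCt2 | -0.18 | 10.05 | 1 | 1 | -1 | -1 | 1 | 1
Sum | DDCt3 | -0.18 | 10.95 | 1 | 1 | -1 | -1 | 1 | 1
PCA | CTnorm | | | -0.46 | -0.37 | 0.32 | 0.21 | -0.51 | -0.48
PCA | NCN | | | 0.46 | 0.41 | -0.33 | -0.25 | 0.49 | 0.46
PCA | DDCt2 | | | -0.48 | -0.43 | 0.29 | 0.22 | -0.48 | -0.47
PCA | DDCt3 | | | -0.47 | -0.41 | 0.31 | 0.17 | -0.50 | -0.48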
[000118] In a still preferred embodiment, the value of said coefficients α_i assigned to the ith genes corresponding to the 6 genes consisting of PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY as defined in Table D can vary by 0.1%, 10%, 20%, 30%, 40% or up to 50%.
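As a worked illustration of how a row of Table D enters the GGI formula, the sketch below applies the "Sum / CTnorm" row (a = -0.18, b = 13.46, coefficients +1 for ASPM, CCNB2, MCM10 and PTTG1 and -1 for CX3CR1 and FRY). The sample values are hypothetical, the function names are illustrative, and treating a score of exactly 0 as undetermined is an assumption, since the text only addresses values inferior or superior to 0.

```python
# Parameters copied from the "Sum / CTnorm" row of Table D.
SUM_CTNORM = {
    "a": -0.18,
    "b": 13.46,
    "alpha": {"ASPM": 1, "CCNB2": 1, "CX3CR1": -1, "FRY": -1, "MCM10": 1, "PTTG1": 1},
}

def ggi_score(expression, params):
    """GGI = a * (sum_i alpha_i * x_i - b)."""
    s = sum(params["alpha"][g] * expression[g] for g in params["alpha"])
    return params["a"] * (s - params["b"])

def genomic_grade(score):
    """Cut-off at 0: below is GG1, above is GG3 (exactly 0 left undetermined)."""
    if score > 0:
        return "GG3"
    if score < 0:
        return "GG1"
    return "undetermined"

# Hypothetical CTnorm values for one tumor sample.
sample = {"ASPM": 4.0, "CCNB2": 3.5, "CX3CR1": 1.5, "FRY": 2.0, "MCM10": 5.0, "PTTG1": 3.0}

score = ggi_score(sample, SUM_CTNORM)
grade = genomic_grade(score)
```

For these values the weighted sum is 12.0, so the score is -0.18 x (12.0 - 13.46), i.e. about 0.263, which falls on the GG3 side of the cut-off.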
[000119] In a preferred embodiment, a subject according to the invention is a mammal, preferably a human.
[000120] In a preferred embodiment, a cancer according to the invention is a breast cancer.
[000121] In a preferred embodiment, a biological sample according to the invention is a tissue sample, a fluid sample, a cell sample or a blood sample of said subject, preferably from the breast of said subject.
[000122] In a still preferred embodiment, said biological sample is a fresh/frozen or a paraffin-embedded biological sample, preferably a paraffin-embedded biological sample.
[000123] In a preferred embodiment, the determination of the expression level of genes according to the invention is performed on nucleic acids from a biological sample as disclosed previously.
[000124] In a still most preferred embodiment, said step of determining the expression level of genes according to the invention is performed by Reverse-Transcription Polymerase Chain Reaction (RT-PCR), preferably by real-time Reverse-Transcription Polymerase Chain Reaction (qRT-PCR).
[000125] In a still most preferred embodiment, said step of determining the expression level of genes according to the invention is performed on DNA microarrays.
[000126] In another preferred embodiment, the step of determining the expression level according to the invention is performed by determining the amount of proteins in a biological sample.
[000127] In another preferred embodiment, said method for determining the grade of a solid tumor in a subject suffering from cancer further comprises generating a printed report of some or all the conclusions drawn from the data, or of a score or comparison between the results obtained for said subject.
[000128] Another aspect of the invention relates to a polynucleotide library comprising or corresponding to polynucleotide sequences allowing the detection of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FU21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20 listed in Table A.
[000129] In a preferred embodiment, polynucleotide sequences allowing the detection of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FU21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20 listed in Table A according to the invention can be any sequence between 3' and 5' end of the polynucleotide sequences of the corresponding genes as defined in Table A allowing a complete detection of the implicated genes.
[000130] In a most preferred embodiment, the polynucleotide library of the invention may comprise or may consist of the polynucleotide sequences as defined in the examples or derivatives thereof, listed in Table A.
[000131] In still most preferred embodiment, the polynucleotide library of the invention may comprise or may consist of the polynucleotide sequences listed in Table A or derivatives thereof.
[000132] In another preferred embodiment, the polynucleotide library according to the invention comprises or corresponds to polynucleotide sequences allowing the detection of a combination of at least the 3 genes ASPM, CX3CR1 and MCM10 to which are added from 0 to 7 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10, FRY, CCNA2, CDC2 and CDCA3 listed in Table A.
[000133] In another preferred embodiment, the polynucleotide library according to the invention comprises or corresponds to polynucleotide sequences allowing the detection of a combination of genes comprising at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 4 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10 and FRY.
[000134] In another preferred embodiment, the polynucleotide library according to the invention comprises or corresponds to polynucleotide sequences allowing the detection of the combination of the 6 genes PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY.
[000135] In a still preferred embodiment, the polynucleotide library according to the invention does not comprise more than 500 polynucleotide sequences, preferably not more than 200 polynucleotide sequences, and most preferably not more than 100 polynucleotide sequences.
[000136] The expression level of genes can be determined in a biological sample and compared to the "normal control level", utilizing techniques such as reference limits, discrimination limits, or risk defining thresholds to define cutoff points and abnormal values for breast cancer. Such normal control level and cutoff points may vary based on whether a gene is used alone or in a formula combining with other genes into an index.
[000137] Amongst the various assessments of performance, the methods according to the invention for the diagnosis of breast cancer, the diagnosis of the grade of a breast tumor and the prognosis of breast cancer in a subject are intended to provide accuracy in clinical diagnosis and prognosis. The accuracy of a diagnostic or prognostic test, assay, or method concerns the ability of the test, assay, or method to distinguish subjects having breast cancer from subjects who do not, and is based on whether the subjects have an "effective amount" or a "significant alteration" in the levels of one or more genes.
[000138] By "effective amount" or "significant alteration," it is meant that the measurement of the expression level of a gene is different than the predetermined cut-off point (or threshold value) for that gene and therefore indicates that the subject has a breast cancer for which the gene is a determinant. The difference in the level of genes between normal and abnormal is preferably statistically significant and may be an increase in gene expression level or a decrease in gene expression level. As noted below, and without any limitation of the invention, achieving statistical significance, and thus the preferred analytical and clinical accuracy, generally but not always requires that combinations of several genes be used together in panels and combined with mathematical algorithms in order to achieve a statistically significant genomic grade index.
[000139] In the categorical diagnosis of a disease state, changing the cut point or threshold value of a test (or assay) usually changes the sensitivity and specificity, but in a qualitatively inverse relationship. Therefore, in assessing the accuracy and usefulness of a proposed medical test, assay, or method for assessing the diagnosis or prognosis of a subject's condition, one should always take both sensitivity and specificity into account and be mindful of what the cut point is at which the sensitivity and specificity are being reported because sensitivity and specificity may vary significantly over the range of cut points. Use of statistics such as AUC (Area under the ROC curve), encompassing all potential cut point values, is preferred for most categorical risk measures using the invention, while for continuous risk measures, statistics of goodness-of-fit and calibration to observed results or other gold standards, are preferred.
[000140] Using such statistics, an "acceptable degree of diagnostic reliability", is herein defined as a test or assay (such as the test of the invention for determining the clinically significant expression level of genes, which thereby indicates the diagnosis or prognosis of breast cancer) in which the AUC (area under the ROC curve for the test or assay) is at least 0.60, desirably at least 0.65, more desirably at least 0.70, preferably at least 0.75, more preferably at least 0.80, and most preferably at least 0.85.
[000141] By a "very high degree of diagnostic reliability", it is meant a test or assay in which the AUC (area under the ROC curve for the test or assay) is at least 0.80, desirably at least 0.85, more desirably at least 0.875, preferably at least 0.90, more preferably at least 0.925, and most preferably at least 0.95.
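As an illustration of the statistics discussed in the preceding paragraphs, the following minimal Python sketch computes sensitivity and specificity at one cut point, computes the AUC as the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one, and maps the AUC onto the minimum thresholds stated above (0.60 and 0.80). The function names, the convention that a score at or above the cut point is called positive, and the example data are assumptions made for illustration only; they are not taken from the invention.

```python
def sensitivity_specificity(scores, labels, cut_point):
    """Return (sensitivity, specificity) at one cut point.

    labels: 1 = condition present, 0 = condition absent.
    Assumption: a score >= cut_point is called positive.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut_point)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut_point)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut_point)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut_point)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def reliability_label(auc_value):
    """Map an AUC onto the minimum thresholds given in the text."""
    if auc_value >= 0.80:
        return "very high"
    if auc_value >= 0.60:
        return "acceptable"
    return "below acceptable"


# Hypothetical scores for four subjects (not real data).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(sensitivity_specificity(scores, labels, 0.5))  # (0.5, 1.0)
print(sensitivity_specificity(scores, labels, 0.3))  # (1.0, 0.5)
print(auc(scores, labels))                           # 0.75
```

Lowering the cut point from 0.5 to 0.3 raises sensitivity and lowers specificity, which is the inverse relationship described above, whereas the AUC summarizes performance over all cut points.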
[000142] The predictive value of any test depends both on the sensitivity and specificity of the test, and on the prevalence of the breast cancer condition in the population being tested. This notion, based on Bayes' theorem, provides that the greater the likelihood that the condition being screened for is present in a subject or in the population, the greater the validity of a positive test and the greater the likelihood that the result is a true positive. Thus, the problem with using any test in any population where there is a low likelihood of the condition being present is that a positive result has more limited value (i.e., a positive test is more likely to be a false positive). Similarly, in populations at very high risk, a negative test result is more likely to be a false negative.
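The dependence of predictive value on prevalence can be made concrete with Bayes' theorem. The sketch below is illustrative only; the function names and the example sensitivity, specificity and prevalence figures are assumptions, not values from the invention.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(condition present | positive test), from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sensitivity, specificity, prevalence):
    """P(condition absent | negative test), from Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Same hypothetical test (90% sensitivity, 90% specificity) in two populations.
for prevalence in (0.01, 0.50):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

With these assumed figures, a positive result in the low-prevalence population is far more likely to be a false positive than in the high-prevalence population, even though the test itself is unchanged.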
[000143] Therefore, in a preferred embodiment, the methods of the invention for the diagnosis of a breast cancer in a subject, for diagnosing the grade of a subject having a breast cancer and for the prognosis of a breast cancer in a subject make it possible to obtain an AUC (area under the ROC curve) of at least 0.60, desirably at least 0.65, more desirably at least 0.70, preferably at least 0.75, more preferably at least 0.80, and most preferably at least 0.85.
[000144] Therefore, in another preferred embodiment, the methods of the invention for the diagnosis of a breast cancer in a subject, for diagnosing the grade of a subject having a breast cancer and for the prognosis of a breast cancer in a subject make it possible to obtain an AUC (area under the ROC curve) of at least 0.80, desirably at least 0.85, more desirably at least 0.875, preferably at least 0.90, more preferably at least 0.925, and most preferably at least 0.95.
[000145] In a still preferred embodiment, the gene expression level values or learning statistical classifier algorithms can be selected in the methods of the invention for the diagnosis of a breast cancer in a subject, for diagnosing the grade of a subject having a breast cancer and for the prognosis of a breast cancer in a subject such that the agreement HG3/GG3 (agG3) is at least about 60%, and can be, for example, at least about 65%, 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
[000146] In a still preferred embodiment, the gene expression level values or learning statistical classifier algorithms can be selected in the methods of the invention for the diagnosis of a breast cancer in a subject, for diagnosing the grade of a subject having a breast cancer and for the prognosis of a breast cancer in a subject such that the agreement HG1/GG1 (agG1) is at least about 60%, and can be, for example, at least about 65%, 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
[000147] Another aspect of the invention concerns a method for determining the expression level of genes according to the invention in a biological sample, comprising the steps of :
a) obtaining nucleic acids from a biological sample from a subject;
b) reacting said nucleic acids obtained in step (a) with a polynucleotide library as defined previously; and
c) detecting the reaction product of step (b).
[000148] By reacting nucleic acids with the polynucleotide library in the sense of the invention is meant contacting the nucleic acids of the sample with polynucleotide sequences in conditions allowing the hybridization of cDNA or mRNA total sequence of the gene or of cDNA or mRNA subsequences or of primers of the gene with polynucleotide sequences of the library. Therefore, the reaction step according to the invention is performed by hybridizing the nucleic acids with a polynucleotide library as defined previously.
[000149] In a preferred embodiment, the nucleic acids from said biological sample can be labeled, e.g., before reaction step (b), and the label of the nucleic acids sample can be selected from the group consisting of radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent or fluorescent labels.
[000150] In another embodiment, the polynucleotide libraries of the invention can be immobilized on a solid support to form an array. The solid support can, for example, be selected from the group consisting of nylon membrane, nitrocellulose membrane, glass slide, glass beads, and membranes on glass support or a silicon chip.
[000151] In a preferred embodiment, the method of the invention for determining the expression level of genes further comprises :
d) obtaining a control polynucleotide sample;
e) reacting said control sample with said polynucleotide library; and
f) detecting a control sample reaction product and comparing the amount of said polynucleotide sample reaction product to the amount of said control sample reaction product.
[000152] Methods for determining the expression level of genes in a biological sample as described previously are known to one of skill in the art and can be performed by conventional techniques used for the quantification of RNA expression in a sample, including Reverse-Transcriptase Polymerase Chain Reaction (RT-PCR), preferably the real-time quantitative Reverse-Transcriptase PCR (qRT-PCR), or microarrays.
[000153] In a preferred embodiment, the determination of the expression levels of genes in the methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject is performed on nucleic acids from a biological sample from said subject, and most preferably is based on the measurement of the level of transcription. This measurement can be performed by various methods which are known in themselves, including without limitation in particular quantitative methods involving for example Reverse Transcriptase PCR (RT-PCR) or real-time quantitative Reverse-Transcriptase PCR (qRT-PCR), and methods involving the use of DNA arrays (macroarrays or microarrays).
[000154] Of the techniques listed above, the most sensitive and most flexible quantitative method is the real-time quantitative Reverse-Transcriptase PCR (qRT-PCR), which can be used to compare mRNA levels in different sample populations, in normal and tumor tissues, with or without drug treatment, to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure.
[000155] As RNA cannot serve as a template for PCR, the first step in gene expression profiling, i.e. determining the expression level of genes according to the invention, by RT-PCR or qRT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction, preferably a real-time PCR also called quantitative PCR (qPCR).
[000156] In a particular embodiment, the relative number of gene transcripts in a sample is thus determined by reverse transcription of gene transcripts (e.g., mRNA), followed by amplification of the reverse-transcribed products by a polymerase chain reaction (PCR), preferably a real-time PCR also called a quantitative PCR (qPCR). In a preferred embodiment, the relative number of gene transcripts in a sample is determined by a Reverse-Transcriptase Polymerase Chain Reaction (e.g., RT-PCR). In a still preferred embodiment, the gene expression level is assessed by using real-time quantitative Reverse-Transcriptase PCR (qRT-PCR).
[000157] In a preferred embodiment, the expression level of genes in the methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer according to the invention is determined by Reverse-Transcriptase Polymerase Chain Reaction (RT-PCR), more preferably by a real-time quantitative Reverse-Transcriptase PCR (qRT-PCR).
[000158] Methods involving Reverse-Transcriptase PCR (RT-PCR) comprise a first step of reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction.
[000159] The RNA to be reverse-transcribed is first isolated from a biological sample. The starting material is typically total RNA isolated from human tumors or tumor cell lines, and corresponding normal tissues or cell lines, respectively. Thus RNA can be isolated from a variety of primary breast tumors, tumor, or tumor cell lines, with pooled DNA from healthy donors. If the source of mRNA is a primary tumor, mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples.
[000160] General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995), and include, for example and without limitation, the Qiagen RNeasy FFPE test. In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Other commercially available RNA isolation kits include MasterPure(TM) Complete DNA and RNA Purification Kit (EPICENTRE(R), Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumor can be isolated, for example, by cesium chloride density gradient centrifugation.
[000161] Extracted RNA can be reverse-transcribed into cDNA by using reverse transcriptase according to techniques known in the art. An example of such a technique for the reverse transcription of RNA into cDNA includes, without limitation, the SuperScript III/VILO test from INVITROGEN. The derived cDNA can then be used as a template in the subsequent PCR reaction.
[000162] The subsequent PCR reaction relies on thermal cycling, i.e. cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA. Said replication of DNA is performed using primers (short DNA fragments) containing sequences complementary to the target region, along with a DNA polymerase. Any person skilled in the art would be able to find instructions for carrying out such a PCR reaction, as it is a common laboratory technique well known in the art.
[000163] Another alternative to the quantification of DNA by using the polymerase chain reaction is the real-time PCR, also called quantitative PCR, allowing an improved quantification, which measures PCR product accumulation through, for example, a dual-labeled fluorogenic probe (i.e., TaqMan(R) probe) (for review on RT-PCR and qRT-PCR, cf. for instance: FREEMAN et al., Biotechniques, 26, 112-22, 24-5, 1999; BUSTIN & MUELLER, Clin Sci (Lond), 109, 365-79, 2005). Real time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization or reference gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g. Held et al., Genome Research 6:986-994 (1996). Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5' proofreading exonuclease activity. Thus, TaqMan(R) PCR typically utilizes the 5'-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5' nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. A third oligonucleotide, or probe, is designed to detect the nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye, such as FAM or VIC, and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments dissociate in solution, and signal from the released reporter dye is free from the quenching effect of the quencher. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
[000164] 5'-Nuclease assay data are initially expressed as Ct, or the threshold cycle. Fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
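The notion of a threshold cycle can be sketched in a few lines of Python. The rule used here (the first cycle whose fluorescence exceeds the baseline mean plus k standard deviations) is only one simple way to operationalize "statistically significant"; the baseline window, the factor k and the function name are assumptions made for illustration, and instrument software generally applies its own algorithm.

```python
import statistics


def threshold_cycle(fluorescence, baseline_cycles=5, k=10.0):
    """Return the first cycle (1-based) whose signal exceeds the threshold.

    Assumption: the threshold is the mean of the first `baseline_cycles`
    readings plus `k` times their sample standard deviation.
    Returns None if the signal never rises above the threshold.
    """
    baseline = fluorescence[:baseline_cycles]
    threshold = statistics.mean(baseline) + k * statistics.stdev(baseline)
    for cycle, value in enumerate(fluorescence, start=1):
        if cycle > baseline_cycles and value > threshold:
            return cycle
    return None


# Hypothetical per-cycle fluorescence readings (arbitrary units).
readings = [1.0, 1.1, 0.9, 1.0, 1.0, 1.05, 1.2, 2.0, 4.0, 8.0]
print(threshold_cycle(readings))  # 8
```

A sample with more starting template crosses the threshold at an earlier cycle, which is why a lower Ct corresponds to a higher initial amount of target.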
[000165] TaqMan(R) qRT-PCR can be performed using commercially available equipment, such as, for example, ABI PRISM 7700(TM) Sequence Detection System(TM) (PERKIN-ELMER-APPLIED BIOSYSTEMS, Foster City, Calif., USA), Lightcycler (ROCHE MOLECULAR BIOCHEMICALS, Mannheim, Germany) or Rotorgene (QIAGEN). In a preferred embodiment, the 5' nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7700(TM) Sequence Detection System(TM). The system consists of a thermocycler, laser, charge-coupled device (CCD), camera and computer. The system includes software for running the instrument and for analyzing the data.
[000166] The Reverse-Transcriptase PCR (RT-PCR) or real time RT-PCR (qRT-PCR) according to the invention can be performed either as a two-step PCR comprising a first step of reverse transcription of the RNA and a second step of PCR, or as a one-step RT-PCR or qRT-PCR wherein both reverse transcription and PCR are performed together. In a preferred embodiment, the qRT-PCR according to the invention is performed in a one-step manner.
[000167] In a preferred embodiment, polynucleotide sequences to be used for the determination of the expression levels of genes in the methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject according to the invention correspond to the polynucleotide library as defined previously.
[000168] The steps of a representative protocol for profiling gene expression using fixed, paraffin-embedded tissues as the RNA source, including mRNA isolation, purification, primer extension and amplification, are given in various published journal articles (for example: T. E. Godfrey et al. J. Molec. Diagnostics 2: 84-91 [2000]; K. Specht et al., Am. J. Pathol. 158: 419-29 [2001]).
[000169] In another preferred embodiment, the expression level of genes in the methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer according to the invention is determined by using a DNA array, i.e. the microarray technique.
[000170] A method of determining the expression level of genes by DNA array involves the following steps :
a) Obtaining a nucleic acids sample from a subject;
b) Reacting the nucleic acids sample obtained in step (a) with a polynucleotide library immobilized on a solid support; and
c) Detecting the reaction product of step (b).
[000171] The microarray technique consists in combining complementary, single-stranded nucleic acids or nucleotide analogues into a single double stranded molecule. The polynucleotide library immobilized on the solid support is exposed to a sample. If complementary nucleic acids exist in the sample, these will hybridize to the library and can thus be detected.
[000172] Depending on the size and spacing of each DNA spot on the array, on the size of the solid surface bearing the spots and on the nature of the DNA probes, DNA arrays can be categorized as microarrays (each DNA spot has a diameter less than 250 microns) and macroarrays (spot diameter is greater than 300 microns). When the solid support used is small in size, arrays are also referred to as DNA chips. Depending on the spotting technique used, the number of spots on a glass microarray can range from hundreds to thousands.
[000173] The expression profile of breast cancer-associated genes can be measured in fresh, frozen or paraffin-embedded tumor tissues. Using microarray technology, fresh or frozen samples are preferred. Just as in the RT-PCR method, the source of polynucleotides typically is total RNA isolated from human tumors or tumor cell lines, and corresponding normal tissues or cell lines. Thus polynucleotides can be isolated from a variety of primary tumors or tumor cell lines. If the source of mRNA is a primary tumor, mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples, which are routinely prepared and preserved in everyday clinical practice.
[000174] In a preferred embodiment, the nucleic acids sample obtained from said subject at step (a) is labeled before its reaction at step (b) with the polynucleotide library immobilized on a solid support. Such labeling is well known to one of skill in the art and includes, but is not limited to, radioactive, colorimetric, enzymatic, molecular amplification, bioluminescent, electrochemical or fluorescent labeling and may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest.
[000175] The labeled nucleic acids sample is incubated with the DNA array, in conditions allowing selective hybridization between the polynucleotides (cDNA) and the corresponding probes affixed to the array. After the incubation, non-hybridized polynucleotides (cDNA) are removed by washing.
[000176] The probe-polynucleotide (cDNA) hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets to determine relative abundance of nucleic acid sequences in the target. The signal produced by the labeled polynucleotides (cDNA) hybridized at their corresponding probe locations is measured. The intensity of this signal is proportional to the quantity of labeled polynucleotides (cDNA) hybridized to the probe, and thus to the quantity of the corresponding mRNA expressed in the sample. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996)). For review on DNA arrays, see for example : BERTUCCI et al., Hum. Mol. Genet., 8, 1715-22, 1999; CHURCHILL, Nat Genet, 32 Suppl, 490-5, 2002; HELLER, Annu Rev Biomed Eng, 4, 129-53, 2002; RAMASWAMY & GOLUB, J Clin Oncol, 20, 1932-41, 2002; AFFARA, Brief Funct Genomic Proteomic, 2, 7-20, 2003; COPLAND et al., Recent Prog Horm Res, 58, 25-53, 2003.
[000177] Gene expression on an array or gene chip can be assessed using an appropriate algorithm (e.g., statistical algorithm). Suitable software applications for assessing gene expression levels using a microarray or gene chip are known in the art. In a particular embodiment, gene expression on a microarray is assessed using Affymetrix Microarray Analysis Suite (MAS) 5.0 software and/or DNA Chip Analyzer (dChip) software, for example, as described herein in Example 1.
[000178] Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using for example the AFFYMETRIX GENECHIP technology, or INCYTE's microarray technology or AGILENT technology.
[000179] To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by any experimental treatment.
[000180] To minimize errors and the effect of sample-to-sample variation, the reliability of Reverse-Transcriptase Polymerase Chain Reaction or Reverse-Transcriptase real-time quantitative PCR (RT-PCR or qRT-PCR) can be improved by including an invariant endogenous control, i.e. a reference gene or normalizer, in the assay to correct for sample-to-sample variations in the RT-PCR or qRT-PCR efficiency and errors in sample quantification. Any variation in the normalizer will obscure real changes and produce artifactual changes. Usual reference genes are expressed at a constant level among different tissues, and are unaffected by the experimental treatment.
[000181] Therefore, in a preferred embodiment, the expression level of genes in the methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer according to the invention can be normalized with at least one, preferably two or three reference genes selected from the group comprising the genes GUS, TBP and RPLP0. Said genes are described in Table B.
[000182] In a still preferred embodiment, the TFRC gene is excluded from the reference genes which can be used for the normalization of the expression levels of genes according to the invention.
[000183] Expression levels may be normalized with respect to the expression level of one or more reference genes using global normalization methods. Those skilled in the art will recognize that numerous methods of normalization are known, and can be applied for use in the methods of the present disclosure.
[000184] In another preferred embodiment, the determination of the expression level of genes in the methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject may be performed by determining the amount of proteins expressed from said genes.
[000185] The determination of the expression levels of genes according to the invention can therefore comprise the steps of :
a) obtaining proteins from a biological sample from a subject; and
b) measuring the expression level of proteins in the sample obtained in step (a), wherein said proteins are encoded by genes according to the invention.
[000186] It is understood that the proteins can be obtained directly from the sample, e.g., by standard extraction or isolation techniques, or can be obtained by translation of mRNA obtained from the samples.
[000187] Detection of protein levels may be performed by, for example, immunoassays including ELISA, Western Blot or sandwich immunoassays using antibodies capable of binding specifically to any one or more of the proteins encoded by the genes of interest. Immunohistochemistry methods are also suitable for detecting the expression levels of the genes of the present invention. Thus, antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies specific for each gene can be used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horse radish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody.
[000188] Several methods can be used to represent the expression level (x) of genes, which cannot be exhaustively described. For instance, and as non-limiting examples, for RT-qPCR, four different methods can be used and are as follows:
[000189] The Ct.norm method :
[000190] This method normalizes the threshold cycle of the target gene (Ct target) using the mean of the threshold cycles of several reference genes (mean(Ct reference)). Therefore, the normalized Ct is calculated as follows :
Ct.norm target = Ct target - mean(Ct reference)
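The Ct.norm calculation is a single subtraction, sketched below in Python. The function name and the Ct values are hypothetical; the reference genes GUS, TBP and RPLP0 are used only because they are the ones named in this description.

```python
from statistics import mean


def ct_norm(ct_target, ct_references):
    """Ct.norm = Ct of the target gene minus the mean Ct of the reference genes."""
    return ct_target - mean(ct_references)


# Hypothetical threshold cycles measured in one sample.
reference_cts = {"GUS": 20.0, "TBP": 22.0, "RPLP0": 24.0}
print(ct_norm(25.0, list(reference_cts.values())))  # 3.0
```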
[000191] The NCN method :
[000192] The normalized copy number (NCN) method uses a standard curve, constructed using several plasmid samples (standards) of known copy number (CN). The Ct obtained for each standard is plotted against its log(CN), and a regression line (slope : S, intercept : I) is constructed. This makes it possible to determine, for new experimental samples, the copy number of a particular gene i (i = target or reference) from the corresponding Ct :
log CNi = (Cti - I) / S
The normalized copy number, representing the expression level of a given target gene, is :
N.log CNtarget = log CNtarget - mean(log CNreference)
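A minimal Python sketch of the NCN calculation follows. It fits the regression line Ct = S x log(CN) + I by ordinary least squares on the standards, inverts it to obtain log(CN) from a measured Ct, and then subtracts the mean reference value. Several points are assumptions made for illustration: the logarithm is taken to base 10, a single standard curve is shared by the target and reference genes for brevity (in practice each gene may have its own curve), and all function names and numerical values are hypothetical.

```python
import math
from statistics import mean


def fit_standard_curve(copy_numbers, cts):
    """Least-squares fit of Ct = S * log10(CN) + I; returns (S, I)."""
    xs = [math.log10(cn) for cn in copy_numbers]
    mx, my = mean(xs), mean(cts)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, cts))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept


def log_copy_number(ct, slope, intercept):
    """Invert the standard curve: log10(CN) = (Ct - I) / S."""
    return (ct - intercept) / slope


def normalized_log_copy_number(ct_target, ct_references, slope, intercept):
    """N.log CN(target) = log CN(target) - mean(log CN(reference))."""
    log_target = log_copy_number(ct_target, slope, intercept)
    log_refs = [log_copy_number(ct, slope, intercept) for ct in ct_references]
    return log_target - mean(log_refs)


# Hypothetical plasmid standards: known copy numbers and their measured Ct.
S, I = fit_standard_curve([10, 100, 1000, 10000], [33.0, 29.7, 26.4, 23.1])
print(round(S, 3), round(I, 3))
print(round(normalized_log_copy_number(26.4, [23.1, 26.4], S, I), 3))
```

Because Ct falls as the copy number rises, the fitted slope S is negative, and a target with a higher Ct than the references yields a negative normalized value.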
[000193] The ΔΔCt2 method :
[000194] The ΔΔCt2 method makes it possible to normalize the Ct corresponding to each gene i (target or reference), using only one plasmid sample containing 10^2 copies of the gene.
ΔCt2i = Cti - Cti (plasmid 10^2)
Then, the ΔΔCt2 is calculated as follows :
ΔΔCt2 target = ΔCt2 target - mean(ΔCt2 reference)
[000195] The ΔΔCt3 method :
[000196] The ΔΔCt3 method makes it possible to normalize the Ct corresponding to each gene i (target or reference), using only one plasmid sample containing 10^3 copies of the gene.
ΔCt3i = Cti - Cti (plasmid 10^3)
Then, the ΔΔCt3 is calculated as follows :
ΔΔCt3 target = ΔCt3 target - mean(ΔCt3 reference)
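The two single-plasmid methods share the same arithmetic and differ only in the plasmid standard used (10^2 or 10^3 copies), so one Python sketch can cover both. The function names and Ct values below are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean


def delta_ct(ct_sample, ct_plasmid):
    """Delta Ct for one gene: sample Ct minus the Ct of that gene's plasmid standard."""
    return ct_sample - ct_plasmid


def delta_delta_ct(target, references):
    """Delta-delta Ct of the target gene.

    target:     (Ct in sample, Ct of plasmid standard) for the target gene
    references: list of such pairs, one per reference gene
    The plasmid standard holds 10^2 copies for the first method described
    above and 10^3 copies for the second; the arithmetic is identical.
    """
    d_target = delta_ct(*target)
    d_refs = [delta_ct(ct, ct_plasmid) for ct, ct_plasmid in references]
    return d_target - mean(d_refs)


# Hypothetical Ct values: (sample, plasmid standard) per gene.
print(delta_delta_ct((27.0, 30.0), [(22.0, 29.0), (24.0, 29.0)]))  # 3.0
```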
[000197] In a preferred embodiment, the method used for valuing the expression level of genes according to the invention is the NCN method.
[000198] In another preferred embodiment, the methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject may involve a prior step of obtaining at least one biological sample from the subject. Such methods of sampling are well known to one of skill in the art, and as an example, one can cite surgery. Other examples of biological sampling are defined hereabove within the definition of "biological sample".
[000199] In a preferred embodiment, the analysis of a biological sample for determining the expression level of genes according to the invention may be performed before any surgical removal of tumor, or may be performed following surgical removal of tumor.
[000200] The provided methods of the invention for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject may also correspond to an in vitro method, which does not include such a step of sampling.
[000201] Another aspect of the invention concerns a kit for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject according to the invention and comprising at least one primer or at least one probe or at least one antibody, which can be used in a method as defined in the present invention, for analyzing a biological sample from a subject by determining the expression level of genes as defined previously. In a preferred embodiment, the kit according to the invention comprises means and reagents for RT-PCR analysis as described hereabove, and more preferably for qRT-PCR. In another preferred embodiment, the kit comprises means and reagents for a microarray analysis as described hereabove.
[000202] Preferably, the kit comprises a polynucleotide library as described previously.
[000203] In another preferred embodiment, the kit further comprises a DNA array for the determination of the expression level of genes according to the invention by microarray.
[000204] The present kits can also include one or more reagents, buffers, hybridization media, nucleic acids, primers, nucleotides, probes, molecular weight markers, enzymes, solid supports, databases, computer programs for calculating dispensation orders and/or disposable lab equipment, such as multi-well plates, in order to readily facilitate implementation of the present methods. Enzymes that can be included in the present kits include reverse transcriptases, nucleotide polymerases and the like. Solid supports can include beads and the like whereas molecular weight markers can include conjugatable markers, for example biotin and streptavidin or the like.
[000205] In some variations, the kit further includes an analysis tool for the diagnosis of a breast cancer, for diagnosing the grade of a breast tumor in a subject having a breast cancer and/or for the prognosis of a breast cancer in a subject from the expression level of genes according to the invention from a biological sample from a subject.
[000206] In another embodiment, the kit is made up of instructions for carrying out the method described herein for the diagnosis, determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer. The instructions can be provided in any intelligible form through a tangible medium, such as printed on paper, computer readable media, or the like.
[000207] Still a further aspect of the present invention refers to the use, for the diagnosis, determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer, of the abovementioned kit comprising at least one primer, at least one probe or at least one antibody, which can be used in a method as defined for analyzing the expression of the genes as defined previously.
[000208] In another embodiment, the invention embraces a diagnostic test system comprising (1) means for obtaining test results comprising expression level of genes in a biological sample; (2) means for collecting and tracking test results for one or more individual biological samples; (3) means for calculating an output from inputs using an algorithm as described hereabove, wherein said inputs comprise the expression level of said genes, and (4) means for reporting said index value.
[000209] In one embodiment, said output is a score; the score can be calculated according to any of the methods described herein. The means for collecting and tracking test results for one or more individuals can comprise a data structure or database. The means for calculating a score can comprise a computer, microprocessor, programmable calculator, dedicated device, or any other device capable of calculating the GGI score. The means for reporting the score can comprise a visible display, an audio output, a link to a data structure or database, or a printer.
[000210] In some variations, the means for collecting and tracking test result data for one or more individuals comprises a data structure or database. In some variations, the means for computing a score comprises a computer or microprocessor. In some variations, the means for reporting the score comprises a visible display, an audio output, a link to a data structure or database, or a printer.
[000211] A related embodiment of the invention is a medical diagnostic test system for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer, the system comprising: a data collection tool adapted to collect gene expression level data representative of the expression level of genes in at least one biological sample from a subject; and an analysis tool comprising a statistical analysis engine adapted to generate a representation of a correlation between the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer and the expression level of said genes, wherein the representation of the correlation is adapted to be executed to generate a result; and an index computation tool adapted to analyze the result to determine the subject's diagnosis, the grade of the solid tumor and/or prognosis of said subject suffering from cancer and represent the result as an output; wherein said genes are defined as described hereabove. In some variations, the analysis tool comprises a first analysis tool comprising a first statistical analysis engine, the system further comprising a second analysis tool comprising a second statistical analysis engine adapted to select the representation of the correlation between the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer and the expression level of genes from among a plurality of representations capable of representing the correlation. In some variations, the system further comprises a reporting tool adapted to generate a report comprising the index value.
[000212] Still another embodiment of the invention is a computer readable medium having computer executable instructions for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer, the computer readable medium comprising: a routine, stored on the computer readable medium and adapted to be executed by a processor, to store gene expression level data; and a routine stored on the computer readable medium and adapted to be executed by a processor to analyze the gene expression level data for diagnosing a breast cancer, for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer.
[000213] Another aspect of the invention further relates to a recording computer program comprising instructions for performing methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer.
[000214] Any of the provided methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer can further involve generating a printed report, for instance a report of some or all the data, of some or all the conclusions drawn from the data, or of a score or comparison between the results of a subject or individual and other individuals or a control or baseline.
EXAMPLES
Unless otherwise explained, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below.
All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including explanations of terms, will control. In addition, the materials, methods and examples are illustrative only and not intended to be limiting.
The present invention will be understood more clearly on reading the description of the experimental studies performed in the context of the research carried out by the applicant, which should not be interpreted as being limiting in nature.
I- DATA AND METHODS
Definitions and abbreviations
1. Histological grade (HG) : HG is based on Elston-Ellis or Scarff-Bloom-Richardson (SBR) grading system. 3 levels : HG1, HG2 and HG3.
2. Estrogen Receptor (ER) status : ER status is usually measured by immunohistochemistry. It can be positive or negative.
3. Lymph Node status (LN). A number of invaded lymph nodes of 0 to 3 is considered as a negative status.
4. Genomic Grade (GG), 2 levels : GG1 and GG3.
5. Log-Rank (LR) : The logrank test (LR test) is a hypothesis test used to compare the survival distributions of two groups (for instance, GG1 and GG3). The two groups have significantly different distributions if the p-value is less than 0.05.
6. Hazard Ratio (HR) : Hazard ratio of the GG (and the associated p-value) is determined by a multivariate Cox model fitted on the cohort, with Lymph Node status as an additional variable kept in the model (the hypothesis of hazard proportionality has been verified). An HR equal to x and significantly greater than 1 (i.e. with an associated p-value < 0.05) means that the risk for a GG3 sample to have the considered event is x times the risk for a GG1 sample.
7. The terms "formula" and "model" are used interchangeably for any mathematical equation, algorithmic, analytical or programmed process, or statistical technique that takes one or more continuous or categorical inputs (also called "parameter" = "explanatory variable" = "predictor" = "characteristic") and calculates output values (which can be an index and/or a categorical response, associated or not with a class membership probability).
The number of algorithm families (techniques) is very large.
EXAMPLE 1 : DATA
1.1 Microarray datasets
Several datasets from different origins (private or public) of primary breast cancer were used. Information on histological grade, ER and node status is presented in Table 1. Histological grade (HG) was based on the Elston-Ellis or the SBR grading system.
A total of 1192 genomic profiles (HGU133plus2 Affymetrix chips) are available.
Tab. 1 Available datasets (private and public) from breast cancer studies

| Identifier | N | Affymetrix platform | HG I (%) | HG II (%) | HG III (%) | ER+ (%) | ER- (%) | Node+ (%) | Node- (%) | GEO or Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| TGEN | 329 | U133Plus2.0 | 10.4 | 40.5 | 49.1 | 53.9 | 27.5 | 56.9 | 43.1 | GSE2109 |
| LOI | 87 | U133Plus2.0 | 24.3 | 52.9 | 22.9 | 100 | 0 | 65.7 | 34.3 | GSE6532 |
| Bordet | 76 | U133Plus2.0 | 24.1 | 34.5 | 41.4 | 100 | 0 | 48.3 | 51.7 | GSE9195 |
| DFI | 115 | U133Plus2.0 | 20 | 22.5 | 57.5 | 55.8 | 44.2 | 47.5 | 47.5 | GSE5460 |
| IPC | 270 | U133Plus2.0 | 8.9 | 22.5 | 68.6 | 46.6 | 53.4 | 23.5 | 12.7 | Private |
| PACS | 121 | U133Plus2.0 | 21.5 | 46.7 | 31.8 | 84.1 | 14.9 | 100 | 0 | Private |
| Hess | 108 | U133A | 1.85 | 38 | 60.15 | 63 | 38 | 67.6 | 32.4 | Hess et al., JCO, 2006 |
| Pawitan | 90 | U133A | 15 | 33 | 52 | 83 | 17 | - | - | GSE1456 |
| Miller | 237 | U133A | 24 | 56 | 20 | 73 | 12 | 29 | 55 | GSE4922 |
| Haibe-Kains | 192 | U133A | 15.8 | 42.1 | 42.1 | 67.9 | 32.1 | 0 | 100 | GSE7390 |
| Schmidt | 173 | U133A | 15 | 66.5 | 18.5 | - | - | 0 | 100 | GSE11121 |
| OXFU | 64 | U133A | 26.9 | 47.1 | 26 | 66.3 | 33.7 | 0 | 100 | GSE6532 |
| OXFT | 101 | U133A | 31.1 | 43.7 | 25.2 | 100 | 0 | 42.7 | 49.5 | GSE6532 |
| Wang | 256 | U133A | - | - | - | 28 | 72 | 0 | 100 | GSE2034 |
| Curie | 164 | U133Plus2.0 | 32.5 | 42.9 | 24.5 | 68.1 | 31.9 | 0 | 100 | Private |
The HG1 + HG3 breast cancer samples, with ER+ status, were split into a training set (60% of the samples, 370 genomic profiles) and a validation set (40% of the samples, 258 genomic profiles). All 273 HG2 samples with available prognostic data were used to validate the prognostic performances of the genomic grade.
Training set ("ER+ HG1/HG3 training set" or "training set")
The training set included a total of 370 genomic profiles from HG1 and HG3 patients, all having positive ER status (Table 2). The training set was used to learn the model, which was then applied to various validation sets, in order to assess the performances of the different combinations of genes.

Tab. 2 Composition of the training set.

| Dataset | HG1 | HG3 |
|---|---|---|
| Total | 169 (45.68 %) | 201 (54.32 %) |
| Bordet | 8 (36.36 %) | 14 (63.64 %) |
| Curie | 25 (71.43 %) | 10 (28.57 %) |
| DFI | 17 (68 %) | 8 (32 %) |
| HaibeKains | 19 (51.35 %) | 18 (48.65 %) |
| Hess | 0 (0 %) | 17 (100 %) |
| IPC | 7 (18.42 %) | 31 (81.58 %) |
| LOI | 10 (52.63 %) | 9 (47.37 %) |
| Miller | 33 (61.11 %) | 21 (38.89 %) |
| OXFT | 15 (68.18 %) | 7 (31.82 %) |
| OXFU | 1 (25 %) | 3 (75 %) |
| PACS | 13 (46.43 %) | 15 (53.57 %) |
| Pawitan | 5 (19.23 %) | 21 (80.77 %) |
| TGEN | 16 (37.21 %) | 27 (62.79 %) |
Validation set ("ER+ HG1/HG3 validation set")
The ER+ HG1/HG3 validation set included a total of 258 genomic profiles from HG1 and HG3 patients, all having positive ER status (Table 3).

Tab. 3 Composition of the ER+ HG1/HG3 validation set.

| Dataset | HG1 | HG3 |
|---|---|---|
| Total | 106 (41.09 %) | 152 (58.91 %) |
| Bordet | 6 (37.5 %) | 10 (62.5 %) |
| Curie | 16 (66.67 %) | 8 (33.33 %) |
| DFI | 7 (38.89 %) | 11 (61.11 %) |
| HaibeKains | 10 (38.46 %) | 16 (61.54 %) |
| Hess | 2 (16.67 %) | 10 (83.33 %) |
| IPC | 4 (15.38 %) | 22 (84.62 %) |
| LOI | 7 (50 %) | 7 (50 %) |
| Miller | 26 (70.27 %) | 11 (29.73 %) |
| OXFT | 6 (40 %) | 9 (60 %) |
| OXFU | 2 (50 %) | 2 (50 %) |
| PACS | 9 (47.37 %) | 10 (52.63 %) |
| Pawitan | 5 (27.78 %) | 13 (72.22 %) |
| TGEN | 6 (20.69 %) | 23 (79.31 %) |
Prognostic performances on HG2 cohorts
Metastasis-free survival (MFS) was used as an endpoint to validate the relevance of HG2 reclassification into GGl and GG3. MFS status was available for IPC, PACS, TGEN, LOI, BORDET, HaibeKains, OXFU, OXFT. Time was truncated at 5 or 10 years. MFS was available for a total of 273 patients with HG2 breast tumors (Tab. 4). These samples ("HG2 validation cohort" or "HG2 validation dataset") were used to assess the prognostic value of the new GGI. Within this group, 40 patients had relapsed during the first five years after surgery, and 58 during the first ten years after surgery.
Within this HG2 cohort, we then focused on samples with ER+ status ("HG2/ER+ validation cohort" or "HG2/ER+ validation dataset", n=226, Tab. 5). In this subgroup, thirty-two patients had relapsed during the first five years after surgery, and 47 during the first ten years.
Then, within the HG2/ER+ subgroup, we focused further on patients with negative lymph node status, which is the target population of the new GGI ("HG2/ER+/LN- validation cohort" or "HG2/ER+/LN- validation dataset", n=174, Tab. 6). In this subgroup, 22 patients had relapsed during the first five years after surgery, and 34 during the first ten years.

Tab. 4 Distribution of HG2 samples (regardless of ER and LN status) having MFS data, by dataset.

| Dataset | n |
|---|---|
| Total | 273 |
| Bordet | 20 |
| Curie | 68 |
| HaibeKains | 80 |
| LOI | 37 |
| OXFT | 45 |
| OXFU | 23 |
Tab. 5 Distribution of HG2/ER+ samples (regardless of LN status) with MFS data, by dataset.

| Dataset | n |
|---|---|
| Total | 227 |
| Bordet | 20 |
| Curie | 51 |
| HaibeKains | 66 |
| LOI | 37 |
| OXFT | 45 |
| OXFU | 8 |
Tab. 6A Distribution of HG2/ER+/LN- samples with MFS data, by dataset.

| Dataset | n |
|---|---|
| Total | 174 |
| Bordet | 11 |
| Curie | 51 |
| HaibeKains | 66 |
| LOI | 12 |
| OXFT | 26 |
| OXFU | 8 |

1.2 qRT-PCR datasets
The first dataset consisted of 91 ER+ samples (45 HG1 + 46 HG3) from IJB (Institut Jules Bordet, Brussels, Belgium) and Mercy hospital ("IJB/Mercy dataset" or "IJB/Mercy cohort"). In these samples, the expression levels of 25 target genes (BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, CDC2, CDC20, KPNA2 and MYBL2) and 4 control genes (TFRC, GUS, TBP and RPLP0) were assessed by qRT-PCR. A second dataset included 86 HG1 and 60 HG3 ER+ samples from IJB ("IJB dataset" or "IJB cohort"). In these samples, the expression levels of 7 target genes (TPT1, MCM10, ASPM, PTTG1, FRY, CCNB2 and CX3CR1) and 3 endogenous genes (TBP, GUS and RPLP0) were assessed by qRT-PCR. Ninety-eight samples (58 HG1 and 40 HG3), corresponding to 2/3 of this cohort, were kept as a training set ("IJB training set"), and 48 samples (28 HG1 and 20 HG3) were used as a validation set ("IJB validation set"), to calculate the agreement.
EXAMPLE 2 : SIGNATURES
Of all the possible combinations from the 24 genes BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, CDC2, CDC20 and KPNA2, a total of 1 410 884 combinations were evaluated with microarray (Affymetrix) data :
- From 2 to 5 gene-combinations and from 19 to 24 gene-combinations: all combinations were tested.
- From 6 to 18 gene-combinations: a random sample of 100 000 combinations was tested.
The "24-gene signature" referred to in the following examples consists of all of the above 24 genes.
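The enumeration scheme above can be sketched in a few lines of Python (an illustration only; the analyses themselves were run in R). The function name `n_tested` and the constants are hypothetical, and the sketch simply assumes exhaustive enumeration for sizes 2 to 5 and 19 to 24 and a fixed sample of 100 000 for sizes 6 to 18.

```python
from math import comb

N_GENES = 24           # size of the candidate gene pool
SAMPLE_SIZE = 100_000  # random sample drawn for mid-sized signatures

def n_tested(k: int) -> int:
    """Number of k-gene combinations evaluated under the scheme described above."""
    if 2 <= k <= 5 or 19 <= k <= 24:
        return comb(N_GENES, k)   # every combination is tested
    if 6 <= k <= 18:
        return SAMPLE_SIZE        # only a random subsample is tested
    raise ValueError("signature size must be between 2 and 24")

# Number of combinations evaluated for each signature size
counts = {k: n_tested(k) for k in range(2, 25)}
```

For example, `counts[2]` is the number of 2-gene signatures (276), and every size from 6 to 18 has more than 100 000 possible combinations, so sampling always applies in that range.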
The "6-gene signature" encompasses the genes PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY.
The "3-gene signature" includes ASPM, CX3CR1 and MCM10.
The following gene combinations are excluded from the present invention :
- CCNA2, CDC2, KPNA2, CDC20
- CCNA2, CDC2, KPNA2, AURKA
- CCNA2, CDC2, CDC20, AURKA
- CCNA2, KPNA2, CDC20, AURKA
- CDC2, KPNA2, CDC20, AURKA
- CCNA2, CDC2, KPNA2, CDC20, AURKA
EXAMPLE 3: CLASSIFICATION ALGORITHMS
The following classification algorithms were assessed, for three particular gene combinations (24-gene, 6-gene and 3-gene signatures), as examples.
- Support Vector Machine (SVM), with radial or linear kernel (SVMr or SVMl)
- Sum of gene expressions ($\sum_{i \in GG3} x_i - \sum_{j \in GG1} x_j$)
- Probit model
- Logistic model
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
- Recursive partitional tree (RCHART)
- Principal component analysis (PCA)
- Random Forest (RF)
Other known classifiers not tested here, including, but not limited to, shrunken centroids, k-nearest neighbours, QLDA, ELDA, DQDA and neural networks, are expected to lead to similar performances.
Whatever the classifier and the combination of genes, a rule of classification (model) needs to be learnt on a training set. The rule is then applied to new data (validation sets), in order to assess performances. The expected output is an index and/or a probability, and the predicted class (GG1 or GG3) of the sample. All analyses were done using R (http://cran.r-project.org/).
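As a minimal sketch of this learn-then-apply workflow, the Python snippet below implements a sum-of-gene-expressions score (GG3-associated genes minus GG1-associated genes) and the HG/GG agreement. It is not the R code used for the analyses. The gene names `up` and `down`, the toy expression values, and the cutoff rule (midpoint between the mean training scores of the two grades) are all assumptions made for illustration; the source does not state how the cutoff was chosen.

```python
from statistics import mean

def sum_score(expr, gg3_genes, gg1_genes):
    """Sum of GG3-associated expressions minus sum of GG1-associated expressions."""
    return sum(expr[g] for g in gg3_genes) - sum(expr[g] for g in gg1_genes)

def learn_cutoff(samples, grades, gg3_genes, gg1_genes):
    """Assumed rule: midpoint between the mean scores of HG1 and HG3 training samples."""
    s1 = [sum_score(e, gg3_genes, gg1_genes) for e, g in zip(samples, grades) if g == "HG1"]
    s3 = [sum_score(e, gg3_genes, gg1_genes) for e, g in zip(samples, grades) if g == "HG3"]
    return (mean(s1) + mean(s3)) / 2

def predict(expr, cutoff, gg3_genes, gg1_genes):
    """Predicted genomic grade for one sample."""
    return "GG3" if sum_score(expr, gg3_genes, gg1_genes) > cutoff else "GG1"

def agreement(predicted, grades):
    """Fraction of samples where GG1 matches HG1 or GG3 matches HG3."""
    hits = sum(p[-1] == g[-1] for p, g in zip(predicted, grades))
    return hits / len(grades)

# Hypothetical toy data: one GG3-associated gene and one GG1-associated gene
GG3_GENES, GG1_GENES = ["up"], ["down"]
train = [{"up": 1.0, "down": 3.0}, {"up": 1.5, "down": 2.5},
         {"up": 3.0, "down": 1.0}, {"up": 4.0, "down": 1.0}]
train_grades = ["HG1", "HG1", "HG3", "HG3"]
valid = [{"up": 0.5, "down": 2.0}, {"up": 3.5, "down": 0.5},
         {"up": 2.0, "down": 1.8}, {"up": 1.0, "down": 2.2}]
valid_grades = ["HG1", "HG3", "HG3", "HG1"]

cutoff = learn_cutoff(train, train_grades, GG3_GENES, GG1_GENES)   # learnt on training set
calls = [predict(e, cutoff, GG3_GENES, GG1_GENES) for e in valid]  # applied to validation set
valid_agreement = agreement(calls, valid_grades)
```

In this toy example three of the four validation samples are classified in line with their histological grade.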
EXAMPLE 4: CRITERIA
Several criteria were used, in order to assess the performances of the new GGI, and most importantly, the agreement of GG with HG in HG1+HG3 samples, and the prognostic value of GG in HG2 samples.
4.1 Agreement
Agreement is calculated on the ER+ HG1/HG3 validation set as the number of correctly predicted samples (HG1/GG1 + HG3/GG3) over the total number of samples (HG1+HG3).
4.2 Survival analysis
Survival analysis was done using the survival package of R. We considered MFS of breast cancer patients as an endpoint for all patients. Prognostic performances were evaluated by a log-rank test and a Cox proportional hazards model.
As defined hereabove, the logrank test is a hypothesis test used to compare the survival distributions of two groups (GG1 and GG3). The two groups have significantly different distributions if the p-value is less than 0.05.
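To make the test concrete, here is a small self-contained Python sketch of the standard two-group log-rank statistic. It is an illustration only, not the R `survival` code used in the study; the function name `logrank` and the toy follow-up times are hypothetical. The p-value uses the chi-square distribution with one degree of freedom.

```python
from math import erfc, sqrt

def logrank(times1, events1, times2, events2):
    """Two-group log-rank test. events are 1 (event observed) or 0 (censored).
    Returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group 1
        n2 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group 2
        d1 = sum(1 for u, e, g in data if u == t and e and g == 0)
        d2 = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n                # expected events under equal hazards
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    stat = (observed1 - expected1) ** 2 / variance if variance > 0 else 0.0
    p_value = erfc(sqrt(stat / 2))             # chi-square survival function, 1 df
    return stat, p_value

# Hypothetical follow-up (years): early relapses in one group, none in the other
stat, p = logrank([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                  [10, 11, 12, 13, 14], [0, 0, 0, 0, 0])
same_stat, same_p = logrank([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

Two identical groups give a statistic of zero, while the clearly separated toy groups give a p-value below 0.05.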
As defined hereabove, hazard ratio of the GG (and the associated p-value) is determined by a multivariate Cox fitted on the cohort, with Lymph Node status as an additional variable kept in the model (the hypothesis of hazard proportionality has been verified). An HR equal to x and significantly greater than 1 (i.e. with an associated p-value < 0.05) means that the risk for a GG3 sample to have the considered event is x times the risk for a GG1 sample.
EXAMPLE 5A : MATERIAL AND METHODS FOR MICROARRAYS EXPERIMENTS
For the samples from the private microarray datasets (IPC, PACS and Curie), total RNA was extracted from fresh/frozen tumor samples, using the RNeasy Mini kit or the RNeasy Micro kit, both from Qiagen. Concentration and quality of RNA obtained from each tumor sample were respectively assessed with a Nanodrop ND-1000 spectrophotometer, and via the RNA profile generated by the Agilent Bioanalyzer. Human Genome U133 Plus 2.0 Array GeneChips (Affymetrix, Santa Clara, California) were used. RNA amplification, hybridization and scanning were done according to standard Affymetrix protocols. Gene expression values were normalized using RMAdx, an internally-designed normalization method based on RMA (Robust Multichip Average, Irizarry et al, Biostatistics, 2003), using a calibration set.
EXAMPLE 5B : MATERIAL AND METHODS FOR qRT-PCR
5.1 Organization and set-up
The experiments are conducted in the Clinpath Advisors facility in Costa Mesa (CA) according to the present protocol. The study will start upon receipt of the samples shipped by Ipsogen. The samples will be ready to use. The set-up of the qRT-PCR reactions will be done in 384-well microplates with a Qiagility robotic platform. The qRT-PCR reactions will be run on an ABI 7900 HT instrument.
5.2 Traceability
The results of the study will be sent to Ipsogen for analysis. They will be presented as a report (Excel file) including minimally:
- A list of the material used, supplier, reference and lot
- As required, a list of the deviations to the present protocol
- An analysis of the controls
- A summary of the Ct obtained for each sample and each gene
- A sheet containing raw data
5.3 Material & method
5.3.1 Samples
Two 10 μm serial unstained FFPE slices, on one slide each, and one serial H&E slide per sample will be needed.
5.3.2 Equipment and reagents for RNA extraction and qRT-PCR
Table 6B. Equipment and reagents for RNA extraction and qRT-PCR

| Manufacturer/Vendor | Equipment / Reagent description | Cat # Number |
|---|---|---|
| Eppendorf | Thermixer | 22670000 |
| QIAGEN | QiaCube | 9001292 |
| QIAGEN | Starter Pack, QIAcube | 990395 |
| QIAGEN | Qiagility | 9001531 |
| QIAGEN | Qiagility Adaptor | 9018955 |
| ABI | ABI 7900 QPCR | See quote |
| QIAGEN | Protease K | 19133 |
| QIAGEN | RNeasy FFPE Kit (50) September 2010 version | 73504 |
| QIAGEN | RNeasy MinElute Cleanup Kit (50) | 74204 |
| Agilent | salmon sperm DNA | 201190 |
| QIAGEN | 2 ml tubes | 990381 |
| QIAGEN | 50 μl Conductive Filtered Tips | 990512 |
| QIAGEN | 200 μl Conductive Filtered Tips (960) | 990522 |
| Applied Biosystems | optical plate film | 4311971 |
| Applied Biosystems | 384-Well Spectral Calibration Kit | 4323977 |
| Applied Biosystems | 384 well plates | 4343370 |
| Invitrogen | EXPRESS One-Step Superscript® qRT-PCR with Premixed ROX (1000 reactions) | 11791-01K |
| VWR | 1.5 ml microcentrifuge tubes | 20170-650 |
| Invitrogen | TE, pH 8.0 | AM9849 |
| Invitrogen | Nuclease free water (100 ml) | AM9938 |
| VWR | Xylene | BDH2000-4LG |
| VWR | 100% Ethanol | EM-EX0276-3S |
| IDT/Sigma Aldrich | Primer and probe sequences can be found in Appendix 2 | Q-PCR primers and probe |
5.3.3 Extraction Method:
Step 1: Extraction RNA will be extracted partially on a QIACUBE instrument with the QIAGEN kit reference 73504: "RNeasy FFPE Kit" September 2010 version, according to the SOP L002.DRAFT4 "RNA Extraction from FFPE" (based on QIAGEN RNeasy FFPE kit manual).
Step 2: Evaluation of gDNA contamination Contamination by genomic DNA (gDNA) was assessed on an aliquot of the sample, in absence of reverse transcriptase (RT), with at least one set of primers and probes with designs at risk of amplification of gDNA.
Run, in absence of reverse transcriptase (RT), an aliquot of each sample with at least one set of primers and probes with designs at risk of amplification of genomic DNA.
If the tests without RT report Undetermined results, proceed to qRT-PCR; if Cts are reported within or close to the previously determined Ct window, proceed to step 3.
Step 3: RNA clean-up (optional)
In case gDNA contamination was detected, RNA was purified with the RNeasy MinElute Cleanup kit (Ref 74204, Qiagen), according to Qiagen protocol, and gDNA contamination was reassessed.
Elute RNA with 80 μl of RNase-free water and measure nucleic acid concentration with Nanodrop. Repeat step 2: "evaluation of gDNA contamination".
5.3.4 qRT-PCR
Q-PCR primer preparation
Primers are prepared and stored at 25x: 10 μM forward and reverse primers and 5 μM probe, in TE pH 8.0 plus 300 μg/ml salmon DNA as buffer ([1x]: 400 nM primers, 200 nM probe).
Mastermix preparation
For each probe set, create the following mastermix (per sample):
PCR mastermix with ROX 10 μl
Superscript-III R/T 2 μl
Sterile water 5.2 μl
For each sample and primer and probe set, program the Qiagility in order to distribute per well:
Previous mastermix 17.2 μl
Primers and probe 0.8 μl
Sample 2 μl
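The volumes and concentrations above can be cross-checked with a short calculation. The Python sketch below is illustrative only; the dictionary names are hypothetical, and the figures are those stated in the protocol (25x stock at 10 μM primers and 5 μM probe, 0.8 μl of stock per well).

```python
# Per-sample mastermix components (volumes in microlitres, from the protocol)
MASTERMIX_UL = {
    "PCR mastermix with ROX": 10.0,
    "reverse transcriptase": 2.0,
    "sterile water": 5.2,
}

mastermix_total = sum(MASTERMIX_UL.values())

# Per-well distribution (microlitres)
PER_WELL_UL = {
    "mastermix": mastermix_total,
    "primers and probe (25x stock)": 0.8,
    "sample": 2.0,
}

# 25x stock concentrations (nM): 10 uM primers, 5 uM probe
STOCK_NM = {"forward primer": 10_000, "reverse primer": 10_000, "probe": 5_000}

reaction_volume = sum(PER_WELL_UL.values())
dilution_factor = reaction_volume / PER_WELL_UL["primers and probe (25x stock)"]

# Final (1x) concentrations in the well
final_nm = {name: conc / dilution_factor for name, conc in STOCK_NM.items()}
```

The mastermix adds up to 17.2 μl, the reaction volume to 20 μl, and diluting 0.8 μl of stock into 20 μl is a 25-fold dilution, which gives the stated 1x concentrations of 400 nM primers and 200 nM probe.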
No Template Control
With the Qiagility, distribute 2 μl of sterile water into NTC wells, 2 times for each primer and probe set. Cap sample plate and transfer to the Q-PCR instrument.
PCR Conditions:
60°C for 15-minutes hold (cDNA synthesis)
95°C for 2 minutes hold (RT inhibition).
40 cycles of:
95°C for 20 seconds
60°C for 30 seconds
All probes are FAM-BHQ1 labeled.
For analysis, enable baseline correction.
Ct threshold was fixed to 0.1.
Quality Control :
Ct of No Template Control: Undetermined in at least one of the two wells, for each primer & probe set.
Table 6C : Description of the primers (forward "F" and reverse "R") and probes ("P"), used for the qPCR reactions

| Gene ID | Primer set # | SEQ ID | Primer name | Sequence | Start | bp | Tm | Amplicon |
|---|---|---|---|---|---|---|---|---|
| BIRC5 | 3 | 55 | BIRC5_F3 | GGGC I CA I 1 1 1 I GU G I 1 IT | 864 | 20 | 59.2 | 63 |
| | | 56 | BIRC5_R3 | GCCTTCTTCCTCCCTCACTT | 926 | 20 | 59.8 | |
| | | 57 | BIRC5_P3 | TTCCCGGGCTTACCAGGTGA | 886 | 20 | 67 | |
| CEP55 | 1 | 58 | CEP55_F1 | GAAAAAGTTGCCGCCTCA | 1582 | 18 | 59.9 | 62 |
| | | 59 | CEP55_R1 | CATTCCACCAGGCTTTCATT | 1643 | 20 | 59.9 | |
| | | 60 | CEP55_P1 | CCAAAAAGTCCCACTGCTGCACTC | 1600 | 24 | 68.3 | |
| AURKA | 2 | 61 | AURKA_F2 | GACTACCTGCCCCCTGAAAT | 1447 | 20 | 60.3 | 65 |
| | | 62 | AURKA_R2 | AGGCTCCAGAGATCCACCTT | 1511 | 20 | 60.2 | |
| | | 63 | AURKA_P2 | TCATCATGCATCCGACCTTCAATC | 1467 | 24 | 67.4 | |
| RACGAP1 | 1 | 64 | RACGAP1_F1 | ACACATCTGGCAGCATTCAA | 650 | 20 | 60.2 | 70 |
| | | 65 | RACGAP1_R1 | GGTTGGCCTCTGTTGAGAAA | 719 | 20 | 60.2 | |
| | | 66 | RACGAP1_P1 | CGAGGAGCAAAAATCAGCTCTGGC | 675 | 24 | 69.3 | |
| MELK | 1 | 67 | MELK_F1 | CGCCTGTCAGAAGAGGAGAC | 445 | 20 | 60.1 | 67 |
| | | 68 | MELK_R1 | TGTGCACATAAGCAACAGCA | 511 | 20 | 60.1 | |
| | | 69 | MELK_P1 | CCGGGTTGTCTTCCGTCAGATAGTATC | 465 | 27 | 67.66 | |
| CX3CR1 | 3 | 70 | CX3CR1_F3 | TCATCACCGTCATCAGCATT | 448 | 20 | 60.1 | 70 |
| | | 71 | CX3CR1_R3 | GTCCGGTTGTTCATGGAGTT | 517 | 20 | 59.8 | |
| | | 72 | CX3CR1_P3 | ACCTGGCCATCGTCCTGGCC | 475 | 20 | 71.03 | |
| PTTG1 | 1 | 73 | PTTG1_F1 | 1 1 1 1 ACL 1 JLC I AAGAGC | 426 | 20 | 60.5 | 66 |
| | | 74 | PTTG1_R1 | AGGATCATGAGAGGCACTCC | 491 | 20 | 59.2 | |
| | | 75 | PTTG1_P1 | CCAGATTGCGCACCTCCCCT | 447 | 20 | 68.98 | |
| CCNA2 | 5 | 76 | CCNA2_F5 | TGAAGATGCCCTGGCTTTTA | 671 | 20 | 60.7 | 69 |
| | | 77 | CCNA2_R5 | TCAAGAGGGACCAATGGTTT | 739 | 20 | 59.38 | |
| | | 78 | CCNA2_P5 | TCAGCCATTAGTTTACCTGGACCCAGA | 693 | 27 | 68.48 | |
| CCNB2 | 1 | 79 | CCNB2_F1 | GCTTTTTCTGATGCCTTGCT | 470 | 20 | 59.6 | 67 |
| | | 80 | CCNB2_R1 | GCTGAGGGTTCTCCCAATCT | 536 | 20 | 60.6 | |
| | | 81 | CCNB2_P1 | TGCAAAATCGAGGACATTGATAACGA | 491 | 26 | 67.43 | |
| ASPM | 1 | 82 | ASPM_F1 | CCCTCTTCGACAACAGCTTC | 1947 | 20 | 60 | 69 |
| | | 83 | ASPM_R1 | CACATTTGCATCTTCCATGC | 2015 | 20 | 60 | |
| | | 84 | ASPM_P1 | TGCTCGGAAAAGAAAGAGCGATGG | 1970 | 24 | 69.52 | |
| FRY | 1 | 85 | FRY_F1 | GTGCCCAAGAAGTTTGGTGT | 6446 | 20 | 60 | 66 |
| | | 86 | FRY_R1 | TCTGTCCAGTGTGGCACTTC | 6511 | 20 | 59.9 | |
| | | 87 | FRY_P1 | TCGACCGATCCTCTGACCCACC | 6468 | 22 | 70.01 | |
| CENPA | 1 | 88 | CENPA_F1 | CTCCTGCACCCAGTGTTTCT | 608 | 20 | 60.3 | 66 |
| | | 89 | CENPA_R1 | GAGAGTCCCCGGTATCATCC | 673 | 20 | 60.7 | |
| | | 90 | CENPA_P1 | TCTTTCCTGCTCAGCCAGGGG | 633 | 21 | 68.18 | |
| FLJ21062 | 1 | 91 | FLJ21062_F1 | ACTGTATGGCGGTGGCCTA | 82 | 19 | 61.5 | 62 |
| | | 92 | FLJ21062_R1 | AGTCGCTAGAGTCGCGAAAG | 143 | 20 | 59.9 | |
| | | 93 | FLJ21062_P1 | CCCU GCGGAA I 1 1 I GGA | 103 | 20 | 70.21 | |
| TPT1 | 3 | 94 | TPT1_F3 | TGTGGCAATTATTTTGGATCT | 617 | 21 | 57.1 | 69 |
| | | 95 | TPT1_R3 | TGGTGTTGTGTGGATGACAA | 685 | 20 | 59.4 | |
| | | 96 | TPT1_P3 | TCACCTGTCATCATAACTGGCTTCTGC | 639 | 27 | 68.46 | |
| KIF11 | 1 | 97 | KIF11_F1 | ATCCAGGTGGTGGTGAGATG | 363 | 20 | 60.8 | 66 |
| | | 98 | KIF11_R1 | TATTGAATGGGCGCTAGCTT | 428 | 20 | 59.8 | |
| | | 99 | KIF11_P1 | CAGACCATTTAATTTGGCAGAGCGGA | 383 | 26 | 69.67 | |
| TROAP | 2 | 100 | TROAP_F2 | CGGTACGCTCTCAGAAACG | 235 | 19 | 59.6 | 70 |
| | | 101 | TROAP_R2 | TCTTGGTTCTCCTGGTCCAC | 304 | 20 | 60.1 | |
| | | 102 | TROAP_P2 | TCCCCACTGTTACATCGTGCGC | 262 | 22 | 69.52 | |
| TUBA1B | 1 | 103 | TUBA1B_F1 | GACCCTCGCCATGGTAAATA | 1137 | 20 | 59.8 | 66 |
| | | 104 | TUBA1B_R1 | ATCTTTGGGAACCACGTCAC | 1202 | 20 | 59.8 | |
| | | 105 | TUBA1B_P1 | TGGCTTGCTGCCTGTTGTACCG | 1159 | 22 | 69.62 | |
| CDCA3 | 1 | 106 | CDCA3_F1 | AACAGATGCCACCTTGGAAC | 529 | 20 | 60 | 67 |
| | | 107 | CDCA3_R1 | CTTGCTTCCTCCTTGGAAAA | 595 | 20 | 59.4 | |
| | | 108 | CDCA3_P1 | CAGACTGAGTTCCCCTCCAAACAGG | 549 | 25 | 68.28 | |
| UBE2C | 1 | 109 | TPT1_F1 | CCTACTCAAAGCAGGTCACCA | 1234 | 21 | 60.3 | 70 |
| | | 110 | TPT1_R1 | AAAAGACGACACAAGGACAGG | 1303 | 21 | 59.2 | |
| | | 111 | TPT1_P1 | CCAGGAGCCCTGACCCAGGC | 1256 | 20 | 70.79 | |
| TPX2 | 1 | 112 | TPX2_F1 | TGAAAATGCAGCAAGAGGTG | 1354 | 20 | 60 | 70 |
| | | 113 | TPX2_R1 | CCAGCCAGAGCAAGTTTCTT | 1423 | 20 | 59.6 | |
| | | 114 | TPX2_P1 | TGGAGATGCGGAAAAAGAATGAAGAA | 1375 | 26 | 67.75 | |
| MCM10 | 1 | 115 | MCM10_F1 | TGAAAGGTCAACCCCTATGC | 3776 | 20 | 59.9 | 70 |
| | | 116 | MCM10_R1 | GTTGCCCAGATGGAAGAATC | 3845 | 20 | 59.5 | |
| | | 117 | MCM10_P1 | ACCACAGCAAAGGTTTCATTCAGGA | 3801 | 25 | 67 | |
| CDC20 | 1 | 118 | CDC20_F1 | GCAAATCCAGTTCCAAGGTTC | 283 | 21 | 60.8 | 65 |
| | | 119 | CDC20_R1 | ATGGGGGATATAGCGGTCAC | 347 | 20 | 60.9 | |
| | | 120 | CDC20_P1 | AGACCACTCCTAGCAAACCTGGCG | 304 | 24 | 68.74 | |
| CDC2 | 1 | 121 | CDC2_F1 | CCCAAATGGAAACCAGGAAG | 866 | 20 | 61.6 | 67 |
| | | 122 | CDC2_R1 | GCAAATCCAAGCCATTTTCA | 932 | 20 | 61 | |
| | | 123 | CDC2_P1 | CCTAGCATCCCATGTCAAAAACTTGGA | 886 | 27 | 68.7 | |
| GUSB | 1 | 124 | GUSB_F1 | TTCCCTCCAGCTTCAATGAC | 367 | 20 | 60.2 | 63 |
| | | 125 | GUSB_R1 | ACACCCAGCCGACAAAATG | 429 | 19 | 61.9 | |
| | | 126 | GUSB_P1 | ATCAGCCAGGACTGGCGTCTGCG | 387 | 23 | 73.81 | |
| MYBL2 | 1 | 127 | MYBL2_F1 | TGTGGATGAGGATGTGAAGC | 1985 | 20 | 59.6 | 65 |
| | | 128 | MYBL2_R1 | CAGTTGTCGGCAAGGATAGAG | 2049 | 21 | 59.9 | |
| | | 129 | MYBL2_P1 | TGATGATGTCCACACTGCCCAAGT | 2005 | 24 | 68.1 | |
| RPLP0 | 1 | 130 | RPLP0_F1 | GAACACCATGATGCGCAAG | 408 | 19 | 61.2 | 63 |
| | | 131 | RPLP0_R1 | AGTTTCTCCAGAGCTGGGTTG | 470 | 21 | 60.8 | |
| | | 132 | RPLP0_P1 | CCATCCGAGGGCACCTGGAAAA | 428 | 22 | 71.27 | |
| TBP | 1 | 133 | TBP_F1 | GGGGCATTATTTGTGCACTG | 1470 | 20 | 61.3 | 66 |
| | | 134 | TBP_R1 | TAGCAGCACGGTATGAGCAAC | 1535 | 21 | 61.4 | |
| | | 135 | TBP_P1 | AGAACACCGCGCAGCGTGACTGT | 1490 | 23 | 72.46 | |
| TFRC | 1 | 136 | TFRC_F1 | TAGTTGGGGCCCAGAGAGAT | 1494 | 20 | 61 | 62 |
| | | 137 | TFRC_R1 | AGCTGTGCCTACACCGGATT | 1555 | 20 | 62 | |
| | | 138 | TFRC_P1 | ATGGGGCCCTGGAGCTGCAA | 1516 | 20 | 71.44 | |
| KPNA2 | 1 | 139 | KPNA2_F1 | CCA I LCA CA I I CA I 1 1 C I C | 620 | 21 | 61.7 | 65 |
| | | 140 | KPNA2_R1 | CCAGACAGCTTGTTCACTGATG | 684 | 22 | 60.9 | |
| | | 141 | KPNA2_P1 | TGTTGGCATCTCCCCATGCTCA | 641 | 22 | 70.24 | |
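The tabulated amplicon sizes appear to follow directly from the primer positions: in every row, the amplicon length equals the reverse primer position minus the forward primer start, plus one. The Python sketch below checks this on a few rows copied from the table; the function name and the interpretation of the reverse primer "start" as the last base of the amplicon are assumptions drawn from the numbers, not statements from the source.

```python
# (forward start, reverse position, tabulated amplicon length) copied from Table 6C
PRIMER_POSITIONS = {
    "BIRC5": (864, 926, 63),
    "CEP55": (1582, 1643, 62),
    "AURKA": (1447, 1511, 65),
    "FRY": (6446, 6511, 66),
    "KPNA2": (620, 684, 65),
}

def amplicon_length(forward_start: int, reverse_position: int) -> int:
    """Inclusive distance between the first base of the forward primer and the
    position listed for the reverse primer (assumed to close the amplicon)."""
    return reverse_position - forward_start + 1

# True for every gene whose tabulated amplicon matches the computed length
consistent = {
    gene: amplicon_length(fwd, rev) == tabulated
    for gene, (fwd, rev, tabulated) in PRIMER_POSITIONS.items()
}
```

A check of this kind is a quick way to catch transcription errors in the position or amplicon columns.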
Related documents for the qRT-PCR protocol comprise :
• SOP L002.DRAFT4 "RNA Extraction from FFPE"
• RNeasy FFPE Handbook (QIAGEN kit reference 73504: "RNeasy FFPE Kit") Sept. 2010.
• RNeasy MinElute Cleanup Handbook (QIAGEN kit reference 74204 "RNeasy MinElute Cleanup") Oct. 2010
• Protocol: PCR GG Reclassification study (PCR Genome Grade: Validation of the reclassification performances on a cohort of ER+/N0-3 small size invasive breast carcinoma.) Dec. 2010.
EXAMPLE 6 : NUMBER OF REFERENCE GENES IN qRT-PCR
It is common practice in qRT-PCR to use reference genes to normalize the raw data. Using several genes increases precision, without decreasing performance of the test.
Data were normalized using 1, 2 or 3 reference genes selected from the group comprising the reference genes GUS, TBP and RPLP0.
In Example 11, performances were calculated using 1 gene, 2 genes or 3 genes to normalize the data (qRT-PCR data).
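As an illustration of normalization with a varying number of reference genes, the Python sketch below uses the common delta-Ct approach, in which the mean Ct of the chosen reference genes is subtracted from the target Ct. The source does not spell out its normalization formula, so this rule, the function name `delta_ct` and the Ct values are all assumptions.

```python
from statistics import mean

def delta_ct(target_ct: float, reference_cts: list) -> float:
    """Target Ct minus the mean Ct of the selected reference genes."""
    return target_ct - mean(reference_cts)

# Hypothetical Ct values for one sample
REFERENCE_CT = {"GUS": 26.0, "TBP": 28.0, "RPLP0": 21.0}
TARGET_CT = 30.0   # e.g. one target gene

# Normalize with 1, 2 or 3 reference genes
dct_1 = delta_ct(TARGET_CT, [REFERENCE_CT["GUS"]])
dct_2 = delta_ct(TARGET_CT, [REFERENCE_CT["GUS"], REFERENCE_CT["TBP"]])
dct_3 = delta_ct(TARGET_CT, list(REFERENCE_CT.values()))
```

Averaging over several reference genes dampens the effect of measurement noise in any single one, which is the usual rationale for using more than one.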
Material and Methods
86 HG1 and 60 HG3 samples, all ER+, from Institut Jules Bordet (IJB, Brussels, Belgium) were analyzed by qRT-PCR, as described previously. Seven target genes (TPT1, MCM10, ASPM, PTTG1, FRY, CCNB2 and CX3CR1) and 3 reference genes (TBP, GUS and RPLP0) were amplified. From these samples, 98 (58 HG1 and 40 HG3) were kept for the training set to learn the model, and 48 samples (28 HG1 and 20 HG3) were used as a validation set. Two classifiers (sum and SVM) were used, and performances (HG/GG agreement) were calculated using 1 gene, or 2 genes or 3 genes to normalize the data.
RESULTS
EXAMPLE 7: THE NEW GGI SHOWS IMPROVED PERFORMANCES COMPARED TO EXISTING SOLUTIONS
In this example, performances (agreement, hazard ratio and log-rank p-value) of the new GGI-PCR were assessed on the microarray datasets, and compared to prior art (GGI97 and OLD 4-gene).
A. RESULTS OF THE COMBINATIONS OF X GENES AMONG 24
Each table presented hereafter outlines quantiles (5%, 25%, 50%, 75% and 95%) and the mean of the distribution of performances obtained for all combinations of size x (x between 2 and 24). For LR and pvalHR, the "min" corresponds to the best performance, and the "max" to the worst. On the contrary, for the agreement and HR, the "min" corresponds to the worst performance, and the "max" to the best.
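A summary of this kind can be produced as sketched below in Python. The sketch assumes linear-interpolation quantiles (the default rule of R's `quantile()` function); whether the original tables used that exact rule is not stated. The function names and the input values are hypothetical.

```python
from statistics import mean

def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics."""
    h = (len(sorted_values) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def summarize(values):
    """Summary in the column order used by the tables of this example."""
    xs = sorted(values)
    return {
        "min": xs[0],
        "5%": quantile(xs, 0.05),
        "25%": quantile(xs, 0.25),
        "median": quantile(xs, 0.50),
        "mean": mean(xs),
        "75%": quantile(xs, 0.75),
        "95%": quantile(xs, 0.95),
        "max": xs[-1],
    }

# Hypothetical agreement values for five gene combinations
agreements = [0.70, 0.75, 0.80, 0.85, 0.90]
summary = summarize(agreements)
```

The same function applies to any of the performance measures (agreement, LR, HR, pvalHR); only the interpretation of "min" and "max" as best or worst changes, as explained above.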
7.1 2-gene signatures (276 combinations)
7.1.1 Agreement HG/GG in the ER + HG1/HG3 validation set
As seen in Table 7, for all 276 combinations comprising 2 genes, the agreement was at least 0.643. For 95% of these 2-gene combinations, the agreement was above 0.721, and the agreement of the best 2-gene combination was 0.864. For n=258, an agreement above 0.553 is significantly better than chance. Therefore, the agreement of every 2-gene combination is better than chance.
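The significance threshold quoted for n=258 can be reproduced under one plausible reading, namely a one-sided exact binomial test against a chance agreement of 0.5 at the 5% level. The source does not say which test was used, so this is an assumption, and the function names in the Python sketch below are hypothetical.

```python
from math import comb

def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_significant_hits(n: int, alpha: float = 0.05, p: float = 0.5):
    """Smallest number of correctly classified samples whose upper-tail
    probability under chance falls below alpha."""
    for k in range(n + 1):
        if binom_upper_tail(k, n, p) < alpha:
            return k
    return None

n_validation = 258
k_min = min_significant_hits(n_validation)
threshold = k_min / n_validation
```

Under these assumptions the smallest significant count is 143 of 258 samples, i.e. an agreement of about 0.554, which is in line with the 0.553 figure given above.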
Tab. 7 Summary of agreement performances for signatures of 2 genes on the validation set

| | agreement |
|---|---|
| min | 0.643 |
| 5% | 0.721 |
| 25% | 0.764 |
| median | 0.791 |
| mean | 0.7855 |
| 75% and 95% | (Figure imgf000062_0001) |
| max | 0.864 |

The results show the agreement obtained for said combinations of 2-gene signatures.
7.1.2 Prognostic performances in the HG2 cohort
In the HG2 validation cohort, all 2-gene combinations performed well (Table 8). Indeed, all the 276 combinations had a prognostic value at 10 years : the maximum value of LR_10y was 0.01402 (< 0.05). At least 95% of the signatures were also prognostic at 5 years, with a LR_5y inferior to 0.005234 (<0.05). Moreover, for all 276 signatures, HR was significantly greater than 1 (pvalHR were all inferior to 0.026), meaning that the risk to have metastasis was significantly higher in patients with GG3 tumors versus patients with GG1 tumors.
Tab. 8 Summary of prognostic performances for signatures of 2 genes in the HG2 validation cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.186e-09 6.978e-07 1.566e-05 0.0001452 0.001707 0.0009225 0.005234 0.1048
LR_10y 3.798e-10 1.711e-07 2.909e-06 4.071e-05 0.0004622 0.0002487 0.001904 0.01402
HR 1.921 2.335 2.903 3.328 3.468 3.951 4.93 6.39
pvalHR 5.3e-08 4.55e-06 2.6e-05 0.00018 0.0009717 0.00076 0.00395 0.026
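For readers unfamiliar with the LR columns, the following is a minimal Python sketch of a two-group log-rank test with follow-up censored at a fixed horizon. It assumes that LR_5y and LR_10y denote log-rank p-values computed with follow-up truncated at 5 and 10 years. The text does not describe the implementation, so the function names and the censoring rule are hypothetical.

```python
from math import erfc, sqrt

def censor_at(times, events, horizon):
    """Truncate follow-up at horizon: later observations become censored at horizon."""
    new_times = [min(t, horizon) for t in times]
    new_events = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return new_times, new_events

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n          # observed minus expected in group a
        if n > 1:                                    # hypergeometric variance term
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))                # chi-square survival function, 1 df

# Toy data (invented for illustration): group a relapses early, group b late.
t_a, e_a = censor_at([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], horizon=10)
t_b, e_b = censor_at([6, 7, 8, 9, 10], [1, 1, 1, 1, 1], horizon=10)
print(logrank_test(t_a, e_a, t_b, e_b))
```

In this reading, a signature is called prognostic at a given horizon when the p-value returned for the GG1 versus GG3 groups is below 0.05.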
7.1.3 Prognostic performances in the HG2/ER+ cohort
In the HG2/ER+ validation cohort, all 2-gene combinations performed well (Table 9). All 276 combinations had prognostic value at 5 years: the maximum value of LR_5y was 0.0396 (<0.05). At least 95% of the signatures were also prognostic at 10 years, with LR_10y not exceeding 0.01594 (<0.05). Moreover, HR was always greater than 1, and significantly so (pvalHR not exceeding 0.02625) for at least 95% of the signatures.
Tab. 9 Summary of prognostic performances for signatures of 2 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.368e-08 6.741e-06 6.556e-05 0.0002387 0.001687 0.001268 0.005781 0.0396
LR_10y 3.519e-08 1.821e-06 2.409e-05 0.0001761 0.00369 0.0009377 0.01594 0.1292
HR 1.51 2.02[?] 2.804 3.422 3.511 3.998 5.258 6.562
pvalHR 9.2e-07 2.7e-05 0.00013 0.00054 0.004872 0.002125 0.02625 0.17
7.1.4 Prognostic performances in the HG2/ER+/LN- cohort
In the HG2/ER+/LN- validation cohort, more than 95% of the 2-gene signatures had a prognostic value at 5 years (LR_5y<0.03489), and more than 75% of the 276 combinations had a prognostic value at 10 years (LR_10y<0.01007). HR was significant for more than 75% of the combinations (pvalHR<0.013).
Tab. 10 Summary of prognostic performances for signatures of 2 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 4.873e-07 5.613e-05 0.000614 0.002186 0.008119 0.005967 0.03489 0.237
LR_10y 3.848e-06 5.292e-05 0.0005688 0.002073 0.01[?]62 0.01007 0.108 0.3452
HR 1.384 1.784 2.547 3.032 3.09 3.604 4.714 5.645
pvalHR 3.2e-05 0.000225 0.0013 0.00345 0.01953 0.013 0.1125 0.35
7.2 3-gene signatures (2024 combinations)
7.2.1 Agreement HG/GG in ER + cohort
Tab. 11 Summary of agreement performances for signatures of 3 genes on the validation set
agreement
min 0.686
5% 0.76
25% 0.795
median 0.81
mean 0.8077
75% 0.826
95% 0.841
max 0.872
7.2.2 Prognostic performances in HG2
Tab. 12 Summary of prognostic performances for signatures of 3 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 6.9[?]1e-10 1.018e-07 8.359e-06 6.107e-05 0.0006916 0.0003528 0.002752 0.07802
LR_10y 5.0[?]8e-11 5.011e-08 [illegible] 8.69e-06 0.0001114 5.133e-05 0.0005[?]3 0.00700[?]
HR 2.088 2.634 3.17 3.643 3.7[?]6 4.209 5.155 7.282
pvalHR 1.8e-08 1.6e-06 1.3e-05 6.2e-05 0.0003068 0.00022 0.0013 0.016
7.2.3 Prognostic performances in HG 2 ER + cohort
Tab. 13 Summary of prognostic performances for signatures of 3 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.031e-08 2.587e-06 3.391e-05 0.0001489 0.0006856 0.000534 0.003318 0.04123
LR_10y 1.00[?]e-08 [?].01[?]e-06 1.071e-05 6.302e-05 0.0007851 0.0003467 0.003587 0.0623
HR 1.698 2.47 3.1 3.662 3.762 4.309 5.418 8.258
pvalHR 4.7e-07 1.4e-05 7.1e-05 0.00024 0.001412 0.0009625 0.0063 0.074
7.2.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 14 Summary of prognostic performances for signatures of 3 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 3.702e-08 2.607e-05 0.0003526 0.001313 0.003978 0.003913 0.01763 0.08182
LR_10y 8.709e-07 2.724e-05 0.0002363 0.001129 0.007191 0.005198 0.03369 0.2263
HR 1.54 2.111 2.694 3.223 3.294 3.804 4.773 6.638
pvalHR 1.4e-05 0.00013 0.00065 0.0021 0.008603 0.007325 0.038 0.23
7.3 4-gene signatures (10626 combinations)
7.3.1 Agreement HG/GG in ER + cohort
Tab. 15 Summary of agreement performances for signatures of 4 genes on the validation set
agreement
min 0.721
5% 0.783
25% 0.806
median 0.822
mean 0.8196
75% 0.833
95% 0.849
max 0.872
The results show the agreement obtained for said combinations of 3-gene signatures which is hardly as close to the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.3.2 Prognostic performances in HG2
Tab. 16 Summary of prognostic performances for signatures of 4 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 8.001e-10 3.567e-07 5.839e-06 3.443e-05 0.0003346 0.0002005 0.001508 0.04934
LR_10y 3.411e-11 2.6e-08 5.[?]72e-07 3.567e-06 4.834e-05 2.157e-05 0.0002002 0.01221
HR 1.948 2.818 3.33 3.785 3.895 4.339 5.397 7.875
pvalHR 1.8e-08 1e-06 7.4e-06 3e-05 0.0001526 0.00011 0.00063 0.02
7.3.3 Prognostic performances in HG 2 ER + cohort
Tab. 17 Summary of prognostic performances for signatures of 4 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 8.39e-09 1.803e-06 2.1e-05 8.772e-05 0.0003789 0.0003381 0.001701 0.01695
LR_10y 7.169e-10 5.312e-07 5.737e-06 3.402e-05 0.0003212 0.0001823 0.001502 0.04176
HR 1.794 2.664 3.243 3.774 3.906 4.44 5.575 8.943
pvalHR 1.3e-07 8.4e-06 4.3e-05 0.00015 0.0007054 0.00057 0.0031 0.051
7.3.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 18 Summary of prognostic performances for signatures of 4 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 3.04e-08 1.699e-05 0.0002314 0.0008436 0.002519 0.002612 0.01048 0.08225
LR_10y 7.744e-08 1.665e-05 0.0001519 0.0007017 0.004038 0.003[?]17 0.01925 0.2338
HR 1.501 2.279 2.834 3.333 3.424 3.904 4.889 7.634
pvalHR 3.9e-06 9.7e-05 0.00045 0.0014 0.005142 0.0047 0.023 0.24
7.4 5-gene signatures (42504 combinations)
7.4.1 Agreement HG/GG in ER + cohort
Tab. 19 Summary of agreement performances for signatures of 5 genes on the validation set
agreement
min 0.736
5% 0.798
25% 0.818
median 0.829
mean [illegible]
75% 0.837
95% 0.853
max 0.88
The results show the agreement obtained for said combinations of 5-gene signatures which is equivalent to the agreement obtained for 97-gene signature from the state of the art and higher than the agreement obtained for the 4-gene signature from the state of the art (see comparison with the state of the art below).
7.4.2 Prognostic performances in HG2
Tab. 20 Summary of prognostic performances for signatures of 5 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 3.143e-10 3.112e-07 4.222e-06 2.372e-05 0.0001983 0.0001268 0.0009232 0.05561
LR_10y 1.052e-11 1.667e-08 2.756e-07 1.815e-06 2.318e-05 1.066e-05 9.764e-05 0.008731
HR 1.976 2.927 3.439 3.877 3.989 4.42 5.449 9.82
pvalHR 9.2e-09 7.4e-07 4.7e-06 1.8e-05 8.748e-05 6.8e-05 0.00037 0.016
7.4.3 Prognostic performances in HG 2 ER + cohort
Tab. 21 Summary of prognostic performances for signatures of 5 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 2.28e-09 1.41e-06 1.455e-05 6.109e-05 0.0002457 0.0002232 0.001094 0.01562
LR_10y 4.615e-10 3.686e-07 3.787e-06 1.989e-05 0.0001711 0.0001009 0.0007762 0.04408
HR 1.798 2.803 3.364 3.869 3.996 4.497 5.612 9.682
pvalHR 8.1e-08 6.5e-06 3.1e-05 0.0001 0.0004294 0.00035 0.0019 0.061
7.4.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 22 Summary of prognostic performances for signatures of 5 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.736e-08 1.182e-05 0.0001498 0.0005943 0.001744 0.001838 0.007169 0.07167
LR_10y 3.315e-08 1.249e-05 0.0001051 0.0004576 0.002577 0.001941 0.01182 0.259
HR 1.474 2.4 2.953 3.432 3.518 3.984 4.942 7.806
pvalHR 1.9e-06 7.6e-05 0.00033 0.00099 0.003454 0.0031 0.015 0.26
7.5 6-gene signatures (sample of 100 000 combinations)
7.5.1 Agreement HG/GG in ER + cohort
Tab. 23 Summary of agreement performances for signatures of 6 genes on the validation set
agreement
min 0.744
5% 0.802
25% 0.822
median 0.833
mean 0.8317
75% 0.841
[95% row: Figure imgf000067_0001]
max 0.884
The results show the agreement obtained for said combinations of 6-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.5.2 Prognostic performances in HG2
Tab. 24 Summary of prognostic performances for signatures of 6 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 2.604e-10 2.773e-07 3.259e-06 1.718e-05 0.0001285 8.698e-05 0.0006083 0.02082
LR_10y 3.375e-12 1.244e-08 1.757e-07 1.065e-06 1.297e-05 6.018e-06 5.477e-05 0.003464
HR 2.148 3.019 3.521 3.951 4.05 4.469 5.434 8.843
pvalHR 7.8e-09 5.9e-0[?] 3.3e-06 1.2e-05 5.7e-05 4.4e-05 0.00024 0.0087
7.5.3 Prognostic performances in HG 2 ER + cohort
Tab. 25 Summary of prognostic performances for signatures of 6 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 2.647e-09 1.142e-06 1.032e-05 4.404e-05 0.0001722 0.0001596 0.0007583 0.01056
LR_10y 3.531e-10 2.872e-07 2.682e-06 1.326e-05 0.00010[?] 6.44e-05 0.0004714 0.01988
HR 1.999 2.902 3.45 3.944 4.057 4.54 5.583 9.934
pvalHR 1.4e-07 5.3e-0[?] 2.4e-05 7.5e-05 0.0002914 0.00025 0.0013 0.029
7.5.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 26 Summary of prognostic performances for signatures of 6 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 2.788e-09 8.664e-06 0.0001032 0.0004307 0.001268 0.001383 0.005275 0.09351
LR_10y 3.79e-08 9.811e-06 7.702e-05 0.0003233 0.001767 0.001336 0.008125 0.1449
HR 1.647 2.503 3.044 3.511 3.59 4.037 4.954 8.061
pvalHR 1.7e-06 6.4e-05 0.00026 0.00075 0.002484 0.0023 0.011 0.15
7.6 7-gene signatures (sample of 100 000 combinations)
7.6.1 Agreement HG/GG in ER + cohort
Tab. 27 Summary of agreement performances for signatures of 7 genes on the validation set
agreement
min 0.756
5% 0.81
25% 0.826
median 0.837
mean 0.8357
75% 0.845
95% 0.86
max 0.884
The results show the agreement obtained for said combinations of 7-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.6.2 Prognostic performances in HG2
Tab. 28 Summary of prognostic performances for signatures of 7 genes in HG2 cohort
min % 25% mcdianc me n 75% 95% max
LR_5y 1.297e-0< ) 2.453e-07 2.616c-06 1.336 -05 9.1120-05 6.5290-05 0.0004241 0.01309
LR_10Y S.S73o-i: > 9.997c-09 1.193e-07 6.846c-07 .079o-06 3.7e-06 3.355c-05 0.002922
HR 2493 3.097 3.597 4.015 4.098 4.504 5.401 7.991 pvalHR 8.4ο- 09 4.9C-07 2.0C-06 S.7e-06 3.982C-05 3e-05 0.00017 0.0073
7.6.3 Prognostic performances in HG 2 ER + cohort
Tab. 29 Summary of prognostic performances for signatures of 7 genes in HG2 and ER positive cohort
min 5% 25% lnediane mean 75% 95% max
LR_5y 4.915e-0D 9.325 -07 7.873e4)6 3.279(s05 0.0001275 0.0001201 0.0005693 0.008935
LR Ov 2.3S6O-10 2.36GC-07 1.9710-06 9.284C-06 6.S32e-05 4.281e-05 0.0003052 D mes;
HR 2.091 2.995 3.538 4.01 4. 109 4.571 5.552 10.12 pvalHR 1.5. -07 4.4C-06 l .Se-05 5.6<>05 0.0002089 0.00018 0.00088 0.021
7.6.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 30 Summary of prognostic performances for signatures of 7 genes in HG2 and ER positive and LN negative cohort min 5% 25% mcdianc mean 75% 95% max
LR_5y 1.736C-0S 7,lS9c-06 7.459e-05 0.0003121 0.0009529 0.001059 0.004013 0.07206
LR_10v 5.296e-09 8.129e-06 5.875e-05 0.000236S 0.001265 0.0009432 0.005644 0.1191
HR 1.711 2.593 3.135 3,585 3.654 4.092 4.953 8.854 pvalHR i . k-iiii 5.4C-05 0.00021 0.00058 0.001857 0.0017 0.0077 0.12
7.7 8-gene signatures (sample of 100 000 combinations)
7.7.1 Agreement HG/GG in ER + cohort
Tab. 31 Summary of agreement performances for signatures of 8 genes on the validation set
agreement
min 0.771
5% 0.814
25% 0.829
median 0.841
mean 0.8389
75% 0.849
95% 0.864
max 0.888
The results show the agreement obtained for said combinations of 8-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.7.2 Prognostic performances in HG2
Tab. 32 Summary of prognostic performances for signatures of 8 genes in HG2 cohort
min 5% 25% mediane mean 75% 95% max
LILOv 3.241C-10 2.1080-07 2.134e-Q6 1.073. -05 6.9 8e-05 5.033C-05 0.0003217 0.008304
LR_l0 2.955C-12 7.969O-09 S.425O-08 4.601C-07 5.411C-06 2.407O-06 2. l88e-05 0.002714
HR 2.223 3.165 3.663 4.072 4.142 4.541 5.359 8.325 pvalHR 3.9c-09 4.1c-Q7 2e-06 6.4C-06 2.922c-05 2.2P-05 0.00012 0.0063
7.7.3 Prognostic performances in HG 2 ER + cohort
Tab. 33 Summary of prognostic performances for signatures of 8 genes in HG2 and ER positive cohort
min 5% 25% mcdianc mean 75% 95% max
1 1 ;_.".·. 4.4960-09 7.671e-07 6.101C-06 2.535e-05 9.893C-05 9.433C-05 0.0004314 0.006413
LR_10v 1.1860-10 1.944e-07 1.492C-06 6.536e~06 4.753c-05 2.971.C-05 0.0002054 0.01313
HR. 2.079 3.076 3.618 4.08 4.16 4.606 5.509 8.99 pvalHR 4.2c-08 3.8e-()6 1.5C-05 4.3c-()5 0.0001563 0.00014 0.00064 0.024
7.7.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 34 Summary of prognostic performances for signatures of 8 genes in HG2 and ER positive and LN negative cohort miri 5% 25% mcdianc mean 75% 95% max
LR_5y 1.699e-0S 5.532e-06 5.654e-05 0.0002445 0.0007486 0.0008502 0.003127 0.03634
LRJOv 1.643O-0S 6.833e-06 4.617e-05 0.0001779 0.0009189 0.0006809 0.004116 0.1171
HB 1.704 2.675 3.21.9 3.655 3.713 4.139 4.951 7.717 pvalHR lc-06 4.7e-05 0.00017 0.00047 0.001416 0.0013 0.0058 0.12
7.8 9-gene signatures (sample of 100 000 combinations)
7.8.1 Agreement HG/GG in ER + cohort
Tab. 35 Summary of agreement performances for signatures of 9 genes on the validation set
agreement
min 0.771
5% 0.818
25% 0.833
median 0.841
mean 0.8418
75% 0.853
95% [illegible]
max 0.888
The results show the agreement obtained for said combinations of 9-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.8.2 Prognostic performances in HG2
Tab. 36 Summary of prognostic performances for signatures of 9 genes in HG2 cohort
min 5% 25% mcdianc mean 75% 95% max
LR_5y 2.913(3-10 l.S92e-07 1.786&-06 8.746c-06 5.376e-05 3.996e-05 0.0002524 0.004713
LRJOv 1.1370-11 6.527C-09 6.275C-08 3.242e-07 3.603c-06 1.646e-06 1.4870-05 0.001158
H.R 2.34 3.239 3.727 4.121 4.18 4.569 5.312 8.54 pvalHR 8.2e-09 3.6O-07 1.6c~06 50-06 2.149<i-05 1.60-05 9o-05 0.0036
7.8.3 Prognostic performances in HG 2 ER + cohort
Tab. 37 Summary of prognostic performances for signatures of 9 genes in HG2 and ER positive cohort
irriii 5% 25% mcdiane mcati 75% 95% max
LR_5y 4.438o-09 6.41.2O-07 4.863c-06 1.97e-05 7.739e-05 7.2060-05 0.0003492 0.004807 l.iL luv 1.290-09 1.644O-07 1.147C-06 4.782e-06 3.342C-05 2.103 -05 0.0001482 0.00-1855
HR 2.277 3.158 3.695 4.141 4.207 4.64 5.472 8.63 pvalHR 1. I 3.3e-06 .20-05 3.4O-05 0.0001 181 0.0001 0.00049 0.0095
7.8.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 38 Summary of prognostic performances for signatures of 9 genes in HG2 and ER positive and LN negative cohort min 5% 25% medianc mean 75% 95% max
LR_5v 2.162O-08 4.8Sle-06 4.175O-05 0.0001909 0.0005911 0.0006623 0.002499 0.02308
LR_10v 4.035C-08 5.71e-06 3.713c-05 0.0001358 0.0006806 0.0005052 0.003002 0.07015
HR 1.845 2.761 3.299 3.719 3.77 4.183 4.957 7.582 pvalHR 2.1C-06 4. lc-05 0.00015 0.00038 0.0011 0.001 0.0045 0.075
7.9 10-gene signatures (sample of 100 000 combinations)
7.9.1 Agreement HG/GG in ER + cohort
Tab. 39 Summary of agreement performances for signatures of 10 genes on the validation set
agreement
min 0.783
5% 0.818
25% 0.833
median 0.845
mean 0.8443
75% 0.857
95% 0.868
max 0.888
The results show the agreement obtained for said combinations of 10-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.9.2 Prognostic performances in HG2
Tab. 40 Summary of prognostic performances for signatures of 10 genes in HG2 cohort
min 5% 25% medianc mean 75% 95%;. max
LR_5y 1.815C-10 1.68C-07 1.524e-06 7.136C-06 4.314C-05 3.246e-05 0.0002038 0.003779
LRJ Oy 8.802C-12 5.4C-09 4.828e-08 2.3720-07 2.6C-06 L 161.C-06 1.029C-05 0.0008901
HR 2.434 3.303 3.784 4.167 4.216 4.595 5.287 7.803 pvalHR 5.2e-09 3.1e-07 1.3c-06 3.ik--06 1.662e-05 1.3o05 6.8C-05 0.0028
7.9.3 Prognostic performances in HG 2 ER + cohort
Tab. 41 Summary of prognostic performances for signatures of 10 genes in HG2 and ER positive cohort
min 5%; 25% medianc mean 75%, 95% max
LR_5y 3.992o-09 5.275O-07 4.002( 06 I .509o05 6.231C-05 5.766e-05 0.0002841 0.003852
LR_10v 1.209e-10 1.335C-07 9.015C-07 3.602C-06 2.475C-05 1.5240-05 0.0001093 0.004072
HR 2.346 3.228 3.769 4.202 4.254 4.678 5.447 8.695 pvalHR 3.4c-08 2.8c-06 lc-05 2.7 c- 05 9.298C-05 8. lc-05 0.00039 0.0086
7.9.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 42 Summary of prognostic performances for signatures of 10 genes in HG2 and ER positive and LN negative cohort nvin 5% 25% mediane mean τ <0 (7><; 95% max
LR_5y 1.183e-0S 3.776e-06 3.107.--05 0.0001482 0.0004789 0.0005171 0.002031 0.01896
LR_10v 1.439O-08 4.638e-06 2.962C-05 0.0001055 0.00051 1 0.0003839 0.002282 0.05055
HR 1.942 2.835 3.368 3.78 3.827 4.236 4.971 7.448 pvalHR 7.9e-07 3.5e~05 0.00012 0.00031 0.0008714 0.00083 0.0035 0.055
7.10 11-gene signatures (sample of 100 000 combinations)
7.10.1 Agreement HG/GG in ER + cohort
Tab. 43 Summary of agreement performances for signatures of 11 genes on the validation set
agreement
min 0.787
5% 0.822
25% 0.837
median 0.849
mean 0.8465
75% 0.857
[95% row: Figure imgf000072_0001]
max 0.888
The results show the agreement obtained for said combinations of 11-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.10.2 Prognostic performances in HG2
Tab. 44 Summary of prognostic performances for signatures of 11 genes in HG2 cohort
mill 5% 25% mediane mean 75% 95% max
LR_5y 1.414o09 i..511e-07 1.329e-06 6.038C-06 3.575C-05 2.737 -05 0.0001701 0.004037
LRJOy 2.026e-ll 4.362<3-09 3.698e-0S 1.74 1.-07 1.857e-06 8.336e-07 7.2S8e-06 0.0005706
HR 2.407 3.37 3.844 4.219 4.261 4.63 5.2S9 7.813 pvalHR 1.20-08 2.7C-07 l . le-06 3.2c-06 1.278C-05 9.8e-06 5.1c-05 0.002
7.10.3 Prognostic performances in HG 2 ER + cohort
Tab. 45 Summary of prognostic performances for signatures of 11 genes in HG2 and ER positive cohort
min 5% 25% mediane mean 75% 95 max
LR_5y 3.992e-09 4. 1-07 3.065^06 1.284C-05 5.122O-0S 4.467c-05 0.0002406 0.003794
LR_10v 3.39< -10 1.061C-07 7.055£:-07 2.742C-06 1.799C-05 1 , 1.04ί;-05 7.835c-05 0.003722
HR 2.369 3.311 3.849 4.266 4.315 4.73 5.462 8.669 pvalHR 5.9O-08 2.4e-06 8.5e-06 2.2e-05 7.2240-05 6.3e-05 0.0003 0.0076
7.10.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 46 Summary of prognostic performances for signatures of 11 genes in HG2 and ER positive and LN negative cohort min 5% 25% medianc mean 75% 95% max
LI y 3.956e-08 3.267G-06 2.369c-05 0.0001132 0.0003925 0.0004152 0.001721 0.01601
LR_10y 1.543e-QS 3,61Se-06 2.3S)2e-05 S.237e-Q5 0.0003784 0.000288 0.00165 0.04764
HR 1.986 2.924 3.44S 3.852 3.896 4.299 5.017 7.346 pvalHR 9.9O-07 2.9o05 0.0001 0.00026 0.0006792 0.00066 0.0027 0.052
7.11 12-gene signatures (sample of 100 000 combinations)
7.11.1 Agreement HG/GG in ER + cohort
Tab. 47 Summary of agreement performances for signatures of 12 genes on the validation set
agreement
min 0.787
5% 0.826
25% 0.841
median 0.849
mean 0.8485
75% 0.857
95% 0.868
max 0.891
The results show the agreement obtained for said combinations of 12-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.11.2 Prognostic performances in HG2
Tab. 48 Summary of prognostic performances for signatures of 12 genes in HG2 cohort
min 5% 25% modiano mean 75% 95% max
LR_5y 1.692o-09 1.460-07 1.1550-06 5.043e-06 2.S65C-05 2.17O-05 0.0001336 0.004911 l.l i m 1.104.0-11 3.589e-09 2.929e-08 1.322^-07 1..322e-06 6.061 e-07 5.026e-06 0.0007058
HR 2.536 3.433 3.901 4.263 4.301 4.663 5.29 7.66 pvalHR 7.9c-09 2.3C-07 9.2c-07 2.6c-06 9.817c-06 7.7e-06 3.9C-05 0.0021
7.11.3 Prognostic performances in HG 2 ER + cohort
Tab. 49 Summary of prognostic performances for signatures of 12 genes in HG2 and ER positive cohort
min 5% 25% modiano moan 75% 95% max l .IL5v 3.307(^-09 3.6660-O: < 2.492C-06 9.922C-06 4..061C-05 3.4246-05 0.000191 S 0.00361.
LR_10v 1.5020-10 8.355e-0; 5.545C-07 2. 1 12e-06 1. 010-05 8. 108C-06 5.607c-05 0.003521
HR 2.349 3.388 3.921 4.326 4.37 4.777 5.476 8.173 pvalHR 5.2e-08 2O-06 7.1C-G6 l.Sc-05 5,6120-05 Sc-05 0.00023 0.0071
7.11.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 50 Summary of prognostic performances for signatures of 12 genes in HG2 and ER positive and LN negative cohort min 5% 25% iiicdianc mean 75% 95% max
LR_5 2.162e-08 2.641C-06 2,063e-05 8.354e-05 0.000317 0.0003095 0.001424 0.01496
LRJLOv l..S34e-03 2.82e-06 1.8240-05 6.494O-05 0.000281 0.0002206 0.001.21 0.05826
HR 1.914 3.009 3.519 3.914 3.96 4.36 5.053 7.021 pvalHR Llo-06 2.5e-05 8.6C-05 0.00021 0.0005351 0.00053 0.0021 0.063
7.12 13-gene signatures (sample of 100 000 combinations)
7.12.1 Agreement HG/GG in ER + cohort
Tab. 51 Summary of agreement performances for signatures of 13 genes on the validation set
agreement
min 0.798
5% 0.826
25% 0.841
median 0.853
mean 0.8504
75% 0.86
95% 0.872
max 0.891
The results show the agreement obtained for said combinations of 13-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.12.2 Prognostic performances in HG2
Tab. 52 Summary of prognostic performances for signatures of 13 genes in HG2 cohort
min 5% 25% mcdiane mean 75% 95% max
LR-5y 9.926C-10 1.302C-07 1.0060-06 4. 133C-06 2.3880-0» 1.796C-05 0.0001139 0.002666
LlUOy 7.946e-12 2.7536-09 2.297C-08 1.0140-07 9.583e-07 4.501O-07 3.673C-06 0.000343
HR 2.598 3.499 3.957 4.3 1 4.347 4.699 5.314 7.09S pvalHR 4.8c-09 2e-07 7.8e-07 2.1.C-06 7.71 -'-<>*·. C.2 -06 3.--05 0.001.1
7.12.3 Prognostic performances in HG 2 ER + cohort
Tab. 53 Summary of prognostic performances for signatures of 13 genes in HG2 and ER positive cohort
min 5% 25% medianc moan 75% 95%: max
LR_5 5.3210-09 3.29c-07 2.132 -06 7.554o-06 3.351, c-05 2.964O-05 0.0001521 0.0022 l.iu m 9.049c- 11 6.326< 0S 4.342C-07 I .6 I0-O6 9.748 p-06 6.IO80-O6 4.092C-05 0.002715
HR 2.405 3.467 3.995 4.392 4.434 4.841 5.516 7.538 pvalHR 3.2o-Os 1.7O-06 6c-06 1.5o-05 4.473 e-05 1. ·-!!.-, 0.00018 0.0058
7.12.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 54 Summary of prognostic performances for signatures of 13 genes in HG2 and ER positive and LN negative cohort inin 5% 25% mediaoe mean 75% 95% max
LR_5y 2.162e-0S 2,154e-06 l,499e-05 6.529O-05 0.000261 0.0002753 0.001207 0.009811
LR-lOv 8.054c-09 2.144e-06 1.393. - 5.055e-05 0.0002123 0.0001688 0.O00S972 0.04171
HR 2.007 3.091 3.595 3.986 4.034 4.434 5.128 6.983 pvalHR 6.Se-07 20-05 7.1O-05 0.00018 0.0004275 0.00043 0.0016 0.046
7.13 14-gene signatures (sample of 100 000 combinations)
7.13.1 Agreement HG/GG in ER + cohort
Tab. 55 Summary of agreement performances for signatures of 14 genes on the validation set
agreement
min 0.802
5% 0.829
25% 0.845
median 0.853
mean 0.8523
75% 0.86
95% 0.872
max 0.891
The results show the agreement obtained for said combinations of 14-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.13.2 Prognostic performances in HG2
Tab. 56 Summary of prognostic performances for signatures of 14 genes in HG2 cohort
min 5%; 25% mcdianc mean 75% 95% max
LR_5y 3.278e-09 1.314C-07 8.942e-07 3.581c-06 1.946C-05 1.461e-05 8.851e-05 0.001678
LR_10y 1.723e-l l 2.2S3C-09 1.85SC-08 7.991e-QS 6.715C-07 3.41 1e-07 2.556C-06 0.000171
HR 2.656 3.563 4.011 4.353 4.389 4.733 5.334 7.758 pvalHR 8.2e-09 l .Sc-07 6.7C-07 l .Sc-OS 5.95e-06 5· -06 2.3e-05 0.00067
7.13.3 Prognostic performances in HG 2 ER + cohort
Tab. 57 Summary of prognostic performances for signatures of 14 genes in HG2 and ER positive cohort
min 5% 25% mcdianc mean 75% 95% max
LR_5y 3.472C-09 2.963C-07 1.6 3e-06 6. I I80O6 2.684e-05 2.357e-05 0.0001209 0.002094
LR_10v 3.39e-10 5.014C-08 3.486C-07 1.278C-06 6.988e-06 4.605e-()6 2,87e-05 0.001189
HR 2.554 3.555 4.064 4.454 4.496 4.889 5.561 8. 122 pvalHR 5.5e-08 1.5e-06 5.2e-06 1.3C-05 3.486o05 3.2o05 0.00013 0.0029
7.13.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 58 Summary of prognostic performances for signatures of 14 genes in HG2 and ER positive and LN negative cohort min 5% 25% mediaue mean 75% 95% max
LR_5y Ί.455Ο-08 2.154O-06 1.4 HC-05 4.725O-05 0.0002102 0.0002068 0.0009442 0.01047
Ι.[ϊ_1θν 1.495O-0S 1.5S2e-06 1.115C-05 3.948e-05 0.0001586 0.0001295 0.0006688 0.02201
HR 2.165 3.174 3.67 4.057 4.106 4.495 5.202 6.795 pvalHR 9.7c-07 1.70-05 6e-05 0.00015 0.0003401 0.00035 0.0013 0.025
7.14 15-gene signatures (sample of 100 000 combinations)
7.14.1 Agreement HG/GG in ER + cohort
Tab. 59 Summary of agreement performances for signatures of 15 genes on the validation set
agreement
min 0.806
5% 0.829
25% 0.845
median 0.857
mean 0.8541
75% 0.864
95% 0.872
max 0.891
The results show the agreement obtained for said combinations of 15-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.14.2 Prognostic performances in HG2
Tab. 60 Summary of prognostic performances for signatures of 15 genes in HG2 cohort
min 5% 25% mediaue mean Y5% 95% max
LR_5y 2. l< !:.i< -HO 1.326e-07 8.637c-07 3.0370-06 1.561c-05 1.212O-05 7.06c-05 0.001804
LlLlOv 1.435c- 11 1.86C-09 1.485C-08 6.3340-08 4.68e-Q7 2.608C-07 1.786e-06 0.0001298
HR 2.773 3.632 4.058 4.396 4.435 4.772 5.362 7.15 pvatHR 9.1e-09 1.60-07 5.7e-07 1 .5O-06 4.592C-06 4Je-06 1.70-05 0.0006
7.14.3 Prognostic performances in HG 2 ER + cohort
Tab. 61 Summary of prognostic performances for signatures of 15 genes in HG2 and ER positive cohort
min 5% 25% ii -diaiie mean 75%, 95% ma
L.R_oy 8.852e~09 2.963O-07 1.523e-f D6 4.586e-06 2.116e-05 1.808c-05 9.495e-05 0.001866
LR_10v 3.037c- 10 3.857^08 2.658e-( 37 9.7790-07 4.986e-06 3.513e-06 2.004O-05 0.000729
HR 2.678 3.645 4.1.34 4.523 4.565 4.96 5.621 7.424 pvalHR 6.5O-0S I ::.-! !(; 4.3 -06 le-05 2.7086-05 2.7.-05 0.000.1 0.0017
7.14.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 62 Summary of prognostic performances for signatures of 15 genes in HG2 and ER positive and LN negative cohort miii 5% 25% media lie mean 75% 95% max
LR_5y 4.885o0S 1.692o06 1.0456-05 3.717o05 0.0001659 0.0001538 0.0007248 0.009902
LILlOy 1.834e-0S 1.172O-06 S.324e-06 3.021O-05 0.0001157 9.89e-05 0.0004846 0.01245
HR 2.315 3.26 3,745 4.134 4.186 4.579 5.285 6.795 pvalH l.le-06 1.4e-05 4.9o05 0.00012 0.0002669 0.00029 0.00099 0.015
7.15 16-gene signatures (sample of 100 000 combinations)
7.15.1 Agreement HG/GG in ER + cohort
Tab. 63 Summary of agreement performances for signatures of 16 genes on the validation set
agreement
min 0.81
5% 0.833
25% 0.849
median 0.857
mean 0.8561
75% [illegible]
95% 0.872
max 0.888
The results show the agreement obtained for said combinations of 16-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.15.2 Prognostic performances in HG2
Tab. 64 Summary of prognostic performances for signatures of 16 genes in HG2 cohort
min 5%) 25% mcdiane mean 75% 95% max
LR_5y 3.27SC-09 1.460-Q7 7.729o07 2.655O-06 1.254o05 9.5S1.C-06 5.608e-05 0.001331
LRJOy 1.9470-11 1.605o09 I.I880O8 4.97 .0O8 3.284e-07 I.99I0O7 1.26C-06 9.297C-05
HR 2.853 3.706 4.112 4.442 4.483 4.816 5.391. 6.915 pvalHR 9.9c-09 1.4C-07 5e-07 .1.30-06 3.572e-06 3.4e-06 1.3C-05 0.00043
7.15.3 Prognostic performances in HG 2 ER + cohort
Tab. 65 Summary of prognostic performances for signatures of 16 genes in HG2 and ER positive cohort
min 5% 25% mediaue meat 1 i 95% max
LR_5y 1.38 e-O β 2.963οι D7 1.1.99o06 4.075O06 1.67? ie-05 1.37.1.0-05 7.557e-05 0.001351
LR_10v 2.898c;- 1 0 3.073oi D8 2.085o07 7.632o07 3.58c :-06 2.647C-06 1.4 C-05 0.0004868
H 2.764 3.733 4.211 4.593 4.631 ' 5.024 5.692 7.16 pvalHR 6.5C-08 L le-06 3.7C-06 8.8e-06 2.12; fe~05 2.2c-05 7.8e-05 0.0014
7.15.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 66 Summary of prognostic performances for signatures of 16 genes in HG2 and ER positive and LN negative cohort mill y Ze25% mediane mean 75% 95% max
LR_5y 6.432e-0i is 1.662e-06 9.514e-06 2.974C-0S 0.0001307 0.0001132 0.0005512 0.00682
LfLi Ov 1.077e-0i S <S.912e-07 6.484e-06 2.275.-0."» 8.587e-05 7.55Se-05 0.0003549 0.01245
HR 2.315 3.351 3.827 4.217 4.268 4.657 5.377 6.839 pvalHR Se-07 l.le-05 4.1c-05 9.Sc-05 0.0002122 0.00023 0.00077 0.015
7.16 17-gene signatures (sample of 100 000 combinations)
7.16.1 Agreement HG/GG in ER + cohort
Tab. 67 Summary of agreement performances for signatures of 17 genes on the validation set
agreement
min 0.806
5% 0.837
25% 0.853
median 0.86
mean 0.858
75% 0.868
95% 0.876
max 0.891
The results show the agreement obtained for said combinations of 17-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.16.2 Prognostic performances in HG2
Tab. 68 Summary of prognostic performances for signatures of 17 genes in HG2 cohort
min 5% 25%; mediane mean 75% 95% OliLX
LR_5y 5.963e-09 1.4930-07 7.514C-07 2.289e-06 9.828G-06 7.983C-0 € 4.219c-05 0.00132
LRJ Oy 2.236e-l 1 1.437e-09 9.744e-09 3.9Q6e-0S 2.239e-07 1.515e-0 7 S.623e-07 0.0001043
HR 2.802 3.776 4.165 4.491 4.527 4.848 5.406 6.929 pvalHR l. le-08 1.3e-07 4.4e~07 i .le-06 2.749C-06 2.8e-06 10-05 0.00041
7.16.3 Prognostic performances in HG 2 ER + cohort
Tab. 69 Summary of prognostic performances for signatures of 17 genes in HG2 and ER positive cohort
min 5% 25 mediane mean vS .% 95%; ni ax
LR_5y 2.43C-08 2.963C-07 1.089C-06 3.258C-06 1.298C-05 U i)39c-05 5.807C-05 0.001428
LR_10v 4.802C-10 2.675e-08 1.63c-07 5.816c-07 2.504e-06 2.1 ()02e-06 1.004c-05 0.0007496
HR 2.73 3.826 4.285 4.668 4.707 5J 086 5.721 7.217 pvalHR 8.4e-08 9.2e-07 3e-06 7.3e-06 1 .6 ίΓ.. -05 .i Sc-05 5.9. -05 0.0(519
7.16.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 70 Summary of prognostic performances for signatures of 17 genes in HG2 and ER positive and LN negative cohort min 5% 25% medianc mean 75% 95%, max
LR_5y 1.206e-07 1.662e-06 7.21C-06 2.291e-05 0.0001004 8.25e-05 0.0004749 0.007169
LIUOv 1.3S4e-Q8 7.9G4G-07 4.865e-Q6 1.734e-05 6.242e-05 5.6S lc-05 0.0002575 0.006493
HR 2.502 3.442 3.909 4.298 4.346 4.734 5.419 6.666 pvalHR 7.7e-07 lc-05 3.4e-05 S. lc-05 0.0001673 0.00019 0.00059 0.0085
7.17 18-gene signatures (sample of 100 000 combinations)
7.17.1 Agreement HG/GG in ER + cohort
Tab. 71 Summary of agreement performances for signatures of 18 genes on the validation set
agreement
min 0.81
5% 0.841
25% 0.853
median 0.86
mean 0.8602
[75% row: Figure imgf000079_0001]
95% 0.876
max 0.891
The results show the agreement obtained for said combinations of 18-gene signatures which is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.17.2 Prognostic performances in HG2
Tab. 72 Summary of prognostic performances for signatures of 18 genes in HG2 cohort
min 5% 25% mediane mean 75% 95% max
LR_5y ?.107e-09 1.707e-07 7.729e-07 2.224e-06 7.703e-0? 6.395e-06 3.262e-05 0.0007869
LR_10y 3.579e-11 1.2?e-09 7.893e-09 3.072e-08 1.556e-07 1.153e-07 6.301e-07 5.127e-05
HR 2.968 3.839 4.222 4.54 4.573 4.88? 5.431 6.799
pvalHR 1.?e-08 1.2e-07 3.8e-07 9.2e-07 2.155e-0? [illegible] 7.8e-06 0.00028
7.17.3 Prognostic performances in HG 2 ER + cohort
Tab. 73 Summary of prognostic performances for signatures of 18 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 3.191e-08 2.963e-07 1.079e-06 2.959e-06 9.962e-06 7.615e-06 4.434e-05 0.0007803
LR_10y 8.06e-10 2.281e-08 1.803e-07 4.401e-07 1.785e-06 1.484e-06 7.307e-06 0.0002831
HR 2.856 3.906 4.371 4.752 4.781 5.152 5.783 7.152
pvalHR 8.9e-08 8.6e-07 2.7e-06 6e-06 1.292e-0? 1.4e-05 4.6e-05 0.00083
7.17.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 74 Summary of prognostic performances for signatures of 18 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.206e-07 1.662e-06 7.189e-06 2.063e-05 7.645e-05 5.943e-05 0.0003631 0.005881
LR_10y 1.834e-08 6.134e-07 3.9?1e-06 1.289e-05 4.59e-05 4.344e-05 0.0001906 0.003375
HR 2.673 3.523 3.993 4.381 4.429 4.821 5.498 6.666
pvalHR 1.1e-06 8.5e-06 2.8e-05 6.5e-05 0.000133 0.00015 0.00047 0.0048
7.18 19-gene signatures (42504 combinations)
7.18.1 Agreement HG/GG in ER + cohort
Tab. 75 Summary of agreement performances for signatures of 19 genes on the validation set
agreement
min 0.818
5% 0.841
25% 0.8??
median 0.864
mean 0.8625
75% 0.872
95% [illegible]
max 0.891
The results show that the agreement obtained for said combinations of 19-gene signatures is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
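As an illustration of the agreement criterion, the sketch below computes the fraction of tumors for which the genomic grade (GG) assigned by a signature matches the histological grade (HG). This reading of "agreement HG/GG" as a simple proportion of concordant calls on HG1/HG3 tumors is an assumption, and the function name and the labels are hypothetical.

```python
def agreement(histological_grades, genomic_grades):
    """Proportion of samples whose genomic grade equals the histological grade."""
    if len(histological_grades) != len(genomic_grades):
        raise ValueError("both grade lists must have the same length")
    if not histological_grades:
        raise ValueError("at least one sample is required")
    matches = sum(
        1 for hg, gg in zip(histological_grades, genomic_grades) if hg == gg
    )
    return matches / len(histological_grades)


# Hypothetical labels: histological grade (1 or 3) and genomic grade (1 or 3) per tumor.
hg = [1, 1, 3, 3, 3, 1, 3, 1]
gg = [1, 1, 3, 1, 3, 1, 3, 3]
print(agreement(hg, gg))  # 6 concordant calls out of 8
```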
7.18.2 Prognostic performances in HG2
Tab. 76 Summary of prognostic performances for signatures of 19 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.224e-08 1.778e-07 8.683e-07 1.956e-06 5.988e-06 5.396e-06 2.508e-05 0.0004701
LR_10y 5.18e-11 1.259e-09 6.662e-09 2.279e-08 1.071e-07 ?.82e-08 4.729e-07 1.842e-05
HR 3.166 3.899 4.278 4.601 4.62 4.918 5.448 6.475
pvalHR 1.9e-08 1.1e-07 3.5e-07 7.5e-07 1.689e-06 1.9e-06 6.2e-06 0.00011
7.18.3 Prognostic performances in HG 2 ER + cohort
Tab. 77 Summary of prognostic performances for signatures of 19 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 4.964e-08 2.963e-07 1.079e-06 2.35?e-06 7.541e-06 6.?18e-06 3.364e-05 0.0006323
LR_10y 1.416e-09 1.93e-08 9.665e-08 3.214e-07 1.265e-06 1.112e-06 5.4?7e-06 0.0001342
HR 3.079 3.983 4.442 4.837 4.857 5.235 5.789 6.713
pvalHR 1.3e-07 8e-07 2.1e-06 4.9e-06 1.017e-05 1.2e-05 3.7e-05 0.0005
7.18.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 78 Summary of prognostic performances for signatures of 19 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.481e-07 1.662e-06 6.313e-06 1.?9?e-05 5.689e-05 4.663e-05 0.0002282 0.003885
LR_10y 2.417e-08 4.764e-07 2.724e-06 9.228e-06 3.385e-05 3.239e-05 0.0001428 0.001916
HR 2.853 3.604 4.074 4.493 4.514 4.899 5.55 6.588
pvalHR 1.3e-06 7.6e-06 2.3e-05 ?.4e-05 0.0001061 0.00013 0.00038 0.0031
7.19 20-gene signatures (10626 combinations)
7.19.1 Agreement HG/GG in ER + cohort
Tab. 79 Summary of agreement performances for signatures of 20 genes on the validation set
agreement
min 0.822
5% 0.845
25% 0.86
median 0.868
mean 0.8649
75% 0.872
95% 0.88
max 0.891
The results show that the agreement obtained for said combinations of 20-gene signatures is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.19.2 Prognostic performances in HG2
Tab. 80 Summary of prognostic performances for signatures of 20 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.714e-08 2.067e-07 8.698e-07 1.956e-06 4.693e-06 4.162e-06 1.903e-05 0.0001992
LR_10y 9.594e-11 1.057e-09 5.601e-09 1.84e-08 7.813e-08 6.451e-08 3.535e-07 5.033e-06
HR 3.238 3.954 4.337 4.677 4.671 4.966 5.476 6.475
pvalHR 2.5e-08 1.1e-07 3e-07 6.3e-07 1.36e-06 1.5e-06 5.2e-06 3.9e-0?
7.19.3 Prognostic performances in HG 2 ER + cohort
Tab. 81 Summary of prognostic performances for signatures of 20 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 6.438e-08 2.963e-07 1.079e-06 2.132e-06 5.718e-06 5.569e-06 2.535e-05 0.00026
LR_10y 1.41e-09 1.615e-08 8.605e-08 2.413e-07 9.309e-07 ?.179e-07 4.285e-06 6.956e-05
HR 3.165 4.062 4.529 4.938 4.937 5.295 5.814 6.786
pvalHR 1.3e-07 6.8e-07 2e-06 4e-06 ?.1?e-06 9e-06 3.1e-05 0.00026
7.19.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 82 Summary of prognostic performances for signatures of 20 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 2.491e-07 1.662e-06 6.313e-06 1.411e-05 4.2?1e-05 3.29e-05 0.0001675 0.001561
LR_10y 1.834e-08 4.201e-07 2.172e-06 6.851e-06 2.55e-05 2.444e-05 0.000106 0.001516
HR 2.913 3.694 4.166 4.608 4.603 4.99 5.55 6.666
pvalHR 1.1e-06 7.6e-06 2e-05 4.2e-05 3.567e-05 0.0001 0.00031 0.0025
7.20 21-gene signatures (2024 combinations)
7.20.1 Agreement HG/GG in ER + cohort
Tab. 83 Summary of agreement performances for signatures of 21 genes on the validation set
agreement
min 0.826
5% 0.849
25% 0.864
median 0.868
mean 0.8676
75% 0.872
95% 0.88
max 0.888
The results show that the agreement obtained for said combinations of 21-gene signatures is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.20.2 Prognostic performances in HG2
Tab. 84 Summary of prognostic performances for signatures of 21 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.831e-08 2.421e-07 1.041e-06 1.956e-06 3.739e-06 3.581e-06 1.249e-05 9.991e-05
LR_10y 9.594e-11 1.057e-09 5.252e-09 1.197e-08 5.574e-08 4.721e-08 2.565e-07 1.452e-06
HR 3.572 4.032 4.419 4.75 4.722 4.971 5.441 6.253
pvalHR 2.9e-08 9.9e-08 2.8e-07 5e-07 1.078e-06 1.2e-06 3.985e-06 1.5e-05
7.20.3 Prognostic performances in HG 2 ER + cohort
Tab. 85 Summary of prognostic performances for signatures of 21 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 9.57e-08 2.963e-07 1.079e-06 2.132e-06 4.476e-0? 4.07e-06 1.486e-05 0.0001265
LR_10y 3.328e-09 1.615e-08 7.136e-08 1.59e-07 6.853e-0? 5.85e-07 3.137e-06 2.182e-05
HR 3.533 4.137 4.638 5.07 5.014 5.295 5.952 6.509
pvalHR 2.5e-07 ?.09e-07 1.8e-06 3e-06 6.61e-06 7.?e-06 2.4e-05 0.00011
7.20.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 86 Summary of prognostic performances for signatures of 21 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 3.436e-07 1.662e-06 7.1?9e-06 1.411e-05 3.202e-05 2.974e-05 0.00011?3 0.0009442
LR_10y 5.21e-08 3.17?e-07 2.163e-06 4.572e-06 1.868e-05 1.583e-05 8.58e-05 0.0007041
HR 3.113 3.787 4.287 4.75 4.6?9 4.99 5.711 6.351
pvalHR 2.1e-06 5.535e-06 1.9e-05 3.1e-05 6.845e-05 7.?e-05 0.00026 0.0013
7.21 22-gene signatures (276 combinations)
7.21.1 Agreement HG/GG in ER + cohort
Tab. 87 Summary of agreement performances for signatures of 22 genes on the validation set
agreement
min 0.841
5% 0.853
median 0.872
mean 0.8705
75% [illegible]
95% 0.88
max 0.884
The results show that the agreement obtained for said combinations of 22-gene signatures is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.21.2 Prognostic performances in HG2
Tab. 88 Summary of prognostic performances for signatures of 22 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 5.804e-08 3.392e-07 1.??1e-06 1.956e-06 3.011e-06 2.75e-06 8.765e-06 3.721e-05
LR_10y 1.749e-10 9.18e-10 4.999e-09 7.605e-09 3.935e-08 2.569e-08 1.547e-07 9.309e-07
HR 3.777 4.131 4.563 4.841 4.783 4.966 5.476 6.072
pvalHR 4.3e-08 9.9e-08 2.8e-07 3.7e-07 8.394e-07 8.675e-07 2.925e-06 1.1e-05
7.21.3 Prognostic performances in HG 2 ER + cohort
Tab. 89 Summary of prognostic performances for signatures of 22 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.408e-07 4.24e-07 1.523e-06 2.132e-06 3.548e-06 2.959e-06 1.113e-05 4.467e-05
LR_10y 6.091e-09 1.615e-08 7.842e-08 1.122e-07 4.955e-07 3.2?8e-07 2.288e-06 7.557e-06
HR 3.832 4.26 4.831 5.201 5.102 5.295 5.952 6.314
pvalHR 3.9e-07 6e-07 1.9e-06 2.35e-06 5.248e-06 4.85e-06 1.925e-05 4.8e-05
7.21.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 90 Summary of prognostic performances for signatures of 22 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 5.588e-07 2.154e-06 9.514e-06 1.411e-05 2.425e-05 2.063e-05 6.959e-05 0.0003095
LR_10y 1.0?1e-0? 3.105e-07 2.172e-06 3.064e-06 1.259e-05 9.789e-06 6.796e-05 0.0002062
HR 3.432 3.8?6 4.493 4.821 4.786 4.99 5.711 6.127
pvalHR 3.2e-06 ?.4e-06 1.975e-05 2.5e-05 5.196e-05 5.575e-05 0.00022 0.00048
7.22 23-gene signatures (24 combinations)
7.22.1 Agreement HG/GG in ER + cohort
Tab. 91 Summary of agreement performances for signatures of 23 genes on the validation set
agreement
min 0.853
25% 0.872
median 0.87?
mean 0.8737
75% [illegible]
95% 0.88
max 0.884
The results show that the agreement obtained for said combinations of 23-gene signatures is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
7.22.2 Prognostic performances in HG2
Tab. 92 Summary of prognostic performances for signatures of 23 genes in HG2 cohort
min 5% 25% median mean 75% 95% max
LR_5y 3.392e-07 [illegible] 1.956e-06 1.956e-06 2.187e-06 2.347e-06 4.591e-06 6.131e-06
LR_10y 9.18e-10 [illegible] 6.662e-09 6.662e-09 2.252e-08 1.124e-08 1.374e-07 1.649e-07
HR 4.041 4.232 4.78 4.906 4.819 4.906 5.182 5.506
pvalHR 9.9e-08 1.46e-07 3.5e-07 3.5e-07 6.133e-07 5e-07 2.335e-06 3.1e-06
7.22.3 Prognostic performances in HG 2 ER + cohort
Tab. 93 Summary of prognostic performances for signatures of 23 genes in HG2 and ER positive cohort
min 5% 25% median mean 75% 95% max
LR_5y 2.963e-07 8.057e-07 2.132e-06 2.132e-06 2.52e-06 2.959e-06 5.407e-06 8.282e-06
LR_10y 1.094e-08 3.193e-08 8.605e-08 8.605e-08 3.143e-07 1.153e-07 1.996e-06 2.304e-06
HR 4.155 4.336 5.124 5.295 5.168 5.295 5.591 6.128
pvalHR 5.8e-07 9.92e-07 2e-06 2e-06 3.913e-0? 3e-06 1.765e-05 1.9e-05
7.22.4 Prognostic performances in HG 2 ER + and LN - cohort
Tab. 94A Summary of prognostic performances for signatures of 23 genes in HG2 and ER positive and LN negative cohort
min 5% 25% median mean 75% 95% max
LR_5y 1.398e-06 6.313e-06 1.411e-05 1.411e-05 1.6?2e-05 2.063e-05 4.089e-05 4.663e-05
LR_10y 2.166e-07 6.972e-07 2.172e-06 2.172e-06 8.335e-06 3.99e-06 3.954e-05 7.436e-05
HR 3.762 4.035 4.786 4.99 4.?55 4.99 5.326 5.914
pvalHR 5e-06 9.17?e-06 2e-05 2e-05 3.973e-05 3e-05 0.000144 0.00023
7.23 24-gene signature (1 combination)
Tab. 94B Summary of performances (agreement HG/GG in the ER+ cohort, and prognostic performances in the HG2, HG2/ER+ and HG2/ER+/LN- cohorts) for the 24-gene signature
Cohort Criterion Value
ER+ Agreement 0.88
HG2 LR_5y 1.956e-06
HG2 LR_10y 6.662e-09
HG2 HR 4.906
HG2 pvalHR 3.5e-07
HG2/ER+ LR_5y 2.132e-06
HG2/ER+ LR_10y 8.605e-08
HG2/ER+ HR 5.295
HG2/ER+ pvalHR 2e-06
HG2/ER+/LN- LR_5y 1.411e-05
HG2/ER+/LN- LR_10y 2.172e-06
HG2/ER+/LN- HR 4.99
HG2/ER+/LN- pvalHR 2e-05
The results show that the agreement obtained for the 24-gene signature is higher than the agreements obtained for the 4-gene and 97-gene signatures from the state of the art (see comparison with the state of the art below).
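The combination counts quoted in the headings of sections 7.18 to 7.23 are consistent with choosing k genes out of a pool of 24 genes. The short check below is a sketch under that assumption (the pool size of 24 is inferred from the single 24-gene combination of section 7.23, and the names are illustrative); it also gives the total number of 18-gene subsets, of which section 7.17 evaluates a sample of 100 000.

```python
from math import comb

POOL_SIZE = 24  # assumed pool: section 7.23 reports a single 24-gene combination

# Combination counts quoted in the section headings, keyed by signature size.
REPORTED_COUNTS = {19: 42504, 20: 10626, 21: 2024, 22: 276, 23: 24, 24: 1}


def count_signatures(size, pool_size=POOL_SIZE):
    """Number of distinct signatures of `size` genes drawn from `pool_size` genes."""
    return comb(pool_size, size)


for size, reported in REPORTED_COUNTS.items():
    status = "matches" if count_signatures(size) == reported else "differs from"
    print(f"{size}-gene: {count_signatures(size)} {status} the reported count")

# Total number of 18-gene subsets, larger than the 100 000 combinations sampled.
print(count_signatures(18))
```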
B. RESULTS OF THE COMBINATIONS OF GENES INCLUDING CX3CR1, ASPM, MCM10 + 0 TO 7 OTHER PARTICULAR GENES
Performance results are presented hereafter for gene combinations that include 3 particular genes (CX3CR1, ASPM, MCM10), to which are added 0 to 7 genes selected from a group consisting of 7 genes (2 upG1 and 5 upG3): CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK. As seen in the following tables, these gene combinations exhibit very good performances, in terms of agreement and prognostic value.
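The family of signatures described above (3 fixed genes plus 0 to 7 genes from a group of 7) can be enumerated as follows. This is only an illustrative sketch: the constant and function names are hypothetical, and the enumeration order is not meant to reflect how the combinations were actually generated.

```python
from itertools import combinations

CORE_GENES = ("ASPM", "CX3CR1", "MCM10")
OPTIONAL_GENES = ("CCNB2", "PTTG1", "FRY", "CCNA2", "CDCA3", "CDC2", "MELK")


def signatures_with_extra(n_extra):
    """All signatures made of the 3 core genes plus `n_extra` optional genes."""
    return [CORE_GENES + extra for extra in combinations(OPTIONAL_GENES, n_extra)]


for n_extra in range(len(OPTIONAL_GENES) + 1):
    signatures = signatures_with_extra(n_extra)
    size = len(CORE_GENES) + n_extra
    print(f"+{n_extra} genes: {len(signatures)} signatures of {size} genes")
```

Each table in sections 7.25 to 7.30 therefore summarizes a small number of signatures (7, 21, 35, 35, 21 and 7), which is worth keeping in mind when reading the percentile rows.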
7.24 ASPM, CX3CR1, MCM10
Tab. 95 Performances of the 3-gene signature ASPM, CX3CR1, MCM10
Cohort agreement LR.5y LR.10y HR HR.inf RR.5y RR.10y
HG 1 & 3 ER+ 0.78 NA NA
HG2 3.7e-05 1.1e-05 3.28 1.82 3.80 2.65
HG2 ER+ 0.00035 0.00038 2.86 1.53 3.61 2.25
HG2 ER+ & LN- 0.0033 0.0025 2.78 1.39 3.37 2.20
7.25 ASPM, CX3CR1, MCM10 + 1 gene
Tab. 96 Summary of agreement performances for signatures of 4 genes (ASPM, CX3CR1, MCM10 +1 gene selected from a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) on the HG1/HG3 validation set
agreement
min 0.787
5% 0.7894
25% 0.7965
median 0.806
mean 0.8044
75% 0.808
95% [illegible]
max 0.829
Tab. 97 Summary of prognostic performances for signatures of 4 genes (ASPM, CX3CR1, MCM10 + 1 gene selected from a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 cohort
min 5% 25% median mean 75% 95% max
LR.5y 1.146e-06 1.673e-06 2.94e-06 ?.253e-06 6.603e-05 0.0001194 0.0001998 0.0002081
LR.10y 2.979e-07 3.85e-07 1.774e-06 5.342e-06 3.327e-05 2.897e-05 0.0001289 0.0001658
HR 2.668 2.833 3.24 3.267 3.485 3.786 4.322 4.408
HR.inf 1.518 1.588 1.783 1.857 1.911 2.061 2.278 2.311
RR.5y 3.368 3.445 3.682 4.419 4.313 4.64 5.426 5.761
RR.10y 2.247 2.371 2.703 2.774 2.839 2.989 3.39 3.47
Tab. 98 Summary of prognostic performances for signatures of 4 genes (ASPM, CX3CR1, MCM10 + 1 gene selected from a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + cohort
min 5% 25% median mean 75% 95% max
LR.5y 1.103e-05 1.145e-05 2.046e-05 7.259e-05 0.0004672 0.0003913 0.001856 0.002364
LR.10y 1.078e-05 1.713e-05 7.558e-05 0.0001361 0.0004892 0.0006319 0.001506 0.001862
HR 2.624 2.657 2.753 3.034 3.115 3.324 3.876 3.993
HR.inf 1.37 1.4 1.474 1.634 1.635 1.744 1.958 2.007
RR.5y 3.271 3.332 3.846 4.55 4.386 4.704 5.5 5.782
RR.10y 2.133 2.162 2.301 2.374 2.492 2.616 3.001 3.103
Tab. 99 Summary of prognostic performances for signatures of 4 genes (ASPM, CX3CR1, MCM10 + 1 gene selected from a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + LN - cohort
min 5% 25% median mean 75% 95% max
LR.5y 0.0002209 0.0002576 0.0004542 0.001745 0.002723 0.003187 0.00?172 0.009811
LR.10y 0.0004986 0.0005203 0.000599 0.001645 0.002079 0.002612 0.005119 0.005988
HR 2.638 2.665 2.798 2.942 2.983 3.201 3.287 3.298
HR.inf 1.286 1.309 1.392 1.455 1.474 1.59 1.61 1.612
RR.5y 3.068 3.133 3.48 3.774 3.909 4.374 4.734 4.812
RR.10y 2.163 2.189 2.262 2.327 2.384 2.511 2.616 2.654
7.26 ASPM, CX3CR1, MCM10 + 2 genes
Tab. 100 Summary of agreement performances for signatures of 5 genes (ASPM, CX3CR1, MCM10 +2 genes selected from a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) on the HG1/HG3 validation set
agreement
min 0.791
5% [illegible]
25% 0.802
median 0.814
mean 0.8132
75% [illegible]
95% 0.829
max 0.841
Tab. 101 Summary of prognostic performances for signatures of 5 genes (ASPM, CX3CR1, MCM10 +2 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 cohort
min 5% 25% median mean 75% 95% max
LR.5y 1.307e-06 1.51e-06 2.2?3e-06 1.133e-05 7.4?6e-05 0.000111 0.0003204 0.0003778
LR.10y 1.249e-07 2.2?9e-07 5.?48e-07 3.656e-06 1.956e-05 1.629e-05 5.386e-05 0.0002255
HR 2.614 3.04 3.297 3.557 3.681 4.092 4.461 4.537
HR.inf 1.488 1.687 1.814 1.968 2.002 2.239 2.337 2.413
RR.5y 3.313 3.332 3.683 4.486 4.393 4.881 5.574 5.667
RR.10y 2.198 2.566 2.783 2.965 3 3.32 3.47 3.548
Tab. 102 Summary of prognostic performances for signatures of 5 genes (ASPM, CX3CR1, MCM10 +2 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + cohort
min 5% 25% median mean 75% 95% max
LR.5y 4.819e-06 7.166e-06 1.?16e-05 7.827e-05 0.0003476 0.0001567 0.001961 0.002364
LR.10y 7.921e-06 9.475e-06 2.384e-05 5.424e-05 0.0002278 0.0001859 0.0008542 0.00147
HR 2.669 2.682 3.068 3.424 3.367 3.692 3.931 4.041
HR.inf 1.399 1.434 1.621 1.8 1.745 1.902 2.011 2.06
RR.5y 3.271 3.336 4.291 4.55 4.64 5.322 5.537 6.131
RR.10y 2.077 2.351 2.53 2.714 2.685 2.842 3.082 3.103
Tab. 103 Summary of prognostic performances for signatures of 5 genes (ASPM, CX3CR1, MCM10 +2 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + LN - cohort
min 5% 25% median mean 75% 95% max
LR.5y 1.415e-05 3.191e-05 0.0001456 0.001217 0.002208 0.002561 0.008126 0.01178
LR.10y 1.01e-05 9.361e-05 0.0004019 0.00096 0.001757 0.002124 0.006268 0.007574
HR 2.491 2.577 2.926 3.0? 3.166 3.32 3.8 4.413
HR.inf 1.246 1.275 1.432 1.518 1.555 1.641 1.85 2.148
RR.5y 2.991 3.148 3.677 4.226 4.336 4.81 5.803 6.895
RR.10y 2.1 2.122 2.315 2.483 2.515 2.582 2.87 3.392
7.27 ASPM, CX3CR1, MCM10 + 3 genes
Tab. 104 Summary of agreement performances for signatures of 6 genes (ASPM, CX3CR1, MCM10 +3 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) on the HG1/HG3 validation set
agreement
min 0.798
5% 0.806
25% 0.814
median 0.818
mean 0.8207
75% [illegible]
95% 0.841
max 0.841
Tab. 105 Summary of prognostic performances for signatures of 6 genes (ASPM, CX3CR1, MCM10 + 3 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 cohort
min 5% 25% median mean 75% 95% max
LR.5y 2.635e-07 6.679e-07 1.752e-06 5.56e-06 5.568e-05 3.006e-05 0.00025? 0.0006623
LR.10y 4.853e-08 1.143e-07 5.994e-07 1.255e-06 5.398e-06 5.84?e-06 1.982e-05 4.721e-05
HR 2.985 3.172 3.533 3.777 3.817 4.071 4.63 4.668
HR.inf 1.?8 1.762 1.946 2.051 2.075 2.206 2.427 2.513
RR.5y 3.101 3.333 4.203 4.561 4.608 5.126 5.692 5.948
RR.10y 2.484 2.617 2.899 3.065 3.077 3.208 3.619 3.711
Tab. 106 Summary of prognostic performances for signatures of 6 genes (ASPM, CX3CR1, MCM10 + 3 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + cohort
min 5% 25% median mean 75% 95% max
LR.5y 1.469e-06 2.994e-06 5.881e-06 2.772e-05 0.0002735 0.0001994 0.001337 0.001621
LR.10y 2.53e-06 3.824e-06 2.457e-05 3.519e-05 0.0001701 0.0001349 0.0008852 0.001144
HR 2.688 2.731 3.221 3.529 3.?93 3.712 4.1?5 4.345
HR.inf 1.402 1.444 1.686 1.816 1.806 1.912 2.144 2.219
RR.5y 3.401 3.448 4.334 4.74? 4.919 5.537 6.542 6.9
RR.10y 2.246 2.332 2.573 2.759 2.747 2.929 3.126 3.333
Tab. 107 Summary of prognostic performances for signatures of 6 genes (ASPM, CX3CR1, MCM10 + 3 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + LN - cohort
min 5% 25% median mean 75% 95% max
LR.5y 6.357e-06 7.331e-06 3.96e-05 0.0004991 0.001673 0.002892 0.005842 0.008126
LR.10y 1.10?e-06 3.319e-05 0.000203 0.0004149 0.001794 0.002143 0.007875 0.01191
HR 2.4 2.496 2.839 3.272 3.503 4.154 5.198
HR.inf 1.187 1.241 1.412 1.628 1.606 1.731 2.006 2.482
RR.5y 3.148 3.289 3.582 4.336 4.766 5.647 7.335 7.475
RR.10y 2.03 2.085 2.301 2.564 2.554 2.711 3.08 3.757
7.28 ASPM, CX3CR1, MCM10 + 4 genes
Tab. 108 Summary of agreement performances for signatures of 7 genes (ASPM, CX3CR1, MCM10 +4 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) on the HG1/HG3 validation set
agreement
min 0.814
5% 0.814
25% 0.818
median 0.826
mean 0.8281
75% 0.837
95% 0.845
max 0.849
Tab. 109 Summary of prognostic performances for signatures of 7 genes (ASPM, CX3CR1, MCM10 + 4 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 cohort
min 5% 25% median mean 75% 95% max
LR.5y 2.8??e-07 3.936e-07 1.124e-06 [illegible] 4.723e-05 2.964e-05 0.0002815 0.0004434
LR.10y 5.701e-08 1.1?3e-07 2.045e-07 9.564e-07 [illegible] 3.794e-06 2.17e-05 0.000272
HR 2.608 3.202 3.547 3.865 3.87 4.213 4.549 4.5?5
HR.inf 1.48 1.791 1.945 2.143 2.111 2.28 2.414 2.444
RR.5y 3.153 3.389 4.169 4.8 4.696 5.219 5.788 5.849
RR.10y 2.198 2.586 2.909 3.099 3.095 3.348 3.548 3.658
Tab. 110 Summary of prognostic performances for signatures of 7 genes (ASPM, CX3CR1, MCM10 + 4 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + cohort
min 5% 25% median mean 75% 95% max
LR.5y 1.165e-06 1.761e-06 4.85e-06 2.217e-05 0.0001516 0.0001284 0.0007041 0.001093
LR.10y 2.52e-06 3.339e-06 1.103e-05 4.347e-05 0.0001119 0.0001072 0.0002488 0.001579
HR 2.528 3.049 3.212 3.539 3.552 3.917 4.272 4.476
HR.inf 1.35 1.586 1.681 1.816 1.841 2.026 2.165 2.24
RR.5y 3.537 3.656 4.506 4.731 5.046 5.761 6.765 6.9
RR.10y 2.077 2.?19 2.581 2.694 2.758 2.965 3.254 3.273
Tab. 111 Summary of prognostic performances for signatures of 7 genes (ASPM, CX3CR1, MCM10 + 4 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + LN - cohort
min 5% 25% median mean 75% 95% max
LR.5y 2.861e-06 3.182e-06 2.417e-05 0.0001753 0.0009044 0.002305 0.003602 0.003602
LR.10y 3.389e-05 6.552e-05 9.943e-05 0.0004073 0.001592 0.002209 0.003367 0.02053
HR 2.199 2.737 2.854 3.31 3.276 3.703 3.907 4.188
HR.inf 1.11 1.359 1.411 1.639 1.616 1.83 1.901 2.001
RR.5y 3.49 3.49 3.677 4.678 5.066 5.965 7.745 7.895
RR.10y 1.813 2.136 2.327 2.493 2.524 2.77 2.923 3.153
7.29 ASPM, CX3CR1, MCM10 + 5 genes
Tab. 112 Summary of agreement performances for signatures of 8 genes (ASPM, CX3CR1, MCM10 +5 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) on the HG1/HG3 validation set
agreement
min 0.818
5% 0.822
25% 0.826
median 0.833
mean 0.8336
75% 0.841
95% 0.849
max 0.853
Tab. 113 Summary of prognostic performances for signatures of 8 genes (ASPM, CX3CR1, MCM10 + 5 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 cohort
min 5% 25% median mean 75% 95% max
LR.5y 1.511e-07 [illegible] 1.012e-06 1.634e-06 3.071e-05 2.947e-06 0.0001031 0.0004407
LR.10y 7.816e-08 8.5?6e-08 1.726e-07 5.536e-07 3.104e-06 1.112e-06 1.301e-05 3.754e-05
HR 3.063 3.301 3.689 3.944 3.971 4.277 4.587 4.705
HR.inf 1.702 1.817 2.043 2.15? 2.169 2.351 2.404 2.504
RR.5y 3.205 3.68 4.494 5.047 4.923 5.472 5.752 6.049
RR.10y 2.596 2.713 2.909 3.099 3.134 3.385 3.548 3.658
Tab. 114 Summary of prognostic performances for signatures of 8 genes (ASPM, CX3CR1, MCM10 + 5 genes selected in a group consisting of 7 genes : CCNB2, PTTGl, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + cohort
min 5% 25% median mean 75% 95% max
LR.5y 6.07e-07 1.309e-06 5.443e-06 7.3?7e-06 8.655e-05 7.827e-05 0.0003819 0.0004725
LR.10y 1.993e-06 3.163e-06 9.997e-06 2.95e-05 6.?8?e-05 4.173e-05 0.0001902 0.0006972
HR 2.752 3.114 3.442 3.585 3.656 3.833 4.336 4.538
HR.inf 1.454 1.621 1.818 1.835 1.897 1.984 2.218 2.268
RR.5y 3.824 3.824 4.64 5.537 5.335 5.761 6.765 7.18
RR.10y 2.295 2.48 2.623 2.765 2.785 2.84 3.16 3.361
Tab. 115 Summary of prognostic performances for signatures of 8 genes (ASPM, CX3CR1, MCM10 + 5 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + LN - cohort
min 5% 25% median mean 75% 95% max
LR.5y 2.189e-06 3.319e-06 2.86e-05 4.075e-05 0.0006033 0.001432 0.002305 0.002892
LR.10y 1.2?3e-05 3.516e-05 0.0001364 0.000275 0.001041 0.0009287 0.002471 0.009398
HR 2.432 2.831 3.041 3.418 3.357 3.603 4.063 4.488
HR.inf 1.217 1.4 1.522 1.69 1.659 1.782 1.979 2.144
RR.5y 3.582 3.677 3.875 5.647 5.491 5.803 7.681 7.895
RR.10y 2.05 2.248 2.327 2.582 2.539 2.674 2.944 3.264
7.30 ASPM, CX3CR1, MCM10 + 6 genes
Tab. 116 Summary of agreement performances for signatures of 9 genes (ASPM, CX3CR1, MCM10 +6 genes selected in a group consisting of 7 genes : CCNB2, PTTGl, FRY, CCNA2, CDCA3, CDC2 and MELK) on the HG1/HG3 validation set
agreement
min 0.822
5% 0.8253
25% 0.835
median 0.841
mean 0.8389
75% 0.845
95% 0.849
max 0.849
Tab. 117 Summary of prognostic performances for signatures of 9 genes (ASPM, CX3CR1, MCM10 + 6 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 cohort
min 5% 25% median mean 75% 95% max
LR.5y 4.023e-07 6.435e-07 1.206e-06 1.206e-06 5.042e-06 7.04e-06 1.56?e-05 1.719e-05
LR.10y 1.408e-07 1.468e-07 2.327e-07 3.951e-07 7.?e-07 1.172e-06 1.966e-06 2.115e-06
HR 3.573 3.633 3.789 3.973 3.935 4.12 4.174 4.182
HR.inf 1.983 2.007 2.08 2.182 2.163 2.259 2.295 2.3
RR.5y 4.203 4.224 4.577 5.047 4.893 5.047 5.541 5.752
RR.10y 2.836 2.875 2.969 3.097 3.084 3.202 3.289 3.31
Tab. 118 Summary of prognostic performances for signatures of 9 genes (ASPM, CX3CR1, MCM10 + 6 genes selected in a group consisting of 7 genes : CCNB2, PTTGl, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + cohort
min 5% 25% median mean 75% 95% max
LR.5y 1.649e-06 2.787e-06 5.443e-06 7.387e-06 4.305e-05 1.985e-05 0.0001789 0.0002417
LR.10y 6.643e-06 7.337e-06 9.55e-06 1.187e-05 3.008e-05 3.124e-05 8.76e-05 0.0001105
HR 3.247 3.307 3.471 3.813 3.656 3.835 3.896 3.921
HR.inf 1.689 1.727 1.829 1.983 1.907 1.991 2.026 2.038
RR.5y 3.976 4.18 5.096 5.537 5.395 5.648 6.43 6.765
RR.10y 2.547 2.569 2.623 2.833 2.754 2.872 2.91 2.91
Tab. 119 Summary of prognostic performances for signatures of 9 genes (ASPM, CX3CR1, MCM10 + 6 genes selected in a group consisting of 7 genes : CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK) in HG2 ER + LN - cohort
min 5% 25% median mean 75% 95% max
LR.5y 3.319e-06 1.09e-05 2.86e-05 4.075e-05 0.0002555 0.0001277 0.001067 0.001432
LR.10y 0.0001193 0.0001251 0.000176 0.0002167 0.000502 0.0006857 0.001243 0.001455
HR 2.974 3.011 3.121 3.48 3.345 3.541 3.628 3.64
HR.inf 1.471 1.494 1.562 1.721 1.66 1.752 1.794 1.8
RR.5y 3.875 4.078 5.1 5.647 5.573 5.803 7.118 7.681
RR.10y 2.327 2.336 2.356 2.582 2.507 2.628 2.674 2.674
7.31 ASPM, CX3CR1, MCM10 + 7 genes
Tab. 120 Performances of the 10-gene signature ASPM, CX3CR1, MCM10, CCNB2, PTTG1, FRY, CCNA2, CDCA3, CDC2 and MELK.
Cohort agreement LR.5y LR.10y HR HR.inf RR.5y RR.10y
HG 1 & 3 ER+ 0.78 NA NA
HG2 3.7e-05 1.1e-05 3.28 1.82 3.80 2.65
HG2 ER+ 0.00035 0.00038 2.86 1.53 3.61 2.25
HG2 ER+ & LN- 0.0033 0.0025 2.78 1.39 3.37 2.20
C. COMPARISON WITH STATE-OF-THE-ART
In order to assess the performance of the new GGI, we compared some criteria (agreement, hazard ratio and log-rank p-value) for the new GGI and for the two following state-of-the-art GGI : GGI97 and PCR-GGI of Toussaint (OLD 4-gene).
One of the main issues with the state of the art is the cost and complexity of GGI97 in a routine setting:
- The number of gene expressions to be assessed makes it difficult, and nearly impossible, to perform with PCR methods, so DNA array technology is necessary.
- DNA arrays require specific, costly equipment that is not usually present in most diagnostic laboratories.
- DNA arrays require fresh or frozen samples, which need specific logistic management and equipment.
Obtaining the same or better performances than GGI97 is therefore important, as the new GGI requires fewer genes and can be performed more easily, at lower complexity and lower cost, and with paraffin-embedded samples, through easier, cheaper and more common technologies (such as PCR instead of microarray).
Performances of the new GGI (24-gene, 6-gene and 3-gene) were at least comparable to, and often better than, those of GGI97 and OLD 4-gene, with fewer genes than GGI97. The new 3-gene GGI also has comparable to better performances than the OLD 4-gene GGI, with fewer genes. These results are shown in the following tables and figures, using different criteria such as LR5Ypvalue, LR10Ypvalue, hazard ratio (for different endpoints, such as 5-year MFS and 10-year MFS) and overall agreement. As shown in Table 121A, the agreements obtained using the 24-gene and 6-gene signatures (0.880 and 0.841, respectively) were superior to the agreement obtained with OLD 4-gene, and even to the one obtained with GGI97 (0.826 and 0.829, respectively). In both the HG2 and the HG2/ER+ validation cohorts, LR5Ypvalue and LR10Ypvalue were lower for the new GGI than for the state-of-the-art GGI, which denotes a better prognostic value of the new GGI.
Table 121A : Comparison of the agreement, hazard ratio and log-rank p-value for the new GGI (24-gene, 6-gene and 3-gene, as examples) and for the two following state-of-the-art GGI : 97-genes GGI (GGI97) and 4-genes PCR-GGI of Toussaint (OLD 4-gene)
Criterion OLD 4-gene GGI97 24-gene 6-gene 3-gene
Agreement 0.826 0.829 0.880 0.841 0.783
HR.HG2 2.74 [1.56 - 4.82] 3.44 [1.96 - 6.06] 4.91 [2.66 - 9.01] 3.9 [2.14 - 7.12] 3.28 [1.82 - 5.9]
LR5Ypvalue.HG2 1.036930e-04 2.055304e-05 1.056262e-06 1.067547e-06 3.667?50e-05
LR10Ypvalue.HG2 1.985243e-04 2.855644e-06 6.662416e-09 1.005934e-06 1.097867e-05
HR.ERpos 2.91 [1.54 - 5.51] 3.69 [1.96 - 6.97] 5.29 [2.?6 - 10.53] 3.63 [1.8? - ?] 2.86 [1.53 - 5.36]
LR5Ypvalue.HG2ERpos 1.036930e-04 3.885512e-05 2.131726e-06 4.422?51e-06 3.465735e-04
LR10Ypvalue.HG2ERpos [illegible] 1.0?1913e-05 8.605040e-08 3.51?548e-05 3.845914e-04
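The hazard ratios in Table 121A are given with a bracketed interval. Assuming these are 95% Wald confidence intervals from a Cox model (an assumption, as the interval type is not stated in this section), the corresponding two-sided p-value can be recovered from the interval alone. The sketch below is illustrative and the function name is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_value(hazard_ratio, ci_lower, ci_upper, z_level=Z_95):
    """Two-sided Wald p-value for HR = 1, recovered from an assumed Wald interval.

    On the log scale the interval is log(HR) +/- z_level * SE, so the standard
    error is the log-width of the interval divided by 2 * z_level.
    """
    standard_error = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_level)
    z_statistic = math.log(hazard_ratio) / standard_error
    # erfc(|z| / sqrt(2)) equals twice the upper normal tail probability.
    return math.erfc(abs(z_statistic) / math.sqrt(2))


# 24-gene signature in the HG2 cohort, as reported in Table 121A.
p_24_gene = wald_p_value(4.91, 2.66, 9.01)
print(f"{p_24_gene:.1e}")
```

Under this assumption the 24-gene interval corresponds to a p-value of the order of 1e-07, the same order of magnitude as the pvalHR of 3.5e-07 reported for the 24-gene signature in Tab. 94B.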
Figures 1 and 2 represent Kaplan-Meier estimates of 5-year MFS and 10-year MFS, according to genomic grade, as determined using the 24-gene, 6-gene and 3-gene signatures, in comparison with GGI97 (MQDX).
Figures 1 and 2 show that, whatever the endpoint considered (MFS at 5 years or at 10 years), the new GGI (here, 3 particular gene combinations: the 24-gene, 6-gene and 3-gene signatures, as examples) performs at least as well as GGI97/MQDX, with fewer genes.
Figures 3 to 14 show that the 3-gene, 6-gene and 24-gene signatures from the new GGI have a better prognostic power than the OLD GGI-PCR. Indeed, the 2 groups (GG1 and GG3), as determined by the new GGI, are better discriminated than the 2 groups determined by the OLD GGI-PCR. For instance, the high-risk group (GG3) predicted by our new GGI 3-gene signature has lower survival probabilities than the high-risk group defined by the OLD GGI-PCR. Likewise, the low-risk group (GG1) predicted by our new GGI 3-gene signature has higher survival probabilities than the low-risk group defined by the OLD GGI-PCR.
EXAMPLE 8: COMBINATIONS EXCLUDED BECAUSE OF BAD PERFORMANCES
Bad gene combinations were defined by a log-rank p-value greater than 0.05 at 5 years and at 10 years and a hazard ratio not significantly greater than 1.
These gene signatures were therefore excluded, because they fail to divide HG2 tumors into two relevant subgroups with distinct clinical outcome (as assessed by the MFS). These signatures and their performances are summarized in Table 121B.
Table 121B : Gene combinations with low performances. LR.5(10) = log-rank p-value at 5(10) years. HR = hazard ratio
Genes combination Agreement LR.5 LR.10 HR HR.pval
BIRC5, PTTG1 0.81 0.07 0.18 1.63 0.18
BIRC5, TUBA1B 0.78 0.11 0.05 2.04 0.06
RACGAP1,FRY 0.86 0.07 0.06 1.90 0.07
RACGAP1,KIF11 0.73 0.24 0.11 1.79 0.11
PTTG1,TUBA1B 0.81 0.06 0.20 1.60 0.20
RACGAP1,CCNA2,KIF11 0.78 0.07 0.08 1.87 0.09
RACGAP1,CCNA2,CDC2 0.76 0.06 0.06 1.98 0.06
BIRC5,RACGAP1,PTTG1,TUBA1B 0.83 0.05 0.06 1.98 0.07
PTTG1,ASPM,MCM10,CDC20 0.80 0.06 0.11 1.76 0.11
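The exclusion rule stated above can be written as a simple filter. The sketch below is a minimal illustration: it assumes that "a hazard ratio not significantly greater than 1" means an HR p-value above 0.05, and the class and function names are hypothetical. Values shown as 0.05 in Table 121B are rounded, so the unrounded p-values would be needed to classify those rows.

```python
from dataclasses import dataclass

ALPHA = 0.05  # significance threshold used in the exclusion rule


@dataclass
class Performance:
    genes: tuple
    lr_5y: float    # log-rank p-value at 5 years
    lr_10y: float   # log-rank p-value at 10 years
    hr_pval: float  # p-value of the hazard ratio


def is_bad_combination(perf, alpha=ALPHA):
    """True when both log-rank p-values and the HR p-value all exceed alpha."""
    return perf.lr_5y > alpha and perf.lr_10y > alpha and perf.hr_pval > alpha


# First row of Table 121B, and the 24-gene signature in the HG2 cohort (Tab. 94B).
excluded = Performance(("BIRC5", "PTTG1"), lr_5y=0.07, lr_10y=0.18, hr_pval=0.18)
retained = Performance(("24-gene signature",), lr_5y=1.956e-06, lr_10y=6.662e-09, hr_pval=3.5e-07)

print(is_bad_combination(excluded))  # True
print(is_bad_combination(retained))  # False
```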
EXAMPLE 9: PERFORMANCES OF THE NEW GGI DO NOT DEPEND ON THE TYPE OF CLASSIFICATION ALGORITHMS
Various classifiers were tested for 3 particular gene signatures (24-gene, 6-gene and 3-gene signatures), as described in the Material and Methods. As shown in Table 122, the value of the agreement did not depend on the type of classifier chosen (for instance, for the 24-gene signature, agreement ranged between 0.814 and 0.88). Tables 123-125 show that the prognostic value of the 24-gene, 6-gene and 3-gene signatures did not vary much with the type of classifier used (for instance, in the HG2 cohort, for the 6-gene signature, LR_5y ranged between 1.068e-06 and 7.851e-05).
Tab. 122 Agreement HG/GG for the 24-gene, 6-gene and 3-gene signatures, with 10 different classifiers, in the HG1/HG3 ER+ cohort.
Signature 24 SVMr 24 SVMl 24 sum 24 rf 24 probit 24 logit 24 ACP 24 rpart 24 LDA 24 QDA
Agreement 0.88 0.876 0.833 0.845 0.845 0.80 0.845 0.814 0.833 0.841
Signature 6 SVMr 6 SVMl 6 sum 6 rf 6 probit 6 logit 6 ACP 6 rpart 6 LDA 6 QDA
Agreement 0.841 0.833 0.837 0.826 0.81 0.81? 0.822 0.756 0.822 0.705
Signature 3 SVMr 3 SVMl 3 sum 3 rf 3 probit 3 logit 3 ACP 3 rpart 3 LDA 3 QDA
Agreement 0.783 0.787 0.787 0.76 0.775 0.783 0.802 0.733 0.795 0.77?
Tab. 123 Prognostic performances of the 24-gene, 6-gene and 3-gene signatures, using 10 different classifiers, in the HG2 cohort
Signature  LR 5y MFS  LR 10y MFS  HR  HR.inf  HR.sup  cox pval  RR.5y  RR.10y
24 SVMr  1.956e-06  6.662e-09  4.906  2.661  9.044  3.5e-07  4.728  3.711
24 SVMl  1.896e-06  6.725e-09  4.91  2.664  9.052  3.4e-07  4.648  3.711
24 sum  9.273e-05  9.169e-07  3.869  2.123  7.054  1e-05  3.6  3.097
24 rf  5.124e-06  2.135e-07  4.206  2.291  7.719  3.5e-06  4.721  3.321
24 probit  4.744e-06  2.173e-08  4.3  2.414  7.66  7.4e-07  4.151  3.409
24 logit  3.125e-05  2.385e-07  3.785  2.147  6.673  4.2e-06  3.611  3.07
24 ACP  0.0002574  1.999e-06  3.887  2.08  7.263  2.1e-05  3.444  3.204
24 rpart  1.374e-05  1.007e-06  3.895  2.125  7.138  .lc-05  4.345  3.04
24 LDA  3.218e-07  3.298e-10  5.45  2.959  10.04  5.3e-08  5.149  4.058
24 QDA  6.062e-06  1.709e-06  3.708  2.035  6.758  1.9e-05  4.491  3.03
6 SVMr  1.068e-06  1.006e-06  3.903  2.14  7.1 9  9e-06  5.382  3.097
6 SVMl  2.364e-05  7.093e-06  3.492  1.918  6.357  4.3e-05  4.274  2.963
6 sum  5.785e-06  2.331e-06  3.55  1.992  6.327  1.7e-05  4.419  2.912
6 rf  4.979e-06  7.956e-06  3.742  1.968  7.115  5.7e-05  5.931  2.838
6 probit  4.041e-05  2.341e-06  3.72  2.04  6.782  1.8e-05  3.867  3.097
6 logit  3.15e-05  1.569e-06  3.781  2.077  6.883  1.4e-05  3.932  3.167
6 ACP  6.046e-06  2.04Se-07  4.809  2.468  9.372  4e-06  5.304  3.868
6 rpart  8.334e-06  2.719e-05  3.422  1.806  6.486  0.00016  5.456  2.532
6 LDA  7.851e-05  1.115e-05  3.257  1.831  5.793  5.9e-05  3.6  2.848
6 QDA  1.837e-06  9.777e-07  3.747  2.096  6.699  8.3e-06  4.881  2.965
3 SVMr  3.668e-05  1.098e-05  3.279  1.824  5.897  7.3e-05  3.804  2.654
3 SVMl  2.784e-06  5.593e-07  3.974  2.177  7.255  7e-06  4.643  3.097
3 sum  6.872e-07  7.13e-07  3.672  2.075  6.497  7.9e-06  4.809  3.001
3 rf  1.495e-06  4.008e-06  3.638  1.973  6.71  3.6e-05  5.667  2.845
3 probit  3.681e-06  3.809e-07  3.926  2.179  7.075  5.3e-06  4.346  3.099
3 logit  1.541e-06  2.89e-07  4.095  2.249  7.455  4e-06  4.8  3.167
3 ACP  5.053e-06  1.749e-06  3.903  2.106  7.234  1.5e-05  4.954  3.321
3 rpart  1.4e-05  1.964e-06  4.165  2.149  8.071  2.4e-05  4.722  3.023
3 LDA  2.69e-06  1.060e-06  3.669  2.054  6.553  1.1e-05  4.419  2.977
3 QDA  9.59e-06  1.039e-05  3.094  1.767  5.416  7.7e-05  4.049  2.591
Tab. 124 Prognostic performances of the 24-gene, 6-gene and 3-gene signatures, using 10 different classifiers, in the HG2/ER+ cohort
Signature  LR 5y MFS  LR 10y MFS  HR  HR.inf  HR.sup  cox pval  RR.5y  RR.10y
24 SVMr  2.132e-06  8.605e-08  5.295  2.662  10.53  2e-06  5.996  3.74
24 SVMl  2.024e-06  8.406e-08  5.3  2.664  10.54  2e-06  5.877  8.74
24 sum  8.995e-05  5.204e-06  4.115  2.102  8.055  3.7e-05  4.387  3.16
24 rf  4.586e-06  1.813e-06  4.439  2. 13  8.787  1.9e-05  6.377  3.361
24 probit  3.151e-06  1.093e-07  4.693  2.484  8.865  1.9e-06  5.092  3.439
24 logit  2.741e-05  1.413e-06  4.025  2.158  7.508  1.2e-05  4.248  3.028
24 ACP  0.0008925  3.593e-05  3.71  1.868  7.37  0.00018  3.606  3.021
24 rpart  6.362e-05  3.199e-05  3.597  1.856  6.972  0.00015  4.731  2.765
24 LDA  2.099e-07  2.3e-09  6.159  3.106  12.21  2e-07  6.771  4.165
24 QDA  1.854e-05  2.253e-05  3.659  1.904  7.032  9.9e-05  5.117  2.833
6 SVMr  4.423e-06  3.519e-05  3.627  1.88  6.999  0.00012  6.253  2.759
6 SVMl  8.072e-05  0.0001428  3.258  1.693  6.269  0.0004  4.731  2.686
6 sum  2.814e-05  S.467e-05  3.253  1.733  6.107  0.0002  4.748  2.573
6 rf  3.676e-05  0.0002184  3.308  1.668  6.561  0.00062  6.165  2.503
6 probit  0.00015  5.179e-05  3.519  1.826  6.782  0.00017  4.135  2.833
6 logit  0.0001172  3.561e-05  3.591  1.868  6.904  0.00013  4.217  2.91
6 ACP  6.455e-0S  1.104e-05  4.305  2.11  8.781  6e-05  5.244  3.415
6 rpart  3.758e-05  0.0003824  3.129  1.587  6.171  0.00099  5.812  2.243
6 LDA  0.0003949  0.00033S2  2.928  1.563  5.485  0.0008  3.685  2.504
6 QDA  2.112e-06  1.39e-05  3.726  1.953  7.109  6.6e-05  6.503  2.833
3 SVMr  0.0003466  0.0003846  2.863  1.531  5.356  0.00099  3.613  2.25
3 SVMl  3.61*e-05  3.583e-05  3.491  1.836  6.639  0.00014  4.474  2.694
3 sum  1.3 5e-05  7.254e-05  3.157  1.712  5.822  0.00023  4.598  2.647
3 rf  5. 72e-06  5.814e-05  3.403  1.78  6.507  0.00021  6.012  2.547
3 probit  5.148e-05  2.707e-05  3.462  1.847  6.49  0.00011  4.151  2.715
3 logit  2.048e-05  1.733e-05  3.623  1.913  6.862  7.8e-05  4.655  2.767
3 ACP  9.101e-05  0.0001603  3.228  1.678  6.21  0.00045  4.64  2.686
3 rpart  7.015e-05  3.33e-05  3.883  1.921  7.847  0.00016  4.85  2.679
3 LDA  3.8e-05  7.086e-05  3.206  1.724  5.961  0.00023  4.235  2.604
3 QDA  7.666e-06  7.237e-05  3.117  1.682  5.779  0.00031  5.042  2.534
Tab. 125 Prognostic performances of the 24-gene, 6-gene and 3-gene signatures, using 10 different classifiers, in the HG2/ER+/UM- cohort
Signature  LR 5y MFS  LR 10y MFS  HR  HR.inf  HR.sup  cox pval  RR.5y  RR.10y
24 SVMr  1.411e-05  2.172e-06  4.99  2.385  10.44  2e-05  6.132  3.5
24 SVMl  1.304e-05  2.069e-06  5.002  2.391  10.47  1.9e-05  5.965  3.5
24 sum  0.0004991  6.255e-05  3.912  1.904  8.038  0.0002  4.311  3.049
24 rf  4.569e-05  8.425e-05  3.946  1.885  8.259  0.00027  6.369  2.943
24 probit  0.0001023  3.424e-06  4.575  2.262  9.255  2.3e-05  4.586  3.317
24 logit  0.0009517  S.042e-05  3.79  1.897  7.574  0.00016  3.637  2.818
24 ACP  0.001956  0.0001153  4.466  1.942  10.27  0.00043  4.443  3.484
24 rpart  0.0007254  0.001329  3.062  1.492  6.284  0.0023  4.336  2.315
24 LDA  3.436e-06  2.649e-07  5.592  2.672  11.7  4.9e-06  6.672  3.757
24 QDA  7.337e-05  0.0003709  3.338  1.651  6.748  0.00079  5.351  2.582
6 SVMr  7.748e-06  0.0002717  3.421  1.692  6.917  0.00062  7.082  2.582
6 SVMl  0.0002517  0.001306  3.002  1.485  6.07  0.0022  4.941  2.408
6 sum  0.0001351  0.001532  2.876  1.452  5.697  0.0025  4.81  2.236
6 rf  0.0004927  0.005993  2.638  1.285  5.412  0.0082  5.047  2.021
6 probit  0.000509  0.0004447  3.289  1.627  6.65  0.00092  4.197  2.582
6 logit  0.000509  0.0004447  3.289  1.627  6.65  0.00092  4.197  2.582
6 ACP  0.002676  0.002896  3.014  1.405  6.462  0.0046  4.222  2.509
6 rpart  0.001165  0.01979  2.295  1.119  4.71  0.023  4.443  1.705
6 LDA  0.002552  0.001685  2.886  1.444  5.767  0.0027  3.464  2.356
6 QDA  3.484e-05  4.827e-05  4.109  1.963  8.604  0.00018  6.369  3.046
3 SVMr  0.003317  0.002512  2.781  1.392  5.558  0.0038  3.372  2.197
3 SVMl  0.001098  0.000511  3.197  1.6  6.391  0.001  3.759  2.528
3 sum  0.0002089  0.0002055  3.39  1.71  6.723  0.00047  4.454  2.786
3 rf  0.0002696  0.001627  2.942  1.455  5.948  0.0027  4.812  2.248
3 probit  0.000622  0.0002123  3.426  1.713  6.849  0.0005  3.974  2.717
3 logit  0.001098  0.000511  3.197  1.6  6.391  0.001  3.759  2.528
3 ACP  0.001661  0.002541  2.884  1.404  5.922  0.0039  4.118  2.478
3 rpart  0.002238  0.002487  2.96  1.415  6.191  0.0039  3.717  2.095
3 LDA  0.0004477  0.0001291  3.554  1.778  7.107  0.00033  4.087  2.818
3 QDA  0.0001504  0.0002733  3.364  1.682  6.728  0.0006  4.81  2.717
Figures 15-20 show that, using the 24-gene, 6-gene and 3-gene signatures as examples, and, most importantly, whatever the classifier/algorithm used (SVMr, SVMl or Sum, as examples), HG2 samples were well reclassified into high-risk (GG3) and low-risk (GG1) groups.
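The hazard ratios, confidence bounds and Cox p-values reported for each classifier can be cross-checked against one another. The sketch below assumes that the reported Cox p-value is a two-sided Wald test and that HR.inf and HR.sup are the bounds of a 95% confidence interval that is symmetric on the log scale; neither assumption is stated in the text, and the function name `wald_p_from_ci` is hypothetical.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Two-sided Wald p-value implied by a hazard ratio and its 95% CI."""
    # Standard error of log(HR), recovered from the width of the interval
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    return math.erfc(abs(z) / math.sqrt(2))


# 24-gene signature, SVMr classifier, HG2 cohort: HR 4.906 (2.661 - 9.044)
p = wald_p_from_ci(4.906, 2.661, 9.044)
print(f"{p:.2e}")
```

Under these assumptions the result is of the same order as the tabulated Cox p-value for that row (3.5e-07).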
EXAMPLE 10: THE NEW GGI SHOWS GOOD PERFORMANCES WITH QRT-PCR TECHNOLOGIES
For each combination of genes, an SVM model was trained on the training set (IJB/Mercy dataset), and then applied to the validation set (for the 6-gene and 3-gene signatures), in order to assess the performance of the combination of genes. Agreement of GG with HG was used as the main criterion.
10.1 Improved HG/GG agreement using the new GGI, compared with the 4-gene GGI-PCR of Toussaint
HG/GG agreement of several combinations was assessed using 10-fold cross-validation on the 91 samples from the IJB/Mercy dataset. As shown in Table 126, global agreement was better with the new GGI (between 0.82 and 0.88) than with the 4-gene GGI-PCR of Toussaint (0.72). In particular, the HG1/GG1 agreement was greatly improved using the new GGI (agG1 between 0.82 and 0.90) versus the 4-gene GGI-PCR of Toussaint (agG1 of 0.54).
Tab. 126 Agreement performances of the new GGI (24-gene, random 15-gene, random 10-gene, 6-gene and 3-gene signatures), in comparison with the 4-gene GGI-PCR of Toussaint (CDC2, CDC20, KPNA2, MYBL2) in the IJB/Mercy dataset
Signature  agG1  agG3  agreement
24-gene  0.85  0.79  0.82
15-gene (random)  0.82  0.83  0.83
10-gene (random)  0.85  0.84  0.85
6-gene  0.92  0.85  0.88
3-gene  0.90  0.86  0.88
4-gene Toussaint  0.54  0.87  0.72
10.2 Similarly good HG/GG agreement was obtained, using the new GGI, in an independent validation set
Performance of the new GGI was then assessed in an independent dataset (IJB dataset, 86 HG1 + 60 HG3). Results are presented in Table 127. The agreement values obtained were comparable to those obtained in the IJB/Mercy cohort.
Tab. 127 Performance of the 6-gene and 3-gene signatures on the IJB validation set
Signature  agG1  agG3  agreement
6-gene  0.88  0.83  0.86
3-gene  0.91  0.68  0.81
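The agreement figures can be illustrated with a short sketch. It assumes that agG1 is the fraction of HG1 samples classified as GG1, that agG3 is the fraction of HG3 samples classified as GG3, and that the overall agreement is the fraction of all samples for which GG equals HG. These definitions are inferred from the tables rather than quoted, and the function name and toy labels are hypothetical.

```python
def grade_agreement(hg, gg):
    """Per-grade and overall agreement between histological and genomic grade.

    hg, gg: equal-length lists of grades (1 or 3) for HG1/HG3 samples.
    """
    def frac(grade):
        idx = [i for i, h in enumerate(hg) if h == grade]
        return sum(gg[i] == grade for i in idx) / len(idx)

    overall = sum(h == g for h, g in zip(hg, gg)) / len(hg)
    return {"agG1": frac(1), "agG3": frac(3), "agreement": overall}


# Toy labels (not patient data)
hg = [1, 1, 1, 1, 3, 3, 3, 3]
gg = [1, 1, 1, 3, 3, 3, 1, 3]
print(grade_agreement(hg, gg))

# Consistency check on the 6-gene row of the validation set:
# 86 HG1 samples at agG1 = 0.88 and 60 HG3 samples at agG3 = 0.83
overall_6gene = (0.88 * 86 + 0.83 * 60) / (86 + 60)
print(round(overall_6gene, 2))  # 0.86
```

With these definitions the overall agreement is the sample-size-weighted mean of agG1 and agG3, which reproduces the tabulated 0.86 for the 6-gene signature.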
EXAMPLE 11: PERFORMANCES OF THE NEW GGI DO NOT DEPEND ON THE SELECTION OF REFERENCE GENES CHOSEN AMONG GUS, RPLP0 AND TBP
11.1 Results
Results for two gene signatures (6-gene and 3-gene signatures), as an example, are presented in Tables 128 and 129.
Tab. 128 Comparison of performances (HG/GG agreement) of a 6-gene signature (PTTG1, CCNB2, ASPM, CX3CR1, MCM10, FRY), with normalization by 1, 2 or 3 control genes, chosen among GUS, TBP and RPLP0. Two classifiers were tested (Sum and SVM)
Reference genes  sum  SVM
GUS+TBP+RPLP0  0.917 [0.791 - 0.973]  0.958 [0.846 - 0.993]
GUS+TBP  0.936 [0.814 - 0.983]  0.957 [0.843 - 0.993]
GUS+RPLP0  0.936 [0.814 - 0.983]  0.957 [0.843 - 0.993]
TBP+RPLP0  0.936 [0.814 - 0.983]  0.915 [0.787 - 0.972]
RPLP0  0.936 [0.814 - 0.983]  0.872 [0.736 - 0.947]
GUS  0.936 [0.814 - 0.983]  0.957 [0.843 - 0.993]
TBP  0.936 [0.814 - 0.983]  0.915 [0.787 - 0.972]
Tab. 129 Comparison of performances (HG/GG agreement) of a 3-gene signature (PTTG1, CCNB2, ASPM), with normalization by 1, 2 or 3 control genes, chosen among GUS, TBP and RPLP0. Two classifiers were tested (Sum and SVM)
Reference genes  sum  SVM
GUS+TBP+RPLP0  0.938 [0.818 - 0.984]  0.938 [0.818 - 0.984]
GUS+TBP  0.958 [0.846 - 0.993]  0.958 [0.846 - 0.993]
GUS+RPLP0  0.938 [0.818 - 0.984]  0.938 [0.818 - 0.984]
TBP+RPLP0  0.938 [0.818 - 0.984]  0.896 [0.766 - 0.961]
RPLP0  0.917 [0.791 - 0.973]  0.854 [0.716 - 0.935]
GUS  0.938 [0.818 - 0.984]  0.896 [0.766 - 0.961]
TBP  0.917 [0.791 - 0.973]  0.917 [0.791 - 0.973]
These results show that using one, two or three reference genes (among TBP, RPLP0 and GUS) leads to equivalent performances. For instance, for the 6-gene signature (Table 128), using a "Sum" classifier, HG/GG agreement was equal to 0.936 whatever the combination of reference genes used, except when all 3 reference genes were used (HG/GG agreement of 0.917).
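Normalization by one or several reference genes can be sketched as follows. This is only an illustration: it assumes that expression is measured as qRT-PCR Ct values and that normalization is a delta-Ct style subtraction against the mean Ct of the chosen reference genes. The exact normalization used is described in the Material and Methods and may differ; the function name and all Ct values below are hypothetical.

```python
from statistics import mean


def normalize_ct(target_ct, reference_ct):
    """Delta-Ct style normalization: mean reference Ct minus target Ct."""
    ref = mean(reference_ct.values())
    return {gene: ref - ct for gene, ct in target_ct.items()}


# Hypothetical Ct values for one sample
targets = {"PTTG1": 27.0, "CCNB2": 28.5, "ASPM": 29.0}
refs_three = {"GUS": 25.0, "TBP": 27.0, "RPLP0": 20.0}
refs_one = {"GUS": 25.0}

print(normalize_ct(targets, refs_three))
print(normalize_ct(targets, refs_one))
```

In this sketch, changing the set of reference genes shifts every normalized value of a sample by the same amount, so the differences between target genes within a sample are unchanged.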
EXAMPLE 12: THE TFRC GENE IS CORRELATED WITH TUMOR GRADE, AND CANNOT BE USED AS A REFERENCE GENE
For real-time quantitative PCR, housekeeping genes are commonly used for normalization. The stability of reference gene expression, regardless of biological variations, is a prerequisite for accurate standardization of target gene expression data. The same four genes as used by Toussaint were candidates for the normalization of the GGI target genes: TBP, GUS, RPLP0, and TFRC.
A t-test has been performed on the training set (IJB cohort, 91 samples, 46 HG3, 45 HG1) to test the equivalence of the gene expression in each class (grade 1 or 3).
Table 130 : p-value of t-test and standard deviation (σ) of each gene
Statistic  GUS  RPLP0  TBP  TFRC
t-test  0.85  0.56  0.52  9E-4
σ  0.57  0.71  0.48  0.80
As shown in Table 130, TFRC was significantly associated with the grade (p-value = 9E-4) and was the most variable gene. It is therefore not suitable as a reference gene, and we excluded it from our normalization.
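The stability check for a candidate reference gene can be sketched as a two-sample comparison between grade 1 and grade 3 tumors. The sketch below uses the Welch form of the t statistic and a normal approximation for the p-value, which is reasonable for groups of about 45 samples but is only an approximation; the text does not say which t-test variant was used. The expression values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def welch_t(a, b):
    """Welch t statistic and a normal-approximation two-sided p-value."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical expression values of one candidate reference gene
grade1 = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
grade3 = [6.0, 6.4, 5.9, 6.3, 6.1, 6.2]

t, p = welch_t(grade1, grade3)
sd = stdev(grade1 + grade3)  # overall variability across both grades
print(f"t = {t:.2f}, p = {p:.2g}, sd = {sd:.2f}")
```

A gene whose expression differs between grades (small p-value) or varies widely (large standard deviation) would be a poor reference gene under this criterion.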
EXAMPLE 13: PERFORMANCE OF THE NEW GGI WAS FURTHER VALIDATED ON A NEW PATIENT SET
The new GGI, as determined using the 6-gene signature and a PCA algorithm, was further tested in 883 samples from breast cancer patients. Said patients were treated with either tamoxifen or letrozole monotherapy in a Breast International Group (BIG) study with 8.1-year median follow-up.
The expression levels of the genes of the 6-gene signature according to the invention were determined by qRT-PCR, as disclosed previously.
The association between GG or continuous GGI and Distant Recurrence-Free Interval (DRFI) was evaluated in Cox regression models stratified for the 2 vs 4 arm randomization option, chemotherapy use and hormonotherapy, with and without adjustment for clinicopathological characteristics.
In all patients, either GG or GGI as a continuous variable significantly improved the prognostic performance of the clinicopathological model to predict distant recurrence (Table 131).
Table 131 : Multivariate analysis for DRFI including either GG or GGI with the clinicopathological model
Model comparison  Number of patients  Chi-square  p  Concordance index (95% CI)
Clinicopathological model + GG vs Clinicopathological model  859  7.0  0.03  0.74 (0.69-0.79)
Clinicopathological model + GGI vs Clinicopathological model  859  6.5  0.01  0.75 (0.70-0.79)
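The chi-square statistics and p-values of Table 131 can be related by a short calculation. The sketch assumes a likelihood-ratio comparison of nested models with 2 degrees of freedom for the categorical GG and 1 degree of freedom for the continuous GGI. These degrees of freedom are inferred from the reported numbers and are not stated in the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


print(round(chi2_sf(7.0, 2), 2))  # categorical GG   -> 0.03
print(round(chi2_sf(6.5, 1), 2))  # continuous GGI   -> 0.01
```

Both values agree with the p column of the table under the stated assumptions.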
Kaplan-Meier analyses were also carried out in order to determine and compare the Distant Recurrence-Free Interval (DRFI) of patients classified as HG1, HG3, GG1, GG3 or HG2.
In the following figures :
- HG1 corresponds to patients classified as Histological Grade 1
- HG3 corresponds to patients classified as Histological Grade 3
- GG1 corresponds to patients classified as Genomic Grade 1 according to the invention
- GG3 corresponds to patients classified as Genomic Grade 3 according to the invention
- HG2/GG1 corresponds to patients classified as Histological Grade 2, who were then classified as GG1 according to the method of the present invention
- HG2/GG3 corresponds to patients classified as Histological Grade 2, who were then classified as GG3 according to the method of the present invention
- HG2/Eq corresponds to patients classified as Histological Grade 2, who were then classified as Equivocal. As used herein, the term "Equivocal" designates the intermediate Genomic Grade: when the obtained score falls within the confidence interval around the cut-off, a Grade 1 or a Grade 3 cannot be attributed with certainty, and the sample is therefore called Equivocal.
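The three-way assignment described for the Equivocal class can be sketched as a threshold rule with a band around the cut-off. The cut-off, the half-width of the band and the convention that higher scores indicate GG3 are all placeholder assumptions made for illustration; the actual values are not given in this passage.

```python
def genomic_grade(score, cutoff=0.0, half_width=0.5):
    """Map a continuous score to GG1, Equivocal or GG3.

    cutoff and half_width are placeholder values, not those of the invention.
    Scores inside [cutoff - half_width, cutoff + half_width] are Equivocal.
    """
    if score < cutoff - half_width:
        return "GG1"
    if score > cutoff + half_width:
        return "GG3"
    return "Equivocal"


for s in (-1.2, 0.1, 2.3):
    print(s, genomic_grade(s))
```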
The results of the present example can be seen in figures 21 to 24.
Figure 21 represents the effect of the classification of HG2 patients into the GG1, GG3 or Equivocal class. The GG classified 502 patients with HG2 tumors as either GG1 (N=202, 40%), Equivocal (N=220, 44%) or GG3 (N=80, 16%).
Figure 21 clearly shows that HG2 patients that were classified as GG1 (HG2/GG1 patients) had a better prognosis than HG2 patients that were classified as GG3 (HG2/GG3 patients) (10-year DRFI of 94% versus 83%, respectively).
Figure 22 shows the classification of HG2 patients as GG1, GG3 or Equivocal, and the impact it has on the DRFI compared to the DRFI observed in HG1 and HG3 patients by Kaplan-Meier analysis. HG2/Eq patients, corresponding to HG2 patients who have not been classified either as GG1 or as GG3, had an intermediate 10-year DRFI (92%).
HG2 patients who were classified as GG1 (HG2/GG1 patients) showed a 10-year DRFI (94%) which is close to the 10-year DRFI of HG1 patients (96%).
Also, HG2 patients who were classified as GG3 (HG2/GG3 patients) showed a 10-year DRFI which is similar to the 10-year DRFI of HG3 patients (83%).
Figure 23 shows the classification, among N0 patients, of HG2 patients as GG1, GG3 or Equivocal and the impact it has on the DRFI compared to the DRFI observed in HG1 and HG3 patients by Kaplan-Meier analysis.
HG2/Eq patients correspond to HG2 patients who have not been classified either as GG1 or as GG3 and have an intermediate 10-year DRFI (93%).
HG2 patients who were classified as GG1 (HG2/GG1 patients) show a 10-year DRFI (98%) which is close to the 10-year DRFI of HG1 patients (100%). Also, HG2 patients who were classified as GG3 (HG2/GG3 patients) show a 10-year DRFI (85%) which is close to the 10-year DRFI of HG3 patients (88%).
Figure 24 shows the classification, among the 773 N0/N1-3 patients, of HG2 patients as GG1 and GG3 and the impact it has on the DRFI compared to the DRFI observed in HG1 and HG3 patients by Kaplan-Meier analysis.
When these N0/N1-3 patients were analyzed based on HG, the Kaplan-Meier estimate for 10-year DRFI was 98% (95% confidence interval 96-100%) for the 123 HG1, 92% (88-95%) for the 456 HG2 and 84% (78-90%) for the 194 HG3 patients. Interestingly, when the 456 HG2 and N0/N1-3 patients were analyzed based on GG, the 10-year DRFI was 95% (92-100%) for the 185 HG2/GG1 patients, 92% (88-96%) for the 202 HG2/GG Equivocal patients, and 86% (76-96%) for the 69 HG2/GG3 patients. More specifically, HG2 patients who were classified as GG1 (HG2/GG1 patients) show a 10-year DRFI (96%) which is close to the 10-year DRFI of HG1 patients (98%). Also, HG2 patients who were classified as GG3 (HG2/GG3 patients) show a 10-year DRFI (85%) which is close to the 10-year DRFI of HG3 patients (84%).
HG2/Eq patients correspond to HG2 patients who have not been classified either as GG1 or as GG3 and have an intermediate DRFI (92%).
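The Kaplan-Meier estimates quoted above follow the standard product-limit construction, which can be sketched in a few lines. The follow-up times and event indicators below are toy values, not study data, and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times:  follow-up time of each patient (e.g. years).
    events: 1 if distant recurrence was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_removed
    return curve


def survival_at(curve, t):
    """Estimated event-free probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
    return s


# Toy data: 5 patients, recurrences observed at years 2 and 5
curve = kaplan_meier([2, 3, 5, 8, 10], [1, 0, 1, 0, 0])
print(curve)
print(survival_at(curve, 10))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate at 10 years depends on how many patients were still under follow-up at each recurrence.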
Therefore, the present results demonstrate and confirm that the method according to the invention makes it possible to classify most of the Histological Grade 2 subjects into the Genomic Grade 1 set or the Genomic Grade 3 set. Thus, the Genomic Grade of subjects previously identified as HG2 improves the determination of their prognosis and may help with treatment decisions. Moreover, Genomic Grade provides independent prognostic information for risk of distant recurrence beyond clinicopathological characteristics in patients treated with endocrine therapy.

Claims

1. A method for determining the genomic grade of a solid tumor in a subject suffering from cancer, said method comprising the following steps :
a) analyzing a biological sample from said solid tumor from said subject by determining the expression level of a combination of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20, and
b) determining the genomic grade of said tumor in said subject on the basis of the expression level of said genes as determined in step a), wherein
i) an overexpression of a gene selected in the group consisting of CX3CR1, FLJ21062, FRY and TPT1 is associated with a Genomic Grade 1
ii) an underexpression of a gene selected in the group consisting of CX3CR1, FLJ21062, FRY and TPT1 is associated with a Genomic Grade 3
iii) an overexpression of a gene selected in the group consisting of ASPM, AURKA, BIRC5, CCNA2, CCNB2, CDC2, CDC20, CDCA3, CENPA, CEP55, KIF11, KPNA2, MCM10, MELK, PTTG1, RACGAP1, TPX2, TROAP, TUBA1B, UBE2C is associated with a Genomic Grade 3
iv) an underexpression of a gene selected in the group consisting of ASPM, AURKA, BIRC5, CCNA2, CCNB2, CDC2, CDC20, CDCA3, CENPA, CEP55, KIF11, KPNA2, MCM10, MELK, PTTG1, RACGAP1, TPX2, TROAP, TUBA1B, UBE2C is associated with a Genomic Grade 1.
2. The method for determining the genomic grade of a solid tumor in a subject suffering from cancer according to claim 1, wherein the genomic grade is determined on the basis of an output from an algorithm, said algorithm being executed on the basis of inputs comprising the expression level of said genes from said subject as determined in step a).
3. The method for determining the genomic grade of a solid tumor in a subject suffering from cancer according to claim 2, wherein said output is a Genomic Grade Index (GGI) score indicating the genomic grade from said tumor, and wherein said GGI score is calculated from the following GGI formula :
$$\mathrm{GGI} = a \times \left( \sum_{i=1}^{n} c_i x_i - b \right),$$
where :
a, b ∈ ℝ (set of real numbers)
n is the number of genes in the combination
x_i is the expression level of the i-th gene
c_i is the coefficient assigned to the i-th gene
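The GGI formula of this claim can be sketched numerically as follows. All coefficients, expression levels and the values of a and b are hypothetical and chosen only to make the arithmetic easy to follow; the negative signs given to CX3CR1 and FRY are an assumption that mirrors the statement in claim 1 that overexpression of these genes is associated with Genomic Grade 1.

```python
def ggi_score(expression, coefficients, a=1.0, b=0.0):
    """Compute GGI = a * (sum_i c_i * x_i - b)."""
    weighted = sum(coefficients[g] * expression[g] for g in coefficients)
    return a * (weighted - b)


# Hypothetical coefficients c_i for the 6-gene signature
coefficients = {
    "PTTG1": 1.0, "CCNB2": 1.0, "ASPM": 1.0,
    "CX3CR1": -1.0, "MCM10": 1.0, "FRY": -1.0,
}
# Hypothetical normalized expression levels x_i for one sample
expression = {
    "PTTG1": 2.0, "CCNB2": 1.5, "ASPM": 1.0,
    "CX3CR1": 0.5, "MCM10": 1.0, "FRY": 1.0,
}

# Weighted sum is 4.0, so with a = 2 and b = 1 the score is 2 * (4 - 1)
print(ggi_score(expression, coefficients, a=2.0, b=1.0))  # 6.0
```

Here a rescales the score and b shifts it, so that a fixed cut-off can separate the two genomic grades.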
4. The method for determining the genomic grade of a solid tumor in a subject suffering from cancer according to claim 2, wherein said algorithm is selected in the group comprising but not limited to Decision learning trees (CART, Recursive partitional tree/RPART), hierarchical clustering, Random Forest (RF).
5. The method for determining the genomic grade of a solid tumor in a subject suffering from cancer according to any one of claims 2 to 3, wherein said algorithm is selected in the group comprising but not limited to Support Vector Machine (SVM), radial or linear kernel (SVMr or SVMl), Sum of gene expressions, Probit model, Logistic model, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Principal component analysis (PCA), preferably Principal component analysis (PCA).
6. The method for determining the genomic grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 5, wherein said method is a method for determining the prognosis of said tumor, and wherein a tumor identified as having a Genomic Grade 1 is indicative of a "good-prognosis", whereas a tumor identified as having a Genomic Grade 3 is indicative of a "poor-prognosis".
7. The method according to claim 6, wherein a "good-prognosis" is a Metastasis-Free survival (MFS) superior to 5 years, preferably 10 years, or a long-term survival and a "poor-prognosis" is an MFS inferior to 10 years, preferably 5 years or a long-term survival or not a long-term survival.
8. The method according to any one of claims 1 to 7, wherein said tumor has been previously identified as a Histological Grade 2 tumor.
9. The method for determining the genomic grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 8, wherein said method further comprises a step a') of normalizing the expression levels of said genes as determined in step a) with at least one, preferably two or three reference genes selected in the group comprising the genes GUS, TBP and RPLP0.
10. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 9, wherein said combination of genes comprises at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 7 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10, FRY, CCNA2, CDC2 and CDCA3.
11. The method for determining the grade of a solid tumor in a subject suffering from cancer according to claim 10, wherein said combination of genes comprises at least the 3 genes ASPM, CX3CR1 and MCM10, to which are added from 0 to 4 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10 and FRY.
12. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 11, wherein said combination of genes consists of the 6 genes PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY.
13. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 12, wherein the following combinations of genes are excluded :
- CCNA2, CDC2, KPNA2, CDC20
- CCNA2, CDC2, KPNA2, AURKA
- CCNA2, CDC2, CDC20, AURKA
- CCNA2, KPNA2, CDC20, AURKA
- CDC2, KPNA2, CDC20, AURKA
- CCNA2, CDC2, KPNA2, CDC20, AURKA
- BIRC5, PTTG1
- BIRC5, TUBA1B
- RACGAP1, FRY
- RACGAP1, KIF11
- PTTG1, TUBA1B
- RACGAP1, CCNA2, KIF11
- RACGAP1, CCNA2, CDC2
- BIRC5, RACGAP1, PTTG1, TUBA1B
- PTTG1, ASPM, MCM10, CDC20
14. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 13, wherein said subject is a mammal, preferably a human.
15. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 14, wherein said cancer is a breast cancer.
16. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 15, wherein a biological sample is a tissue sample, a fluid sample, a cell sample or a blood sample of said subject, preferably from the breast of said subject.
17. The method for determining the grade of a solid tumor in a subject suffering from cancer according to claim 16, wherein said biological sample is a fresh/frozen or a paraffin-embedded biological sample, preferably a paraffin-embedded biological sample.
18. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 17, wherein determining the expression level of genes is performed on nucleic acids from a biological sample according to claim 16 and 17.
19. The method for determining the grade of a solid tumor in a subject suffering from cancer according to claim 18, wherein determining the expression level of genes is performed by Reverse-Transcription Polymerase Chain Reaction (RT-PCR), preferably by real-time quantitative Reverse-Transcription Polymerase Chain Reaction (qRT-PCR).
20. The method for determining the grade of a solid tumor in a subject suffering from cancer according to claim 18, wherein determining the expression level of genes is performed on DNA microarrays.
21. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 17, wherein determining the expression level of genes is performed by determining the amount of proteins in a biological sample.
22. The method for determining the grade of a solid tumor in a subject suffering from cancer according to any one of claims 1 to 21, wherein said method further comprises generating a printed report of some or all the conclusions drawn from the data, or of a score or comparison between the results obtained for said subject.
23. A recording computer program comprising instructions for performing any method according to any one of claims 1 to 22.
24. A polynucleotide library comprising or corresponding to polynucleotide sequences allowing the detection of at least 2 genes and at most 24 genes, said genes being selected in a group consisting of BIRC5, CEP55, AURKA, RACGAP1, MELK, CX3CR1, PTTG1, CCNA2, CCNB2, ASPM, FRY, CENPA, FLJ21062, TPT1, KIF11, TROAP, TUBA1B, CDCA3, UBE2C, TPX2, MCM10, KPNA2, CDC2 and CDC20 listed in Table A.
25. The polynucleotide library according to claim 24, wherein said polynucleotide sequences can be any sequence between 3' and 5' end of the polynucleotide sequences of the corresponding genes as defined in Table A.
26. The polynucleotide library according to any one of claims 24 or 25, wherein said polynucleotide library comprises or corresponds to polynucleotide sequences allowing the detection of a combination of at least the 3 genes ASPM, CX3CR1 and MCM10 to which are added from 0 to 7 genes selected in a group consisting of PTTG1, CCNB2, ASPM, TPT1, CX3CR1, MCM10, FRY, CCNA2, CDC2 and CDCA3 listed in Table A.
27. The polynucleotide library according to any one of claims 24 to 26, wherein said polynucleotide library comprises or corresponds to polynucleotide sequences allowing the detection of the combination of the 6 genes PTTG1, CCNB2, ASPM, CX3CR1, MCM10 and FRY.
28. A kit comprising a polynucleotide library according to any one of claims 24 to 27.
29. A kit according to claim 29, wherein said kit further comprises any means for performing a Reverse-Transcription Polymerase Chain Reaction (RT-PCR), preferably a real-time quantitative Reverse-Transcription Polymerase Chain Reaction (qRT-PCR).
30. A kit according to claim 29, wherein said kit further comprises any means for performing a DNA microarray.
PCT/EP2012/004895 2011-11-28 2012-11-28 Methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer WO2013079188A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161563931P 2011-11-28 2011-11-28
US61/563,931 2011-11-28

Publications (1)

Publication Number Publication Date
WO2013079188A1 true WO2013079188A1 (en) 2013-06-06

Family

ID=47278239

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/004895 WO2013079188A1 (en) 2011-11-28 2012-11-28 Methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer

Country Status (1)

Country Link
WO (1) WO2013079188A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761372A (en) * 2014-01-06 2014-04-30 上海海事大学 Multilevel inverter fault diagnosis strategy based on principal component analysis and multi-classification related vector machine(PCA-mRVM)
CN110887798A (en) * 2019-11-27 2020-03-17 中国科学院西安光学精密机械研究所 Nonlinear full-spectrum water turbidity quantitative analysis method based on extreme random tree
CN111242206A (en) * 2020-01-08 2020-06-05 吉林大学 High-resolution ocean water temperature calculation method based on hierarchical clustering and random forests
CN114355850A (en) * 2021-12-28 2022-04-15 汉谷云智(武汉)科技有限公司 Atmospheric and vacuum pressure device fault diagnosis method based on queue competition algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008155661A2 (en) * 2007-04-16 2008-12-24 Ipsogen Methods of assessing a propensity of clinical outcome for a female mammal suffering from breast cancer
WO2009083780A1 (en) * 2007-12-28 2009-07-09 Ipsogen Breast cancer expresion profiling
EP2241634A1 (en) * 2009-04-16 2010-10-20 Université Libre de Bruxelles Diagnostic method and tools to predict the effiacy of targeted agents against IGF-1 pathway activation in cancer
US20110045480A1 (en) * 2009-08-19 2011-02-24 Fournier Marcia V Methods for predicting the efficacy of treatment

Non-Patent Citations (32)

* Cited by examiner, † Cited by third party
Title
AFFARA, BRIEF FUNCT GENOMIC PROTEOMIC, vol. 2, 2003, pages 7 - 20
AUSUBEL ET AL.: "Current Protocols of Molecular Biology", 1997, JOHN WILEY AND SONS
BAE INSOO ET AL: "BRCA1 regulates gene expression for orderly mitotic progression", CELL CYCLE, vol. 4, no. 11, November 2005 (2005-11-01), pages 1641 - 1666, XP055057491, ISSN: 1538-4101 *
BERTUCCI ET AL., HUM. MOL. GENET., vol. 8, 1999, pages 1715 - 22
BREIMAN ET AL.: "Classification and Regression Trees", 1984, CHAPMAN AND HALL
BREIMAN, MACHINE LEARNING, vol. 45, 2001, pages 5 - 32
BUSTIN; MUELLER, CLIN SCI (LOND), vol. 109, 2005, pages 365 - 79
CATHY B. MOELANS ET AL: "Molecular differences between ductal carcinoma in situ and adjacent invasive breast carcinoma: a multiplex ligation-dependent probe amplification study", CELLULAR ONCOLOGY, vol. 34, no. 5, 6 May 2011 (2011-05-06), pages 475 - 482, XP055051189, ISSN: 2211-3428, DOI: 10.1007/s13402-011-0043-7 *
CHURCHILL, NAT GENET, vol. 32, 2002, pages 490 - 5
COPLAND ET AL., RECENT PROG HORM RES, vol. 58, 2003, pages 25 - 53
CRISTIANINI ET AL.: "An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods", 2000, CAMBRIDGE UNIVERSITY PRESS
D'AGOSTINO ET AL., JAMA, vol. 286, 2001, pages 180 - 187
DE ANDRES ET AL., BIOTECHNIQUES, vol. 18, 1995, pages 42044
FILHO O M ET AL: "Genomic Grade Index: An important tool for assessing breast cancer tumor grade and prognosis", CRITICAL REVIEWS IN ONCOLOGY / HEMATOLOGY, ELSEVIER SCIENCE IRELAND LTD., LIMERICK, IE, vol. 77, no. 1, 1 January 2011 (2011-01-01), pages 20 - 29, XP027597725, ISSN: 1040-8428, [retrieved on 20110101], DOI: 10.1016/J.CRITREVONC.2010.01.011 *
FREEMAN ET AL., BIOTECHNIQUES, vol. 26, no. 1, 1999, pages 112 - 22, 124 - 5
HEID ET AL., GENOME RESEARCH, vol. 6, 1996, pages 986 - 994
HELLER, ANNU REV BIOMED ENG, vol. 4, 2002, pages 129 - 53
HUDIS CA ET AL., J CLIN ONCOL., vol. 25, no. 15, 20 May 2007 (2007-05-20), pages 2127 - 32
HUDIS, JOURNAL OF CLINICAL ONCOLOGY, vol. 25, no. 15, 2007
IGNATIADIS M ET AL., PATHOBIOLOGY, vol. 75, no. 2, 10 June 2008 (2008-06-10), pages 104 - 11
IRIZARRY ET AL., BIOSTATISTICS, 2003
KRISTIAN WENNMALM ET AL: "A simple method for assigning genomic grade to individual breast tumours", BMC CANCER, vol. 11, no. 1, 1 January 2011 (2011-01-01), pages 306, XP055051185, ISSN: 1471-2407, DOI: 10.1056/NEJMoa021967 *
RAKHA ET AL., BREAST CANCER RESEARCH, vol. 12, 2010, pages 207
RAMASWAMY; GOLUB, J CLIN ONCOL, vol. 20, 2002, pages 1932 - 41
RUPP; LOCKER, LAB INVEST, vol. 56, 1987, pages A67
S. E GHAYAD ET AL: "Identification of TACC1, NOV, and PTTG1 as new candidate genes associated with endocrine therapy resistance in breast cancer", JOURNAL OF MOLECULAR ENDOCRINOLOGY, vol. 42, no. 2, 13 November 2008 (2008-11-13), pages 87 - 103, XP055010988, ISSN: 0952-5041, DOI: 10.1677/JME-08-0076 *
SCHENA ET AL., PROC. NATL. ACAD. SCI. USA, vol. 93, no. 20, 1996, pages 10614 - 10619
SOTIRIOU, JOURNAL OF THE NATIONAL CANCER INSTITUTE, vol. 98, no. 4, 15 February 2006 (2006-02-15), pages 262 - 272
STEINBERG ET AL.: "CART: Tree-Structured Non-Parametric Data Analysis", 1995, SALFORD SYSTEMS
TOUSSAINT ET AL., BMC GENOMICS, vol. 10, 2009, pages 424
VIJVER VAN DE M J ET AL: "A GENE-EXPRESSION SIGNATURE AS A PREDICTOR OF SURVIVAL IN BREAST CANCER", NEW ENGLAND JOURNAL OF MEDICINE, MASSACHUSETTS MEDICAL SOCIETY, BOSTON, MA, US, vol. 347, no. 25, 19 December 2002 (2002-12-19), pages 1999 - 2009, XP008032093, ISSN: 1533-4406, DOI: 10.1056/NEJMOA021967 *
YASUTO NAOI ET AL: "Development of 95-gene classifier as a powerful predictor of recurrences in node-negative and ER-positive breast cancer patients", BREAST CANCER RESEARCH AND TREATMENT, KLUWER ACADEMIC PUBLISHERS, BO, vol. 128, no. 3, 29 August 2010 (2010-08-29), pages 633 - 641, XP019923604, ISSN: 1573-7217, DOI: 10.1007/S10549-010-1145-Z *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761372A (en) * 2014-01-06 2014-04-30 上海海事大学 Multilevel inverter fault diagnosis strategy based on principal component analysis and multi-classification related vector machine(PCA-mRVM)
CN110887798A (en) * 2019-11-27 2020-03-17 中国科学院西安光学精密机械研究所 Nonlinear full-spectrum water turbidity quantitative analysis method based on extreme random tree
CN111242206A (en) * 2020-01-08 2020-06-05 吉林大学 High-resolution ocean water temperature calculation method based on hierarchical clustering and random forests
CN111242206B (en) * 2020-01-08 2022-06-17 吉林大学 High-resolution ocean water temperature calculation method based on hierarchical clustering and random forests
CN114355850A (en) * 2021-12-28 2022-04-15 汉谷云智(武汉)科技有限公司 Atmospheric and vacuum pressure device fault diagnosis method based on queue competition algorithm
CN114355850B (en) * 2021-12-28 2023-06-20 汉谷云智(武汉)科技有限公司 Atmospheric and vacuum device fault diagnosis method based on queuing competition algorithm

Similar Documents

Publication Publication Date Title
JP7042784B2 (en) How to Quantify Prostate Cancer Prognosis Using Gene Expression
JP7042717B2 (en) How to Predict the Clinical Outcomes of Cancer
US11098372B2 (en) Gene expression panel for prognosis of prostate cancer recurrence
JP6351112B2 (en) Gene expression profile algorithms and tests to quantify the prognosis of prostate cancer
JP2020500515A (en) How to predict the prognosis of breast cancer patients
KR20140105836A (en) Identification of multigene biomarkers
KR101672531B1 (en) Genetic markers for prognosing or predicting early stage breast cancer and uses thereof
US20170211155A1 (en) Method for predicting risk of metastasis
JP2017532959A (en) Algorithm for predictors based on gene signature of susceptibility to MDM2 inhibitors
WO2006052731A2 (en) Molecular indicators of breast cancer prognosis and prediction of treatment response
US20140154681A1 (en) Methods to Predict Breast Cancer Outcome
US9890430B2 (en) Copy number aberration driven endocrine response gene signature
Stec et al. Comparison of the predictive accuracy of DNA array-based multigene classifiers across cDNA arrays and Affymetrix GeneChips
WO2016118670A1 (en) Multigene expression assay for patient stratification in resected colorectal liver metastases
WO2013079188A1 (en) Methods for the diagnosis, the determination of the grade of a solid tumor and the prognosis of a subject suffering from cancer
JP7239477B2 (en) Algorithms and methods for evaluating late-stage clinical endpoints in prostate cancer
US20210079479A1 (en) Compositions and methods for diagnosing lung cancers using gene expression profiles
WO2014130444A1 (en) Method of predicting breast cancer prognosis
JP5963748B2 (en) Prognosis prediction method, kit and use of patients with primary malignant lymphoma of central nervous system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12794644

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12794644

Country of ref document: EP

Kind code of ref document: A1