WO2017061953A1 - Invasive ductal carcinoma aggressiveness classification - Google Patents


Info

Publication number
WO2017061953A1
Authority
WO
WIPO (PCT)
Prior art keywords
genes
hgg
lgg
tumour
tumors
Application number
PCT/SG2016/050490
Other languages
French (fr)
Inventor
Vladimir Kuznetsov
Luay Aswad
Surya Pavan YENAMANDRA
Ghim Siong OW
Anna Ivshina
Original Assignee
Agency For Science, Technology And Research
Application filed by Agency For Science, Technology And Research
Publication of WO2017061953A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C12Q2600/118 Prognosis of disease development
    • C12Q2600/156 Polymorphic or mutational markers
    • C12Q2600/158 Expression markers

Definitions

  • The present invention relates to methods of classifying subjects with invasive ductal carcinoma (IDC), a type of breast cancer, into low- and high-aggressiveness tumours, categorized by an optimized composition of genetic features.
  • IDC Invasive ductal carcinoma
  • The present invention also provides a prognostic evaluation based on the selected features, providing a quantitative, personalized predictor of disease outcome and facilitating the choice of therapeutic regimen based on the diagnostic results.
  • BC Breast cancer
  • HG Histological grading
  • HG is used by oncologists as a prognostic factor.
  • HG evaluation is highly subjective, with inter-observer agreement of only 50%-85%.
  • The subjectivity in assigning intermediate-grade (histologic grade 2, HG2) breast cancers results in uncertain disease outcome prediction and sub-optimal systemic therapy.
  • Grade 1 invasive ductal carcinoma cells, sometimes called “well differentiated”, look and act histologically somewhat like healthy breast cells.
  • Grade 3 cells, also called “poorly differentiated”, are more abnormal in their behavior and appearance.
  • The assessment of differentiation is subjective, leading to poor diagnosis, and if a subject is identified as falling into the grade 2 category, it is not clear whether their tumour is likely to progress to grade 3 in due course.
  • A surgeon may therefore remove more tissue than necessary from subjects identified as having grade 2 cancer. Thus, having three grades of cancer may be seen as undesirable.
  • HG1-like Histological grade 1-like
  • HG3-like Histological grade 3-like
  • HG2 patients can be dichotomized, based on gene expression profiles and with high accuracy (95%), into two genetically and clinically distinct subclasses: histological grade 1-like (HG1-like) and histological grade 3-like (HG3-like) [Ivshina, A.V., et al., Cancer Research, 2006. 66(21): p. 10292-10301; Ivshina, A.V., et al., in Keystone Symposia: Stem Cells, Senescence and Cancer. 2005. p. P. 76; Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83].
  • HG1-like and HG3-like tumours have gene expression profiles and clinical outcomes similar to those of HG1 and HG3 tumours, respectively.
  • The genes of the 232-gene grading classifier were involved mostly in the cell cycle, the p53 pathway, inhibition of apoptosis, cell adhesion, cell motility, stress, hormone response and angiogenesis [Ivshina, A.V., et al., Cancer Research, 2006. 66(21): p. 10292-10301].
  • This genetic tumour aggressiveness grading classifier, and its multiple representative 5-7-gene classification subsets, can improve prognosis and therapeutic planning for BC patients diagnosed with histologic grade 2 (HG2) tumours.
  • The patients were not pre-selected based on any clinical characteristics (e.g., tumour stage, tumour size, ER and LN status). These reclassification results have been reproduced across different cohorts and treatment groups and correlated strongly with the survival patterns of the reclassified tumour subgroups. Similar results were observed for the ER+ subpopulation of BC [Sotiriou, C., et al., J Natl Cancer Inst, 2006. 98(4): p. 262-72; Loi, S., et al., J Clin Oncol, 2007. 25(10): p. 1239-46].
  • Genetic grade signatures can improve the prognosis of BC patients, especially IDC patients with HG2 tumours, which are relatively poorly defined by the various grading systems and by currently used molecular prognostic and predictive signatures [Ivshina, A.V., et al., Cancer Research, 2006. 66(21): p. 10292-10301; Sotiriou, C., et al., J Natl Cancer Inst, 2006. 98(4): p. 262-72; Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83; Francis, G.D., S.R. Stein, and G.D.
  • HG2 sub-classification studies supported the view that the low and high grades, defined via transcriptomic analysis, reflect independent pathobiological entities (distinct cell phenotypes) rather than a continuum of cancer progression [Ivshina, A.V., et al., Cancer Research, 2006. 66(21): p. 10292-10301; Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83].
  • The present invention is based upon studies carried out by the inventors to identify genetic and/or phenotypic markers which may be used to classify patients into two grades (i.e. low grade or high grade) of breast cancer, such as IDC breast cancer. This combination of markers has not previously been shown to be useful in classifying subjects into these two separate groups, and especially in classifying intermediate-grade HG2 subjects as low or high grade.
  • The present inventors performed integrative bioinformatic and experimental analyses of The Cancer Genome Atlas (TCGA) cohort and several other validation cohorts (1246 patients in total).
  • TCGA The Cancer Genome Atlas
  • The inventors identified a 22-gene tumour aggressiveness grading classifier (22g-TAG), which permits global bifurcation of the IDC transcriptomes and reclassifies patients with HG2 tumours into two genetically and clinically distinct subclasses: histological grade 1-like (HG1-like) and histological grade 3-like (HG3-like).
  • The expression profiles and clinical outcomes of these subclasses were similar to those of HG1 and HG3 tumours, respectively.
  • LGG Low genetic grade (HG1 + HG1-like)
  • A group of signature genes comprising CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR.
  • The group comprises 60 or fewer genes in total, such as 50, 40 or 25 or fewer, in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC.
  • The tumour subtypes include one or more of the following classifications: estrogen receptor status, progesterone receptor status, human epidermal growth factor receptor 2 status, age, stage, lymph node status and metastasis status.
  • Analysis of tumour subtypes includes the normal-like, luminal-A, luminal-B, basal-like and HER2-enriched subtypes; luminal-A/normal-like tumours classify subjects as having an LGG tumour, and luminal-B/HER2-enriched/basal-like tumours classify subjects as having an HGG tumour.
  • The method according to clause 12 or 13, wherein gain of 1q21.1-1q21.3 is associated with diagnosis and classification of subjects as having an HGG tumour.
  • The method according to clause 12 or 13, wherein a low copy number of 22q-related genes is associated with diagnosis and classification of subjects as having an LGG tumour.
  • The method according to any preceding clause, wherein the population of highly mutated genes includes TP53 and PIK3CA, wherein a high mutation count of PIK3CA is associated with classifying a subject as having an LGG tumour, and wherein a high mutation count of TP53 is associated with diagnosis and classifying a subject as having an HGG tumour.
  • The method according to any preceding clause, comprising detecting the expression of one or more genes (such as one or more of the genes identified in Table 18), wherein an increase in expression of said one or more genes allows for classification of a subject as having an HGG tumour.
  • The method according to any preceding clause, for use in providing a diagnosis and prognosis for a subject.
  • A kit comprising a substrate to which is bound a plurality of probes, wherein at least one of said plurality of probes is capable of specifically binding to each of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1,
  • An assay/chip comprising, consisting essentially of, or consisting of probes for the detection of a level of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C,
  • A PCR-based assay comprising, consisting essentially of, or consisting of primers for the detection of a level of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D,
  • The present invention provides a method for classifying a subject with IDC breast cancer into one of two specific grades, low genetic grade (LGG) or high genetic grade (HGG), said method comprising detecting a level of expression of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR and stratifying the subject into low genetic grade or high genetic grade based upon the expression levels of the identified genes.
  • The present invention provides a method for use in providing a prognosis of a subject with breast cancer, such as IDC breast cancer, the method comprising detecting a level of expression of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR and providing a prognosis based upon the expression levels of the identified genes.
  • Relatively high expression of one or more of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2 and/or TICRR is typically associated with high genetic grade tumours as defined herein, and hence with an aggressive tumor class and a poor prognosis.
  • Relatively low expression of these genes is typically associated with low genetic grade tumours as defined herein, and hence with a low-aggressive tumor class and a good prognosis, or at least a prognosis which is better than that for high genetic grade tumours.
  • Relatively high expression of one or more of CAPN8, NAT1, NOSTRIN and/or KIF13B is typically associated with low genetic grade tumours as defined herein, and hence with a low-aggressive class and a good prognosis, or a better prognosis than for a subject who has a high grade tumour or tumours.
  • Relatively low expression of these genes is typically associated with high genetic grade tumours as defined herein, a high-aggressive tumor class and hence a poor prognosis, or at least a prognosis which is worse than for low genetic grade tumours.
  • The inventors analysed a number of markers for their ability to classify patients into low and high grade tumours and hence provide an indication of prognosis based on the grade of the tumour. Understanding the grade of the tumour can also allow a more informed decision to be made in terms of patient management and potential treatment.
  • “Prognosis” is distinct from “diagnosis”. “Prognosis” refers to a prediction about how a disease will develop, for example, the lifespan of the subject. In contrast, “diagnosis” refers to the identification of a disease.
  • The prognosis of subjects may be categorised as “good” or “poor”.
  • A “good” prognosis may be considered to relate to a survival time of 2 years or more. In some embodiments, a “good” prognosis may be defined as a survival time of 6 years or more. In another embodiment, a “good” prognosis may be defined as a survival time of 10 years or more.
  • A “poor” prognosis may be defined as a survival time of less than 2 years, less than 1 year or even shorter.
  • Prognosis has been calculated as the median survival time of a cohort, i.e. the time for which 50% of the cohort survives, based upon the high or low expression of a marker. Consequently, “prognosis” may be understood to mean a predicted survival time.
  • Prognostic biomarkers may also be used for cancer classification, diagnosis and prediction of therapeutic treatment.
  • Increased expression (which, for example, may be normalised and/or determined by comparison to a reference value or values) may be referred to as “relatively high”, while decreased expression may be referred to as “relatively low”.
  • Detection of a gene, RNA and/or protein may reveal increased or decreased expression of that gene, RNA and/or protein in relation to a reference value.
  • The reference value may comprise a mean and/or a median value for the level of expression of the gene, RNA and/or protein, whereby the mean and/or median is calculated from a known cohort of subjects without disease. Skilled addressees will be aware of publicly available databases of cohorts, such as TCGA, METABRIC and GEO/NCBI, from which reference values can be obtained and used as a comparison. Alternatively, the reference value may be obtained from a cohort of patients generated by the practitioner.
  • For simplicity, a gene, RNA and/or protein whose expression is assessed will be referred to as a “marker”, but this is not to be construed as limiting.
  • The reference value may comprise both the mean and the median of the selected marker. In alternative embodiments, the reference value may comprise the mean. In other embodiments, the reference value may comprise the median.
  • The differential decreased expression of a marker in relation to the reference value may be about 0.5, 1, 1.5, 2.0, 3.0, 5, 10 or alternatively about 50 times lower than the reference value of expression for the marker. Preferably, the differential decreased expression of the marker in relation to the reference value may be between 0.5 and 50 times lower, inclusive. More preferably, it may be between 0.5 and 5 times lower, inclusive.
  • The differential increased expression of the marker in relation to the reference value may be about 0.5, 1, 1.5, 2.0, 3.0, 5, 10 or alternatively about 50 times higher than the reference value of expression for the marker.
  • Preferably, the differential increased expression of the marker in relation to the reference value may be between 0.5 and 50 times higher, inclusive. More preferably, it may be between 0.5 and 5 times higher, inclusive.
  • The level of marker expression may be normalised. In some embodiments, marker expression may be normalised against the expression of another endogenous, regulated reference marker obtained from the sample. In an alternative embodiment, marker expression may be normalised against total cellular DNA from the sample. In some embodiments, marker expression may be normalised against total cellular RNA from the sample. In some embodiments, marker expression may be normalised against the length of the marker nucleotide transcript.
  • The term “transcript” relates to RNA, in particular mRNA, and DNA, in particular cDNA. The skilled addressee will be aware that the total number of reads for a given transcript is proportional to the expression level of the transcript multiplied by the length of the transcript.
  • Consequently, a long transcript will have more reads mapping to it than a short transcript of similar expression.
  • Various normalisation methods are known in the art, and it is to be appreciated that the above normalisation methods are in no way limiting to the skilled reader. Thus, alternative normalisation techniques not described here may also be used.
  • Said markers may be detected by probes specific for the markers, which may be provided for use in a kit or be a feature of a kit.
  • Said probes may be provided bound to a substrate in a kit.
  • The substrate may comprise probes capable of specifically binding said markers.
  • The substrate may comprise primers capable of specifically binding said markers, or antibodies capable of specifically binding said markers.
  • The substrate may comprise any combination of probes, antibodies and/or primers.
  • Probes may be detectably labelled, for example with a fluorescent or luminescent label.
  • The kit may further comprise instructions for use, such as with an assay system. Kits for use in the detection of RNA or DNA markers may comprise at least two probes or primers per marker to be detected.
  • Kits for use in the present methods may comprise reagents for the synthesis of cDNA.
  • Kits, assay chips or the like will comprise a finite number of probes, antibodies and/or markers, sufficient to detect the markers identified herein.
  • Although the present invention is in one aspect directed to a specific 22-gene set, more than this set of genes may be considered, and each kit may comprise probes, antibodies and/or markers sufficient to permit up to 60, 50, 40 or 25 genes, or only the 22 specifically identified genes, to be detected and their expression levels determined.
  • The markers of the present invention comprise, consist essentially of or consist of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR.
  • The markers detected in the present invention relate to genes. Consequently, the markers may comprise DNA, RNA or the protein/polypeptide product of the gene. Variants of the gene will also be known in the art and are also included in the term marker.
  • The term marker includes mutant nucleotide DNA, RNA or polypeptide sequences, and allelic, splice and post-translationally modified forms which are known in the art or may be discovered in the future.
  • The term marker includes mRNA and cDNA.
  • The markers to be assayed may comprise protein/polypeptide.
  • The markers may comprise RNA.
  • The markers may comprise DNA.
  • The markers may comprise cDNA.
  • The cDNA may be synthesized from mRNA.
  • The markers may comprise DNA and RNA.
  • The sample is any appropriate tissue sample obtained from the subject.
  • The sample is a breast tissue sample obtained from the subject.
  • The sample is any appropriate fluid sample obtained from the subject.
  • The sample may comprise lymph fluid obtained from lymph nodes adjacent to a tumour. Any sample may be obtained by biopsy, for example during surgery.
  • By “biopsy” we include excisional and incisional biopsies. The term “biopsy” further includes partial or gross resection. Samples may alternatively be obtained by other methods known in the art.
  • A method of facilitating treatment for a subject with IDC breast cancer, comprising detecting a level of expression of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR in a sample from the subject, providing a prognosis based upon the expression level of said marker or markers, and selecting and/or administering a treatment based on the prognosis.
  • The expression level values obtained may be used by the clinician in assessing any of the following: (a) probable or likely suitability of a subject to initially receive treatment; (b) probable or likely unsuitability of an individual to initially receive treatment; (c) dosage of treatment; (d) start date to begin treatment; (e) duration of treatment course; and (f) type of treatment to be administered.
  • Example treatments may include, but are not limited to, radiotherapy, chemotherapy, anti-angiogenic compounds and/or surgery.
  • A preferred therapy comprises surgical removal of tissue. In other embodiments, therapy comprises the administration of anti-angiogenic compounds.
  • The present inventors have observed that low and high grade classifications may result from alternative pathways of cancer progression.
  • The targets for treatment may therefore be different, and this may lead to different therapies being proposed for low and high grade tumours.
  • For low grade tumours, minor surgical tumour resection and/or drug therapy may be indicated.
  • For high grade tumours, full mastectomy and additional chemotherapy and/or radiotherapy may be indicated. Any particular therapy regime will be determined by the treating physician, using their skill and the information they have to hand.
  • The present invention can be seen as an aid to assist the physician in deciding, in consultation with the subject, how the subject should be treated.
  • The assay systems may comprise a measurement device that measures marker expression levels.
  • The system may further comprise a data transformation device that acquires marker expression level data and performs data transformation to calculate whether the level determined in the sample is increased, decreased or equal relative to a reference value for the marker in question.
  • The assay system may also comprise an output interface device, such as a user interface output device, to output data to a user.
  • The assay system also includes a database of reference values, wherein the device identifies a low or high grade tumour, or a good or bad prognosis, upon analysis of the collective expression of the markers.
  • The device provides treatment information from the database for the low or high grade tumour, or good or bad prognosis, and outputs the treatment information to the user interface output device.
  • The user interface output device may provide an output to the user comprising notification that the subject's gene expression is increased or decreased relative to the reference value, that this relates to a low or high grade tumour, or a good or a bad prognosis, and whether a suitable therapy, such as radiotherapy, chemotherapy, anti-angiogenic compounds and/or surgery, should be administered.
  • The user interface output device may provide an output to the user providing information on a low or high grade tumour, or a good or bad prognosis, and, if treatment is suitable, a deadline by which treatment should begin.
  • The output interface device is remote from the user of the input device.
  • A subject's sample may be analysed in a local clinic or laboratory, but the results are transmitted remotely to a clinician or health care worker remote from the interface output device.
  • Results can immediately be transmitted, ensuring the timely release of information so that the relevant treatment is started as soon as possible, particularly when information is provided about a poor prognosis.
  • An assay may provide subjects given a poor prognosis with better treatment options and, in doing so, a potentially longer life span and/or better quality of life.
  • RNA detection methods may include nucleic acid hybridisation (Northern blotting) or nucleic acid amplification.
  • The nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array.
  • The nucleic acid amplification method is reverse transcriptase PCR (RT-PCR).
  • DNA detection methods may include nucleic acid hybridisation (Southern blotting) or nucleic acid amplification.
  • The nucleic acid amplification method is PCR.
  • The nucleic acid detection method comprises a DNA microarray.
  • The nucleic acid detection method is next-generation sequencing (NGS). It will be appreciated that nucleic acid transcripts detected by next-generation sequencing may be normalised by transcript length. Further details of DNA detection techniques will be known to skilled addressees and can be found in common laboratory manuals, for example Sambrook and Russell, Molecular Cloning: A Laboratory Manual, CSHL Press, 2001.
  • Next-generation sequencing is known to the skilled addressee, who may refer to NGS system providers' websites (including, but not limited to: http://res.illumina.com/documents/products/illumina sequencing introduction.pdf; https://www.qiagen.com/gb/products/next-gen-sequencing).
  • A diagnostic chip for use in the present methods.
  • The chip comprises, consists essentially of or consists of a probe or probes for the detection of a level of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR in a sample.
  • A diagnostic chip for use in the present methods, wherein said chip comprises a plurality of probes and/or primers which are collectively capable of specifically binding all of the above markers.
  • A diagnostic chip for use in the present methods, which comprise detecting a level of expression of all the above markers in a sample from the subject and providing a prognosis based upon the expression level of said markers.
  • A diagnostic chip for use in the present methods, wherein said chip comprises a plurality of probes and/or primers which are collectively capable of specifically binding CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR.
  • The diagnostic chip may comprise a traditional solid-phase array. Alternatively, the diagnostic chip may comprise a bead array. Diagnostic chips may also be referred to as DNA microarrays.
  • The probes and/or primers for the diagnostic chip may be bound to a surface.
  • The preferred surface is silica or glass.
  • The preferred surface is plastic.
  • The probes and/or primers for the diagnostic chip may be bound to polystyrene beads.
  • Oligonucleotides may be used as probes or primers.
  • Oligonucleotides for use within a kit may be labelled in order to be detected. Fluorescent labels may be used to enable direct detection. Alternatively, labels may be detected indirectly. Indirect detection methods are known in the art and may comprise, but not be limited to, biotin-avidin interactions and antibody binding. Fluorescently labelled oligonucleotides may also contain a quenching molecule.
  • The present invention can more broadly be viewed in terms of a method or multiple methods for use in classifying IDC breast cancer subjects into LGG and HGG cohorts.
  • The present invention provides a number of methods which can be used alone or in combination in order to provide suitable classification. Nevertheless, in a particularly preferred embodiment, the present invention is directed to at least the use of a group of signature genes which comprises the 22-gene signature as described herein in order to classify subjects into LGG and HGG cohorts.
  • Each of the further diagnostic and prognostic features can further personalize the tumour classification or, in parallel, the characterisation of disease subtypes and tumour dynamics, and may dictate the treatment strategy.
  • This invention proposes a novel multi-level system for classifying tumour class (LGG or HGG) and essential tumour features, providing a prognosis and, optionally, a novel genetically supported opportunity for rational, personalized and precise therapy.
  • The strength of the present work is based on genetically defined determinants and statistically validated structural and functional biomarkers.
  • The present system was derived and integrated quantitatively. It focuses on the synergy between the diagnostic and prognostic biomarkers.
  • The various features may work in combination to provide an individual patient's diagnostic outcome, with relevance to precision medicine, disease outcome prediction and next-generation biomarker applications.
  • Table 1 overview of the clinical information of TCGA cohort.
  • A) A summary of clinical parameters for each histological grade of the 430 IDC tumors of TCGA cohort.
  • Table 3 the association of histological grades sub-classification and intrinsic subtypes:
  • Confusion matrices show training accuracies of seven training sets using PAM classifiers.
  • the seven training sets result from an under-sampling procedure applied to HG3 tumors to overcome the imbalanced training dataset.
  • PAM classification parameters (Class Error rate; Shrinking threshold; Overall error rate) and the number of top discriminative probesets resulted from each training are shown.
  • Table 5 tumor aggressiveness grading signature.
  • Table 6 the frequency of 22g-TAG signature genes' occurrence in 72 breast cancer signatures
  • 22g-TAG breast cancer signature in 72 breast cancer related signatures.
  • we included two molecular grading signatures: the 212 breast cancer genetic grading gene subset (represented by 264 Affymetrix probesets; Ivshina et al., 2006) and the 5-gene genetic grading signature (represented by 6 Affymetrix probesets; Ivshina et al., 2006).
  • the number of occurrences is the number of reference signatures that contain a given gene of the 22g-TAG breast cancer signature.
  • Table 7 Survival prediction for patients grouped into low- and high-risk subclasses derived using 22g-TAG genes.
  • Table 10 Characteristics of genes and transcribed loci (represented by probesets) that are differentially expressed between HG1-like and HG3-like tumors, defined based on the 22g-TAG classifier.
  • the fold changes represent the ratio of the median expression level in HG3-like with respect to the median of expression level in HG1-like tumors.
  • a two-tailed Wilcoxon test was used to assess the significance of the difference of the gene expression profile between HG1-like and HG3-like tumors. Multiple probesets and transcribed isoforms can be associated with a gene.
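The per-probeset fold change and two-tailed Wilcoxon test described for Table 10 can be sketched as below. This is a minimal sketch assuming linear-scale expression values; it uses the normal approximation to the rank-sum statistic without tie or continuity correction, so exact p-values from a statistics package may differ for small groups. The function names are hypothetical.

```python
import math
from statistics import median


def median_fold_change(hg3_like, hg1_like):
    """Ratio of the median in HG3-like tumors to the median in HG1-like tumors."""
    return median(hg3_like) / median(hg1_like)


def rank_sum_p(a, b):
    """Two-tailed Wilcoxon rank-sum p-value (normal approximation)."""
    pooled = sorted((v, g) for g, xs in ((0, a), (1, b)) for v in xs)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):  # tied values share their average rank
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of group a
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


hg1_like = [2.0, 3.0, 4.0]
hg3_like = [4.5, 6.0, 8.0]
print(median_fold_change(hg3_like, hg1_like))  # 2.0
print(rank_sum_p(hg3_like, hg1_like))
```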
  • Table 11 Gene ontology and functional enrichment analysis for differentially expressed genes between HG1-like and HG3-like tumors.
  • Table 12 A list of differentially altered genes between HG1-like and HG3-like tumors.
  • Contingency tables show frequencies produced by cross-classifying genetic grades and hierarchical clustering classes.
  • the hierarchical clustering was performed on 4,933 genes that were differentially expressed between HG1-like and HG3-like tumors, using Euclidean distance and the average-linkage agglomerative method.
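Agglomerative clustering with Euclidean distance and average linkage can be sketched naively as below. This sketch is for illustration only and is not optimized; it assumes each point is a list of expression values and stops when a requested number of clusters remains. The function name is hypothetical; real analyses would use an optimized library.

```python
import math
from itertools import combinations


def average_linkage(points, k):
    """Merge clusters by mean pairwise Euclidean distance until k remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(average_linkage(points, 2))  # [[0, 1], [2, 3]]
```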
  • Table 14 lists of genes that were differentially expressed between LGG and HGG tumors. List of 3,073 and 2,618 genes that were up- and down-regulated in HGG tumors relative to LGG tumors. Each gene could be represented by more than one probeset or one probeset could represent multiple genes.
  • Table 15 Gene ontology and functional enrichment analysis.
  • Table 16 A list of differentially altered genes between LGG and HGG tumors.
  • CNV copy number variation
  • a contingency table shows frequencies produced by cross-classifying the HG2 samples based on copy number variation and gene expression classifications. The statistical significance of the agreement between both classifications was assessed using Cohen's kappa coefficient.
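Cohen's kappa compares the observed agreement between two classifications with the agreement expected by chance from their marginal frequencies. A minimal sketch (function name hypothetical; the significance test for kappa is not shown, and kappa is undefined when chance agreement equals 1):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences of equal length."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both classifications were independent.
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)


cnv_calls = ["HG1-like", "HG1-like", "HG3-like", "HG3-like"]
expr_calls = ["HG1-like", "HG3-like", "HG3-like", "HG3-like"]
print(cohens_kappa(cnv_calls, expr_calls))  # 0.5
```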
  • Table 8 enrichment analysis of stem cells related genes among molecular grading related genes.
  • A) Significant enrichment of genes associated with 21 embryonic stem cell lines, as obtained by SAGE, among the genes differentially expressed between HG1-like and HG3-like tumors, specifically among the genes up-regulated in HG3-like tumors with respect to HG1-like tumors.
  • the analysis was performed using the DAVID gene ontology tool.
  • a contingency table shows frequencies produced by cross-classifying patients based on genetic grades and on classes resulting from unsupervised hierarchical clustering of the 106 genes. These genes are commonly expressed in the 21 embryonic stem cell lines studied in the CGAP SAGE database. Hierarchical clustering was performed using Euclidean distance and the average-linkage agglomerative method. The statistical significance of the agreement between both classifications was assessed using Cohen's kappa coefficient.
  • Table 20: Summary of clinical parameters of the 22g-TAG breast cancer patients' cohort and PCR primers used in qPCR validation.
  • A) Summary of clinical parameters of patients' cohort used in the qPCR based grading validation of 22g-TAG signature.
  • Figure 1 Schematic overview of the gene expression-based sub-classification of histological grade 2 (HG2) samples into HG1-like and HG3-like.
  • DEGs Differentially Expressed Genes
  • DAGs Differentially Altered Genes
  • HG Histological grades
  • SWS Statistically Weighted Syndrome algorithm
  • PAM Prediction Analysis of Microarray algorithm
  • LGG Low Genetic Grades
  • HGG High Genetic Grades.
  • E) Examples of the difference in qPCR-based expression for 2 genes of 22g-TAG for all histological and genetic grades of IDC patients.
  • F) Heatmap of Kendall's tau correlation coefficients between 22g-TAG genes using their qPCR-based relative expression profiles.
  • Figure 3 Major genomic and transcriptomic variations between subclasses of IDC determined by 22g-TAG classifier.
  • A) Box plots of the number of reference deviated genes (RDG) per sample for histological and genetic grades of IDC associated with 22g-TAG classifier.
  • RDG reference deviated genes
  • Figure 4 Copy number variation visualization of selected chromosomes in which the differentially altered genes between LGG and HGG are enriched.
  • the upper bar is a plot of the negative log p-value of the Wilcoxon test per gene against its transcription start site. The Wilcoxon test assesses the difference in CNV profile between LGG and HGG tumors for each gene.
  • the middle bar is the median values of the CNV signal intensities of LGG (green) and HGG (red) tumors per gene against its transcription start site.
  • the lower bar is the ideogram of the corresponding chromosome (centromere in red).
  • B) Distributions of Kendall's tau correlation coefficients between CNV and corresponding gene expression for differentially altered genes between LGG and HGG tumors (red), non-differentially altered genes (remaining genes in the genome, blue), and a random match between the CNV profile and gene expression profile as a control distribution (n represents the number of different combinations of matching the CNV profiles of genes with the expression profiles of their multiple probesets).
  • C) copy number variation visualization of chromosome 22.
  • Figure 5 progression model for LGG and HGG tumors
  • the IDC tumor progression model shows the major genetic events that dichotomize and characterize each oncogenic pathway of LGG and HGG tumors.
  • DEG differentially expressed gene.
  • CSC Cancer Stem Cell. +: DNA copy number gain.
  • Figure 6 under-sampling representation of HG3 tumor samples during pattern recognition analysis.
  • HG3 tumors were shuffled and split into 7 non-overlapping subsets. Each unique subset of HG3 tumors was compared with HG1 tumors to obtain balanced training set during pattern recognition analysis.
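The under-sampling scheme of Figure 6 can be sketched as follows: HG3 tumors are shuffled and dealt into non-overlapping subsets, and each subset is paired with all HG1 tumors to form one balanced training set. This is an illustrative sketch; the fixed random seed and the round-robin dealing of any remainder are assumptions, and the function name is hypothetical.

```python
import random


def undersample_training_sets(hg1, hg3, n_sets, seed=0):
    """Pair all HG1 samples with each of n_sets disjoint subsets of HG3 samples."""
    rng = random.Random(seed)
    shuffled = list(hg3)
    rng.shuffle(shuffled)
    # Deal samples round-robin so subsets are disjoint and differ in size by <= 1.
    subsets = [shuffled[i::n_sets] for i in range(n_sets)]
    return [(list(hg1), subset) for subset in subsets]


sets = undersample_training_sets(["h1", "h2"], [f"t{i}" for i in range(21)], 7)
print([len(subset) for _, subset in sets])  # [3, 3, 3, 3, 3, 3, 3]
```

The same function with `n_sets=3` mirrors the three non-overlapping HG3 sets used in the qPCR validation.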
  • Figure 7 SWS-derived assignment probabilities of HG1, HG3, and HG2-subclass tumors to the corresponding genetic subclasses.
  • Figure 8 scatter plot of the data driven cutoff and mean values of gene expression in low and high risk groups of prognosis prediction analysis of 22g-TAG genes.
  • Figure 10 box plots of AG and mutation counts per sample for different histological and genetic grades.
  • A) box plots of the number of amplified or deleted genes per sample, separately for different histological and genetic grades.
  • Figure 12 cumulative distribution of copy number variation of 22q genes.
  • Figure 13 hierarchical clustering results performed on 106 genes associated with 21 embryonic stem cells.
  • Heatmap of gene expression profiles resulting from hierarchical clustering of 106 genes expressed in 21 different embryonic stem cell lines according to the SAGE database. Euclidean distance and the average-linkage agglomerative method were used for hierarchical clustering.
Materials and methods
  • TCGA The Cancer Genome Atlas
  • Level 2 DNA somatic mutation data, identified using exome sequencing, were downloaded from TCGA.
  • the mutation annotation file contains information about the mutated genes, mutation genomic coordinates, type of mutation, and genotype calls of the tumor and reference normal samples for each patient. Only 418 of these samples overlap with the chosen 430 IDC samples. Data were converted into a two-dimensional matrix in which the rows and columns represent the genes and samples, respectively, and each data point represents the number of distinct mutated sites of a given gene in a given sample.
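The conversion of the mutation annotation file into a gene × sample count matrix can be sketched as follows. This is a minimal illustration, assuming each record has already been parsed into a (gene, sample, chromosome, position) tuple; the record layout and the function name `mutation_count_matrix` are hypothetical, not part of any TCGA tool. Repeated calls at the same site are counted once, matching "number of distinct mutated sites".

```python
from collections import defaultdict


def mutation_count_matrix(records, samples):
    """Build a gene x sample matrix of distinct mutated sites.

    records: iterable of (gene, sample, chrom, position) tuples (assumed layout).
    samples: ordered sample IDs to keep, e.g. the IDC samples of interest.
    Returns (genes, samples, matrix) where matrix[gene] is a list of counts.
    """
    keep = set(samples)
    sites = defaultdict(set)  # (gene, sample) -> distinct (chrom, position) sites
    for gene, sample, chrom, pos in records:
        if sample in keep:
            sites[(gene, sample)].add((chrom, pos))
    genes = sorted({gene for gene, _ in sites})
    order = list(samples)
    matrix = {g: [len(sites.get((g, s), ())) for s in order] for g in genes}
    return genes, order, matrix


# Toy records: a duplicated call at one site is counted once.
records = [
    ("TP53", "S1", "17", 100), ("TP53", "S1", "17", 100),
    ("TP53", "S1", "17", 250), ("PIK3CA", "S2", "3", 900),
    ("TP53", "S3", "17", 100),  # S3 is not in the chosen cohort
]
genes, order, matrix = mutation_count_matrix(records, ["S1", "S2"])
print(genes, matrix)  # ['PIK3CA', 'TP53'] {'PIK3CA': [0, 1], 'TP53': [2, 0]}
```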
  • PAM is a modified nearest-centroid method used for feature selection and class prediction analyses [Tibshirani, R., et al., Proceedings of the National Academy of Sciences, 2002. 99(10): p. 6567-6572]. In this work, we used it for dimensionality reduction, to obtain the most informative and representative features from the entire set of microarray probesets that discriminate between HG1 and HG3 tumors. PAM was implemented via the "pamr" R package.
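For illustration, the feature-selection step of a nearest shrunken centroid method can be sketched in plain Python. This is a simplified sketch of the published approach (pooled within-class standard deviation, median offset s0, shrinkage threshold delta), not the pamr implementation; class priors, cross-validation of the threshold, and the prediction step are omitted, and the function name is hypothetical.

```python
import math
from statistics import median


def shrunken_centroid_features(X, y, delta):
    """Return indices of features whose shrunken class centroids stay non-zero.

    X: samples as lists of feature values (e.g. probeset signals); y: labels.
    delta: shrinkage threshold; larger values keep fewer features.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [i for i, lab in enumerate(y) if lab == k] for k in classes}
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    centroid = {k: [sum(X[i][j] for i in members[k]) / len(members[k])
                    for j in range(p)] for k in classes}
    # Pooled within-class standard deviation of each feature.
    s = []
    for j in range(p):
        ss = sum((X[i][j] - centroid[k][j]) ** 2
                 for k in classes for i in members[k])
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # guards against features with tiny variance
    keep = []
    for j in range(p):
        for k in classes:
            m_k = math.sqrt(1 / len(members[k]) - 1 / n)  # standard-error scale
            d = (centroid[k][j] - overall[j]) / (m_k * (s[j] + s0))
            if abs(d) > delta:  # soft-thresholding would leave d non-zero
                keep.append(j)
                break
    return keep


X = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9],   # "HG1" samples
     [3.0, 5.0], [3.2, 4.9], [2.8, 5.1]]   # "HG3" samples
y = ["HG1"] * 3 + ["HG3"] * 3
print(shrunken_centroid_features(X, y, delta=1.0))  # [0]
```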
  • SWS is a statistics-based voting class prediction and feature selection method. It selects the most informative variables (prediction features), categorizes them, and tests the stability of the classification border of a feature domain of the training set based on sampling and leave-one-out procedures [Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83; Kuznetsov, V.A., et al., Mathematical and Computer Modelling, 1996. 23(6): p. 95-119].
  • fold change criteria of > 1.25 or < 0.75 were used.
  • the number of RDG for each TCGA IDC tumor sample can be calculated independently and compared across the genetic grade subgroups.
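Counting reference deviated genes per sample with the stated cutoffs can be sketched as below. The source does not specify the reference level; this sketch assumes a per-gene reference value on a linear (non-log) scale, and the function name `count_deviated` is hypothetical.

```python
def count_deviated(values, reference, up=1.25, down=0.75):
    """Count genes whose ratio to a reference level passes either cutoff.

    values, reference: dicts gene -> linear-scale signal for one sample and
    for the (assumed) per-gene reference level.
    """
    count = 0
    for gene, v in values.items():
        ref = reference.get(gene)
        if not ref:  # no usable (missing or zero) reference for this gene
            continue
        ratio = v / ref
        if ratio > up or ratio < down:
            count += 1
    return count


sample = {"A": 2.0, "B": 1.0, "C": 0.5, "D": 1.2}
reference = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
print(count_deviated(sample, reference))  # 2 (A is above 1.25, C is below 0.75)
```

The same two cutoffs, applied to CNV signal intensities, correspond to the per-sample count of altered genes (AG) described for HG1-like and HG3-like tumors.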
  • the Database for Annotation, Visualization and Integrated Discovery (DAVID) [Huang, D.W., B.T. Sherman, and R.A. Lempicki, Nature Protocols, 2008. 4(1): p. 44-57] tool was used to identify the top enriched biological processes among the differentially expressed genes through the Gene Ontology (GO) annotation database. The input unique Entrez gene IDs were compared with a background gene list comprising all genes in the genome using a hypergeometric test. A functional annotation chart comprising molecular functions, biological processes, cellular components, KEGG pathways, tissue expression, and chromosome number was reported.
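The hypergeometric tail probability behind such an enrichment test can be computed directly. This is a generic sketch, not DAVID's code; DAVID by default reports a modified Fisher exact (EASE) score, which is more conservative than the plain hypergeometric p-value shown here. The function and argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail P(X >= k) for X ~ Hypergeometric.

    N: genes in the background (e.g. the whole genome)
    K: background genes annotated to the GO term
    n: genes in the input list
    k: input genes annotated to the GO term
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# All 3 annotated genes drawn in a list of 3 from a background of 10: p = 1/120.
print(hypergeom_enrichment_p(3, 3, 3, 10))
```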
  • the MetaCore tool (Thomson Reuters, St. Joseph, MI, USA) was used to build the gene network associated with 22g-TAG genes (https://portal.genego.com/).
  • RNA samples of 84 IDC patients were obtained from OriGene (patients' clinical parameters are summarized in Table 20A).
  • the concentration of the RNA was provided by OriGene, reconfirmed using a NanoDrop® spectrophotometer, and normalized.
  • cDNA synthesis from 250 ng total RNA was conducted using a QuantiTect® Reverse Transcription Kit based on random hexamer and oligo(dT) primers.
  • qPCR experiments were conducted in 96-well plates using the QuantStudio™ 6 Flex Real-Time PCR System.
  • the KAPA SYBR® FAST qPCR Kit was used for qPCR experiments, and low ROX was used as a passive reference dye.
  • Primers were designed using primer3 (v.
  • HG2 tumors are genetically heterogeneous and include tumors whose oncogenic pathways could be separated into two distinct subclasses similar to either HG1 or HG3 tumors.
  • we applied a trained pattern recognition classifier to the intermediate HG2 tumors and evaluated the ability of the classifier to stratify HG2 tumors into HG1-like or HG3-like tumors (Figure 1A).
  • the algorithm selects the most differentially expressed genes (DEG) (represented by the microarray probesets) that discriminated HG1 and HG3 tumors in our seven training sets. These training sets resulted in the seven statistically reproducible classification signatures (The training accuracies and numbers of features are shown in Table 4). We selected 39 common probesets (corresponding to 22 genes) from the seven PAM- derived signatures.
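Selecting the probesets shared by all seven PAM-derived signatures, and mapping them to genes, amounts to a set intersection. A minimal sketch; the probeset-to-gene mapping is an assumed input (for example from an array annotation file), the function name is hypothetical, and the identifiers in the usage are placeholders.

```python
def common_features(signatures, probe_to_gene):
    """Probesets present in every signature, plus the distinct genes they map to."""
    common = set.intersection(*(set(s) for s in signatures))
    genes = sorted({probe_to_gene[p] for p in common if p in probe_to_gene})
    return sorted(common), genes


signatures = [["p1", "p2", "p3"], ["p2", "p3", "p4"], ["p3", "p2", "p5"]]
probe_to_gene = {"p1": "GENE_X", "p2": "GENE_Y", "p3": "GENE_Y", "p4": "GENE_Z"}
print(common_features(signatures, probe_to_gene))  # (['p2', 'p3'], ['GENE_Y'])
```

Because several probesets can map to one gene, the number of common probesets (39 in the text) exceeds the number of genes (22).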
  • DEG differentially expressed genes
  • the 22 genes comprise BUB1, CAPN8, CDC45, CDCA5, CDCA8, CENPA, CENPN, FAM72B/FAM72A, KIF13B, KIF14, KIF2C, MCM10, MELK, MTFR2, MYBL2, NAT1, NOSTRIN, ORC6, PIF1, SHCBP1, TICRR, and UBE2C.
  • SWS was performed for seven training/prediction sets to address the size imbalance of training classes.
  • the average accuracy of SWS was 90.5 ± 3.4% (average sensitivity 90.2 ± 3.7%, average specificity 91.5 ± 5.3%).
  • each HG2 tumor was assigned to either HG1-like or HG3-like sub-class.
  • the overall prediction for each sample was based on the consensus agreement across the seven trained SWS classifiers. Consensus agreement is determined by the number of times a sample is assigned to a given subclass with an assignment probability above the threshold (p > 0.7). Tumor samples whose predicted probability fell in the uncertainty zone (0.5 ± 0.2) were classified as the "HG2-like" class. According to these criteria, 55.2% (101/183) and 42.6% (78/183) of HG2 tumors were assigned to the HG1-like and HG3-like tumor types, respectively.
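One reading of the consensus rule is sketched below. The source does not state how many of the seven classifiers must agree; this sketch assumes a simple majority of confident calls (probability above 0.7 for one subclass) and labels every other sample HG2-like. The function name and the majority rule are assumptions.

```python
def consensus_subclass(probs_hg3, threshold=0.7):
    """Consensus call from per-classifier P(HG3-like) values (assumed majority rule)."""
    k = len(probs_hg3)
    hg3_votes = sum(p > threshold for p in probs_hg3)
    hg1_votes = sum((1 - p) > threshold for p in probs_hg3)  # P(HG1-like) > threshold
    if hg3_votes > k / 2:
        return "HG3-like"
    if hg1_votes > k / 2:
        return "HG1-like"
    return "HG2-like"  # no majority of confident calls, e.g. values in 0.5 +/- 0.2


print(consensus_subclass([0.9] * 7))               # HG3-like
print(consensus_subclass([0.1] * 5 + [0.5] * 2))   # HG1-like
print(consensus_subclass([0.9] * 3 + [0.5] * 4))   # HG2-like
```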
  • CAPN8 is a protease that plays a role in membrane trafficking of gastric cells and protection of gastric mucosa [Hata, S., et al., J Biol Chem, 2007. 282(38): p. 27847-56; Hata, S., et al., PLoS Genet, 2010. 6(7): p. e1001040].
  • PIF1 plays critical roles in DNA replication, cell growth, and the resolution of G-quadruplexes and R-loops [Zhou, R., et al., eLife, 2014. 3: p. e02190].
  • ORC6 is an important cell cycle-related gene involved in DNA replication initiation and chromosome segregation [Prasanth, S.G., K.V. Prasanth, and B. Stillman, Science, 2002. 297(5583): p. 1026-31].
  • 22g-TAG signature genes are potential prognostic markers
  • the data-driven expression threshold values from the survival prediction analysis of the genes and their mean expression in the low- and high-risk tumor development groups are significantly correlated (Kendall's tau, p < 0.05) in at least three cohorts (Table 8).
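Kendall's tau, used here and in the qPCR and CNV correlation analyses, can be computed from concordant and discordant pairs. A minimal sketch of the tie-adjusted tau-b variant (an assumption; the source does not say which variant was used), without the p-value calculation; the function name is hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b; raises ZeroDivisionError if either input is constant."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominators' extra terms
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 2, 3], [1, 3, 2]))  # 0.3333333333333333
```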
  • the 22g-TAG signature outperformed other clinical parameters in the stratification of patients into prognostically meaningful groups, according to univariate and multivariate survival analyses based on a Cox regression model, in at least three of the four validation cohorts (Table 8).
  • Collectively, the 22g-TAG signature genes are potentially reliable prognostic markers (Table 8).
  • 22g-TAG signature genes are involved in cell cycle/mitosis and oncogenic pathways
  • we performed gene ontology (GO) enrichment analysis and found that these genes are strongly enriched in cell cycle/mitosis GO categories (p < 0.01, Table 9).
  • MetaCore includes manually curated knowledge database about annotated genes, their products and functional interactions.
  • the 22g-TAG gene symbols were used as the seed nodes for "extension" of the gene network via finding the shortest path between any two genes of seed node set with maximum two intermediate nodes (genes or their products).
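The network extension step can be approximated with breadth-first search: a node v lies on a shortest path between seeds a and b exactly when d(a, v) + d(v, b) = d(a, b). This sketch treats the interaction network as undirected and unweighted, which is an assumption (MetaCore's own algorithm and edge directions are not described here); the edge list and function name are hypothetical.

```python
from collections import deque
from itertools import combinations


def extend_network(edges, seeds, max_intermediate=2):
    """Seeds plus nodes on shortest seed-to-seed paths with <= max_intermediate hops."""
    adj = {}
    for u, v in edges:  # undirected adjacency (assumption)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def bfs(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    dists = {s: bfs(s) for s in seeds}
    nodes = set(seeds)
    for a, b in combinations(seeds, 2):
        d_ab = dists[a].get(b)
        if d_ab is None or d_ab > max_intermediate + 1:
            continue  # unconnected, or path needs too many intermediate nodes
        for v, d_av in dists[a].items():
            if d_av + dists[b].get(v, float("inf")) == d_ab:
                nodes.add(v)
    return nodes


edges = [("A", "X"), ("X", "B"), ("A", "Y"), ("Y", "Z"), ("Z", "W"), ("W", "B"),
         ("C", "P"), ("P", "Q"), ("Q", "R"), ("R", "D")]
print(sorted(extend_network(edges, ["A", "B", "C", "D"])))  # ['A', 'B', 'C', 'D', 'X']
```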
  • KIF2C and MYBL2 represent the convergence and divergence hubs, respectively for this network highlighting their role in IDC aggressiveness (Figure 2C).
  • Two 22g-TAG genes (KIF2C and NAT1) are potentially druggable according to the drug-gene interaction database (DGIdb) [Griffith, M., et al., Nat Methods, 2013. 10(12): p. 1209-10], whereas 10 genes of the 22g-TAG-associated network are druggable (AR, AURKA, AURKB, CDK1, CDK2, MYC, PLK1, SMAD2, TOP2A, and TP53).
  • DGIdb drug-gene interaction database
  • qPCR was conducted on 84 RNA samples of BC patients obtained from OriGene (see Methods).
  • Obtained fold change values were used for the re-classification of HG2 samples using the SWS algorithm.
  • HG3 samples were shuffled and split into 3 non-overlapping sets of 16 samples each.
  • Three training-prediction sets were performed using the SWS algorithm.
  • HG2 tumors were finally sub-classified based on the consensus sub-classification of the three prediction iterations.
  • the average training accuracy is 83.3% (sensitivity: 66.6 ± 7.2%, specificity: 91.7 ± 3.6%).
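The source does not state how the qPCR fold change values were computed. A common choice, assumed here, is the comparative 2^-ΔΔCt (Livak) method with a reference gene and a calibrator sample, which presumes near-100% amplification efficiency. The function and argument names are hypothetical.

```python
def ddct_fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression 2^-(dCt_sample - dCt_calibrator), assuming ~100% efficiency."""
    delta_sample = ct_target - ct_ref              # target vs reference gene in the sample
    delta_calibrator = ct_target_cal - ct_ref_cal  # same difference in the calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Target crosses threshold 2 cycles earlier (relative to the reference gene)
# than in the calibrator, so it is about 4-fold higher.
print(ddct_fold_change(22, 18, 24, 18))  # 4
```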
  • HG1-like and HG3-like tumors have distinct transcriptome profiles
  • probeset signals correspond to RNA transcribed by 2,147 genes: 887 genes (777 protein-coding, 26 pseudogenes, 33 ncRNA, 1 snoRNA, and 50 unknown transcripts) and 1,260 genes (1,099 protein-coding, 83 pseudogenes, 18 ncRNA, and 60 unknown transcripts) were down-regulated and up-regulated, respectively, in HG3-like tumors with respect to HG1-like tumors (Table 10).
  • the gene locations of up-regulated genes are enriched on specific chromosomes, such as chr8, chr17, chr20, and chr22 (Benjamini p-values < 3×10⁻⁵, Table 11).
  • HG1-like and HG3-like tumors are distinct in their genomic constitution
  • thresholds of 1.25 for gene gain and 0.75 for gene loss were applied to CNV signal intensities of HG1-like and HG3-like tumors.
  • the number of altered genes (gain or loss) in each sample was determined based on the previously mentioned thresholds.
  • HG1-like tumors exhibited fewer altered genes (AG) per sample than did HG3-like tumors.
  • DAG Differentially altered genes
  • genes include well-known altered genes important for BC initiation, development, and progression.
  • the TP53 gene located on chromosome 17p, is deleted in 37% (37 of 101 samples) of HG1-like tumors and 64% (49 of 77 samples) of HG3-like tumors.
  • in HG1-like tumors, part of 16q is deleted.
  • HG3-like tumors showed gains in 8q, 17q, and 20q and losses in 8p, 11q, and 17p (Table 2A). It is notable that the chromosomes that harbor the DAG are the same chromosomes in which the DEG are enriched (chromosomes 8, 17, and 20).
  • genes within these chromosomes could be considered the major players in the initiation and maintenance of the differential level of malignancy that distinguishes HG1-like from HG3-like tumors at both the DNA and mRNA levels.
  • HG1-like and HG3-like tumors are distinct in their DNA point mutational profiles
  • TP53: 3.2×10⁻⁶, 6.4×10⁻⁴, and 1×10⁻⁵, respectively (Figure 10B).
  • Figures 3A, B, C, and Figure 10 show that there are no statistically significant differences between HG1 and HG1-like tumors, or between HG3-like and HG3 tumors, with respect to RDG, AG, and mutation counts (p-values > 0.05 in all cases). Moreover, no DEG were detected between HG1 and HG1-like tumors, whereas 1,837 DEG were detected between HG3-like and HG3 tumors; however, neither the up- nor the down-regulated genes showed significant enrichment in any biological process, cellular component, or pathway (Benjamini p-values > 0.01). Only a few molecular functions, associated with ATP binding, showed significant enrichment.
  • HC unsupervised hierarchical clustering
  • 16q loss occurred more frequently in HG1 and HG1-like tumors compared with HG3-like and HG3 tumors.
  • the intrinsic subtype information of the tumors was obtained from the TCGA network [Cancer Genome Atlas, N., Nature, 2012. 490(7418): p. 61-70], which used the PAM50 model [Parker, J.S., et al., J Clin Oncol, 2009. 27(8): p. 1160-7] to classify each sample.
  • a contingency table of the frequency of 5 different subtypes (normal-like, luminal-A, luminal-B, basal-like, and HER2-enriched) versus the 4 classes of the grading classification (HG1, HG1-like, HG3-like, and HG3) was generated.
  • Luminal-A tumors are enriched and distributed in HG1 and HG1-like tumors (low genetic grade/LGG), whereas luminal-B, HER2-enriched, and basal-like tumors are enriched in HG3 and HG3-like tumors (high genetic grade/HGG).
  • Up-regulated genes are associated with cell cycle, chromosome segregation, and DNA replication biological processes (Benjamini p-value < 1.3×10⁻¹⁴) and involved in kinetochore and spindle microtubule cellular components (Benjamini p-value < 1.9×10⁻³), and the genes are strongly enriched among the genes expressed in epithelial tissues (Benjamini p-value 2.7×10⁻¹², Table 15). It is noteworthy that the up-regulated genes in HGG tumors are significantly enriched on chromosomes 2, 8, 16, 20, and 22 (Benjamini p-value < 1.4×10⁻⁴).
  • the high number of differentially expressed genes and the functional and chromosomal enrichment of these genes indicate essential distinct genomic and transcriptomic profiles of LGG and HGG tumors.
  • DAG analysis between LGG and HGG revealed 1,858 DAG (1,432 protein-coding, 347 ncRNA, 61 pseudogenes, 17 snoRNA, and 1 snRNA; Table 16) enriched in a few chromosomes (Table 2D). Specifically, 52% of the DAG (971 of 1,845 genes) are located on chromosome 16. Visualization of the copy number variation status across the chromosome arms showed that LGG tumors have a gain of 16p and a deletion of 16q, whereas HGG tumors show a gain of 8q and losses of 8p and 17p (Figure 4A). Our results provide plausible evidence to support the hypothesis that LGG and HGG tumors are distinct at the genotype level.
  • LGG tumors have significantly fewer mutations than HGG tumors.
  • the three types of mutations (missense, nonsense, and silent) show the same trend in the difference in mutations per sample across genetic grades ( Figure 10B).
  • the frequency of mutations in this gene is 10% (14/130) in LGG tumors and 48% (137/284) in HGG tumors, i.e., it is positively associated with HGG. Conversely, the frequency of PIK3CA mutations is 48% (62/130) in LGG tumors and 23% (67/284) in HGG tumors, suggesting a negative association with HGG.
  • DNA copy numbers of the differentially altered genes are strongly associated with their corresponding gene expression profiles
  • RNA expression and corresponding CNV were significantly correlated for approximately 52% of the DAG (FDR < 0.01; 976 of 1,845 genes).
  • the DAG (1 ,845 genes) have stronger correlation with their gene expression profile compared with non- differentially altered genes (non-DAG).
  • Chromosome 22 copy number variation is a novel indicator of LGG and HGG independence.
  • a loss of genetic material in low grade tumors but not in high grade represents the striking evidence for the independence of the low- and high- grade oncogenic pathways (e.g. 16q loss).
  • 16q loss the loss of 16q in low-grade tumors
  • 22q shows low CNV signal intensities for LGG tumors compared with HGG tumors.
  • although the median CNV signal intensities of LGG tumors do not pass the copy number loss threshold, the difference in copy number between LGG and HGG is significant. This difference is notable for genes located downstream of the centromeric region and at the sub-telomeric region (Figure 4C).
  • DNA copy number variation reflects sub-classification of HG2 tumors
  • DAGs between HG1 and HG3 tumors were determined using the same criteria used previously for the selection of DAG.
  • the classifier was trained using HG1 and HG3 tumors. As with the gene expression data, HG3 tumors were shuffled and divided into 7 non-overlapping groups, and 7 training-prediction subsets were performed. The average classification accuracy was 77 ± 4.3%. HG2 tumors were sub-classified into HG1-like and HG3-like tumors in each training-prediction subset. Each HG2 sample was assigned to a new subclass according to the consensus classification across all 7 classifiers.
  • LGG and HGG grading classification is associated with the differential expression of stem cell genes
  • the results demonstrate a strong molecular distinction between HG1-like and HG3-like tumors, and genetic profiles of these subclasses comparable to those of HG1 and HG3 tumors, respectively.
  • HG1-like and HG1 tumors are therefore considered LGG tumors.
  • HG3-like and HG3 tumors are therefore considered HGG tumors.
  • LGG and HGG tumors are the two major genetically predetermined classes of breast IDC and that they have independent oncogenic pathways.
  • the distinction between LGG and HGG tumors was supported based on integrative data analysis.
  • the DAGs discriminating between LGG and HGG tumors are enriched in specific chromosomes, with chr16 as the major contributor. Five major events were observed: 16q loss and 16p gain in LGG tumors, and 8p loss, 17p loss, and 8q gain in HGG tumors.
  • Our gene-centric based copy number variation analysis helps to highlight candidate genes of which copy number alterations give a survival advantage to tumor cells during tumor evolution.
  • LGG tumors have fewer mutations than HGG tumors.
  • TP53 and PIK3CA mutation counts are positively and negatively correlated with genetic grade, respectively.
  • Our analysis demonstrated that a relatively higher count of PIK3CA mutations is associated with HG1-like tumors.
  • as PIK3CA mutations occur frequently in IDC and are known to activate the PI3K/AKT/mTOR pathway, these mutations could be considered potential predictive biomarkers of HG1-like tumors.
  • LGG tumors originate and progress depending on the clonal evolution of normal epithelial cells
  • HGG tumors originate from stem/progenitor cells and progress via clonal evolution to multiple subtypes, including ER+ and ER- tumors.
  • High tumor grade is associated with decreased overall survival [Trudeau, M.E., et al., Breast Cancer Res Treat, 2005. 89(1): p. 35-45], but it is also known to predict increased response to neoadjuvant chemotherapy [Vincent-Salomon, A., et al., Eur J Cancer, 2004. 40(10): p. 1502-8; Chang, J., et al., J Clin Oncol, 1999. 17(10): p. 3058-63]. Consequently, we hypothesize that IDC with LGG and HGG would show decreased and increased response rates to chemotherapy, respectively.
  • LGG tumors are expected to be less suited to the treatments used for highly aggressive tumors. Therefore, they could be more suitably treated with agents that target other growth-related requirements of tumors, such as the mTOR pathway, which mediates mRNA translation, and that increase the genome instability of tumor cells and initiate their apoptosis.
  • Further examples include agents that target the growth of blood vessels supplying tumors (such as bevacizumab) or hormone-related growth signaling pathways (estrogen signaling pathways in ER+ tumors), such as tamoxifen.
  • NAT1 can be inhibited efficiently by Rhod-o-hp with minimal cell toxicity, or by RNA interference, to decrease cell growth and invasiveness [Tiang, J.M., N.J. Butcher, and R.F. Minchin, Biochem Biophys Res Commun, 2010. 393(1): p. 95-100; Tiang, J.M., et al., PLoS One, 2011. 6(2): p. e17031].
  • MELK was successfully targeted by the compound OTSSP167, which suppressed mammosphere formation in breast cancer cells and suppressed the growth of xenografts of multiple cancer types in mice [Chung, S., et al., Oncotarget, 2012. 3(12): p. 1629-40; Chung, S. and Y. Nakamura, Cell Cycle, 2013. 12(11): p. 1655-6; Cho, Y.S., et al., Biochem Biophys Res Commun, 2014. 447(1): p. 7-11]. Therefore, it may be important to consider the NAT1 and MELK genes and their products as separate therapeutic targets for LGG and HGG tumors.
  • Four 22g-TAG genes (BUB1, KIF2C, UBE2C, and CENPN), together with CDC20 (from the 22g-TAG network), are among the 10 recently identified genes that determine the responsiveness of tumors to chemotherapy [Hallett, R.M., et al., Oncotarget, 2015. 6(9): p. 7040-52].
  • Neoadjuvant chemotherapy (NAC) can cause tumor shrinkage, which enables a proportion of patients with large tumors to be eligible for breast conservation surgery (BCS). This increases the BCS rate in comparison to adjuvant chemotherapy only [Mathieu, M.C., et al., Ann Oncol, 2012. 23(8): p. 2046-52]. In such cases, our genetic grading classification could potentially be useful for predicting patients' eligibility for NAC.
  • Buerger, H., et al., Ductal invasive G2 and G3 carcinomas of the breast are the end stages of at least two different lines of genetic evolution. J Pathol, 2001. 194(2): p. 165-70.
  • Prasanth, S.G., K.V. Prasanth, and B. Stillman, Orc6 involved in DNA replication, chromosome segregation, and cytokinesis. Science, 2002. 297(5583): p. 1026-31.
  • Wennmalm, K., et al., Gene expression in 16q is associated with survival and differs between Sørlie breast cancer subtypes. Genes, Chromosomes and Cancer, 2007. 46(1): p. 87-97.
  • Zhou, R., et al., Periodic DNA patrolling underlies diverse functions of Pif1 on R-loops and G-rich DNA. eLife, 2014. 3: p. e02190.

Abstract

The present invention relates to methods of classifying subjects with Invasive Ductal Carcinoma (IDC), a type of breast cancer, into low genetic grade (LGG) or high genetic grade (HGG), categorized by an optimal composition of genetic features. In particular, a group of 22 signature genes consisting of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2 and TICRR is used in the classification. The present invention also provides a prognostic evaluation based on the selected features, providing a quantitative personalized predictor of the disease outcome and facilitating a therapeutic regimen based on the diagnostic results. Also encompassed are kits and assays comprising probes/primers for said genes.

Description

INVASIVE DUCTAL CARCINOMA AGGRESSIVENESS CLASSIFICATION
FIELD OF THE INVENTION
The present invention relates to methods of classifying subjects with Invasive ductal carcinoma (IDC), a type of breast cancer, into low- and high-aggressive tumours, categorized by an optimized composition of genetic features. The present invention also provides a prognostic evaluation based on the selected features, providing a quantitative personalized predictor of the disease outcome and facilitating a therapeutic regimen based on the diagnostic results.
BACKGROUND TO THE INVENTION
Breast cancer (BC) is a heterogeneous disorder comprising multiple molecular entities. The diverse oncogenic pathways leading to the formation of distinct BC tumour subtypes also contribute to the wide range of clinical tumour dynamics and disease outcomes.
Invasive ductal carcinoma (IDC) is a major histo-morphologic type of breast cancer. Histological grading (HG) of IDC is widely adopted by oncologists as a prognostic factor. However, HG evaluation is highly subjective, with only 50%-85% inter-observer agreement. Specifically, the subjectivity in the assignment of the intermediate grade (histologic grade 2, HG2) breast cancers (comprising ~50% of IDC cases) results in uncertain disease outcome prediction and sub-optimal systemic therapy. Despite several attempts to identify the mechanisms underlying the HG classification, their molecular bases remain poorly understood.
There are three grades of invasive ductal carcinoma: low or grade 1; moderate or grade 2; and high or grade 3. Grade 1 invasive ductal carcinoma cells, which are sometimes called "well differentiated," histologically look and act somewhat like healthy breast cells. Grade 3 cells, also called "poorly differentiated," are more abnormal in their behavior and appearance. However, the assessment of differentiation is subjective, leading to poor diagnosis, and if a subject is identified as falling into the grade 2 category, it is not clear whether or not the tumour is likely to develop into grade 3 in due course. Moreover, when conducting surgery to remove cancer cells/tissues, a surgeon may remove more tissue than necessary from subjects identified as having grade 2 cancer. Thus, having 3 grades of cancer may be seen as undesirable. Previously, Ivshina et al. (2006) studied gene expression in patient cohorts and showed that sub-classification of HG2 patients, based on a large 232-gene genetic grading classifier (represented by 264 Affymetrix probesets), dichotomized them with high accuracy (5-7% errors) into two genetically and clinically distinct subclasses: histological grade 1-like (HG1-like) and histological grade 3-like (HG3-like). These two tumor subclasses have gene expression profiles and clinical outcomes similar to those of HG1 and HG3 tumours, respectively. Small gene subsets have also been derived that dichotomize HG2 patients into HG1-like and HG3-like subclasses with the same 5-7% error rate (Ivshina et al., 2006). Importantly, the patients were not pre-selected based on any clinical characteristics or tumour molecular status (e.g., tumour stage, tumour size, ER and LN status). The results of the dichotomization of BC patients with HG2 tumours have been reproducible across different cohorts and treatment groups. Similar results were observed for the subpopulation of IDC selected by ER+ status (Sotiriou et al., 2006).
Several attempts have been made to identify the molecular mechanisms underlying the morphological characteristics of tumour grading in order to improve its objectivity [Buerger, H., et al., J Pathol, 2001. 194(2): p. 165-70; Buerger, H., et al., J Pathol, 1999. 187(4): p. 396-402; Cleton-Jansen, A.M., et al., Genes Chromosomes Cancer, 2004. 41(2): p. 109-16; Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301; Roylance, R., et al., Cancer Res, 1999. 59(7): p. 1433-6; Sotiriou, C, et al., J Natl Cancer Inst, 2006. 98(4): p. 262-72; Kempowsky-Hamon, T., et al., BMC Med Genomics, 2015. 8: p. 3]. The continuous progression model, in which tumour aggressiveness progresses from low-grade to high-grade tumours, has been widely accepted for the last few decades. Alternatively, independent oncogenic pathways have been suggested based on observations of the differential loss of 16q in HG1 versus HG3 tumours [Buerger, H., et al., J Pathol, 2001. 194(2): p. 165-70; Cleton-Jansen, A.M., et al., Genes Chromosomes Cancer, 2004. 41(2): p. 109-16; Roylance, R., et al., Cancer Res, 1999. 59(7): p. 1433-6]. Previous genetic studies demonstrated loss of 16q in HG1 IDC and the possibility of micro-deletions in 16q in HG3 IDC [Buerger, H., et al., J Pathol, 2001. 194(2): p. 165-70; Buerger, H., et al., J Pathol, 1999. 187(4): p. 396-402; Roylance, R., et al., Cancer Res, 1999. 59(7): p. 1433-6; Nordgard, S.H., et al., Genes Chromosomes Cancer, 2008. 47(8): p. 680-96]. For a detailed review of 16q loss frequency in different histological types of BC, the reader is referred to a review by Buerger et al. However, ambiguity remains regarding the categorisation of intermediate HG2 tumours.
It has been demonstrated that HG2 patients can be dichotomized, based on gene expression profiles and with high accuracy (95%), into two genetically and clinically distinct subclasses: histological grade 1-like (HG1-like) and histological grade 3-like (HG3-like) [Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301; Ivshina AV, et al., in Keystone Symposia: Stem Cells, Senescence and Cancer. 2005. p. P. 76; Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83]. These subclasses, HG1-like and HG3-like, have gene expression profiles and clinical outcomes similar to those of HG1 and HG3 tumours, respectively. The 232 genes of the grading classifier are involved mostly in the cell cycle, the p53 pathway, inhibition of apoptosis, cell adhesion, cell motility, stress, hormone response and angiogenesis [Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301]. It has also been argued that this genetic tumour aggressiveness grading classifier and its multiple representative 5-7 gene classification subsets can improve prognosis and therapeutic planning for BC patients diagnosed with histologic grade 2 (HG2) tumours. Importantly, the patients were not pre-selected based on any clinical characteristics (e.g., tumour stage, tumour size, ER and LN status). These re-classification results have been reproduced across different cohorts and treatment groups and correlated strongly with the survival patterns of the re-classified tumour subgroups. Similar results were observed for the specific subpopulation of BC selected by ER+ status [Sotiriou, C, et al., J Natl Cancer Inst, 2006. 98(4): p. 262-72; Loi, S., et al., J Clin Oncol, 2007. 25(10): p. 1239-46].
Collectively, genetic grade signatures can improve the prognosis of BC patients, especially IDC patients with HG2 tumours, which are relatively poorly defined by the different grading systems and by currently used molecular prognostic and predictive signatures [Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301; Sotiriou, C, et al., J Natl Cancer Inst, 2006. 98(4): p. 262-72; Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83; Francis, G.D., S.R. Stein, and G.D. Francis, in The 2012 International Joint Conference on Neural Networks (IJCNN), 2012]. Importantly, HG2 sub-classification studies supported the view that low and high grades, defined via transcriptomic analysis, reflect independent patho-biological entities (distinct cell phenotypes) rather than a continuum of cancer progression [Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301; Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83]. Several studies have investigated the association of HG systems with DNA copy number variations (CNV) and mutation events [Cava, C, et al., PLoS One, 2014. 9(5): p. e97681; Ping, Z., et al., J Pathol Inform, 2014. 5: p. 3], but to the inventors' knowledge, no studies have reported a systematic interconnection of CNV and mutation patterns in HG2 IDC. It is amongst the objects of the present invention to provide a genetic tool which may be used to supplement the diagnosis and/or prognosis of breast cancer, especially IDC breast cancer.
SUMMARY OF THE INVENTION
The present invention is based upon studies carried out by the inventors to identify genetic and/or phenotypic markers which may be used to classify patients into two grades (i.e. low grade or high grade) of breast cancer, such as IDC breast cancer. This combination of markers has not previously been shown to be useful for classifying subjects into these two separate groups, and especially for classifying intermediate-grade HG2 subjects as low or high grade.
The present inventors performed integrative bioinformatics and experimental analyses of The Cancer Genome Atlas (TCGA) cohort and several other validation cohorts (1246 patients in total). As part of the invention, the inventors identified a 22-gene tumour aggressiveness grading classifier (22g-TAG) which permits global bifurcation of IDC transcriptomes and reclassifies patients with HG2 tumours into two genetically and clinically distinct subclasses: histological grade 1-like (HG1-like) and histological grade 3-like (HG3-like). The expression profiles and clinical outcomes of these subclasses were similar to those of HG1 and HG3 tumours, respectively. The inventors were able to further reclassify IDC into low genetic grade (LGG = HG1 + HG1-like) and high genetic grade (HGG = HG3-like + HG3) subclasses. For HG1-like and HG3-like IDCs, the inventors found subclass-specific DNA alterations, somatic mutations, oncogenic pathways, and cell cycle/mitosis and stem cell-like expression signatures that discriminate between these tumours. The inventors also found similar molecular patterns in the LGG and HGG tumour classes. Without wishing to be bound by theory, the results suggest the existence of two genetically predefined IDC classes, LGG and HGG, driven by distinct oncogenic pathways. The results may permit new prognostic and therapeutic approaches and may open unique opportunities for personalized systemic therapies for IDC patients.
The present invention is defined with reference to the following numbered clauses appended hereto and the following description:
1. A method for classifying a subject with invasive ductal carcinoma (IDC) breast cancer into one of two specific grades, low genetic grade (LGG) IDC or high genetic grade (HGG) IDC, the method comprising conducting a genetic and/or phenotypic analysis of a sample from the subject in order to identify one or more of the following characteristics:
1) Detecting expression levels of a group of signature genes comprising CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR, wherein the group comprises 60 or fewer genes in total, such as 50, 40, 25 or fewer, in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC;
2) Detecting subsets of up- and down-regulated differentially expressed populations of genes within a subject's genome in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC;
3) Detecting differentially altered genes within a subject's genome in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC;
4) Analysing a population of highly mutated genes that have significantly different mutation counts between LGG and HGG grades in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC;
5) Analysing tumour subtypes in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC; and
6) Detecting a population of genes routinely expressed in stem cells in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC.
2. The method according to clause 1 for use in classifying or reclassifying IDC subjects with histologic grade 2 (HG2) IDC into either LGG or HGG IDC.
3. The method of clause 1 or 2, wherein the method comprises option 1) and optionally one or more of options 2)-6).
4. The method according to any preceding clause, wherein the group of signature genes consists of, or consists essentially of: CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR.
5. The method according to clause 4, wherein relatively high expression of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR stratifies a subject as having a high genetic grade (HGG) tumour and optionally a poor prognosis.
6. The method according to clause 4 or 5, wherein relatively high expression of CAPN8, NAT1, NOSTRIN, and KIF13B stratifies a subject as having a low genetic grade (LGG) tumour and optionally a good prognosis, or a better prognosis than a subject who has a high grade tumour or tumours.
7. The method according to any preceding clause, wherein the group of signature genes permits the stratification of subjects as having LGG or HGG tumours through analysis employing the statistically weighted syndrome algorithm.
8. The method according to any preceding clause, wherein analysis of tumour subtypes includes one or more of the following classifications: estrogen receptor status, progesterone receptor status, human epidermal growth factor receptor 2 status, age, stage, lymph node status and metastasis status.
9. The method according to any of clauses 1-8, wherein analysis of tumour subtypes includes the normal-like, luminal-A, luminal-B, basal-like and HER2-enriched subtypes, wherein luminal-A/normal-like tumours classify subjects as having an LGG tumour and luminal-B/HER2-enriched/basal-like tumours classify subjects as having an HGG tumour.
10. The method according to any preceding clause, wherein the differentially expressed genes and their relative expression level (up or down), which allow for tumour classification and patient stratification as an HGG tumour compared with an LGG tumour, are identified in Tables 14 or 15.
11. The method according to any preceding clause, wherein up-regulation of expression of genes localized in chromosome 8, specifically in 8p23.3, 8p23.1, 8p21.3-8p12 and 8q13.1-8q24.3, is associated with the diagnosis and classification of subjects as having an HGG tumour.
12. The method according to any preceding clause, wherein detecting differentially altered gene expression comprises detecting the frequency of chromosome gain or loss.
13. The method of clause 14 comprising using gene-centric somatic DNA copy number variation analysis to identify differentially altered genes (DAG) between LGG and HGG tumours, for specifying the diagnosis and stratifying subjects into HGG or LGG tumour classes.
14. The method according to clause 12 or 13, wherein loss of chromosome 8p23.3, 8p23.1 and 8p21.3-8p12 and gain of 8q13.1-8q24.3 are associated with diagnosis and stratification of subjects as having an HGG tumour.
15. The method according to clause 12 or 13, wherein loss of 11q21-11q25 is associated with diagnosis and stratification of subjects as having an HGG tumour.
16. The method according to clause 12 or 13, wherein loss of 16q, optionally 16q12.1-16q13, is associated with diagnosis and classification of subjects as having an LGG tumour.
17. The method according to clause 12 or 13, wherein gain of 20q13.13-20q13.2 is associated with diagnosis and classification of subjects as having an HGG tumour.
18. The method according to clause 12 or 13, wherein gain of 1q21.1-1q21.3 is associated with diagnosis and classification of subjects as having an HGG tumour.
19. The method according to clause 12 or 13, wherein a low copy number of 22q-related genes is associated with diagnosis and classification of subjects as having an LGG tumour.
20. The method according to any preceding clause, wherein the population of highly mutated genes includes TP53 and PIK3CA, wherein a high mutation count of PIK3CA is associated with classifying a subject as having an LGG tumour, and wherein a high mutation count of TP53 is associated with diagnosis and classification of a subject as having an HGG tumour.
21. The method according to clause 20, wherein in addition to PIK3CA, the following genes are more frequently mutated in a subject with an LGG tumour, but not mutated or rarely mutated in HGG: CBFB, CTCF, MAP3K1, CHD8, DYSF, DNAH1, MAP2K4, and GATA3.
22. The method according to clause 20, wherein in addition to TP53, the following genes are more frequently mutated in a subject with an HGG tumour, but not mutated or rarely mutated in LGG: MUC4 and TTN.
23. The method according to any preceding clause comprising detecting the expression of one or more genes (such as one or more of the genes identified in Table 18), wherein an increase in expression of said one or more genes allows for classification of a subject as having an HGG tumour.
24. The method according to any preceding clause for use in providing a diagnosis and prognosis for a subject.
25. The method according to any preceding clause for use in facilitating prognosis of an IDC subject.
26. The method according to any preceding clause for use in facilitating determination of how an IDC subject should be treated.
27. The method according to clause 26, further comprising treating the subject based on whether or not the subject is identified as having an LGG or HGG tumour.
28. A kit comprising a substrate to which is bound a plurality of probes, wherein at least one of said plurality of probes is capable of specifically binding to each of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR.
29. An assay/chip comprising, consisting essentially of, or consisting of probes for the detection of a level of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR in a sample.
30. A PCR-based assay comprising, consisting essentially of, or consisting of primers for the detection of a level of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR in a sample.
In an aspect the present invention provides a method for classifying a subject with IDC breast cancer into one of two specific grades, low genetic grade (LGG) or high genetic grade (HGG), said method comprising detecting a level of expression of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR and stratifying the subject into low genetic grade or high genetic grade based upon the expression levels of the identified genes.
In some embodiments the present invention provides a method for use in providing a prognosis of a subject with breast cancer, such as IDC breast cancer, the method comprising detecting a level of expression of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR and providing a prognosis based upon the expression levels of the identified genes. Relatively high expression of one or more of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and/or TICRR is typically associated with high genetic grade tumours as defined herein, and hence with an aggressive tumour class and a poor prognosis. Conversely, relatively low expression of these genes is typically associated with low genetic grade tumours as defined herein, and hence with a low-aggressive tumour class and a good prognosis, or at least a prognosis which is better than that for high genetic grade tumours. Relatively high expression of one or more of CAPN8, NAT1, NOSTRIN, and/or KIF13B is typically associated with low genetic grade tumours as defined herein, and hence with a low-aggressive tumour class and a good prognosis, or a better prognosis than that of a subject who has a high grade tumour or tumours. Conversely, relatively low expression of these genes is typically associated with high genetic grade tumours as defined herein, a high-aggressive tumour class and hence a poor prognosis, or at least a prognosis which is worse than that for low genetic grade tumours. The inventors analysed a number of markers for their ability to classify patients into low and high grade tumours and hence to provide an indication of prognosis based on the grade of the tumour. Understanding the grade of the tumour can also allow a more informed decision to be made in terms of patient management and potential treatment.
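The direction of association described above can be illustrated with a simple signature score. The sketch below is a minimal example under stated assumptions and is not the inventors' classifier: the clauses name the statistically weighted syndrome algorithm, which is not reproduced here. It assumes log-scale expression values, standardises each gene against a reference cohort, and subtracts the mean z-score of the LGG-associated genes from that of the HGG-associated genes. The function names, the zero threshold and the treatment of FAM72A/FAM72B/FAM72C/FAM72D as one marker labelled "FAM72" are hypothetical. KIF13B is listed with both groups in the text, so the sketch leaves it out.

```python
from statistics import mean, stdev

# Hypothetical gene groups following the direction of association in the text.
# FAM72A/B/C/D are treated as one marker ("FAM72"); KIF13B is omitted because
# the text lists it with both groups.
HGG_UP = ["CENPA", "CENPN", "FAM72", "MELK", "CDCA8", "MYBL2", "CDC45", "BUB1",
          "KIF2C", "UBE2C", "ORC6", "KIF14", "SHCBP1", "PIF1", "CDCA5",
          "MCM10", "MTFR2", "TICRR"]
LGG_UP = ["CAPN8", "NAT1", "NOSTRIN"]


def zscores(sample, cohort):
    """Standardise each gene in `sample` against the reference `cohort`.

    sample: dict gene -> log-expression value for one tumour.
    cohort: dict gene -> list of log-expression values in a reference cohort.
    """
    return {
        gene: (value - mean(cohort[gene])) / stdev(cohort[gene])
        for gene, value in sample.items()
        if gene in cohort
    }


def grade_score(sample, cohort):
    """Mean z-score of HGG-associated genes minus that of LGG-associated genes."""
    z = zscores(sample, cohort)
    up = [z[g] for g in HGG_UP if g in z]
    down = [z[g] for g in LGG_UP if g in z]
    if not up or not down:
        raise ValueError("need at least one gene from each group")
    return mean(up) - mean(down)


def classify(sample, cohort, threshold=0.0):
    """Return 'HGG' if the score exceeds a (hypothetical) threshold, else 'LGG'."""
    return "HGG" if grade_score(sample, cohort) > threshold else "LGG"


if __name__ == "__main__":
    # Toy reference cohort: every gene has values 4, 5, 6 (mean 5, SD 1).
    cohort = {g: [4.0, 5.0, 6.0] for g in HGG_UP + LGG_UP}
    tumour = {**{g: 7.0 for g in HGG_UP}, **{g: 3.0 for g in LGG_UP}}
    print(grade_score(tumour, cohort), classify(tumour, cohort))  # 4.0 HGG
```

A positive score means the tumour resembles the HGG direction more than the reference cohort average does; in practice any threshold would need to be calibrated on a labelled cohort.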
It will be appreciated that "prognosis" is distinct from "diagnosis". "Prognosis" refers to a prediction about how a disease will develop, for example the lifespan of the subject. In contrast, "diagnosis" refers to the identification of a disease. The prognosis of subjects may be categorised as "good" or "poor". For the purposes of the present invention, a "good" prognosis may be considered to relate to a survival time of 2 years or more. In some embodiments, a "good" prognosis may be defined as a survival time of 6 years or more. In another embodiment, a "good" prognosis may be defined as a survival time of 10 years or more. In contrast, a "poor" prognosis may be defined as a survival time of less than 2 years, less than 1 year or even shorter. For the purposes of this invention, "prognosis" has been calculated as the median survival time of a cohort grouped by high or low expression of a marker, i.e. the time for which 50% of the population in the cohort will survive. Consequently, "prognosis" may be understood to mean a predicted survival time. In some cases, prognostic biomarkers may also be used for cancer classification, for diagnosis and for predicting the outcome of therapeutic treatment.
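When some patients in a cohort are still alive at last follow-up (censored), a median survival time is commonly estimated with the Kaplan-Meier product-limit estimator. The text does not say which estimator was used, so the sketch below only illustrates one standard approach; the function name is hypothetical. It returns the earliest time at which estimated survival falls to 50% or below, or None if the median is not reached.

```python
def km_median_survival(times, events):
    """Kaplan-Meier estimate of median survival time.

    times:  follow-up time for each patient (e.g. years).
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns the smallest time t with estimated S(t) <= 0.5, or None if the
    estimated survival never drops to 0.5 (median not reached).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same time point (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            if survival <= 0.5:
                return t
        at_risk -= removed
    return None


if __name__ == "__main__":
    # Follow-up in years; 0 marks a patient censored (alive at last contact).
    print(km_median_survival([1, 2, 3, 4], [1, 0, 1, 1]))  # 3
```

Applied separately to the high-expression and low-expression groups for a marker, this gives the two median survival times that such a comparison rests on.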
For the purposes of this invention, increased expression (which may, for example, be normalised and/or determined by comparison to a reference value or values) may be referred to as "relatively high", while decreased expression may be referred to as "relatively low".
It will be understood that, for the purposes of the present invention, increased or decreased expression of a gene, RNA and/or protein may be detected in relation to a reference value. The reference value may comprise a mean and/or a median value for the level of expression of the gene, RNA and/or protein, whereby the mean and/or median is calculated from a known cohort of subjects without disease. Skilled addressees will be aware of publicly available cohort databases, such as TCGA, METABRIC and GEO/NCBI, from which reference values can be obtained and used as a comparison. Alternatively, the reference value may be obtained from a cohort of patients generated by the practitioner. Hereinafter "gene, RNA and/or protein" with respect to expression will simply be referred to as "marker", but this is not to be construed as limiting.
How to calculate a mean or median value is well known. In some embodiments, the reference value may comprise both the mean and the median of the selected marker. In alternative embodiments, the reference value may comprise the mean. In other embodiments, the reference value may comprise the median. In one aspect, the differential decreased expression of a marker in relation to the reference value may be about 0.5 times, 1 times, 1.5 times, 2.0 times, 3.0 times, 5 times, 10 times or alternatively about 50 times lower than the reference value of expression for the marker. Preferably, the differential decreased expression of the marker in relation to the reference value may be between 0.5 and 50 times lower, inclusive. More preferably, the differential decreased expression of the marker in relation to the reference value may be between 0.5 and 5 times lower, inclusive. In another aspect, the differential increased expression of the marker in relation to the reference value may be about 0.5 times, 1 times, 1.5 times, 2.0 times, 3.0 times, 5 times, 10 times or alternatively about 50 times higher than the reference value of expression for the marker. Preferably, the differential increased expression of the marker in relation to the reference value may be between 0.5 and 50 times higher, inclusive. More preferably, the differential increased expression of the marker in relation to the reference value may be between 0.5 and 5 times higher, inclusive.
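A minimal sketch of the comparison against a reference value, assuming linear-scale (not log-transformed) expression values and a reference cohort supplied as a plain list. The 2-fold cut-off and all function names are hypothetical choices for illustration; the text quotes ranges of fold differences rather than a single threshold.

```python
from statistics import mean, median


def reference_value(cohort_values, method="median"):
    """Reference expression level for one marker from a reference cohort."""
    if method == "median":
        return median(cohort_values)
    if method == "mean":
        return mean(cohort_values)
    raise ValueError(f"unknown method: {method}")


def fold_change(value, reference):
    """Ratio of the sample's expression to the reference (linear scale)."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return value / reference


def call_expression(value, cohort_values, min_fold=2.0, method="median"):
    """Label a marker 'relatively high', 'relatively low' or 'unchanged'.

    min_fold is a hypothetical cut-off applied symmetrically: at least
    min_fold times the reference is 'relatively high', at most 1/min_fold
    times the reference is 'relatively low'.
    """
    fc = fold_change(value, reference_value(cohort_values, method))
    if fc >= min_fold:
        return "relatively high"
    if fc <= 1.0 / min_fold:
        return "relatively low"
    return "unchanged"


if __name__ == "__main__":
    cohort = [10.0, 20.0, 30.0]           # reference median = 20
    print(call_expression(50.0, cohort))  # relatively high (2.5-fold)
```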
The level of marker expression may be normalised. In some embodiments, marker expression may be normalised against the expression of another endogenous, regulated reference marker obtained from the sample. In an alternative embodiment, marker expression may be normalised against total cellular DNA from the sample. In some embodiments, marker expression may be normalised against total cellular RNA from the sample. In some embodiments, marker expression may be normalised against the length of the marker nucleotide transcript. For the purposes of this invention the term "transcript" relates to RNA, in particular mRNA, and DNA, in particular cDNA. The skilled addressee will be aware that the total number of reads for a given transcript is proportional to the expression level of the transcript multiplied by its length. For example, a long transcript will have more reads mapping to it than a short transcript of similar expression. Various normalisation methods are known in the art, and it is to be appreciated that the above normalisation methods are in no way limiting. Thus, alternative normalisation techniques not described herein may also be used.
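Because read counts scale with transcript length, one common way to normalise sequencing data is transcripts per million (TPM): divide each count by the transcript length in kilobases, then rescale so that the values sum to one million. TPM is shown here only as one widely used example; the text does not prescribe a particular method, and the function name is hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths.

    counts:     dict transcript -> number of mapped reads.
    lengths_bp: dict transcript -> transcript length in base pairs.
    """
    if set(counts) != set(lengths_bp):
        raise ValueError("counts and lengths must cover the same transcripts")
    # Reads per kilobase: removes the length bias described in the text.
    rates = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    total = sum(rates.values())
    # Rescale so that the values for the sample sum to one million.
    return {t: r / total * 1e6 for t, r in rates.items()}


if __name__ == "__main__":
    print(tpm({"A": 100, "B": 100}, {"A": 1000, "B": 2000}))
```

Two transcripts with the same read count but a two-fold length difference end up with a two-fold difference in TPM.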
It will be appreciated that said markers may be detected by probes specific for the markers, which may be provided for use in a kit or be a feature of a kit. For example, said probes may be provided bound to a substrate in a kit. The substrate may comprise probes capable of specifically binding said markers. Alternatively, the substrate may comprise primers capable of specifically binding said markers, or antibodies capable of specifically binding said markers. The substrate may comprise any combination of probes, antibodies and/or primers. Probes may be detectably labelled, for example with a fluorescent or luminescent label. The kit may further comprise instructions for use, such as with an assay system. Kits for use in the detection of RNA or DNA markers may comprise at least two probes or primers per marker to be detected. Kits for use in the present methods may comprise reagents for the synthesis of cDNA. Typically, kits, assay chips or the like will comprise a finite number of probes, antibodies and/or primers, sufficient to detect the markers identified herein. Although one aspect of the present invention is directed to a specific 22-gene set, more genes than this set may be considered, and each kit may comprise probes, antibodies and/or primers sufficient to permit up to 60, 50, 40 or 25 genes, or only the specifically identified 22 genes, to be detected and their expression levels determined.
The markers of the present invention comprise, consist essentially of or consist of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR. It is to be understood that the markers detected in the present invention relate to genes. Consequently, the markers may comprise DNA, RNA or the protein/polypeptide product of the gene. Variants of the gene will also be known in the art and are also included in the term marker. The term thus includes mutant DNA, RNA or polypeptide sequences, and allelic, splice and post-translationally modified forms which are known in the art or may be discovered in the future. In particular, the term marker includes mRNA and cDNA. Optionally, the markers to be assayed may comprise protein/polypeptide. Preferably, the markers may comprise RNA. Alternatively, the markers may comprise DNA. In one embodiment the markers may comprise cDNA. The cDNA may be synthesized from mRNA. In one embodiment the markers may comprise DNA and RNA. Although the remainder of this disclosure is directed to RNA markers and the resulting cDNA, this should not be construed as limiting in any way, as other expression products, as listed above, may alternatively be detected.
In accordance with the present invention, the sample is any appropriate tissue sample obtained from the subject. In one embodiment the sample is a breast tissue sample obtained from the subject. In another aspect the sample is any appropriate fluid sample obtained from the subject. In one embodiment the sample may comprise lymph fluid obtained from lymph nodes adjacent to a tumour. Any sample may be obtained by biopsy, for example during surgery. By "biopsy" we include excisional and incisional biopsies. The term "biopsy" further includes partial or gross resection. Samples may alternatively be obtained by other methods known in the art. In one embodiment of the invention there is provided a method of facilitating treatment for a subject with IDC breast cancer, said method comprising detecting a level of expression of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR in a sample from the subject, providing a prognosis based upon the expression level of said marker or markers, and selecting and/or administering a treatment based on the prognosis. The expression level values obtained may be used by the clinician in assessing any of the following: (a) probable or likely suitability of a subject to initially receive treatment; (b) probable or likely unsuitability of an individual to initially receive treatment; (c) dosage of treatment; (d) start date to begin treatment; (e) duration of treatment course; and (f) type of treatment to be administered. Example treatments may include, but are not limited to, radiotherapy, chemotherapy, anti-angiogenic compounds and/or surgery. A preferred therapy comprises surgical removal of tissue. In other embodiments therapy comprises the administration of anti-angiogenic compounds.
For subjects identified with low grade tumours, less aggressive forms of treatment may be indicated, whereas for subjects with high grade tumours, more aggressive forms of treatment may be indicated. However, without wishing to be bound by theory, the present inventors have observed that low and high grade classifications may be the result of alternative pathways of cancer progression. Thus, the targets for treatment may differ, and this may lead to different therapies being proposed for low and high grade tumours. For example, for subjects with low grade tumours, minor surgical tumour resection and/or drug therapy may be indicated. Conversely, for subjects with high grade tumours, full mastectomy and additional chemotherapy and/or radiotherapy may be indicated. Any particular therapy regime will be determined by the treating physician using their skill and the information available to them. However, the present invention can be seen as an aid in assisting the physician to decide, in consultation with the subject, how the subject should be treated.
In accordance with the present invention, assay systems are provided. The assay systems may comprise a measurement device that measures marker expression levels. The system may further comprise a data transformation device that acquires marker expression level data and performs data transformation to calculate whether the level determined in the sample is increased, decreased or equal relative to a reference value for the marker in question.
In some embodiments, the assay system may also comprise an output interface device, such as a user interface output device, to output data to a user. Preferably, the assay system also includes a database of reference values, wherein the device identifies a low or high grade tumour, or a good or bad prognosis, upon analysis of the collective expression of the markers. In one embodiment the device retrieves treatment information from the database for the low or high grade tumour, or good or bad prognosis, and outputs the treatment information to the user interface output device. In one embodiment the user interface output device may provide an output to the user comprising a notification that the subject's gene expression is increased or decreased relative to the reference value, that this relates to a low or high grade tumour, or a good or bad prognosis, and whether a suitable therapy, such as radiotherapy, chemotherapy, anti-angiogenic compounds and/or surgery, should be administered. In an alternative embodiment, the user interface output device may provide an output to the user giving information on a low or high grade tumour, or a good or bad prognosis, and, if treatment is suitable, a deadline by which treatment should begin.
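The flow described for the assay system (measurement, transformation against a reference database, output to a user interface) can be sketched as follows. Everything here is hypothetical: the function names, the 10% tolerance band used for an "equal" call, and the simple vote that counts markers moving in the HGG direction (HGG-associated markers increased, LGG-associated markers decreased) against those moving in the LGG direction, with ties reported as LGG.

```python
from dataclasses import dataclass


@dataclass
class Report:
    calls: dict  # marker -> "increased" / "decreased" / "equal"
    grade: str   # indicated tumour class


def transform(measurements, reference_db, tolerance=0.1):
    """Compare each measured marker with its stored reference value.

    A marker within +/- tolerance (as a fraction) of its reference is 'equal'.
    """
    calls = {}
    for marker, value in measurements.items():
        ref = reference_db[marker]
        if value > ref * (1 + tolerance):
            calls[marker] = "increased"
        elif value < ref * (1 - tolerance):
            calls[marker] = "decreased"
        else:
            calls[marker] = "equal"
    return calls


def assess(measurements, reference_db, hgg_markers, lgg_markers):
    """Net count of markers moving in the HGG direction versus the LGG direction."""
    calls = transform(measurements, reference_db)
    step = {"increased": 1, "decreased": -1, "equal": 0}
    votes = sum(step[calls[m]] for m in hgg_markers if m in calls)
    votes -= sum(step[calls[m]] for m in lgg_markers if m in calls)
    grade = "high genetic grade (HGG)" if votes > 0 else "low genetic grade (LGG)"
    return Report(calls, grade)


def render(report):
    """Plain-text output for a user interface output device."""
    lines = [f"{m}: {c} relative to reference"
             for m, c in sorted(report.calls.items())]
    lines.append(f"Indicated class: {report.grade}")
    return "\n".join(lines)


if __name__ == "__main__":
    reference_db = {"MELK": 10.0, "NAT1": 10.0}
    report = assess({"MELK": 20.0, "NAT1": 5.0}, reference_db,
                    hgg_markers=["MELK"], lgg_markers=["NAT1"])
    print(render(report))
```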
In one embodiment, the output interface device is remote from the user of the input device. For example, a subject's sample may be analysed in a local clinic or laboratory, and the results transmitted to a clinician or health care worker at a remote location. Thus, results can be transmitted immediately, ensuring the timely release of information so that the relevant treatment can be started as soon as possible, particularly when a poor prognosis is indicated. By ensuring that the most suitable treatment starts at the most appropriate time, such an assay may provide subjects given a poor prognosis with better treatment options and, in doing so, potentially a longer life span and/or better quality of life.
Assessment of DNA, RNA or polypeptide/protein expression levels is routine in the art. One example of a method of measuring protein levels provided in the invention is Western blotting or immunohistochemistry using antibodies to particular markers. Other protein assays may include radioimmunoassay (RIA), enzyme-linked immunosorbent assay (ELISA) or flow cytometry. Suitable RNA detection methods may include nucleic acid hybridisation (Northern blotting) or nucleic acid amplification. In some embodiments, the nucleic acid hybridisation is performed using a solid-phase nucleic acid molecule array. In other preferred embodiments, the nucleic acid amplification method is reverse transcriptase PCR (RT-PCR). Two common methods for the detection of products in RT-PCR are: (1) non-specific fluorescent dyes that intercalate with any double-stranded DNA, and (2) sequence-specific DNA probes consisting of oligonucleotides labelled with a reporter, such as a fluorescent label, which permits detection only after hybridisation of the probe with its complementary sequence, to quantify messenger RNA (mRNA). Further details with regard to RT-PCR will be known to those skilled in the art and can be found in common laboratory manuals (e.g. Sambrook and Russell, Molecular Cloning: A laboratory Manual, CSHL Press, 2001). In some embodiments, DNA detection methods may include nucleic acid hybridisation (Southern blotting) or nucleic acid amplification. Preferably, the nucleic acid amplification method is PCR. In some embodiments the nucleic acid detection method comprises a DNA microarray. In one embodiment the nucleic acid detection method is next-generation sequencing (NGS). It will be appreciated that nucleic acid transcripts detected by next-generation sequencing may be normalised by transcript length.
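For RT-PCR, relative expression is often computed with the comparative Ct (2^-ΔΔCt) method, which normalises the target gene against an endogenous reference gene and then against a calibrator sample, such as a sample from a reference cohort. The text does not specify this calculation; it is shown only as one standard option, it assumes roughly 100% amplification efficiency for both genes, and the function name is hypothetical.

```python
def relative_expression_ddct(ct_target_sample, ct_reference_sample,
                             ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene (sample vs calibrator) by 2^(-ddCt).

    Each argument is a threshold cycle (Ct); a lower Ct means more template.
    Assumes ~100% amplification efficiency, i.e. one doubling per cycle.
    """
    # Normalise the target gene to the endogenous reference gene in each sample.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Compare the normalised values between the sample and the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Target Ct 22 vs reference 18 in the tumour; 24 vs 18 in the calibrator.
    print(relative_expression_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```

A result above 1 indicates higher target expression in the tumour than in the calibrator, which could then feed the "relatively high"/"relatively low" comparison described earlier.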
Further details with regard to DNA detection techniques will be known to skilled addressees and can be found in common laboratory manuals, for example Sambrook and Russell, Molecular Cloning: A Laboratory Manual, CSHL Press, 2001. Various methods of next-generation sequencing are known to the skilled addressee, who will look to NGS system providers' websites for reference (including, but not limited to: http://res.illumina.com/documents/products/illumina sequencing introduction.pdf; https://www.qiagen.com/gb/products/next-gen-sequencing). In one embodiment there is provided a diagnostic chip for use in the present methods. The chip comprises, consists essentially of or consists of a probe or probes for the detection of a level of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR in a sample. In some embodiments there is provided a diagnostic chip for use in the present methods, wherein said chip comprises a plurality of probes and/or primers which are collectively capable of specifically binding all of the above markers. Thus, there is provided a diagnostic chip for use in the present methods, said methods comprising detecting a level of expression of all the above markers in a sample from the subject and providing a prognosis based upon the expression level of said markers. Preferably, there is provided a diagnostic chip for use in the present methods, wherein said chip comprises a plurality of probes and/or primers which are collectively capable of specifically binding CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR.
More preferably, there is provided a diagnostic chip for use in the present methods, wherein said chip comprises a plurality of probes and/or primers which are collectively capable of specifically binding CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR. The diagnostic chip may comprise a traditional, solid-phase array. Alternatively, the diagnostic chip may comprise a bead array. Diagnostic chips may also be referred to as DNA microarrays. Various diagnostic chips are known to those skilled in the art and include, but are not limited to, Affymetrix chips, Agilent products and/or Illumina products. In accordance with the invention, the probes and/or primers for the diagnostic chip may be bound to a surface. In one embodiment, the preferred surface is silica or glass. In another embodiment, the preferred surface is plastic. In some embodiments, the probes and/or primers for the diagnostic chip may be bound to polystyrene beads.
Preferably, oligonucleotides may be used as probes or primers. Oligonucleotides for use within a kit may be labelled to enable their detection. Fluorescent labels may be used to enable direct detection. Alternatively, labels may be detected indirectly. Indirect detection methods are known in the art and include, but are not limited to, biotin-avidin interactions and antibody binding. Fluorescently labelled oligonucleotides may also contain a quenching molecule.
As well as the above description, which is generally directed to assays and materials focusing on a 22-gene signature and its use, it will be seen from the claims and the following detailed description that the present invention can more broadly be viewed in terms of one or more methods for classifying IDC breast cancer subjects into LGG and HGG cohorts. The present invention provides a number of methods which can be used alone or in combination to provide suitable classification. Nevertheless, in a particularly preferred embodiment, the present invention is directed to at least the use of a group of signature genes comprising the 22-gene signature as described herein to classify subjects into LGG and HGG cohorts.
The evidence presented herein suggests that, for example, if in biopsy material from an IDC breast cancer patient most (e.g. more than 10, 11, 12, 13, 14, 15 or 16, or all 17) of the 17 specific genes (CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR) of the 22g-TAG genes are expressed above their computationally-defined cut-off values, and most (e.g. 3, 4, or all 5) of the 5 specific genes (NAT1, NOSTRIN, CAPN8, KIF13B) of the 22g-TAG genes are expressed below their computationally-defined cut-off values, then the tumor is classified as highly aggressive (HGG). Moreover, the risk of disease recurrence is expected to be high compared with patients whose tumors are predicted by the diagnostic test to belong to the LGG class. The converse is true with respect to classifying a tumour as LGG. The evidence presented also suggests that such a prediction can be made quantitatively even if the tumors were clinically classified as HG2 (see Figure 1, Figure 2 and other results). Additionally, further evidence, such as mutation and DNA copy number data, provides further support; these data can further specify the disease outcome for a given subject. The additional tests and their specific patterns allow a clinician to further specify/predict the most critical genetic alterations, which may be used to improve the diagnostic outcome and/or to select gene, immuno-, chemo- or hormone therapy alone or to propose optimal combinations.
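The decision rule above can be sketched as a simple vote over two gene groups. This is a minimal illustration, not the claimed classifier: the cut-off values, the reduced five-gene panel and the required majorities (`min_up`, `min_down`) are hypothetical placeholders, because the computationally-defined cut-offs are not given here. Returning "indeterminate" when neither pattern reaches the majority is also an assumption.

```python
from typing import Dict, Iterable


def classify_tumour(expr: Dict[str, float],
                    cutoffs: Dict[str, float],
                    up_genes: Iterable[str],
                    down_genes: Iterable[str],
                    min_up: int,
                    min_down: int) -> str:
    """Majority-vote HGG/LGG call from per-gene expression cut-offs.

    up_genes: genes whose expression ABOVE the cut-off points towards HGG.
    down_genes: genes whose expression BELOW the cut-off points towards HGG.
    min_up / min_down: how many genes in each group must agree ("most" in the text).
    """
    up_genes, down_genes = list(up_genes), list(down_genes)
    up_high = sum(expr[g] > cutoffs[g] for g in up_genes)
    up_low = sum(expr[g] < cutoffs[g] for g in up_genes)
    down_high = sum(expr[g] > cutoffs[g] for g in down_genes)
    down_low = sum(expr[g] < cutoffs[g] for g in down_genes)
    if up_high >= min_up and down_low >= min_down:
        return "HGG"  # first group mostly high, second group mostly low
    if up_low >= min_up and down_high >= min_down:
        return "LGG"  # the converse pattern
    return "indeterminate"  # neither pattern reaches the required majority


# Hypothetical cut-offs and expression values for a reduced gene panel
cutoffs = {"MELK": 5.0, "BUB1": 4.0, "UBE2C": 6.0, "NAT1": 7.0, "CAPN8": 3.0}
up, down = ["MELK", "BUB1", "UBE2C"], ["NAT1", "CAPN8"]
tumour_a = {"MELK": 6.2, "BUB1": 5.1, "UBE2C": 5.5, "NAT1": 4.0, "CAPN8": 1.2}
tumour_b = {"MELK": 3.0, "BUB1": 2.0, "UBE2C": 4.0, "NAT1": 9.0, "CAPN8": 5.0}
call_a = classify_tumour(tumour_a, cutoffs, up, down, min_up=2, min_down=2)
call_b = classify_tumour(tumour_b, cutoffs, up, down, min_up=2, min_down=2)
```

With these toy values `tumour_a` is called HGG (two of three first-group genes above cut-off, both second-group genes below) and `tumour_b` LGG; a full implementation would use the complete gene groups listed above.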
Each of the further diagnostic and prognostic features (e.g. stem cell genotype and phenotype) can further personalize the tumour classification or, in parallel, the disease subtypes and tumor dynamics, and may dictate the treatment strategy. Again, this invention proposes a novel multi-level system for classifying tumor class (LGG or HGG) and essential tumor features, provides prognosis and, optionally, a novel genetically supported opportunity for rational, personalized and precise therapy. The strength of the present work lies in its genetically-defined determinants and statistically validated structural and functional biomarkers. The present system was derived and integrated quantitatively and focuses on the synergy of the diagnostic and prognostic biomarkers. The various features may work in combination to provide an individual patient's diagnostic outcome, with relevance to precision medicine, disease outcome prediction and next-generation biomarker applications.
DETAILED DESCRIPTION OF THE INVENTION
The present invention will now be further described by way of example and with reference to the tables and figures which show:
Table 1: Overview of the clinical information of the TCGA cohort.
A): A summary of clinical parameters for each histological grade of the 430 IDC tumors of the TCGA cohort.
B): A summary of clinical parameters for each genetic subclass of HG2 tumors derived from the 22g-TAG classifier.
Table 2: Summary of differentially altered genes (DAG) of copy number variation:
Summary of the number of genes and their genomic locations that showed differentially altered copy number profiles across different genetic grades:
A): The DAG between HG1-like and HG3-like tumors.
B): The DAG between HG1-like and HG1 tumors.
C): The DAG between HG3-like and HG3 tumors.
D): The DAG between LGG and HGG tumors.
Table 3: The association of histological grade sub-classification and intrinsic subtypes:
Breast cancer intrinsic subtype stratification according to histological and genetic grade classification of the TCGA (A), Uppsala and Stockholm (B) breast cancer cohorts:
Low genetic grade LGG (HG1+HG1-like) and high genetic grade HGG (HG3-like+HG3) are considered two major genetically distinct classes of BC.
Table 4: Results of PAM pattern recognition analysis applied to HG1 and HG3 tumors.
Confusion matrices show the training accuracies of seven training sets using PAM classifiers. The seven training sets result from an under-sampling procedure applied to HG3 tumors to overcome the imbalanced training dataset.
PAM classification parameters (Class Error rate; Shrinking threshold; Overall error rate) and the number of top discriminative probesets resulted from each training are shown.
Table 5: tumor aggressiveness grading signature.
A): The annotation of the 39 probesets corresponding to the 22g-TAG signature for TCGA gene expression level 2, Agilent G4502A_07_3 platform.
B): The periodicity of the 22g-TAG genes based on experimental data from the Cyclebase database and other published databases. The p-values quantify the periodicity and regulation through the cell cycle of a given gene. The rank represents the rank of the gene's periodicity among all periodic genes; the smaller the rank, the more significant the periodicity.
Table 6: the frequency of 22g-TAG signature genes' occurrence in 72 breast cancer signatures
The occurrence of 22g-TAG breast cancer signature genes in 72 breast cancer-related signatures. As comparative examples, we included two molecular grading signatures: a subset of 212 breast cancer genetic grading genes (represented by 264 Affymetrix probesets; Ivshina et al., 2006) and the 5-gene genetic grading signature (represented by 6 Affymetrix probesets; Ivshina et al., 2006). The number of occurrences is the number of reference signatures that contain a given gene of the 22g-TAG breast cancer signature.
Table 7: Survival prediction of patients grouped into low- and high-risk subclasses derived using 22g-TAG genes.
Survival prediction analysis of 22g-TAG was performed using the 1D-DDg method based on gene expression cut-off values estimated using the Cox proportional hazards model in four independent cohorts: A: Uppsala, B: Stockholm, C: Singapore and D: Marseille. 1D-DDg-defined cut-off: gene expression cut-off corresponding to the most significant separation of the patients into low- and high-risk groups. 1D-DDg p-value: log-rank statistic p-value corresponding to the optimized 1D-DDg-defined cut-off value. C.I.: Confidence Interval. Model design: 1: tumor suppressor-like gene (low expression is related to relatively poor prognosis) and 2: oncogene-like gene (high expression is related to relatively poor prognosis).
Table 8: Univariate and multivariate survival analyses of the SWS-derived 22g-TAG signature providing the low- and high-grade IDC classification
Univariate and multivariate survival analyses of the 22g-TAG signature and common clinical parameters using the Cox proportional hazards model in four independent cohorts.
Table 9: Gene ontology enrichment analysis of 22g-TAG molecular signature genes.
Functionally enriched biological process, cellular components and pathways associated with the 22g-TAG genes using DAVID gene ontology tool.
Table 10: Characteristics of genes and transcribed loci (represented by probesets) that are differentially expressed between HG1-like and HG3-like tumors, defined based on the 22g-TAG classifier.
The fold changes represent the ratio of the median expression level in HG3-like tumors to the median expression level in HG1-like tumors. A two-tailed Wilcoxon test was used to assess the significance of the difference in gene expression profile between HG1-like and HG3-like tumors. Multiple probesets and transcribed isoforms can be associated with a gene.
Table 11: Gene ontology and functional enrichment analysis for differentially expressed genes between HG1-like and HG3-like tumors.
Gene ontology and functional enrichment analysis for up- and down-regulated genes in HG3-like tumors with respect to HG1-like tumors using DAVID Gene Ontology tool.
Table 12: A list of differentially altered genes between HG1-like and HG3-like tumors.
A list of 1,214 differentially altered genes between HG1-like and HG3-like tumors. The copy number intensity values were log2 transformed (diploid=1).
Table 13: Contingency tables of the agreement between 22g-TAG derived genetic grades and classes resulting from unsupervised hierarchical clustering.
Contingency tables show frequencies produced by cross-classifying genetic grades and hierarchical clustering classes. The hierarchical clustering was performed on 4,933 genes that were differentially expressed between HG1-like and HG3-like tumors using Euclidean distance and the average linkage agglomerative method.
Table 14: Lists of genes that were differentially expressed between LGG and HGG tumors.
List of 3,073 and 2,618 genes that were up- and down-regulated, respectively, in HGG tumors relative to LGG tumors. Each gene could be represented by more than one probeset, or one probeset could represent multiple genes.
Table 15: Gene ontology and functional enrichment analysis.
Gene ontology and functional enrichment analysis for up- and down-regulated genes in HGG tumors with respect to LGG tumors using DAVID Gene Ontology tool.
Table 16: A list of differentially altered genes between LGG and HGG tumors.
Summary of copy number variation (CNV) profiles of 1,858 genes and their CNV event percentages in LGG and HGG tumors. Two-tailed Wilcoxon test p-values assess the difference between CNV profiles of LGG and HGG tumors.
Table 17: Test of the agreement between DNA and RNA based sub-classification of HG2 tumors
A contingency table shows frequencies produced by cross-classifying the HG2 samples based on copy number variation and gene expression classifications. The statistical significance of the agreement between both classifications was assessed using Cohen's kappa coefficient.
Table 18: Enrichment analysis of stem cell related genes among molecular grading related genes.
A: The significant enrichment of genes associated with 21 embryonic stem cell lines, as obtained by SAGE, among the genes that were differentially expressed between HG1-like and HG3-like tumors, specifically among the genes that are up-regulated in HG3-like tumors with respect to HG1-like tumors. The analysis was performed using the DAVID gene ontology tool.
B: List of 106 genes that are commonly expressed in all 21 embryonic stem cell lines obtained by SAGE (only 84 of these genes are represented on the Agilent G4502A_07_3 platform).
Table 19: Test of the agreement between genetic grades and clusters associated with stem cell related genes.
A contingency table shows frequencies produced by cross-classifying patients based on genetic grades and classes resulting from unsupervised hierarchical clustering of the 106 genes. These genes are commonly expressed in the 21 embryonic stem cell lines studied in the CGAP SAGE database. Hierarchical clustering was performed using Euclidean distance and the average linkage agglomerative method. The statistical significance of the agreement between both classifications was assessed using Cohen's kappa coefficient.
Table 20: Summary of clinical parameters of the 22g-TAG breast cancer patients' cohort and PCR primers used in qPCR validation.
A): Summary of clinical parameters of the patients' cohort used in the qPCR-based grading validation of the 22g-TAG signature.
B): PCR primer sequences associated with 22g-TAG genes.
Figure 1: Schematic overview of the gene expression-based sub-classification of histological grade 2 (HG2) samples into HG1-like and HG3-like.
A): The basic concept of HG2 dichotomization based on the pattern recognition analysis supervised by the gene expression of HG1 and HG3 tumors.
B): the workflow of our methodology of the sub-classification of HG2 and integrative data analyses of different genetic grades obtained by 22g-TAG classifier of TCGA cohort.
DEGs: Differentially Expressed Genes; DAGs: Differentially Altered Genes; HG: Histological grades; SWS: Statistically Weighted Syndrome algorithm; PAM: Prediction Analysis of Microarray algorithm; LGG: Low Genetic Grades; HGG: High Genetic Grades.
Figure 2: Functional and network analyses of 22g-TAG signature:
A): The differences in gene expression profiles between HG1-like and HG3-like samples for 5 genes from the 22g-TAG signature.
B): The peak expression of the 5 genes in A at the G2/M phase of the cell cycle. P is the p-value, which assesses the periodicity of a gene during the cell cycle according to the Cyclebase database.
C): Network analysis of 22g-TAG signature genes using the MetaCore network analysis tool.
D): Kaplan-Meier curves of LGG and HGG patients' disease-free survival classified based on qPCR data of 22g-TAG genes.
E): Examples of the difference in qPCR-based expression for 2 genes of 22g-TAG for all histological and genetic grades of IDC patients.
F): Heatmap of Kendall tau correlation coefficients between 22g-TAG genes using their qPCR-based relative expression profiles.
Figure 3: Major genomic and transcriptomic variations between subclasses of IDC determined by 22g-TAG classifier.
A): Box plots of the number of reference deviated genes (RDG) per sample for histological and genetic grades of IDC associated with the 22g-TAG classifier.
B): Box plots of the number of altered genes (AG) per sample for histological and genetic grades of IDC associated with the 22g-TAG classifier.
C): Box plots of mutation counts per sample for histological and genetic grades of IDC associated with the 22g-TAG classifier.
The differences in the numbers of RDG, altered genes or mutations counts between different combinations of genetic grades were assessed statistically using two-tailed Wilcoxon test.
D): Bar plots of mutation counts per sample for different genetic grades for TP53 and PIK3CA.
E): Bar plots of mutation counts per sample in LGG and HGG tumors for 12 genes that are significantly correlated with LGG and HGG classification. P is the p-value of Fisher's exact test.
Figure 4: Copy number variation visualization of selected chromosomes in which the differentially altered genes between LGG and HGG are enriched.
A): Copy number variation for chromosomes 8, 16, and 17. For each chromosome, three bars are shown:
- The upper bar is a plot of the negative log p-value of the Wilcoxon test per gene against its transcription start site. The Wilcoxon test assesses the difference in CNV profile between LGG and HGG tumors for each gene.
- The middle bar shows the median values of the CNV signal intensities of LGG (green) and HGG (red) tumors per gene against its transcription start site.
- The lower bar is the ideogram of the corresponding chromosome (centromere in red).
B): Distributions of Kendall's tau correlation coefficients between CNV and corresponding gene expression of differentially altered genes between LGG and HGG tumors (red), non-differentially altered genes (remaining genes in the genome, blue), and a random match between the CNV profile and gene expression profile as a control distribution (n represents the number of different combinations of matching the CNV profiles of genes with the expression profiles of multiple probesets).
C): Copy number variation visualization of chromosome 22.
Figure 5: Progression model for LGG and HGG tumors
IDC tumors progression model shows the major genetic events that dichotomize and characterize each oncogenic pathway of LGG and HGG tumors. DEG: differentially expressed gene. CSC: Cancer Stem Cell. +: DNA copy number gain. -: DNA copy number loss, mut: DNA point mutation.
Figure 6: under-sampling representation of HG3 tumor samples during pattern recognition analysis.
Schematic view of our methodology to overcome the class imbalance in training set. HG3 tumors were shuffled and split into 7 non-overlapping subsets. Each unique subset of HG3 tumors was compared with HG1 tumors to obtain balanced training set during pattern recognition analysis.
Figure 7: SWS-derived assignment probabilities of HG1, HG3 and subclasses of HG2 tumors to corresponding genetic subclasses.
A): the assigning probabilities of HG1 and HG3 tumors to low and high genetic grades during 7 training iterations using 22g-TAG signature based on SWS algorithm.
B): the assigning probabilities of HG2 tumors to low and high genetic grades during 7 prediction iterations using 22g-TAG signature based on SWS algorithm.
Figure 8: scatter plot of the data driven cutoff and mean values of gene expression in low and high risk groups of prognosis prediction analysis of 22g-TAG genes.
Scatter plots show the correlations of data-driven cutoffs (D cutoff) of 22g-TAG genes between different cohorts as a test of their reproducibility and robustness, and similarly for mean values of gene expression in low- and high-risk groups' tumors. Kendall tau correlation was used to calculate correlation coefficients and p-values.
Figure 9: Box plot of the relative expression based on qPCR data of 22g-TAG genes.
Box plots of the relative expression based on qPCR validation data of 22g-TAG gene expression in HG1 (n=8), HG1-like (n=10), HG3-like (n=14) and HG3 (n=48) tumors of the OriGene cohort. A two-tailed Wilcoxon test was used to assess the differences in the expression profile between different combinations of histological and genetic grades.
Figure 10: Box plots of AG and mutation counts per sample for different histological and genetic grades.
A): Box plots of the number of amplified or deleted genes per sample, separately for different histological and genetic grades.
B) : box plots of the count of missense, nonsense and silent mutations per sample separately for different histological and genetic grades.
The difference in the number of altered genes or mutations counts between different combinations of genetic grades was assessed statistically using two-tailed Wilcoxon test.
Figure 11: Unsupervised hierarchical clustering results.
Heatmap of gene expression of differentially expressed genes between HG1-like and HG3-like tumors clustered by unsupervised hierarchical clustering. Euclidean distance and the average linkage agglomerative method were used for hierarchical clustering.
Figure 12: cumulative distribution of copy number variation of 22q genes.
Cumulative distributions of median values of copy number signal intensities of 556 genes of 22q in LGG and HGG tumors.
Figure 13: hierarchical clustering results performed on 106 genes associated with 21 embryonic stem cells.
Heatmap of gene expression profiles resulting from hierarchical clustering of 106 genes expressed in 21 different embryonic stem cell lines according to the SAGE database. Euclidean distance as the distance measure and the average linkage agglomerative method were used for hierarchical clustering.
Materials and methods
Data source and preprocessing
Clinical information and gene expression data for the Uppsala, Stockholm, Singapore and Marseille BC cohorts were obtained from the NCBI/GEO database series GSE4922, GSE1456, GSE4922 and GSE21653, respectively.
The Cancer Genome Atlas (TCGA) data is available at multiple levels of preprocessing steps for each data type. We used gene expression, DNA mutation, and DNA copy number variation (CNV) data for IDC. Each data type was downloaded at preprocessing level appropriate for our subsequent analysis [Cancer Genome Atlas, N., Nature, 2012. 490(7418): p. 61-70].
Level 2 TCGA gene expression data, profiled using the Agilent Technologies G4502A platform, were downloaded. The data had already been normalized against Stratagene Universal Reference RNA, and Lowess normalization had then been applied for each probeset (n=90,797). We restricted our analysis to invasive ductal carcinoma (IDC) of no special type (NST), which constitutes 82% of the cohort (481 of 590 samples). Among the 481 samples, there are 48 normal samples from tumor-adjacent tissues and 3 samples of unknown histological grade. The distribution of the histological grades of the remaining 430 samples is uneven (HG1=32 (7.4%), HG2=183 (42.6%), HG3=215 (50%) samples). The information about histologic grades was manually extracted from the available unanimous histologic reports of the TCGA database. Level 1 CNV data corresponding to our 430 IDC samples were downloaded from TCGA (upon General Research Use access approval). This subset consists of 860 samples (430 tumor/normal pairs). CEL files were imported into Partek® Genomics Suite software to extract aberrant genomic regions in each tumor sample with respect to its corresponding matched normal DNA sample extracted from blood (paired analysis). A circular binary segmentation algorithm was used with the default parameters (a minimum of 10 markers in the detected region and a t-test p-value <0.001 between the altered region and its neighboring region) to infer the regions with genomic aberrations. Genes included in each reported genomic region were extracted using RefSeq data. The data were then converted into a two-dimensional matrix in which the rows represent the genes, the columns represent the samples, and the values represent the mean CNV marker intensity of the reported aberrant region that harbors a given gene in a given sample.
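The final step above, turning segment calls into a gene-by-sample matrix, can be sketched as follows. The sketch assumes segmentation (performed in Partek Genomics Suite in the text) has already produced per-sample segments with a mean intensity. It also assumes that a gene belongs to a segment when its transcription start site falls inside it, and that the value given to genes outside any aberrant segment (`baseline`) is chosen by the caller. Gene names and coordinates are made up.

```python
from typing import Dict, List, Tuple

# One aberrant segment: (chromosome, start, end, mean marker intensity)
Segment = Tuple[str, int, int, float]


def segments_to_gene_matrix(segments: Dict[str, List[Segment]],
                            genes: Dict[str, Tuple[str, int]],
                            baseline: float) -> Dict[str, Dict[str, float]]:
    """Build matrix[gene][sample] from per-sample segmentation output.

    genes maps a gene symbol to (chromosome, transcription start site).
    A gene takes the mean intensity of the segment containing its start site;
    genes outside every reported aberrant segment get `baseline`.
    """
    matrix: Dict[str, Dict[str, float]] = {g: {} for g in genes}
    for sample, segs in segments.items():
        for gene, (chrom, tss) in genes.items():
            value = baseline
            for s_chrom, start, end, mean in segs:
                if s_chrom == chrom and start <= tss <= end:
                    value = mean
                    break
            matrix[gene][sample] = value
    return matrix


# Toy input: two samples, two hypothetical genes
segs = {"S1": [("chr8", 100, 500, 0.8), ("chr17", 1000, 2000, -0.6)],
        "S2": [("chr8", 300, 900, 0.4)]}
genes = {"GENE_A": ("chr8", 350), "GENE_B": ("chr17", 1500)}
cnv = segments_to_gene_matrix(segs, genes, baseline=0.0)
```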
Level 2 DNA somatic mutation data, identified using exome sequencing, were downloaded from TCGA. The mutation annotation file (MAF) contains information about the mutated genes, mutation genomic coordinates, type of mutation, and genotype calls of the tumor and reference normal samples for each patient. Only 418 of these samples are shared with the chosen 430 IDC samples. The data were converted into a two-dimensional matrix in which the rows and columns represent the genes and samples, respectively, and the values represent the number of distinct mutated sites of a given gene in a given sample.
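The conversion of the MAF into a gene-by-sample count matrix can be sketched as below. It assumes the relevant fields (gene symbol, sample barcode, chromosome and start position) have already been read into tuples; the positions in the example are invented, and a nested dictionary stands in for the two-dimensional matrix.

```python
from collections import defaultdict


def mutation_matrix(maf_rows):
    """Count distinct mutated sites per gene and sample.

    maf_rows: iterable of (gene, sample, chromosome, start_position) tuples
    taken from a mutation annotation file.
    Returns counts[gene][sample]; absent entries mean zero mutations.
    """
    sites = defaultdict(set)
    for gene, sample, chrom, pos in maf_rows:
        # A set keeps repeated records of the same site from being counted twice
        sites[(gene, sample)].add((chrom, pos))
    counts = defaultdict(dict)
    for (gene, sample), s in sites.items():
        counts[gene][sample] = len(s)
    return counts


rows = [("TP53", "S1", "17", 1000), ("TP53", "S1", "17", 1000),
        ("TP53", "S1", "17", 2000), ("PIK3CA", "S2", "3", 5000)]
muts = mutation_matrix(rows)
```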
Prediction Analysis of Microarray (PAM):
PAM is a modified nearest-centroid method used for feature selection and class prediction [Tibshirani, R., et al., Proceedings of the National Academy of Sciences, 2002. 99(10): p. 6567-6572]. In this work, we used it for dimensionality reduction to obtain the most informative and representative features, among all microarray probesets, that discriminate between HG1 and HG3 tumors. PAM was implemented via the "pamr" R package.
Statistically Weighted Syndrome (SWS):
SWS is a statistics-based voting method for class prediction and feature selection. It selects the most informative variables (prediction features), categorizes them and tests the stability of the classification border of a feature domain of the training set based on sampling and leave-one-out procedures [Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83; Kuznetsov, V.A., et al., Mathematical and Computer Modelling, 1996. 23(6): p. 95-119]. SWS was also implemented in the Recognition software (http://www.solutions-center.ru/index.php?sct=prod). We used the features resulting from PAM analysis to sub-classify HG2 samples into HG1-like and HG3-like tumors based on the SWS algorithm.
Normalization of probeset expression and identification of reference-deviated genes per sample (RDG)
For each TCGA IDC tumor sample, we normalized the expression of each probeset with respect to the reference normal expression for that same probeset. This reference is represented by its median expression in the 48 normal samples. The normalized probeset expression relative to the reference normal dataset is referred to as the fold-change. The data had already been Lowess-normalized for all chips; the coefficients of variation for the 25%, 50% and 75% quartiles across all chips are 0.059, 0.014, and 0.054, respectively.
For each TCGA IDC tumor sample, RDG were identified using fold-change criteria of > 1.25 or < 0.75. The number of RDG can be calculated independently for each TCGA IDC tumor sample and compared across the genetic grade subgroups.
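The fold-change normalisation and RDG count described above can be sketched as follows. The sketch assumes expression values on a linear scale (for log2-scaled data the fold change would instead be `2 ** (value - reference)`); the probeset identifiers and values are invented.

```python
from statistics import median


def count_rdg(tumour, normals, up=1.25, down=0.75):
    """Count reference-deviated probesets in one tumour sample.

    tumour: dict probeset -> expression on a linear (not log) scale.
    normals: list of dicts with the same keys, one per normal sample.
    """
    n = 0
    for probe, value in tumour.items():
        # Reference = median expression of this probeset across the normal samples
        reference = median(nsample[probe] for nsample in normals)
        fold_change = value / reference
        if fold_change > up or fold_change < down:
            n += 1
    return n


normals = [{"p1": 1.0, "p2": 2.0, "p3": 4.0},
           {"p1": 1.2, "p2": 2.0, "p3": 4.0},
           {"p1": 0.8, "p2": 2.2, "p3": 3.6}]
tumour = {"p1": 1.5, "p2": 2.1, "p3": 2.0}  # fold changes 1.5, 1.05, 0.5
n_rdg = count_rdg(tumour, normals)
```

Probesets p1 (fold change 1.5) and p3 (0.5) pass the thresholds, so this toy tumour has two RDG.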
Copy number variation visualization
Median values of the CNV intensities for LGG and HGG tumors and Wilcoxon test p-value for each gene were plotted against the genomic coordinates of its transcription start site. A chromosomal region is considered altered if the median values of its genes pass one of the global thresholds of loss or gain (i.e., greater than 50% of the patients undergo the CNV event).
Identification of differentially expressed genes
A two-tailed Wilcoxon test was used to assess the significance of differential expression, and Benjamini-Hochberg false discovery rate (FDR) correction was applied to account for multiple hypothesis testing. Differentially expressed probesets were selected based on fold-changes (FC > 1.5 or FC < 0.75) and statistical significance (FDR < 0.01).
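The selection rule above combines a Benjamini-Hochberg adjustment with fold-change thresholds. A minimal sketch follows; it assumes the two-tailed Wilcoxon rank-sum p-values were computed beforehand (for example with `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")`), and the p-values and fold changes shown are invented.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(pvalues, fold_changes, fdr=0.01, up=1.5, down=0.75):
    """Indices of probesets passing both the FDR and the fold-change criteria."""
    q = benjamini_hochberg(pvalues)
    return [i for i, (qi, fc) in enumerate(zip(q, fold_changes))
            if qi < fdr and (fc > up or fc < down)]


pvals = [0.0001, 0.02, 0.001, 0.5]
fcs = [2.0, 3.0, 1.1, 0.2]
q = benjamini_hochberg(pvals)
degs = select_degs(pvals, fcs)
```

Only the first probeset passes both filters: the third is significant after adjustment but its fold change (1.1) is too small, while the second and fourth fail the FDR cut-off.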
Association analysis of gene expression and copy number variation
Kendall tau correlation was used to study the association between CNV and the corresponding mRNA expression for each gene. CNV and mRNA expression data were matched using the Agilent 244K Custom Gene Expression G4502A-07-3 annotation data provided by the TCGA data portal.
Functional enrichment and gene ontology analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID) [Huang, D.W., B.T. Sherman and R.A. Lempicki, Nature Protocols, 2008. 4(1): p. 44-57] tool was used to identify the top enriched biological processes among the differentially expressed genes through the Gene Ontology (GO) annotation database. The input list of unique Entrez gene IDs was compared, using a hypergeometric test, with a background gene list comprising all genes in the genome. A functional annotation chart consisting of molecular functions, biological processes, cellular components, KEGG pathways, tissue expression, and chromosome number was reported.
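The hypergeometric test mentioned above asks how likely it is to see at least k annotated genes in the input list by chance. A minimal sketch of that upper-tail probability is below; note that DAVID itself reports a modified Fisher exact test (the EASE score), so this plain hypergeometric tail is an illustration of the test as described in the text, not DAVID's exact computation. The numbers are toy values.

```python
from math import comb


def hypergeom_enrichment_p(k, n_list, n_term, n_background):
    """P(X >= k): chance of at least k term-annotated genes in the input list.

    k: input genes annotated with the GO term
    n_list: size of the input gene list
    n_term: background genes annotated with the term
    n_background: size of the background (here, all genes in the genome)
    """
    total = comb(n_background, n_list)
    upper = min(n_list, n_term)
    # Sum the hypergeometric probabilities from k up to the largest possible overlap
    hits = sum(comb(n_term, i) * comb(n_background - n_term, n_list - i)
               for i in range(k, upper + 1))
    return hits / total


# Toy check: background of 10 genes, 3 annotated, input list of 2, both annotated
p = hypergeom_enrichment_p(2, 2, 3, 10)  # equals 3/45
```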
Network analysis of 22g-TAG genes
The MetaCore tool (Thomson Reuters, St. Joseph, MI, USA) was used to build the gene network associated with the 22g-TAG genes (https://portal.genego.com/).
Hierarchical clustering (HC) and heatmap visualization
MultiExperiment Viewer (MeV) version 4.9.0 was used for HC and heatmap visualization of numerical matrices. Euclidean distance and the average linkage agglomerative method were used for HC.
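The clustering step can be illustrated with the plain average-linkage (UPGMA) procedure sketched below: start from singleton clusters and repeatedly merge the pair with the smallest mean pairwise Euclidean distance. This naive version recomputes distances at every step and is meant only for small illustrative inputs; in practice a library routine such as `scipy.cluster.hierarchy.linkage(X, method="average", metric="euclidean")` would be used. The toy profiles are invented.

```python
import math
from itertools import combinations


def average_linkage(points, n_clusters):
    """Agglomerative clustering with Euclidean distance and average linkage.

    points: list of equal-length numeric vectors (e.g. one expression profile per sample).
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean of all pairwise Euclidean distances between members of a and b
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
clusters = average_linkage(profiles, n_clusters=2)
```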
Data-driven prognosis analysis based on gene expression profile
Survival analyses were conducted using a data-driven grouping (DDg) algorithm, which relies on a Cox proportional hazards regression model to fit the patients' survival times to gene expression data (see supplementary material in [Tang, Z., et al., Int J Cancer, 2014. 134(2): p. 306-18]). For each gene, it searches for the expression cutoff that maximizes the separation of the patients' survival curves into high- and low-risk groups. DDg has been successfully used in the prognosis of breast cancer, glioblastoma and ovarian cancer patients [Tang, Z., et al., Int J Cancer, 2014. 134(2): p. 306-18; Chan, X.H., et al., Cell Rep, 2012. 2(3): p. 591-602].
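The cut-off search can be illustrated with the simplified sketch below. It is a stand-in, not the DDg implementation: instead of fitting a Cox model, it scores each candidate cut-off with the two-group log-rank chi-square (the statistic behind the 1D-DDg p-values reported in Table 7) and keeps the best one. The minimum group size `min_group` and the toy survival data are assumptions. Because the cut-off is optimised on the same data, the resulting p-value is optimistic and needs confirmation in independent cohorts.

```python
import math


def logrank_chi2(times, events, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 = event (e.g. relapse), 0 = censored;
    group: True for the high-expression group, False otherwise.
    """
    o_minus_e = 0.0
    var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        o_minus_e += d1 - d * n1 / n  # observed minus expected events, high group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutoff(expr, times, events, min_group=3):
    """Scan cut-offs; return (cutoff, chi2, p) with the largest log-rank chi2."""
    best = None
    for c in sorted(set(expr))[:-1]:
        group = [x > c for x in expr]
        n_high = sum(group)
        if n_high < min_group or len(expr) - n_high < min_group:
            continue  # skip splits that leave a group too small
        chi2 = logrank_chi2(times, events, group)
        if best is None or chi2 > best[1]:
            # Upper tail of chi-square with 1 df: erfc(sqrt(x / 2))
            best = (c, chi2, math.erfc(math.sqrt(chi2 / 2)))
    return best


expr = [1, 2, 3, 4, 5, 6, 7, 8]
times = [60, 55, 50, 58, 10, 12, 8, 15]
events = [0, 0, 1, 0, 1, 1, 1, 1]
cutoff, chi2, p = best_cutoff(expr, times, events, min_group=3)
```

In this toy data the four highest-expressing patients all relapse early, so the scan settles on the cut-off that separates them from the rest.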
Univariate and Multivariate analyses were conducted using the "survival" R package version 2.37-7.
qPCR based validation of 22g-TAG genes
Total RNA samples of 84 IDC patients were obtained from OriGene (patients' clinical parameters are summarized in Table 20A). RNA concentrations were provided by OriGene, reconfirmed using a Nanodrop® spectrophotometer, and normalized. cDNA was synthesized from 250 ng total RNA using a QuantiTect® Reverse Transcription Kit with random hexamer and oligo(dT) primers. qPCR experiments were conducted in 96-well plates using the QuantStudio™ 6 Flex Real-Time PCR System. The KAPA SYBR® FAST qPCR Kit was used for qPCR experiments, with low ROX as a passive reference dye. Primers were designed using Primer3 (v. 0.4.0) [Untergasser, A., et al., Nucleic Acids Res, 2012. 40(15): p. e115], and the specificity of the resulting primer pairs was tested computationally using BLAT [Kent, W.J., Genome Res, 2002. 12(4): p. 656-64] and in-silico PCR on the UCSC genome browser [Kent, W.J., et al., Genome Res, 2002. 12(6): p. 996-1006]. The primer pair sequences for 22g-TAG are listed in Table 20B. β-actin was used as an endogenous control. The Ct values of all genes were analyzed using the 2^(-ΔΔCt) method [Livak, K.J. and T.D. Schmittgen, Methods, 2001. 25(4): p. 402-8].
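The 2^(-ΔΔCt) calculation can be sketched as below. The Ct values are invented, and the choice of calibrator sample is an assumption (the text does not say which sample serves as calibrator). The method also assumes near-100% and similar amplification efficiencies for the target and reference genes.

```python
def relative_expression(ct_target, ct_ref, ct_target_calib, ct_ref_calib):
    """Livak 2^(-ddCt) relative quantification.

    ct_ref is the endogenous control (beta-actin in the text) in the same sample;
    the *_calib values come from a calibrator sample (assumed, not specified in the text).
    """
    delta_ct = ct_target - ct_ref                    # normalise to the endogenous control
    delta_ct_calib = ct_target_calib - ct_ref_calib  # same for the calibrator
    delta_delta_ct = delta_ct - delta_ct_calib
    return 2 ** (-delta_delta_ct)


# Target crosses threshold 2 cycles earlier (relative to beta-actin) than in the calibrator
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

A ΔΔCt of -2 corresponds to a four-fold higher relative expression than in the calibrator.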
Feature selection methods and identification of the 22-gene tumor aggressiveness grading classifier
We studied the gene expression data of 430 TCGA IDC samples profiled on the Agilent G4502A platform. The tumors comprised the following histological grades: 32 HG1, 183 HG2, and 215 HG3 samples (Table 1A).
In this study, we proposed that HG2 tumors are genetically heterogeneous and include tumors whose oncogenic pathways could be separated into two distinct subclasses, similar to either HG1 or HG3 tumors. To test this hypothesis in the TCGA dataset, we applied a trained pattern recognition classifier to the intermediate HG2 tumors and evaluated its ability to stratify them into HG1-like or HG3-like tumors (Figure 1A).
The workflow of our analysis is presented in Figure 1B. Because of the high dimensionality of the feature space (n=90,797 probesets), we used a two-step analysis consisting of 1) a feature selection procedure to reduce the biomarker space and 2) a pattern recognition analysis to train a classifier to distinguish between the two tumor classes. The number of patients with HG1 tumors (32) was much smaller than the number with HG3 tumors (215), resulting in an imbalanced training set. A balanced dataset is known to be important for building a robust and accurate classifier [Rahman, M.M. and D. Davis, International Journal of Machine Learning and Computing, 2013. 3(2): p. 224-228]. To overcome the class-size imbalance of the training data, the majority class was under-sampled to avoid biasing training accuracy toward that class [Rahman, M.M. and D. Davis, International Journal of Machine Learning and Computing, 2013. 3(2): p. 224-228].
To address this imbalance problem [Rahman, M.M. and D. Davis, International Journal of Machine Learning and Computing, 2013. 3(2): p. 224-228; Sun, Y., A.K.C. Wong, and M.S. Kamel, International Journal of Pattern Recognition and Artificial Intelligence, 2009. 23(04): p. 687-719], our method shuffled the 215 HG3 tumor expression profiles and separated them into seven non-overlapping (independent) subgroups (Figure 6), each forming a training set with the HG1 tumors. First, we applied prediction analysis of microarrays (PAM) [Tibshirani, R., et al., Proceedings of the National Academy of Sciences, 2002. 99(10): p. 6567-6572; Hastie, T., R. Tibshirani, B. Narasimhan, and G. Chu, pamr: Pam: prediction analysis for microarrays. 2013], which selects the most differentially expressed genes (DEG), represented by microarray probesets, that discriminate HG1 from HG3 tumors in each of our seven training sets. These training sets yielded seven statistically reproducible classification signatures (training accuracies and numbers of features are shown in Table 4). We selected the 39 probesets (corresponding to 22 genes) shared by the seven PAM-derived signatures. The 22 genes are BUB1, CAPN8, CDC45, CDCA5, CDCA8, CENPA, CENPN, FAM72B/FAM72A, KIF13B, KIF14, KIF2C, MCM10, MELK, MTFR2, MYBL2, NAT1, NOSTRIN, ORC6, PIF1, SHCBP1, TICRR, and UBE2C.
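The under-sampling step and the selection of probesets shared by the per-split signatures can be sketched as below. The PAM feature selection itself (performed with the pamr package) is not reproduced: the per-split signatures are assumed to be given as sets of probeset IDs, and the function names, sample labels, and round-robin dealing scheme are illustrative assumptions.

```python
import random
from typing import List, Set


def undersample_splits(majority: List[str], n_splits: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the majority-class samples and deal them into n_splits disjoint subgroups."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Round-robin slicing: subgroup sizes differ by at most one sample.
    return [shuffled[i::n_splits] for i in range(n_splits)]


def shared_features(signatures: List[Set[str]]) -> Set[str]:
    """Probesets selected in every per-split signature."""
    return set.intersection(*signatures) if signatures else set()


hg1 = [f"HG1_{i}" for i in range(32)]      # hypothetical sample IDs
hg3 = [f"HG3_{i}" for i in range(215)]
subgroups = undersample_splits(hg3, n_splits=7)
training_sets = [(hg1, sub) for sub in subgroups]  # each: all HG1 vs one HG3 subgroup
print([len(s) for s in subgroups])  # [31, 31, 31, 31, 31, 30, 30]
```

Each of the seven training sets then has roughly 32 HG1 versus 30 to 31 HG3 samples, so no single training run is dominated by the majority class.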
After reducing the biomarker space to these 39 most representative probesets, we used the statistically weighted syndromes (SWS) pattern recognition algorithm, which outperforms PAM when training sets use a small number of features [Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301; Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83]. SWS stabilizes its predictions through re-sampling and performs robustly on datasets with small sample sizes [Kuznetsov, V.A., et al., International Journal of Computer Science and Network Security, 2006. 6: p. 73-83; Kuznetsov, V.A., et al., Mathematical and Computer Modelling, 1996. 23(6): p. 95-119]. As in the PAM analysis, SWS was run on seven training/prediction sets to address the size imbalance of the training classes. The average accuracy of SWS was 90.5±3.4% (average sensitivity 90.2±3.7%, average specificity 91.5±5.3%).
Next, during the prediction step, each HG2 tumor was assigned to either the HG1-like or the HG3-like subclass. The overall prediction for each sample was based on consensus agreement across the seven trained SWS classifiers, determined by the number of times the sample was assigned to a given subclass with an assignment probability above the threshold (p > 0.7). Tumor samples whose predicted probability fell in the uncertainty zone (0.5±0.2) were classified as "HG2-like". According to these criteria, 55.2% (101/183) and 42.6% (78/183) of HG2 tumors were assigned to the HG1-like and HG3-like tumor types, respectively. The remaining 2.2% (4/183) of HG2 tumors could not be classified (interpreted as 'true HG2' and/or erroneous class). The distributions of the assignment probabilities over all training-prediction iterations are presented in Figure 7. Clinical information for the HG2 tumor subclasses is summarized in Table 1B.
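One way to express the consensus call in code is shown below. The per-classifier rule follows the text (assignment when the probability exceeds 0.7, uncertainty zone 0.5±0.2), but the vote-count rule (at least 4 of the 7 classifiers) is a hypothetical choice, since the exact consensus criterion is not stated here; the function names are also illustrative.

```python
from collections import Counter
from typing import List


def call_once(p_hg3: float, threshold: float = 0.7) -> str:
    """Map one classifier's HG3-like probability to a call; 0.5 +/- 0.2 is uncertain."""
    if p_hg3 > threshold:
        return "HG3-like"
    if 1.0 - p_hg3 > threshold:
        return "HG1-like"
    return "uncertain"


def consensus(p_hg3_per_classifier: List[float], min_votes: int = 4) -> str:
    """Assumed rule: a subclass wins if at least min_votes classifiers assign it."""
    votes = Counter(call_once(p) for p in p_hg3_per_classifier)
    for label in ("HG1-like", "HG3-like"):
        if votes[label] >= min_votes:
            return label
    return "HG2-like"   # no confident majority: left unclassified


print(consensus([0.92, 0.88, 0.95, 0.81, 0.76, 0.55, 0.60]))  # HG3-like
```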
For each probeset, we provide a signal intensity threshold that separates "low" from "high" expression. These threshold values are important characteristics of a medical classification system and are listed in Table 5A; we refer to this table as the 22g-TAG classifier. All 22g-TAG genes are differentially expressed between HG1 and HG3 tumors (Table 5A; examples of 5 genes are shown in Figure 2A).
Comparison of the 22g-TAG classifier genes with 72 known signatures, including alternative molecular tumor grading signatures
To test the novelty of the genes in 22g-TAG, we compared 22g-TAG with reference lists of 72 BC gene signatures previously published in other studies and collated by our group [Ow, G.S., et al., in 2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). 2013. Singapore: IEEE Publishing; Ow, G.S. and V.A. Kuznetsov, BMC Genomics, 2015. 16 Suppl 7: p. S2] (including 2 grading signatures from previous studies [Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301; Sotiriou, C., et al., J Natl Cancer Inst, 2006. 98(4): p. 262-72]). Only one gene (CAPN8) can be considered a novel IDC-associated gene. Because most 22g-TAG genes are annotated as cell cycle genes, and cell cycle deregulation is often considered a main hallmark of cancer, we expected a large proportion of 22g-TAG genes to be found in other gene signatures, as was indeed the case (Table 6). By contrast, ORC6 (origin recognition complex, subunit 6 like (yeast)) and PIF1 (5'-to-3' DNA helicase homolog (S. cerevisiae)) were each observed in only one of the 72 signature gene lists. Consequently, they could also be considered "novel" BC-related genes and potential therapeutic targets. CAPN8 is a protease that plays a role in membrane trafficking in gastric cells and in protection of the gastric mucosa [Hata, S., et al., J Biol Chem, 2007. 282(38): p. 27847-56; Hata, S., et al., PLoS Genet, 2010. 6(7): p. e1001040]. PIF1 plays critical roles in DNA replication, cell growth, and the resolution of G-quadruplexes and R-loops [Zhou, R., et al., eLife, 2014. 3: p. e02190; Gagou, M.E., et al., Oncotarget, 2014. 5(22): p. 11381-98; Sanders, C.M., Biochem J, 2010. 430(1): p. 119-28]. ORC6 is an important cell cycle-related gene involved in DNA replication initiation and chromosome segregation [Prasanth, S.G., K.V. Prasanth, and B. Stillman, Science, 2002. 297(5583): p. 1026-31].
Interestingly, five genes (CDC45, KIF13B, ORC6, SHCBP1, and CAPN8) were not present in previously reported molecular tumor grading signatures [Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301; Sotiriou, C., et al., J Natl Cancer Inst, 2006. 98(4): p. 262-72]. MELK, MYBL2, and CDCA8 were the most frequently shared genes, observed in 20, 18, and 16 BC signatures, respectively.
22g-TAG signature genes are potential prognostic markers
According to our data-driven grouping (DDg) prognosis analysis (see Methods), all 22g-TAG genes were significantly associated with patient survival (log-rank test FDR<0.05) and showed a consistent pattern (oncogene-like or tumor suppressor-like) in at least three of four independent validation cohorts (GEO dataset IDs: GSE1456 (Stockholm), GSE4922 (Singapore and Uppsala), and GSE21653 (Marseille)). Therefore, they could be considered potential prognostic markers (Table 7). Moreover, the data-driven expression thresholds from the survival prediction analysis and the genes' mean expression in the low- and high-risk groups are significantly correlated (Kendall's tau correlation p<0.05) across at least three cohorts (Figure 8). In general, the 22g-TAG signature outperformed other clinical parameters in stratifying patients into prognostically meaningful groups, according to univariate and multivariate Cox regression survival analyses in at least three of the four validation cohorts (Table 8). Collectively, the 22g-TAG signature genes are potentially reliable prognostic markers.
22g-TAG signature genes are involved in cell cycle/mitosis and oncogenic pathways
To study the biological relevance of the 22g-TAG genes, we performed gene ontology (GO) enrichment analysis and found that these genes are strongly enriched in cell cycle/mitosis GO categories (p < 0.01, Table 9).
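The text does not name the enrichment test used; a common choice is a one-sided hypergeometric test, sketched below under that assumption. The function name, parameter names, and example counts are hypothetical, and GO tools such as DAVID may apply modified variants of this test.

```python
from math import comb


def hypergeom_sf(k: int, n_drawn: int, n_annotated: int, n_population: int) -> float:
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric.

    n_population: background genes; n_annotated: background genes in the GO term;
    n_drawn: genes in the query list (e.g. 22); k: query genes in the GO term.
    """
    upper = min(n_drawn, n_annotated)
    hits = sum(comb(n_annotated, i) * comb(n_population - n_annotated, n_drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(n_population, n_drawn)


# Hypothetical numbers: 12 of 22 query genes in a term covering 600 of 20,000 genes.
p = hypergeom_sf(12, 22, 600, 20000)
```

With these made-up counts only about 0.66 term genes would be expected among 22 random genes, so observing 12 gives a vanishingly small p-value; real analyses also correct for the many GO terms tested.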
Furthermore, using published datasets listing periodically expressed cell cycle genes and the CycleBase database [Santos, A., R. Wernersson, and L.J. Jensen, Nucleic Acids Res, 2015. 43(Database issue): p. D1140-4; Gauthier, N.P., et al., Nucleic Acids Res, 2010. 38(Database issue): p. D699-702; Gauthier, N.P., et al., Nucleic Acids Res, 2008. 36(Database issue): p. D854-9], which contains experimentally defined cell cycle genes, we found that 82% (18/22) of 22g-TAG genes are periodically over-expressed in the cell cycle and show successive expression peaks within the cell cycle (mostly in the G2/M phase; Figure 2B, Table 5B).
To further explore the relationships and interconnectivity among the signature genes and other cancer-related genes, we conducted network analysis using the MetaCore software (Thomson Reuters, St. Joseph, MI). MetaCore includes a manually curated knowledge database of annotated genes, their products, and their functional interactions. The 22g-TAG gene symbols were used as seed nodes to extend the gene network by finding the shortest path between any two seed genes with at most two intermediate nodes (genes or their products). The results showed strong association of 22g-TAG genes with key cancer-related genes such as TP53, AURKA, TOP2A, E2F1, and MYC, and the network was generally associated with the mitotic cell cycle biological process (p = 9.1x10^-3). KIF2C and MYBL2 are the convergence and divergence hubs of this network, respectively, highlighting their role in IDC aggressiveness (Figure 2C). Two 22g-TAG genes (KIF2C and NAT1) could be potentially druggable according to the drug-gene interaction database (DGIdb) [Griffith, M., et al., Nat Methods, 2013. 10(12): p. 1209-10], whereas 10 genes in the 22g-TAG-associated network are druggable (AR, AURKA, AURKB, CDK1, CDK2, MYC, PLK1, SMAD2, TOP2A, and TP53). Collectively, these analyses suggest that most 22g-TAG genes are molecularly interconnected and could act in concert with other genes during mitosis, specifically during the G2/M phase.
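MetaCore's network-extension algorithm is proprietary; the breadth-first search below is only a generic sketch of the rule described above (link each pair of seed genes by a shortest path with at most two intermediate nodes). The toy graph, node names, and function names are hypothetical, and only one shortest path per seed pair is kept.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Optional, Set


def shortest_path(graph: Dict[str, Set[str]], src: str, dst: str,
                  max_edges: int) -> Optional[List[str]]:
    """Breadth-first search for one shortest src->dst path of at most max_edges edges."""
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([(src, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == dst:
            path = []
            cur: Optional[str] = node
            while cur is not None:           # walk back to the source
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if depth == max_edges:
            continue                         # do not search beyond the length limit
        for nb in graph.get(node, set()):
            if nb not in parent:
                parent[nb] = node
                queue.append((nb, depth + 1))
    return None


def extend_network(graph: Dict[str, Set[str]], seeds: List[str],
                   max_intermediates: int = 2) -> Set[str]:
    """Seeds plus every node on a seed-to-seed shortest path with <= max_intermediates."""
    nodes = set(seeds)
    for a, b in combinations(seeds, 2):
        path = shortest_path(graph, a, b, max_edges=max_intermediates + 1)
        if path is not None:
            nodes.update(path)
    return nodes


# Hypothetical undirected toy graph (not real MetaCore interactions).
graph = {
    "S1": {"X"}, "X": {"S1", "Y"}, "Y": {"X", "S2"}, "S2": {"Y", "V"},
    "V": {"S2", "W"}, "W": {"V", "Z"}, "Z": {"W", "S3"}, "S3": {"Z"},
}
print(sorted(extend_network(graph, ["S1", "S2", "S3"])))  # ['S1', 'S2', 'S3', 'X', 'Y']
```

Here S3 stays isolated because reaching S2 from it would need three intermediate nodes, which exceeds the limit.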
Quantitative PCR-based validation of the 22g-TAG genes as a grading signature in an independent cohort
To further confirm the validity of the 22g-TAG signature as a tumor grading and prognostic signature, qPCR was conducted on 84 RNA samples of BC patients obtained from OriGene (see Methods). Ct values for each gene were obtained and normalized against the endogenous control and normal tissue samples (n=4) using the 2^-ΔΔCt method [Livak, K.J. and T.D. Schmittgen, Methods, 2001. 25(4): p. 402-8]. The resulting fold-change values were used to re-classify HG2 samples with the SWS algorithm. HG1 (n=8) and HG3 (n=48) tumors were used for training, and HG2 tumors (n=24) were used as a class discovery set. Again, we used under-sampling to address the size imbalance of the training classes: HG3 samples were shuffled and split into 3 non-overlapping sets of 16 samples each, and three training-prediction runs were performed with the SWS algorithm. HG2 tumors were finally subclassified based on the consensus of the three prediction iterations. The average training accuracy was 83.3% (sensitivity: 66.6±7.2%, specificity: 91.7±3.6%).
HG2 tumors (n=24) were re-classified into HG1-like (n=10) and HG3-like (n=14) tumors. Because of the small number of HG2 samples, survival did not differ significantly between HG1-like and HG3-like patients. However, one of the HG1-like patients (10%) versus four of the HG3-like patients (28.6%) experienced tumor relapse during the follow-up period. Furthermore, the survival difference between patients dichotomized into low-grade (HG1+HG1-like) and high-grade (HG3-like+HG3) tumors is significant (log-rank test p=1.9x10^-2, Figure 2D).
Remarkably, all 22g-TAG genes show a consistent expression trend across molecular grades according to both the qPCR and microarray gene expression datasets (Figure 9). Boxplots of the relative expression across genetic grades for two genes are shown in Figure 2E. Based on the qPCR data, the expression levels of all genes correlate significantly with each other. Interestingly, oncogene-like genes correlate positively with each other but negatively with tumor suppressor-like genes, and vice versa (Figure 2F). Collectively, the subclassification of HG2 tumors into biologically and clinically meaningful classes by the 22g-TAG signature genes is reproducible across different patient cohorts and gene expression platforms.
Having assessed the validity of 22g-TAG as a grading and prognostic signature, we next studied the IDC/HG2 subclasses resulting from this signature.
HG1-like and HG3-like tumors have distinct transcriptome profiles
We characterized the HG1-like and HG3-like tumors defined by 22g-TAG using integrative analysis of genomic and transcriptomic data (Figure 1B). Starting with global gene expression profiles, we identified and studied differentially expressed genes (DEG) between HG1-like (n=101) and HG3-like (n=78) tumors. We selected 4,933 differentially expressed probesets based on fold change (FC > 1.25 or FC < 0.75) and the statistical significance of a two-tailed Wilcoxon test (Benjamini-Hochberg FDR < 0.01). These probesets correspond to RNA transcribed by 2,147 genes: 887 genes (777 protein-coding, 26 pseudogenes, 33 ncRNA, 1 snoRNA, and 50 unknown transcripts) and 1,260 genes (1,099 protein-coding, 83 pseudogenes, 18 ncRNA, and 60 unknown transcripts) were down-regulated and up-regulated, respectively, in HG3-like tumors relative to HG1-like tumors (Table 10).
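The DEG filter combines a fold-change cutoff with Benjamini-Hochberg-adjusted p-values. A minimal sketch, assuming the per-probeset Wilcoxon p-values and fold changes have already been computed; the function names and example values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                         # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min                        # enforce monotone q-values
    return adjusted


def select_deg(fold_changes: Sequence[float], pvalues: Sequence[float],
               fc_up: float = 1.25, fc_down: float = 0.75, fdr: float = 0.01) -> List[int]:
    """Indices with (FC > fc_up or FC < fc_down) and BH-adjusted p < fdr."""
    q = benjamini_hochberg(pvalues)
    return [i for i, (fc, qi) in enumerate(zip(fold_changes, q))
            if (fc > fc_up or fc < fc_down) and qi < fdr]


print(select_deg([1.5, 1.1, 0.5, 2.0], [0.001, 0.001, 0.002, 0.5]))  # [0, 2]
```

Probeset 1 fails the fold-change filter despite a small p-value, and probeset 3 fails the FDR filter despite a two-fold change; both criteria must hold.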
GO enrichment analysis of the down-regulated genes revealed significant associations with cell adhesion (Benjamini p-value = 5.5x10^-5), the extracellular matrix cellular component (Benjamini p-value = 4x10^-22), the focal adhesion pathway (Benjamini p-value = 8.5x10^-7), cytoskeleton organization (Benjamini p-value = 7.7x10^-5), and response to hormone stimulus (Benjamini p-value = 7.3x10^-4). Up-regulated genes are strongly associated with the cell cycle (Benjamini p-value = 2.5x10^-66), M phase (Benjamini p-value = 1.1x10^-56), chromosome segregation (Benjamini p-value = 5.5x10^-23), and DNA repair (Benjamini p-value = 5.6x10^-16) biological processes and with the DNA replication pathway (Benjamini p-values < 1.2x10^-17), and are related to the chromosome, kinetochore, and microtubule cellular components (Benjamini p-values < 1.1x10^-6). Interestingly, the up-regulated genes are enriched on specific chromosomes, such as chr8, chr17, chr20, and chr22 (Benjamini p-values < 3x10^-5, Table 11).
We quantified, per sample, the number of expressed genes (represented by their assigned probesets) whose expression deviates from a reference. The reference is defined by the median expression of the same genes in normal tissue, which gives each sample a fold-change profile relative to the reference. Using fold-change thresholds (FC ≥ 1.25 or FC < 0.75), we counted the number of genes per sample satisfying these criteria; we call these reference-deviated genes (RDG). HG3-like tumors have a significantly larger number of RDG than HG1-like tumors (p = 3x10^-8, Figure 3A). Therefore, on a genome-wide scale, HG1-like and HG3-like tumors have distinct gene expression profiles associated with distinct molecular functions, both relative to each other and relative to normal breast tissue.
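Counting RDG per sample reduces to dividing each gene's value by its median in normal tissue and applying the two thresholds. The sketch assumes linear-scale (not log-transformed) intensities stored in dictionaries keyed by gene; `count_rdg` and the toy values are hypothetical.

```python
from statistics import median
from typing import Dict, List


def count_rdg(tumour: Dict[str, float], normals: List[Dict[str, float]],
              up: float = 1.25, down: float = 0.75) -> int:
    """Number of genes whose fold change vs the normal-tissue median is >= up or < down."""
    count = 0
    for gene, value in tumour.items():
        reference = median(sample[gene] for sample in normals)  # per-gene normal median
        fold_change = value / reference
        if fold_change >= up or fold_change < down:
            count += 1
    return count


normals = [{"A": 100, "B": 100, "C": 100},
           {"A": 120, "B": 80, "C": 100},
           {"A": 80, "B": 120, "C": 100}]
print(count_rdg({"A": 130, "B": 70, "C": 110}, normals))  # 2 (A up, B down)
```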
HG1-like and HG3-like tumors are distinct in their genomic constitution
We studied HG1-like and HG3-like tumors at the DNA level to characterize the DNA copy number variation (CNV) and point mutation events of each tumor subclass (Figure 1B). We analyzed Affymetrix human genome-wide SNP 6.0 array data for HG1-like (n=101) and HG3-like (n=77) tumors that were also profiled for gene expression. The CNV data of each sample were analyzed with Partek® Genomics Suite software (see Methods). We transformed the CNV signal intensities into log2 values with respect to diploid status (i.e., transformed CNV signal intensity of a diploid locus = 1). To determine whether a gene is amplified or deleted in a given tumor, thresholds of 1.25 for gene gain and 0.75 for gene loss were applied to the CNV signal intensities of HG1-like and HG3-like tumors. As a primary analysis of the differences between HG1-like and HG3-like tumors, the number of altered genes (AG; gain or loss) in each sample was determined using these thresholds. HG1-like tumors exhibited fewer AG per sample than HG3-like tumors, and this difference was significant by a two-tailed Wilcoxon test (p = 1.2x10^-7, Figure 3B). The difference was also observed separately for loss and gain of genomic regions (p = 9.5x10^-7 and 7.8x10^-6 for gene loss and gain, respectively; Figure 10A).
Next, we studied 25,172 unique gene symbols (annotated genes resulting from segmentation analysis; see Methods) and identified individual genes with differential copy number status between HG1-like and HG3-like tumors. Differentially altered genes (DAG) were selected based on the following criteria: a) the median CNV intensity of either HG1-like or HG3-like tumors passes the threshold for gain or loss (1.25 and 0.75, respectively), and b) the CNV profiles of HG1-like and HG3-like tumors are significantly different (p<0.05). We identified 1,214 DAG (925 protein-coding, 242 ncRNA, 32 pseudo, 14 snoRNA, and 1 snRNA; Table 12). These include well-known altered genes important for BC initiation, development, and progression. For instance, the TP53 gene, located on chromosome 17p, is deleted in 37% (37 of 101 samples) of HG1-like tumors and 64% (49 of 77 samples) of HG3-like tumors.
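The two DAG criteria can be combined in a short function. This sketch assumes strict threshold comparisons (> 1.25 or < 0.75) and uses a Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation for criterion b), whose test is not specified in this paragraph; the function names and toy CNV values are hypothetical.

```python
from math import erf, sqrt
from statistics import median
from typing import List


def mann_whitney_p(x: List[float], y: List[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation
    (mid-ranks for ties, no tie or continuity correction)."""
    values = sorted(x + y)
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        rank_of[values[i]] = (i + j) / 2 + 1   # average 1-based rank of a tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    u = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 1 - erf(abs(z) / sqrt(2))          # equals 2 * (1 - Phi(|z|))


def is_dag(cnv_a: List[float], cnv_b: List[float], gain: float = 1.25,
           loss: float = 0.75, alpha: float = 0.05) -> bool:
    """a) either group median beyond a threshold; b) groups differ (p < alpha)."""
    medians_pass = any(m > gain or m < loss for m in (median(cnv_a), median(cnv_b)))
    return medians_pass and mann_whitney_p(cnv_a, cnv_b) < alpha


lgg_like = [0.60, 0.65, 0.70, 0.62, 0.68, 0.66, 0.64, 0.61]   # hypothetical CNV ratios
hgg_like = [1.00, 0.98, 1.02, 1.05, 0.97, 1.01, 0.99, 1.03]
print(is_dag(lgg_like, hgg_like))  # True: LGG-like median below 0.75 and groups differ
```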
Further analysis of the gene loci showed that many of these altered genes are concentrated on a few chromosomes. Specifically, HG1-like tumors show deletion of part of 16q, whereas HG3-like tumors show gains in 8q, 17q, and 20q and losses in 8p, 11q, and 17p (Table 2A). Notably, the chromosomes harboring the DAG are the same chromosomes on which the DEG are enriched (chromosomes 8, 17, and 20).
Therefore, genes on these chromosomes could be major players in initiating and maintaining the differing levels of malignancy that distinguish HG1-like from HG3-like tumors at both the DNA and mRNA levels.
HG1-like and HG3-like tumors are distinct in their DNA point mutational profiles
We also conducted DNA point mutation analysis to study mutation counts in HG1-like (n=98) and HG3-like (n=78) tumors (Figure 1B). We calculated the number of mutated sites (mutation counts) across all genes in each tumor and assessed the difference in mutation counts between HG1-like and HG3-like tumors using a two-tailed Wilcoxon test. HG1-like tumors generally exhibited lower mutation counts than HG3-like tumors (p = 2.2x10^-6, Figure 3C). This difference was consistent for the three most frequent mutation types in our data: missense, nonsense, and silent mutations (p = 3.2x10^-6, 6.4x10^-4, and 1x10^-5, respectively; Figure 10B). The two most frequently mutated genes are TP53 and PIK3CA. These are the only genes whose mutational status is significantly associated with the subclassification of HG2 tumors, as assessed using Fisher's exact test of independence. As expected, TP53 was mutated less frequently in HG1-like tumors (12 of 98 samples; 12.2%) than in HG3-like tumors (28 of 78 samples; 35.9%) (p = 1.5x10^-4). In contrast, PIK3CA was mutated more frequently in HG1-like tumors (44 of 98 samples; 44.9%) than in HG3-like tumors (21 of 78 samples; 26.9%) (p = 1.4x10^-2) (Figure 3D).
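Fisher's exact test on a 2x2 table of mutated versus non-mutated samples can be sketched with exact binomial coefficients. The two-sided p-value here sums all tables with the same margins that are no more probable than the observed one, the convention used by R's `fisher.test`; the function name is hypothetical, and the example counts are the TP53 numbers reported above.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum all tables no more probable than the observed one (small relative tolerance).
    p = sum(q for q in map(table_prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# TP53: mutated / not mutated in HG1-like (12 / 86) and HG3-like (28 / 50) tumours.
p_tp53 = fisher_exact_two_sided(12, 86, 28, 50)
print(p_tp53)
```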
These results suggest that the marked differences in TP53 and PIK3CA mutation frequencies provide a mutational background that strongly discriminates HG1-like from HG3-like tumors.
Reclassified HG1-like and HG3-like tumors from the HG2 tumors are genetically comparable to HG1 and HG3 tumors, respectively
Figures 3A, 3B, 3C, and Figure 10 show no statistically significant differences between HG1 and HG1-like tumors, or between HG3-like and HG3 tumors, with respect to RDG, AG, or mutation counts (p-values > 0.05 in all cases). Moreover, no DEG were detected between HG1 and HG1-like tumors, whereas 1,837 DEG were detected between HG3-like and HG3 tumors; however, these up- or down-regulated genes showed no significant enrichment in any biological process, cellular component, or pathway (Benjamini p-values > 0.01). Only a few molecular functions, associated with ATP binding, showed significant enrichment. Next, we performed unsupervised hierarchical clustering (HC) of the expression profiles of all IDC in the TCGA database, using only the 4,933 probesets identified as differentially expressed between HG1-like and HG3-like tumors. Using Euclidean distance and average-linkage agglomeration, HC revealed two major clusters: 78% (104 of 133 samples) of HG1 and HG1-like tumors fell in one cluster, and 89% (261 of 293 samples) of HG3 and HG3-like tumors fell in the other (Figure 11). The large positive value of Cohen's kappa agreement coefficient indicates a high level of agreement between the classification results of the SWS and HC methods (κ = 0.67, p = 3.1x10^-43; Table 13). Thus, the results confirm the two-cluster pattern of IDC derived from the 22g-TAG classifier.
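Cohen's kappa compares the observed agreement with the agreement expected by chance from the row and column totals. The sketch below rebuilds a 2x2 table from the counts in the text (104 of 133 LGG and 261 of 293 HGG samples in the matching clusters), assuming every remaining sample fell in the other cluster; Table 13 itself is not reproduced, and the function name is hypothetical.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square agreement table (rows: method 1, columns: method 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n             # observed agreement
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)


# Rows: SWS-based LGG (HG1 + HG1-like) and HGG (HG3-like + HG3);
# columns: HC cluster 1 and cluster 2.
table = [[104, 29], [32, 261]]
print(round(cohens_kappa(table), 2))  # 0.67
```

Raw agreement here is about 86%, but because most samples are HGG, chance agreement is already about 57%; kappa discounts that baseline.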
We also studied the DAG between HG1-like and HG1 tumors. Using the same criteria as for HG1-like versus HG3-like tumors, we found only 12 significant DAG between HG1-like and HG1 tumors: 11 genes on chromosome 16 and one on chromosome 1 (Table 2B). For HG3-like versus HG3 tumors, we found 680 significant DAG, enriched primarily on chromosomes 11, 16, and 17 (Table 2C). These results suggest greater diversity between HG3-like and HG3 tumors than between HG1-like and HG1 tumors.
Generally, 16q loss occurred more frequently in HG1 and HG1-like tumors than in HG3-like and HG3 tumors. For example, CENPT, an important centromeric protein-encoding gene located on 16q, exhibited loss in 88%, 70%, 54.5%, and 46.7% of HG1, HG1-like, HG3-like, and HG3 tumors, respectively (Fisher's exact test p=1x10^-26); thus, the CNV of this gene locus could be used as a structural biomarker of IDC aggressiveness.
Finally, no gene's mutation status was significantly associated with the HG1 versus HG1-like classification. Similarly, for HG3-like and HG3 tumors, no gene showed a significant association between its mutation status and grade classification, with the exception of TP53 (p-value=0.016). Thus, comparisons of HG1-like with HG1 tumors and of HG3-like with HG3 tumors, in terms of transcript expression, CNV, and mutations, revealed the relative homogeneity of HG1/HG1-like tumors and of HG3/HG3-like tumors. Overall, these findings suggest a multi-layered molecular dichotomization of IDC into LGG and HGG classes, defined by the 22g-TAG classifier and by specific patterns of DNA alterations and point mutations.
Grading reclassification of IDC tumors correlates with intrinsic molecular subtypes and 16q loss
Intrinsic subtype information for the tumors was obtained from the TCGA network [Cancer Genome Atlas, N., Nature, 2012. 490(7418): p. 61-70], which used the PAM50 model [Parker, J.S., et al., J Clin Oncol, 2009. 27(8): p. 1160-7] to classify each sample. A contingency table of the frequencies of 5 subtypes (normal-like, luminal-A, luminal-B, basal-like, and HER2-enriched) versus the 4 grading classes (HG1, HG1-like, HG3-like, and HG3) was generated. Luminal-A tumors are enriched in HG1 and HG1-like tumors (low genetic grade/LGG), whereas luminal-B, HER2-enriched, and basal-like tumors are enriched in HG3 and HG3-like tumors (high genetic grade/HGG). The association of LGG with luminal-A/normal-like tumors and of HGG with luminal-B/HER2-enriched/basal-like tumors is significant (chi-square p-value = 4.3x10^-39, Table 3A). In parallel, because of the small number of normal-like samples in the TCGA cohort, we performed a similar analysis for the grading classification of the Uppsala and Stockholm cohorts studied previously by Ivshina et al., in which HG2 tumors were subclassified based on a 5-gene grading signature [Ivshina, A.V., et al., Cancer research, 2006. 66(21): p. 10292-10301]. We compared the reclassified LGG and HGG tumors of the Uppsala and Stockholm cohorts with their intrinsic molecular subtypes (Table 3B). The results showed that 73.7% of LGG tumors (177 of 240 samples) were of the normal-like or luminal-A subtypes. In contrast, 80% of HGG tumors (124 of 155 samples) were of the luminal-B, ERBB2+, or basal subtypes (chi-square p-value = 2.8x10^-45). The associations obtained from independent analyses of the TCGA and Uppsala/Stockholm data are consistent (Tables 3A and 3B).
To analyze the homogeneity of HG1 and HG1-like tumors with respect to intrinsic subtypes, we performed a chi-square test of homogeneity on the distribution of intrinsic molecular subtypes in the HG1 and HG1-like subgroups (p > 0.05, Table 3B). The lack of statistical significance suggests that the HG1 and HG1-like subclasses could be similar. Similar results were observed for HG3 and HG3-like tumors (p > 0.05, Tables 3A and 3B). Together, the homogeneity between HG1-like and HG1 tumors and between HG3-like and HG3 tumors seems to suggest the lack of a distinct intermediate grade between LGG and HGG. These results also provide plausible evidence against inter-grade progression from LGG to HGG.
Interestingly, this subtype grouping is further corroborated by previous work using a different classification method based on the expression of genes located on 16q [Wennmalm, K., et al., Chromosomes and Cancer, 2007. 46(1): p. 87-97], further suggesting that our subclassification may also be associated with 16q copy number status.
Gene expression, copy number variation, and mutation data provide a molecular basis for the genome wide re-classification of IDC into clinically distinct LGG and HGG tumor classes
We considered LGG and HGG as the two major classes of IDC, hypothesized to have distinct genomic backgrounds and perhaps distinct oncogenic pathways, cellular functions, and therapeutic specificities. Therefore, we performed similar analyses of gene expression, CNV, and point mutations for these major classes (LGG n=133 and HGG n=293).
For DEG, we selected 14,357 (16% of 90,797 total) differentially expressed probesets corresponding to 5,691 genes. Of these, 2,618 genes (2,285 protein-coding, 99 pseudogenes, 101 ncRNA, 2 snoRNA, and 131 unknown) and 3,073 genes (2,594 protein-coding, 187 pseudogenes, 84 ncRNA, 2 snoRNA, and 206 unknown) are down- and up-regulated, respectively, in HGG tumors relative to LGG tumors (Table 14).
GO functional enrichment analysis of the down-regulated genes showed significant associations with the cell adhesion (Benjamini p-value = 5.34x10^-11) and response to steroid hormone stimulus (Benjamini p-value = 3.2x10^-5) biological processes, the extracellular matrix and basement membrane cellular components (Benjamini p-value < 1.4x10^-3), and the PDGF signaling pathway (Benjamini p-value = 3.9x10^-3, Table 15). Up-regulated genes are associated with the cell cycle, chromosome segregation, and DNA replication biological processes (Benjamini p-value < 1.3x10^-14), are involved in the kinetochore and spindle microtubule cellular components (Benjamini p-value < 1.9x10^-3), and are strongly enriched among genes expressed in epithelial tissues (Benjamini p-value = 2.7x10^-12, Table 15). Notably, the up-regulated genes in HGG tumors are significantly enriched on chromosomes 2, 8, 16, 20, and 22 (Benjamini p-value < 1.4x10^-4). Furthermore, HGG tumors have more RDG than LGG tumors (p = 2.8x10^-13, Figure 3A). The large number of differentially expressed genes, together with their functional and chromosomal enrichment, indicates substantially distinct genomic and transcriptomic profiles in LGG and HGG tumors.
For the CNV data, we compared the number of AG in the LGG and HGG tumor classes. The difference in AG between LGG and HGG is significant (p = 3.7x10^-16, Figure 3B), and the same pattern was observed when deleted and amplified genes were analyzed separately (Figure 10A). In all cases, there were more AG in HGG than in LGG tumors (median HGG: 3,565 genes per sample; LGG: 1,875 genes per sample). Next, DAG analysis between LGG and HGG revealed 1,858 DAG (1,432 protein-coding, 347 ncRNA, 61 pseudo, 17 snoRNA, and 1 snRNA; Table 16) enriched on a few chromosomes (Table 2D). Specifically, 52% of the DAG (971 of 1,845 genes) are located on chromosome 16. Visualization of the copy number status across chromosome arms showed that LGG tumors have a gain of 16p and deletion of 16q, whereas HGG tumors show gain of 8q and loss of 8p and 17p (Figure 4A). Our results provide plausible evidence to support the hypothesis that LGG and HGG tumors are distinct at the genotype level. In particular, the deletion of 16q in LGG tumors and its absence in HGG tumors support a model of independent tumor progression into low or high grades. Notably, both the DAG and the DEG between LGG and HGG tumors are enriched on chromosomes 8 and 16 (Table 2D, Table 15).
Our analysis of somatic mutation profiles showed that LGG tumors have significantly fewer mutations than HGG tumors (Wilcoxon p-value = 3.8x10^-13, Figure 3C). The three mutation types (missense, nonsense, and silent) show the same trend in mutations per sample across genetic grades (Figure 10B). The mutation status of TP53 and PIK3CA is significantly associated with the new classification into two major genetic classes (Fisher's exact test p-value for TP53 = 8.3x10^-15 and for PIK3CA = 6.1x10^-7). TP53 is mutated in 10% (14/130) of LGG tumors and 48% (137/284) of HGG tumors, a positive association with HGG. Conversely, PIK3CA is mutated in 48% (62/130) of LGG tumors and 23% (67/284) of HGG tumors, a negative association with HGG. Ten other genes also show a significant association with the two-class genetic grading (Fisher's exact test p<0.05): 2 genes (MUC4, TTN) are more frequently mutated in HGG tumors, whereas 8 genes (CBFB, CTCF, MAP3K1, CHD8, DYSF, DNAH1, MAP2K4, GATA3) are more frequently mutated in LGG tumors (Figure 3E). In particular, the higher mutation rates of specific genes in LGG tumors relative to HGG tumors support the hypothesis that LGG and HGG tumors follow independent oncogenic pathways.
DNA copy numbers of the differentially altered genes are strongly associated with their corresponding gene expression profiles
To assess the mechanistic role of DAG in cancer progression, we analyzed the effect of the CNV of each gene on its expression profile. A correlation analysis was conducted between the mRNA profile and the corresponding CNV profile of each gene (see Methods). RNA expression and the corresponding CNV were significantly correlated for approximately 52% of the DAG (FDR < 0.01; 976 of 1,845 genes). Moreover, the DAG (1,845 genes) show stronger correlations with their expression profiles than non-differentially altered genes (non-DAG; n = 23,327) and randomly matched copy number/expression pairs (background control) (Figure 4B). A Wilcoxon test shows a significant difference between the correlation coefficients of the DAG (median = 0.34) and the non-DAG (median = 0.24) (p = 4.3 × 10^-105). The DAG tend to have stronger positive correlations, which suggests that the CNV of these genes drives their expression and contributes to the functional distinction between LGG and HGG tumors.
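The per-gene CNV/expression correlation with FDR control can be sketched as follows. This is an illustration under stated assumptions, not the method described in the Methods: it assumes Pearson correlation across samples, approximate p-values from Fisher's z-transform, and Benjamini-Hochberg adjustment as the FDR procedure; the function names and toy genes are hypothetical.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation between one gene's CNV values and its expression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(r, n):
    """Approximate two-sided p-value via Fisher's z-transform (assumes n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # avoid atanh(+/-1)
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * NormalDist().cdf(-abs(z))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data: per-gene CNV and expression values across the same 6 samples.
    genes = {
        "GENE_A": ([-1, -1, 0, 0, 1, 1], [2.0, 2.2, 3.1, 2.9, 4.0, 4.3]),
        "GENE_B": ([0, 1, 0, 1, 0, 1], [5.0, 4.8, 5.1, 5.2, 4.9, 5.0]),
    }
    names = list(genes)
    rs = [pearson(*genes[g]) for g in names]
    ps = [correlation_p_value(r, len(genes[g][0])) for g, r in zip(names, rs)]
    for g, r, q in zip(names, rs, benjamini_hochberg(ps)):
        print(f"{g}: r = {r:.2f}, q = {q:.3g}")
```

Genes whose adjusted value falls below 0.01 would correspond to the "significantly correlated" set; the distributions of `r` for DAG and non-DAG could then be compared with a rank-sum test.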
Chromosome 22 copy number variation is a novel indicator of LGG and HGG independence
A loss of genetic material in low-grade but not in high-grade tumors (e.g., 16q loss) is striking evidence for the independence of the low- and high-grade oncogenic pathways. In addition to the loss of 16q in low-grade tumors, 22q shows lower CNV signal intensities in LGG tumors than in HGG tumors. Although the median CNV signal intensities of LGG tumors do not pass the threshold for copy number loss, the difference in copy number between LGG and HGG tumors is significant. This difference is most notable for genes located downstream of the centromeric region and in the sub-telomeric region (Figure 4C). Overall, LGG tumors have a lower 22q copy number than HGG tumors (Wilcoxon test p = 4 × 10^-179), as shown by the cumulative distribution of CNV intensities for all 22q genes (Figure 12). Collectively, the observed patterns of 22q CNV alteration provide plausible evidence to support the hypothesis that the oncogenic pathways related to the LGG and HGG gene expression phenotypes are independent.
DNA copy number variation reflects sub-classification of HG2 tumors
We studied the potential of CNV to discriminate histological grades. DAG between HG1 and HG3 tumors were determined using the same criteria as before. We obtained 1,486 genes localized on 16p, 16q, 17p, 8p, and 8q. Next, for each chromosome arm we selected the gene with the minimum Wilcoxon test p-value as a representative marker of that arm's CNV event. Thus, 5 genes (LOC286114 for 8p, MYC for 8q, POLR3E for 16p, HERPUD1 for 16q, and ZNFW for 17p) were selected for subsequent class discovery analysis. Using the SWS algorithm, the classifier was trained on HG1 and HG3 tumors. As with the gene expression data, HG3 tumors were shuffled and divided into 7 non-overlapping groups, and 7 training-prediction runs were performed. The average classification accuracy was 77 ± 4.3%. In each training-prediction run, HG2 tumors were sub-classified into HG1-like and HG3-like tumors, and each HG2 sample was assigned to a new subclass according to the consensus of all 7 classifiers. By these criteria, 93 samples were classified as HG1-like (67 of the 93 matched gene expression-based HG1-like samples) and 73 samples as HG3-like (42 of the 73 matched gene expression-based HG3-like samples). Sixteen samples had intermediate assignment probabilities and were retained as HG2. We found a significant positive agreement between the classifications of HG2 tumors based on gene expression and on copy number variation data (Cohen's kappa = 0.32, p = 7.4 × 10^-5; Table 17). These results indicate that the classification of IDC tumors into LGG and HGG can be achieved at both the genomic and the transcriptomic level. However, the agreement between the mRNA-based and DNA-based classifications is moderate, perhaps owing to differences in the regulatory mechanisms at these two levels of the molecular organization of gene expression.
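Cohen's kappa, used above to measure agreement between the expression-based and CNV-based classifications of HG2 tumors, corrects the observed agreement for the agreement expected by chance from each classification's label frequencies. The sketch below only computes the coefficient; how intermediate (HG2) samples were tabulated in Table 17 and how the p-value was obtained are not stated here, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two classifications of the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples given the same label by both.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies of each classification.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both use one identical label")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Toy labels for 8 HG2 tumours from the two data types.
    expression_based = ["HG1-like"] * 4 + ["HG3-like"] * 4
    cnv_based = ["HG1-like"] * 3 + ["HG3-like"] * 4 + ["HG1-like"]
    print(cohen_kappa(expression_based, cnv_based))
```

In the toy example 6 of 8 labels agree (0.75) against a chance agreement of 0.5, giving a kappa of 0.5; the same statistic applies to the cluster-versus-grade comparison in the stem cell analysis.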
The LGG and HGG grading classification is associated with the differential expression of stem cell genes
To relate the grading classification to tumor stemness, we investigated whether genes associated with stem cells were enriched among the DEG between HG1-like and HG3-like samples. We used Cancer Genome Anatomy Project data, in which serial analysis of gene expression (SAGE) was used to study the genes expressed in 21 embryonic stem cell lines. All 21 stem cell line gene lists were over-represented among the genes up-regulated in HG3-like tumors (Benjamini p < 8.3 × 10^-24; Table 18A). Moreover, we checked the ability of stemness-associated genes to sub-classify HG2 samples. We extracted the genes expressed in all 21 stem cell lines, independently of the grading-associated genes, and obtained 106 genes (Table 18B). We then applied unsupervised hierarchical clustering to the TCGA gene expression profiles of these genes, using Euclidean distance as the similarity measure and average linkage as the agglomerative method. The results showed two major clusters, which agreed significantly with the LGG/HGG grading classification (Cohen's kappa = 0.57, p = 3.3 × 10^-31; Table 19, Figure 13). The concept of distinct precursors of LGG and HGG tumors provides a plausible explanation for these results.
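The clustering step (Euclidean distance, average linkage, two clusters) can be illustrated with a naive plain-Python sketch. Each point would be one tumour's expression vector over the 106 stem cell genes; whether expression values were scaled before clustering is not stated, so the sketch clusters raw values. The function names are hypothetical, and the O(n^3) loop is meant for toy data only.

```python
import math
from itertools import combinations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(points, n_clusters=2):
    """Agglomerative clustering with average linkage and Euclidean distance.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain; returns one cluster label per point.
    """
    clusters = [[i] for i in range(len(points))]
    dist = {(i, j): euclidean(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(d(i, j) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps index a valid
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Toy "expression profiles" (2 genes) for 6 tumours forming two groups.
    profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(average_linkage(profiles, n_clusters=2))
```

For a full cohort, SciPy's `scipy.cluster.hierarchy.linkage(X, method="average", metric="euclidean")` followed by `fcluster(Z, t=2, criterion="maxclust")` performs the same procedure efficiently; the resulting cluster labels can then be compared with the LGG/HGG labels using Cohen's kappa.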
Discussion
Our integrative analysis and the intrinsic subtype distributions within HG2 tumors demonstrate a strong molecular distinction between HG1-like and HG3-like tumors and the similarity of their genetic profiles to those of HG1 and HG3 tumors, respectively. Based on these similarities, we considered HG1-like and HG1 tumors to be LGG tumors, and HG3-like and HG3 tumors to be HGG tumors. We tested the hypothesis that LGG and HGG tumors are the two major genetically predetermined classes of breast IDC and that they have independent oncogenic pathways. The integrative data analysis supported the distinction between LGG and HGG tumors. We found 4,879 protein-coding genes that were differentially expressed between LGG and HGG tumors, representing 23.2% of all protein-coding genes annotated in the genome. This systemic shift in the transcriptomic program implies that independent oncogenic pathways dichotomize IDC tumors into these two subtypes. These two oncogenic pathways differ primarily in cell proliferation and cell adhesion phenotypes, which are basic hallmarks of malignancy, cancer progression and metastasis.
Because mRNA expression is temporally regulated during the cell cycle and differentiation, justifying molecular grading at the DNA level is an essential step toward understanding tumor heterogeneity and the independence of LGG and HGG tumor progression. An association between DNA copy number variation (CNV) and histological grade is expected, because histological grading systems include the mitotic index and nuclear pleomorphism [Rakha, E.A., et al., Breast Cancer Res, 2010. 12(4): p. 207]; however, this association has not been explored using large cohorts and high-resolution techniques. DNA copy number variations and point mutations are the major genetic changes that drive tumor development. We show that the number of altered genes is much higher in HGG tumors than in LGG tumors. The DAG discriminating between LGG and HGG tumors are enriched in specific chromosomes, with chr16 as the major contributor. Five major events were observed: 16q loss and 16p gain in LGG tumors, and 8p loss, 17p loss and 8q gain in HGG tumors. Our gene-centric copy number variation analysis highlights candidate genes whose copy number alterations may give tumor cells a survival advantage during tumor evolution.
The frequent loss of 16q in LGG tumors is another line of evidence supporting the improbability of progression between LGG and HGG tumors: because regaining lost genetic material is unlikely, inter-grade progression is improbable. However, it has been reported that the loss of 16q in HG3 tumors is followed by mitotic recombination [Cleton-Jansen, A.M., et al., Genes Chromosomes Cancer, 2004. 41(2): p. 109-16]. This recombination makes 16q loss appear less frequent in high grades, especially when allelic imbalance is not taken into account during copy number variation analysis. However, a high allelic imbalance of 16q in low-grade tumors was observed previously based on three microsatellite markers [Roylance, R., et al., J Pathol, 2002. 196(1): p. 32-6]. Moreover, we observed that 22q-related genes show an overall lower copy number in LGG tumors than in HGG tumors. This observation parallels that for 16q and may serve as further evidence for independent oncogenic pathways. The strong correlation of the DAG with their gene expression, in contrast to the non-DAG, reflects the importance of CNVs in driving the distinction between LGG and HGG tumors. Therefore, the DAG form a shortlist of candidate genes and genomic loci associated with the independent oncogenic pathways in LGG and HGG tumors. Collectively, the observed CNV alterations provide a genomic basis for the future development of diagnostic and prognostic assays.
For point mutations, an overall comparison of the number of point mutations in LGG and HGG tumors shows significantly different mutation profiles: LGG tumors have fewer mutations than HGG tumors. Specifically, the mutation counts of the two most frequently mutated genes, TP53 and PIK3CA, are positively and negatively associated with HGG, respectively. Our analysis demonstrated that a relatively higher count of PIK3CA mutations is associated with HG1-like tumors. As PIK3CA mutations occur frequently in IDC and are known to activate the PI3K/AKT/mTOR pathway, these mutations could be considered potential predictive biomarkers of HG1-like tumors. The high mutation rate of PIK3CA in LGG relative to HGG tumors suggests that PIK3CA hotspot mutations could have the potential to predict intrinsic tamoxifen resistance in the adjuvant treatment of LGG ER+ BC patients; this hypothesis should be tested in future studies. In addition, LGG and HGG tumors showed differences in the mutation counts of MAP3K1 and MAP2K4, which are functionally linked with PIK3CA. Interestingly, 9 of the 12 most frequently mutated genes (PIK3CA, GATA3, MAP3K1, MAP2K4, CBFB, DNAH1, CTCF, CHD8, and DYSF; Figure 3E) also showed significantly higher mutation counts in LGG than in HGG IDC cells. These findings support the hypothesis that the oncogenic pathways of LGG and HGG tumors are independent.
We observed moderate but significant differences in CNV levels between HG3-like and HG3 tumors. These observations may be artifactual, because the multiple grading systems used to evaluate the histological grades of the TCGA cohort, in addition to the subjectivity of all these grading systems, could bias the quantitative determination of HG3 tumor classification. Interestingly, these DNA variations do not result in any functional transcriptomic discrimination between the HG3 and HG3-like subclasses of IDC. However, the observed differences between HG3 and HG3-like IDC tumors could reflect actual (patho)biological differences, which should be a topic of future studies.
Whether the functional heterogeneity of IDC arises from the cell of origin or from the accumulation of mutational events remains an open question [Nakshatri, H., E.F. Srour, and S. Badve, Curr Stem Cell Res Ther, 2009. 4(1): p. 50-60; Visvader, J.E. and J. Stingl, Genes Dev, 2014. 28(11): p. 1143-58]. Because grading systems measure cell differentiation, an association between stem cells and histopathological grade appears self-evident. However, this association has been examined in only a limited number of studies [Ben-Porath, I., et al., Nat Genet, 2008. 40(5): p. 499-507; Pece, S., et al., Cell, 2010. 140(1): p. 62-73]. An enrichment of cancer stem cells (CSC) in high histological grades relative to low grade has been shown [Pece, S., et al., Cell, 2010. 140(1): p. 62-73]. Several stem-cell-based models of cancer initiation and progression have been suggested for different intrinsic subtypes of IDC. However, the data are controversial, and further studies are needed to specify and validate these models [Nakshatri, H., E.F. Srour, and S. Badve, Curr Stem Cell Res Ther, 2009. 4(1): p. 50-60; Visvader, J.E. and J. Stingl, Genes Dev, 2014. 28(11): p. 1143-58]. It has been argued that good-prognosis ER+ tumors could initiate via clonal selection and have few or no CD44+/CD24- cells, whereas poor-prognosis ER+ tumors could initiate from ER+ stem or progenitor cells and expand to a mixture of ER-/CD44+/CD24- and ER+/CD44+/CD24+ cells [Nakshatri, H., E.F. Srour, and S. Badve, Curr Stem Cell Res Ther, 2009. 4(1): p. 50-60].
Collectively, based on the strong expression differences between LGG and HGG tumors in genes associated with embryonic stem cells, the different frequency of ER loss between LGG and HGG tumors (Table 1), and the distribution of intrinsic subtypes within them, it can be hypothesized that LGG tumors originate and progress through the clonal evolution of normal epithelial cells, whereas HGG tumors originate from stem/progenitor cells and progress via clonal evolution into multiple subtypes, including ER+ and ER- tumors. Thus, we provide for the first time an integrative characterization of the LGG and HGG classes of IDC tumors by gene expression, CNV and mutation data analyses. We presented several lines of evidence that support the concept of independent origins and independent oncogenic pathways for the LGG and HGG classes, as well as the improbability of inter-grade progression. The distinct molecular events leading to either LGG or HGG tumors are outlined in the tumor progression model shown in Figure 5.
Our 22g-TAG signature, which includes genes with known cell cycle functions, could be used alone or in parallel with clinical measures of cancer proliferative capacity such as Ki67 staining and the pathological mitotic index.
High tumor grade is associated with decreased overall survival [Trudeau, M.E., et al., Breast Cancer Res Treat, 2005. 89(1): p. 35-45], but it is also known to predict an increased response to neoadjuvant chemotherapy [Vincent-Salomon, A., et al., Eur J Cancer, 2004. 40(10): p. 1502-8; Chang, J., et al., J Clin Oncol, 1999. 17(10): p. 3058-63]. Consequently, we hypothesize that IDC with LGG and HGG would also show decreased and increased response rates to chemotherapy, respectively.
According to our classification, many hundreds of genes involved in the cell cycle, mitosis and DNA replication are overexpressed in HGG IDC; such overexpression is typically associated with higher sensitivity to neoadjuvant anthracycline- and taxane-based chemotherapy in both ER-positive and ER-negative IDC subsets [Liedtke, C., et al., J Clin Oncol, 2009. 27(19): p. 3185-91]. Consequently, we expect HG3-like IDC within HG2 (and HGG tumors) to respond well to conventional chemotherapy targeting pathways related to rapidly proliferating epithelial cells.
In contrast, LGG tumors are expected to be less suited to the treatments used for highly aggressive tumors. Therefore, they could be more suitably treated with agents that target other growth-related requirements of tumors, such as the mTOR pathway, which mediates mRNA translation, or with agents that increase genome instability in tumor cells and initiate their apoptosis. Further examples include agents that target the growth of the blood vessels supplying tumors (such as bevacizumab) or hormone-related growth signaling pathways (estrogen signaling in ER+ tumors), such as tamoxifen.
Among the 22g-TAG genes, it is important to highlight NAT1 and MELK as 'druggable' targets, which are highly expressed in LGG and HGG tumors, respectively, relative to normal tissue. NAT1 has been shown to be efficiently inhibited by Rhod-o-hp with minimal cell toxicity, or by RNA interference (RNAi), to decrease cell growth and invasiveness [Tiang, J.M., N.J. Butcher, and R.F. Minchin, Biochem Biophys Res Commun, 2010. 393(1): p. 95-100; Tiang, J.M., et al., PLoS One, 2011. 6(2): p. e17031]. In addition, MELK has been successfully targeted by the compound OTSSP167, which suppressed mammosphere formation in breast cancer cells and the growth of xenografts of multiple cancer types in mice [Chung, S., et al., Oncotarget, 2012. 3(12): p. 1629-40; Chung, S. and Y. Nakamura, Cell Cycle, 2013. 12(11): p. 1655-6; Cho, Y.S., et al., Biochem Biophys Res Commun, 2014. 447(1): p. 7-11]. Therefore, NAT1 and MELK and their products should be considered as therapeutic targets for LGG and HGG tumors, respectively.
Moreover, four genes of the 22g-TAG signature (BUB1, KIF2C, UBE2C, and CENPN), in addition to CDC20 (from the 22g-TAG network), are among the 10 genes recently identified as determining tumor responsiveness to chemotherapy [Hallett, R.M., et al., Oncotarget, 2015. 6(9): p. 7040-52].
Neoadjuvant chemotherapy (NAC) can cause tumor shrinkage, which enables a proportion of patients with large tumors to become eligible for breast-conserving surgery (BCS). This increases the BCS rate in comparison with adjuvant chemotherapy alone [Mathieu, M.C., et al., Ann Oncol, 2012. 23(8): p. 2046-52]. In such cases, our genetic grading classification could be useful for predicting patients' eligibility for NAC.
Conclusion
Our methodological approach of integrative analysis of histologic grade data rejects the old hypothesis of inter-grade progression from HG1 toward HG3 tumors of IDC. Instead, it supports a dichotomization of the IDC patient population based on multiple key cancer-associated molecular factors and mechanisms, characterized by the 5,691 DEGs and 1,858 DAGs reported in this study. Collectively, this study strongly supports our hypothesis of genetically defined low- and high-grade tumors corresponding to two oncogenic pathways that independently govern the progression of LGG and HGG IDCs. Our grading delineation could help to narrow the IDC biomarker space and specify the essential characteristics of the two main IDC classes. Ultimately, our concept and findings have the potential to impact patient care and diagnostic and treatment decisions, and to help develop rational strategies for future personalized molecular targeting of IDC.
References:
Ben-Porath, I., et al., An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet, 2008. 40(5): p. 499-507.
Buerger, H., et al., Ductal invasive G2 and G3 carcinomas of the breast are the end stages of at least two different lines of genetic evolution. J Pathol, 2001. 194(2): p. 165-70.
Buerger, H., et al., Comparative genomic hybridization of ductal carcinoma in situ of the breast-evidence of multiple genetic pathways. J Pathol, 1999. 187(4): p. 396-402.
Cancer Genome Atlas, N., Comprehensive molecular portraits of human breast tumours. Nature, 2012. 490(7418): p. 61-70.
Cava, C., et al., Integration of mRNA expression profile, copy number alterations, and microRNA expression levels in breast cancer to improve grade definition. PLoS One, 2014. 9(5): p. e97681.
Chan, X.H., et al., Targeting glioma stem cells by functional inhibition of a prosurvival oncomiR-138 in malignant gliomas. Cell Rep, 2012. 2(3): p. 591-602.
Chang, J., et al., Biologic markers as predictors of clinical outcome from systemic therapy for primary operable breast cancer. J Clin Oncol, 1999. 17(10): p. 3058-63.
Chung, S., et al., Development of an orally-administrative MELK-targeting inhibitor that suppresses the growth of various types of human cancer. Oncotarget, 2012. 3(12): p. 1629-40.
Chung, S. and Y. Nakamura, MELK inhibitor, novel molecular targeted therapeutics for human cancer stem cells. Cell Cycle, 2013. 12(11): p. 1655-6.
Cho, Y.S., et al., The crystal structure of MPK38 in complex with OTSSP167, an orally administrative MELK selective inhibitor. Biochem Biophys Res Commun, 2014. 447(1): p. 7-11.
Cleton-Jansen, A.M., et al., Different mechanisms of chromosome 16 loss of heterozygosity in well- versus poorly differentiated ductal breast cancer. Genes Chromosomes Cancer, 2004. 41(2): p. 109-16.
Huang, D.W., B.T. Sherman, and R.A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 2008. 4(1): p. 44-57.
Francis, G.D., S.R. Stein, and G.D. Francis, Prediction of histologic grade in breast cancer using an artificial neural network, in The 2012 International Joint Conference on Neural Networks (IJCNN). 2012.
Gagou, M.E., et al., Human PIF1 helicase supports DNA replication and cell growth under oncogenic-stress. Oncotarget, 2014. 5(22): p. 11381-98.
Gauthier, N.P., et al., Cyclebase.org: version 2.0, an updated comprehensive, multi-species repository of cell cycle experiments and derived analysis results. Nucleic Acids Res, 2010. 38(Database issue): p. D699-702.
Gauthier, N.P., et al., Cyclebase.org-a comprehensive multi-organism online database of cell-cycle experiments. Nucleic Acids Res, 2008. 36(Database issue): p. D854-9.
Grant, G.D., et al., Identification of cell cycle-regulated genes periodically expressed in U2OS cells and their regulation by FOXM1 and E2F transcription factors. Mol Biol Cell, 2013. 24(23): p. 3634-50.
Griffith, M., et al., DGIdb: mining the druggable genome. Nat Methods, 2013. 10(12): p. 1209-10.
Hallett, R.M., et al., Treatment-induced cell cycle kinetics dictate tumor response to chemotherapy. Oncotarget, 2015. 6(9): p. 7040-52.
Hata, S., et al., Stomach-specific calpain, nCL-2/calpain 8, is active without calpain regulatory subunit and oligomerizes through C2-like domains. J Biol Chem, 2007. 282(38): p. 27847-56.
Hata, S., et al., Calpain 8/nCL-2 and calpain 9/nCL-4 constitute an active protease complex, G-calpain, involved in gastric mucosal defense. PLoS Genet, 2010. 6(7): p. e1001040.
Ivshina, A.V., et al., Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Research, 2006. 66(21): p. 10292-10301.
Ivshina AV, G.J., Senko O, Mow B, Putti T, Smeds J, Nordgen H, Berg J, Liu ET, Kuznetsov VA, Miller LD, Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. In Keystone Symposia: Stem Cells, Senescence and Cancer. 2005. p. P. 76.
Kent, W.J., BLAT-the BLAST-like alignment tool. Genome Res, 2002. 12(4): p. 656-64.
Kent, W.J., et al., The human genome browser at UCSC. Genome Res, 2002. 12(6): p. 996-1006.
Kempowsky-Hamon, T., et al., Fuzzy logic selection as a new reliable tool to identify molecular grade signatures in breast cancer-the INNODIAG study. BMC Med Genomics, 2015. 8: p. 3.
Kuznetsov, V.A., et al., Statistically Weighted Voting Analysis of Microarrays for Molecular Pattern Selection and Discovery Cancer Genotypes. International Journal of Computer Science and Network Security, 2006. 6: p. 73-83.
Kuznetsov, V.A., et al., Syndrome approach for computer recognition of fuzzy systems and its application to immunological diagnostics and prognosis of human cancer. Mathematical and Computer Modelling, 1996. 23(6): p. 95-119.
Liedtke, C., et al., Genomic grade index is associated with response to chemotherapy in patients with breast cancer. J Clin Oncol, 2009. 27(19): p. 3185-91.
Livak, K.J. and T.D. Schmittgen, Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods, 2001. 25(4): p. 402-8.
Loi, S., et al., Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J Clin Oncol, 2007. 25(10): p. 1239-46.
Mathieu, M.C., et al., Breast Cancer Index predicts pathological complete response and eligibility for breast conserving surgery in breast cancer patients treated with neoadjuvant chemotherapy. Ann Oncol, 2012. 23(8): p. 2046-52.
Nakshatri, H., E.F. Srour, and S. Badve, Breast cancer stem cells and intrinsic subtypes: controversies rage on. Curr Stem Cell Res Ther, 2009. 4(1): p. 50-60.
Nordgard, S.H., et al., Genome-wide analysis identifies 16q deletion associated with survival, molecular subtypes, mRNA expression, and germline haplotypes in breast cancer patients. Genes Chromosomes Cancer, 2008. 47(8): p. 680-96.
Ow, G.S., et al. How to discriminate between potentially novel and considered biomarkers within molecular signature? in 2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). 2013. Singapore: IEEE Publishing.
Ow, G.S. and V.A. Kuznetsov, Multiple signatures of a disease in potential biomarker space: Getting the signatures consensus and identification of novel biomarkers. BMC Genomics, 2015. 16 Suppl 7: p. S2.
Parker, J.S., et al., Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol, 2009. 27(8): p. 1160-7.
Pece, S., et al., Biological and molecular heterogeneity of breast cancers correlates with their cancer stem cell content. Cell, 2010. 140(1): p. 62-73.
Ping, Z., et al., Mining genome sequencing data to identify the genomic features linked to breast cancer histopathology. J Pathol Inform, 2014. 5: p. 3.
Prasanth, S.G., K.V. Prasanth, and B. Stillman, Orc6 involved in DNA replication, chromosome segregation, and cytokinesis. Science, 2002. 297(5583): p. 1026-31.
Rahman, M.M. and D. Davis, Addressing the class imbalance problem in medical datasets. International Journal of Machine Learning and Computing, 2013. 3(2): p. 224-228.
Rakha, E.A., et al., Breast cancer prognostic classification in the molecular era: the role of histological grade. Breast Cancer Res, 2010. 12(4): p. 207.
Roylance, R., et al., Comparative genomic hybridization of breast tumors stratified by histological grade reveals new insights into the biological progression of breast cancer. Cancer Res, 1999. 59(7): p. 1433-6.
Roylance, R., et al., Allelic imbalance analysis of chromosome 16q shows that grade I and grade III invasive ductal breast cancers follow different genetic pathways. J Pathol, 2002. 196(1): p. 32-6.
Sanders, C.M., Human Pif1 helicase is a G-quadruplex DNA-binding protein with G-quadruplex DNA-unwinding activity. Biochem J, 2010. 430(1): p. 119-28.
Santos, A., R. Wernersson, and L.J. Jensen, Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes. Nucleic Acids Res, 2015. 43(Database issue): p. D1140-4.
Sotiriou, C., et al., Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst, 2006. 98(4): p. 262-72.
Sun, Y., A.K.C. Wong, and M.S. Kamel, Classification of imbalanced data: a review. International Journal of Pattern Recognition and Artificial Intelligence, 2009. 23(04): p. 687-719.
Tang, Z., et al., Meta-analysis of transcriptome reveals let-7b as an unfavorable prognostic biomarker and predicts molecular and clinical subclasses in high-grade serous ovarian carcinoma. Int J Cancer, 2014. 134(2): p. 306-18.
Hastie, T., R. Tibshirani, B. Narasimhan, and G. Chu, pamr: Pam: prediction analysis for microarrays. 2013.
Thomae, A.W., et al., Different roles of the human Orc6 protein in the replication initiation process. Cell Mol Life Sci, 2011. 68(22): p. 3741-56.
Tiang, J.M., N.J. Butcher, and R.F. Minchin, Small molecule inhibition of arylamine N-acetyltransferase Type I inhibits proliferation and invasiveness of MDA-MB-231 breast cancer cells. Biochem Biophys Res Commun, 2010. 393(1): p. 95-100.
Tiang, J.M., et al., RNAi-mediated knock-down of arylamine N-acetyltransferase-1 expression induces E-cadherin up-regulation and cell-cell contact growth inhibition. PLoS One, 2011. 6(2): p. e17031.
Tibshirani, R., et al., Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 2002. 99(10): p. 6567-6572.
Trudeau, M.E., et al., Prognostic factors affecting the natural history of node-negative breast cancer. Breast Cancer Res Treat, 2005. 89(1): p. 35-45.
Untergasser, A., et al., Primer3-new capabilities and interfaces. Nucleic Acids Res, 2012. 40(15): p. e115.
Vincent-Salomon, A., et al., Proliferation markers predictive of the pathological response and disease outcome of patients with breast carcinomas treated by anthracycline-based preoperative chemotherapy. Eur J Cancer, 2004. 40(10): p. 1502-8.
Visvader, J.E. and J. Stingl, Mammary stem cells and the differentiation hierarchy: current status and perspectives. Genes Dev, 2014. 28(11): p. 1143-58.
Wennmalm, K., et al., Gene expression in 16q is associated with survival and differs between Sørlie breast cancer subtypes. Genes, Chromosomes and Cancer, 2007. 46(1): p. 87-97.
Zhou, R., et al., Periodic DNA patrolling underlies diverse functions of Pif1 on R-loops and G-rich DNA. eLife, 2014. 3: p. e02190.

Claims

1. A method for classifying a subject with invasive ductal carcinoma (IDC) breast cancer, into one of two specific grades, low genetic grade (LGG) IDC or high genetic grade (HGG) IDC, the method comprising conducting a genetic and/or phenotypic analysis of a sample from the subject in order to identify one or more of the following characteristics:
a) Detecting expression levels of a group of signature genes comprising CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR, wherein the group comprises 60 or fewer genes in total, such as 50, 40, or 25 or fewer genes, in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC;
b) Detecting subsets of up- and down-regulated differentially expressed populations of genes within a subject's genome in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC;
c) Detecting differentially altered genes within a subject's genome in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC;
d) analysing a population of highly mutated genes that have significantly different mutation counts between LGG and HGG grades in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC;
e) analysing tumour subtypes in order to facilitate tumour classification and disease outcome stratification between LGG and HGG IDC; and
f) detection of a population of genes routinely expressed in stem cells in order to facilitate tumour classification and disease outcome stratification between LGG and HGG cancers.
2. The method according to claim 1 for use in classifying or reclassifying IDC subjects with histologic type 2 (HG2) IDC into either LGG or HGG IDC.
3. The method of either of claims 1 or 2 wherein the method comprises option 1 ) and optionally one or more of options 2) - 6).
4. The method according to and preceding claim wherein the group of signature genes consists of, or consists essentially of: CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1 , NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1 , KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1 , PIF1 , CDCA5, MCM10, MTFR2, and TICRR.
5. The method according to claim 4 wherein relatively high expression of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, MELK, CDCA8, MYBL2, CDC45, BUB1 , KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1 , PIF1, CDCA5, MCM10, MTFR2, and TICRR stratifies a subject as having a high genetic grade (HGG) tumour and optionally a poor prognosis.
6. The method according to claim 4 or 5, wherein relatively high expression of CAPN8, NAT1, NOSTRIN, and KIF13B stratifies a subject as having a low genetic grade (LGG) tumour and, optionally, a good prognosis or a better prognosis than a subject who has a high grade tumour or tumours.
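Claims 5 and 6 give the direction of each signature gene but no scoring rule. The sketch below is a simple stand-in and is not the statistically weighted syndrome algorithm named in claim 7. It z-scores each gene across a cohort, then subtracts the mean z-score of the LGG-associated genes from the mean z-score of the HGG-associated genes. Three things are assumptions: the combined `FAM72A-D` key, the zero threshold and the toy data. KIF13B appears in both claim 5 and claim 6, so the sketch leaves it out of both lists rather than guess its direction.

```python
import statistics

# Claim 5: genes whose relatively high expression points to an HGG tumour.
# "FAM72A-D" is a hypothetical single key for the FAM72A/B/C/D paralogs;
# KIF13B is omitted because claims 5 and 6 both list it.
HGG_UP = [
    "CENPA", "CENPN", "FAM72A-D", "MELK", "CDCA8", "MYBL2", "CDC45", "BUB1",
    "KIF2C", "UBE2C", "ORC6", "KIF14", "SHCBP1", "PIF1", "CDCA5", "MCM10",
    "MTFR2", "TICRR",
]
# Claim 6: genes whose relatively high expression points to an LGG tumour.
LGG_UP = ["CAPN8", "NAT1", "NOSTRIN"]


def zscores(values: list[float]) -> list[float]:
    """Standardise one gene's expression across the cohort."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def grade_scores(cohort: dict[str, list[float]]) -> list[float]:
    """Per-sample score: mean z of HGG_UP genes minus mean z of LGG_UP genes."""
    missing = [g for g in HGG_UP + LGG_UP if g not in cohort]
    if missing:
        raise ValueError(f"missing genes: {missing}")
    z = {g: zscores(cohort[g]) for g in HGG_UP + LGG_UP}
    n_samples = len(cohort[HGG_UP[0]])
    return [
        statistics.fmean(z[g][i] for g in HGG_UP)
        - statistics.fmean(z[g][i] for g in LGG_UP)
        for i in range(n_samples)
    ]


def classify(scores: list[float], threshold: float = 0.0) -> list[str]:
    """Call HGG above the (hypothetical) threshold, LGG otherwise."""
    return ["HGG" if s > threshold else "LGG" for s in scores]


# Toy cohort of two samples: sample 0 is proliferative, sample 1 is not.
toy = {g: [10.0, 2.0] for g in HGG_UP}
toy.update({g: [1.0, 8.0] for g in LGG_UP})
print(classify(grade_scores(toy)))  # ['HGG', 'LGG']
```

Z-scoring against the cohort makes each call relative to the other samples. A deployed test would instead need a fixed reference distribution and a threshold calibrated on outcome data.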
7. The method according to any preceding claim, wherein the group of signature genes permits subjects to be stratified as having LGG or HGG tumours through analysis employing the statistically weighted syndrome algorithm.
8. The method according to any preceding claim wherein analysis of tumour subtypes includes one or more of the following classifications: estrogen receptor status, progesterone receptor status, human epidermal growth factor receptor 2 status, age, stage, lymph node status and metastasis status.
9. The method according to any of claims 1-8, wherein the tumour subtypes analysed include the normal-like, luminal-A, luminal-B, basal-like and HER2-enriched subtypes; the luminal-A and normal-like subtypes classify subjects as having an LGG tumour, and the luminal-B, HER2-enriched and basal-like subtypes classify subjects as having an HGG tumour.
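The subtype rule in claim 9 is a plain lookup table. The sketch below assumes that an intrinsic subtype label has already been assigned by some upstream method, which is outside the scope of this sketch. The label normalisation it applies (case, spaces, underscores) is a hypothetical convenience.

```python
# Claim 9: intrinsic subtype -> genetic grade.
SUBTYPE_TO_GRADE = {
    "luminal-a": "LGG",
    "normal-like": "LGG",
    "luminal-b": "HGG",
    "her2-enriched": "HGG",
    "basal-like": "HGG",
}


def normalise(label: str) -> str:
    """Map labels such as 'Luminal A' or 'basal_like' onto the table's keys."""
    return label.strip().lower().replace("_", "-").replace(" ", "-")


def grade_from_subtype(label: str) -> str:
    """Return 'LGG' or 'HGG' for a recognised subtype label."""
    try:
        return SUBTYPE_TO_GRADE[normalise(label)]
    except KeyError:
        raise ValueError(f"unrecognised subtype: {label!r}") from None


print(grade_from_subtype("Luminal A"))  # LGG
```

Labels outside the five listed subtypes raise an error rather than falling back to a default grade, because claim 9 does not assign them a grade.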
10. The method according to any preceding claim, wherein the differentially expressed genes and their relative expression levels, up or down, which allow for tumour classification and patient stratification as an HGG tumour as compared to an LGG tumour, are identified in Tables 14 or 15.
11. The method according to any preceding claim, wherein up-regulation of genes localized on chromosome 8, specifically in 8p23.3, 8p23.1, 8p21.3-8p12 and 8q13.1-8q24.3, is associated with the diagnosis and classification of subjects as having an HGG tumour.
12. The method according to any preceding claim, wherein detecting differentially altered genes comprises detecting the frequency of chromosomal gain or loss.
13. The method of claim 12, comprising using gene-centric somatic DNA copy number variation analysis to identify differentially altered genes (DAG) between LGG and HGG tumours, in order to refine the diagnosis and stratify subjects into HGG or LGG tumour classes.
14. The method according to claims 12 or 13, wherein loss of chromosome 8p23.3, 8p23.1 and 8p21.3-8p12 and gain of 8q13.1-8q24.3 are associated with diagnosis and stratification of subjects as having an HGG tumour.
15. The method according to claims 12 or 13 wherein loss of 11q21-11q25 is associated with diagnosis and stratifying subjects as having an HGG tumour.
16. The method according to claims 12 or 13 wherein loss of 16q, optionally 16q12.1-16q13, is associated with diagnosis and classification of subjects as having an LGG tumour.
17. The method according to claims 12 or 13 wherein gain of 20q13.13-20q13.2 is associated with diagnosis and classification of subjects as having an HGG tumour.
18. The method according to claims 12 or 13 wherein gain of 1q21.1-1q21.3 is associated with diagnosis and classification of subjects as having an HGG tumour.
19. The method according to claims 12 or 13 wherein a low copy number of 22q-related genes is associated with diagnosis and classification of subjects as having an LGG tumour.
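Claims 14 to 19 pair specific chromosomal regions and events with a grade. The sketch below transcribes those pairs into a rule table and counts how many of the claimed events appear in one tumour. It assumes that segment-level gain/loss calls already exist and are keyed by exactly these region strings, which is a hypothetical input format. The sketch does not perform the gene-centric copy number analysis of claim 13, and vote counting is an illustration rather than the claimed decision rule.

```python
from collections import Counter

# (region, event, grade) rules transcribed from claims 14-19.
# Claim 19's "low copy number of 22q-related genes" is encoded as a 22q loss,
# and claim 16's loss uses the whole 16q arm rather than 16q12.1-16q13.
CNV_RULES = [
    ("8p23.3", "loss", "HGG"),
    ("8p23.1", "loss", "HGG"),
    ("8p21.3-8p12", "loss", "HGG"),
    ("8q13.1-8q24.3", "gain", "HGG"),
    ("11q21-11q25", "loss", "HGG"),
    ("16q", "loss", "LGG"),
    ("20q13.13-20q13.2", "gain", "HGG"),
    ("1q21.1-1q21.3", "gain", "HGG"),
    ("22q", "loss", "LGG"),
]


def cnv_votes(calls: dict[str, str]) -> Counter:
    """Count how many claimed CNV events in `calls` point to each grade.

    `calls` maps a region label (spelled exactly as in CNV_RULES) to
    "gain", "loss" or "neutral"; this input format is hypothetical.
    """
    votes = Counter()
    for region, event, grade in CNV_RULES:
        if calls.get(region) == event:
            votes[grade] += 1
    return votes


calls = {"8q13.1-8q24.3": "gain", "11q21-11q25": "loss", "16q": "neutral"}
print(cnv_votes(calls))  # Counter({'HGG': 2})
```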
20. The method according to any preceding claim, wherein the population of highly mutated genes includes TP53 and PIK3CA, wherein a high mutation count of PIK3CA is associated with classifying a subject as having an LGG tumour, and wherein a high mutation count of TP53 is associated with diagnosing and classifying a subject as having an HGG tumour.
21. The method according to claim 20, wherein, in addition to PIK3CA, the following genes are more frequently mutated in a subject with an LGG tumour but are not mutated or are rarely mutated in HGG: CBFB, CTCF, MAP3K1, CHD8, DYSF, DNAH1, MAP2K4, and GATA3.
22. The method according to claim 20, wherein, in addition to TP53, the following genes are more frequently mutated in a subject with an HGG tumour but are not mutated or are rarely mutated in LGG: MUC4 and TTN.
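Claims 21 and 22 name the genes more often mutated in each grade, and claim 22 places TP53 with HGG. The sketch below sorts one tumour's mutated genes into LGG-associated and HGG-associated hits. The claims do not say how to weigh these hits against each other, so the function reports them and does not return a grade. The input, a plain list of gene symbols, is an assumption.

```python
# Genes the claims describe as more frequently mutated in each grade.
LGG_ENRICHED = frozenset({"PIK3CA", "CBFB", "CTCF", "MAP3K1", "CHD8",
                          "DYSF", "DNAH1", "MAP2K4", "GATA3"})  # claims 20-21
HGG_ENRICHED = frozenset({"TP53", "MUC4", "TTN"})              # claim 22


def mutation_evidence(mutated_genes: list[str]) -> dict[str, list[str]]:
    """Split a tumour's mutated genes into LGG- and HGG-associated hits."""
    genes = {g.strip().upper() for g in mutated_genes}
    return {
        "LGG": sorted(genes & LGG_ENRICHED),
        "HGG": sorted(genes & HGG_ENRICHED),
    }


print(mutation_evidence(["tp53", "TTN", "BRCA1"]))  # {'LGG': [], 'HGG': ['TP53', 'TTN']}
```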
23. The method according to any preceding claim comprising detecting the expression of one or more genes (such as one or more of the genes identified in Table 18), wherein an increase in expression of said one or more genes allows for classification of a subject as having an HGG tumour.
24. The method according to any preceding claim for use in providing a diagnosis and prognosis for a subject.
25. The method according to any preceding claim for use in facilitating prognosis of an IDC subject.
26. The method according to any preceding claim for use in facilitating determination of how an IDC subject should be treated.
27. The method according to claim 26, further comprising treating the subject based on whether or not the subject is identified as having an LGG or HGG tumour.
28. A kit comprising a substrate to which is bound a plurality of probes, wherein at least one of said plurality of probes is capable of specifically binding to each of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR.
29. An assay/chip comprising, consisting essentially of, or consisting of probes for the detection of a level of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR in a sample.
30. A PCR-based assay comprising, consisting essentially of, or consisting of primers for the detection of a level of CENPA, CENPN, FAM72A/FAM72B/FAM72C/FAM72D, CAPN8, NAT1, NOSTRIN, MELK, CDCA8, MYBL2, CDC45, BUB1, KIF2C, UBE2C, ORC6, KIF14, KIF13B, SHCBP1, PIF1, CDCA5, MCM10, MTFR2, and TICRR in a sample.
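Claim 30 covers a primer-based assay but does not say how cycle thresholds (Ct) become expression levels. One widely used option is the 2^-ΔΔCt relative quantification method. The sketch below assumes that method, roughly 100% amplification efficiency, a single reference gene and a calibrator sample. The reference gene (`ACTB`), the calibrator and the Ct values are all hypothetical choices, not taken from the source.

```python
def fold_change(ct_target: float, ct_ref: float,
                ct_target_cal: float, ct_ref_cal: float) -> float:
    """Relative expression by the 2^-ΔΔCt method (assumes ~100% PCR efficiency)."""
    delta_ct_sample = ct_target - ct_ref          # normalise tumour to the reference gene
    delta_ct_cal = ct_target_cal - ct_ref_cal     # same normalisation in the calibrator
    return 2.0 ** -(delta_ct_sample - delta_ct_cal)


def panel_fold_changes(sample_ct: dict[str, float], calibrator_ct: dict[str, float],
                       reference_gene: str) -> dict[str, float]:
    """Fold change of every panel gene in the sample relative to the calibrator."""
    ref_s = sample_ct[reference_gene]
    ref_c = calibrator_ct[reference_gene]
    return {
        gene: fold_change(ct, ref_s, calibrator_ct[gene], ref_c)
        for gene, ct in sample_ct.items()
        if gene != reference_gene
    }


# Hypothetical Ct values: MELK amplifies 3 cycles earlier in the tumour after normalisation.
print(panel_fold_changes({"MELK": 22.0, "ACTB": 18.0},
                         {"MELK": 25.0, "ACTB": 18.0}, "ACTB"))  # {'MELK': 8.0}
```

The resulting fold changes could feed a signature score such as the one sketched for claims 5 and 6. If primer efficiencies differ markedly from 100%, an efficiency-corrected model would be needed instead.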
PCT/SG2016/050490 2015-10-05 2016-10-05 Invasive ductal carcinoma aggressiveness classification WO2017061953A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10201508269R 2015-10-05
SG10201508269R 2015-10-05

Publications (1)

Publication Number Publication Date
WO2017061953A1 (en) 2017-04-13

Family

ID=58488143

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2016/050490 WO2017061953A1 (en) 2015-10-05 2016-10-05 Invasive ductal carcinoma aggressiveness classification

Country Status (1)

Country Link
WO (1) WO2017061953A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022226038A1 (en) * 2021-04-22 2022-10-27 The George Washington University Compositions and methods for treatment of invasive cancers

Citations (4)

Publication number Priority date Publication date Assignee Title
US20030019872A1 (en) * 2001-07-30 2003-01-30 Lyublinski Efim Ya Systems and methods for preventing and/or reducing corrosion in various types of tanks, containers and closed systems
WO2008048193A2 (en) * 2006-10-20 2008-04-24 Agency For Science, Technology And Research Breast tumour grading
WO2009089548A2 (en) * 2008-01-11 2009-07-16 H. Lee Moffitt Cancer & Research Institute, Inc. Malignancy-risk signature from histologically normal breast tissue
WO2013025952A2 (en) * 2011-08-16 2013-02-21 Oncocyte Corporation Methods and compositions for the treatment and diagnosis of breast cancer

Non-Patent Citations (11)

Title
ANDRES, S.A. ET AL.: "Interrogating differences in expression of targeted gene sets to predict breast cancer outcome.", BMC CANCER, vol. 13, 2 July 2013 (2013-07-02), pages 326.1 - 326.18, XP021155172, [retrieved on 20161118] *
ASWAD, L. ET AL.: "Genome and transcriptome delineation of two major oncogenic pathways governing invasive ductal breast cancer development.", ONCOTARGET, vol. 6, no. 34, 10 October 2015 (2015-10-10), pages 36652 - 36674, XP055376414, [retrieved on 20161118] *
COLAK, D. ET AL.: "Age-specific gene expression signatures for breast tumors and cross-species conserved potential cancer progression markers in young women.", PLOS ONE, vol. 8, no. 5, 21 May 2013 (2013-05-21), pages e63204.1 - e63204.15, XP055310627, [retrieved on 20161118] *
FIDALGO, F. ET AL.: "Lymphovascular invasion and histologic grade are associated with specific genomic profiles in invasive carcinomas of the breast.", TUMOUR BIOL, vol. 36, no. 3, 13 November 2014 (2014-11-13), pages 1835 - 1848, XP036218128, [retrieved on 20161118] *
HAWTHORN, L. ET AL.: "Integration of transcript expression, copy number and LOH analysis of infiltrating ductal carcinoma of the breast.", BMC CANCER, vol. 10, 27 August 2010 (2010-08-27), pages 460.1 - 460.16, XP021075282, [retrieved on 20161118] *
HERNANDEZ, L. ET AL.: "Genomic and mutational profiling of ductal carcinomas in situ and matched adjacent invasive breast cancers reveals intra-tumour genetic heterogeneity and clonal selection.", J PATHOL, vol. 227, no. 1, 21 March 2012 (2012-03-21), pages 42 - 52, XP055376400, Retrieved from the Internet <URL:www.ncbi.nlm.nih.gov/pmc/articles/PMC4975517> [retrieved on 20161118] *
JO, B.-H. ET AL.: "Heterogeneity of invasive ductal carcinoma: proposal for a hypothetical classification.", J KOREAN MED SCI, vol. 21, no. 3, 21 June 2006 (2006-06-21), pages 460 - 468, XP055376408, [retrieved on 20161118] *
KIM, S.Y.: "Genomic differences between pure ductal carcinoma in situ and synchronous ductal carcinoma in situ with invasive breast cancer.", ONCOTARGET, vol. 6, no. 10, 26 March 2015 (2015-03-26), pages 7597 - 7607, XP055376405, [retrieved on 20161118] *
LAWRENCE, M.S. ET AL.: "Mutational heterogeneity in cancer and the search for new cancer-associated genes.", NATURE, vol. 499, no. 7457, 16 June 2013 (2013-06-16), pages 214 - 218, XP055251629, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3919509> [retrieved on 20161118] *
NATRAJAN, R. ET AL.: "Tiling path genomic profiling of grade 3 invasive ductal breast cancers.", CLIN CANCER RES, vol. 15, no. 8, 24 March 2009 (2009-03-24), pages 2711 - 2722, XP007912729, [retrieved on 20161118] *
VOLINIA, S. ET AL.: "Prognostic microRNA/mRNA signature from the integrated analysis of patients with invasive breast cancer.", PROC NATL ACAD SCI USA, vol. 110, no. 18, 15 April 2013 (2013-04-15), pages 7413 - 7417, XP055200882, [retrieved on 20161118] *

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16854001

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16854001

Country of ref document: EP

Kind code of ref document: A1