WO2008077165A9 - Set of tumor markers - Google Patents

Set of tumor markers

Info

Publication number
WO2008077165A9
Authority
WO
WIPO (PCT)
Prior art keywords
tumor markers
yes
moieties
tumor
markers
Prior art date
Application number
PCT/AT2007/000566
Other languages
French (fr)
Other versions
WO2008077165A1 (en)
Inventor
Martin Lauss
Klemens Vierlinger
Albert Kriegner
Christa Noehammer
Original Assignee
Arc Austrian Res Centers Gmbh
Martin Lauss
Klemens Vierlinger
Albert Kriegner
Christa Noehammer
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arc Austrian Res Centers Gmbh, Martin Lauss, Klemens Vierlinger, Albert Kriegner, Christa Noehammer filed Critical Arc Austrian Res Centers Gmbh
Publication of WO2008077165A1 publication Critical patent/WO2008077165A1/en
Publication of WO2008077165A9 publication Critical patent/WO2008077165A9/en

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present invention relates to the field of tumor or cancer diagnostics.
  • Breast cancer is the most common type of cancer in women, affecting about one in nine women in industrialized countries.
  • Clinical and histopathological parameters such as ER (estrogen receptor) status, tumor size, lymph node status, age or tumor grade are of limited prognostic value.
  • therapeutic decision making relies on those weak predictive factors. This uncertainty about disease progression results in virtually all patients now being treated with chemotherapeutics or tamoxifen, according to the widely adopted consensus criteria of St. Gallen 1 and NIH 2.
  • histopathological parameters used for therapeutic decision making do not add significant prognostic information about recurrence and therefore outcome of disease.
  • WO 02/103320 teaches the importance of the breast cancer genes BRCA1, one of the first identified genetic tumor markers, and BRCA2 and provides a set of breast cancer markers for distinguishing ER(+) and ER(-) breast cancers. Women who carry the BRCA1 mutation have a life-time risk of 92% of developing breast cancer.
  • WO 2004/079014 A2 and WO 2006/135886 A2 describe expression profiles of cancer and identified more than 6000 possible genetic expression alterations in breast cancer.
  • WO 2005/083429 describes gene expression analysis of breast cancers and microarrays for the diagnosis.
  • WO 2005/039382 relates to a set of genetic markers for the prognosis of breast and ovarian cancer.
  • WO 2005/028681 describes tumor markers for tailoring a tamoxifen breast cancer treatment.
  • WO 2003/041562 describes a method for classifying tumors by comparative examination.
  • WO 2005/071419 relates to a method of analyzing differential gene expression associated with breast disease by analysis of a set of protein markers.
  • US 7,118,853 provides a microarray to identify genes with expression profiles of breast cancer.
  • US 2006/0183141 provides a serum marker set for the classification of tumors.
  • US 2004/0053317 provides a method for the classification of samples characterizing cellular differentiation pathways.
  • the present invention provides a set of moieties specific for at least 20 tumor markers selected from the tumor markers of table 6, i.e. MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUB1, BIRC5, ESR1, CENPN, CCNB1, ERBB2, MLF1IP, NUDT1, PLK1, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, C16orf61, DLG7, H2AFZ, PFKP, KPNA2, GATA3, CENPF, KRT18, KRT5, CCNE2, MELK, CX3CR1, TRIP13, MCM6, CCND1, PDIA4, CENPA, UBE2S, NCF1, CDC25B, PGR, TGFB3, PSMD2, HMMR, XBP1, TROAP, KNTC2, PRAME, BTG2, KRT8, FOXM1, KYNU, NME1, MCM3, NUSAP1
  • the tumor markers can be modified DNA, RNA or proteins, including mutated genes or genetic loci, aberrant gene expression, aberrantly methylated genes or modified proteins, including modification in the amino acid sequence, glycosylation or three-dimensional structure of the identified genes indicated by the symbols in table 6 and given as GenBank database references below. These markers are known per se in the art, and their aberrant modification can identify them as tumor markers.
  • the specific moieties are, for example, nucleic acids, such as PCR primers or hybridizing nucleic acids such as RNA, DNA or PNA, being specific for tumor marker nucleic acids or for moieties being specific for the tumor marker protein.
  • the moieties are capable of identifying the tumor-relevant modification of the tumor marker.
  • the final design of the moieties is generally known in the art, especially as exemplified by the documents cited herein.
  • the present invention provides a compilation of the most relevant tumor markers that have been validated with clinical data of up to 1067 patients. Importantly, the set is also significantly associated with recurrence in the subcohort of untreated patients, indicating a direct role for breast cancer progression, allowing cancer prognosis. Cell cycle genes are impressively enriched in the signature, which represents the relevance of breast cancer progression.
  • the amount of specific moieties may be up to the amount of tumor markers or more. In most cases one moiety is specific for one tumor marker. In rare cases one moiety may be specific for two or more tumor markers.
  • the set comprises moieties specific for at least 30, preferably at least 40, at least 50, at least 60, at least 70, at least 80 or at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180 or at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370 or 374 tumor markers selected from the tumor markers of table 6.
  • the more markers of the inventive collection given in table 6 below are included, the better the diagnostic value and predictive power of the set.
  • when sets with at least 200 tumor markers are used, the qualitative value is greater, but so are the costs.
  • a set with at least or about 200 tumor markers is a good compromise for applications such as microtiter plate assays or microarrays.
  • the set comprises moieties specific for at least 20, preferably at least 40, at least 50, at least 60, at least 70, at least 80, at least 90 or at least 100, tumor markers selected from the tumor markers of table 6 with a score of at least 15, preferably at least 17, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30 or at least 32, and/or an overlap of at least 3, preferably at least 4, at least 5, at least 6, at least 7 or at least 8.
  • these tumor markers represent frequently identified and diagnosed markers with a certain occurrence in clinical results.
  • the markers are selected from those with an overlap of at least 4, which are MYBL2, RNASE4, GATA3, BIRC5, ESR1, BCL2, PRAME, ERBB2, AURKA, PSMD2, MAD2L1, KRT18, MKI67, BTG2, GTSE1, NUDT1, BUB1, TGFB3, CCNB1, FOXM1, KPNA2, CX3CR1, PLK1, NME1, RRM2, CENPN, CKS2, CDKN3, SLPI,
  • BRRN1, C16orf61, HMMR, PFKP, CTSC, CDC20, UBE2C, KIF23, DNAJC12, ASPM, MCM2, MCM6, H2AFZ, MLF1IP, CCND1, MCM4, GGH, DLG7, PDIA4, CCNE2, CENPF, UBE2S, KRT5, VIL2, CP, SFRS10, TCEAL1, TFDP1, SLC25A5, PSMB7, SLC7A5, EIF4A1, FEN1, DDOST, HMGA1, TRA@, IGFBP5, MUC1, COL1A2, RFC4, CENPA, VEGF, MELK, PTTG1, RARRES3, HRB, CENPE, TFRC, C14orf45, TRIP13, ERBB3, KRT8, TROAP, KNTC2, CSE1L, PIR, MCM3, NUSAP1, KYNU, PGR, PPP2R5C,
  • the present invention provides a set of moieties specific for at least 50 tumor markers with a score of at least 24, which are MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUB1, BIRC5, ESR1, CENPN, CCNB1, ERBB2, MLF1IP, NUDT1, PLK1, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, C16orf61, DLG7, H2AFZ, PFKP, KPNA2, GATA3, CENPF, KRT18, KRT5, MELK, CCNE2, CX3CR1, CCND1, MCM6, TRIP13, PDIA4, CENPA, UBE2S, HMMR, NCF1, PSMD2, PGR, CDC25B, TGFB3, TROAP, KNTC2, XBP1, PRAME, KRT8, BTG2, FOXM1, KYNU, NME1
  • the present invention provides a set of moieties specific for at least 19 tumor markers with a score of at least 34, which are MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUB1, BIRC5, ESR1, CENPN, CCNB1, ERBB2, MLF1IP, NUDT1, PLK1, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, C16orf61, DLG7, H2AFZ, PFKP and/or KPNA2, preferably as a subset of the above defined set.
  • markers are: PTDSS1, MYBL2, PAXIP1, KIT, FABP7, TIMP1, AURKB, VEGF, TFRC, MMP7, ARPC4, PPP1R12A, FDT8, GSTM3, SAT, BUB1, NDRG1, NPY1R, TXNRD1, GNAZ, DCK, VASH1, RAB6A, RAD21, PFKP, HIF1A, TUBA1, PTGS2, TFDP1, CDC6, LDHA, NP, CDKN3, KRT8, CTSL, PCNA, BRCA1, IGFBP2, NAT1, NFIB, SC4MOL, SDC1, TIMP2, PCTK1, ITPR3, CCT7, CDH1, CDC2, IER2, TFF3, PLOD2, FOXM1, SLPI, CHI3L1, KPNA2, YY1, TOP2A, MAPRE2, ABCD3, MET, SCD, SFRP4,
  • the set comprises moieties specific for at least 5, preferably at least 10, at least 15 or at least 19, markers selected from the tumor markers of table 6 with a score of at least 34.
  • the set comprises at least 10, preferably at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180 or at least 188, moieties specific for the tumor markers of table 6 with a greater centroid good value compared to the centroid poor value.
  • markers are in particular BCL2, RNASE4, GATA3, CX3CR1, PDIA4, TGFB3, XBP1, BTG2, GSTM3, IGFBP2, ACADSB, GATM, CNKSR1, KRT17, TFF3, ABCD3, LRRC17, PEX12, NDP, TCEAL1, NPY1R, DUSP4, KIT, PTPRT, UBR2, INSR, PRLR, SCAP1, PPP1R12A, IER2, OMD, VASH1, ZMYM4, FCGRT, GABBR1, CFB, FUT8, FRY, FGFR1, COL6A1, SLC39A6, SPARC, TIMP1, MSX2, C10orf116, NAT1, SATB1, TSPAN4, CES2, SFRP4, BCL6, PAXIP1, CELSR2, APOD, APP and/or JUNB.
  • the centroid values calculated from the clinical data analyzed in the present invention
  • the set comprises at least 10, preferably at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190 or at least 200, moieties specific for the tumor markers of table 6 with a greater centroid poor value compared to the centroid good value.
  • markers are in particular MYBL2, MKI67, MAD2L1, AURKA, BUB1, BIRC5, ERBB2, CCNB1, NUDT1, GGH, CKS2, CDKN3, KPNA2, PFKP, CENPF, KRT5, MCM6, TRIP13, UBE2S, CDC25B, NCF1, HMMR, PSMD2, TROAP, PRAME, FOXM1, KRT8, MCM3, NME1, KYNU, PCTK1, CDC2, CSE1L, UBE2C, CCT4, PPP2R5C, SLPI, TOP2A, PIR, NP, VEGF, IL32, CTSC, DCK, FABP5, GMPS, MET, CCNE1, IFI30, SLC25A5, CENPE, SLC7A5, EIF4A1, CCNA2, TIMP2, YWHAZ, PSMB7, EXT1, YY1, PCNA, FEN1, AURKB, TF
  • the set comprises a moiety specific for the tumor marker MYBL2. In a further preferred embodiment the set comprises a moiety specific for the tumor marker BCL2.
  • the invention provides a set of moieties specific for tumor markers selected from MYBL2, MKI67, MAD2L1, AURKA and BCL2. These markers are the most prevalent tumor markers identified by the present invention and can also be incorporated into the set as described above.
  • the moieties are nucleic acids, especially primers specific for tumor marker nucleic acids.
  • the moieties are antibodies (monoclonal or polyclonal) or antibody fragments, preferably selected from Fab, Fab', Fab2, F(ab')2 or scFv (single-chain variable fragments), specific for tumor marker proteins.
  • the moieties of the set are immobilized on a solid support, preferably in the form of a microarray or nanoarray.
  • “microarray”, likewise “nanoarray”, is used to describe an array of a microscopic arrangement (nanoarray for an array in nanometer scale) or refers to a carrier comprising such an array. Both definitions do not contradict each other and are applicable in the sense of the present invention.
  • the present invention provides the use of the tumor markers as defined above as groups from the tumor markers of table 6 below, for the creation of a set for detecting the tumor markers, wherein the set has at least 20, preferably at least 30, preferably at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370 or 374, members.
  • a set is described above, e.g..
  • the present invention provides the method for detecting breast cancer or breast cancer cells, using the set as defined above and detecting or measuring the occurrence of tumor markers in one or more sample(s) obtained from a patient.
  • the present invention provides a method for detecting or measuring a set of tumor markers selected from the groups, as defined above, for the specificity of the moieties.
  • the patient may be a person with breast cancer or a person at risk for developing breast cancer.
  • the patient is a human being.
  • diagnosis or prognosis of a (developing) breast cancer in a patient is also contemplated.
  • the method may comprise a detection or measurement by RNA expression analysis, preferably by microarray or quantitative PCR, or protein analysis, preferably by tissue microarrays, protein microarrays, ELISA, multiplex assays, immunohistochemistry, or DNA analysis, preferably CpG island methylation analysis, comparative genomic hybridization (CGH) arrays or single nucleotide polymorphism (SNP) analysis.
  • CGH comparative genomic hybridization
  • SNP single nucleotide polymorphism
  • the method comprises providing a pool of good prognosis markers from the tumor markers and generating a pool of poor prognosis markers and determining the specificity of the tumor markers of the pools, and classifying the sample according to the greater specificity to one pool.
  • the pools can be generated by statistical analysis of the clinical data available for the tumor markers of the present invention.
  • the pool of good prognosis markers is selected from the tumor markers of table 6 with a greater centroid good value compared to the centroid poor value.
  • the pool of poor prognosis markers is selected from the tumor markers of table 6 with a greater centroid poor value compared to the centroid good value. This special classification allows a qualitative prognosis of a certain sample.
  • Figure 1A Overlap diagram. The number of overlaps is the total number of gene lists that a UniGene Cluster appears in. There is only one UniGene ID that appears in 11 gene lists; it is MYBL2.
  • Figure 1B CENPN and C16orf61 are neighbours on the genome and share a common CpG island. Image from UCSC. RefSeq track is in blue and CpG islands track in green.
  • Figure 1C KEGG pathway chart "cell cycle". Genes contained in the signature are red.
  • Figure 2 Kaplan-Meier plots of relapse-free survival using 1067 patients that were grouped by A) 374 gene expression signature (504 patients in group 'good' and 563 patients in group 'poor'), B) lymph node status (eleven observations deleted due to missing lymph node values) and C) estrogen receptor status (six observations deleted due to missing estrogen receptor values). Numbers at risk are given at last time of event before the fixed timepoints (0, 50, 100, 150, 200).
  • Figure 3 Performance of the 374 Gene Signature in histopathological subcohorts. A) LN- patients, B) LN+ patients, C) ER- patients, D) ER+ patients and E) untreated (neither chemotherapy nor hormones) patients. Numbers at risk are given at last time of event before the fixed timepoints (0, 50, 100, 150, 200).
  • Figure 4 DAG diagram of the GO terms enriched in the "374 Signature" (the markers of table 6 below) when compared to the human genome.
  • 374 genes were extracted, which, besides other quality criteria, were recorded at least twice.
  • This gene set termed "374 - Signature”
  • 374 - Signature is highly enriched in genes involved in the cell cycle.
  • A multi-center validation set of 1067 breast cancer patients was created from 8 published microarray datasets, using transformation to the poe (probability of expression) scale.
  • Probe identifiers of the datasets were updated to UniGene Clusters (UniGene Build 194). The datasets were reduced to those probe identifiers that were annotated to the 2292 UniGene Clusters common to all 8 datasets.
  • For the Sotiriou study from 2006, the values after RMA preprocessing were taken; two patients were deleted due to missing observations in recurrence.
  • For the Foekens study the signals averaged to intensity 600 were used.
  • From the van de Vijver study, only 229 patients unambiguously non-redundant to the van 't Veer study were included. The van de Vijver data had to be transformed from log10 to log2 ratios, and the few missing values were replaced by 0.
  • For the Ma data the log2 ratios were used.
  • 188 UniGene IDs of the "374 - Signature" are contained in the 2292 UniGene IDs of the united poe matrix.
  • the corresponding poe-matrix of 188 UniGene IDs x 1067 patients was prepared.
  • Centroids for the good and poor prognosis group were obtained by calculating, for each of the 188 UniGene IDs, the mean of the poe-values from all patients of the respective prognosis group.
  • k-means clustering was performed for the entire united poe-matrix (2292 x 1067) and for 10 poe-matrices of randomly chosen 188 UniGene IDs (188 x 1067).
  • the 'good' prognosis group consists of 504 patients with a recurrence rate of 28.6%
  • the 'poor' prognosis group consists of 563 patients with a recurrence rate of 48.9%.
  • the estimated hazard ratio (HR) for recurrence in the group of 'poor' prognosis gene expression compared to the group of 'good' prognosis gene expression is 2.03 (95% confidence interval, 1.66 to 2.48).
  • the "374 - Signature" is significantly associated with recurrence-free survival in the clinically important subcohorts of lymph node negative (Fig. 3A), lymph node positive (Fig. 3B) and estrogen receptor positive (Fig. 3D) patients, but not in estrogen receptor negative patients (Fig. 3C).
  • in untreated patients, the "374 - Signature" is also significantly associated with recurrence, indicating that the signature is not a mere predictor of therapeutic response but inherent to tumor progression (Fig. 3E).
  • lymph node status and estrogen receptor status are significantly associated with recurrence (Fig. 2B and C and Table 4); however, the hazard ratios are lower than for the "374 - Signature".
  • Patient information about tumor size and grade was only available for 689 and 780 patients, respectively. Recurrence-free survival was significantly associated with both tumor size and grade in the remaining patients (Table 4).
  • Lymph node status: n = 1056; hazard ratio 1.36 [1.11; 1.65]; p = 0.00268
  • the present invention reports the most comprehensive matching of genes from high-throughput publications and the largest microarray validation dataset in the field of breast cancer prognosis.
  • Referred gene lists were mainly developed on several different microarray platforms but also by real-time PCR and tissue microarrays.
  • One gene list was even discovered by work on metastatic primary mammary tumors of the rat.
  • centroids vectors of 188 poe-values
  • Table 6 the "good” and “poor” prognosis group.
  • New samples can be transformed to poe-scale and readily categorized to the prognosis group for which the uncentered correlation of the 188 poe-values is highest.
  • the 188 testable genes of the "374 - Signature" still remain highly relevant for breast cancer recurrence.
  • the present invention has learned from criticism that has arisen against publications presenting molecular predictors of breast cancer prognosis. Most importantly, with 1067 patients, validation is adequately powered and is dedicated to the idea of a multi-center validation, which not only covers different researchers from different laboratories, but even various microarray platforms. Next, randomly generated signatures are shown not to reach the performance of the "374 - Signature". This is of importance, because a large proportion of genes is slightly correlated with breast cancer survival. Last, the patients should represent a typical breast cancer cohort. Taken together, the patients of the 42 original publications certainly are a good representation of the overall breast cancer population. However, the validation set, with 704 lymph node negative patients compared to 352 lymph node positive patients, shows a slight overrepresentation of lymph node negative patients.
  • the "374 - Signature" is also significantly associated with recurrence in the subcohort of untreated patients, indicating a direct role for breast cancer progression.
  • Cell cycle genes are impressively enriched in the signature, which suggests that proliferation is the decisive feature of breast cancer progression.
  • Overrepresentation of nucleotide and protein binding functions demonstrates the enrichment of key regulators in the signature.
  • the signature is also a collection of potential therapeutic targets (Table 6). For those targets, new drugs could be designed that may add to the rather limited benefit of standard chemotherapeutics and tamoxifen. Top-ranked MYBL2 could be found in 11 gene lists; however, its function in breast cancer is relatively unknown.
  • MYBL2 is phosphorylated in the S-phase by the CCNA2/CDK2 complex and activates CDC2, CCND1 and IGFBP5. All five genes appear in the "374 - Signature".
  • The OMIM database additionally reports that three MYBL2 binding sites were found in the 5th-ranking gene BCL2.
  • MYBL2 and 4th-ranking AURKA are located on 20q13, which is frequently amplified in breast cancer with prognostic implications. Genes important for breast cancer classification, like ESR1 (estrogen receptor) and ERBB2 (Her2), rank at positions 8 and 10, respectively.
  • Equally 8th-ranked CENPN and 21st-ranked C16orf61 are neighbors on the genome and share a common CpG island. Methylation analysis, which is gaining importance for breast cancer prognosis, and CGH (comparative genomic hybridization) could be done for all of the genes of the signature.
  • the "374 - Signature" remains the best single predictor of breast cancer recurrence in an extensive multi-center validation set. In the presence of the "374 - Signature", several histopathological parameters used for therapeutic decision making do not add significant prognostic information about recurrence and therefore outcome of disease.
  • the genes contained in the signature, extracted throughout 42 publications, are an up-to-date collection of potential therapeutical targets.
  • Example 6 Table 6: Genes of the "374 - Signature" and centroids for the "good" and "poor" prognosis groups.
  • Symbol | Gene name | RefSeq | UniGene | Entrez Gene ID | Overlap | Score | Found in 1076 | Centroid good | Centroid poor
  • NUDT1 | Nudix (nucleoside diphosphate linked moiety X)-type motif 1 | NM_198949, NM_198952, NM_198954, NM_198948, NM_198950, NM_198953, NM_002452 | Hs.534331 | 4521 | 6 | 39 | Yes | -0,02 | 0,07
  • PFKP | Phosphofructokinase, platelet | NM_002627 | Hs.26010 | 5214 | 5 | 34 | Yes | -0,03 | 0,06
  • KRT5 | Keratin 5 (epidermolysis bullosa simplex, Dowling- | NM_175053, NM_000424 | Hs.433845 | 3852 | 5 | 32 | Yes | -0,00 | 0,06
  • PRAME | Preferentially expressed antigen in melanoma | NM_206953, NM_206955 | Hs.30743 | 23532
  • IGFBP5 | Insulin-like growth factor binding protein 5 | NM_000599 | Hs.369982 | 3488 | 4 | 24 | No | NA | NA
  • PCTK1 | PCTAIRE protein kinase | NM_033018 | Hs.496068 | 5127
  • MUC1 | Mucin 1, cell surface associated | NM_002456, NM_001018016, NM_001018017, NM_001018021 | Hs.89603 | 4582 | | 23 | No | NA | NA
  • | Cystatin C (amyloid angiopathy and cerebral hemorrhage) | NM_000099 | Hs.304682 | 1471 | | 20 | No | NA | NA
  • IGFBP2 | Insulin-like growth factor binding protein 2, 36kDa | NM_000597 | Hs.438102 | 3485 | 4 | 20 | Yes | 0,00 | -0,08
  • PSMB7 | Proteasome prosome, macropain subunit, beta type, 7 | NM_002799 | Hs.213470 | 5695 | 3 | 19 | Yes | -0,01 | 0,04
  • ACADSB | Acyl-Coenzyme A dehydrogenase, short/branched chain | NM_001609 | Hs.81934 | 36 | 3 | 19 | Yes | 0,02 | -0,02
  • GATM | Glycine amidinotrans- | NM_001482 | Hs.75335 | 2628
  • | SMC4 structural maintenance of chromosomes 4-like 1 (yeast) | NM_005496, NM_001002800, NM_001002799 | Hs.58992 | 10051 | 2 | 19 | No | NA | NA
  • ABCD3 | ATP-binding cassette, sub-family D (ALD), member 3 | NM_002858 | Hs.76781 | 5825 | 3 | 18 | Yes | 0,01 | -0,03
  • LMNB2 | Lamin B2 | NM_032737 | Hs.538286 | 84823 | 3
  • FEN1 | Flap structure-specific endonuclease 1 | NM_004111 | Hs.409065 | 2237 | 2 | 18 | Yes | -0,03 | 0,09
  • PTDSS1 | Phosphatidylserine synthase 1 | NM_014754 | Hs.292579 | 9791 | 2 | 18 | Yes | -0,00 | 0,12
  • CD24 | CD24 molecule | NM_013230 | Hs.375108 | 934 | 4 | 17 | No | NA | NA
  • RAB6A | RAB6A, member RAS oncogene family | NM_016577 | Hs.12152 | 5870 | 2 | 15 | Yes | 0,01 | 0,08
  • CD44 | CD44 molecule Indian blood group | NM_000610 | Hs.502328 | 960
  • ADAMTS1 | ADAM metallopeptidase with thrombospondin type 1 motif, 1 | NM_006988 | Hs.534115 | 9510 | 2 | 15 | No | NA | NA
  • GABBR1 Gamma-aminobutyric NMJ
  • GABA GABA
  • PBXIP1 | Pre-B-cell leukemia transcription factor interacting protein 1 | NM_020524 | Hs.505806 | 57326 | 3 | 14 | No | NA | NA
  • EIF2C2 | Eukaryotic translation initiation factor 2C, 2 | NM_012154 | Hs.449415 | 27161 | 2 | 14 | No | NA | NA
  • EIF4EBP1 | Eukaryotic translation initiation factor 4E binding protein 1 | NM_004095 | Hs.411641 | 1978 | 2 | 14 | No | NA | NA
  • DNAJC12 | DnaJ (Hsp40) homolog, subfamily C, member 12 | NM_021800, NM_201262 | Hs.260720 | 56521 | 3 | 13 | No | NA | NA
  • H3F3B | H3 histone, family 3B (H3.3B) | NM_005324 | Hs.180877 | 3021 | 3 | 13 | No | NA | NA
  • PA kinase alpha DMPK-like
  • PA kinase alpha NM_014826 2 13 No NA NA
  • ARPC4 | Actin related protein 2/3 complex, subunit 4, 20k- | NM_015644, NM_001025930 | Hs.323342 | 10093
  • ABLIM1 | Actin binding LIM protein | NM_001003408 | Hs.438236 | 3983
  • AP2A2 | Adaptor-related protein complex 2, alpha 2 subunit | NM_012305 | Hs.19121 | 161 | | 13 | No | NA | NA
  • CHI3L1 | Chitinase 3-like 1 (cartilage glycoprotein-39) | NM_001276 | Hs.382202 | 1116 | | 12 | Yes | -0,02 | 0,08
  • CAD
  • HDGFRP3 | Hepatoma-derived growth factor, related protein 3 | NM_016073 | Hs.513954 | 50810 | 2 | 11 | No | NA | NA
  • RNA- | Ribonuclease H2 large | NM_006397 | Hs.532851 | 10535
  • TNFAIP2 | Tumor necrosis factor, alpha-induced protein 2 | NM_006291 | Hs.525607 | 7127 | 2 | 11 | No | NA | NA
  • TAS2R5 | Taste receptor, type 2, member 5 | NM_018980, NM_016943 | Hs.490394 | 54429
  • EEF1A2 | Eukaryotic translation elongation factor 1 alpha | NM_001958 | Hs.433839 | 1917
  • CDKN1A | Cyclin-dependent kinase inhibitor 1A (p21, Cip1) | NM_078467, NM_000389 | Hs.370771 | 1026 | 2 | 11 | No | NA | NA
  • a cell proliferation signature is a marker of extremely poor outcome in a subpopulation of breast cancer patients. Cancer Res. 65, 4059-4066 (2005).
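
The centroid-based classification described in the definitions above (centroids as per-gene means of the poe-values within each prognosis group, and assignment of a new sample to the group with the highest uncentered correlation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function names, the toy matrix of 4 genes x 6 patients and its values are hypothetical, and a real application would use the 188 poe-values per patient. Uncentered correlation is taken here in its usual sense, the dot product of two vectors divided by the product of their norms.

```python
import numpy as np


def prognosis_centroids(poe, labels):
    """Return one centroid per prognosis group.

    poe    : array of shape (genes, patients) with poe-values
    labels : sequence of "good"/"poor", one entry per patient
    Each centroid is the per-gene mean over the patients of that group.
    """
    labels = np.asarray(labels)
    return {
        group: poe[:, labels == group].mean(axis=1)
        for group in ("good", "poor")
    }


def uncentered_correlation(x, y):
    """Correlation without mean-centering: x.y / (|x| * |y|).

    Assumes neither vector is all zeros.
    """
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def classify(sample, centroids):
    """Assign the sample to the group whose centroid correlates best."""
    return max(
        centroids,
        key=lambda group: uncentered_correlation(sample, centroids[group]),
    )


# Hypothetical toy data: 4 genes (rows) x 6 patients (columns).
poe = np.array([
    [0.8, 0.6, 0.7, -0.5, -0.6, -0.4],
    [0.5, 0.7, 0.6, -0.7, -0.5, -0.6],
    [-0.6, -0.4, -0.5, 0.6, 0.8, 0.7],
    [-0.3, -0.5, -0.4, 0.4, 0.5, 0.6],
])
labels = ["poor", "poor", "poor", "good", "good", "good"]

centroids = prognosis_centroids(poe, labels)

# A new sample, already transformed to poe-scale (hypothetical values).
new_sample = np.array([0.6, 0.4, -0.5, -0.2])
predicted = classify(new_sample, centroids)
print(predicted)
```

Because the correlation is uncentered, the sign and relative size of the poe-values themselves drive the assignment, rather than their deviation from the sample mean.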

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides a set of cross-validated tumor markers for the diagnosis and prognosis of breast cancer.

Description

Set of tumor markers
The present invention relates to the field of tumor or cancer diagnostics.
Breast cancer is the most common type of cancer in women, affecting about one in nine women in industrialized countries. Clinical and histopathological parameters, such as ER (estrogen receptor) status, tumor size, lymph node status, age or tumor grade, are of limited prognostic value. However, therapeutic decision making relies on those weak predictive factors. This uncertainty about disease progression results in virtually all patients now being treated with chemotherapeutics or tamoxifen, according to the widely adopted consensus criteria of St. Gallen 1 and NIH 2. Nevertheless, histopathological parameters used for therapeutic decision making do not add significant prognostic information about recurrence and therefore outcome of disease. To find a better prognostic factor as well as new targets for breast cancer therapy, high-throughput methods were employed in cancer research. For breast cancer, gene expression profiling on many different platforms (e.g. Affymetrix Gene Chips, Agilent, custom arrays) was the primary approach, but other techniques were used as well. Recent years have brought a remarkably large number of gene lists, often referred to as prognostic signatures, containing genes that correlate in some way with survival or recurrence of patients.
WO 02/103320 teaches the importance of the breast cancer genes BRCA1, one of the first identified genetic tumor markers, and BRCA2 and provides a set of breast cancer markers for distinguishing ER(+) and ER(-) breast cancers. Women who carry the BRCA1 mutation have a life-time risk of 92% of developing breast cancer.
WO 2004/079014 A2 and WO 2006/135886 A2 describe expression profiles of cancer and identified more than 6000 possible genetic expression alterations in breast cancer.
WO 2005/083429 describes gene expression analysis of breast cancers and microarrays for the diagnosis.
WO 2005/039382 relates to a set of genetic markers for the prognosis of breast and ovarian cancer.
WO 2005/028681 describes tumor markers for tailoring a tamoxifen breast cancer treatment. WO 2003/041562 describes a method for classifying tumors by comparative examination.
WO 2005/071419 relates to a method of analyzing differential gene expression associated with breast disease by analysis of a set of protein markers.
US 7,118,853 provides a microarray to identify genes with expression profiles of breast cancer.
US 2006/0183141 provides a serum marker set for the classification of tumors.
US 2004/0053317 provides a method for the classification of samples characterizing cellular differentiation pathways.
Nearly all studies claim that their gene list results in a better single predictor than the standard clinical and histopathological parameters. Some show in multivariate analysis that their signature adds significant independent prognostic value to clinical and histopathological factors. Yet the euphoria faded early, as it became clear that the largest studies showed very little overlap in genes. To date only a few signatures are aiming for entry into clinical routine, and none of them has passed approval by the FDA so far. Criticism usually focuses on the insufficient power of the studies, resulting from too few patient samples in a study's training and validation sets, or on sample selection, lack of random validation and lack of multi-center validation.
Therefore, there is a need for reliable and efficient breast cancer diagnostic and prognostic methods and means.
The present invention provides a set of moieties specific for at least 20 tumor markers selected from the tumor markers of table 6, i.e. MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUBl, BIRC5, ESRl, CENPN, CCNBl, ERBB2 , MLFlIP, NUDTl, PLKl, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, Clβorfβl, DLG7, H2AFZ, PFKP, KPNA2, GATA3, CENPF, KRT18, KRT5, CCNE2, MELK, CX3CR1, TRIP13, MCM6, CCNDl, PDIA4, CENPA, UBE2S, NCFl, CDC25B, PGR, TGFB3, PSMD2, HMMR, XBPl, TROAP, KNTC2, PRAME, BTG2, KRT8 , FOXMl ,KYNU, NMEl, MCM3, NUSAPl, PCTKl, IGFBP5, CDC2, ERBB3, CSElL, PTTGl, PRCl, BRRNl, UBE2C, MUCl, KIF23, CDK2, PPP2R5C, RARRES3, PIR, CCT4, KIFl4, SLPI ,TOP2A, BBC3, RHOC, EZH2, HMGB3, GMPS, YIFlA, NP, DKFZp762El312, MET, FABP5, DCK, CTSC, CCNB2, FLJ21062, VEGF, IL32, CDC20, TACC2, IGFBP2, IFI30, ID3, GPSM2, TIMP2, CCNEl, EI- F4A1, RFC4, CST3, CCNA2 , CENPE, SLC25A5, GSTM3, SLC7A5, LETMDl, RPS4X, TFF3, ATAD2, ACADSB, KRT17, YWHAZ, PSMB7, CNKSRl, EXTl, SMC4, MCM2, GATM, DDOST, PEX12, YYl, TFDPl, LMNB2, HPN, POLQ, PCNA, GTSEl, MAPREl, PLAUR, PTDSSl, LRRC17, FENl, NDP, ABCD3, SCUBE2, TP53, AURKB, KIFCl, C0L3A1, NPYlR, PTPLB, SFRSlO, SDCl, CDC6, CD24, TCEALl Clorfl98, FAM64A, CDCA3, MSN, MYOlO, KIF2C, ASPM, TUBAl, VIL2, CYBRDl, CTSLf SFRS7, SESNl, LRP8, CP, KIT, CNAPl, TFRC, PLOD2, CKSlB, DUSP4, NDRGl, SLC35A1, CIS, CCT5, IFITMl, ITPR3, SAT, FABP7 , OMD, ADAMTSl, PPP1R12A, PRLR, FKBPlA, SNRPAl, CCNC, SCAPl, SPRR2C, FADS2, CTSL2, TLE3, PDAPl, IER2, ESPLl, CDHl, UBR2, RAB6A, CD44, FBXO5, F3, PTPRT, RACGAPl, CCT7, SLC25A1, C4orfl8, TXNRDl, SLC3A2, C16orf35, INSR, S0D2, GABBRl, SNRPB, EIF2C2, IDIl, CEP55, RLNl, PTMA, KIFIl, SHMT2, FAM89B, TPX2, CFB, EXOl, EIF4EBP1, DHFR, HIPK2, SYNCRIP, BRCAl, ZNF43, LMNBl, PBXIPl, FlO, FCGRT, FUT8, RAD21, FRY, LDHA, VASHl, GRB7 , ZMYM4, ACTB, CCL18, MTDH, MS4A7, C17orf27, LOC286052, TACC3, MTlX, TKl, CDH3, CDC42BPA, FUT3, GNAZ, YBXl, GPR126, ARPC4, AP2B1, COL6A1, CXCL9, C14orf45, DIAPH3, DNAJC12, LAPTM4B, TUBA3, DTL, ALDH4A1, ORC6L, ABLIMl, SHCBPl, FGFRl, ERRFIl, 
CIRBP, C20orf46, SLCl6Al, SPARC, CYP2J2, AP2A2, SLC39A6, F2, SCD, ECT2, QSCN6L1, H3F3B, COL2A1, TBX19, EDNl, OXCTl, RP13-297E16.1, PALM2-AKAP2, HRB, TUBB, CTPS, CAD, CHI3Ll, GREMl, ENOl, PLODl, SORBSl, TSPANl, STMNl, HIFlA, MMP7, STK3, G0LPH2, MT2A, FOXCl, SRM, COL1A2, GEMIN4, MAPRE2,PGK1, TIMPl, ZBTB4, CRABPl, MAP3K8, TGFBl, ClOorfllβ, C14orfl32, TP53INP1, BLM, CDC25A, MSX2, MMP23B, ADM, CTSF, TRA@, SFRP4, HMGAl, MRPS6, APBA2BP, STRA13, CDCA8, SQLE, ACSS2, FBPl, PSMA7 , HTATIP2, PSMD14, HSPB2, APPr TAS2R5, NFIB, TNFAIP2, NATl, SC4MOL, HNRPAB, TUBGl, PAXIPl, SEC14L1, SATBl, CELSR2, RNASEH2A, TMEM45A, CDKNlA, PTGS2, ARFl, HDAC2, BCL6, CKAP4, JUNB, N0LA2, APOD, MMPl, EGFR, CCT6A, HDG- FRP3, CES2, SMS, DEPDClB, TSPAN4, BDH2, EEF1A2, S100A8, WISPl, PGAMl, DYNLTl and/or ADCY3. The tumor markers can be modified DNA, RNA or proteins, including mutated genes or genetic loci, aberrant gene expression, aberrantly methylated genes or modified proteins, including modification in the amino acid sequence, glycosylation or three-dimensional structure of the identified genes indicated by the symbols in table 6 and given as GeneBank Database references below. These markers are known in the art per se and their aberrant modification can lead to the identity as tumor marker. The specific moieties are, for example, nucleic acids, such as PCR primers or hybridizing nucleic acids such as RNA, DNA or PNA, being specific for tumor marker nucleic acids or for moieties being specific for the tumor marker protein. The moieties are capable of identifying the tumor- relevant modification of the tumor marker. The final design of the moieties is generally known in the art, especially as exemplified by the documents cited herein. The present invention provides a compilation of the most relevant tumor markers that have been validated with clinical data of up to 1067 patients. 
Importantly, the set is also significantly associated with recurrence in the subcohort of untreated patients, indicating a direct role in breast cancer progression and allowing cancer prognosis. Cell cycle genes are strongly enriched in the signature, which reflects their relevance for breast cancer progression. In the set, the number of specific moieties may be up to the number of tumor markers or more. In most cases one moiety is specific for one tumor marker. In rare cases one moiety may be specific for two or more tumor markers.
In a preferred embodiment the set comprises moieties specific for at least 30, preferably at least 40, at least 50, at least 60, at least 70, at least 80 or at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180 or at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370 or 374 tumor markers selected from the tumor markers of table 6. Obviously, the more markers of the inventive collection given in table 6 below the better the diagnostic value and predictive power of the set. Especially preferred are sets with at least 200 tumor markers. The more markers are used the greater the qualitative value, however also with increased costs. A set with at least or about 200 tumor markers is a good compromise for applications such as micro titer plate assays or microarrays.
Preferably, the set comprises moieties specific for at least 20, preferably at least 40, at least 50, at least 60, at least 70, at least 80, at least 90 or at least 100, tumor markers selected from the tumor markers of table 6 with a score of at least 15, preferably at least 17, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30 or at least 32, and/or an overlap of at least 3, preferably at least 4, at least 5, at least 6, at least 7 or at least 8. These tumor markers represent often identified and diagnosed markers with a certain occurrence in clinical results . In particular preferred the markers are selected from those with an overlap of at least 4, which are MYBL2, RNASE4, GATA3, BIRC5, ESRl, BCL2, PRAME, ERBB2, AURKA, PSMD2, MAD2L1, KRT18, MKI 67, BTG2, GTSEl, NUDTl, BUBl, TGFB3, CCNBl, FOXMl, KPNA2, CX3CR1, PLKl, NMEl, RRM2, CENPN, CKS2, CDKN3, SLPI,. BRRNl, Clβorfβl, HMMR, PFKP, CTSC, CDC20, UBE2C, KIF23, DNAJC12, ASPM, MCM2, MCMβ, H2AFZ, MLFlIP, CCNDl, MCM4, GGH, DLG7, PDIA4, CCNE2, CENPF, UBE2S, KRT5, VIL2, CP, SFRSlO, TCEALl, TFDPl, SLC25A5, PSMB7, SLC7A5, EIF4A1, FENl, DDOST, HMGAl, TRA@, IGFBP5, MUCl, C0L1A2, RFC4, CENPA, VEGF, MELK, PTTGl, RARRES3, HRB, CENPE, TFRC, C14orf45, TRIP13, ERBB3, KRT8, TROAP, KNTC2, CSElL, PIR, MCM3, NUSAPl, KYNU, PGR, PPP2R5C, PRCl, IL32, NCFl, CDC25B and/or CDK2.
In a particularly preferred aspect, the present invention provides a set of moieties specific for at least 50 tumor markers with a score of at least 24, which are MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUB1, BIRC5, ESR1, CENPN, CCNB1, ERBB2, MLF1IP, NUDT1, PLK1, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, C16orf61, DLG7, H2AFZ, PFKP, KPNA2, GATA3, CENPF, KRT18, KRT5, MELK, CCNE2, CX3CR1, CCND1, MCM6, TRIP13, PDIA4, CENPA, UBE2S, HMMR, NCF1, PSMD2, PGR, CDC25B, TGFB3, TROAP, KNTC2, XBP1, PRAME, KRT8, BTG2, FOXM1, KYNU, NME1, MCM3, CDC2, NUSAP1, ERBB3, PCTK1 and/or IGFBP5, in particular as a subset of the above defined set.
In a further aspect the present invention provides a set of moieties specific for at least 19 tumor markers with a score of at least 34, which are MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUB1, BIRC5, ESR1, CENPN, CCNB1, ERBB2, MLF1IP, NUDT1, PLK1, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, C16orf61, DLG7, H2AFZ, PFKP and/or KPNA2, preferably as a subset of the above defined set.
Especially preferred are the moieties of the set specific for at least 10, preferably at least 20, at leas't 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, or 188, tumor markers selected from the tumor markers of table 6 labelled "found in 1076 patients", which are . These markers have been especially validated with microarray data of 1067 patients. These markers are: PTDSSl, MYBL2 , PAXIPl, KIT, FABP7, TIMPl, AURKB, VEGF, TFRC, MMP7 ,ARPC4, PPP1R12A, FDT8, GSTM3, SAT, BUBl, NDRGl, NPYlR, TXNRDl, GNAZ, DCK, VASHl, RAB6A, RAD21, PFKP, HIFlA, TUBAl, PTGS2, TFDPl, CDC6, LDHA, NP, CDKN3, KRT8, CTSL, PCNA, BRCAl, IGFBP2, NATl, NFIB, SC4MOL, SDCl, TIMP2, PCTKl, ITPR3, CCT7, CDHl, CDC2, IER2, TFF3, PLOD2, FOXMl, SLPI, CHI3L1, KPNA2, YYl, TOP2A, MAPRE2, ABCD3, MET, SCD, SFRP4, JUNB, CSElL, SATBl, MKI67, MAP3K8, GEMIN4, YBXl, IDIl, XBPl, TGFBl, NDP, PLAUR, KRT17, ACADSB, FRY, GMPS, CCNA2, CDH3, YWHAZ, AURKA, DUSP4, C16orf35, MSN, CNKSRl, PTPRT, FKBPlA, EIF4A1, LRP8, ARFl, PIR, LRRC17, SLC3A2, SPARC, EGFR, TSPAN4, PTMA, PLO- Dl, CTSC, SHMT2, CFB, PSMD14, APOD, RNASE4, CKS2, HMMR, TP53, ClOorfllβ, MMPl, PDIA4 , GGH, KRT5, UBE2S, CENPF, ZMYM4, NUDTl, CRABPl, F2, GABBRl, CCNC, CYP2J2, PSMD2, CES2, ERBB2, CAD, FC- GRT, MAD2L1, BIRC5, GATA3, EXTl, RAME, BCL2, GRB7, CX3CR1, MSX2 , UBR2, UBE2C, NMEl, PEX12, MCM6, CCT4, BTG2, APP, TGFB3, OMD, SN- RPB, TRIP13, PRLR, HTATIP2, TROAP, SFRS7, FGFRl, SQLE, BCL6, NO- LA2, CENPE, HRB, IL32, IFI30, PPP2R5C, COL6A1, CDC25B, SCAPl, MCM3, FABP5, HDAC2, INSR, KYNU, RP13-297E16.1, NCFl, TAS2R5, DDOST, STK3, GATM, FENl, SLC25A5, PSMB7, CDC25A, SLC7A5 , CELSR2, SLC39A6, OXCTl, CCNEl, SLC25A1, CCNBl, TCEALl and/or TUBGl.
In a preferred embodiment the set comprises moieties specific for at least 5, preferably at least 10, at least 15 or at least 19, markers selected from the tumor markers of table 6 with a score of at least 34.
In an especially preferred embodiment the set comprises at least 10, preferably at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180 or at least 188, moieties specific for the tumor markers of table 6 with a greater centroid good value compared to the centroid poor value. These markers are in particular BCL2, RNASE4, GATA3, CX3CR1, PDIA4, TGFB3, XBP1, BTG2, GSTM3, IGFBP2, ACADSB, GATM, CNKSR1, KRT17, TFF3, ABCD3, LRRC17, PEX12, NDP, TCEAL1, NPY1R, DUSP4, KIT, PTPRT, UBR2, INSR, PRLR, SCAP1, PPP1R12A, IER2, OMD, VASH1, ZMYM4, FCGRT, GABBR1, CFB, FUT8, FRY, FGFR1, COL6A1, SLC39A6, SPARC, TIMP1, MSX2, C10orf116, NAT1, SATB1, TSPAN4, CES2, SFRP4, BCL6, PAXIP1, CELSR2, APOD, APP and/or JUNB. The centroid values calculated from the clinical data analyzed in the present invention allow the prognosis of the further development of breast cancer and of tumor recurrence. A set with such "good" and "poor" prognosis markers is especially preferred.
In a further preferred embodiment the set comprises at least 10, preferably at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80r at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190 or at least 200, moieties specific for the tumor markers of table 6 with a greater centroid poor value compared to the centroid good value. These markers are in particular MYBL2, MKI67, MAD2L1, AURKA, BUBl, BIRC5, ERBB2, CCNBl, NUDTl, GGH, CKS2, CDKN3, KPNA2, PFKP, CENPF, KRT5, MCM6, TRIP13, UBE2S, CD- C25B, NCFl, HMMR, PSMD2, TROAP, PRAME, FOXMl, KRT8, MCM3, NMEl, KYNU, PCTKl, CDC2, CSElL, UBE2C, CCT4, PPP2R5C, SLPI, TOP2A, PIR, NP, VEGF, IL32, CTSC, DCK, FABP5, GMPS, MET, CCNEl, IFI30, SLC25A5, CENPE, SLC7A5, EIF4A1, CCNA2, TIMP2, YWHAZ, PSMB7, EXTl, YYl, PCNA, FENl, AURKB, TFDPl, DDOST, PTDSSl, PLAUR, TP53, SFRS7, CDC6, CTSL, MSN, TUBAl, SDCl, TFRC, PLOD2, ITPR3, LRP8, SAT, FABP7, NDRGl, RAB6A, SLC3A2, CCT7, C16orf35, TXNRDl, SLC25A1, CDHl, CCNC, FKBPlA, IDIl, GRB7 , RAD21, LDHA, SNRPB, PTMA, BRCAl, SHMT2, OXCTl, CDH3, ARPC4, RP13-297E16.1, F2, SCD, CYP2J2, YBXl, GNΑZ, CDC25A, TGFBl, GEMIN4, CHI3L1, MMP7, STK3, HRB, PLODl, CAD, MAPRE2 , CRABPl, MAP3K8 , HIFlA, EGFR, ARFl, PTGS2, NFIB, TAS2R5, N0LA2, SQLE, TUBGl, MMPl, PSMD14, SC4MOL, HTATIP2 and/or HDAC2. Preferably the set comprises both good and poor prognosis markers.
In an especially preferred embodiment the set comprises a moiety specific for the tumor marker MYBL2. In a further preferred embodiment the set comprises a moiety specific for the tumor marker BCL2.
In a further aspect the invention provides a set of moieties specific for tumor markers selected from MYBL2, MKI67, MAD2L1, AURKA and BCL2. These markers are the most prevalent tumor markers identified by the present invention and can also be incorporated into the set as described above.
Preferably, the moieties are nucleic acids, especially primers specific for tumor marker nucleic acids. In another embodiment the moieties are antibodies (monoclonal or polyclonal) or antibody fragments, preferably selected from Fab, Fab', Fab2, F(ab')2 or scFv (single-chain variable fragments), specific for tumor marker proteins.
In a preferred embodiment the moieties of the set are immobilized on a solid support, preferably in the form of a microarray or nanoarray. The term "microarray", likewise "nanoarray", is used to describe an array in a microscopic arrangement (a nanoarray being an array on the nanometer scale) or refers to a carrier comprising such an array. Both definitions do not contradict each other and are applicable in the sense of the present invention.
In a further aspect the present invention provides the use of the tumor markers as defined above as groups from the tumor markers of table 6 below, for the creation of a set for detecting the tumor markers, wherein the set has at least 20, preferably at least 30, preferably at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200, at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370 or 374, members. Such a set is, for example, described above.
In another aspect the present invention provides a method for detecting breast cancer or breast cancer cells, using the set as defined above and detecting or measuring the occurrence of tumor markers in one or more sample(s) obtained from a patient. Thus, the present invention provides a method for detecting or measuring a set of tumor markers selected from the groups, as defined above, for the specificity of the moieties. The patient may be a person with breast cancer or a person at risk of developing breast cancer. Preferably, the patient is a human being. Also contemplated is the diagnosis or prognosis of a (developing) breast cancer in a patient.
The method may comprise a detection or measurement by RNA expression analysis, preferably by microarray or quantitative PCR, or protein analysis, preferably by tissue microarrays, protein microarrays, ELISA, multiplex assays or immunohistochemistry, or DNA analysis, preferably CpG island methylation analysis, comparative genomic hybridization (CGH) arrays or single nucleotide polymorphism (SNP) analysis. These methods are known in the art and can be readily used for the method of the present invention, as examples of the vast field of genetic marker analysis.
In a further embodiment the method comprises providing a pool of good prognosis markers from the tumor markers and generating a pool of poor prognosis markers and determining the specificity of the tumor markers of the pools, and classifying the sample according to the greater specificity to one pool. The pools can be generated by statistical analysis of the clinical data available for the tumor markers of the present invention.
Preferably, the pool of good prognosis markers is selected from the tumor markers of table 6 with a greater centroid good value compared to the centroid poor value. Also preferred, the pool of poor prognosis markers is selected from the tumor markers of table 6 with a greater centroid poor value compared to the centroid good value. This special classification allows a qualitative prognosis of a certain sample.
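The pool selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the centroid and sample values are hypothetical and not taken from table 6, the function names split_pools and classify_sample are introduced here, and the rule of calling a sample by the pool with the higher mean signal is an assumed stand-in for "greater specificity to one pool".

```python
def split_pools(centroids):
    """Partition markers by comparing their two centroid values.

    centroids maps marker symbol -> (centroid_good, centroid_poor).
    """
    good_pool = sorted(m for m, (good, poor) in centroids.items() if good > poor)
    poor_pool = sorted(m for m, (good, poor) in centroids.items() if poor > good)
    return good_pool, poor_pool


def classify_sample(sample, good_pool, poor_pool):
    """Assumed rule: call the sample by the pool with the higher mean signal."""
    mean_good = sum(sample[m] for m in good_pool) / len(good_pool)
    mean_poor = sum(sample[m] for m in poor_pool) / len(poor_pool)
    return "good" if mean_good > mean_poor else "poor"


# Hypothetical centroid values (good, poor); not taken from table 6.
example_centroids = {
    "BCL2": (0.40, -0.30),
    "GATA3": (0.35, -0.25),
    "MYBL2": (-0.45, 0.50),
    "AURKA": (-0.30, 0.40),
}
good_pool, poor_pool = split_pools(example_centroids)

# Hypothetical poe-scale measurements for one sample.
sample = {"BCL2": 0.6, "GATA3": 0.2, "MYBL2": -0.5, "AURKA": -0.1}
print(good_pool, poor_pool, classify_sample(sample, good_pool, poor_pool))
```

In practice the centroid values of table 6 would replace the hypothetical dictionary, and a similarity measure against the full centroid vectors could be used instead of the simple mean comparison.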
The present invention will be further illustrated by the following figures and examples without being limited thereto.
Figures:
Figure 1A: Overlap diagram. The number of overlaps is the total number of gene lists that a UniGene Cluster appears in. There is only one UniGene ID that appears in 11 gene lists; it is MYBL2.
Figure 1B: CENPN and C16orf61 are neighbours on the genome and share a common CpG island. Image from UCSC. The RefSeq track is in blue and the CpG islands track in green.
Figure 1C: KEGG pathway chart "cell cycle". Genes contained in the signature are shown in red.
Figure 2: Kaplan-Meier plots of relapse-free survival using 1067 patients that were grouped by A) 374 gene expression signature (504 patients in group "good" and 563 patients in group "poor"), B) lymph node status (eleven observations deleted due to missing lymph node values) and C) estrogen receptor status (six observations deleted due to missing estrogen receptor values). Numbers at risk are given at last time of event before the fixed timepoints (0, 50, 100, 150, 200).
Figure 3: Performance of the 374 gene signature in histopathological subcohorts: A) LN- patients, B) LN+ patients, C) ER- patients, D) ER+ patients and E) untreated (neither chemotherapy nor hormones) patients. Numbers at risk are given at last time of event before the fixed timepoints (0, 50, 100, 150, 200).
Figure 4: DAG diagram of the GO terms enriched in the "374 Signature" (i.e. the markers of table 6 below) when compared to the human genome.
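For readers unfamiliar with the Kaplan-Meier plots of Figures 2 and 3, the underlying product-limit estimate can be sketched as follows. This is a minimal Python illustration, not the R survfit call used for the figures; the function name kaplan_meier and the follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of relapse-free survival.

    times  : follow-up time per patient
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (months) for five patients.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
print(curve)
```

Censored patients reduce the number at risk without lowering the curve, which is why the numbers at risk are reported alongside the plots.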
Examples:
From 44 published gene lists relevant for breast cancer prognosis, 374 genes were extracted which, besides meeting other quality criteria, were recorded at least twice. This gene set, termed "374 - Signature", is highly enriched in genes involved in the cell cycle. From 8 published microarray datasets, a multi-center validation set of 1067 breast cancer patients was created, using transformation to poe (probability of expression) scale. The "374 - Signature" is significantly associated with breast cancer recurrence (p = 2 x 10^-12, log-rank test), with an estimated hazard ratio of recurrence for the "poor" prognosis group compared to the "good" prognosis group of 2.03 (95% confidence interval, 1.66 to 2.48). In multivariate analysis, including the standard histopathological parameters, only tumor size and the "374 - Signature" remain independent predictors of recurrence. Notably, the "374 - Signature" is significantly associated with recurrence in untreated patients, and is therefore regarded as a ranked list of potential therapeutic targets.
Example 1: Methods
Comprehensive literature search
PubMed was queried for new articles using high-throughput methods directly linked to breast cancer prognosis. Articles describing an association with breast cancer prognosis of fewer than four genes were not considered for further analysis. 32 gene lists that are of direct relevance for breast cancer prognosis and a selection of 12 gene lists indirectly linked to breast cancer prognosis were used for this study. In summary, 44 gene lists from 42 published works were included in the analysis.
Annotation to UniGene Clusters and Selection of the "374 - Signature"
All together, 4475 gene identifiers, consisting of all available identifying terms obtained from the original publications, were annotated to UniGene Clusters (UniGene Build 194). Annotation was performed with the CGAP Batch Gene Finder implemented in the ARCS (Austrian Research Centers Seibersdorf) GeneFilter (version 2006-09-27). For 4192 gene identifiers a UniGene ID was found. Leaving aside UniGene IDs appearing in only one gene list and a single gene list's redundant UniGene IDs, 1676 gene identifiers from 626 UniGene IDs were found in more than one gene list. Due to the differing quality of the studies (e.g. whether validation of the gene list was done, or the number of samples investigated), a score between 1 and 10 was assigned to each gene list and to the UniGene IDs contained therein. The better the gene list, the higher the score. 374 of the 626 UniGene IDs surpassed the best single score of 10 and were selected for further characterization and validation.
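The selection step can be illustrated with a short Python sketch. It assumes that the score of a UniGene ID is the sum of the scores of all gene lists containing it, which is implied but not stated explicitly above; the function names and the UniGene IDs in the example are hypothetical.

```python
from collections import defaultdict


def score_unigene_ids(gene_lists):
    """Accumulate list scores and overlap counts per UniGene ID.

    gene_lists is an iterable of (list_score, unigene_ids) pairs.
    Assumption: a UniGene ID's score is the sum of the scores of the
    gene lists it appears in.
    """
    total = defaultdict(int)
    overlaps = defaultdict(int)
    for list_score, unigene_ids in gene_lists:
        for uid in set(unigene_ids):  # redundant IDs within one list count once
            total[uid] += list_score
            overlaps[uid] += 1
    return total, overlaps


def select_signature(gene_lists, best_single_score=10):
    """Keep IDs found in more than one list whose score exceeds the best single score."""
    total, overlaps = score_unigene_ids(gene_lists)
    return {
        uid: (overlaps[uid], total[uid])
        for uid in total
        if overlaps[uid] > 1 and total[uid] > best_single_score
    }


# Hypothetical gene lists with scores between 1 and 10.
example_lists = [
    (9, ["Hs.A", "Hs.B", "Hs.B"]),
    (3, ["Hs.A", "Hs.C"]),
    (4, ["Hs.B", "Hs.C", "Hs.D"]),
]
signature = select_signature(example_lists)
print(signature)
```

With a maximum list score of 10, any ID exceeding 10 necessarily appears in more than one list, so the explicit overlap check is redundant but keeps the intent visible.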
Characterization of the "374 - Signature"
For Gene Ontology analysis, gene symbols were used as input for GOTM (Gene Ontology Tree Machine) Webgestalt. WEBGESTALT_HUMAN was chosen as reference. GO terms significantly enriched in the signature compared to the human genome were obtained. GOTM Webgestalt also tested for enriched KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways in the "374 - Signature", again taking the human genome as reference. For both GOTM and KEGG analysis, significant enrichment was calculated by a hypergeometric test, considering only pathways containing at least 2 genes from the signature. To find neighbouring genes on the genome, a custom track for the "374 - Signature" was generated and analyzed with the UCSC Genome Browser.
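The hypergeometric enrichment test mentioned above can be sketched as follows. This is a generic illustration rather than the GOTM implementation; the parameter names (population, annotated, drawn, observed) and the example numbers are assumptions introduced here.

```python
from math import comb


def expected_count(population, annotated, drawn):
    """Expected number of annotated genes in a random draw of the given size."""
    return drawn * annotated / population


def enrichment_p_value(population, annotated, drawn, observed):
    """Upper-tail hypergeometric probability P(X >= observed).

    population : genes in the reference set
    annotated  : reference genes carrying the GO term or pathway
    drawn      : genes in the signature
    observed   : signature genes carrying the term
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, drawn)


# Hypothetical toy numbers: 10 reference genes, 4 annotated, 3 drawn, 2 observed.
p = enrichment_p_value(10, 4, 3, 2)
ratio = 2 / expected_count(10, 4, 3)
print(p, ratio)
```

The ratio of observed to expected counts corresponds to the "Ratio" column of an enrichment table, and the upper-tail probability to its p-value.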
Transformation of validation datasets to poe-scale and synthesis to a single dataset
To generate a multi-center validation set, eight datasets with the corresponding patient data were downloaded from the published article, from GEO (Gene Expression Omnibus), or from the author's homepage (see Table 1).
Table 1
Dataset synonym      rows   patients  oligo/cDNA  channels  data source
Vijver               24496  229       60-mer      2         http://www.rii.com/publications/2002/nejm.html
Shen - Sorlie        2555   58        cDNA        2         Additional File 1 of Shen et al.
Shen - van't Veer    2555   78        60-mer      2         Additional File 2 of Shen et al.
Shen - Sotiriou      2555   98        cDNA        2         Additional File 3 of Shen et al.
Shen - Huang         2555   71        25-mer      1         Additional File 4 of Shen et al.
Sotiriou06           22283  187       25-mer      1         GSE2990
Foekens              22283  286       25-mer      1         GSE2034
Ma                   22575  60        60-mer      2         GSE1379
Supplementary Data 1 2292   1067      poe-transformed
Probe identifiers of the datasets were updated to UniGene Clusters (UniGene Build 194). The datasets were reduced to those probe identifiers that were annotated to the 2292 UniGene Clusters common to all 8 datasets. For the Sotiriou study from 2006, the values after RMA preprocessing were taken; two patients were deleted due to missing observations in recurrence. For the Foekens study, the signals averaged to intensity 600 were used. For the van de Vijver study, only the 229 patients unambiguously non-redundant to the van't Veer study were included. The van de Vijver data had to be transformed from log10 to log2 ratios, and the few missing values were replaced by 0. For the Ma data, the log2 ratios were used. The four datasets of the Shen study are already published in poe (probability of expression) scale by Shen et al. Without further preprocessing, all other datasets were transformed to poe-scale, using the package poe (version 0.2-9) in the biostatistical software R 2.3.1, with the default setting of M = 2000 changed to M = 250. poe values range between -1 and 1. Subsequently, poe values for probes of the same UniGene ID were replaced by median poe values. All 8 datasets were unified to a single poe-matrix describing the probability of expression of 2292 genes in 1067 patients.
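Two of the preprocessing steps above, the conversion of log10 ratios to log2 ratios and the replacement of several probes of one UniGene ID by their median, can be sketched in Python. The poe transformation itself is not reproduced here; the function names, probe identifiers and values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

LOG2_OF_10 = math.log2(10)


def log10_ratio_to_log2(value, missing=0.0):
    """Convert a log10 ratio to a log2 ratio; missing values (None) become 0."""
    if value is None:
        return missing
    return value * LOG2_OF_10


def collapse_probes(probe_values, probe_to_unigene):
    """Replace all probes of one UniGene ID by their per-patient median.

    probe_values     : probe id -> list of values, one per patient
    probe_to_unigene : probe id -> UniGene ID (unannotated probes are dropped)
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        uid = probe_to_unigene.get(probe)
        if uid is not None:
            grouped[uid].append(values)
    return {
        uid: [median(column) for column in zip(*rows)]
        for uid, rows in grouped.items()
    }


# Hypothetical poe values for three probes measured in two patients.
probes = {"p1": [0.25, -0.5], "p2": [0.75, 0.0], "p3": [1.0, -1.0], "p4": [0.1, 0.1]}
annotation = {"p1": "Hs.1", "p2": "Hs.1", "p3": "Hs.2"}
collapsed = collapse_probes(probes, annotation)
print(collapsed)
```

Stacking the collapsed matrices of all datasets patient by patient, restricted to the UniGene IDs common to all of them, would then give the unified matrix.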
Cluster 3.0
188 UniGene IDs of the "374 - Signature" are contained in the 2292 UniGene IDs of the united poe-matrix. The corresponding poe-matrix of 188 UniGene IDs x 1067 patients was prepared. To cluster the patients into two groups according to their gene expression profile, k-means clustering with the CLUSTER 3.0 software was employed. Neither data filtering nor data adjustment was applied to the poe-matrix. Uncentered correlation was applied as distance measure, k-means clustering of patients into 2 groups was performed 100 times, and the solution with the least within-cluster sum of distances was picked automatically by the software. Centroids for the good and poor prognosis groups were obtained by calculating the mean of each of the 188 UniGene IDs' poe values over all patients of the respective prognosis group. In the same way, k-means clustering was performed for the entire united poe-matrix (2292 x 1067) and for 10 poe-matrices of 188 randomly chosen UniGene IDs (188 x 1067).
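The clustering procedure can be approximated by the following Python sketch. It is not the CLUSTER 3.0 implementation: the initialisation, the stopping rule and all function names are assumptions, and the four patient profiles are hypothetical. It only illustrates k-means with an uncentered correlation distance, repeated runs, and selection of the run with the smallest within-cluster sum of distances.

```python
import math
import random


def uncentered_distance(x, y):
    """1 minus the uncentered correlation (no mean subtraction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0
    return 1.0 - dot / (norm_x * norm_y)


def mean_profile(rows):
    """Centroid: per-gene mean over all patients of a group."""
    return [sum(column) / len(column) for column in zip(*rows)]


def kmeans_once(profiles, k, rng, max_iter=100):
    # Assumed initialisation: k randomly chosen patients serve as start centroids.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: uncentered_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = mean_profile(members)
    total = sum(uncentered_distance(p, centroids[lab]) for p, lab in zip(profiles, labels))
    return labels, centroids, total


def kmeans_best_of(profiles, k=2, runs=100, seed=0):
    """Repeat k-means and keep the run with the least within-cluster distance sum."""
    rng = random.Random(seed)
    return min((kmeans_once(profiles, k, rng) for _ in range(runs)), key=lambda r: r[2])


# Hypothetical poe profiles of four patients over three genes.
patients = [
    [0.9, 0.8, -0.7],
    [0.8, 0.9, -0.6],
    [-0.7, -0.8, 0.9],
    [-0.9, -0.6, 0.8],
]
labels, centroids, total = kmeans_best_of(patients)
print(labels, total)
```

Which of the two clusters is called "good" or "poor" cannot be decided by the clustering itself; it follows from the recurrence data of the patients in each group.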
Recurrence analysis
Statistical analysis was performed with the package survival (version 2.26) in R 2.3.1. Kaplan-Meier analysis was executed with the function survfit, and Cox proportional hazards regression models were fitted with the coxph function. After fitting the appropriate Cox proportional hazards regression model, uni- and multivariate model parameters (log-rank test, Wald test, hazard ratio with its 95% confidence interval) are displayed by the summary.coxph function. With the given follow-up characteristics of n = 1067 patients, the minimal detectable hazard ratio is 1.22 at significance level 0.05 with a power of 0.9, as calculated at the website hedwig.mgh.harvard.edu/sample_size/quan_measur/para_time.html.
Example 2:
Literature search and gene selection
A comprehensive literature search was performed, covering the most important publications dealing with breast cancer prognosis until July 2006. Additionally, some selected studies were added that deal only indirectly with breast cancer prognosis but seemed important, resulting in a total of 44 gene lists from 42 studies (table 2a).
Table 2a
Publication PMID Score Source
Abba et al. BMC Genomics 2005 15762987 1 Table2
Ahr et al. The Lancet 2002 11809257 4 Acknowledgements
Amatschek et al. Canc Res 2004 14871811 2 Table3
Amatschek et al. Canc Res 2004 14871811 5 Fig6
Beer et al. Nature Medicine 2002 12118244 3 SupplTable1
Berchuck et al. ClinCancRes 2005 15897565 3 Table2
Bertucci et al. HumMolBiol 2002 11971868 4 Figure2
Bieche et al. Mol Canc 2004 15606925 4 Table3+4
Chang et al. PNAS 2005 15701700 8 gvant_veer_early_cazip
Dai et al. Cancer Research 2005 15899795 4 Table S3
Glinsky et al. Journal of Clinical Investigation 2005 15931389 10 Table3
Glinsky et al. Journal of Clinical Investigation 2004 15067324 3 Table1
Glinsky et al. Clin Canc Res 2004 15073102 4 Table1+2
Hu et al. BMC Genomics 2006 16643655 5 WebSupplement
Huang et al. The Lancet 2003 12747878 3 Table 2
Huang et al. The Lancet 2003 12747878 3 Table2
Iwao et al. HumMolGen 2002 11809729 6 Table1
Jacquemier et al. Canc Res 2005 15705873 10 from text
Jones et al. CancRes 2004 15126339 3 Table1, italic
Korkola et al. CancerResearch 2003 14612510 1 Table2
Ma et al. PNAS 2003 12714683 3 SupTab5(6)
Makretsov et al. Clin Canc Res 2004 15448001 8 Table2
Miller et al. Modern Pathology 2004 15073601 4 Figure2
Nutt et al. CancerResearch 2003 12670911 3 Table3
Onda et al. J Canc Clin Oncol 2004 15235906 5 Table4+5
Paik et al. New England Journal of Medicine 2004 15591335 6 Figure1
Pomeroy et al. Nature 2002 11807556 3 Figure4
Pusztai et al. Clin Canc Res 2003 12855612 1 Table3
Ramaswamy et al. NatGen 2003 12469122 3 SupplInfoSheetE
Rhodes et al. PNAS 2004 15184677 5 SupplFigureδ
Rosenwald et al. NEJM 2002 12075054 3 Table2
Shen et al. BMC Genomics 2004 15598354 9 Additional File7
Sorlie et al. PNAS 2001 11553815 7 SupplFigδ
Sotiriou et al. J of the NCI 2006 16478745 8 SuppTable 1
Sotiriou et al. PNAS 2003 12917485 6 SupplTable9
Van Laere et al. Breast Cancer Res and Treatment 2005 16172796 3 Table3
van't Veer et al. Nature 2002 11823860 9 TabS2
Wang et al. Cancer Research 2002 12414658 3 SupplTable2
Wang et al. Lancet 2005 15721472 10 Table3
West et al. Plos Biology 2005 15869330 2 SuppTable 1
West et al. PNAS 2001 11562467 1 SuppTable3
Woelfle et al. Canc Res 2003 14522883 3 Table1:-e-f
Yu et al. Canc Res 2004 15126326 6 SupplInf2
Zhu et al. Oncogene 2003 12802281 6 Table1+2
Subsequent selection of genes relevant for breast cancer progression was based on the following idea: due to bias and multiple testing, it is easy for any gene to reach significance and enter any one of the 44 gene lists. If a gene appears in more than one gene list, especially in gene lists with well-done and validated evidence, the gene is likely to be truly associated with breast cancer prognosis. Using the ARCS (Austrian Research Centers Seibersdorf) GeneFilter, 4192 of all 4475 gene identifiers could be annotated to UniGene Clusters (UniGene Build 194). 1676 gene identifiers are attributable to the 626 UniGene IDs appearing in more than one gene list (Fig. 1A). A subjective score between 1 and 10 was applied to the gene lists and therefore to the genes contained therein. UniGene IDs with a score of 11 or greater have to appear in more than one gene list, as the highest single score is 10. Additionally, the gene lists containing such a UniGene ID have to be of high relevance for breast cancer prognosis for it to surpass a score of 10. 374 UniGene IDs hold a score of 11 or greater and are hereinafter referred to as the "374 - Signature". The full ranked list with extensive annotation can be found in table 6. The 25 top scoring UniGene IDs, starting with MYBL2 scoring 65, are listed in Table 2b. Intriguingly, the genes ranked 8th and 21st are neighbours on the genome and share a common CpG island (Fig. 1B).
Table 2b
Rank Symbol UniGene ID Geneld Overlaps Score
1 MYBL2 Hs.179718 4605 11 65
2 MKI67 Hs.80976 4288 7 50
3 MAD2L1 Hs.591697 4085 7 47
4 AURKA Hs.250822 6790 7 46
5 BCL2 Hs.150749 596 8 45
6 BUB1 Hs.469649 699 6 44
7 BIRC5 Hs.514527 332 8 42
8 ESR1 Hs.598504 2099 8 41
8 CENPN Hs.55028 55839 6 41
10 ERBB2 Hs.446352 2064 7 40
10 CCNB1 Hs.23960 891 6 40
12 NUDT1 Hs.534331 4521 6 39
12 MLF1IP Hs.575032 79682 5 39
14 RNASE4 Hs.283749 6038 8 38
14 PLK1 Hs.592049 5347 6 38
14 GGH Hs.78619 8836 5 38
17 CKS2 Hs.83758 1164 6 37
17 RRM2 Hs.226390 6241 6 37
19 CDKN3 Hs.84113 1033 6 36
19 MCM4 Hs.460184 4173 5 36
21 C16orf61 Hs.388255 56942 5 35
21 DLG7 Hs.77695 9787 5 35
23 H2AFZ Hs.119192 3015 6 34
23 KPNA2 Hs.594238 3838 5 34
23 PFKP Hs.26010 5214 5 34
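The selection rule behind Table 2b can be illustrated with a minimal sketch. It assumes that the score of a UniGene ID is the sum of the subjective scores of all gene lists containing it, which is consistent with the statement that a score of 11 or greater requires at least two lists but is not spelled out in the text. The function name, the toy gene lists and the IDs "Hs.A" to "Hs.C" are hypothetical.

```python
from collections import defaultdict

def score_genes(gene_lists, threshold=11):
    """Rank UniGene IDs by summed list scores and keep those >= threshold.

    gene_lists: iterable of (list_score, unigene_ids) pairs, where
    list_score is the subjective 1..10 score given to one gene list.
    Assumption: a gene's score is the sum of the scores of its lists.
    """
    totals = defaultdict(int)    # summed score per UniGene ID
    overlaps = defaultdict(int)  # number of gene lists containing the ID
    for list_score, unigene_ids in gene_lists:
        for uid in set(unigene_ids):  # count each ID once per list
            totals[uid] += list_score
            overlaps[uid] += 1
    ranked = sorted(totals, key=lambda uid: (-totals[uid], uid))
    return [(uid, overlaps[uid], totals[uid])
            for uid in ranked if totals[uid] >= threshold]

# Hypothetical toy input: three gene lists with scores 10, 6 and 4.
toy_lists = [
    (10, ["Hs.A", "Hs.B"]),
    (6, ["Hs.A", "Hs.C"]),
    (4, ["Hs.B", "Hs.C"]),
]
signature = score_genes(toy_lists)
```

In this toy input, "Hs.C" appears in two lists but only reaches a score of 10, so it is excluded; this mirrors the requirement that overlapping lists must also be of high relevance.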
Example 3
Characterization of the "374 Signature"
GO (Gene Ontology) analysis showed that genes involved in the cell cycle are highly enriched in the "374 - Signature" compared to the whole human genome. The most significantly enriched GO terms are all related to the cell cycle (Table 3A). The dominant molecular function of the "374 - Signature" is clearly "binding", especially protein binding and nucleotide binding (Table 3B). The preferred site of action in the cell is on the spindle or near the chromosome (Table 3C). The DAG (Directed Acyclic Graph) diagram visualizes the close interaction of the enriched GO terms (Fig. 4). According to the KEGG pathways, most genes of the signature are involved in the cell cycle pathway, which is also the most significantly overrepresented pathway relative to the human genome (Table 3D). The close interplay of the "374 - Signature" genes in the cell cycle is mapped in the KEGG pathway chart (Fig. 1C).
Table 3
A) In biological process Observed Expected Ratio p-value
cell cycle 77 16.84 4.57 1.00E-30
mitotic cell cycle 42 4.93 8.52 1.69E-27
mitosis 34 3.52 9.66 2.59E-24
M phase of mitotic cell cycle 34 3.57 9.52 4.30E-24
cell division 35 4.05 8.64 2.93E-23
M phase 36 4.57 7.88 2.08E-22
regulation of progression through cell cycle 51 11.09 4.6 2.40E-20
regulation of cell cycle 51 11.11 4.59 2.66E-20
B) In molecular function
protein binding 166 96.17 1.73 9.10E-17
adenyl nucleotide binding 57 28.97 1.97 4.63E-07
purine nucleotide binding 66 36.39 1.81 9.23E-07
ATP binding 54 27.93 1.93 1.69E-06
binding 253 216.79 1.17 2.04E-06
nucleotide binding 71 41.94 1.69 4.06E-06
C) In cellular component
spindle 11 0.85 12.94 3.67E-10
chromosome, pericentric region 10 0.85 11.76 6.73E-09
microtubule cytoskeleton 24 6.04 3.97 8.21E-09
chromosome 25 6.91 3.62 2.69E-08
D) KEGG pathways
Cell cycle 30 1.43 21.01 2.45E-31
Cell Communication 11 1.53 7.18 4.21E-07
Adherens junction 8 0.99 8.11 6.53E-06
Focal adhesion 12 2.65 4.53 1.60E-05
Glycolysis / Gluconeogenesis 7 0.83 8.43 1.93E-05
Pyrimidine metabolism 8 1.17 6.85 2.29E-05
ECM-receptor interaction 7 1.03 6.83 7.62E-05
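The "Ratio" column of Table 3 is the observed count divided by the expected count, as the following sketch reproduces for the "cell cycle" row of Table 3A. The text does not state which statistical test produced the p-values; the hypergeometric upper tail shown here is a common choice for this kind of over-representation analysis and is included purely as an assumption, with hypothetical toy numbers.

```python
from math import comb

def enrichment_ratio(observed, expected):
    """Observed / expected number of signature genes carrying a term."""
    return observed / expected

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term.

    Assumption: an over-representation test of this kind; the source
    does not name the test actually used for Table 3.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total

# "cell cycle" row of Table 3A: 77 observed, 16.84 expected.
ratio = round(enrichment_ratio(77, 16.84), 2)

# Hypothetical toy numbers, only to show the call.
toy_p = hypergeom_upper_tail(k=3, n=5, K=10, N=50)
```

Some ratios in the table differ slightly from a recomputation on the printed values, presumably because the expected counts are shown rounded.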
Example 4: Validation of the "374 - Signature"
The prognostic relevance of the top-scoring genes was confirmed in an adequately powered computational meta-analysis using microarray data from published studies. Altogether, 8 datasets with corresponding recurrence data of breast cancer patients could be found (Table 1). The datasets were transformed to poe scale and united into a single multi-center validation set comprising 1067 patients and 2292 UniGene IDs. 188 UniGene IDs of the "374 - Signature" were found among the 2292 UniGene IDs of the validation set and were therefore used for subsequent prognosis analysis. Using Cluster 3.0, the 1067 patients were split into 2 patient groups by the k-means clustering algorithm. The "good" prognosis group consists of 504 patients with a recurrence rate of 28.6%, and the "poor" prognosis group consists of 563 patients with a recurrence rate of 48.9%. To see whether the resulting patient groups differ in recurrence-free survival, the Cox proportional hazards regression model was applied (Fig. 2A; p = 2.5 x 10^-12, log-rank test). The estimated Hazard Ratio (HR) for recurrence in the group of "poor" prognosis gene expression compared to the group of "good" prognosis gene expression is 2.03 (95% confidence interval, 1.66 to 2.48). The "374 - Signature" is significantly associated with recurrence-free survival in the clinically important subcohorts of lymph node negative (Fig. 3A), lymph node positive (Fig. 3B) and estrogen receptor positive (Fig. 3D) patients, but not in estrogen receptor negative patients (Fig. 3C). Notably, in untreated patients (neither chemotherapeutics nor hormones), the "374 - Signature" is also significantly associated with recurrence, indicating that the signature is not a mere predictor of therapeutic response but inherent to tumor progression (Fig. 3E).
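The two-group split and the per-group recurrence rates described above can be sketched as follows. This is not Cluster 3.0 but a minimal two-cluster k-means (Lloyd's algorithm) in plain Python with a deterministic start, given only as an illustration; the function names and the toy poe vectors and recurrence flags are hypothetical.

```python
import math

def two_means(samples, max_iter=100):
    """Split samples (lists of floats) into two clusters, labelled 0 and 1."""
    # Deterministic start: the first sample and the sample farthest from it.
    first = samples[0]
    farthest = max(samples, key=lambda s: math.dist(s, first))
    centroids = [list(first), list(farthest)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if math.dist(s, centroids[0]) <= math.dist(s, centroids[1]) else 1
            for s in samples
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def recurrence_rate(labels, recurred, group):
    """Fraction of patients in one cluster with a recorded recurrence."""
    flags = [r for lab, r in zip(labels, recurred) if lab == group]
    return sum(flags) / len(flags)

# Hypothetical toy data: 6 patients, 2 signature genes each (poe scale).
toy_poe = [[-0.9, -0.8], [-0.7, -0.9], [-0.8, -0.6],
           [0.8, 0.9], [0.7, 0.6], [0.9, 0.7]]
toy_recurred = [0, 0, 1, 1, 1, 0]
toy_labels = two_means(toy_poe)
rate_low = recurrence_rate(toy_labels, toy_recurred, 0)
rate_high = recurrence_rate(toy_labels, toy_recurred, 1)
```

Which cluster is called "good" or "poor" is decided afterwards from the recurrence rates, since k-means itself assigns arbitrary labels.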
In univariate analysis, lymph node status and estrogen receptor status are significantly associated with recurrence (Fig. 2B and C and Table 4); however, the associated hazard ratios are less pronounced than for the "374 - Signature". Patient information about tumor grade and tumor size was only available for 689 and 780 patients, respectively. In these patients, recurrence-free survival was significantly associated with both tumor size and grade (Table 4).
Table 4
Variable n HR [95% CI] P
Lymph node status 1056 1.36 [1.11;1.65] 0.00268
ER status 1061 0.684 [0.551;0.848] 0.000502
G 1 vs 2+3 689 2.61 [1.81;3.76] 8.64E-08
G 1+2 vs 3 689 2.08 [1.64;2.65] 9.18E-10
Size <=2cm vs >2cm 780 2.17 [1.71;2.74] 4.46E-11
374 Signature 1067 2.03 [1.66;2.48] 2.49E-12
Signature in sub-cohort
LN- 704 2.15 [1.67;2.75] 8.47E-10
LN+ 352 1.74 [1.22;2.5] 0.00221
ER- 242 1.33 [0.716;2.49] 0.362
ER+ 819 2.09 [1.66;2.62] 8.13E-11
untreated 526 1.7 [1.28;2.27] 0.000233
Other gene expression signatures
10 random signatures 1067 mean HR = 1.135 best: 0.017
2292 x 1067 matrix 1067 1.18 [0.968;1.44] 0.102
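As a plausibility check on Table 4, the hazard ratio and its 95% confidence interval can be converted back into an approximate standard error, Wald z statistic and two-sided p-value. The sketch below assumes a symmetric interval on the log scale with a critical value of 1.96; the function name is hypothetical, and because the table entries are rounded, the result should only agree with the reported p-value in order of magnitude.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate SE(log HR), Wald z and two-sided p from an HR and its CI.

    Assumption: the CI is exp(log(HR) +/- z_crit * SE), as is usual for
    Cox model output.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return se, z, p

# "374 Signature" row of Table 4: HR 2.03, 95% CI [1.66; 2.48].
se, z, p = wald_from_ci(2.03, 1.66, 2.48)
```

For this row the recomputed z is close to 6.9, which corresponds to a p-value on the order of 10^-12, in line with the tabulated 2.49E-12.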
To show that the "374 - Signature" performs better than a signature generated by chance, 10 signatures, each consisting of 188 randomly chosen UniGene IDs, were tested. Using k-means clustering, only one of the 10 signatures resulted in patient groups that differ significantly in relapse-free survival (p=0.017, log-rank test). The mean Hazard Ratio of recurrence of the 10 random signatures is less than the lower limit of the 95% CI of the Hazard Ratio of the "374 - Signature" (p << 0.0001, one-sample t-test). Clustering of patients by the full 2292 genes does not result in patient groups with significantly different breast cancer recurrence either. All univariate analyses are summarized in Table 4. In a multivariate analysis including lymph node status, estrogen receptor status and the "374 - Signature" (1050 patients with complete observations), only the "374 - Signature" remains significantly associated with recurrence-free survival (Table 5B). In the presence of the "374 - Signature", lymph node status and estrogen receptor status carry no further information about disease recurrence. When all routinely used histopathological factors and the "374 - Signature" are included (673 patients with complete observations), only tumor size and the "374 - Signature" are independent predictors of breast cancer recurrence, with the "374 - Signature" showing the higher hazard ratio (Table 5A).
Table 5
A) n=673 (394 observations deleted due to missing values) HR [95% CI] P
Lymph Node Status 1.08 [0.839;1.39] 0.55
ER Status 0.82 [0.615;1.09] 0.18
Tumor Grade: 1 + 2 vs. 3 1.28 [0.975;1.68] 0.076
Tumor Size: <= 2cm vs. >2cm 1.9 [1.45;2.48] 2.9E-06
374 Signature: good vs. poor 2.22 [1.607;3.07] 1.3E-06
B) n=1050 (17 observations deleted due to missing values)
Lymph Node Status 1.21 [0.982;1.48] 0.074
ER Status 0.89 [0.703;1.13] 0.33
374 Signature: good vs. poor 1.93 [1.544;2.42] 7.9E-09
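The random-signature control described before Table 5 can be sketched in two steps: drawing 10 signatures of 188 IDs from the 2292 UniGene IDs of the validation set, and comparing the resulting hazard ratios with a reference value by a one-sample t statistic. The ID strings, the seed and the three hazard ratios below are placeholders, not data from the study; the ten individual hazard ratios are not given in the text.

```python
import math
import random
import statistics

def draw_random_signatures(all_ids, size=188, count=10, seed=0):
    """Draw `count` random signatures of `size` distinct IDs each."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    return [rng.sample(all_ids, size) for _ in range(count)]

def one_sample_t(values, reference):
    """t statistic for H0: mean(values) == reference."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (mean - reference) / se

# Placeholder IDs standing in for the 2292 UniGene IDs of the validation set.
validation_ids = [f"Hs.{i}" for i in range(1, 2293)]
signatures = draw_random_signatures(validation_ids)

# Hypothetical hazard ratios, compared with the lower CI limit 1.66.
toy_hrs = [1.0, 1.2, 1.1]
t_stat = one_sample_t(toy_hrs, 1.66)
```

A strongly negative t statistic indicates that the random signatures fall clearly below the reference value, which is the comparison reported in the text.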
Example 5: Discussion
To the best of our knowledge, the present invention reports the most comprehensive matching of genes from high-throughput publications and the largest microarray validation dataset in the field of breast cancer prognosis so far. The gene lists referred to were mainly developed on several different microarray platforms, but also by real-time PCR and tissue microarrays. One gene list was even discovered by work on metastatic primary mammary tumors of the rat.
The true potential of the genes of the "374 - Signature" for breast cancer progression should not be underestimated. First, for the sake of unification, all of the original gene list identifiers and validation dataset identifiers had to be replaced by UniGene IDs, thereby losing all isoform information. By annotation to UniGene Clusters, 93.7% of all gene identifiers from the gene lists could be used. Isoform-specific alignment could reveal further strength of the signature; however, appropriate identifiers are required. Second, the synthesis of microarray datasets is only at its beginning. Prior to dataset synthesis, precisely defined standard procedures for preprocessing of raw data are required for one-colour Affymetrix, two-colour Agilent and custom microarrays, respectively. Several methods besides poe calculation have been suggested to transform different microarray datasets to a common scale; the best-performing method remains to be determined. Using the entire poe-transformed matrix (2292 genes x 1067 patients), patients from the same study cluster together more readily than estrogen receptor positive and negative patients do according to the molecular subtypes of Perou et al. Although the remaining bias of the individual datasets is not strong enough to mask the effect of the "374 - Signature", it could not be fully avoided by transformation to poe scale. Third, the term "recurrence" had to be derived from different terms used in the eight validation studies, which is likely to introduce additional bias into the performance of the signature. Clinicians also use different procedures to estimate estrogen receptor status, node status, tumor size and grade. In contrast, centroids (vectors of 188 poe values) were defined for the "good" and "poor" prognosis groups (Table 6). New samples can be transformed to poe scale and readily assigned to the prognosis group for which the uncentered correlation of the 188 poe values is highest.
Despite those deficiencies, the 188 testable genes of the "374 - Signature" still remain highly relevant for breast cancer recurrence.
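The centroid-based assignment of new samples mentioned above can be sketched as follows. The uncentered correlation of two vectors is their dot product divided by the product of their norms, without subtracting the means. For brevity, the centroids below use only the first five genes of Table 6 (MYBL2, MKI67, MAD2L1, AURKA, BCL2) rather than all 188 values, and the two sample vectors are hypothetical.

```python
import math

def uncentered_correlation(x, y):
    """Dot product of x and y divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def classify(sample, centroids):
    """Return the prognosis group whose centroid correlates best."""
    return max(centroids,
               key=lambda name: uncentered_correlation(sample, centroids[name]))

# Centroid values of MYBL2, MKI67, MAD2L1, AURKA and BCL2 from Table 6.
centroids = {
    "good": [-0.19, -0.02, -0.00, -0.04, 0.01],
    "poor": [0.05, 0.05, 0.13, 0.06, -0.14],
}

# Hypothetical poe-scaled samples (not patient data).
sample_a = [0.4, 0.3, 0.5, 0.2, -0.3]
sample_b = [-0.5, -0.1, -0.1, -0.2, 0.2]
group_a = classify(sample_a, centroids)
group_b = classify(sample_b, centroids)
```

Because the correlation is uncentered, the sign and overall direction of the poe vector matter, so a sample with high proliferation-gene values and low BCL2 is drawn towards the "poor" centroid.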
The present invention has learned from the criticism raised against publications presenting molecular predictors of breast cancer prognosis. Most importantly, with 1067 patients, the validation is adequately powered and is dedicated to the idea of a multi-center validation, which covers not only different researchers from different laboratories but also various microarray platforms. Next, randomly generated signatures are shown not to reach the performance of the "374 - Signature". This is important because a large proportion of genes is slightly correlated with breast cancer survival. Last, the patients should represent a typical breast cancer cohort. Taken together, the patients of the 42 original publications are certainly a good representation of the overall breast cancer population. However, the validation set, with 704 lymph node negative patients compared to 352 lymph node positive patients, shows a slight overrepresentation of lymph node negative patients.
Importantly, the "374 - Signature" is also significantly associated with recurrence in the subcohort of untreated patients, indicating a direct role in breast cancer progression. Cell cycle genes are strongly enriched in the signature, which suggests that proliferation is the decisive feature of breast cancer progression. The overrepresentation of nucleotide and protein binding functions demonstrates the enrichment of key regulators in the signature. Because the markers were gathered from across the abundant literature, the signature is also a collection of potential therapeutic targets (Table 6). For those targets, new drugs could be designed that may add to the rather limited benefit of standard chemotherapeutics and tamoxifen. Top-ranked MYBL2 was found in 11 gene lists; however, its function in breast cancer is relatively unknown. According to the "NCBI Gene" database, MYBL2 is phosphorylated in S phase by the CCNA2/CDK2 complex and activates CDC2, CCND1 and IGFBP5. All five genes appear in the "374 - Signature". The OMIM database additionally reports that three MYBL2 binding sites were found in the 5th-ranking gene BCL2. MYBL2 and 4th-ranking AURKA are located on 20q13, which is frequently amplified in breast cancer with prognostic implications. Genes important for breast cancer classification, such as ESR1 (estrogen receptor) and ERBB2 (Her2), rank at the prominent positions 8 and 10, respectively. CENPN, which is likewise ranked 8th, and 21st-ranked C16orf61 are neighbors on the genome and share a common CpG island. Methylation analysis, which is gaining importance for breast cancer prognosis, and CGH (comparative genomic hybridization) could be done for all of the genes of the signature.
The "374 - Signature" remains the best single predictor of breast cancer recurrence in an extensive multi-center validation set. In the presence of the "374 - Signature", several histopathological parameters used for therapeutic decision making do not add significant prognostic information about recurrence and therefore about the outcome of disease. The genes contained in the signature, extracted from 42 publications, are an up-to-date collection of potential therapeutic targets.
Example 6: Table 6: Genes of the "374 - Signature" and centroids for the "good" and "poor" prognosis groups.
Symbol Name Accessions UniGeneId GeneId Overlaps Score Found in 1076 patients Centroid Good Centroid Poor
MYBL2 V-myb myeloblastosis viral oncogene homolog (avian)-like 2 NM_002466 Hs.179718 4605 11 65 Yes -0,19 0,05
MKI67 Antigen identified by monoclonal antibody Ki-67 NM_002417 Hs.80976 4288 7 50 Yes -0,02 0,05
MAD2L1 MAD2 mitotic arrest deficient-like 1 (yeast) NM_002358 Hs.591697 4085 7 47 Yes -0,00 0,13
AURKA Aurora kinase A NM_198433, NM_003600, NM_198434, NM_198436, NM_198435, NM_198437 Hs.250822 6790 7 46 Yes -0,04 0,06
BCL2 B-cell CLL/lymphoma 2 NM_000633, NM_000657 Hs.150749 596 8 45 Yes 0,01 -0,14
BUB1 BUB1 budding uninhibited by benzimidazoles 1 homolog (yeast) NM_004336 Hs.469649 699 6 44 Yes -0,05 0,02
BIRC5 Baculoviral IAP repeat-containing 5 (survivin) NM_001012271, NM_001168, NM_001012270 Hs.514527 332 8 42 Yes -0,03 0,05
ESR1 Estrogen receptor 1 NM_000125 Hs.598504 2099 8 41 No NA NA
CENPN Centromere protein N NM_018455 Hs.55028 55839 6 41 No NA NA
ERBB2 V-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) NM_001005862, NM_004448 Hs.446352 2064 7 40 Yes 0,05 0,18
CCNB1 Cyclin B1 NM_031966 Hs.23960 891 6 40 Yes -0,03 0,04
NUDT1 Nudix (nucleoside diphosphate linked moiety X)-type motif 1 NM_198949, NM_198952, NM_198954, NM_198948, NM_198950, NM_198953, NM_002452 Hs.534331 4521 6 39 Yes -0,02 0,07
MLF1IP MLF1 interacting protein NM_024629 Hs.575032 79682 5 39 No NA NA
RNASE4 Ribonuclease, RNase A family, 4 NM_194430, NM_002937, NM_194431, NM_001145 Hs.283749 6038 8 38 Yes 0,03 -0,03
PLK1 Polo-like kinase 1 (Drosophila) NM_005030 Hs.592049 5347 6 38 No NA NA
GGH Gamma-glutamyl hydrolase (conjugase, folylpolygammaglutamyl hydrolase) NM_003878 Hs.78619 8836 5 38 Yes -0,03 0,06
CKS2 CDC28 protein kinase regulatory subunit 2 NM_001827 Hs.83758 1164 6 37 Yes -0,04 0,03
RRM2 Ribonucleotide reductase M2 polypeptide NM_001034 Hs.226390 6241 6 37 No NA NA
CDKN3 Cyclin-dependent kinase inhibitor 3 (CDK2-associated dual specificity phosphatase) NM_005192 Hs.84113 1033 6 36 Yes -0,04 0,11
MCM4 MCM4 minichromosome maintenance deficient 4 (S. cerevisiae) NM_005914, NM_182746 Hs.460184 4173 5 36 No NA NA
C16orf61 Chromosome 16 open reading frame 61 NM_020188 Hs.388255 56942 5 35 No NA NA
DLG7 Discs, large homolog 7 (Drosophila) NM_014750 Hs.77695 9787 5 35 No NA NA
H2AFZ H2A histone family, member Z NM_002106 Hs.119192 3015 6 34 No NA NA
KPNA2 Karyopherin alpha 2 (RAG cohort 1, importin alpha 1) NM_002266 Hs.594238 3838 5 34 Yes -0,01 0,11
PFKP Phosphofructokinase, platelet NM_002627 Hs.26010 5214 5 34 Yes -0,03 0,06
GATA3 GATA binding protein 3 NM_001002295, NM_002051 Hs.524134 2625 8 33 Yes -0,01 -0,35
CENPF Centromere protein F, 350/400ka (mitosin) NM_016343 Hs.497741 1063 5 33 Yes -0,08 0,04
KRT18 Keratin 18 NM_000224, NM_199187 Hs.406013 3875 7 32 No NA NA
KRT5 Keratin 5 (epidermolysis bullosa simplex, Dowling-Meara/Kobner/Weber-Cockayne types) NM_175053, NM_000424, NM_005554, NM_033448, NM_058242, NM_005555, NM_173086, NM_080747 Hs.433845 3852 5 32 Yes -0,00 0,06
MELK Maternal embryonic leucine zipper kinase NM_014791 Hs.184339 9833 31 No NA NA
CCNE2 Cyclin E2 NM_057749, NM_057735 Hs.567387 9134 4 31 No NA NA
CCND1 Cyclin D1 NM_053056 Hs.523852 595 6 30 No NA NA
MCM6 MCM6 minichromosome maintenance deficient 6 (MIS5 homolog, S. pombe) (S. cerevisiae) NM_005915 Hs.444118 4175 5 30 Yes -0,03 0,09
TRIP13 Thyroid hormone receptor interactor 13 NM_004237 Hs.436187 9319 5 30 Yes -0,02 0,10
CX3CR1 Chemokine (C-X3-C motif) receptor 1 NM_001337 Hs.78913 1524 4 30 Yes 0,05 -0,03
CENPA Centromere protein A NM_001809 Hs.1594 1058 5 29 No NA NA
UBE2S Ubiquitin-conjugating enzyme E2S NM_014501 Hs.396393 27338 29 Yes -0,02 0,11
PDIA4 Protein disulfide isomerase family A, member 4 NM_004911 Hs.93659 9601 4 29 Yes 0,00 0,00
NCF1 Neutrophil cytosolic factor 1, (chronic granulomatous disease, autosomal 1) NM_023005, NM_032408, NM_003388, NM_032421, NM_032999, NM_033000, NM_033001, NM_001518, NM_001003795, NM_173537, NM_016328, NM_002314, NM_032951, NM_032953, NM_032952, NM_032954, NM_005685, NM_012453, NM_001039145, NM_022170, NM_032317, NM_031992, NM_030798, NM_003508, NM_000501, NM_022040, NM_032463, NM_004603, NM_178125, NM_014146, NM_001305, NM_018044, NM_181471, NM_001707, NM_148956, NM_031295, NM_002914, NM_003602, NM_148912, NM_148915, NM_148913, NM_000265, NM_001040003, NM_001025202, NM_001306, NM_017528, NM_138707, NM_152559, NM_182504, NM_174930, NM_148916, NM_148914 Hs.520943 4687 7 28 Yes -0,00 0,00
TGFB3 Transforming growth factor, beta 3 NM_003239 Hs.592317 7043 28 Yes 0,00 -0,09
CDC25B Cell division cycle 25B NM_021873, NM_004358, NM_021872 Hs.153752 994 5 28 Yes -0,00 0,17
HMMR Hyaluronan-mediated motility receptor (RHAMM) NM_012484, NM_012485 Hs.72550 3161 4 28 Yes -0,04 0,06
PSMD2 Proteasome (prosome, macropain) 26S subunit, non-ATPase, 2 NM_002808 Hs.518464 5708 28 Yes -0,02 0,05
PGR Progesterone receptor NM_000926 Hs.368072 5241 4 28 No NA NA
XBP1 X-box binding protein 1 NM_005080 Hs.437638 7494 8 27 Yes -0,03 -0,30
PRAME Preferentially expressed antigen in melanoma NM_206953, NM_206955, NM_206956, NM_006115, NM_206954 Hs.30743 23532 27 Yes 0,04 0,33
TROAP Trophinin associated protein (tastin) NM_005480 Hs.524399 10024 4 27 Yes -0,016614048 0,081176602
KNTC2 Kinetochore associated 2 NM_006101 Hs.414407 10403 3 27 No NA NA
KRT8 Keratin 8 NM_002272, NM_015848, NM_057088, NM_175834, NM_002273, NM_173352 Hs.533782 3856 6 26 Yes 0,00 0,01
BTG2 BTG family, member 2 NM_006763 Hs.519162 7832 6 26 Yes 0,00 -0,08
FOXM1 Forkhead box M1 NM_202002, NM_021953, NM_202003 Hs.239 2305 26 Yes -0,04 0,08
NME1 Non-metastatic cells 1, protein (NM23A) expressed in NM_001018136, NM_198175, NM_002512, NM_000269, NM_001018137, NM_001018138, NM_001018139 Hs.463456 4830 6 25 Yes 0,01 0,10
KYNU Kynureninase (L-kynurenine hydrolase) NM_003937, NM_001032998 Hs.470126 8942 4 25 Yes 0,03 0,17
MCM3 MCM3 minichromosome maintenance deficient 3 (S. cerevisiae) NM_002388 Hs.179565 4172 4 25 Yes -0,01 0,08
IGFBP5 Insulin-like growth factor binding protein 5 NM_000599 Hs.369982 3488 4 24 No NA NA
NUSAP1 Nucleolar and spindle associated protein 1 NM_016359, NM_018454 Hs.615092 51203 4 24 No NA NA
CDC2 Cell division cycle 2, G1 to S and G2 to M NM_001786, NM_033379 Hs.334562 983 4 24 Yes -0,04 0,04
PCTK1 PCTAIRE protein kinase 1 NM_033018, NM_006201 Hs.496068 5127 3 24 Yes -0,02 0,05
ERBB3 V-erb-b2 erythroblastic leukemia viral oncogene homolog 3 (avian) NM_001982, NM_001005915 Hs.118681 2065 3 24 No NA NA
UBE2C Ubiquitin-conjugating enzyme E2C NM_181802, NM_181801, NM_007019, NM_181799, NM_181800, NM_181803 Hs.93002 11065 23 Yes -0,05 0,06
MUC1 Mucin 1, cell surface associated NM_002456, NM_001018016, NM_001018017, NM_001018021 Hs.89603 4582 23 No NA NA
PRC1 Protein regulator of cytokinesis 1 NM_003981, NM_199413, NM_199414 Hs.567385 9055 4 23 No NA NA
CSE1L CSE1 chromosome segregation 1-like (yeast) NM_001316 Hs.90073 1434 4 23 Yes -0,00 0,05
PTTG1 Pituitary tumor-transforming 1 NM_004219 Hs.350966 9232 4 23 No NA NA
BRRN1 Barren homolog 1 (Drosophila) NM_015341 Hs.308045 23397 4 23 No NA NA
TOP2A Topoisomerase (DNA) II alpha 170kDa NM_001067 Hs.156346 7153 5 22 Yes -0,01 0,07
RARRES3 Retinoic acid receptor responder (tazarotene induced) 3 NM_004585 Hs.17466 5920 5 22 No NA NA
BBC3 BCL2 binding component 3 NM_014417 Hs.467020 27113 4 22 No NA NA
KIF23 Kinesin family member 23 NM_138555, NM_004856 Hs.270845 9493 4 22 No NA NA
SLPI Secretory leukocyte peptidase inhibitor NM_003064 Hs.517070 6590 4 22 Yes 0,03 0,14
PPP2R5C Protein phosphatase 2, regulatory subunit B (B56), gamma isoform NM_002719, NM_178586, NM_178587, NM_178588 Hs.368264 5527 4 22 Yes -0,02 -0,00
PIR Pirin (iron-binding nuclear protein) NM_003662, NM_001018109 Hs.495728 8544 3 22 Yes -0,02 0,12
CCT4 Chaperonin containing TCP1, subunit 4 (delta) NM_006430 Hs.421509 10575 22 Yes -0,03 0,02
KIF14 Kinesin family member 14 NM_014875 Hs.3104 9928 3 22 No NA NA
CDK2 Cyclin-dependent kinase 2 NM_001798, NM_052827 Hs.19192 1017 3 22 No NA NA
CTSC Cathepsin C NM_148170, NM_001814 Hs.128065 1075 5 21 Yes -0,01 0,08
RHOC Ras homolog gene family, member C NM_005167, NM_175744 Hs.502659 389 4 21 No NA NA
NP Nucleoside phosphorylase NM_000270 Hs.75514 4860 4 21 Yes -0,03 0,05
DCK Deoxycytidine kinase NM_000788 Hs.709 1633 3 21 Yes -0,00 0,03
GMPS Guanine monphosphate synthetase NM_003875 Hs.591314 8833 3 21 Yes -0,01 0,09
FABP5 Fatty acid binding protein 5 (psoriasis-associated) NM_001444 Hs.408061 2171 3 21 Yes -0,00 0,06
IL32 Interleukin 32 NM_001012631, NM_001012718, NM_004221, NM_001012632, NM_001012633, NM_001012634, NM_001012635, NM_001012636 Hs.943 9235 21 Yes -0,03 0,05
VEGF Vascular endothelial growth factor NM_001025366, NM_003376, NM_001025367, NM_001025368, NM_001025369, NM_001033756, NM_001025370 Hs.73793 7422 21 Yes -0,00 0,08
DKFZP762E1312 Hypothetical protein DKFZp762E1312 NM_018410 Hs.532968 55355 3 21 No NA NA
FLJ21062 Hypothetical protein FLJ21062 NM_001039706 Hs.521012 79846 3 21 No NA NA
YIF1A Yip1 interacting factor homolog A (S. cerevisiae) NM_020470 Hs.446445 10897 21 No NA NA
EZH2 Enhancer of zeste homolog 2 (Drosophila) NM_004456, NM_152998 Hs.444082 2146 3 21 No NA NA
CCNB2 Cyclin B2 NM_004701 Hs.194698 9133 3 21 No NA NA
HMGB3 High-mobility group box 3 NM_005342 Hs.19114 3149 3 21 No NA NA
MET Met proto-oncogene (hepatocyte growth factor receptor) NM_000245 Hs.132966 4233 21 Yes -0,01 -0,00
CST3 Cystatin C (amyloid angiopathy and cerebral hemorrhage) NM_000099 Hs.304682 1471 20 No NA NA
ID3 Inhibitor of DNA binding 3, dominant negative helix-loop-helix protein NM_002167 Hs.76884 3399 20 No NA NA
CDC20 CDC20 cell division cycle 20 homolog (S. cerevisiae) NM_001255 Hs.524947 991 4 20 No NA NA
IGFBP2 Insulin-like growth factor binding protein 2, 36kDa NM_000597 Hs.438102 3485 4 20 Yes 0,00 -0,08
IFI30 Interferon, gamma-inducible protein 30 NM_006332 Hs.14623 10437 4 20 Yes -0,01 0,04
TIMP2 TIMP metallopeptidase inhibitor 2 NM_003255 Hs.104839 7077 4 20 Yes -0,05 0,00
RFC4 Replication factor C (activator 1) 4, 37kDa NM_002916, NM_181573 Hs.591322 5984 3 20 No NA NA
GSTM3 Glutathione S-transferase M3 (brain) NM_000849 Hs.2006 2947 3 20 Yes 0,02 -0,01
SLC25A5 Solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator), member 5 NM_001152 Hs.632282 292 20 Yes -0,04 0,03
EIF4A1 Eukaryotic translation initiation factor 4A, isoform 1 NM_001251, NM_001040059, NM_001416 Hs.129673 1973 20 Yes -0,01 0,00
CENPE Centromere protein E, 312kDa NM_001813 Hs.75573 1062 3 20 Yes -0,21 0,01
CCNA2 Cyclin A2 NM_001237 Hs.58974 890 3 20 Yes -0,05 0,06
GPSM2 G-protein signalling modulator 2 (AGS3-like, C. elegans) NM_013296 Hs.584901 29899 20 No NA NA
SLC7A5 Solute carrier family 7 (cationic amino acid transporter, y+ system), member 5 NM_003486 Hs.513797 8140 3 20 Yes -0,00 0,15
CCNE1 Cyclin E1 NM_001238, NM_057182 Hs.244723 898 3 20 Yes -0,04 0,10
TACC2 Transforming, acidic coiled-coil containing protein 2 NM_206862, NM_206861, NM_006997, NM_206860 Hs.501252 10579 20 No NA NA
KRT17 Keratin 17 NM_000226, NM_153490, NM_002275, NM_002274, NM_005557, NM_000526, NM_002276, NM_000422 Hs.2785 3872 5 19 Yes -0,00 -0,00
TFF3 Trefoil factor 3 (intestinal) NM_003226 Hs.82961 7033 4 19 Yes -0,02 -0,16
EXT1 Exostoses (multiple) 1 NM_000127 Hs.492618 2131 3 19 Yes 0,00 0,05
YWHAZ Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide NM_145690, NM_003406 Hs.492407 7534 19 Yes -0,01 0,03
LETMD1 LETM1 domain containing 1 NM_014033, NM_015416, NM_001024668, NM_001024669, NM_001024670, NM_001024671 Hs.288771 25875 3 19 No NA NA
PSMB7 Proteasome (prosome, macropain) subunit, beta type, 7 NM_002799 Hs.213470 5695 3 19 Yes -0,01 0,04
ACADSB Acyl-Coenzyme A dehydrogenase, short/branched chain NM_001609 Hs.81934 36 3 19 Yes 0,02 -0,02
GATM Glycine amidinotransferase (L-arginine:glycine amidinotransferase) NM_001482 Hs.75335 2628 19 Yes 0,03 -0,00
MCM2 MCM2 minichromosome maintenance deficient 2, mitotin (S. cerevisiae) NM_004526 Hs.477481 4171 3 19 No NA NA
CNKSR1 Connector enhancer of kinase suppressor of Ras 1 NM_006314 Hs.16232 10256 3 19 Yes 0,01 -0,03
SMC4 SMC4 structural maintenance of chromosomes 4-like 1 (yeast) NM_005496, NM_001002800, NM_001002799 Hs.58992 10051 2 19 No NA NA
RPS4X Ribosomal protein S4, X-linked NM_001007 Hs.446628 6191 2 19 No NA NA
ATAD2 ATPase family, AAA domain containing 2 NM_014109 Hs.370834 29028 2 19 No NA NA
COL3A1 Collagen, type III, alpha 1 (Ehlers-Danlos syndrome type IV, autosomal dominant) NM_000090 Hs.443625 1281 6 18 No NA NA
SCUBE2 Signal peptide, CUB domain, EGF-like 2 NM_020974 Hs.523468 57758 4 18 No NA NA
LRRC17 Leucine rich repeat containing 17 NM_005824, NM_001031692 Hs.567412 10234 4 18 Yes 0,03 -0,02
HPN Hepsin (transmembrane protease, serine 1) NM_182983, NM_002151 Hs.182385 3249 4 18 No NA NA
PLAUR Plasminogen activator, urokinase receptor NM_002659, NM_001005376, NM_001005377 Hs.466871 5329 3 18 Yes 0,00 0,01
AURKB Aurora kinase B NM_004217 Hs.442658 9212 3 18 Yes -0,04 0,05
PCNA Proliferating cell nuclear antigen NM_002592, NM_182649 Hs.147433 5111 3 18 Yes -0,03 0,03
TFDP1 Transcription factor Dp-1 NM_007111 Hs.79353 7027 3 18 Yes -0,01 0,05
ABCD3 ATP-binding cassette, sub-family D (ALD), member 3 NM_002858 Hs.76781 5825 3 18 Yes 0,01 -0,03
LMNB2 Lamin B2 NM_032737 Hs.538286 84823 3 18 No NA NA
DDOST Dolichyl-diphosphooligosaccharide-protein glycosyltransferase NM_005216 Hs.523145 1650 3 18 Yes -0,01 0,00
NDP Norrie disease (pseudoglioma) NM_000266 Hs.522615 4693 3 18 Yes 0,23 0,06
MAPRE1 Microtubule-associated protein, RP/EB family, member 1 NM_012325 Hs.472437 22919 18 No NA NA
KIFC1 Kinesin family member C1 NM_002263 Hs.436912 3833 3 18 No NA NA
YY1 YY1 transcription factor NM_003403 Hs.388927 7528 3 18 Yes -0,00 -0,00
PEX12 Peroxisomal biogenesis factor 12 NM_000286 Hs.591190 5193 2 18 Yes 0,02 -0,02
FEN1 Flap structure-specific endonuclease 1 NM_004111 Hs.409065 2237 2 18 Yes -0,03 0,09
TP53 Tumor protein p53 (Li-Fraumeni syndrome) NM_000546 Hs.408312 7157 2 18 Yes 0,01 0,01
GTSE1 G-2 and S-phase expressed 1 NM_016426 Hs.386189 51512 2 18 No NA NA
PTDSS1 Phosphatidylserine synthase 1 NM_014754 Hs.292579 9791 2 18 Yes -0,00 0,12
POLQ Polymerase (DNA directed), theta NM_199420 Hs.241517 10721 2 18 No NA NA
TCEAL1 Transcription elongation factor A (SII)-like 1 NM_004780, NM_001006640, NM_001006639 Hs.95243 9338 5 17 Yes 0,08 -0,02
MSN Moesin NM_002444 Hs.87752 4478 4 17 Yes -0,00 0,04
CDC6 CDC6 cell division cycle 6 homolog (S. cerevisiae) NM_001254 Hs.405958 990 4 17 Yes -0,01 0,09
CD24 CD24 molecule NM_013230 Hs.375108 934 4 17 No NA NA
TUBA1 Tubulin, alpha 1 (testis specific) NM_006000 Hs.75318 7277 3 17 Yes -0,01 0,01
VIL2 Villin 2 (ezrin) NM_003379 Hs.642735 7430 3 17 No NA NA
SFRS10 Splicing factor, arginine/serine-rich 10 (transformer 2 homolog, Drosophila) NM_004593 Hs.533122 6434 3 17 No NA NA
SFRS7 Splicing factor, arginine/serine-rich 7, 35kDa NM_001031684 Hs.309090 6432 3 17 Yes -0,01 -0,01
SDC1 Syndecan 1 NM_001006946, NM_002997 Hs.224607 6382 3 17 Yes -0,06 -0,10
CYBRD1 Cytochrome b reductase 1 NM_024843 Hs.221941 79901 3 17 No NA NA
KIF2C Kinesin family member 2C NM_006845 Hs.69360 11004 3 17 No NA NA
CDCA3 Cell division cycle associated 3 NM_031299 Hs.524216 83461 3 17 No NA NA
NPY1R Neuropeptide Y receptor Y1 NM_000909 Hs.519057 4886 3 17 Yes 0,34 0,08
MYO10 Myosin X NM_012334 Hs.481720 4651 3 17 No NA NA
CTSL Cathepsin L NM_001912, NM_145918, NM_001023564 Hs.418123 1514 3 17 Yes -0,03 0,04
PTPLB Protein tyrosine phosphatase-like (proline instead of catalytic arginine), member b NM_198402 Hs.477367 201562 2 17 No NA NA
ASPM Asp (abnormal spindle)-like, microcephaly associated (Drosophila) NM_018136 Hs.121028 259266 2 17 No NA NA
FAM64A Family with sequence similarity 64, member A NM_019013 Hs.592116 54478 2 17 No NA NA
C1orf198 Chromosome 1 open reading frame 198 NM_032800 Hs.568242 84886 2 17 No NA NA
CNAP1 Chromosome condensation-related SMC-associated protein 1 NM_014865 Hs.5719 9918 4 16 No NA NA
PLOD2 Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2 NM_182943, NM_000935 Hs.477866 5352 4 16 Yes -0,02 0,02
C1S Complement component 1, s subcomponent NM_201442, NM_001734 Hs.458355 716 3 16 No NA NA
DUSP4 Dual specificity phosphatase 4 NM_057158, NM_001394 Hs.417962 1846 3 16 Yes -0,04 -0,14
LRP8 Low density lipoprotein receptor-related protein 8, apolipoprotein e receptor NM_004631, NM_001018054, NM_033300, NM_017522 Hs.576154 7804 16 Yes -0,06 0,06
IFITM1 Interferon induced transmembrane protein 1 (9-27) NM_003641 Hs.458414 8519 3 16 No NA NA
CKS1B CDC28 protein kinase regulatory subunit 1B NM_001826 Hs.374378 1163 3 16 No NA NA
FABP7 Fatty acid binding protein 7, brain NM_001446 Hs.26770 2173 3 16 Yes 0,02 0,19
CCT5 Chaperonin containing TCP1, subunit 5 (epsilon) NM_012073 Hs.1600 22948 2 16 No NA NA
ITPR3 Inositol 1,4,5-triphosphate receptor, type 3 NM_002224 Hs.65758 3710 2 16 Yes -0,02 0,03
SESN1 Sestrin 1 NM_014454 Hs.591336 27244 2 16 No NA NA
CP Ceruloplasmin (ferroxidase) NM_000096 Hs.558314 1356 2 16 No NA NA
TFRC Transferrin receptor (p90, CD71) NM_003234 Hs.529618 7037 2 16 Yes -0,04 0,02
KIT V-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog NM_000222 Hs.479754 3815 16 Yes 0,04 0,02
SLC35A1 Solute carrier family 35 (CMP-sialic acid transporter), member A1 NM_006416 Hs.423163 10559 2 16 No NA NA
NDRG1 N-myc downstream regulated gene 1 NM_006096 Hs.372914 10397 2 16 Yes 0,02 0,15
SAT Spermidine/spermine N1-acetyltransferase NM_002970 Hs.28491 6303 16 Yes 0,00 0,03
TLE3 Transducin-like enhancer of split 3 (E(sp1) homolog, Drosophila) NM_005078 Hs.287362 7090 3 15 No NA NA
PDAP1 PDGFA associated protein 1 NM_014891 Hs.632296 11333 3 15 No NA NA
F3 Coagulation factor III (thromboplastin, tissue factor) NM_001993 Hs.62192 2152 3 15 No NA NA
SNRPA1 Small nuclear ribonucleoprotein polypeptide A' NM_003090 Hs.528763 6627 3 15 No NA NA
RACGAP1 Rac GTPase activating protein 1 NM_013277 Hs.505469 29127 3 15 No NA NA
CDH1 Cadherin 1, type 1, E-cadherin (epithelial) NM_004360 Hs.461086 999 3 15 Yes -0,05 -0,00
PRLR Prolactin receptor NM_000949 Hs.368587 5618 3 15 Yes 0,01 -0,02
ESPL1 Extra spindle poles like 1 (S. cerevisiae) NM_012291 Hs.153479 9700 3 15 No NA NA
RAB6A RAB6A, member RAS oncogene family NM_016577 Hs.12152 5870 2 15 Yes 0,01 0,08
TXNRD1 Thioredoxin reductase 1 NM_003330 NM_182742 NM_182729 NM_182743 NM_001008394 Hs.567352 7296 2 15 Yes -0,00 0,07
CD44 CD44 molecule (Indian blood group) NM_000610 NM_001001389 NM_001001390 NM_001001391 NM_001001392 Hs.502328 960 2 15 No NA NA
SOD2 Superoxide dismutase 2, mitochondrial NM_000636 NM_001024465 NM_001024466 Hs.487046 6648 2 15 No NA NA
OMD Osteomodulin NM_005014 Hs.94070 4958 2 15 Yes 0,10 -0,02
SPRR2C Small proline-rich protein 2C M21539 Hs.592363 6702 2 15 No NA NA
INSR Insulin receptor NM_000208 Hs.591381 3643 2 15 Yes 0,01 0,00
C4orf18 Chromosome 4 open reading frame 18 NM_016613 NM_001031700 Hs.567498 51313 2 15 No NA NA
ADAMTS1 ADAM metallopeptidase with thrombospondin type 1 motif, 1 NM_006988 Hs.534115 9510 2 15 No NA NA
UBR2 Ubiquitin protein ligase E3 component n-recognin 2 NM_015255 Hs.529925 23304 2 15 Yes 0,00 -0,02
PTPRT Protein tyrosine phosphatase, receptor type, T NM_133170 NM_007050 Hs.526879 11122 2 15 Yes 0,01 -0,06
FBXO5 F-box protein 5 NM_012177 Hs.520506 26271 2 15 No NA NA
SLC3A2 Solute carrier family 3 (activators of dibasic and neutral amino acid transport), member 2 NM_001012661 NM_001012662 NM_002394 NM_001012663 NM_001012664 NM_001013251 Hs.502769 6520 2 15 Yes -0,00 0,04
FADS2 Fatty acid desaturase 2 NM_004265 Hs.502745 9415 2 15 No NA NA
IER2 Immediate early response 2 NM_004907 Hs.501629 9592 15 Yes 0,07 0,02
PPP1R12A Protein phosphatase 1, regulatory (inhibitor) subunit 12A NM_002480 Hs.49582 4659 15 Yes 0,00 -0,00
FKBP1A FK506 binding protein 1A, 12kDa NM_000801 NM_080489 NM_015685 NM_054014 Hs.471933 2280 2 15 Yes -0,02 0,01
CTSL2 Cathepsin L2 NM_001333 Hs.434529 1515 2 15 No NA NA
CCNC Cyclin C NM_005190 NM_001013399 Hs.430646 892 2 15 Yes -0,01 0,04
CCT7 Chaperonin containing TCP1, subunit 7 (eta) NM_006429 NM_001009570 Hs.368149 10574 2 15 Yes -0,02 0,04
SCAP1 Src family associated phosphoprotein 1 NM_003726 Hs.316931 8631 2 15 Yes 0,01 -0,01
C16orf35 Chromosome 16 open reading frame 35 NM_001039476 Hs.19699 8131 2 15 Yes 0,02 0,02
SLC25A1 Solute carrier family 25 (mitochondrial carrier; citrate transporter), member 1 NM_005984 Hs.111024 6576 15 Yes -0,00 0,02
GABBR1 Gamma-aminobutyric acid (GABA) B receptor, 1 NM_001470 NM_021905 NM_021904 NM_021903 Hs.167017 2550 3 14 Yes -0,04 -0,04
ZNF43 Zinc finger protein 43 NM_003423 Hs.534365 7594 3 14 No NA NA
PBXIP1 Pre-B-cell leukemia transcription factor interacting protein 1 NM_020524 Hs.505806 57326 3 14 No NA NA
LDHA Lactate dehydrogenase A NM_144972 NM_005566 NM_002301 NM_017448 Hs.2795 3939 3 14 Yes -0,03 0,01
GRB7 Growth factor receptor-bound protein 7 NM_005310 NM_001030002 Hs.86859 2886 3 14 Yes 0,03 0,24
CFB Complement factor B NM_001710 Hs.69771 629 3 14 Yes -0,02 -0,09
F10 Coagulation factor X NM_000504 Hs.361463 2159 3 14 No NA NA
BRCA1 Breast cancer 1, early onset NM_007295 NM_007294 NM_007296 NM_007302 NM_007297 NM_007299 NM_007300 NM_007304 NM_007303 NM_007305 NM_007298 Hs.194143 672 3 14 Yes -0,00 0,03
SHMT2 Serine hydroxymethyltransferase 2 (mitochondrial) NM_005412 Hs.75069 6472 2 14 Yes -0,03 0,04
HIPK2 Homeodomain interacting protein kinase 2 NM_022740 Hs.397465 28996 2 14 No NA NA
DHFR Dihydrofolate reductase NM_000791 Hs.83765 1719 2 14 No NA NA
LMNB1 Lamin B1 NM_005573 Hs.89497 4001 2 14 No NA NA
KIF11 Kinesin family member 11 NM_004523 Hs.8878 3832 2 14 No NA NA
SNRPB Small nuclear ribonucleoprotein polypeptides B and B1 NM_003091 NM_198216 Hs.83753 6628 2 14 Yes -0,00 0,06
RAD21 RAD21 homolog (S. pombe) NM_006265 Hs.81848 5885 2 14 Yes -0,01 0,03
FRY Furry homolog (Drosophila) NM_023037 Hs.591225 10129 2 14 Yes 0,01 -0,01
SYNCRIP Synaptotagmin binding, cytoplasmic RNA interacting protein NM_006372 Hs.571177 10492 2 14 No NA NA
VASH1 Vasohibin 1 NM_014909 Hs.525479 22846 2 14 Yes 0,01 -0,02
EXO1 Exonuclease 1 NM_130398 NM_006027 NM_003686 Hs.498248 9156 2 14 No NA NA
PTMA Prothymosin, alpha (gene sequence 28) NM_002823 Hs.459927 5757 2 14 Yes -0,03 0,02
EIF2C2 Eukaryotic translation initiation factor 2C, 2 NM_012154 Hs.449415 27161 2 14 No NA NA
EIF4EBP1 Eukaryotic translation initiation factor 4E binding protein 1 NM_004095 Hs.411641 1978 2 14 No NA NA
RLN1 Relaxin 1 NM_006911 Hs.368996 6013 2 14 No NA NA
IDI1 Isopentenyl-diphosphate delta isomerase 1 NM_004508 Hs.283652 3422 2 14 Yes -0,01 0,03
ZMYM4 Zinc finger, MYM-type 4 NM_005095 Hs.269211 9202 2 14 Yes 0,00 -0,01
FAM89B Family with sequence similarity 89, member B NM_152832 NM_006396 Hs.25723 23625 14 No NA NA
TPX2 TPX2, microtubule-associated, homolog (Xenopus laevis) NM_012112 Hs.244580 22974 2 14 No NA NA
CEP55 Centrosomal protein 55kDa NM_018131 Hs.14559 55165 2 14 No NA NA
FUT8 Fucosyltransferase 8 (alpha (1,6) fucosyltransferase) NM_178155 NM_178156 NM_178154 NM_004480 NM_178157 Hs.118722 2530 2 14 Yes 0,01 -0,08
FCGRT Fc fragment of IgG, receptor, transporter, alpha NM_004107 Hs.111903 2217 2 14 Yes 0,03 0,00
SPARC Secreted protein, acidic, cysteine-rich (osteonectin) NM_003118 Hs.111779 6678 5 13 Yes -0,01 -0,05
COL6A1 Collagen, type VI, alpha 1 NM_001848 Hs.474053 1291 4 13 Yes 0,02 0,00
SLC39A6 Solute carrier family 39 (zinc transporter), member 6 NM_012319 Hs.79136 25800 13 Yes 0,04 -0,01
CXCL9 Chemokine (C-X-C motif) ligand 9 NM_002416 Hs.77367 4283 3 13 No NA NA
ACTB Actin, beta NM_001101 Hs.520640 60 3 13 No NA NA
CIRBP Cold inducible RNA binding protein NM_001280 Hs.501309 1153 3 13 No NA NA
DNAJC12 DnaJ (Hsp40) homolog, subfamily C, member 12 NM_021800 NM_201262 Hs.260720 56521 3 13 No NA NA
H3F3B H3 histone, family 3B (H3.3B) NM_005324 Hs.180877 3021 3 13 No NA NA
DIAPH3 Diaphanous homolog 3 (Drosophila) NM_030932 Hs.283127 81624 2 13 No NA NA
ECT2 Epithelial cell transforming sequence 2 oncogene NM_018098 Hs.518299 1894 13 No NA NA
ALDH4A1 Aldehyde dehydrogenase 4 family, member A1 NM_003748 NM_170726 Hs.77448 8659 2 13 No NA NA
DTL Denticleless homolog (Drosophila) NM_016448 Hs.632496 51514 2 13 No NA NA
PALM2-AKAP2 PALM2-AKAP2 protein NM_053016 NM_001037293 NM_007203 NM_001004065 NM_147150 Hs.591908 445815 13 No NA NA
GNAZ Guanine nucleotide binding protein (G protein), alpha z polypeptide NM_002073 Hs.584760 2781 13 Yes -0,02 0,00
MS4A7 Membrane-spanning 4-domains, subfamily A, member 7 NM_021201 NM_206939 NM_206938 NM_206940 NM_032597 NM_139249 Hs.530735 58475 2 13 No NA NA
C20orf46 Chromosome 20 open reading frame 46 NM_018354 Hs.516834 55321 2 13 No NA NA
AP2B1 Adaptor-related protein complex 2, beta 1 subunit NM_001030006 NM_001282 Hs.514819 163 13 No NA NA
ORC6L Origin recognition complex, subunit 6 like (yeast) NM_014321 Hs.49760 23594 2 13 No NA NA
MTDH Metadherin NM_178812 Hs.377155 92140 13 No NA NA
CDC42BPA CDC42 binding protein kinase alpha (DMPK-like) NM_003607 NM_014826 Hs.35433 8476 2 13 No NA NA
GPR126 G protein-coupled receptor 126 NM_020455 NM_198569 NM_001032394 NM_001032395 Hs.318894 57211 2 13 No NA NA
OXCT1 3-oxoacid CoA transferase 1 NM_000436 Hs.278277 5019 2 13 Yes -0,00 0,01
QSCN6L1 Quiescin Q6-like 1 NM_181701 Hs.144073 169714 2 13 No NA NA
LOC286052 Hypothetical protein LOC286052 AK095104 Hs.100691 286052 2 13 No NA NA
C17orf27 Chromosome 17 open reading frame 27 NM_020914 Hs.195642 57674 2 13 No NA NA
SCD Stearoyl-CoA desaturase (delta-9-desaturase) NM_005063 Hs.558396 6319 2 13 Yes -0,01 0,04
TUBA3 Tubulin, alpha 3 NM_006082 NM_006009 Hs.524390 7846 2 13 No NA NA
LAPTM4B Lysosomal associated protein transmembrane 4 beta NM_018407 Hs.492314 55353 13 No NA NA
ARPC4 Actin related protein 2/3 complex, subunit 4, 20kDa NM_015644 NM_001025930 NM_005718 NM_001024959 NM_001024960 Hs.323342 10093 2 13 Yes -0,01 0,01
ERRFI1 ERBB receptor feedback inhibitor 1 NM_018948 Hs.11169 54206 2 13 No NA NA
SLC16A1 Solute carrier family 16 (monocarboxylic acid transporters), member 1 NM_003051 Hs.75231 6566 2 13 No NA NA
CDH3 Cadherin 3, type 1, P-cadherin (placental) NM_001793 Hs.554598 1001 2 13 Yes -0,01 0,07
RP13-297E16.1 DNA segment on chromosome X and Y (unique) 155 expressed sequence, isoform 1 NM_005088 NM_004043 Hs.522572 8227 13 Yes -0,00 0,00
TK1 Thymidine kinase 1, soluble NM_003258 Hs.515122 7083 2 13 No NA NA
EDN1 Endothelin 1 NM_001955 Hs.511899 1906 2 13 No NA NA
TBX19 T-box 19 NM_005149 NM_199344 Hs.507978 9095 2 13 No NA NA
YBX1 Y box binding protein 1 NM_004559 Hs.473583 4904 2 13 Yes -0,03 0,06
ABLIM1 Actin binding LIM protein 1 NM_001003408 NM_001003407 NM_002313 NM_006720 Hs.438236 3983 13 No NA NA
F2 Coagulation factor II (thrombin) NM_014741 NM_024741 NM_000506 Hs.410092 2147 2 13 Yes -0,00 -0,00
COL2A1 Collagen, type II, alpha 1 (primary osteoarthritis, spondyloepiphyseal dysplasia, congenital) NM_001844 NM_033150 Hs.408182 1280 2 13 No NA NA
MT1X Metallothionein 1X NM_005952 Hs.374950 4501 13 No NA NA
FGFR1 Fibroblast growth factor receptor 1 (fms-related tyrosine kinase 2, Pfeiffer syndrome) NM_023110 NM_015850 NM_023111 NM_023105 NM_023106 NM_023107 NM_023108 Hs.264887 2260 2 13 Yes 0,03 0,03
C14orf45 Chromosome 14 open NM_025057 Hs.260555 80127 reading frame 45 2 13 No NA NA
AP2A2 Adaptor-related protein complex 2, alpha 2 subunit NM_012305 Hs.19121 161 13 No NA NA
FUT3 Fucosyltransferase 3 (galactoside 3(4)-L-fucosyltransferase, Lewis blood group) NM_000149 Hs.169238 2525 2 13 No NA NA
CYP2J2 Cytochrome P450, family 2, subfamily J, polypeptide 2 NM_000775 Hs.152096 1573 2 13 Yes -0,03 0,00
CCL18 Chemokine (C-C motif) ligand 18 (pulmonary and activation-regulated) NM_002988 Hs.143961 6362 2 13 No NA NA
SHCBP1 SHC SH2-domain binding protein 1 NM_024745 Hs.123253 79801 2 13 No NA NA
TACC3 Transforming, acidic coiled-coil containing protein 3 NM_006342 Hs.104019 10460 2 13 No NA NA
COL1A2 Collagen, type I, alpha 2 NM_000089 Hs.489142 1278 4 12 No NA NA
GREM1 Gremlin 1, cysteine knot superfamily, homolog (Xenopus laevis) NM_013372 Hs.40098 26585 4 12 No NA NA
ENO1 Enolase 1, (alpha) NM_001428 Hs.517145 2023 3 12 No NA NA
SRM Spermidine synthase NM_003132 Hs.76244 6723 3 12 No NA NA
C14orf132 Chromosome 14 open reading frame 132 NM_020215 Hs.6434 56967 12 No NA NA
MT2A Metallothionein 2A NM_175617 NM_005953 Hs.534330 4502 3 12 No NA NA
TSPAN1 Tetraspanin 1 NM_005727 Hs.38972 10103 3 12 No NA NA
MSX2 Msh homeobox homolog 2 (Drosophila) NM_002449 Hs.89404 4488 12 Yes 0,00 -0,01
Figure imgf000043_0001
NM_001034957
NM_006434
CHI3L1 Chitinase 3-like 1 (cartilage glycoprotein-39) NM_001276 Hs.382202 1116 12 Yes -0,02 0,08
CAD Carbamoyl-phosphate synthetase 2, aspartate transcarbamylase, and dihydroorotase NM_004341 Hs.377010 790 12 Yes -0,03 0,03
ZBTB4 Zinc finger and BTB domain containing 4 NM_020899 Hs.35096 57659 2 12 No NA NA
FOXC1 Forkhead box C1 NM_001453 Hs.348883 2296 2 12 No NA NA
MMP7 Matrix metallopeptidase 7 (matrilysin, uterine) NM_002423 Hs.2256 4316 12 Yes -0,00 0,11
STMN1 Stathmin 1/oncoprotein 18 NM_203401 NM_005563 NM_203399 Hs.209983 3925 2 12 No NA NA
BLM Bloom syndrome NM_000057 Hs.169348 641 2 12 No NA NA
TGFB1 Transforming growth factor, beta 1 (Camurati-Engelmann disease) NM_000660 Hs.155218 7040 2 12 Yes 0,00 0,02
NFIB Nuclear factor I/B NM_005596 Hs.370359 4781 4 11 Yes -0,02 0,02
NAT1 N-acetyltransferase 1 (arylamine N-acetyltransferase) NM_000662 Hs.591847 9 4 11 Yes -0,05 -0,14
TRA@ T cell receptor alpha locus X73617 Hs.74647 6955 3 11 No NA NA
CELSR2 Cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila) NM_001408 Hs.57652 1952 11 Yes 0,08 -0,00
HMGA1 High mobility group AT-hook 1 NM_145904 NM_145899 NM_002131 NM_145901 NM_145905 NM_145902 NM_145903 Hs.518805 3159 3 11 No NA NA
FBP1 Fructose-1,6-bisphosphatase 1 NM_000507 Hs.494496 2203 3 11 No NA NA
WISP1 WNT1 inducible signaling pathway protein 1 NM_003882 NM_080838 Hs.492974 8840 2 11 No NA NA
PGAM1 Phosphoglycerate mutase 1 (brain) NM_002629 Hs.447492 5223 2 11 No NA NA
NOLA2 Nucleolar protein family A, member 2 (H/ACA small nucleolar RNPs) NM_017838 NM_001034833 Hs.27222 55651 11 Yes 0,001 0,02
HDGFRP3 Hepatoma-derived growth factor, related protein 3 NM_016073 Hs.513954 50810 2 11 No NA NA
BDH2 3-hydroxybutyrate dehydrogenase, type 2 NM_020139 Hs.124696 56898 2 11 No NA NA
HTATIP2 HIV-1 Tat interactive protein 2, 30kDa NM_006410 Hs.90753 10553 2 11 Yes -0,01 0,06
MMP1 Matrix metallopeptidase 1 (interstitial collagenase) NM_002421 Hs.83169 4312 2 11 Yes -0,03 0,21
CCT6A Chaperonin containing TCP1, subunit 6A (zeta 1) NM_001762 NM_001009186 Hs.82916 908 2 11 No NA NA
HSPB2 Heat shock 27kDa protein 2 NM_001541 Hs.78846 3316 2 11 No NA NA
CKAP4 Cytoskeleton-associated protein 4 NM_006825 Hs.74368 10970 2 11 No NA NA
SQLE Squalene epoxidase NM_003129 Hs.71465 6713 2 11 Yes -0,01 0,06
APP Amyloid beta (A4) precursor protein (peptidase nexin-II, Alzheimer disease) NM_000484 NM_201413 NM_201414 Hs.642685 351 2 11 Yes -0,00 -0,01
ADCY3 Adenylate cyclase 3 NM_024322 Hs.642633 109 2 11 No NA NA
HNRPAB Heterogeneous nuclear ribonucleoprotein A/B NM_031266 NM_004499 Hs.591731 3182 11 No NA NA
PSMD14 Proteasome (prosome, NM_005805 Hs.567410 10213 macropain) 26S subunit, non-ATPase, 14 2 11 Yes -0,02 0,04
RNASEH2A Ribonuclease H2, large subunit NM_006397 Hs.532851 10535 2 11 No NA NA
TNFAIP2 Tumor necrosis factor, alpha-induced protein 2 NM_006291 Hs.525607 7127 2 11 No NA NA
CDCA8 Cell division cycle associated 8 NM_018101 Hs.524571 55143 2 11 No NA NA
APOD Apolipoprotein D NM_001647 Hs.522555 347 2 11 Yes 0,00 -0,00
SATB1 Special AT-rich sequence binding protein 1 (binds to nuclear matrix/scaffold-associating DNA's) NM_002971 Hs.517717 6304 2 11 Yes 0,03 0,03
ACSS2 Acyl-CoA synthetase short-chain family member 2 NM_018677 NM_139274 Hs.517034 55902 11 No NA NA
APBA2BP Amyloid beta (A4) precursor protein-binding, family A, member 2 binding protein NM_005225 NM_031232 NM_031231 NM_007238 NM_183397 Hs.516986 63941 11 No NA NA
TAS2R5 Taste receptor, type 2, member 5 NM_018980 NM_016943 NM_016944 NM_003143 Hs.490394 54429 11 Yes -0,01 0,02
EGFR Epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) NM_005228 NM_201284 NM_201282 NM_201283 Hs.488293 1956 2 11 Yes -0,01 0,01
DEPDC1B DEP domain containing 1B NM_018369 Hs.482233 55789 2 11 No NA NA
BCL6 B-cell CLL/lymphoma 6 (zinc finger protein 51) NM_138931 NM_001706 Hs.478588 604 2 11 Yes 0,01 0,00
SEC14L1 SEC14-like 1 (S. cerevisiae) NM_001039573 NM_003003 Hs.464184 6397 2 11 No NA NA
DYNLT1 Dynein, light chain, Tctex-type 1 NM_006519 Hs.445999 6993 2 11 No NA NA
PAXIP1 PAX interacting (with transcription-activation domain) protein 1 NM_007349 Hs.443881 22976 2 11 Yes 0,02 0,00
TSPAN4 Tetraspanin 4 NM_020376 NM_173584 NM_004357 NM_139029 NM_139030 NM_001039490 NM_001025237 NM_001025236 NM_001025234 NM_001025235 NM_003271 NM_001025239 NM_001025238 NM_001004 Hs.437594 7106 2 11 Yes 0,01 -0,00
EEF1A2 Eukaryotic translation elongation factor 1 alpha 2 NM_001958 Hs.433839 1917 11 No NA NA
S100A8 S100 calcium binding protein A8 (calgranulin A) NM_002964 Hs.416073 6279 2 11 No NA NA
SFRP4 Secreted frizzled-related protein 4 NM_003014 Hs.416007 6424 2 11 Yes 0,05 -0,02
STRA13 Stimulated by retinoic acid 13 homolog (mouse) NM_144998 Hs.37616 201254 2 11 No NA NA
CDKN1A Cyclin-dependent kinase inhibitor 1A (p21, Cip1) NM_078467 NM_000389 Hs.370771 1026 2 11 No NA NA
HDAC2 Histone deacetylase 2 NM_001527 Hs.3352 3066 2 11 Yes -0,02 0,07
MRPS6 Mitochondrial ribosomal protein S6 NM_006933 NM_032476 Hs.302742 64968 2 11 No NA NA
SMS Spermine synthase NM_004595 Hs.288487 6611 2 11 No NA NA
ARF1 ADP-ribosylation factor 1 NM_001024227 NM_001024228 NM_001024226 NM_001658 Hs.286221 375 2 11 Yes -0,01 0,00
CES2 Carboxylesterase 2 (intestine, liver) NM_003869 NM_198061 Hs.282975 8824 2 11 Yes 0,01 -0,00
TUBG1 Tubulin, gamma 1 NM_016437 NM_001070 Hs.279669 7283 2 11 Yes 0,01 0,02
JUNB Jun B proto-oncogene NM_002229 Hs.25292 3726 2 11 Yes 0,06 0,00
PSMA7 Proteasome (prosome, macropain) subunit, alpha type, 7 NM_002792 Hs.233952 5688 11 No NA NA
PTGS2 Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) NM_000963 Hs.196384 5743 11 Yes -0,05 -0,02
TMEM45A Transmembrane protein 45A NM_018004 Hs.126598 55076 2 11 No NA NA
CTSF Cathepsin F NM_003793 Hs.11590 8722 2 11 No NA NA
SC4MOL Sterol-C4-methyl oxidase-like NM_006745 NM_001017369 Hs.105269 6307 11 Yes 0,05 0,10
References:
1. van't Veer, L.J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530-536 (2002).
2. Sorlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. U.S.A. 98, 10869-10874 (2001).
3. Wang, Y. et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671-679 (2005).
4. Paik, S. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 351, 2817-2826 (2004).
5. Sotiriou, C. et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J. Natl. Cancer Inst. 98, 262-272 (2006).
6. Shen, R., Ghosh, D. & Chinnaiyan, A.M. Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data. BMC Genomics 5, 94 (2004).
7. Abba, M.C. et al. Gene expression signature of estrogen receptor alpha status in breast cancer. BMC Genomics 6, 37 (2005).
8. Amatschek, S. et al. Tissue-wide expression profiling using cDNA subtraction and microarrays to identify tumor-specific genes. Cancer Res. 64, 844-856 (2004).
9. Beer, D.G. et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 8, 816-824 (2002).
10. Berchuck, A. et al. Patterns of gene expression that characterize long-term survival in advanced stage serous ovarian cancers. Clin. Cancer Res. 11, 3686-3696 (2005).
11. Bertucci, F. et al. Gene expression profiles of poor-prognosis primary breast cancer correlate with survival. Hum. Mol. Genet. 11, 863-872 (2002).
12. Bieche, I., Tozlu, S., Girault, I. & Lidereau, R. Identification of a three-gene expression signature of poor-prognosis breast carcinoma. Mol. Cancer 3, 37 (2004).
13. Chang, H.Y. et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc. Natl. Acad. Sci. U.S.A. 102, 3738-3743 (2005).
14. Glinsky, G.V., Berezovska, O. & Glinskii, A.B. Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. J. Clin. Invest. 115, 1503-1521 (2005).
15. Glinsky, G.V., Higashiyama, T. & Glinskii, A.B. Classification of human breast cancer using gene expression profiling as a component of the survival predictor algorithm. Clin. Cancer Res. 10, 2272-2283 (2004).
16. Glinsky, G.V., Glinskii, A.B., Stephenson, A.J., Hoffman, R.M. & Gerald, W.L. Gene expression profiling predicts clinical outcome of prostate cancer. J. Clin. Invest. 113, 913-923 (2004).
17. Huang, E. et al. Gene expression predictors of breast cancer outcomes. Lancet 361, 1590-1596 (2003).
18. Iwao, K. et al. Molecular classification of primary breast tumors possessing distinct prognostic properties. Hum. Mol. Genet. 11, 199-206 (2002).
19. Jacquemier, J. et al. Protein expression profiling identifies subclasses of breast cancer and predicts prognosis. Cancer Res. 65, 767-779 (2005).
20. Jones, C. et al. Expression profiling of purified normal human luminal and myoepithelial breast cells: identification of novel prognostic markers for breast cancer. Cancer Res. 64, 3037-3045 (2004).
21. Korkola, J.E. et al. Differentiation of lobular versus ductal breast carcinomas by expression microarray analysis. Cancer Res. 63, 7167-7175 (2003).
22. Ma, X.J. et al. Gene expression profiles of human breast cancer progression. Proc. Natl. Acad. Sci. U.S.A. 100, 5974-5979 (2003).
23. Makretsov, N.A. et al. Hierarchical clustering analysis of tissue microarray immunostaining data identifies prognostically significant groups of breast carcinoma. Clin. Cancer Res. 10, 6143-6151 (2004).
24. Nutt, C.L. et al. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res. 63, 1602-1607 (2003).
25. Onda, M. et al. Gene expression patterns as marker for 5-year postoperative prognosis of primary breast cancers. J. Cancer Res. Clin. Oncol. 130, 537-545 (2004).
26. Pomeroy, S.L. et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415, 436-442 (2002).
27. Pusztai, L. et al. Gene expression profiles obtained from fine-needle aspirations of breast cancer reliably identify routine prognostic markers and reveal large-scale molecular differences between estrogen-negative and estrogen-positive tumors. Clin. Cancer Res. 9, 2406-2415 (2003).
28. Ramaswamy, S., Ross, K.N., Lander, E.S. & Golub, T.R. A molecular signature of metastasis in primary solid tumors. Nat. Genet. 33, 49-54 (2003).
29. Rhodes, D.R. et al. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl. Acad. Sci. U.S.A. 101, 9309-9314 (2004).
30. Rosenwald, A. et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346, 1937-1947 (2002).
31. Sotiriou, C. et al. Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc. Natl. Acad. Sci. U.S.A. 100, 10393-10398 (2003).
32. West, M. et al. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. U.S.A. 98, 11462-11467 (2001).
33. Woelfle, U. et al. Molecular signature associated with bone marrow micrometastasis in human breast cancer. Cancer Res. 63, 5679-5684 (2003).
34. Yu, K. et al. A molecular signature of the Nottingham prognostic index in breast cancer. Cancer Res. 64, 2962-2968 (2004).
35. Zhu, G. et al. Combination of microdissection and microarray analysis to identify gene expression changes between differentially located tumour cells in breast cancer. Oncogene 22, 3742-3748 (2003).
36. Ahr, A. et al. Identification of high risk breast-cancer patients by gene expression profiling. Lancet 359, 131-132 (2002).
37. West, R.B. et al. Determination of stromal signatures in breast carcinoma. PLoS Biol. 3, e187 (2005).
38. Miller, D.V. et al. Utilizing Nottingham Prognostic Index in microarray gene expression profiling of breast carcinomas. Mod. Pathol. 17, 756-764 (2004).
39. Van Laere, S. et al. Distinct molecular signature of inflammatory breast cancer by cDNA microarray analysis. Breast Cancer Res. Treat. 93, 237-246 (2005).
40. Hu, Z. et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics 7, 96 (2006).
41. Dai, H. et al. A cell proliferation signature is a marker of extremely poor outcome in a subpopulation of breast cancer patients. Cancer Res. 65, 4059-4066 (2005).
42. Wang, W. et al. Single cell behavior in metastatic primary mammary tumors correlated with gene expression patterns revealed by molecular profiling. Cancer Res. 62, 6278-6288 (2002).

Claims

1. A set of moieties specific for at least 200 tumor markers selected from the tumor markers MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUB1, BIRC5, ESR1, CENPN, CCNB1, ERBB2, MLF1IP, NUDT1, PLK1, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, C16orf61, DLG7, H2AFZ, PFKP, KPNA2, GATA3, CENPF, KRT18, KRT5, CCNE2, MELK, CX3CR1, TRIP13, MCM6, CCND1, PDIA4, CENPA, UBE2S, NCF1, CDC25B, PGR, TGFB3, PSMD2, HMMR, XBP1, TROAP, KNTC2, PRAME, BTG2, KRT8, FOXM1, KYNU, NME1, MCM3, NUSAP1, PCTK1, IGFBP5, CDC2, ERBB3, CSE1L, PTTG1, PRC1, BRRN1, UBE2C, MUC1, KIF23, CDK2, PPP2R5C, RARRES3, PIR, CCT4, KIF14, SLPI, TOP2A, BBC3, RHOC, EZH2, HMGB3, GMPS, YIF1A, NP, DKFZp762E1312, MET, FABP5, DCK, CTSC, CCNB2, FLJ21062, VEGF, IL32, CDC20, TACC2, IGFBP2, IFI30, ID3, GPSM2, TIMP2, CCNE1, EIF4A1, RFC4, CST3, CCNA2, CENPE, SLC25A5, GSTM3, SLC7A5, LETMD1, RPS4X, TFF3, ATAD2, ACADSB, KRT17, YWHAZ, PSMB7, CNKSR1, EXT1, SMC4, MCM2, GATM, DDOST, PEX12, YY1, TFDP1, LMNB2, HPN, POLQ, PCNA, GTSE1, MAPRE1, PLAUR, PTDSS1, LRRC17, FEN1, NDP, ABCD3, SCUBE2, TP53, AURKB, KIFC1, COL3A1, NPY1R, PTPLB, SFRS10, SDC1, CDC6, CD24, TCEAL1, C1orf198, FAM64A, CDCA3, MSN, MYO10, KIF2C, ASPM, TUBA1, VIL2, CYBRD1, CTSL, SFRS7, SESN1, LRP8, CP, KIT, CNAP1, TFRC, PLOD2, CKS1B, DUSP4, NDRG1, SLC35A1, C1S, CCT5, IFITM1, ITPR3, SAT, FABP7, OMD, ADAMTS1, PPP1R12A, PRLR, FKBP1A, SNRPA1, CCNC, SCAP1, SPRR2C, FADS2, CTSL2, TLE3, PDAP1, IER2, ESPL1, CDH1, UBR2, RAB6A, CD44, FBXO5, F3, PTPRT, RACGAP1, CCT7, SLC25A1, C4orf18, TXNRD1, SLC3A2, C16orf35, INSR, SOD2, GABBR1, SNRPB, EIF2C2, IDI1, CEP55, RLN1, PTMA, KIF11, SHMT2, FAM89B, TPX2, CFB, EXO1, EIF4EBP1, DHFR, HIPK2, SYNCRIP, BRCA1, ZNF43, LMNB1, PBXIP1, F10, FCGRT, FUT8, RAD21, FRY, LDHA, VASH1, GRB7, ZMYM4, ACTB, CCL18, MTDH, MS4A7, C17orf27, LOC286052, TACC3, MT1X, TK1, CDH3, CDC42BPA, FUT3, GNAZ, YBX1, GPR126, ARPC4, AP2B1, COL6A1, CXCL9, C14orf45, DIAPH3, DNAJC12, LAPTM4B, TUBA3, DTL, ALDH4A1, ORC6L, ABLIM1, SHCBP1, FGFR1, ERRFI1, CIRBP, C20orf46, SLC16A1, SPARC, CYP2J2, AP2A2, SLC39A6, F2, SCD, ECT2, QSCN6L1, H3F3B, COL2A1, TBX19, EDN1, OXCT1, RP13-297E16.1, PALM2-AKAP2, HRB, TUBB, CTPS, CAD, CHI3L1, GREM1, ENO1, PLOD1, SORBS1, TSPAN1, STMN1, HIF1A, MMP7, STK3, GOLPH2, MT2A, FOXC1, SRM, COL1A2, GEMIN4, MAPRE2, PGK1, TIMP1, ZBTB4, CRABP1, MAP3K8, TGFB1, C10orf116, C14orf132, TP53INP1, BLM, CDC25A, MSX2, MMP23B, ADM, CTSF, TRA@, SFRP4, HMGA1, MRPS6, APBA2BP, STRA13, CDCA8, SQLE, ACSS2, FBP1, PSMA7, HTATIP2, PSMD14, HSPB2, APP, TAS2R5, NFIB, TNFAIP2, NAT1, SC4MOL, HNRPAB, TUBG1, PAXIP1, SEC14L1, SATB1, CELSR2, RNASEH2A, TMEM45A, CDKN1A, PTGS2, ARF1, HDAC2, BCL6, CKAP4, JUNB, NOLA2, APOD, MMP1, EGFR, CCT6A, HDGFRP3, CES2, SMS, DEPDC1B, TSPAN4, BDH2, EEF1A2, S100A8, WISP1, PGAM1, DYNLT1 and/or ADCY3.
2. A set of moieties comprising moieties specific for at least 50 tumor markers selected from the tumor markers MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUB1, BIRC5, ESR1, CENPN, CCNB1, ERBB2, MLF1IP, NUDT1, PLK1, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, C16orf61, DLG7, H2AFZ, PFKP, KPNA2, GATA3, CENPF, KRT18, KRT5, MELK, CCNE2, CX3CR1, CCND1, MCM6, TRIP13, PDIA4, CENPA, UBE2S, HMMR, NCF1, PSMD2, PGR, CDC25B, TGFB3, TROAP, KNTC2, XBP1, PRAME, KRT8, BTG2, FOXM1, KYNU, NME1, MCM3, CDC2, NUSAP1, ERBB3, PCTK1 and/or IGFBP5, preferably in a set of moieties specific for at least 60, preferably at least 80, at least 100, at least 120, at least 140, at least 160, at least 180 or at least 200, of the tumor markers as defined in claim 1.
3. A set of moieties comprising moieties specific for at least 19 tumor markers selected from the tumor markers MYBL2, MKI67, MAD2L1, AURKA, BCL2, BUB1, BIRC5, ESR1, CENPN, CCNB1, ERBB2, MLF1IP, NUDT1, PLK1, RNASE4, GGH, RRM2, CKS2, MCM4, CDKN3, C16orf61, DLG7, H2AFZ, PFKP and/or KPNA2, preferably in a set of moieties specific for at least 20, preferably at least 40, at least 80, at least 100, at least 140, at least 160, at least 180 or at least 200, of the tumor markers as defined in claim 1.
4. A set of moieties according to any one of claims 1 to 3, comprising moieties specific for at least 210, preferably at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370 or 374 tumor markers selected from the tumor markers as defined in claim 1.
5. A set of moieties according to any one of claims 1 to 4, comprising moieties specific for at least 60, preferably at least 70, at least 80 or at least 90, tumor markers selected from the tumor markers MYBL2, RNASE4, GATA3, BIRC5, ESR1, BCL2, PRAME, ERBB2, AURKA, PSMD2, MAD2L1, KRT18, MKI67, BTG2, GTSE1, NUDT1, BUB1, TGFB3, CCNB1, FOXM1, KPNA2, CX3CR1, PLK1, NME1, RRM2, CENPN, CKS2, CDKN3, SLPI, BRRN1, C16orf61, HMMR, PFKP, CTSC, CDC20, UBE2C, KIF23, DNAJC12, ASPM, MCM2, MCM6, H2AFZ, MLF1IP, CCND1, MCM4, GGH, DLG7, PDIA4, CCNE2, CENPF, UBE2S, KRT5, VIL2, CP, SFRS10, TCEAL1, TFDP1, SLC25A5, PSMB7, SLC7A5, EIF4A1, FEN1, DDOST, HMGA1, TRA@, IGFBP5, MUC1, COL1A2, RFC4, CENPA, VEGF, MELK, PTTG1, RARRES3, HRB, CENPE, TFRC, C14orf45, TRIP13, ERBB3, KRT8, TROAP, KNTC2, CSE1L, PIR, MCM3, NUSAP1, KYNU, PGR, PPP2R5C, PRC1, IL32, NCF1, CDC25B and/or CDK2.
6. A set of moieties according to any one of claims 1 to 4, comprising moieties specific for at least 10, preferably at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, or 188, tumor markers selected from the tumor markers PTDSS1, MYBL2, PAXIP1, KIT, FABP7, TIMP1, AURKB, VEGF, TFRC, MMP7, ARPC4, PPP1R12A, FUT8, GSTM3, SAT, BUB1, NDRG1, NPY1R, TXNRD1, GNAZ, DCK, VASH1, RAB6A, RAD21, PFKP, HIF1A, TUBA1, PTGS2, TFDP1, CDC6, LDHA, NP, CDKN3, KRT8, CTSL, PCNA, BRCA1, IGFBP2, NAT1, NFIB, SC4MOL, SDC1, TIMP2, PCTK1, ITPR3, CCT7, CDH1, CDC2, IER2, TFF3, PLOD2, FOXM1, SLPI, CHI3L1, KPNA2, YY1, TOP2A, MAPRE2, ABCD3, MET, SCD, SFRP4, JUNB, CSE1L, SATB1, MKI67, MAP3K8, GEMIN4, YBX1, IDI1, XBP1, TGFB1, NDP, PLAUR, KRT17, ACADSB, FRY, GMPS, CCNA2, CDH3, YWHAZ, AURKA, DUSP4, C16orf35, MSN, CNKSR1, PTPRT, FKBP1A, EIF4A1, LRP8, ARF1, PIR, LRRC17, SLC3A2, SPARC, EGFR, TSPAN4, PTMA, PLOD1, CTSC, SHMT2, CFB, PSMD14, APOD, RNASE4, CKS2, HMMR, TP53, C10orf116, MMP1, PDIA4, GGH, KRT5, UBE2S, CENPF, ZMYM4, NUDT1, CRABP1, F2, GABBR1, CCNC, CYP2J2, PSMD2, CES2, ERBB2, CAD, FCGRT, MAD2L1, BIRC5, GATA3, EXT1, PRAME, BCL2, GRB7, CX3CR1, MSX2, UBR2, UBE2C, NME1, PEX12, MCM6, CCT4, BTG2, APP, TGFB3, OMD, SNRPB, TRIP13, PRLR, HTATIP2, TROAP, SFRS7, FGFR1, SQLE, BCL6, NOLA2, CENPE, HRB, IL32, IFI30, PPP2R5C, COL6A1, CDC25B, SCAP1, MCM3, FABP5, HDAC2, INSR, KYNU, RP13-297E16.1, NCF1, TAS2R5, DDOST, STK3, GATM, FEN1, SLC25A5, PSMB7, CDC25A, SLC7A5, CELSR2, SLC39A6, OXCT1, CCNE1, SLC25A1, CCNB1, TCEAL1 and/or TUBG1.
7. A set of moieties according to any one of claims 1 to 7, comprising at least 10, preferably at least 20, at least 30, at least 40, or at least 50, moieties specific for the tumor markers BCL2, RNASE4, GATA3, CX3CR1, PDIA4, TGFB3, XBP1, BTG2, GSTM3, IGFBP2, ACADSB, GATM, CNKSR1, KRT17, TFF3, ABCD3, LRRC17, PEX12, NDP, TCEAL1, NPY1R, DUSP4, KIT, PTPRT, UBR2, INSR, PRLR, SCAP1, PPP1R12A, IER2, OMD, VASH1, ZMYM4, FCGRT, GABBR1, CFB, FUT8, FRY, FGFR1, COL6A1, SLC39A6, SPARC, TIMP1, MSX2, C10orf116, NAT1, SATB1, TSPAN4, CES2, SFRP4, BCL6, PAXIP1, CELSR2, APOD and/or APP.
8. A set of moieties according to any one of claims 1 to 7, comprising at least 10, preferably at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, or at least 130, moieties specific for the tumor markers MYBL2, MKI67, MAD2L1, AURKA, BUB1, BIRC5, ERBB2, CCNB1, NUDT1, GGH, CKS2, CDKN3, KPNA2, PFKP, CENPF, KRT5, MCM6, TRIP13, UBE2S, CDC25B, NCF1, HMMR, PSMD2, TROAP, PRAME, FOXM1, KRT8, MCM3, NME1, KYNU, PCTK1, CDC2, CSE1L, UBE2C, CCT4, PPP2R5C, SLPI, TOP2A, PIR, NP, VEGF, IL32, CTSC, DCK, FABP5, GMPS, MET, CCNE1, IFI30, SLC25A5, CENPE, SLC7A5, EIF4A1, CCNA2, TIMP2, YWHAZ, PSMB7, EXT1, YY1, PCNA, FEN1, AURKB, TFDP1, DDOST, PTDSS1, PLAUR, TP53, SFRS7, CDC6, CTSL, MSN, TUBA1, SDC1, TFRC, PLOD2, ITPR3, LRP8, SAT, FABP7, NDRG1, RAB6A, SLC3A2, CCT7, C16orf35, TXNRD1, SLC25A1, CDH1, CCNC, FKBP1A, IDI1, GRB7, RAD21, LDHA, SNRPB, PTMA, BRCA1, SHMT2, OXCT1, CDH3, ARPC4, RP13-297E16.1, F2, SCD, CYP2J2, YBX1, GNAZ, CDC25A, TGFB1, GEMIN4, CHI3L1, MMP7, STK3, HRB, PLOD1, CAD, MAPRE2, CRABP1, MAP3K8, HIF1A, EGFR, ARF1, PTGS2, NFIB, TAS2R5, NOLA2, SQLE, TUBG1, MMP1, PSMD14, SC4MOL, HTATIP2 and/or HDAC2.
9. A set of moieties according to any one of claims 1 to 9, comprising a moiety specific for the tumor marker MYBL2.
10. A set of moieties specific for tumor markers selected from MYBL2, MKI67, MAD2L1, AURKA and BCL2.
11. A set according to any one of claims 1 to 10, characterized in that the moieties are nucleic acids, preferably primers, specific for tumor marker nucleic acids.
12. A set according to any one of claims 1 to 11, characterized in that the moieties are antibodies or antibody fragments, preferably selected from Fab, Fab', Fab2, F(ab')2 or scFv (single-chain variable fragments), specific for tumor marker proteins.
13. A set according to any one of claims 1 to 12, characterized in that the moieties are immobilized on a solid support, preferably in the form of a microarray.
14. Use of the tumor markers as selected in any one of claims 1 to 12 for the creation of a set for detecting the tumor markers, wherein the set has at least 200, preferably at least 210, at least 220, at least 230, at least 240, at least 250, at least 260, at least 270, at least 280, at least 290, at least 300, at least 310, at least 320, at least 330, at least 340, at least 350, at least 360, at least 370 or 374, members.
15. A method for the detection of breast cancer, using the set of any one of claims 1 to 13 and detecting or measuring for the occurrence of tumor markers in one or more sample(s) obtained from a patient.
16. The method according to claim 15, characterized in that the patient is a human being.
17. The method according to any one of claims 14 or 15, characterized in that the detection or measurement is done by RNA-expression analysis, preferably by microarray or quantitative PCR, or protein analysis, preferably by tissue microarrays, protein-microarrays, ELISA, multiplex assays, immunohistochemistry, or DNA analysis, preferably CpG island methylation analysis, comparative genomic hybridization (CGH)-arrays or single nucleotide polymorphism (SNP)-analysis.
18. Method according to any one of claims 15 to 17, characterized in that the method comprises providing a pool of good prognosis markers from the tumor markers and generating a pool of poor prognosis markers and determining the specificity for the tumor markers of the pools, and classifying the sample according to the greater specificity to one pool.
19. The method according to claim 18, characterized in that the pool of good prognosis markers is selected from the tumor markers as defined in claim 7.
20. The method according to claim 18 or 19, characterized in that the pool of poor prognosis markers is selected from the tumor markers as defined in claim 8.
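
The pool-based classification of claims 18 to 20 can be illustrated with a minimal sketch. This is not the applicants' implementation. The claims do not state how the "specificity" of a sample for a pool is computed, so the sketch assumes it is the mean normalized expression of the pool's markers. The pools below are small illustrative subsets of the marker lists in claims 7 and 8, and the function names and sample values are hypothetical.

```python
from statistics import mean

# Illustrative subsets only; the full pools are the marker lists of claims 7 and 8.
GOOD_PROGNOSIS_POOL = ["BCL2", "GATA3", "XBP1"]               # subset of claim 7
POOR_PROGNOSIS_POOL = ["MYBL2", "MKI67", "MAD2L1", "AURKA"]   # subset of claim 8


def pool_score(expression, pool):
    """Assumed scoring rule: mean expression over the pool markers that were measured."""
    values = [expression[marker] for marker in pool if marker in expression]
    if not values:
        raise ValueError("no marker of this pool was measured")
    return mean(values)


def classify(expression):
    """Assign the sample to the pool with the higher score; report ties separately."""
    good = pool_score(expression, GOOD_PROGNOSIS_POOL)
    poor = pool_score(expression, POOR_PROGNOSIS_POOL)
    if good == poor:
        return "undetermined"
    return "good prognosis" if good > poor else "poor prognosis"


# Hypothetical normalized expression values (made-up numbers, not patent data).
sample = {
    "BCL2": 1.8, "GATA3": 2.1, "XBP1": 1.5,
    "MYBL2": -0.7, "MKI67": -0.4, "MAD2L1": -0.9, "AURKA": -0.2,
}
print(classify(sample))  # prints "good prognosis"
```

Any other measure of agreement between a sample and a pool, for example a correlation with a pool centroid, could replace `pool_score` without changing the comparison step.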
PCT/AT2007/000566 2006-12-22 2007-12-12 Set of tumor markers WO2008077165A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AT0213406A AT504702A1 (en) 2006-12-22 2006-12-22 SET OF TUMOR MARKERS
ATA2134/2006 2006-12-22

Publications (2)

Publication Number Publication Date
WO2008077165A1 (en) 2008-07-03
WO2008077165A9 (en) 2008-09-04

Family

ID=39263053

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AT2007/000566 WO2008077165A1 (en) 2006-12-22 2007-12-12 Set of tumor markers

Country Status (2)

Country Link
AT (1) AT504702A1 (en)
WO (1) WO2008077165A1 (en)

Families Citing this family (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8697384B2 (en) 2008-01-23 2014-04-15 Herlev Hospital YKL-40 as a general marker for non-specific disease
WO2010023854A1 (en) * 2008-08-27 2010-03-04 Oncotherapy Science, Inc. Cancer related gene, lgn/gpsm2
US8367415B2 (en) * 2008-09-05 2013-02-05 University Of South Carolina Specific gene polymorphisms in breast cancer diagnosis, prevention and treatment
AU2009291312A1 (en) 2008-09-15 2010-03-18 Herlev Hospital YKL-40 as a marker for gastrointestinal cancers
KR101413480B1 (en) * 2008-12-05 2014-07-10 아브락시스 바이오사이언스, 엘엘씨 Sparc binding peptides and uses thereof
US20120041274A1 (en) 2010-01-07 2012-02-16 Myriad Genetics, Incorporated Cancer biomarkers
CA2753971C (en) * 2009-01-28 2018-10-02 Steven Buechler Accelerated progression relapse test
WO2010093872A2 (en) 2009-02-13 2010-08-19 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Molecular-based method of cancer diagnosis and prognosis
JP5681183B2 (en) 2009-07-16 2015-03-04 エフ.ホフマン−ラ ロシュ アーゲーF. Hoffmann−La Roche Aktiengesellschaft Flap endonuclease-1 as a cancer marker
US20120309640A1 (en) * 2009-10-08 2012-12-06 Torti Frank M Diagnostic and Prognostic Markers for Cancer
US20130012409A1 (en) 2009-10-08 2013-01-10 M Frank Diagnostic and Prognostic Markers for Cancer
EP2322658A1 (en) 2009-11-13 2011-05-18 Centre National de la Recherche Scientifique (CNRS) Signature for the diagnosis of breast cancer aggressiveness and genetic instability
GB0922085D0 (en) 2009-12-17 2010-02-03 Cambridge Entpr Ltd Cancer diagnosis and treatment
EP3812469A1 (en) 2010-07-07 2021-04-28 Myriad Genetics, Inc. Gene signatures for cancer prognosis
EP3147373B1 (en) 2010-07-27 2019-05-15 Genomic Health, Inc. Method for using gene expression to determine prognosis of prostate cancer
US9605319B2 (en) 2010-08-30 2017-03-28 Myriad Genetics, Inc. Gene signatures for cancer diagnosis and prognosis
WO2012078648A2 (en) * 2010-12-06 2012-06-14 University Of Medicine And Dentistry Of New Jersey Novel method of cancer diagnosis and prognosis and prediction of response to therapy
EP2665835B1 (en) * 2011-01-18 2018-03-14 Everist Genomics, Inc. Prognostic signature for colorectal cancer recurrence
AU2012272662A1 (en) * 2011-06-22 2014-01-16 Oncocyte Corporation Methods and compositions for the treatment and diagnosis of cancer
JP2014525584A (en) * 2011-08-31 2014-09-29 オンコサイト コーポレーション Methods and compositions for the treatment and diagnosis of cancer
WO2013109690A1 (en) * 2012-01-17 2013-07-25 Myriad Genetics, Inc. Breast cancer prognosis signatures
CN102585004B (en) * 2012-01-19 2013-07-24 中国人民解放军第四军医大学 AEG-1 (Astrocyte Elevated Gene-1)/1E3 monoclonal antibody with high affinity
WO2014009055A1 (en) * 2012-07-12 2014-01-16 Universite De Namur Method and kit for predicting or monitoring the response of a cancer patient to chemotherapy, based on measuring the expression level of tmem45a gene.
US20150203589A1 (en) 2012-07-24 2015-07-23 The Trustees Of Columbia University In The City Of New York Fusion proteins and methods thereof
EP4190918A1 (en) 2012-11-16 2023-06-07 Myriad Genetics, Inc. Gene signatures for cancer prognosis
WO2014115889A1 (en) * 2013-01-28 2014-07-31 国立大学法人東京大学 Therapeutic or prophylactic agent for disease caused by activation of vascular endothelial cells
GB201317609D0 (en) 2013-10-04 2013-11-20 Cancer Rec Tech Ltd Inhibitor compounds
WO2015065097A1 (en) * 2013-10-31 2015-05-07 에스케이텔레콤 주식회사 Composition for diagnosing pancreatic cancer and method for diagnosing pancreatic cancer using same
KR101594981B1 (en) * 2013-10-31 2016-02-17 에스케이텔레콤 주식회사 Composition for diagnosing pancreatic cancer and method for diagnosing pancreatic cancer using the same
WO2015138834A1 (en) * 2014-03-13 2015-09-17 H. Lee Moffitt Cancer Center And Research Institute, Inc. Paxip1 as a biomarker for wee1 inhibitor therapy
WO2015175692A1 (en) 2014-05-13 2015-11-19 Myriad Genetics, Inc. Gene signatures for cancer prognosis
WO2015188273A1 (en) * 2014-06-09 2015-12-17 Biomark Technologies Inc. Method of detecting cancer based on spermine/spermidine n'-acetyltransferase gene expression
WO2016105517A1 (en) * 2014-12-23 2016-06-30 The Trustees Of Columbia University In The City Of New York Fusion proteins and methods thereof
US10961589B2 (en) 2015-03-05 2021-03-30 Case Western Reserve University HER2-regulated RNA as a diagnostic and therapeutic targets in HER2+ breast cancer
CN104678110B (en) * 2015-03-17 2020-04-07 北京博清科创生物技术有限公司 Serum CENPF antibody quantitative determination kit
GB201505658D0 (en) 2015-04-01 2015-05-13 Cancer Rec Tech Ltd Inhibitor compounds
US10443103B2 (en) * 2015-09-16 2019-10-15 Innomedicine, LLC Chemotherapy regimen selection
EP3202913B1 (en) * 2016-02-08 2019-01-30 King Faisal Specialist Hospital And Research Centre A set of genes for use in a method of predicting the likelihood of a breast cancer patient's survival
GB201617103D0 (en) 2016-10-07 2016-11-23 Cancer Research Technology Limited Compound
WO2018095933A1 (en) * 2016-11-22 2018-05-31 Université D'aix-Marseille (Amu) Method of prognosticating, or for determining the efficiency of a compound for treating cancer
GB201704536D0 (en) * 2017-03-22 2017-05-03 Univ Malta Method
CN107574249B (en) * 2017-10-23 2020-12-25 江苏省肿瘤防治研究所(江苏省肿瘤医院) Gene expression profile detection kit for predicting breast cancer recurrence of Chinese population
CN109142729B (en) * 2018-06-14 2021-04-23 郑州大学第一附属医院 Lung cancer marker anti-HMGB 3 autoantibody and application thereof
CN111057689A (en) * 2018-10-17 2020-04-24 复旦大学 Marker for grading and prognosis of malignant tumor
CN111679076A (en) * 2020-06-15 2020-09-18 吉林医药学院 Detection kit for detecting cyclinD1 and BCL-2 antibodies
GB2613386A (en) * 2021-12-02 2023-06-07 Apis Assay Tech Limited Diagnostic test

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008039071A2 (en) * 2006-09-29 2008-04-03 Agendia B.V. High-throughput diagnostic testing using arrays

Also Published As

Publication number Publication date
AT504702A1 (en) 2008-07-15
WO2008077165A1 (en) 2008-07-03

Similar Documents

Publication Publication Date Title
WO2008077165A9 (en) Set of tumor markers
US10378066B2 (en) Molecular diagnostic test for cancer
EP2553118B1 (en) Method for breast cancer recurrence prediction under endocrine treatment
US8065093B2 (en) Methods, systems, and compositions for classification, prognosis, and diagnosis of cancers
US7943306B2 (en) Gene expression signature for prediction of human cancer progression
US20120071346A1 (en) Gene-based algorithmic cancer prognosis
US20160222459A1 (en) Molecular diagnostic test for lung cancer
EP2692871A1 (en) Classification of cancers
EP3739060A1 (en) Methods to predict clinical outcome of cancer
US20070134688A1 (en) Calculated index of genomic expression of estrogen receptor (er) and er-related genes
US20170322217A1 (en) A method for prognosis of ovarian cancer, patient's stratification
WO2008030845A2 (en) Methods of predicting distant metastasis of lymph node-negative primary breast cancer using biological pathway gene expression analysis
Liu et al. Discovery of microarray-identified genes associated with ovarian cancer progression
CN109072481B (en) Genetic characterization of residual risk after endocrine treatment of early breast cancer
WO2014075067A1 (en) Methods to predict breast cancer outcome
Reinholz et al. Expression profiling of formalin-fixed paraffin-embedded primary breast tumors using cancer-specific and whole genome gene panels on the DASL® platform
US9195796B2 (en) Malignancy-risk signature from histologically normal breast tissue
WO2014066796A2 (en) Breast cancer prognosis signatures
US20210079479A1 (en) Compostions and methods for diagnosing lung cancers using gene expression profiles
EP3134548A2 (en) Cancer prognosis signatures
WO2012045019A2 (en) Brca deficiency and methods of use
WO2012170710A1 (en) Disease classification modules
US20140024028A1 (en) Brca deficiency and methods of use
US20110059074A1 (en) Knowledge-Based Proliferation Signatures and Methods of Use

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07845295

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct app. not ent. europ. phase

Ref document number: 07845295

Country of ref document: EP

Kind code of ref document: A1