WO2005076005A2 - A method for classifying a tumor cell sample based upon differential expression of at least two genes - Google Patents

A method for classifying a tumor cell sample based upon differential expression of at least two genes

Info

Publication number
WO2005076005A2
Authority
WO
WIPO (PCT)
Prior art keywords
genes
expression
tumor
cell
markers
Prior art date
Application number
PCT/EP2005/000858
Other languages
French (fr)
Other versions
WO2005076005A3 (en)
Inventor
Christian Stratowa
Ulrich KÖNIG
Peter Steinlein
Stefan Amatschek
Herbert Auer
Wolfgang Sommergruber
Martin Schreiber
Agnes GRÜNFELDER
Margit Pacher-Zavisin
Original Assignee
Medizinische Universität Wien
Priority date
Filing date
Publication date
Application filed by Medizinische Universität Wien
Publication of WO2005076005A2
Publication of WO2005076005A3

Links

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57484 Immunoassay; Biospecific binding assay; Materials therefor for cancer involving compounds serving as markers for tumor, cancer, neoplasia, e.g. cellular determinants, receptors, heat shock/stress proteins, A-protein, oligosaccharides, metabolites
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • the present invention relates to the field of tumor diagnosis and prognosis.
  • Lung, breast, and colorectal cancer are the most common cancers in the industrial world. In 2000, more than 3 million people were diagnosed with one of these tumors. Although improvements in diagnostics and therapy have reduced cancer mortality, 65% of all cancer patients still die. Prognosis for people suffering from lung cancer is particularly poor, with a mortality rate close to 90%.
  • Two main reasons are responsible for the fact that cancer is expected to become the leading cause of death within a few years: first, cancer is a disease of multiple accumulating mutations that are becoming manifest in human populations with an increasingly prolonged life span, and second, neoplastic diseases still have many unmet needs, including lack of understanding of the mechanisms underlying cancer-related deaths and the difficulty in identification of the corresponding risk factors and development of specific targeted molecular therapies. The reason thus lies within the enormous complexity of tumor formation and tumor progression on the molecular level. Many efforts to detect differential gene expression between tumor and normal tissues have discovered numerous differentially expressed genes. However, the mechanistic contribution to tumorigenesis of many of these genes is still unknown.
  • cDNA microarrays have been shown to be a powerful tool to detect gene expression differences in cancers (e.g. Bangur et al., Oncogene 21, 3814-3825 (2002)).
  • the cDNA array technique has also been successfully combined with subtractive hybridisation to detect tumor-specific transcripts in, for example, lung squamous cell cancer (LSCC; Diatchenko et al., PNAS 93, 6025-6030 (1996)).
  • Germ-line mutations within these two loci are associated with a 50 to 85% lifetime risk of breast and/or ovarian cancer. Only about 5% to 10% of breast cancers are associated with breast cancer susceptibility genes, BRCA1 and BRCA2. The cumulative lifetime risk of breast cancer for women who carry the mutant BRCA1 is predicted to be approximately 92%, while the cumulative lifetime risk for the non-carrier majority is estimated to be approximately 10%.
  • BRCA1 is a tumor suppressor gene that is involved in DNA repair and cell cycle control, which are both important for the maintenance of genomic stability. More than 90% of all mutations reported so far result in a premature truncation of the protein product with abnormal or abolished function.
  • BRCA1 mutation carriers differs from that in sporadic cases, but mutation analysis is the only way to find the carrier.
  • BRCA2 is involved in the development of breast cancer, and like BRCA1 plays a role in DNA repair. However, unlike BRCA1, it is not involved in ovarian cancer.
  • Overexpression of c-erb-2 (HER2) and p53 has been correlated with poor prognosis, as have aberrant expression products of mdm2, cyclin 1 and p27 (WO 98/33450 A).
  • a marker-based approach to tumor identification and characterisation promises improved diagnostic and prognostic reliability.
  • diagnosis of breast cancer requires histopathological proof of the presence of the tumor.
  • histopathological examinations also provide information about prognosis and selection of treatment regimens.
  • Prognosis may also be established based upon clinical parameters such as tumor size, tumor grade, the age of the patient, and lymph node metastasis.
  • Diagnosis and/or prognosis may be determined to varying degrees of effectiveness by direct examination of the outside of the breast, or through mammography or other X-ray imaging methods. The latter approach is not without considerable cost, however.
  • the invention provides a method for classifying a cell sample as being a tumor cell comprising detecting a difference in the expression by said cell sample of at least one gene, preferably at least two genes, of Table 4 relative to at least one control cell and classifying the cell sample as a tumor cell, if the at least one gene, preferably the at least two genes, of Table 4 show at least 1.5-fold higher expression than the control cell. Selecting at least one, preferably at least two, of the genes of Table 4 (for a given tumor type) allows a fast and reliable classification of a given cell sample.
  • marker genes according to Table 4 which show, in Table 3, a "k/n" value of 10/20 or more for BC, of 3/11 or more for LAC, of 6/11 or more for LSCC and of 4/8 or more for RCC.
  • a difference in the expression of at least 3, preferably at least 5, especially at least 10 genes of Table 4 is detected.
  • the cell sample is classified as a tumor cell, if the at least 2 genes of Table 4 show at least 2-fold, preferably 3-fold, especially 5-fold higher expression than the control cell.
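The fold-change rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_sample`, the dictionary layout and the gene identifiers are hypothetical, and expression values are assumed to be positive, already-normalised intensities. The defaults mirror the thresholds quoted in the text (at least 1.5-fold higher expression in at least two marker genes).

```python
from typing import Mapping, Sequence


def classify_sample(sample: Mapping[str, float],
                    control: Mapping[str, float],
                    marker_genes: Sequence[str],
                    min_fold: float = 1.5,
                    min_genes: int = 2) -> bool:
    """Return True if at least `min_genes` markers show at least
    `min_fold` higher expression in the sample than in the control."""
    hits = 0
    for gene in marker_genes:
        expr = sample.get(gene)
        ctrl = control.get(gene)
        # Skip markers that were not measured or have no usable control value.
        if expr is None or ctrl is None or ctrl <= 0:
            continue
        if expr / ctrl >= min_fold:
            hits += 1
    return hits >= min_genes


# Hypothetical gene names and values, for illustration only.
markers = ["GENE_A", "GENE_B", "GENE_C"]
sample = {"GENE_A": 3.0, "GENE_B": 4.5, "GENE_C": 1.0}
control = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 1.0}
is_tumor = classify_sample(sample, control, markers)
```

Raising `min_fold` to 2, 3 or 5 and `min_genes` to 3, 5 or 10 corresponds to the stricter preferred variants listed above.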
  • the tumors to be classified according to the present invention are preferably selected from the group consisting of breast cancer (BC), lung squamous cell cancer (LSCC), lung adenocarcinoma (LAC) and renal cell cancer (RCC).
  • tumor expression markers may be additionally applied and tested in combination with the markers according to the present invention, e.g. those described in WO 02/103320 A2.
  • At least 10, 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80 or 90 marker genes of Table 4 or from other sources, such as from Table 3 or WO 02/103320 A2, may be tested in the present method.
  • the present invention provides novel putative intervention sites for anticancer therapy using a combination of subtractive hybridisation and cDNA microarray technology.
  • a set of 22 samples from various normal tissues were included to allow discrimination between tumor-specific genes and those that are also expressed in vital normal tissues.
  • This approach allowed the identification of genes expressed exclusively in tumors but not in a comprehensive panel of vital normal tissues, which is a prerequisite in the design of a more specific anticancer immuno- or chemotherapy with fewer side effects.
  • a focus was laid on the transcriptional profiling of lung (squamous and adeno), breast, and renal cell cancer (RCC).
  • cDNA libraries were generated by subtracting cDNA fragments derived from normal tissues or primary cell lines from corresponding tumor tissues or tumor cell lines. Subsequently, the derived tumor-enriched clone collection (about 9250 clones in total), together with about 1750 additional tumor-relevant genes, was used for the production of cDNA arrays.
  • 50 different tumor samples were analysed and compared with the expression profile of 22 different normal tissues. This approach allowed the selection of genes that show a significantly higher expression level in tumor tissues than in any of the analysed normal tissues.
  • the present invention provides for the first time a tissue-wide expression profile used to increase the significance of expression data of tumor samples. Furthermore, a subset of differentially regulated genes was identified that can be correlated with poor prognosis in breast cancer patients.
  • the present invention provides a method for classifying an individual as having a good prognosis (survival more than 9 years after initial diagnosis) or a poor prognosis (survival less than 3 years after initial diagnosis), comprising detecting a difference in the expression of at least one gene of Table 4 in a cell sample taken from the individual relative to a control.
  • Table 4 provides clear markers for short or long survival expectation which are either up- or down-regulated in the two different survival groups.
  • the genes with Acc. No. X77303, U07707, NM018948, NM003379, M17254, X14149, NM078467, X13293, NM000610 and NM031966 are specifically up-regulated in the survival more than 9 years group whereas these genes and X77303, NM000701, U07707 and NM001993 are down-regulated in patients with low survival.
  • the other genes in Table 4 are (perhaps with the exception of Y00479, BC028152, AF065386, NM001946, X14149 and U77916) significantly up-regulated in the less than 3 years survival group.
  • NM021238, NM000297, D38594, AL034384, NM002235, AC009433, X13293, NM000610, NM031966, NM020698 and AA627385 are specifically up-regulated in this group of patients.
  • At least two of the genes of Table 4 are examined, especially genes from the above mentioned specifically up-regulated genes for each patient group. Even more preferred, at least 5, 7, 10, 15, 20, 25, 30, 35 or 40 of these genes are examined in the method according to the present invention.
  • the invention further provides microarrays comprising the disclosed marker sets.
  • the invention provides a microarray comprising at least 2, especially at least 5 markers derived from any one of Table 4, especially wherein at least 50% of the probes on the microarray are present in any one of Table 4.
  • at least 60%, 70%, 80%, 90%, 95% or 98% of the probes on said microarray are present in any one of Table 4.
  • the invention provides a microarray for distinguishing cell samples from patients having a good prognosis and cell samples from patients having a poor prognosis comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridisable to a plurality of genes, said plurality consisting of at least 2, preferably at least 5 of the genes corresponding to the markers listed in Table 4, wherein at least 50% of the probes on the microarray are present in Table 4.
  • the invention further provides for microarrays comprising at least 5, 10, 15, 20, 30, 40, 50, 70 or 100 marker genes listed in Table 4, Table 3, WO 01/74405 A, US 2002/142981 A1 or WO 02/103320 A2 in addition to the at least one, preferably at least two, especially at least 5 genes from Table 4, especially at least 5, 10, 15, 20 or 30 of the prognostic marker genes listed in Table 4, in any combination, wherein at least 50%, 60%, 70%, 80%, 90%, 95% or 98% of the probes on said microarrays are present in Table 4.
  • "Good prognosis" means that a patient is expected to have a survival of more than 9 years, counted from initial diagnosis of cancer.
  • "Poor prognosis" means that a patient is expected to have a survival of less than 3 years, counted from initial diagnosis of cancer.
  • "Marker" means an entire gene, or an EST derived from that gene, the expression or level of which changes between certain conditions. Where the expression of the gene correlates with a certain condition, the gene is a marker for that condition.
  • "Marker-derived polynucleotides" means the RNA transcribed from a marker gene, any cDNA or cRNA produced therefrom, and any nucleic acid derived therefrom, such as synthetic nucleic acid having a sequence derived from the gene corresponding to the marker gene.
  • the present invention provides a set of genetic markers whose expression is correlated with the existence of BC, LSCC, LAC or RCC.
  • the invention also provides a method of using these markers to distinguish tumor types in diagnosis or prognosis.
  • the invention provides a set of 42 genetic markers that can distinguish between patients with a good tumor prognosis and patients with a poor tumor prognosis. These markers are listed in Table 4.
  • the invention also provides subsets of at least 5, 10, 20, 30 or 40 markers, drawn from the set of 42, which also distinguish between patients with good and poor prognosis.
  • markers according to the present invention may be combined with markers that distinguish ER status, BRCAl markers or sporadic markers, or with the prognostic markers, or both. Any of the marker sets provided above may also be used in combination with other markers for the preferred tumors according to the present invention, or for any other clinical or physiological condition.
  • the present invention provides a screening method for finding sets of markers for the identification of conditions or indications associated with BC, LSCC, LAC and RCC cancer types.
  • the method for identifying marker sets is as follows: After extraction and labeling of target polynucleotides, the expression of all markers (genes) in a sample X is compared to the expression of all markers in a standard or control.
  • the standard or control comprises target polynucleotide molecules derived from a sample from a normal individual (i.e., an individual not afflicted with breast cancer).
  • the standard or control is a pool of target polynucleotide molecules. The pool may be derived from collected samples from a number of normal individuals. In a preferred embodiment, the pool comprises samples taken from a number of individuals having sporadic-type tumors.
  • the pool comprises an artificially-generated population of nucleic acids designed to approximate the level of nucleic acid derived from each marker found in a pool of marker-derived nucleic acids derived from tumor samples.
  • the pool is derived from normal or breast cancer cell lines or cell line samples.
  • the comparison may be accomplished by any means known in the art. For example, expression levels of various markers may be assessed by separation of target polynucleotide molecules (e.g., RNA or cDNA) derived from the markers in agarose or polyacrylamide gels, followed by hybridisation with marker-specific oligonucleotide probes. Alternatively, the comparison may be accomplished by the labeling of target polynucleotide molecules followed by separation on a sequencing gel. Polynucleotide samples are placed on the gel such that patient and control or standard polynucleotides are in adjacent lanes. Comparison of expression levels is accomplished visually or by means of a densitometer.
  • the expression of all markers is assessed simultaneously by hybridisation to a microarray.
  • markers meeting certain criteria are identified as associated with cancer.
  • a marker is selected based upon significant difference of expression in a sample as compared to a standard or control condition. Selection may be made based upon either significant up- or down-regulation of the marker in the patient sample. Selection may also be made by calculation of the statistical significance (i.e., the p-value) of the correlation between the expression of the marker and the condition or indication. Preferably, both selection criteria are used.
  • markers associated with cancer are selected where the markers show both more than two-fold change (increase or decrease) in expression as compared to a standard, and the p-value for the correlation between the existence of cancer and the change in marker expression is no more than 0.01 (i.e., is statistically significant).
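A minimal sketch of the combined selection rule follows, assuming that a fold change (sample versus standard) and a p-value have already been computed for each marker by some upstream analysis. The `MarkerStat` record, the `select_markers` function and the gene names are hypothetical and not part of the source. A decrease is treated symmetrically, so a ratio below 0.5 counts as a more than two-fold change.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MarkerStat:
    gene: str
    fold_change: float  # expression ratio sample/standard, expected to be > 0
    p_value: float      # assumed to be computed upstream


def select_markers(stats: Iterable[MarkerStat],
                   min_fold: float = 2.0,
                   max_p: float = 0.01) -> List[str]:
    """Keep markers with more than `min_fold` change in either direction
    and a p-value of no more than `max_p`."""
    selected = []
    for s in stats:
        if s.fold_change <= 0:
            continue  # a ratio must be positive to be interpretable
        # Express up- and down-regulation on the same scale.
        magnitude = max(s.fold_change, 1.0 / s.fold_change)
        if magnitude > min_fold and s.p_value <= max_p:
            selected.append(s.gene)
    return selected


# Hypothetical values, for illustration only.
stats = [
    MarkerStat("GENE_A", 3.0, 0.001),   # up-regulated and significant
    MarkerStat("GENE_B", 0.25, 0.01),   # four-fold down-regulated, significant
    MarkerStat("GENE_C", 2.0, 0.001),   # exactly two-fold: not "more than"
    MarkerStat("GENE_D", 5.0, 0.05),    # large change but not significant
]
chosen = select_markers(stats)
```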
  • the expression of the identified cancer-related markers is then used to identify markers that can differentiate tumors into clinical types.
  • markers are identified by calculation of correlation coefficients between the clinical category or clinical parameter(s) and the linear, logarithmic or any transform of the expression ratio across all samples for each individual gene.
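As an illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between a binary clinical category (coded 0/1) and the log expression ratio of each gene across samples, then ranks genes by absolute correlation. The choice of Pearson correlation, the 0/1 coding, the function names and the gene names are assumptions made for this example; the text only requires some correlation coefficient on some transform of the expression ratio.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_correlation(log_ratios: Dict[str, Sequence[float]],
                              categories: Sequence[float]
                              ) -> List[Tuple[str, float]]:
    """Rank genes by |correlation| between log ratio and clinical category."""
    scores = {gene: pearson(values, categories)
              for gene, values in log_ratios.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data: four samples, category 1 = condition present.
categories = [0, 0, 1, 1]
log_ratios = {
    "GENE_A": [-1.0, -1.0, 1.0, 1.0],   # rises with the category
    "GENE_B": [1.0, 1.0, -1.0, -1.0],   # falls with the category
    "GENE_C": [0.5, -0.5, 0.5, -0.5],   # unrelated to the category
}
ranked = rank_genes_by_correlation(log_ratios, categories)
```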
  • a specifically preferred screening method uses PCR-based cDNA subtraction and cDNA microarrays (see example section).
  • target polynucleotide molecules are extracted from a sample taken from an individual afflicted with breast cancer.
  • the sample may be collected in any clinically acceptable manner, but must be collected such that marker-derived polynucleotides (i.e., RNA) are preserved.
  • mRNA or nucleic acids derived therefrom (i.e., cDNA or amplified DNA) may be labeled with the same label as the standard or control polynucleotide molecules, wherein the intensity of hybridisation of each at a particular probe is compared.
  • a sample may comprise any clinically relevant tissue sample, such as a tumor biopsy or fine needle aspirate, or a sample of bodily fluid, such as blood, plasma, serum, lymph, ascitic fluid, cystic fluid, urine or nipple exudate.
  • the sample may be taken from a human, or, in a veterinary context, from non-human animals such as ruminants, horses, swine or sheep, or from domestic companion animals such as felines and canines.
  • RNA may be isolated from eukaryotic cells by procedures that involve lysis of the cells and denaturation of the proteins contained therein.
  • Cells of interest include wild-type cells (i.e., non-cancerous), drug-exposed wild-type cells, tumor or tumor-derived cells, modified cells, normal or tumor cell line cells, and drug-exposed modified cells.
  • RNA is extracted from cells of the various types of interest using guanidinium thiocyanate lysis followed by CsCl centrifugation to separate the RNA from DNA (Chirgwin et al., Biochemistry 18: 5294-5299 (1979)).
  • Poly(A)+ RNA is selected with oligo-dT cellulose (see Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989)).
  • separation of RNA from DNA can be accomplished by organic extraction, for example, with hot phenol or phenol/chloroform/isoamyl alcohol.
  • RNase inhibitors may be added to the lysis buffer.
  • mRNAs such as transfer RNA (tRNA) and ribosomal RNA (rRNA).
  • Most mRNAs contain a poly(A) tail at their 3' end. This allows them to be enriched by affinity chromatography, for example, using oligo(dT) or poly(U) coupled to a solid support, such as cellulose or Sephadex (see Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current Protocols Publishing, New York (1994)).
  • poly(A)+ mRNA is eluted from the affinity column using 2 mM EDTA/0.1% SDS.
  • the sample of RNA can comprise a plurality of different mRNA molecules, each different mRNA molecule having a different nucleotide sequence.
  • the mRNA molecules in the RNA sample comprise at least 100 different nucleotide sequences. More preferably, the mRNA molecules of the RNA sample comprise mRNA molecules corresponding to each of the marker genes.
  • the RNA sample is a mammalian RNA sample. In a specific embodiment, total RNA or mRNA from cells is used in the methods of the invention.
  • the source of the RNA can be cells of a plant or animal, human, mammal, primate, non-human animal, dog, cat, mouse, rat, bird, yeast, eukaryote, prokaryote, etc.
  • the method of the invention is used with a sample containing total mRNA or total RNA from 1 x 10^6 cells or less.
  • proteins can be isolated from the foregoing sources, by methods known in the art, for use in expression analysis at the protein level.
  • Probes to the homologs of the marker sequences disclosed herein can be employed preferably wherein non-human nucleic acid is being assayed.
  • the present invention provides sets of markers useful for distinguishing samples from those patients with a good prognosis from samples from patients with a poor prognosis.
  • the invention further provides a method for using these markers to determine whether an individual afflicted with cancer, especially BC, will have a good or poor clinical prognosis.
  • the invention provides for a method of determining whether an individual afflicted with cancer, especially breast cancer, will likely experience non-survival within 3 years of initial diagnosis (i.
  • a set of experiments of individuals with known outcome should be hybridised against the pool to define the expression templates for the good prognosis and poor prognosis group.
  • Each individual with unknown outcome is hybridised against the same pool and the resulting expression profile is compared to the templates to predict its outcome. Poor prognosis of breast cancer may indicate that a tumor is relatively aggressive, while good prognosis may indicate that a tumor is relatively nonaggressive.
  • the invention provides for a method of determining a course of treatment of a breast cancer patient, comprising determining whether the level of expression of the 42 markers of Table 4, or a subset thereof, correlates with the level of these markers in a sample representing a good prognosis expression pattern or a poor prognosis pattern; and determining a course of treatment, wherein, if the expression correlates with the poor prognosis pattern, the tumor is treated as an aggressive tumor.
  • Classification of a sample as "good prognosis" or "poor prognosis" is accomplished substantially as for the diagnostic markers described above, wherein a template is generated to which the marker expression levels in the sample are compared.
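The template comparison can be sketched as follows. The example assumes that each template is the per-gene mean of log expression ratios (sample versus pool) over individuals with known outcome, and that similarity is measured by Pearson correlation. Both choices, along with the function names, the tie-breaking rule and the gene names, are assumptions made for illustration; the text does not prescribe a specific similarity measure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Profile = Dict[str, float]  # gene -> log expression ratio versus the pool


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_template(profiles: List[Profile]) -> Profile:
    """Average the profiles of individuals with known outcome, gene by gene."""
    genes = set(profiles[0])
    for p in profiles[1:]:
        genes &= set(p)  # use only genes measured in every profile
    return {g: sum(p[g] for p in profiles) / len(profiles) for g in genes}


def classify_prognosis(profile: Profile, good: Profile, poor: Profile
                       ) -> Tuple[str, float, float]:
    """Assign the label of the template the profile correlates with best.
    Ties go to "good prognosis", an arbitrary choice for this sketch."""
    genes = sorted(set(profile) & set(good) & set(poor))
    x = [profile[g] for g in genes]
    r_good = pearson(x, [good[g] for g in genes])
    r_poor = pearson(x, [poor[g] for g in genes])
    label = "good prognosis" if r_good >= r_poor else "poor prognosis"
    return label, r_good, r_poor


# Hypothetical training profiles and one profile of unknown outcome.
good_template = build_template([{"G1": 1.0, "G2": -1.0, "G3": 0.0}])
poor_template = build_template([{"G1": -1.0, "G2": 1.0, "G3": 0.0}])
unknown = {"G1": 0.8, "G2": -0.6, "G3": 0.1}
label, r_good, r_poor = classify_prognosis(unknown, good_template, poor_template)
```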
  • the use of marker sets is not restricted to the prognosis of breast cancer-related conditions, and may be applied in a variety of phenotypes or conditions, clinical or experimental, in which gene expression plays a role. Where a set of markers has been identified that corresponds to two or more phenotypes, the marker sets can be used to distinguish these phenotypes.
  • the phenotypes may be the diagnosis and/or prognosis of clinical states or phenotypes associated with other cancers, other disease conditions, or other physiological conditions, wherein the expression level data is derived from a set of genes correlated with the particular physiological or disease condition.
  • the expression level values are preferably transformed in a number of ways.
  • the expression level of each of the markers can be normalised by the average expression level of all markers the expression level of which is determined, or by the average expression level of a set of control genes.
  • the markers are represented by probes on a microarray, and the expression level of each of the markers is normalised by the mean or median expression level across all of the genes represented on the microarray, including any non-marker genes.
  • the normalisation is carried out by dividing by the median or mean level of expression of all of the genes on the microarray.
  • the expression levels of the markers are normalised by the mean or median level of expression of a set of control markers.
  • the control markers comprise a set of housekeeping genes.
  • the normalisation is accomplished by dividing by the median or mean expression level of the control genes.
  • the sensitivity of a marker-based assay will also be increased if the expression levels of individual markers are compared to the expression of the same markers in a pool of samples.
  • the comparison is to the mean or median expression level of each of the marker genes in the pool of samples.
  • Such a comparison may be accomplished, for example, by dividing the expression level of each of the markers in the sample by the mean or median expression level of the pool for each of the markers. This has the effect of accentuating the relative differences in expression between markers in the sample and markers in the pool as a whole, making comparisons more sensitive and more likely to produce meaningful results than the use of absolute expression levels alone.
  • the expression level data may be transformed in any convenient way; preferably, the expression level data for all markers is log transformed before means or medians are taken.
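The normalisation and pool comparison described above can be sketched as two small functions. The first divides every expression level by the median level of a set of control (for example housekeeping) genes. The second log-transforms the data before the pool mean is taken, so that the result equals the log of the ratio between the sample level and the geometric mean of the pool. The use of base-2 logarithms, the function names and the gene names are assumptions made for this illustration.

```python
import math
from statistics import median
from typing import Dict, List, Sequence

Profile = Dict[str, float]  # gene -> positive expression level


def normalise_by_controls(expr: Profile, control_genes: Sequence[str]) -> Profile:
    """Divide each expression level by the median level of the control genes."""
    ref = median(expr[g] for g in control_genes)
    return {gene: value / ref for gene, value in expr.items()}


def log_ratio_to_pool(sample: Profile, pool: List[Profile]) -> Profile:
    """Log2-transform first, then subtract the mean log level of each marker
    across the pool (equivalent to log2 of sample / geometric pool mean)."""
    result = {}
    for gene, value in sample.items():
        pool_logs = [math.log2(p[gene]) for p in pool]
        result[gene] = math.log2(value) - sum(pool_logs) / len(pool_logs)
    return result


# Hypothetical intensities: one marker (M1) and three housekeeping genes.
raw = {"M1": 10.0, "HK1": 2.0, "HK2": 4.0, "HK3": 6.0}
normalised = normalise_by_controls(raw, ["HK1", "HK2", "HK3"])

# Hypothetical pool of two samples measured for the same marker.
relative = log_ratio_to_pool({"M1": 8.0}, [{"M1": 2.0}, {"M1": 8.0}])
```

Normalising against all genes on the array instead of a control set only requires passing every gene name as `control_genes`.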
  • the expression levels of the markers in the sample may be compared to the expression level of those markers in the pool, where nucleic acid derived from the sample and nucleic acid derived from the pool are hybridised during the course of a single experiment.
  • Such an approach requires that new pool nucleic acid be generated for each comparison or limited numbers of comparisons, and is therefore limited by the amount of nucleic acid available.
  • the expression levels in a pool are stored on a computer, or on computer-readable media, to be used in comparisons to the individual expression level data from the sample (i.e., single-channel data).
  • the expression levels of the marker genes in a sample may be determined by any means known in the art.
  • the expression level may be determined by isolating and determining the level (i.e., amount) of nucleic acid transcribed from each marker gene.
  • the level of specific proteins translated from mRNA transcribed from a marker gene may be determined.
  • the level of expression of specific marker genes can be determined by measuring the amount of mRNA, or polynucleotides derived therefrom, present in a sample. Any method for determining RNA levels can be used. For example, RNA is isolated from a sample and separated on an agarose gel. The separated RNA is then transferred to a solid support, such as a filter.
  • Nucleic acid probes representing one or more markers are then hybridised to the filter by northern hybridisation, and the amount of marker-derived RNA is determined. Such determination can be visual, or machine-aided, for example, by use of a densitometer. Another method of determining RNA levels is by use of a dot-blot or a slot-blot. In this method, RNA, or nucleic acid derived therefrom, from a sample is labeled. The RNA or nucleic acid derived therefrom is then hybridised to a filter containing oligonucleotides derived from one or more marker genes, wherein the oligonucleotides are placed upon the filter at discrete, easily identifiable locations.
  • Hybridisation, or lack thereof, of the labeled RNA to the filter-bound oligonucleotides is determined visually or by densitometer.
  • Polynucleotides can be labeled using a radiolabel or a fluorescent (i.e., visible) label. These examples are not intended to be limiting; other methods of determining RNA abundance are known in the art.
  • the level of expression of particular marker genes may also be assessed by determining the level of the specific protein expressed from the marker genes. This can be accomplished, for example, by separation of proteins from a sample on a polyacrylamide gel, followed by identification of specific marker-derived proteins using antibodies in a western blot. Alternatively, proteins can be separated by two-dimensional gel electrophoresis systems.
  • Two-dimensional gel electrophoresis is well-known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension.
  • the resulting electropherograms can be analyzed by numerous techniques, including mass spectrometric techniques, western blotting and immunoblot analysis using polyclonal and monoclonal antibodies.
  • marker-derived protein levels can be determined by constructing an antibody microarray in which binding sites comprise immobilised, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome.
  • antibodies are present for a substantial fraction of the marker-derived proteins of interest.
  • Methods for making monoclonal antibodies are well known (see, e.g., Harlow and Lane, 1988, ANTIBODIES: A LABORATORY MANUAL, Cold Spring Harbor, New York, which is incorporated in its entirety for all purposes).
  • monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted to the array, and their binding is assayed with assays known in the art.
  • the expression, and the level of expression, of proteins of diagnostic or prognostic interest can be detected through immunohistochemical staining of tissue slices or sections.
  • tissue array (Kononen et al., Nat. Med. 4 (7): 844-7 (1998)).
  • in a tissue array, multiple tissue samples are assessed on the same microarray.
  • the arrays allow in situ detection of RNA and protein levels; consecutive sections allow the analysis of multiple samples simultaneously.
  • polynucleotide microarrays are used to measure expression so that the expression status of each of the markers above is assessed simultaneously.
  • the invention provides for oligonucleotide or cDNA arrays comprising probes hybridisable to the genes corresponding to each of the marker sets described above (i.e., markers to determine the tumor, especially the molecular type or subtype of a tumor; markers to distinguish patients with good versus patients with poor prognosis).
  • the microarrays provided by the present invention may comprise probes hybridisable to the genes corresponding to markers able to distinguish the status of one, two, or all three of the clinical conditions noted above.
  • the invention provides polynucleotide arrays comprising probes to a subset or subsets of at least 2, 5, 10, 15, 20, 30 or 40 genetic markers from Table 4.
  • microarrays that are used in the methods disclosed herein optionally comprise markers additional to at least some of the markers listed in Table 4.
  • the microarray is a screening or scanning array as described in WO 02/103320, WO 02/18646 and WO 02/16650.
  • the scanning and screening arrays comprise regularly spaced, positionally addressable probes derived from genomic nucleic acid sequence, both expressed and unexpressed.
  • Such arrays may comprise probes corresponding to a subset of, or all of, the markers listed in Table 4, or a subset thereof as described above, and can be used to monitor marker expression in the same way as a microarray containing only markers listed in Table .
  • the microarray is a commercially available cDNA microarray that comprises at least five of the markers listed in Table 4.
  • a commercially-available cDNA microarray comprises all of the markers listed in Table 4.
  • a microarray may comprise 5, 10, 15, 25, 40 or more of the markers in any of Table 4, up to the maximum number of markers in the Table or Figure.
  • the markers that are all or a portion of Table 4 make up at least 50%, 60%, 70%, 80%, 90%, 95% or 98% of the probes on the microarray.
  • Microarrays are prepared by selecting probes which comprise a polynucleotide sequence, and then immobilising such probes to a solid support or surface.
  • the probes may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA.
  • the polynucleotide sequences of the probes may also comprise DNA and/or RNA analogues, or combinations thereof.
  • the polynucleotide sequences of the probes may be full or partial fragments of genomic DNA.
  • the polynucleotide sequences of the probes may also be synthesised nucleotide sequences, such as synthetic oligonucleotide sequences.
  • the probe sequences can be synthesised either enzymatically in vivo, enzymatically in vitro (e.g., by PCR), or non-enzymatically in vitro.
  • the probe or probes used in the methods of the invention are preferably immobilised to a solid support which may be either porous or non-porous.
  • the probes of the invention may be polynucleotide sequences which are attached to a nitrocellulose or nylon membrane or filter covalently at either the 3' or the 5' end of the polynucleotide.
  • hybridisation probes are well known in the art (see, e.g., Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989)).
  • the solid support or surface may be a glass or plastic surface.
  • hybridisation levels are measured to microarrays of probes consisting of a solid phase on the surface of which are immobilised a population of polynucleotides, such as a population of DNA or DNA mimics, or, alternatively, a population of RNA or RNA mimics.
  • the solid phase may be a nonporous or, optionally, a porous material such as a gel.
  • a microarray comprises a support or surface with an ordered array of binding (e. g. , hybridisation) sites or "probes" each representing one of the markers described herein.
  • the microarrays are addressable arrays, and more preferably positionally addressable arrays.
  • each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position in the array (i.e., on the support or surface).
  • each probe is covalently attached to the solid support at a single site.
  • Microarrays can be made in a number of ways, of which several are described below. However produced, microarrays share certain characteristics. The arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other. Preferably, microarrays are made from materials that are stable under binding (e.g., nucleic acid hybridisation) conditions. The microarrays are preferably small, e.g., between 0.1 cm² and 25 cm², between 2 cm² and 13 cm², or 3 cm². However, larger and smaller arrays are also contemplated and may be preferable, e.g., for use in screening arrays.
  • a given binding site or unique set of binding sites in the microarray will specifically bind (e. g., hybridise) to the product of a single gene in a cell (e. g., to a specific mRNA, or to a specific cDNA derived therefrom) .
  • other related or similar sequences will cross hybridise to a given binding site.
  • the microarrays of the present invention include one or more test probes, each of which has a polynucleotide sequence that is complementary to a subsequence of RNA or DNA to be detected.
  • the position of each probe on the solid surface is known.
  • the microarrays are preferably positionally addressable arrays.
  • each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i. e., the sequence) of each probe can be determined from its position on the array (i. e., on the support or surface).
  • the microarray is an array (i. e. , a matrix) in which each position represents one of the markers described herein.
  • each position can contain a DNA or DNA analogue based on genomic DNA to which a particular RNA or cDNA transcribed from that genetic marker can specifically hybridise.
  • the DNA or DNA analogue can be, e.g., a synthetic oligomer or a gene fragment.
  • probes representing each of the markers are present on the array.
  • the "probe" to which a particular polynucleotide molecule specifically hybridises according to the invention contains a complementary genomic polynucleotide sequence.
  • the probes of the microarray preferably consist of nucleotide sequences of no more than 1,000 nucleotides. In some embodiments, the probes of the array consist of nucleotide sequences of 10 to 1,000 nucleotides. In a preferred embodiment, the nucleotide sequences of the probes are in the range of 10-200 nucleotides in length and are genomic sequences of a species of organism, such that a plurality of different probes is present, with sequences complementary and thus capable of hybridising to the genome of such a species of organism, sequentially tiled across all or a portion of such genome.
  • the probes are in the range of 10-30 nucleotides in length, in the range of 10-40 nucleotides in length, in the range of 20-50 nucleotides in length, in the range of 40-80 nucleotides in length, in the range of 50-150 nucleotides in length, in the range of 80-120 nucleotides in length, and most preferably are 60 nucleotides in length.
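  • The sequential tiling of fixed-length probes across a genomic region, as described above, can be sketched as follows. This is a minimal illustration only: the function name `tile_probes`, the `step` parameter and the example region are assumptions made for this sketch and do not come from the specification, and the sketch returns the genomic subsequences themselves without modelling strand choice or probe selection criteria.

```python
def tile_probes(sequence, probe_length=60, step=60):
    """Return (start, probe) pairs tiled sequentially across `sequence`.

    probe_length=60 mirrors the most preferred probe length named in the
    text; `step` (hypothetical) controls the overlap between neighbours.
    """
    sequence = sequence.upper()
    probes = []
    # Stop early enough that the last probe still has full length.
    for start in range(0, len(sequence) - probe_length + 1, step):
        probes.append((start, sequence[start:start + probe_length]))
    return probes


# Hypothetical 150-nt region (not a sequence from the specification).
region = "ACGT" * 37 + "AC"
tiles = tile_probes(region, probe_length=60, step=30)
for start, probe in tiles:
    print(start, len(probe))
```

With a step smaller than the probe length, adjacent probes overlap; with a step equal to the probe length, they abut.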
  • the probes may comprise DNA or DNA "mimics" (e. g., derivatives and analogues) corresponding to a portion of an organism's genome.
  • the probes of the microarray are complementary RNA or RNA mimics.
  • DNA mimics are polymers composed of subunits capable of specific, Watson-Crick-like hybridisation with DNA, or of specific hybridisation with RNA.
  • the nucleic acids can be modified at the base moiety, at the sugar moiety, or at the phosphate backbone.
  • Exemplary DNA mimics include, e.g., phosphorothioates.
  • DNA can be obtained, e. g., by polymerase chain reaction (PCR) amplification of genomic DNA or cloned sequences.
  • PCR primers are preferably chosen based on a known sequence of the genome that will result in amplification of specific fragments of genomic DNA.
  • Computer programs that are well known in the art are useful in the design of primers with the required specificity and optimal amplification properties, such as Oligo version 5.0 (National Biosciences) .
  • each probe on the microarray will be between 10 bases and 50,000 bases, usually between 300 bases and 1,000 bases in length.
  • PCR methods are well known in the art, and are described, for example, in Innis et al., eds., PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press Inc., San Diego, CA (1990). It will be apparent to one skilled in the art that controlled robotic systems are useful for isolating and amplifying nucleic acids.
  • An alternative, preferred means for generating the polynucleotide probes of the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using N-phosphonate or phosphoramidite chemistries (Froehler et al., Nucleic Acid Res. 14: 5399-5407 (1986); McBride et al., Tetrahedron Lett. 24: 246-248 (1983)). Synthetic sequences are typically between about 10 and about 500 bases in length, more typically between about 20 and about 100 bases, and most preferably between about 40 and about 70 bases in length.
  • synthetic nucleic acids include non-natural bases, such as, but by no means limited to, inosine.
  • nucleic acid analogues may be used as binding sites for hybridisation.
  • An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., Nature 363: 566-568 (1993); U.S. Patent No. 5,539,083).
  • Probes are preferably selected using an algorithm that takes into account binding energies, base composition, sequence complexity, cross-hybridisation binding energies, and secondary structure (see Friend et al., International Patent Publication WO 01/05935, published January 25, 2001; Hughes et al., Nat. Biotech. 19: 342-7 (2001)).
  • positive control probes, e.g., probes known to be complementary and hybridisable to sequences in the target polynucleotide molecules, and negative control probes, e.g., probes known to not be complementary and hybridisable to sequences in the target polynucleotide molecules, should be included on the array.
  • positive controls are synthesised along the perimeter of the array.
  • positive controls are synthesised in diagonal stripes across the array.
  • the reverse complement for each probe is synthesised next to the position of the probe to serve as a negative control.
  • sequences from other species of organism are used as negative controls or as "spike-in" controls.
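  • One of the control layouts above places the reverse complement of each probe next to the probe as a negative control. A minimal sketch of that pairing is given below; the function names and the `_rc` naming suffix are hypothetical, and the example probe simply reuses the GAPDH exon forward primer sequence quoted elsewhere in this description as a stand-in.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(probe):
    """Return the reverse complement of a DNA probe sequence."""
    return probe.translate(_COMPLEMENT)[::-1]


def with_negative_controls(probes):
    """Interleave each (name, sequence) probe with its reverse complement.

    The '_rc' suffix is a hypothetical naming convention for this sketch.
    """
    layout = []
    for name, seq in probes:
        layout.append((name, seq))
        layout.append((name + "_rc", reverse_complement(seq)))
    return layout


probes = [("GAPDH_exon_fwd", "AAGGTGAAGGTCGGAGTCAACG")]
layout = with_negative_controls(probes)
for name, seq in layout:
    print(name, seq)
```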
  • the probes are attached to a solid support or surface, which may be made, e. g., from glass, plastic (e. g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material.
  • a preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al, Science 270: 467-470 (1995). This method is especially useful for preparing microarrays of cDNA (See also, DeRisi et al, Nature Genetics 14: 457-460 (1996); Shalon et al., Genome Res. 6 : 639-645 (1996); and Schena et al . , Proc. Natl. Acad. Sci. U. S. A. 93: 10539-11286 (1995)).
  • a second preferred method for making microarrays is by making high-density oligonucleotide arrays.
  • Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface using photolithographic techniques for synthesis in situ (see, Fodor et al . , 1991, Science 251: 767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U. S. A. 91: 5022-5026; Lockhart et al., 1996, Nature Biotechnology 14: 1675; U. S. Patent Nos.
  • oligonucleotides (e.g., 60-mers) of known sequence are synthesised directly on a surface such as a derivatised glass slide.
  • the array produced is redundant, with several oligonucleotide molecules per RNA.
  • Other methods for making microarrays e. g., by masking (Maskos and Southern, 1992, Nuc. Acids. Res. 20: 1679-1684), may also be used.
  • any type of array for example, dot blots on a nylon hybridisation membrane (see Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989)) could be used.
  • very small arrays will frequently be preferred because hybridisation volumes will be smaller.
  • the arrays of the present invention are prepared by synthesising polynucleotide probes on a support.
  • polynucleotide probes are attached to the support covalently at either the 3' or the 5' end of the polynucleotide.
  • microarrays of the invention are manufactured by means of an ink jet printing device for oligonucleotide synthesis, e. g., using the methods and systems described by Blanchard in U. S. Pat. No.
  • the oligonucleotide probes in such microarrays are preferably synthesised in arrays, e. g. , on a glass slide, by serially depositing individual nucleotide bases in "microdroplets" of a high surface tension solvent such as propylene carbonate.
  • the microdroplets have small volumes (e.
  • Microarrays manufactured by this ink-jet method are typically of high density, preferably having a density of at least about 2,500 different probes per 1 cm².
  • the polynucleotide probes are attached to the support covalently at either the 3' or the 5' end of the polynucleotide.
  • the polynucleotide molecules which may be analyzed by the present invention may be from any clinically relevant source, but are expressed RNA or a nucleic acid derived therefrom (e.g., cDNA or amplified RNA derived from cDNA that incorporates an RNA polymerase promoter), including naturally occurring nucleic acid molecules, as well as synthetic nucleic acid molecules.
  • the target polynucleotide molecules comprise RNA, including, but by no means limited to, total cellular RNA, poly(A)+ messenger RNA (mRNA) or fraction thereof, cytoplasmic mRNA, or RNA transcribed from cDNA (i.e., cRNA; see, e.g., Linsley & Schelter, U.S. Patent Application No. 09/411,074, filed October 4, 1999, or U.S. Patent Nos. 5,545,522, 5,891,636, or 5,716,785).
  • Methods for preparing total and poly(A)+ RNA are well known in the art, and are described generally, e.g., in Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989).
  • RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al., 1979, Biochemistry 18: 5294-5299).
  • total RNA is extracted using a silica gel-based column, commercially available examples of which include RNeasy (Qiagen, Valencia, California) and StrataPrep (Stratagene, La Jolla, California).
  • RNA is extracted from cells using phenol and chloroform, as described in Ausubel et al., eds., 1989, CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Vol III, Greene Publishing Associates, Inc., John Wiley & Sons, Inc., New York, at pp. 13.12.1-13.12.5.
  • Poly(A)+ RNA can be selected, e.g., by selection with oligo-dT cellulose or, alternatively, by oligo-dT primed reverse transcription of total cellular RNA.
  • RNA can be fragmented by methods known in the art, e.g., by incubation with ZnCl2, to generate fragments of RNA.
  • the polynucleotide molecules analyzed by the invention comprise cDNA, or PCR products of amplified RNA or cDNA.
  • total RNA, mRNA, or nucleic acids derived therefrom is isolated from a sample taken from a person afflicted with BC, LSCC, LAC and RCC.
  • Target polynucleotide molecules that are poorly expressed in particular cells may be enriched using normalisation techniques (Bonaldo et al., 1996, Genome Res. 6: 791-806).
  • the target polynucleotides are detectably labeled at one or more nucleotides. Any method known in the art may be used to detectably label the target polynucleotides.
  • this labeling incorporates the label uniformly along the length of the RNA, and more preferably, the labeling is carried out at a high degree of efficiency.
  • one embodiment of this labeling uses oligo-dT primed reverse transcription to incorporate the label; however, conventional implementations of this method are biased toward generating 3' end fragments.
  • random primers (e.g., 9-mers) may be used in reverse transcription to uniformly incorporate labeled nucleotides over the full length of the target polynucleotides.
  • random primers may be used in conjunction with PCR methods or T7 promoter-based in vitro transcription methods in order to amplify the target polynucleotides.
  • the detectable label is a luminescent label.
  • the label is a fluorescent label, such as a fluorescein, a phosphor, a rhodamine, or a polymethine dye derivative.
  • fluorescent labels include, for example, fluorescent phosphoramidites such as FluorePrime (Amersham Pharmacia, Piscataway, N.J.), Fluoredite (Millipore, Bedford, Mass.), FAM (ABI, Foster City, Calif.), and Cy3 or Cy5 (Amersham Pharmacia, Piscataway, N.J.).
  • the detectable label is a radiolabeled nucleotide.
  • target polynucleotide molecules from a patient sample are labeled differentially from target polynucleotide molecules of a standard.
  • the standard can comprise target polynucleotide molecules from normal individuals (i. e., those not afflicted with breast cancer).
  • the standard comprises target polynucleotide molecules pooled from samples from normal individuals or tumor samples from individuals having sporadic-type breast tumors.
  • the target polynucleotide molecules are derived from the same individual, but are taken at different time points, and thus indicate the efficacy of a treatment by a change in expression of the markers, or lack thereof, during and after the course of treatment (i.
  • a change in the expression of the markers from a poor prognosis pattern to a good prognosis pattern indicates that the treatment is efficacious.
  • different timepoints are differentially labeled.
  • Nucleic acid hybridisation and wash conditions are chosen so that the target polynucleotide molecules specifically bind or specifically hybridise to the complementary polynucleotide sequences of the array, preferably to a specific array site, wherein its complementary DNA is located.
  • Arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded prior to contacting with the target polynucleotide molecules.
  • Arrays containing single-stranded probe DNA e. g., synthetic oligodeoxyribonucleic acids
  • Optimal hybridisation conditions will depend on the length (e. g. , oligomer versus polynucleotide greater than 200 bases) and type (e. g., RNA, or DNA) of probe and target nucleic acids.
  • As the oligonucleotides become shorter, it may become necessary to adjust their length to achieve a relatively uniform melting temperature for satisfactory hybridisation results.
  • General parameters for specific (i. e., stringent) hybridisation conditions for nucleic acids are described in Sambrook et al . , MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols.
  • Typical hybridisation conditions for the cDNA microarrays of Schena et al. are hybridisation in 5 X SSC plus 0.2% SDS at 65 °C for four hours, followed by washes at 25 °C in low stringency wash buffer (1 X SSC plus 0.2% SDS), followed by 10 minutes at 25 °C in higher stringency wash buffer (0.1 X SSC plus 0.2% SDS) (Schena et al., Proc. Natl. Acad. Sci. U. S. A.
  • hybridisation conditions are also provided in, e.g., Tijssen, 1993, HYBRIDIZATION WITH NUCLEIC ACID PROBES, Elsevier Science Publishers B.V.; and Kricka, 1992, NONISOTOPIC DNA PROBE TECHNIQUES, Academic Press, San Diego, CA.
  • Particularly preferred hybridisation conditions include hybridisation at a temperature at or near the mean melting temperature of the probes (e.g., within 5 °C, more preferably within 2 °C) in 1 M NaCl, 50 mM MES buffer (pH 6.5), 0.5% sodium sarcosine and 30% formamide.
  • the fluorescence emissions at each site of a microarray may be, preferably, detected by scanning confocal laser microscopy.
  • a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used.
  • a laser may be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, "A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization," Genome Research 6: 639-645, which is incorporated by reference in its entirety for all purposes).
  • the arrays are scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser and the emitted light is split by wavelength and detected with two photomultiplier tubes. Fluorescence laser scanning devices are described in Schena et al., Genome Res. 6: 639-645 (1996), and in other references cited herein. Alternatively, the fiber-optic bundle described by Ferguson et al., Nature Biotech. 14: 1681-1684 (1996), may be used to monitor mRNA abundance levels at a large number of sites simultaneously.
  • Signals are recorded and, in a preferred embodiment, analyzed by computer, e. g., using a 12 or 16 bit analog to digital board.
  • the scanned image is despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridisation at each wavelength at each site. If necessary, an experimentally determined correction for "cross talk" (or overlap) between the channels for the two fluors may be made.
  • a ratio of the emission of the two fluorophores can be calculated. The ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated in association with the different cancer-related condition.
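  • The point that the two-fluorophore ratio does not depend on the absolute expression level can be illustrated with a short sketch. The use of a base-2 logarithm and the function name `log2_ratio` are assumptions made for this illustration; the intensity values are invented and are not data from this description.

```python
import math


def log2_ratio(sample_signal, reference_signal):
    """Return log2(sample / reference) for one array site.

    Both signals must be positive (e.g., background-corrected emissions).
    """
    if sample_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(sample_signal / reference_signal)


# Two hypothetical genes with a tenfold difference in absolute signal
# but the same fourfold sample/reference ratio.
high_abundance = log2_ratio(4000.0, 1000.0)
low_abundance = log2_ratio(400.0, 100.0)
print(high_abundance, low_abundance)
```

Both sites yield the same value, so the ratio reports relative modulation rather than abundance.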
  • Fig. 1 shows PCR products of subtractive lung squamous cell carcinoma cDNA libraries generated either by a 4-base or a pool of 6-base recognising restriction enzymes.
  • Lanes 1 and 4: DNA size marker (Phi-X/HaeIII + lambda/HindIII);
  • Lane 2: subtractive library generated using a pool of 6-cutters;
  • Lane 3: subtractive library generated using the 4-cutter RsaI;
  • arrows indicate the respective position for keratin 6A cDNA fragment.
  • Fig. 2 shows a histogram of the distribution of coefficients of variation (CV).
  • Data are from a hybridisation that was repeated four times under the same conditions. Only spots with valid signals in each of the four hybridisations (71%) were included for calculation. Ninety-five percent of all spots have CVs smaller than 37%, more than 99% of all spots displayed CVs of less than 57%.
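  • The per-spot coefficient of variation over repeated hybridisations, restricted to spots with a valid signal in every repeat, can be sketched as below. The data layout (a dict of spot identifiers to signal lists, with `None` marking an invalid signal), the function names, and the use of the sample standard deviation are all assumptions for this sketch; the example values are invented.

```python
import statistics


def spot_cvs(replicates):
    """Return {spot_id: CV} for spots valid in every replicate.

    `replicates` maps a spot id to its signals across hybridisations;
    None marks an invalid signal (hypothetical convention).
    """
    cvs = {}
    for spot_id, signals in replicates.items():
        if any(s is None for s in signals):
            continue  # keep only spots valid in all hybridisations
        # CV = standard deviation / mean (sample SD assumed here).
        cvs[spot_id] = statistics.stdev(signals) / statistics.mean(signals)
    return cvs


def fraction_below(cvs, threshold):
    """Fraction of spots whose CV is smaller than `threshold`."""
    return sum(1 for cv in cvs.values() if cv < threshold) / len(cvs)


replicates = {
    "spot_1": [100.0, 110.0, 90.0, 100.0],
    "spot_2": [200.0, None, 210.0, 190.0],   # invalid in one repeat
    "spot_3": [50.0, 100.0, 150.0, 100.0],
}
cvs = spot_cvs(replicates)
share = fraction_below(cvs, 0.37)
print(cvs, share)
```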
  • Fig. 3 shows the distribution of clones among subtractive libraries.
  • SCC lung-squamous cell carcinoma
  • AC lung-adenocarcinoma
  • RCC renal cell cancer
  • Fig. 4 shows the comparison of expression levels of clone 709G4 [EGL nine homologue 3 (EGLN3)] in various normal and tumor tissues analyzed with either cDNA microarrays or quantitative real-time PCR.
  • Expression levels of microarray experiments are presented as the ratio of intensities of Cy3 (green fluorescence; individual probe) versus Cy5 (red fluorescence; pool of critical normal tissues); data from real-time PCR are shown as relative copy numbers (reference: beta-actin).
  • the different tissue types are distinguished with the indicated colors.
  • Figure 5: Kaplan-Meier analysis of the overall survival (in years after diagnosis) of 39 breast cancer patients with a strong expression (high cyclin B1 or high cyclin B2) or a weak expression (low cyclin B1 or low cyclin B2) of the marker genes cyclin B1 and cyclin B2.
  • cyclin B1 was discovered as a marker gene discriminating between a good or a poor prognosis by us (cyclin B1 is one of the 42 marker genes shown in Table 4), whereas cyclin B2 was discovered as one of 240 best marker genes discriminating between a good or a poor prognosis by van 't Veer et al. (Nature 415, 530-536, 2002).
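  • For readers unfamiliar with the Kaplan-Meier estimate used in Figure 5, the product-limit calculation can be sketched as follows. This is a generic illustration under stated assumptions: the function name, the input layout and the five follow-up records are hypothetical and are not the patient data behind the figure.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one death.

    times:  follow-up in years after diagnosis
    events: True if death was observed, False if the record is censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all records sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up records for one expression group.
curve = kaplan_meier([1, 2, 2, 3, 5], [True, True, False, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

Running the same function separately on a high-expression group and a low-expression group would give the two curves that such a figure compares.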
  • Figure 6: Gene expression correlates with long or short overall survival of breast cancer patients; left panel: Hierarchical clustering of the 42 genes selected to be most highly associated with the two survival groups (more than 9 years versus less than 3 years) including a dendrogram of the clustered patient samples. Each column represents one tumor sample, and each row represents one gene, presented in the same order as in Table 4. Student's t-test was used to select genes most differentially expressed between the survival groups (significance P less than 0.02); P-chance analysis was used to eliminate false positives. Only genes with P-value less than P-chance were selected.
  • Genes downregulated in tumors of patients with an overall survival less than 3 years, and genes upregulated in tumors of patients with an overall survival less than 3 years are indicated (separated by dashed line) .
  • the presence (black box) or absence (white box) of prognostic breast cancer markers estrogen receptor (ER) , progesterone receptor (PR), HER2, tumor stage T3, invasive ductal morphology, and tumor grade 3 are indicated for each tumor sample; right panel: Principal component analysis (PCA) of patient samples using the selected genes. Samples of patients surviving at least nine years are shown in the left part, and samples of patients who survived less than three years are shown in the right part (separated by dashed line) . The first three principal components are shown.
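  • The gene selection named in the Figure 6 legend starts from a two-sample Student's t-test per gene. The sketch below computes only the pooled-variance t statistic and its degrees of freedom; converting it to a P-value requires the t distribution (for instance from a statistics library), and the P-chance step is not reproduced here because its details are not given in this section. Function name and expression values are hypothetical.

```python
import math
import statistics


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two groups.

    Classic Student's t with pooled variance (equal variances assumed).
    """
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / df
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    t = diff / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical log-ratios of one gene in long- and short-survival groups.
long_survival = [1.0, 2.0, 3.0]
short_survival = [4.0, 5.0, 6.0]
t_stat, dof = student_t(long_survival, short_survival)
print(round(t_stat, 4), dof)
```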
  • RNA was extracted using the Oligotex Direct mRNA kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. mRNAs of normal tissues were purchased from Clontech (Palo Alto, CA) , Invitrogen (Carlsbad, CA) , and Ambion (Austin, TX) . All RNAs were analyzed with the Agilent 2100 Bioanalyzer and the RNA 6000 Nano Assay kit (Agilent Technologies, Palo Alto, CA) to determine RNA quality and quantity according to the manufacturer's protocol.
  • GAPDH intron forward 5'-CGCGTCTACGAGCCTTGCGGGCT-3'
  • GAPDH intron reverse 5'-GCTTTCCTAACGGCTGCCCATTCA-3'.
  • the integrity of synthesised cDNA was determined by PCR using GAPDH exon-specific primers (GAPDH exon forward: 5'-AAGGTGAAGGTCGGAGTCAACG-3'; and GAPDH exon reverse: 5'-GGCAGAGATGATGACCCTTTTGGC-3').
  • An aliquot of normal and tumor tissue poly(A)+ RNA was transcribed into Cla-oligo(dT)-primed (5'-ATTCGCGACTGATGATCGAT(16)-3') cDNA.
  • All libraries were generated by suppression subtraction hybridisation (SSH), using poly(A) + RNA of tumor tissue or a tumor cell line as a tester, and the corresponding normal tissue or a pool of normal tissues or a primary cell line as a driver.
  • a modified protocol of the PCR-Select cDNA Subtraction kit was used (Clontech) .
  • cDNA was not digested with restriction enzyme RsaI but with a pool of six restriction enzymes (5 units each of EcoRV, NaeI, NruI, ScaI, SspI, and StuI) to increase average length of cDNAs; cDNA was first incubated 1.5 h in buffer A (Promega, Madison, WI) with NaeI and StuI.
  • cDNA was incubated for another 1.5 h at 37 °C with EcoRV, NruI, ScaI, and SspI. All subsequent steps were performed according to the protocol of the PCR-Select cDNA Subtraction kit (Clontech). After DNA amplification, PCR products were cloned into pCR 2.1-TA vector (Invitrogen).
  • Hybridisation was performed as previously described by Eisen and Brown (Meth.Enzymol.303, 179-205 (1999)).
  • an equal mixture of 16 critical (vital) normal tissues rectum, bone marrow, lymph node, spleen, skeletal muscle, small intestine, thymus, trachea, brain, heart, kidney, liver, lung, pancreas, spleen, stomach, and colon
  • 500 ng of poly(A)+ RNA were subjected to oligo(dT)-primed reverse transcription following the instructions of the Superscript Reverse Transcription kit (Clontech).
  • the reaction was carried out in a final volume of 40 µl at 42 °C.
  • Fluorescent nucleotides Cy5-dUTP and Cy3-dUTP (Amersham Pharmacia, Piscataway, NJ) were used at 0.1 mM.
  • the nucleotide concentrations were 0.5 mM dGTP, dATP, dCTP and 0.2 mM dTTP.
  • 2 µl of Superscript II (200 units/µl; Invitrogen) were added at the beginning of the labeling reaction, and 1 additional µl was added after 1 h and the reaction continued for an additional hour.
  • Unlabeled RNA was digested using the RNase One kit (Promega) according to the manufacturer's specifications.
  • the Cy5 and Cy3 probes were pooled and, following addition of 15 µg of human Cot-1 DNA (Clontech), 3 µg of poly(dA)40-60, and 6 µg of tRNA (Sigma, St. Louis, MO), subjected to ethanol/ammonium acetate precipitation (15).
  • the pellet was resuspended in 10.5 µl of water. 6 µl of 20x saline-sodium phosphate-EDTA, 1.5 µl of 50x Denhardts solution, 10.5 µl of formamide, and 0.75 µl of 20% SDS were added, and the probe was preincubated for 1 h at 50 °C.
  • Chips were prehybridised with 30 µl of prehybridisation buffer [210 µl of formamide, 120 µl of 20x saline-sodium phosphate-EDTA, 60 µl of 50x Denhardts solution, 6 µl of salmon sperm DNA (10 µg/µl), 190 µl of H2O, and 15 µl of 20% SDS] for 1 h at 50 °C.
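  • The final concentrations in the prehybridisation buffer follow from the stated volumes by simple dilution arithmetic (stock concentration times component volume divided by total volume). The short sketch below works this through; the variable and function names are illustrative only, and the resulting figures are derived here rather than stated in the description.

```python
# Component volumes in microlitres, as listed for the prehybridisation buffer.
volumes_ul = {
    "formamide": 210.0,
    "20x saline-sodium phosphate-EDTA": 120.0,
    "50x Denhardts solution": 60.0,
    "salmon sperm DNA (10 ug/ul)": 6.0,
    "water": 190.0,
    "20% SDS": 15.0,
}
total_ul = sum(volumes_ul.values())


def final_concentration(stock, volume_ul):
    """Dilution arithmetic: stock concentration * volume / total volume."""
    return stock * volume_ul / total_ul


formamide_pct = final_concentration(100.0, volumes_ul["formamide"])
sspe_x = final_concentration(20.0, volumes_ul["20x saline-sodium phosphate-EDTA"])
denhardts_x = final_concentration(50.0, volumes_ul["50x Denhardts solution"])
dna_ug_per_ul = final_concentration(10.0, volumes_ul["salmon sperm DNA (10 ug/ul)"])
sds_pct = final_concentration(20.0, volumes_ul["20% SDS"])

print(f"total volume: {total_ul:.0f} ul")
print(f"formamide: {formamide_pct:.1f}% (v/v)")
print(f"saline-sodium phosphate-EDTA: {sspe_x:.2f}x")
print(f"Denhardts solution: {denhardts_x:.2f}x")
print(f"salmon sperm DNA: {dna_ug_per_ul:.3f} ug/ul")
print(f"SDS: {sds_pct:.2f}%")
```

The recipe works out to roughly 35% formamide, 4x saline-sodium phosphate-EDTA, 5x Denhardts solution, 0.1 µg/µl salmon sperm DNA and 0.5% SDS.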
  • the probe was added to the array, covered with a coverslip, and placed in a sealed chamber to prevent evaporation. After hybridisation at 50 °C overnight, chips were washed in three solutions
  • Microarrays were scanned with a GenePix 4000A scanner (Axon Instruments, Inc., Union City, CA) at 10-µm resolution. The signal was converted into 16 bits/pixel resolution, yielding a 65,536 count dynamic range. Image analysis and calculation of feature pixel intensities adjusted for local channel specific background was performed using the GenePix Pro 3.0 software (Axon Instruments, Inc.). With this software, gridding, automated spot detection, manual and automated flagging were performed, as well as background subtraction and normalisation. Background-subtracted element signals were used to calculate Cy3/Cy5 ratios. Spots were excluded from additional analysis if the ratio of foreground versus background signal was less than 2. Each microarray was normalised by scaling according to the GenePix normalisation factor such that the median of ratios value is 1. For additional evaluation and statistical analysis, output files were exported to a relational Microsoft Access database.
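  • The filtering and normalisation steps described above (foreground/background cut-off of 2, background subtraction, Cy3/Cy5 ratio, scaling so that the median ratio is 1) can be sketched as follows. This is not the GenePix Pro implementation: the data layout and function name are hypothetical, and applying the cut-off to both channels is an assumption, since the text does not say which channel the criterion refers to.

```python
import statistics


def normalised_ratios(spots, min_fg_bg=2.0):
    """Return {spot_id: normalised Cy3/Cy5 ratio} for spots passing the filter.

    `spots` maps a spot id to foreground (fg) and local background (bg)
    intensities for each channel.
    """
    raw = {}
    for spot_id, s in spots.items():
        # Exclude spots with foreground/background below the cut-off.
        # Assumption: the criterion must hold in both channels.
        if s["cy3_fg"] < min_fg_bg * s["cy3_bg"]:
            continue
        if s["cy5_fg"] < min_fg_bg * s["cy5_bg"]:
            continue
        cy3 = s["cy3_fg"] - s["cy3_bg"]   # background-subtracted signals
        cy5 = s["cy5_fg"] - s["cy5_bg"]
        if cy3 <= 0 or cy5 <= 0:
            continue
        raw[spot_id] = cy3 / cy5
    if not raw:
        return {}
    # Scale all ratios so that their median becomes 1.
    factor = statistics.median(raw.values())
    return {spot_id: r / factor for spot_id, r in raw.items()}


# Invented intensities for four spots (16-bit range: 0..65535).
spots = {
    "s1": {"cy3_fg": 1000, "cy3_bg": 100, "cy5_fg": 600, "cy5_bg": 100},
    "s2": {"cy3_fg": 500, "cy3_bg": 100, "cy5_fg": 500, "cy5_bg": 100},
    "s3": {"cy3_fg": 2100, "cy3_bg": 100, "cy5_fg": 1100, "cy5_bg": 100},
    "s4": {"cy3_fg": 150, "cy3_bg": 100, "cy5_fg": 900, "cy5_bg": 100},
}
ratios = normalised_ratios(spots)
print(ratios)
```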
  • Quantitative real-time PCR was performed in the presence of SYBR Green using the LightCycler-DNA Master SYBR Green I kit from Roche (Mannheim, Germany). Comparison with housekeeping genes allows relative quantification of monitored genes in different cDNA samples. Briefly, 100 ng of mRNA were converted to cDNA in a total volume of 50 µl using the Superscript Reverse Transcription kit (Clontech). 1 µl of this mixture was used as template for PCR amplification. Thirty-five PCR cycles were performed as follows: 30 s denaturation at 94 °C; 30 s annealing at 65 °C; and 45 s for elongation at 72 °C.
  • RT-PCR reactions were performed on an ABI PRISM 7700 Detector (Perkin-Elmer/Applied Biosystems, Foster City, CA). All plates contained 60 different cDNAs, a dilution series of a plasmid for the gene of interest, and nontemplate controls. Gene-specific primers were used to amplify fragments of about 130 bp. All plates were run in duplicate, and average copy numbers were calculated from the duplicates. Copy numbers were normalised to beta-actin using the following primers for amplification: actin-up, 5'-TGTTTTCTGCGCAAGTTAGG-3'; and actin-do, 5'-GTCCACCTTCCAGCAGATGT-3'.
  • RNA preparation: Before RNA preparation, tumor tissues were histologically assessed to ensure homogeneity and integrity of the tumor tissue and to confirm the histological typing of the tumor samples. Cryosections were taken and stained with H&E. The fractions of tumor cells (in most samples more than 50%), residual normal tissue cells, hyperplastic cells, and infiltrating leukocytes were determined (Table 2). Tumor samples with massive leukocyte infiltration and large necrotic areas were excluded from additional analysis (altogether 50% of tumor samples analysed). For quantification and quality control, all RNA preparations were subjected to capillary electrophoresis (Bioanalyzer 2100; Agilent Technologies) and PCR analyses for potential chromosomal DNA contamination using intron-specific primers for GAPDH.
  • RNA preparations: Thirty percent of all RNA preparations were eliminated because of degradation (28S/18S rRNA ratio less than 1.3 plus profile analysis of capillary electrophoresis) or contamination with rRNA (more than 10%) and/or chromosomal DNA (positive signal of GAPDH-intron-specific PCR). Additionally, each of the poly(A)+ preparations from individual tumor samples was analyzed for RNA degradation by performing a linear amplification of a Cla-oligo(dT)-primed cDNA aliquot (see "Material and Methods"; Table 2). RNA samples meeting those stringent quality criteria are listed in Table 2 and were subsequently used for microarray experiments. Sixteen of the 20 breast cancer samples used represent pairs of primary tumors and corresponding lymph node metastases (Table 2). Generation of Tumor-cDNA-Enriched Libraries.
  • 9253 clones were derived by SSH from seven different sources and 105 clones by RT-PCR cloning of individual genes, which had been shown previously to be at least 6-fold up-regulated in colon tumors; 1682 sequence-verified and tumor-relevant genes were obtained from the I.M.A.G.E. consortium (Table 1). cDNA microarrays were produced from these 11,040 clones by robotic arraying onto poly-L-lysine-coated glass slides.
  • tumor samples [11 lung adenocarcinomas (LACs), 11 LSCCs, 20 breast carcinomas, and 8 RCCs], 16 critical (vital) normal tissues (rectum, bone marrow, lymph node, skeletal muscle, small intestine, thymus, trachea, brain, heart, kidney, liver, lung, pancreas, spleen, stomach, and colon) and 6 noncritical normal tissues (uterus, breast, prostate, fetal brain, fetal lung, and placenta) were analyzed. As a reference probe in the two-color hybridisations performed, an equal mixture of the 16 critical normal tissues was used.
  • Nmax was determined for each gene, i.e., the highest expression value in any of the 16 critical normal tissues. All normalised expression values of the 11,040 genes in 22 normal tissues and 50 tumor samples are listed in Supplemental Data, Table 1. Subsequently, all genes were selected that showed at least 2-fold higher expression in at least 20% of samples of any tumor type compared to Nmax. Applying these restrictive criteria resulted in a list of 527 clones representing 130 different genes, 116 of which coded for proteins of known functions (summarised in Table 3). Gene-wise hierarchical clustering of these 527 clones clearly separated the different tumor types.
  • Fig. 4 shows a side-by-side comparison of expression levels measured by microarray analysis and real-time PCR for EGL nine homologue 3 (EGLN3), a gene that was found to be highly up-regulated in all RCCs.
  • EGLN3 EGL nine homologue 3
  • EGLN3 correlate well with absence or low expression levels in real-time PCR (Fig. 4B).
  • Overexpression of EGLN3 in RCCs was also confirmed by immunohistochemistry.
  • Four additional genes, NDRG1, OSF-2, TP73L, and NAT1, were verified in the same way, all exhibiting high agreement between microarray and real-time PCR results.
  • Linear regression analysis revealed R² values of 0.64 (EGLN3), 0.69 (NDRG1), 0.61 (OSF-2), 0.87 (TP73L), and 0.67 (NAT1).
  • the expression profile of these 42 genes was subjected to two distinct unsupervised cluster methods, hierarchical clustering (Table 4), and principal component analysis. Both methods were able to correctly separate the samples into two distinct groups. Furthermore, the gene dendrogram clearly separates two groups of genes: those overexpressed in patients with short survival time and those down-regulated in patients with short survival.
  • the former group includes cyclin B1, TGF-beta3, the transcription factors Erg2 and B-Myb, and the cell adhesion molecules VCAM-1 and CD44, whereas genes down-regulated in patients with short survival include MIG-6, Eps15, and CAK.
  • This clone collection is the largest one derived by subtraction cloning published to date and one of the largest human tumor-specific cDNA libraries available.
  • 1682 individually selected I.M.A.G.E. clones with a known or suspected role in tumor formation were further added, as well as 105 individually cloned genes previously found to be at least 6-fold up-regulated in colon cancer versus normal colon.
  • a protocol for the generation of cDNA fragments with increased size was developed and applied. When following the original protocol relying on a restriction enzyme recognising only a 4-base motif such as RsaI, a high percentage of fragments smaller than 50 bp was observed.
  • a set of 6-base recognising restriction enzymes was used, 3 with A/T-rich (primarily found at the 3'-end of eukaryotic cDNAs) and 3 with G/C-rich recognition sequences (characteristic for 5'-termini of eukaryotic genes).
  • the present approach resulted in a considerable shift toward longer cDNA fragments; sequence analysis of several thousands of clones revealed an increase in average length to about 800 bp.
  • Such longer cDNA fragments are more favorable for cDNA microarrays: they warrant efficient hybridisation; minimise cross-hybridisation (a problem often observed with short probes); and facilitate annotation after sequencing.
  • the present cDNA fragments typically are of sufficient length to be directly used in follow-up studies, e.g., involving translation of portions of the respective gene, circumventing the tedious recloning of longer cDNA fragments of the genes of interest.
  • RNA isolation was taken to ensure integrity of tumor tissue, to confirm histological typing of tumor samples, and to assess the percentage of tumor cells, residual normal cells, hyperplastic cells, necrotic areas, and infiltrating leukocytes (Table 2). On the basis of these criteria, only 50% of tumor samples were used for RNA isolation. Next, RNA preparations were subjected to capillary electrophoresis (Bioanalyzer) for quality control and quantification, being more accurate than standard photometric assays, particularly for low amounts of mRNA.
  • Bioanalyzer capillary electrophoresis
  • RNA preparations were excluded from further analysis due to inadequate RNA quality or due to the presence of chromosomal DNA contamination, as determined by PCR analysis.
  • This stringent selection procedure may be considered as essential because it directly influences hybridisation efficiency, reproducibility, and statistical analyses of the results.
  • 5 randomly chosen genes (N-myc downstream regulated, OSF-2, TP73L, EGLN3, and NAT1; Table 3) were subjected to real-time RT-PCR.
  • the expression levels quantified by real-time RT-PCR highly correlated with those determined by the present cDNA chip analysis, confirming the reliability of the present microarray experiments (Fig. 4).
  • beta-actin was used as a reference gene because it was the gene with the least fluctuation in the present samples according to the chip data.
  • GAPDH frequently used for normalisation
  • expression signatures derived from infiltrating cells of the immune system such as immunoglobulin genes (B cells), T-cell receptor, CD3D (T cells), or lysozyme and chitinase 1 (macrophages/monocytes) could be eliminated from the present list of tumor-specific genes (Table 3). All those genes have been included as up-regulated in tumors in previous transcriptional profiling studies, e.g., for breast cancer.
  • this tissue-wide expression profiling approach allows the identification of those genes up-regulated in specific tumors with no or low expression in all 16 critical normal tissues tested, which is important for the development of a chemotherapy with less severe or no side effects and which is an absolute prerequisite for immune therapy.
  • the latter approach aims at the induction of a systemic immune response against a tumor-specific/associated antigen (TAA) and, if the TAA is not exclusively expressed in tumors but also in critical normal tissues, may also lead to a destructive autoimmune response generating severe side effects.
  • TAA tumor-specific/associated antigen
  • prostaglandin D synthetase was found to be slightly up-regulated in breast cancer but most prominently expressed in several vital tissues such as heart (8.6-fold higher than the reference pool) and brain (6.2-fold higher).
  • the nonreceptor tyrosine kinase Etk/Bmx, which was found to be up-regulated about 3-fold in some of the LAC and LSCC samples, has been reported to play an important role for prostate cancer progression and has been suggested as a novel target for chemotherapy in prostate cancer.
  • the present tissue-wide expression profiling revealed a strong expression in heart, kidney, and skeletal muscle (6.1-, 5-, and 3-fold higher than in the reference pool, respectively), indicating that severe side effects would have to be expected upon use of Etk/Bmx as a novel therapeutic intervention site.
  • Another example is apolipoprotein D, which has been correlated with malignant transformation and poor prognosis in prostate cancer patients.
  • the present study revealed overexpression of apolipoprotein D only in breast cancer samples derived from patients with more than 9 years of overall survival. Again, strong expression was found in several critical normal tissues such as brain, heart, and trachea (6.8-, 4.9-, and 3.5-fold higher than the reference probe).
  • Genes overexpressed in about 100% of a specific cancer type such as vascular endothelial growth factor or insulin-like growth factor binding protein 3 were identified in RCC.
  • Other genes were found to be up-regulated only in a subset of a given tumor type such as stromelysin 3 or thrombospondin 2 in breast cancer. All those selected genes exhibit at least a 2-fold up-regulation in at least 20% of samples of any tumor type compared with the highest expression value in any of the 16 critical normal tissues, which makes them promising putative targets in an anticancer therapy (Table 3).
  • tumor markers, e.g., pronapsin A, a gene specifically up-regulated in LAC but absent in LSCC, or NAT-1, which is involved in detoxification and used as a potential breast cancer marker.
  • tissue-specific genes such as keratin 6 isoforms, other cytokeratins that have been documented as potential markers for lung cancer, pemphigus vulgaris antigen, and annexin
  • tumor-specific genes such as parathyroid hormone-related peptide (PTHrP/PTHLH), which causes humoral hypercalcaemia associated with malignant tumors such as leukemia, RCC, prostate, and breast cancer, and LSCC.
  • PTHrP/PTHLH parathyroid hormone-related peptide
  • S100A10 and S100A11 a subgroup of the EF-hand Ca2+-binding protein family (Table 3).
  • S100A2 was found to be highly up-regulated in ovarian cancer together with other members of the S100 protein family, whereas an increase of S100A6 expression correlates with an increased malignancy in colon tumors.
  • Bone metastases are indeed found in virtually all advanced breast cancer patients.
  • the high osteotropism of breast cancer cells suggests that they exhibit a selective affinity for mineralised tissues.
  • Mammary malignant cells are able to induce hydroxyapatite crystal deposition within the primary tumor, supporting the hypothesis that they can generate a microenvironment that favors the crystallisation of calcium and phosphate ions into the bone-specific hydroxyapatite.
  • the ectopic expression of bone matrix proteins in breast cancer could be involved in conferring osteotropic properties to circulating metastatic breast cancer cells and opens the possibility for therapeutic interference with microcalcification during the homing process of metastatic breast cancer cells.
  • osteoclast differentiation/activation factor osteoprotegerin ligand has been shown to be essential for normal mammary gland development and to be responsible for calcium release from the skeleton required for transmission of maternal calcium to neonates in mammals. Therefore, normal cells of the mammary gland may already exhibit some properties of bone remodeling cells, a function that might be recruited/activated in breast tumor cells as well.
  • MIG-6: Genes down-regulated relative to normal tissues in patients with short survival are, for example, MIG-6, Eps15, and APLP2.
  • MIG-6 and Eps15 are negative regulators of signaling via the epidermal growth factor receptor, a positive key regulator of breast tumorigenesis.
  • van't Veer et al. (Nature 415, 530-536 (2002)) reported on a set of 70 genes with an expression pattern by which breast cancer patients could be classified into those with a poor prognosis and those with a good prognosis with high accuracy.
  • a modified PCR-based cDNA subtraction method allowed the establishment of seven SSH cDNA libraries that subsequently were used for the preparation of cDNA microarrays. Together with 50 samples derived from lung, breast, or renal cell cancer tissues, a panel of 22 samples from normal tissues was hybridised. This detailed tissue-wide expression profiling led to the identification of 130 individual tumor-specific transcripts (527 clones) showing no or very low expression in 16 vital normal tissues. Gene-wise hierarchical clustering of these 130 genes clearly separated the different tumor types. The majority of the identified genes have not yet been brought into context with tumorigenesis, such as genes involved in bone matrix mineralisation or genes controlling calcium homeostasis (RCN1, CALCA, and S100 protein family).
  • genes up-regulated in tumors of patients with a poor prognosis such as cyclin B1, TGF-beta3, B-Myb, and Erg2, and genes down-regulated such as MIG-6, Eps15, and CAK.
  • One of the marker genes discriminating between a good or a poor prognosis identified by us and one identified by van't Veer et al. (Nature 415, 530-536, 2002) were evaluated in an independent group of 39 breast cancer patients different from the ones in Figure 6 and by an independent method (quantitative real-time RT-PCR).
  • the marker genes thus evaluated were cyclin B1, discovered as a marker gene discriminating between a good or a poor prognosis by the present invention (cyclin B1 is one of the 42 marker genes shown in Table 4), and cyclin B2, which was discovered as one of the 240 best marker genes discriminating between a good or a poor prognosis by van't Veer et al.
  • cyclin B1 and cyclin B2 were determined in each of the 39 breast tumors by quantitative real-time RT-PCR. Patients were then divided into two groups each in three different ways: (1) The 77% of these 39 patients with the lowest cyclin B2 levels and the 23% of these patients with the highest cyclin B2 levels; (2) The 59% of these 39 patients with the lowest cyclin B1 levels and the 41% of these patients with the highest cyclin B1 levels; (3) The 79% of these 39 patients with the lowest cyclin B1 levels and the 21% of these patients with the highest cyclin B1 levels.
  • cyclin B1, as one of the 42 marker genes shown in Table 4, remains a very good prognostic marker discriminating between breast cancer patients with a long or a short overall survival even when used (1) alone as a single marker; (2) in a group of patients independent of the one shown in Table 4; and (3) if measured with a different method (quantitative real-time RT-PCR).
  • cyclin B1 is a better prognostic marker than cyclin B2, which is one of the 240 best marker genes discriminating between a good or a poor prognosis reported by van't Veer et al. (Nature 415, 530-536, 2002).
  • Table 4 shows that gene expression correlates with long or short overall survival of breast cancer patients.
  • the Table lists 42 genes with an expression profile most highly associated with the two survival groups (more than 9 versus less than 3 years), as selected by Student's t test (significance, P less than 0.02), accompanied by P-chance analysis, as described in "Materials and Methods," to eliminate false positives. Only genes with P less than P-chance were selected.
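
The selection by Student's t test mentioned in the last item can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the procedure actually used: the function names, the toy expression values, and the numerical integration of the t density are all hypothetical choices made here, and the P-chance filter is left out because it is defined in "Materials and Methods", which is not reproduced in this section.

```python
import math
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0.0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / std_err, df


def two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 - 2 * integral of the t density from 0 to |t|.

    The integral is approximated with Simpson's rule (steps must be even).
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def select_markers(long_survival, short_survival, alpha=0.02):
    """Return the genes whose expression differs between groups at P < alpha."""
    selected = []
    for gene in long_survival:
        t, df = t_statistic(long_survival[gene], short_survival[gene])
        if two_sided_p(t, df) < alpha:
            selected.append(gene)
    return selected


# Toy data (hypothetical): one expression ratio per patient and survival group.
long_group = {"gene_a": [1.0, 1.2, 0.9, 1.1, 1.0], "gene_b": [1.0, 1.2, 0.9, 1.1, 1.0]}
short_group = {"gene_a": [2.0, 2.3, 1.9, 2.2, 2.1], "gene_b": [1.1, 0.9, 1.0, 1.2, 1.0]}
print(select_markers(long_group, short_group))
```

In this toy input only `gene_a` differs between the two groups; `gene_b` has identical group means and is therefore not selected. The threshold `alpha=0.02` mirrors the significance level stated in the text.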

Abstract

Described is a method for classifying a cell sample as being a tumor cell comprising detecting a difference in the expression by said cell sample of at least one gene of Table 4 relative to at least one control cell and classifying the cell sample as a tumor cell, if the at least one gene of Table 4 shows at least 1.5-fold higher expression than the control cell.

Description

A Method for Classifying a Tumor Cell Sample
The present invention relates to the field of tumor diagnosis and prognosis.
Lung, breast, and colorectal cancer are the most common cancers in the industrial world. In 2000, more than 3 million people were diagnosed with one of these tumors. Although improvements in diagnostics and therapy have reduced cancer mortality, still 65% of all cancer patients die. Prognosis for people suffering from lung cancer is particularly poor, with a mortality rate close to 90%. Two main reasons are responsible for the fact that cancer is expected to become the leading cause of death within a few years: first, cancer is a disease of multiple accumulating mutations that are becoming manifest in human populations with an increasingly prolonged life span, and second, neoplastic diseases still have many unmet needs, including lack of understanding of the mechanisms underlying cancer-related deaths and the difficulty in identification of the corresponding risk factors and development of specific targeted molecular therapies. The reason thus lies within the enormous complexity of tumor formation and tumor progression on the molecular level. Many efforts to detect differential gene expression between tumor and normal tissues have discovered numerous differentially expressed genes. However, the mechanistic contribution to tumorigenesis of many of these genes is still unknown.
Because of the genetic instability characteristic of almost all cancer types, patients suffering from superficially identical tumors show an enormous variability in their gene expression profiles (patient-specific transcription profile). To aim at a more efficient and individual therapy, it will be necessary to distinguish between patient-specific transcription profiles and the altered expression pattern underlying all tumors of the same type. Thus far, the majority of studies applying cDNA microarray technology for the detection of differences in gene expression in cancers have used randomly selected expressed sequence tags and probes derived from a specific tumor and its corresponding normal tissue. However, these approaches fall short in two essential respects: first, a random selection of cDNAs never covers the entire subset of genes that are specifically overexpressed in cancer, e.g., genes such as differentiation antigens are very unlikely to appear in a random selection of expressed sequence tags. Second, because of missing information about expression profiles of genes in critical (vital) normal tissues, predictions whether a certain gene may serve as a potential clinical target in chemo- and/or immunotherapy are not feasible. Because of the lack of effective therapies for many major tumor types, the medical need for improved and new approaches of cancer treatment is obvious. Searching for novel targets for tumor therapy is therefore a major goal in this field. In recent publications, the use of cDNA microarrays has been shown to be a powerful tool to detect gene expression differences in cancers (e.g. Bangur et al., Oncogene 21, 3814-3825 (2002)). The cDNA array technique has also been successfully combined with subtractive hybridisation to detect tumor-specific transcripts in, for example, lung squamous cell cancer (LSCC; Diatchenko et al., PNAS 93, 6025-6030 (1996)).
This approach takes advantage of a preselected set of clones that might be more representative than random expressed sequence tags.
The incidence of breast cancer, a leading cause of death in women, has been gradually increasing in the United States over the last thirty years. Its cumulative risk is relatively high; 1 in 8 women are expected to develop some type of breast cancer by age 85 in the United States. In fact, breast cancer is the most common cancer in women and the second most common cause of cancer death in the United States. In 1997, it was estimated that 181,000 new cases were reported in the U.S., and that 44,000 people would die of breast cancer. While the mechanism of tumorigenesis for most breast carcinomas is largely unknown, there are genetic factors that can predispose some women to developing breast cancer. The discovery and characterisation of BRCA1 and BRCA2 has recently expanded the knowledge of genetic factors which can contribute to familial breast cancer. Germ-line mutations within these two loci are associated with a 50 to 85% lifetime risk of breast and/or ovarian cancer. Only about 5% to 10% of breast cancers are associated with breast cancer susceptibility genes, BRCA1 and BRCA2. The cumulative lifetime risk of breast cancer for women who carry the mutant BRCA1 is predicted to be approximately 92%, while the cumulative lifetime risk for the non-carrier majority is estimated to be approximately 10%. BRCA1 is a tumor suppressor gene that is involved in DNA repair and cell cycle control, which are both important for the maintenance of genomic stability. More than 90% of all mutations reported so far result in a premature truncation of the protein product with abnormal or abolished function. The histology of breast cancer in BRCA1 mutation carriers differs from that in sporadic cases, but mutation analysis is the only way to find the carrier. Like BRCA1, BRCA2 is involved in the development of breast cancer, and like BRCA1 plays a role in DNA repair. However, unlike BRCA1, it is not involved in ovarian cancer.
Other genes have been linked to breast cancer, for example c-erb-2 (HER2) and p53. Overexpression of c-erb-2 (HER2) and p53 has been correlated with poor prognosis, as have aberrant expression products of mdm2 and cyclin 1 and p27 (WO 98/33450 A). However, no other clinically useful markers consistently associated with breast cancer have been identified.
Sporadic tumors, those not currently associated with a known germline mutation, constitute the majority of breast cancers. It is also likely that other, non-genetic factors have a significant effect on the etiology of the disease. Regardless of the cancer's origin, breast cancer morbidity and mortality increase significantly if it is not detected early in its progression. Thus, considerable effort has focused on the early detection of cellular transformation and tumor formation in breast tissue.
A marker-based approach to tumor identification and characterisation promises improved diagnostic and prognostic reliability. Typically, the diagnosis of breast cancer requires histopathological proof of the presence of the tumor. In addition to diagnosis, histopathological examinations also provide information about prognosis and selection of treatment regimens. Prognosis may also be established based upon clinical parameters such as tumor size, tumor grade, the age of the patient, and lymph node metastasis. Diagnosis and/or prognosis may be determined to varying degrees of effectiveness by direct examination of the outside of the breast, or through mammography or other X-ray imaging methods. The latter approach is not without considerable cost, however. Every time a mammogram is taken, the patient incurs a small risk of having a breast tumor induced by the ionising properties of the radiation used during the test. In addition, the process is expensive and the subjective interpretations of a technician can lead to imprecision. For example, one study showed major clinical disagreements for about one-third of a set of mammograms that were interpreted individually by a surveyed group of radiologists. Moreover, many women find that undergoing a mammogram is a painful experience. Accordingly, the National Cancer Institute has not recommended mammograms for women under fifty years of age, since this group is not as likely to develop breast cancers as are older women. It is compelling to note, however, that while only about 22% of breast cancers occur in women under fifty, data suggest that breast cancer is more aggressive in pre-menopausal women.
In clinical practice, accurate diagnosis of various subtypes of breast cancer is important because treatment options, prognosis, and the likelihood of therapeutic response all vary broadly depending on the diagnosis. Accurate prognosis, or determination of distant metastasis-free survival could allow the oncologist to tailor the administration of adjuvant chemotherapy, with women having poorer prognoses being given the most aggressive treatment. Furthermore, accurate prediction of poor prognosis would greatly impact clinical trials for new breast cancer therapies, because potential study patients could then be stratified according to prognosis. Trials could then be limited to patients having poor prognosis, in turn making it easier to discern if an experimental therapy is efficacious.
To date, no set of satisfactory predictors for prognosis based on the clinical information alone has been identified. The detection of BRCA1 or BRCA2 mutations represents a step towards the design of therapies to better control and prevent the appearance of these tumors. However, there is no equivalent means for the diagnosis of patients with sporadic tumors, the most common type of breast cancer tumor, nor is there a means of differentiating subtypes of breast cancer.
It is therefore an object of the present invention to provide efficient tools and markers for tumor diagnosis. Another object is providing markers for good and poor prognosis in order to adapt an individual therapy scheme to a given patient.
Therefore, the invention provides a method for classifying a cell sample as being a tumor cell comprising detecting a difference in the expression by said cell sample of at least one gene, preferably at least two genes, of Table 4 relative to at least one control cell and classifying the cell sample as a tumor cell, if the at least one gene, preferably the at least two genes, of Table 4 show at least 1.5-fold higher expression than the control cell. Selecting at least one, preferably at least two, of the genes of Table 4 (for a given tumor type) allows a fast and reliable classification of a given cell sample. Specifically preferred are those marker genes according to Table 4 which show, in Table 3, a "k/n" value of 10/20 or more for BC, of 3/11 or more for LAC, of 6/11 or more for LSCC and of 4/8 or more for RCC.
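
The fold-change rule of this classification step can be sketched in Python as follows. This is a hedged illustration only: the function names, the dictionary-based data layout, and the expression values are assumptions introduced here, and the accession numbers merely serve as labels for markers of Table 4.

```python
def upregulated_markers(sample, control, markers, min_fold=1.5):
    """Return the marker genes whose sample/control expression ratio is >= min_fold."""
    hits = []
    for gene in markers:
        sample_level = sample.get(gene)
        control_level = control.get(gene)
        # Skip genes that were not measured or have no usable control signal.
        if sample_level is None or not control_level or control_level <= 0:
            continue
        if sample_level / control_level >= min_fold:
            hits.append(gene)
    return hits


def classify_as_tumor(sample, control, markers, min_fold=1.5, min_genes=1):
    """Classify as tumor when at least min_genes markers pass the fold threshold."""
    return len(upregulated_markers(sample, control, markers, min_fold)) >= min_genes


# Hypothetical expression levels for three marker labels.
markers = ["NM031966", "NM000610", "X13293"]
sample = {"NM031966": 3.0, "NM000610": 4.4, "X13293": 1.0}
control = {"NM031966": 2.0, "NM000610": 2.0, "X13293": 1.0}

print(upregulated_markers(sample, control, markers))
print(classify_as_tumor(sample, control, markers, min_fold=1.5, min_genes=2))
```

The parameters `min_fold` and `min_genes` correspond to the "at least 1.5-fold" and "at least one, preferably at least two genes" thresholds of the text; raising them gives the stricter preferred embodiments.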
Preferably, a difference in the expression of at least 3, preferably at least 5, especially at least 10 genes of Table 4 is detected.
According to a preferred embodiment, the cell sample is classified as a tumor cell if the at least 2 genes of Table 4 show at least 2-fold, preferably 3-fold, especially 5-fold higher expression than the control cell.
The tumors to be classified according to the present invention are preferably selected from the group consisting of breast cancer (BC), lung squamous cell cancer (LSCC), lung adenocarcinoma (LAC) and renal cell cancer (RCC).
Further tumor expression markers may be additionally applied and tested in combination with the markers according to the present invention, e.g. those described in WO 02/103320 A2.
Even more preferred, at least 10, 12, 15, 18, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80 or 90 marker genes of Table 4 or from other sources, such as from Table 3 or WO 02/103320 A2, may be tested in the present method.
The present invention provides novel putative intervention sites for anticancer therapy using a combination of subtractive hybridisation and cDNA microarray technology. Importantly, a set of 22 samples from various normal tissues was included to allow discrimination between tumor-specific genes and those that are also expressed in vital normal tissues. This approach allowed the identification of genes expressed exclusively in tumors but not in a comprehensive panel of vital normal tissues, which is a prerequisite in the design of a more specific anticancer immuno- or chemotherapy with fewer side effects. In the present experimental approach, a focus was laid on the transcriptional profiling of lung (squamous and adeno), breast, and renal cell cancer (RCC). Seven cDNA libraries were generated by subtracting cDNA fragments derived from normal tissues or primary cell lines from corresponding tumor tissues or tumor cell lines. Subsequently, the derived tumor-enriched clone collection (about 9250 clones in total), together with about 1750 additional tumor-relevant genes, was used for the production of cDNA arrays. In extensive hybridisation experiments, 50 different tumor samples were analysed and compared with the expression profile of 22 different normal tissues. This approach allowed the selection of genes that show a significantly higher expression level in tumor tissues than in any of the analysed normal tissues. The present invention provides for the first time a tissue-wide expression profile used to increase the significance of expression data of tumor samples. Furthermore, a subset of differentially regulated genes was identified that can be correlated with poor prognosis in breast cancer patients.
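
The selection of tumor-specific genes against the normal-tissue panel can be sketched as below. The criterion used is the one stated elsewhere in this document (at least 2-fold higher expression in at least 20% of the samples of any tumor type compared with Nmax, the highest value in any of the 16 critical normal tissues). The function name, data layout, and expression values are hypothetical and serve only as an illustration.

```python
def is_tumor_specific(normal_values, tumor_values_by_type,
                      min_fold=2.0, min_fraction=0.2):
    """Check one gene against the Nmax-based selection criterion.

    normal_values: expression values in the critical normal tissues.
    tumor_values_by_type: mapping tumor type -> expression values per sample.
    """
    n_max = max(normal_values)          # highest value in any critical normal tissue
    threshold = min_fold * n_max
    for values in tumor_values_by_type.values():
        if not values:
            continue
        n_up = sum(1 for v in values if v >= threshold)
        if n_up / len(values) >= min_fraction:
            return True                 # criterion met for at least one tumor type
    return False


# Hypothetical normalised expression ratios for a single gene.
normal = [0.5, 1.0, 0.8]
tumors = {
    "RCC": [2.5, 0.9, 1.0, 2.1, 0.7, 0.6, 1.1, 0.9],                   # 2 of 8 above 2 x Nmax
    "LAC": [2.2, 2.4, 0.9, 1.0, 0.8, 0.7, 1.1, 0.9, 1.0, 0.6, 0.5],    # 2 of 11
}
print(is_tumor_specific(normal, tumors))
```

Because the test is applied per tumor type, a gene that is up-regulated in only one tumor type (here the hypothetical RCC samples) is still retained, which matches the "of any tumor type" wording.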
According to another aspect, the present invention provides a method for classifying an individual as having a good prognosis (survival more than 9 years after initial diagnosis) or a poor prognosis (survival less than 3 years after initial diagnosis), comprising detecting a difference in the expression of at least one gene of Table 4 in a cell sample taken from the individual relative to a control.
Table 4 provides clear markers for short or long survival expectation which are either up- or down-regulated in the two different survival groups.
For example, the genes with Acc. No. X77303, U07707, NM018948, NM003379, M17254, X14149, NM078467, X13293, NM000610 and NM031966 (another set of preferred genes comprises Acc. No. AF119662, NM001642, NM003605, NM018948, NM024640, NM018196, NM018948, NM033546, NM003379 and NM033546) are specifically up-regulated in the group surviving more than 9 years, whereas these genes and X77303, NM000701, U07707 and NM001993 are down-regulated in patients with low survival.
On the other hand, the other genes in Table 4 are (perhaps with the exception of Y00479, BC028152, AF065386, NM001946, X14149 and U77916) significantly up-regulated in the group surviving less than 3 years. In particular, NM021238, NM000297, D38594, AL034384, NM002235, AC009433, X13293, NM000610, NM031966, NM020698 and AA627385 are specifically up-regulated in this group of patients.
Preferably, at least two of the genes of Table 4 are examined, especially genes from the above mentioned specifically up-regulated genes for each patient group. Even more preferred, at least 5, 7, 10, 15, 20, 25, 30, 35 or 40 of these genes are examined in the method according to the present invention.
The invention further provides microarrays comprising the disclosed marker sets. In one embodiment, the invention provides a microarray comprising at least 2, especially at least 5 markers derived from any one of Table 4, especially wherein at least 50% of the probes on the microarray are present in any one of Table 4. In more specific embodiments, at least 60%, 70%, 80%, 90%, 95% or 98% of the probes on said microarray are present in any one of Table 4. In still another embodiment, the invention provides a microarray for distinguishing cell samples from patients having a good prognosis and cell samples from patients having a poor prognosis comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridisable to a plurality of genes, said plurality consisting of at least 2, preferably at least 5 of the genes corresponding to the markers listed in Table 4, wherein at least 50% of the probes on the microarray are present in Table 4. The invention further provides for microarrays comprising at least 5, 10, 15, 20, 30, 40, 50, 70 or 100 marker genes listed in Table 4, Table 3 WO 01/74405 A, US 2002/142981 A1 or WO 02/103320 A2 in addition to the at least one, preferably at least two, especially at least 5 genes from Table 4, especially at least 5, 10, 15, 20 or 30 of the prognostic marker genes listed in Table 4, in any combination, wherein at least 50%, 60%, 70%, 80%, 90%, 95% or 98% of the probes on said microarrays are present in Table 4.
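The probe-composition conditions above (a minimum number of Table 4 markers and a minimum fraction of probes drawn from Table 4) can be checked mechanically. The following is a minimal sketch only: the function names and the control probe identifiers are hypothetical, and the marker set used in the usage note is a small illustrative subset of accession numbers named in this description, not the full Table 4.

```python
def fraction_in_marker_set(array_probes, marker_set):
    """Fraction of probes on the array whose accession number is in the marker set."""
    if not array_probes:
        raise ValueError("array has no probes")
    hits = sum(1 for accession in array_probes if accession in marker_set)
    return hits / len(array_probes)


def meets_embodiment(array_probes, marker_set, min_markers=2, min_fraction=0.5):
    """True if the array carries at least `min_markers` distinct markers from the
    set and at least `min_fraction` of all its probes belong to the set."""
    distinct_markers = set(array_probes) & set(marker_set)
    return (len(distinct_markers) >= min_markers
            and fraction_in_marker_set(array_probes, marker_set) >= min_fraction)
```

For instance, with the illustrative marker subset {"NM021238", "NM000297", "D38594"} and an array carrying those three probes plus one hypothetical control probe "CTRL_1", three of four probes (75%) belong to the set, so the 50% condition is met.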
"Good prognosis" means that a patient is expected to have a survival of more than 9 years, counted from initial diagnosis of cancer.
"Poor prognosis" means that a patient is expected to have a survival of less than 3 years, counted from initial diagnosis of cancer.
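The two definitions above leave survival times between 3 and 9 years outside both classes. A minimal sketch of this labelling rule follows; the function name is hypothetical, and returning None for the intermediate interval is an assumption made for illustration.

```python
def prognosis_class(survival_years):
    """Label a survival time (years from initial diagnosis) using the definitions above.

    Returns "good" for more than 9 years, "poor" for less than 3 years, and
    None for the interval in between, which neither definition covers.
    """
    if survival_years > 9:
        return "good"
    if survival_years < 3:
        return "poor"
    return None
```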
"Marker" means an entire gene, or an EST derived from that gene, the expression or level of which changes between certain conditions. Where the expression of the gene correlates with a certain condition, the gene is a marker for that condition.
"Marker-derived polynucleotides" means the RNA transcribed from a marker gene, any cDNA or cRNA produced therefrom, and any nucleic acid derived therefrom, such as synthetic nucleic acid having a sequence derived from the gene corresponding to the marker gene. The present invention provides a set of genetic markers whose expression is correlated with the existence of BC, LSCC, LAC or RCC.
The invention also provides a method of using these markers to distinguish tumor types in diagnosis or prognosis.
In another embodiment, the invention provides a set of 42 genetic markers that can distinguish between patients with a good tumor prognosis and patients with a poor tumor prognosis. These markers are listed in Table 4. The invention also provides subsets of at least 5, 10, 20, 30 or 40 markers, drawn from the set of 42, which also distinguish between patients with good and poor prognosis.
Any of the sets of markers provided above may be used alone or in combination with markers outside the set. For example, markers according to the present invention may be combined with markers that distinguish ER status, BRCA1 markers or sporadic markers, or with the prognostic markers, or both. Any of the marker sets provided above may also be used in combination with other markers for the preferred tumors according to the present invention, or for any other clinical or physiological condition.
The present invention provides a screening method for finding sets of markers for the identification of conditions or indications associated with BC, LSCC, LAC and RCC cancer types.
In one embodiment, the method for identifying marker sets is as follows: After extraction and labeling of target polynucleotides, the expression of all markers (genes) in a sample X is compared to the expression of all markers in a standard or control. In one embodiment, the standard or control comprises target polynucleotide molecules derived from a sample from a normal individual (i. e., an individual not afflicted with breast cancer). In a preferred embodiment, the standard or control is a pool of target polynucleotide molecules. The pool may be derived from collected samples from a number of normal individuals. In a preferred embodiment, the pool comprises samples taken from a number of individuals having sporadic-type tumors. In another preferred embodiment, the pool comprises an artificially-generated population of nucleic acids designed to approximate the level of nucleic acid derived from each marker found in a pool of marker-derived nucleic acids derived from tumor samples. In yet another embodiment, the pool is derived from normal or breast cancer cell lines or cell line samples.
The comparison may be accomplished by any means known in the art. For example, expression levels of various markers may be assessed by separation of target polynucleotide molecules (e. g., RNA or cDNA) derived from the markers in agarose or polyacrylamide gels, followed by hybridisation with marker-specific oligonucleotide probes. Alternatively, the comparison may be accomplished by the labeling of target polynucleotide molecules followed by separation on a sequencing gel. Polynucleotide samples are placed on the gel such that patient and control or standard polynucleotides are in adjacent lanes. Comparison of expression levels is accomplished visually or by means of a densitometer. In a preferred embodiment, the expression of all markers is assessed simultaneously by hybridisation to a microarray. In each approach, markers meeting certain criteria are identified as associated with cancer. A marker is selected based upon significant difference of expression in a sample as compared to a standard or control condition. Selection may be made based upon either significant up- or down-regulation of the marker in the patient sample. Selection may also be made by calculation of the statistical significance (i. e., the p-value) of the correlation between the expression of the marker and the condition or indication. Preferably, both selection criteria are used. Thus, in one embodiment of the present invention, markers associated with cancer are selected where the markers show both more than two-fold change (increase or decrease) in expression as compared to a standard, and the p-value for the correlation between the existence of cancer and the change in marker expression is no more than 0.01 (i. e., is statistically significant). The expression of the identified cancer-related markers is then used to identify markers that can differentiate tumors into clinical types.
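The combined selection rule described above (more than two-fold change and a p-value of no more than 0.01) can be illustrated as follows. This is a sketch under stated assumptions, not the procedure used in the examples: the description does not name a statistical test, so an exact two-sided permutation test on the difference of group means is used here as a stand-in, and all function names and data layouts are hypothetical.

```python
from itertools import combinations
from statistics import mean


def fold_change(sample_values, control_values):
    """Ratio of mean expression in the samples to mean expression in the controls."""
    return mean(sample_values) / mean(control_values)


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for the difference in group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes (feasible only for small groups)."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        n_splits += 1
    return extreme / n_splits


def select_markers(expression, min_fold=2.0, max_p=0.01):
    """Keep genes with more than `min_fold` change (up or down) and p <= `max_p`.

    `expression` maps a gene name to a pair (tumor_values, control_values)."""
    selected = []
    for gene, (tumor, control) in expression.items():
        fc = fold_change(tumor, control)
        changed = fc > min_fold or fc < 1.0 / min_fold
        if changed and permutation_p_value(tumor, control) <= max_p:
            selected.append(gene)
    return selected
```

With five tumor and five control measurements per gene, the smallest attainable two-sided p-value of this stand-in test is 2/252 (about 0.008), so the 0.01 threshold can only be met when the two groups are completely separated.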
In a specific embodiment using a number of tumor samples, markers are identified by calculation of correlation coefficients between the clinical category or clinical parameter(s) and the linear, logarithmic or any transform of the expression ratio across all samples for each individual gene.
A specifically preferred screening method uses PCR-based cDNA subtraction and cDNA microarrays (see the Examples section).
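The correlation-based identification described above can be sketched as follows, assuming a binary clinical category coded as 0 or 1, a base-10 logarithmic transform of the expression ratio, and the Pearson coefficient as the correlation measure. These choices, and the function names, are assumptions for illustration; the description allows any transform and does not fix the coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def rank_genes_by_correlation(ratios, categories):
    """Rank genes by the absolute correlation between log10(expression ratio)
    and the clinical category across all samples.

    `ratios` maps a gene name to one expression ratio per sample, in the same
    sample order as `categories`."""
    scores = {
        gene: pearson([math.log10(r) for r in values], categories)
        for gene, values in ratios.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```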
In the present invention, target polynucleotide molecules are extracted from a sample taken from an individual afflicted with breast cancer. The sample may be collected in any clinically acceptable manner, but must be collected such that marker-derived polynucleotides (i. e., RNA) are preserved. mRNA or nucleic acids derived therefrom (i. e., cDNA or amplified DNA) are preferably labeled distinguishably from standard or control polynucleotide molecules, and both are simultaneously or independently hybridised to a microarray comprising some or all of the markers or marker sets or subsets described above.
Alternatively, mRNA or nucleic acids derived therefrom may be labeled with the same label as the standard or control polynucleotide molecules, wherein the intensity of hybridisation of each at a particular probe is compared. A sample may comprise any clinically relevant tissue sample, such as a tumor biopsy or fine needle aspirate, or a sample of bodily fluid, such as blood, plasma, serum, lymph, ascitic fluid, cystic fluid, urine or nipple exudate. The sample may be taken from a human, or, in a veterinary context, from non-human animals such as ruminants, horses, swine or sheep, or from domestic companion animals such as felines and canines.
Methods for preparing total and poly(A)+ RNA are well known and are described generally in Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989) and Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current Protocols Publishing, New York (1994).
RNA may be isolated from eukaryotic cells by procedures that involve lysis of the cells and denaturation of the proteins contained therein. Cells of interest include wild-type cells (i. e., non-cancerous), drug-exposed wild-type cells, tumor or tumor-derived cells, modified cells, normal or tumor cell line cells, and drug-exposed modified cells.
Additional steps may be employed to remove DNA. Cell lysis may be accomplished with a nonionic detergent, followed by microcentrifugation to remove the nuclei and hence the bulk of the cellular DNA. In one embodiment, RNA is extracted from cells of the various types of interest using guanidinium thiocyanate lysis followed by CsCl centrifugation to separate the RNA from DNA (Chirgwin et al., Biochemistry 18: 5294-5299 (1979)). Poly(A)+ RNA is selected with oligo-dT cellulose (see Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989)). Alternatively, separation of RNA from DNA can be accomplished by organic extraction, for example, with hot phenol or phenol/chloroform/isoamyl alcohol.
If desired, RNase inhibitors may be added to the lysis buffer. Likewise, for certain cell types, it may be desirable to add a protein denaturation/digestion step to the protocol.
For many applications, it is desirable to preferentially enrich mRNA with respect to other cellular RNAs, such as transfer RNA (tRNA) and ribosomal RNA (rRNA). Most mRNAs contain a poly(A) tail at their 3' end. This allows them to be enriched by affinity chromatography, for example, using oligo(dT) or poly(U) coupled to a solid support, such as cellulose or Sephadex (see Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current Protocols Publishing, New York (1994)). Once bound, poly(A)+ mRNA is eluted from the affinity column using 2 mM EDTA/0.1% SDS. The sample of RNA can comprise a plurality of different mRNA molecules, each different mRNA molecule having a different nucleotide sequence. In a specific embodiment, the mRNA molecules in the RNA sample comprise at least 100 different nucleotide sequences. More preferably, the mRNA molecules of the RNA sample comprise mRNA molecules corresponding to each of the marker genes. In another specific embodiment, the RNA sample is a mammalian RNA sample. In a specific embodiment, total RNA or mRNA from cells is used in the methods of the invention. The source of the RNA can be cells of a plant or animal, human, mammal, primate, non-human animal, dog, cat, mouse, rat, bird, yeast, eukaryote, prokaryote, etc. In specific embodiments, the method of the invention is used with a sample containing total mRNA or total RNA from 1 x 10^6 cells or less. In another embodiment, proteins can be isolated from the foregoing sources, by methods known in the art, for use in expression analysis at the protein level.
Probes to the homologs of the marker sequences disclosed herein can be employed, preferably where non-human nucleic acid is being assayed. The present invention provides sets of markers useful for distinguishing samples from patients with a good prognosis from samples from patients with a poor prognosis. Thus, the invention further provides a method for using these markers to determine whether an individual afflicted with cancer, especially BC, will have a good or poor clinical prognosis. In one embodiment, the invention provides a method of determining whether an individual afflicted with cancer, especially breast cancer, will likely experience non-survival within 3 years of initial diagnosis (i. e., whether an individual has a poor prognosis) comprising (1) comparing the level of expression of the markers listed in Table 4 in a sample taken from the individual to the level of the same markers in a standard or control, where the standard or control levels represent those found in an individual with a poor prognosis; and (2) determining whether the level of the marker-related polynucleotides in the sample from the individual is significantly different from that of the control, wherein if no substantial difference is found, the patient has a poor prognosis, and if a substantial difference is found, the patient has a good prognosis. Persons of skill in the art will readily see that the markers associated with good prognosis can also be used as controls. In a more specific embodiment, both controls are run. In case the pool is not pure "good prognosis" or "poor prognosis", a set of experiments of individuals with known outcome should be hybridised against the pool to define the expression templates for the good prognosis and poor prognosis group. Each individual with unknown outcome is hybridised against the same pool and the resulting expression profile is compared to the templates to predict its outcome.
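The template comparison described at the end of the preceding paragraph can be sketched as follows. The sketch assumes that each profile is a list of log expression ratios against the common pool, with genes in the same order in every profile, that a template is the gene-by-gene mean over individuals of known outcome, and that similarity is measured by the Pearson coefficient. None of these choices is specified in this description, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def build_template(profiles):
    """Gene-by-gene mean of the profiles of individuals with known outcome."""
    n_samples = len(profiles)
    n_genes = len(profiles[0])
    return [sum(p[i] for p in profiles) / n_samples for i in range(n_genes)]


def predict_outcome(profile, good_template, poor_template):
    """Assign the outcome whose template correlates better with the profile.

    A tie is resolved as "poor" here; that tie rule is an arbitrary choice."""
    r_good = pearson(profile, good_template)
    r_poor = pearson(profile, poor_template)
    return "good" if r_good > r_poor else "poor"
```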
Poor prognosis of breast cancer may indicate that a tumor is relatively aggressive, while good prognosis may indicate that a tumor is relatively nonaggressive.
Therefore, the invention provides for a method of determining a course of treatment of a breast cancer patient, comprising determining whether the level of expression of the 42 markers of Table 4, or a subset thereof, correlates with the level of these markers in a sample representing a good prognosis expression pattern or a poor prognosis pattern; and determining a course of treatment, wherein, if the expression correlates with the poor prognosis pattern, the tumor is treated as an aggressive tumor.
Classification of a sample as "good prognosis" or "poor prognosis" is accomplished substantially as for the diagnostic markers described above, wherein a template is generated to which the marker expression levels in the sample are compared. The use of marker sets is not restricted to the prognosis of breast cancer-related conditions, and may be applied in a variety of phenotypes or conditions, clinical or experimental, in which gene expression plays a role. Where a set of markers has been identified that corresponds to two or more phenotypes, the marker sets can be used to distinguish these phenotypes. For example, the phenotypes may be the diagnosis and/or prognosis of clinical states or phenotypes associated with other cancers, other disease conditions, or other physiological conditions, wherein the expression level data is derived from a set of genes correlated with the particular physiological or disease condition.
In using the markers disclosed herein, and, indeed, using any sets of markers to differentiate an individual having one phenotype from another individual having a second phenotype, one can compare the absolute expression of each of the markers in a sample to a control; for example, the control can be the average level of expression of each of the markers, respectively, in a pool of individuals. To increase the sensitivity of the comparison, however, the expression level values are preferably transformed in a number of ways. For example, the expression level of each of the markers can be normalised by the average expression level of all markers the expression level of which is determined, or by the average expression level of a set of control genes. Thus, in one embodiment, the markers are represented by probes on a microarray, and the expression level of each of the markers is normalised by the mean or median expression level across all of the genes represented on the microarray, including any non-marker genes. In a specific embodiment, the normalisation is carried out by dividing by the median or mean level of expression of all of the genes on the microarray. In another embodiment, the expression levels of the markers are normalised by the mean or median level of expression of a set of control markers. In a specific embodiment, the control markers comprise a set of housekeeping genes. In another specific embodiment, the normalisation is accomplished by dividing by the median or mean expression level of the control genes. The sensitivity of a marker-based assay will also be increased if the expression levels of individual markers are compared to the expression of the same markers in a pool of samples. Preferably, the comparison is to the mean or median expression level of each of the marker genes in the pool of samples.
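The two normalisation options described above (dividing by the median over all genes on the array, or by the median over a set of control genes) can be illustrated with a short sketch. The function name, the gene names and the dictionary layout are hypothetical; the median is used here, and the mean could be substituted in the same way.

```python
from statistics import median


def normalise_by_median(levels, reference_genes=None):
    """Divide every expression level by the median level of the reference genes.

    `levels` maps gene name to expression level for one sample. If
    `reference_genes` is None, all genes on the array serve as the reference;
    otherwise only the listed control (for example housekeeping) genes do."""
    genes = list(levels) if reference_genes is None else reference_genes
    scale = median(levels[gene] for gene in genes)
    if scale == 0:
        raise ValueError("reference median is zero; cannot normalise")
    return {gene: value / scale for gene, value in levels.items()}
```

For a sample with levels marker_a = 200, marker_b = 50, hk_1 = 80, hk_2 = 120 and other = 400, the housekeeping median is 100 while the all-gene median is 120, so the normalised value of marker_a depends on which reference is chosen.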
Such a comparison may be accomplished, for example, by dividing the expression level of each of the markers in the sample by the mean or median expression level of that marker in the pool. This has the effect of accentuating the relative differences in expression between markers in the sample and markers in the pool as a whole, making comparisons more sensitive and more likely to produce meaningful results than the use of absolute expression levels alone. The expression level data may be transformed in any convenient way; preferably, the expression level data for all markers is log transformed before means or medians are taken.
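The pool comparison with a prior log transform can be sketched as follows. Because the data are log transformed before the pool mean is taken, dividing by the pool level becomes a subtraction in log space. The base-10 logarithm, the function name and the data layout are assumptions for illustration.

```python
import math
from statistics import mean


def log_ratio_to_pool(sample, pool_samples):
    """Log10 expression of each marker in the sample minus the mean log10
    expression of the same marker across the pool samples.

    `sample` maps marker name to expression level; `pool_samples` is a list of
    such mappings. All levels must be positive."""
    result = {}
    for gene, level in sample.items():
        pool_mean_log = mean(math.log10(p[gene]) for p in pool_samples)
        result[gene] = math.log10(level) - pool_mean_log
    return result
```

A value of 1 then means the marker is ten-fold higher in the sample than the (geometric) pool average, and a value of -1 means ten-fold lower.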
In performing comparisons to a pool, two approaches may be preferably used. First, the expression levels of the markers in the sample may be compared to the expression level of those markers in the pool, where nucleic acid derived from the sample and nucleic acid derived from the pool are hybridised during the course of a single experiment. Such an approach requires that new pool nucleic acid be generated for each comparison or limited numbers of comparisons, and is therefore limited by the amount of nucleic acid available. Alternatively, and preferably, the expression levels in a pool, whether normalised and/or transformed or not, are stored on a computer, or on computer-readable media, to be used in comparisons to the individual expression level data from the sample (i. e., single-channel data).
The expression levels of the marker genes in a sample may be determined by any means known in the art. The expression level may be determined by isolating and determining the level (i. e., amount) of nucleic acid transcribed from each marker gene. Alternatively, or additionally, the level of specific proteins translated from mRNA transcribed from a marker gene may be determined. The level of expression of specific marker genes can be determined by measuring the amount of mRNA, or polynucleotides derived therefrom, present in a sample. Any method for determining RNA levels can be used. For example, RNA is isolated from a sample and separated on an agarose gel. The separated RNA is then transferred to a solid support, such as a filter. Nucleic acid probes representing one or more markers are then hybridised to the filter by northern hybridisation, and the amount of marker-derived RNA is determined. Such determination can be visual, or machine-aided, for example, by use of a densitometer. Another method of determining RNA levels is by use of a dot-blot or a slot-blot. In this method, RNA, or nucleic acid derived therefrom, from a sample is labeled. The RNA or nucleic acid derived therefrom is then hybridised to a filter containing oligonucleotides derived from one or more marker genes, wherein the oligonucleotides are placed upon the filter at discrete, easily-identifiable locations.
Hybridisation, or lack thereof, of the labeled RNA to the filter-bound oligonucleotides is determined visually or by densitometer. Polynucleotides can be labeled using a radiolabel or a fluorescent (i. e., visible) label. These examples are not intended to be limiting; other methods of determining RNA abundance are known in the art. The level of expression of particular marker genes may also be assessed by determining the level of the specific protein expressed from the marker genes. This can be accomplished, for example, by separation of proteins from a sample on a polyacrylamide gel, followed by identification of specific marker-derived proteins using antibodies in a western blot. Alternatively, proteins can be separated by two-dimensional gel electrophoresis systems. Two-dimensional gel electrophoresis is well-known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, e. g., Hames et al., 1990, GEL ELECTROPHORESIS OF PROTEINS: A PRACTICAL APPROACH, IRL Press, New York; Shevchenko et al., Proc. Natl. Acad. Sci. USA 93: 1440-1445 (1996); Sagliocco et al., Yeast 12: 1519-1533 (1996); Lander, Science 274: 536-539 (1996). The resulting electropherograms can be analyzed by numerous techniques, including mass spectrometric techniques, western blotting and immunoblot analysis using polyclonal and monoclonal antibodies.
Alternatively, marker-derived protein levels can be determined by constructing an antibody microarray in which binding sites comprise immobilised, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome. Preferably, antibodies are present for a substantial fraction of the marker-derived proteins of interest. Methods for making monoclonal antibodies are well known (see, e. g., Harlow and Lane, 1988, ANTIBODIES: A LABORATORY MANUAL, Cold Spring Harbor, New York, which is incorporated in its entirety for all purposes). In one embodiment, monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted to the array, and their binding is assayed with assays known in the art.
Generally, the expression, and the level of expression, of proteins of diagnostic or prognostic interest can be detected through immunohistochemical staining of tissue slices or sections.
Finally, expression of marker genes in a number of tissue specimens may be characterised using a "tissue array" (Kononen et al., Nat. Med. 4(7): 844-7 (1998)). In a tissue array, multiple tissue samples are assessed on the same microarray. The arrays allow in situ detection of RNA and protein levels; consecutive sections allow the analysis of multiple samples simultaneously. In preferred embodiments, polynucleotide microarrays are used to measure expression so that the expression status of each of the markers above is assessed simultaneously. In a specific embodiment, the invention provides for oligonucleotide or cDNA arrays comprising probes hybridisable to the genes corresponding to each of the marker sets described above (i. e., markers to determine the tumor, especially the molecular type or subtype of a tumor; markers to distinguish patients with good versus patients with poor prognosis). The microarrays provided by the present invention may comprise probes hybridisable to the genes corresponding to markers able to distinguish the status of one, two, or all three of the clinical conditions noted above. In particular, the invention provides polynucleotide arrays comprising probes to a subset or subsets of at least 2, 5, 10, 15, 20, 30 or 40 genetic markers from Table 4. In yet another specific embodiment, microarrays that are used in the methods disclosed herein optionally comprise markers additional to at least some of the markers listed in Table 4. For example, in a specific embodiment, the microarray is a screening or scanning array as described in WO 02/103320, WO 02/18646 and WO 02/16650. The scanning and screening arrays comprise regularly spaced, positionally-addressable probes derived from genomic nucleic acid sequence, both expressed and unexpressed.
Such arrays may comprise probes corresponding to a subset of, or all of, the markers listed in Table 4, or a subset thereof as described above, and can be used to monitor marker expression in the same way as a microarray containing only markers listed in Table 4. In yet another specific embodiment, the microarray is a commercially available cDNA microarray that comprises at least five of the markers listed in Table 4.
Preferably, a commercially-available cDNA microarray comprises all of the markers listed in Table 4. However, such a microarray may comprise 5, 10, 15, 25, 40 or more of the markers in Table 4, up to the maximum number of markers in the Table or Figure. In a specific embodiment of the microarrays used in the methods disclosed herein, the markers that are all or a portion of Table 4 make up at least 50%, 60%, 70%, 80%, 90%, 95% or 98% of the probes on the microarray. General methods pertaining to the construction of microarrays comprising the marker sets and/or subsets above are described in the following sections. Microarrays are prepared by selecting probes which comprise a polynucleotide sequence, and then immobilising such probes to a solid support or surface. For example, the probes may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA. The polynucleotide sequences of the probes may also comprise DNA and/or RNA analogues, or combinations thereof. For example, the polynucleotide sequences of the probes may be full or partial fragments of genomic DNA. The polynucleotide sequences of the probes may also be synthesised nucleotide sequences, such as synthetic oligonucleotide sequences. The probe sequences can be synthesised either enzymatically in vivo, enzymatically in vitro (e. g., by PCR), or non-enzymatically in vitro.
The probe or probes used in the methods of the invention are preferably immobilised to a solid support which may be either porous or non-porous. For example, the probes of the invention may be polynucleotide sequences which are attached to a nitrocellulose or nylon membrane or filter covalently at either the 3' or the 5' end of the polynucleotide. Such hybridisation probes are well known in the art (see, e. g., Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989)). Alternatively, the solid support or surface may be a glass or plastic surface. In a particularly preferred embodiment, hybridisation levels are measured to microarrays of probes consisting of a solid phase on the surface of which are immobilised a population of polynucleotides, such as a population of DNA or DNA mimics, or, alternatively, a population of RNA or RNA mimics. The solid phase may be a nonporous or, optionally, a porous material such as a gel.
In preferred embodiments, a microarray comprises a support or surface with an ordered array of binding (e. g., hybridisation) sites or "probes" each representing one of the markers described herein. Preferably the microarrays are addressable arrays, and more preferably positionally addressable arrays. More specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i. e., the sequence) of each probe can be determined from its position in the array (i. e., on the support or surface). In preferred embodiments, each probe is covalently attached to the solid support at a single site.
Microarrays can be made in a number of ways, of which several are described below. However produced, microarrays share certain characteristics. The arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other. Preferably, microarrays are made from materials that are stable under binding (e. g., nucleic acid hybridisation) conditions. The microarrays are preferably small, e. g., between 0.1 cm2 and 25 cm2, between 2 cm2 and 13 cm2, or 3 cm2. However, larger and smaller arrays are also contemplated and may be preferable, e. g., for use in screening arrays.
Preferably, a given binding site or unique set of binding sites in the microarray will specifically bind (e. g., hybridise) to the product of a single gene in a cell (e. g., to a specific mRNA, or to a specific cDNA derived therefrom). However, in general, other related or similar sequences will cross hybridise to a given binding site.
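Positional addressability, as described above, means that the identity of a probe follows from its position on the support. A minimal sketch of such a layout follows; the row-by-row filling order, the function names and the grid width are hypothetical, and the identifiers in the usage note are accession numbers named elsewhere in this description.

```python
def build_array_layout(probe_ids, n_columns):
    """Assign each probe a fixed (row, column) position, filling row by row."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    return {(index // n_columns, index % n_columns): probe_id
            for index, probe_id in enumerate(probe_ids)}


def probe_at(layout, row, column):
    """Look up which probe sits at a given position on the support."""
    return layout[(row, column)]
```

For example, laying out NM021238, NM000297, D38594, NM002235 and AL034384 on a grid three columns wide places AL034384 at row 1, column 1, so a hybridisation signal read at that position can be attributed to that probe.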
The microarrays of the present invention include one or more test probes, each of which has a polynucleotide sequence that is complementary to a subsequence of RNA or DNA to be detected. Preferably, the position of each probe on the solid surface is known. Indeed, the microarrays are preferably positionally addressable arrays. Specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i. e., the sequence) of each probe can be determined from its position on the array (i. e., on the support or surface).
According to the invention, the microarray is an array (i. e., a matrix) in which each position represents one of the markers described herein. For example, each position can contain a DNA or DNA analogue based on genomic DNA to which a particular RNA or cDNA transcribed from that genetic marker can specifically hybridise. The DNA or DNA analogue can be, e. g., a synthetic oligomer or a gene fragment. In one embodiment, probes representing each of the markers are present on the array. As noted above, the "probe" to which a particular polynucleotide molecule specifically hybridises according to the invention contains a complementary genomic polynucleotide sequence. The probes of the microarray preferably consist of nucleotide sequences of no more than 1,000 nucleotides. In some embodiments, the probes of the array consist of nucleotide sequences of 10 to 1,000 nucleotides. In a preferred embodiment, the nucleotide sequences of the probes are in the range of 10-200 nucleotides in length and are genomic sequences of a species of organism, such that a plurality of different probes is present, with sequences complementary and thus capable of hybridising to the genome of such a species of organism, sequentially tiled across all or a portion of such genome. In other specific embodiments, the probes are in the range of 10-30 nucleotides in length, in the range of 10-40 nucleotides in length, in the range of 20-50 nucleotides in length, in the range of 40-80 nucleotides in length, in the range of 50-150 nucleotides in length, in the range of 80-120 nucleotides in length, and most preferably are 60 nucleotides in length.
The probes may comprise DNA or DNA "mimics" (e. g., derivatives and analogues) corresponding to a portion of an organism's genome. In another embodiment, the probes of the microarray are complementary RNA or RNA mimics. DNA mimics are polymers composed of subunits capable of specific, Watson-Crick-like hybridisation with DNA, or of specific hybridisation with RNA. The nucleic acids can be modified at the base moiety, at the sugar moiety, or at the phosphate backbone. Exemplary DNA mimics include, e. g., phosphorothioates.
DNA can be obtained, e. g., by polymerase chain reaction (PCR) amplification of genomic DNA or cloned sequences. PCR primers are preferably chosen based on a known sequence of the genome that will result in amplification of specific fragments of genomic DNA. Computer programs that are well known in the art are useful in the design of primers with the required specificity and optimal amplification properties, such as Oligo version 5.0 (National Biosciences). Typically each probe on the microarray will be between 10 bases and 50,000 bases, usually between 300 bases and 1,000 bases in length. PCR methods are well known in the art, and are described, for example, in Innis et al., eds., PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press Inc., San Diego, CA (1990). It will be apparent to one skilled in the art that controlled robotic systems are useful for isolating and amplifying nucleic acids.
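The sequential tiling of complementary probes across a target sequence, with the most preferred probe length of 60 nucleotides, can be sketched as follows. The sketch assumes a DNA target written with the letters A, C, G and T only, non-overlapping tiles by default, and probes given as the reverse complement of each window; the function names and the `step` parameter are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def tile_probes(target, probe_length=60, step=None):
    """Return (start, probe) pairs for consecutive windows of the target.

    Each probe is the reverse complement of its window and can therefore
    hybridise to it. With `step` left as None the windows do not overlap;
    a trailing stretch shorter than `probe_length` is not covered."""
    step = probe_length if step is None else step
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    target = target.upper()
    probes = []
    for start in range(0, len(target) - probe_length + 1, step):
        window = target[start:start + probe_length]
        probes.append((start, reverse_complement(window)))
    return probes
```

A smaller `step` than `probe_length` yields overlapping tiles, which increases coverage of the target at the cost of more probes.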
An alternative, preferred means for generating the polynucleotide probes of the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e. g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., Nucleic Acid Res. 14: 5399-5407 (1986); McBride et al., Tetrahedron Lett. 24: 246-248 (1983)). Synthetic sequences are typically between about 10 and about 500 bases in length, more typically between about 20 and about 100 bases, and most preferably between about 40 and about 70 bases in length. In some embodiments, synthetic nucleic acids include non-natural bases, such as, but by no means limited to, inosine. As noted above, nucleic acid analogues may be used as binding sites for hybridisation. An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e. g., Egholm et al., Nature 363: 566-568 (1993); U. S. Patent No. 5,539,083).
Probes are preferably selected using an algorithm that takes into account binding energies, base composition, sequence complexity, cross-hybridisation binding energies, and secondary structure (see Friend et al., International Patent Publication WO 01/05935, published January 25,2001; Hughes et al . , Nat. Biotech. 19: 342-7 (2001) ) . A skilled artisan will also appreciate that positive control probes, e. g. , probes known to be complementary and hybridisable to sequences in the target polynucleotide molecules, and negative control probes, e. g., probes known to not be complementary and hybridisable to sequences in the target polynucleotide molecules, should be included on the array. In one embodiment, positive controls are synthesised along the perimeter of the array. In another embodiment, positive controls are synthesised in diagonal stripes across the array. In still another embodiment, the reverse complement for each probe is synthesised next to the position of the probe to serve as a negative control. In yet another embodiment, sequences from other species of organism are used as negative controls or as"spike-in"controls . The probes are attached to a solid support or surface, which may be made, e. g., from glass, plastic (e. g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material. A preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al, Science 270: 467-470 (1995). This method is especially useful for preparing microarrays of cDNA (See also, DeRisi et al, Nature Genetics 14: 457-460 (1996); Shalon et al., Genome Res. 6 : 639-645 (1996); and Schena et al . , Proc. Natl. Acad. Sci. U. S. A. 93: 10539-11286 (1995)).
A second preferred method for making microarrays is by making high-density oligonucleotide arrays. Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface using photolithographic techniques for synthesis in situ (see, Fodor et al., 1991, Science 251: 767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U. S. A. 91: 5022-5026; Lockhart et al., 1996, Nature Biotechnology 14: 1675; U. S. Patent Nos. 5,578,832; 5,556,752; and 5,510,270) or other methods for rapid synthesis and deposition of defined oligonucleotides (Blanchard et al., Biosensors & Bioelectronics 11: 687-690). When these methods are used, oligonucleotides (e. g., 60-mers) of known sequence are synthesised directly on a surface such as a derivatised glass slide. Usually, the array produced is redundant, with several oligonucleotide molecules per RNA. Other methods for making microarrays, e. g., by masking (Maskos and Southern, 1992, Nuc. Acids. Res. 20: 1679-1684), may also be used. In principle, and as noted supra, any type of array, for example, dot blots on a nylon hybridisation membrane (see Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989)) could be used. However, as will be recognised by those skilled in the art, very small arrays will frequently be preferred because hybridisation volumes will be smaller.
In one embodiment, the arrays of the present invention are prepared by synthesising polynucleotide probes on a support. In such an embodiment, polynucleotide probes are attached to the support covalently at either the 3' or the 5' end of the polynucleotide. In a particularly preferred embodiment, microarrays of the invention are manufactured by means of an ink jet printing device for oligonucleotide synthesis, e. g., using the methods and systems described by Blanchard in U. S. Pat. No. 6,028,189; Blanchard et al., 1996, Biosensors and Bioelectronics 11: 687-690; Blanchard, 1998, in SYNTHETIC DNA ARRAYS IN GENETIC ENGINEERING, Vol. 20, J. K. Setlow, Ed., Plenum Press, New York at pages 111-123. Specifically, the oligonucleotide probes in such microarrays are preferably synthesised in arrays, e. g., on a glass slide, by serially depositing individual nucleotide bases in "microdroplets" of a high surface tension solvent such as propylene carbonate. The microdroplets have small volumes (e. g., 100 pL or less, more preferably 50 pL or less) and are separated from each other on the microarray (e. g., by hydrophobic domains) to form circular surface tension wells which define the locations of the array elements (i. e., the different probes). Microarrays manufactured by this ink-jet method are typically of high density, preferably having a density of at least about 2,500 different probes per 1 cm².
The polynucleotide molecules which may be analyzed by the present invention (the "target polynucleotide molecules") may be from any clinically relevant source, but are expressed RNA or a nucleic acid derived therefrom (e. g., cDNA or amplified RNA derived from cDNA that incorporates an RNA polymerase promoter), including naturally occurring nucleic acid molecules, as well as synthetic nucleic acid molecules. In one embodiment, the target polynucleotide molecules comprise RNA, including, but by no means limited to, total cellular RNA, poly(A)+ messenger RNA (mRNA) or fraction thereof, cytoplasmic mRNA, or RNA transcribed from cDNA (i. e., cRNA; see, e. g., Linsley & Schelter, U. S. Patent Application No. 09/411,074, filed October 4, 1999, or U. S. Patent Nos. 5,545,522, 5,891,636, or 5,716,785). Methods for preparing total and poly(A)+ RNA are well known in the art, and are described generally, e. g., in Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989). In one embodiment, RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al., 1979, Biochemistry 18: 5294-5299). In another embodiment, total RNA is extracted using a silica gel-based column, commercially available examples of which include RNeasy (Qiagen, Valencia, California) and StrataPrep (Stratagene, La Jolla, California). In an alternative embodiment, which is preferred for S. cerevisiae, RNA is extracted from cells using phenol and chloroform, as described in Ausubel et al., eds., 1989, CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Vol III, Green Publishing Associates, Inc., John Wiley & Sons, Inc., New York, at pp. 13.12.1-13.12.5.
Poly(A)+ RNA can be selected, e. g., by selection with oligo-dT cellulose or, alternatively, by oligo-dT primed reverse transcription of total cellular RNA. In one embodiment, RNA can be fragmented by methods known in the art, e. g., by incubation with ZnCl2, to generate fragments of RNA. In another embodiment, the polynucleotide molecules analyzed by the invention comprise cDNA, or PCR products of amplified RNA or cDNA. In one embodiment, total RNA, mRNA, or nucleic acids derived therefrom, is isolated from a sample taken from a person afflicted with BC, LSCC, LAC or RCC. Target polynucleotide molecules that are poorly expressed in particular cells may be enriched using normalisation techniques (Bonaldo et al., 1996, Genome Res. 6: 791-806). As described above, the target polynucleotides are detectably labeled at one or more nucleotides. Any method known in the art may be used to detectably label the target polynucleotides. Preferably, this labeling incorporates the label uniformly along the length of the RNA, and more preferably, the labeling is carried out at a high degree of efficiency.
One embodiment for this labeling uses oligo-dT primed reverse transcription to incorporate the label; however, conventional implementations of this method are biased toward generating 3' end fragments. Thus, in a preferred embodiment, random primers (e. g., 9-mers) are used in reverse transcription to uniformly incorporate labeled nucleotides over the full length of the target polynucleotides. Alternatively, random primers may be used in conjunction with PCR methods or T7 promoter-based in vitro transcription methods in order to amplify the target polynucleotides. In a preferred embodiment, the detectable label is a luminescent label. For example, fluorescent labels, bio-luminescent labels, chemi-luminescent labels, and colorimetric labels may be used in the present invention. In a highly preferred embodiment, the label is a fluorescent label, such as a fluorescein, a phosphor, a rhodamine, or a polymethine dye derivative. Examples of commercially available fluorescent labels include, for example, fluorescent phosphoramidites such as FluorePrime (Amersham Pharmacia, Piscataway, N. J.), Fluoredite (Millipore, Bedford, Mass.), FAM (ABI, Foster City, Calif.), and Cy3 or Cy5 (Amersham Pharmacia, Piscataway, N. J.). In another embodiment, the detectable label is a radiolabeled nucleotide.
In a further preferred embodiment, target polynucleotide molecules from a patient sample are labeled differentially from target polynucleotide molecules of a standard. The standard can comprise target polynucleotide molecules from normal individuals (i. e., those not afflicted with breast cancer). In a highly preferred embodiment, the standard comprises target polynucleotide molecules pooled from samples from normal individuals or tumor samples from individuals having sporadic-type breast tumors. According to another embodiment, the target polynucleotide molecules are derived from the same individual, but are taken at different time points, and thus indicate the efficacy of a treatment by a change in expression of the markers, or lack thereof, during and after the course of treatment (i. e., chemotherapy, radiation therapy or cryotherapy) , wherein a change in the expression of the markers from a poor prognosis pattern to a good prognosis pattern indicates that the treatment is efficacious. In this embodiment, different timepoints are differentially labeled.
Nucleic acid hybridisation and wash conditions are chosen so that the target polynucleotide molecules specifically bind or specifically hybridise to the complementary polynucleotide sequences of the array, preferably to a specific array site, wherein its complementary DNA is located. Arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded prior to contacting with the target polynucleotide molecules. Arrays containing single-stranded probe DNA (e. g., synthetic oligodeoxyribonucleic acids) may need to be denatured prior to contacting with the target polynucleotide molecules, e. g., to remove hairpins or dimers which form due to self-complementary sequences. Optimal hybridisation conditions will depend on the length (e. g., oligomer versus polynucleotide greater than 200 bases) and type (e. g., RNA or DNA) of probe and target nucleic acids. One of skill in the art will appreciate that as the oligonucleotides become shorter, it may become necessary to adjust their length to achieve a relatively uniform melting temperature for satisfactory hybridisation results. General parameters for specific (i. e., stringent) hybridisation conditions for nucleic acids are described in Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York (1989), and in Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current Protocols Publishing, New York (1994). Typical hybridisation conditions for the cDNA microarrays of Schena et al. are hybridisation in 5x SSC plus 0.2% SDS at 65 °C for four hours, followed by washes at 25 °C in low stringency wash buffer (1x SSC plus 0.2% SDS), followed by 10 minutes at 25 °C in higher stringency wash buffer (0.1x SSC plus 0.2% SDS) (Schena et al., Proc. Natl. Acad. Sci. U. S. A. 93: 10614 (1993)). Useful hybridisation conditions are also provided in, e. g., Tijssen, 1993, HYBRIDIZATION WITH NUCLEIC ACID PROBES, Elsevier Science Publishers B. V.; and Kricka, 1992, NONISOTOPIC DNA PROBE TECHNIQUES, Academic Press, San Diego, CA. Particularly preferred hybridisation conditions include hybridisation at a temperature at or near the mean melting temperature of the probes (e. g., within 5 °C, more preferably within 2 °C) in 1 M NaCl, 50 mM MES buffer (pH 6.5), 0.5% sodium sarcosine and 30% formamide.
When fluorescently labeled probes are used, the fluorescence emissions at each site of a microarray may be, preferably, detected by scanning confocal laser microscopy. In one embodiment, a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used. Alternatively, a laser may be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, "A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridisation," Genome Research 6: 639-645, which is incorporated by reference in its entirety for all purposes). In a preferred embodiment, the arrays are scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser and the emitted light is split by wavelength and detected with two photomultiplier tubes. Fluorescence laser scanning devices are described in Schena et al., Genome Res. 6: 639-645 (1996), and in other references cited herein. Alternatively, the fiber-optic bundle described by Ferguson et al., Nature Biotech. 14: 1681-1684 (1996), may be used to monitor mRNA abundance levels at a large number of sites simultaneously.
Signals are recorded and, in a preferred embodiment, analyzed by computer, e. g., using a 12 or 16 bit analog to digital board. In one embodiment the scanned image is despeckled using a graphics program (e. g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridisation at each wavelength at each site. If necessary, an experimentally determined correction for "cross talk" (or overlap) between the channels for the two fluors may be made. For any particular hybridisation site on the transcript array, a ratio of the emission of the two fluorophores can be calculated. The ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated in association with the different cancer-related conditions.
The invention is further described by the following examples and the drawing figures, yet without being restricted thereto.
Fig.1 shows PCR products of subtractive lung squamous cell carcinoma cDNA libraries generated either by a 4-base or a pool of 6-base recognising restriction enzymes. Lanes 1 and 4: DNA size marker (Phi-X/HaeIII + lambda/HindIII); Lane 2: subtractive library generated using a pool of 6-cutters; Lane 3: subtractive library generated using the 4-cutter RsaI; arrows indicate the respective position for keratin 6A cDNA fragment.
Fig.2 shows a histogram of the distribution of coefficients of variation (CV) . Data are from a hybridisation that was repeated four times under the same conditions. Only spots with valid signals in each of the four hybridisations (71%) were included for calculation. Ninety-five percent of all spots have CVs smaller than 37%, more than 99% of all spots displayed CVs of less than 57%.
Fig.3 shows the distribution of clones among subtractive libraries. The 100 most up-regulated clones for lung-squamous cell carcinoma (SCC) , lung-adenocarcinoma (AC) , and renal cell cancer (RCC) , respectively, were analyzed with reference to their origin from the different subtractive libraries (for description of libraries, see Table 1), e.g., 96% of the most up-regulated clones in lung-SCC were derived from the lung-SCC subtractive library (C-Library) .
Fig.4 shows the comparison of expression levels of clone 709G4 [EGL nine homologue 3 (EGLN3)] in various normal and tumor tissues analyzed with either cDNA microarrays or quantitative real-time PCR. Expression levels of microarray experiments are presented as the ratio of intensities of Cy3 (green fluorescence; individual probe) versus Cy5 (red fluorescence; pool of critical normal tissues); data from real-time PCR are shown as relative copy numbers (reference: beta-actin). The different tissue types are distinguished with the indicated colors.
Figure 5: Kaplan-Meier analysis of the overall survival (in years after diagnosis) of 39 breast cancer patients with a strong expression (high cyclin B1 or high cyclin B2) or a weak expression (low cyclin B1 or low cyclin B2) of the marker genes cyclin B1 and cyclin B2. Cyclin B1 was discovered as a marker gene discriminating between a good or a poor prognosis by us (cyclin B1 is one of the 42 marker genes shown in Table 4), whereas cyclin B2 was discovered as one of 240 best marker genes discriminating between a good or a poor prognosis by van't Veer et al. (Nature 415, 530-536, 2002).
Figure 6: Gene expression correlates with long or short overall survival of breast cancer patients; left panel: Hierarchical clustering of the 42 genes selected to be most highly associated with the two survival groups (more than 9 years versus less than 3 years) including a dendrogram of the clustered patient samples. Each column represents one tumor sample, and each row represents one gene, presented in the same order as in Table 4. Student's t-test was used to select genes most differentially expressed between the survival groups (significance P less than 0.02), P-chance analysis was used to eliminate false positives. Only genes with P-value less than P-chance were selected. Genes downregulated in tumors of patients with an overall survival less than 3 years, and genes upregulated in tumors of patients with an overall survival less than 3 years are indicated (separated by dashed line) . The presence (black box) or absence (white box) of prognostic breast cancer markers estrogen receptor (ER) , progesterone receptor (PR), HER2, tumor stage T3, invasive ductal morphology, and tumor grade 3 are indicated for each tumor sample; right panel: Principal component analysis (PCA) of patient samples using the selected genes. Samples of patients surviving at least nine years are shown in the left part, and samples of patients who survived less than three years are shown in the right part (separated by dashed line) . The first three principal components are shown.
EXAMPLES :
MATERIALS AND METHODS
RNA Preparation and Quality Control.
All tumor tissue samples were collected at the University of Vienna and the University of Graz following approval by their Institutional Review Boards and written informed consent. Tissues were snap frozen in liquid nitrogen immediately after surgical resection and stored at -80°C. For isolation of total RNA, serial cryosections were directly dissolved in 4 M guanidinium thiocyanate containing 1% beta-mercaptoethanol, and the lysate was subjected to ultracentrifugation over a CsCl gradient (Sambrook et al, 2001). The first and the last cryosections (6 μm) of the biopsies were used for standard H&E staining to verify integrity and tumor cell content of analyzed tissues. Poly(A)+ RNA (mRNA) was extracted using the Oligotex Direct mRNA kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. mRNAs of normal tissues were purchased from Clontech (Palo Alto, CA), Invitrogen (Carlsbad, CA), and Ambion (Austin, TX). All RNAs were analyzed with the Agilent 2100 Bioanalyzer and the RNA 6000 Nano Assay kit (Agilent Technologies, Palo Alto, CA) to determine RNA quality and quantity according to the manufacturer's protocol. To test cDNA samples derived from tumor material for residual genomic DNA, PCR was performed with 0.1 μg of cDNA and primers specific for an intronic region of glyceraldehyde-3-phosphate dehydrogenase (GAPDH): GAPDH intron forward: 5'-CGCGTCTACGAGCCTTGCGGGCT-3'; and GAPDH intron reverse: 5'-GCTTTCCTAACGGCTGCCCATTCA-3'. The integrity of synthesised cDNA was determined by PCR using GAPDH exon-specific primers (GAPDH exon forward: 5'-AAGGTGAAGGTCGGAGTCAACG-3'; and GAPDH exon reverse: 5'-GGCAGAGATGATGACCCTTTTGGC-3'). An aliquot of normal and tumor tissue poly(A)+ RNA was transcribed into Cla-oligo(dT)-primed (5'-ATTCGCGACTGATGATCGAT(16)-3') cDNA. By linear amplification from the 3'-end of cDNAs with a Cla-specific primer (5'-ATTCGCGACTGATGATCGAT-3'), the samples were analyzed for degradation of mRNA. Intact mRNA results in a heterogeneous distribution from about 8 to 2 kb on agarose gels.
Construction of Subtractive Libraries.
All libraries were generated by suppression subtraction hybridisation (SSH), using poly(A)+ RNA of tumor tissue or a tumor cell line as a tester, and the corresponding normal tissue or a pool of normal tissues or a primary cell line as a driver. A modified protocol of the PCR-Select cDNA Subtraction kit was used (Clontech). As a major difference to the supplier's protocol, cDNA was not digested with restriction enzyme RsaI but with a pool of six restriction enzymes (5 units each of EcoRV, NaeI, NruI, ScaI, SspI, and StuI) to increase average length of cDNAs; cDNA was first incubated 1.5 h in buffer A (Promega, Madison, WI) with NaeI and StuI. After increase of the NaCl concentration from 6 to 150 mM, cDNA was incubated for another 1.5 h at 37 °C with EcoRV, NruI, ScaI, and SspI. All subsequent steps were performed according to the protocol of the PCR-Select cDNA Subtraction kit (Clontech). After DNA amplification, PCR products were cloned into pCR 2.1-TA vector (Invitrogen).
Preparation of cDNA Microarrays.
In addition to clones derived from subtracted libraries, also individually chosen clones with a known or suspected role in tumor formation were used for our microarrays. Therefore, specific primers were synthesised (genXpress) for 105 expressed sequence tags found to be up-regulated in colon cancer. The corresponding cDNA fragments were amplified from a mixture of placenta and testis cDNA as a template. PCR products corresponding to partial cDNA fragments were cloned into pGEM-T (Promega) and sequenced. Additional 1682 cancer-relevant clones were obtained from the I.M.A.G.E. consortium. In total, 11,040 clones were used for preparation of microarrays (Table 1) . All cloned cDNA fragments were amplified by PCR, purified via ethanol precipitation in 96- well plates, and analyzed on agarose gels (Eisen and Brown (Meth.Enzymol.303, 179-205 (1999))). PCR products were spotted on poly-L-lysine-coated glass slides (Menzel, Braunschweig, Germany) by a customised robotic arrayer (Promedia Associates, New York, NY) . Spotted DNA was cross-linked to the glass surface of the chips by UV irradiation (60 mJ) . Chips were then blocked in blocking solution (1.37 g of succinic anhydride in 10 ml of 0.2 M sodium borate and 90 ml of N-methyl-pyrrolidinone) , and double strands were denatured in boiling water.
Labeling and Hybridisation.
Hybridisation was performed as previously described by Eisen and Brown (Meth.Enzymol.303, 179-205 (1999)). As a reference probe in the two-color hybridisations performed, an equal mixture of 16 critical (vital) normal tissues (rectum, bone marrow, lymph node, skeletal muscle, small intestine, thymus, trachea, brain, heart, kidney, liver, lung, pancreas, spleen, stomach, and colon) was used. Briefly, 500 ng of poly(A)+ RNA were subjected to oligo(dT)-primed reverse transcription following the instructions of the Superscript Reverse Transcription kit (Clontech). The reaction was carried out in a final volume of 40 μl at 42 °C. Fluorescent nucleotides Cy5-dUTP and Cy3-dUTP (Amersham Pharmacia, Piscataway, NJ) were used at 0.1 mM. The nucleotide concentrations were 0.5 mM dGTP, dATP, dCTP and 0.2 mM dTTP. 2 μl of Superscript II (200 units/μl; Invitrogen) were added at the beginning of the labeling reaction, and 1 additional μl was added after 1 h and the reaction continued for an additional hour. Unlabeled RNA was digested using the RNase One kit (Promega) according to the manufacturer's specifications. The Cy5 and Cy3 probes were pooled and, following addition of 15 μg of human Cot-1 DNA (Clontech), 3 μg of poly(dA)40-60, and 6 μg of tRNA (Sigma, St. Louis, MO), subjected to ethanol/ammonium acetate precipitation (15). The pellet was resuspended in 10.5 μl of water. 6 μl of 20x saline-sodium phosphate-EDTA, 1.5 μl of 50x Denhardt's solution, 10.5 μl of formamide, and 0.75 μl of 20% SDS were added, and the probe was preincubated for 1 h at 50 °C. Chips were prehybridised with 30 μl of prehybridisation buffer [210 μl of formamide, 120 μl of 20x saline-sodium phosphate-EDTA, 60 μl of 50x Denhardt's solution, 6 μl of salmon sperm DNA (10 μg/μl), 190 μl of H2O, and 15 μl of 20% SDS] for 1 h at 50 °C. For hybridisation, the probe was added to the array, covered with a coverslip, and placed in a sealed chamber to prevent evaporation. After hybridisation at 50 °C overnight, chips were washed in three solutions of decreasing ionic strength (Schena et al., Science 270, 467-470 (1995)).
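The final concentrations in the prehybridisation buffer follow from the listed volumes by simple dilution arithmetic. The following is a minimal sketch of that calculation, assuming that the formamide stock is undiluted (100%); the function name `final_concentrations` and the dictionary keys are illustrative and do not appear in the protocol.

```python
def final_concentrations(components):
    """Return (total_volume_ul, {name: final_concentration}).

    components maps a name to (stock_concentration, volume_ul). Each final
    concentration is stock * volume / total volume, in the stock's own unit.
    """
    total = sum(volume for _, volume in components.values())
    finals = {name: stock * volume / total
              for name, (stock, volume) in components.items()}
    return total, finals


# Prehybridisation buffer as listed in the text (volumes in microlitres).
# Assumption: formamide is used as an undiluted (100%) stock.
prehyb = {
    "formamide_percent": (100.0, 210.0),
    "sspe_x": (20.0, 120.0),
    "denhardts_x": (50.0, 60.0),
    "salmon_sperm_dna_ug_per_ul": (10.0, 6.0),
    "water": (0.0, 190.0),
    "sds_percent": (20.0, 15.0),
}

total_ul, finals = final_concentrations(prehyb)
print(f"total volume: {total_ul:.0f} ul")
for name, value in finals.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the buffer contains roughly 35% formamide, 4x saline-sodium phosphate-EDTA, 5x Denhardt's solution, 0.1 μg/μl salmon sperm DNA and 0.5% SDS.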
Scanning, Image Analysis, Data Filtering, and Normalisation.
Microarrays were scanned with a GenePix 4000A scanner (Axon Instruments, Inc., Union City, CA) at 10-μm resolution. The signal was converted into 16 bits/pixel resolution, yielding a 65,536 count dynamic range. Image analysis and calculation of feature pixel intensities adjusted for local channel-specific background were performed using the GenePix Pro 3.0 software (Axon Instruments, Inc.). With this software, gridding, automated spot detection, manual and automated flagging were performed, as well as background subtraction and normalisation. Background-subtracted element signals were used to calculate Cy3/Cy5 ratios. Spots were excluded from additional analysis if the ratio of foreground versus background signal was less than 2. Each microarray was normalised by scaling according to the GenePix normalisation factor such that the median of ratios value is 1. For additional evaluation and statistical analysis, output files were exported to a relational Microsoft Access database.
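The filtering and scaling steps described above can be sketched as follows. This is an illustration only, not the GenePix Pro implementation: it assumes that the foreground/background criterion is applied to both channels and that the normalisation factor equals the median of the retained Cy3/Cy5 ratios. The function name and the dictionary keys are hypothetical.

```python
from statistics import median


def filter_and_normalise(spots, min_fg_bg_ratio=2.0):
    """Return {spot_id: normalised Cy3/Cy5 ratio} for spots passing the filter.

    spots is a list of dicts with keys: id, cy3_fg, cy3_bg, cy5_fg, cy5_bg.
    Assumption: a spot is kept only if foreground >= 2 x background in BOTH
    channels; the source does not say which channel(s) the criterion uses.
    """
    ratios = {}
    for s in spots:
        if s["cy3_fg"] < min_fg_bg_ratio * s["cy3_bg"]:
            continue
        if s["cy5_fg"] < min_fg_bg_ratio * s["cy5_bg"]:
            continue
        cy3 = s["cy3_fg"] - s["cy3_bg"]  # local background subtraction
        cy5 = s["cy5_fg"] - s["cy5_bg"]
        if cy5 <= 0:
            continue  # guard against division by zero
        ratios[s["id"]] = cy3 / cy5
    if not ratios:
        return {}
    # Scale so that the median of ratios on the array becomes 1.
    scale = median(ratios.values())
    return {spot_id: r / scale for spot_id, r in ratios.items()}


example = [
    {"id": "A", "cy3_fg": 1100, "cy3_bg": 100, "cy5_fg": 600, "cy5_bg": 100},
    {"id": "B", "cy3_fg": 300, "cy3_bg": 100, "cy5_fg": 300, "cy5_bg": 100},
    {"id": "C", "cy3_fg": 150, "cy3_bg": 100, "cy5_fg": 600, "cy5_bg": 100},
    {"id": "D", "cy3_fg": 350, "cy3_bg": 100, "cy5_fg": 600, "cy5_bg": 100},
]
normalised = filter_and_normalise(example)
print(normalised)
```

In this example spot C is dropped because its Cy3 foreground is less than twice its background, and the remaining ratios are scaled so that their median is 1.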
Statistical Analysis.
All statistical analyses were performed with the software package S-Plus (Insightful, Inc.) and R. Student's t test was used to select genes most differentially expressed between each two groups, with significance set at P less than 0.02. Furthermore, P-chance analysis was used to eliminate false positives. Hereby, a t-statistic is assigned to each gene, and P values (both unadjusted and adjusted according to the Westfall and Young step-down algorithm) are obtained by permuting the samples in place of assuming t-distributions, as described by Dudoit et al. (Stat.Sin.12, 111-139 (2002)). To reduce the number of false positives (type I error), only genes with P less than P-chance were selected. In addition, only genes with an absolute fold change of more than 1.5-fold were chosen for additional analyses. For cluster analysis, either GeneSpring (Silicon Genetics) or Spotfire DecisionSite (Spotfire, Sweden) was used as the software package.
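The idea of obtaining a P value by permuting sample labels rather than assuming a t-distribution can be illustrated for a single gene. The sketch below is not the S-Plus/R code used in the study and reproduces neither the Westfall and Young step-down adjustment nor the P-chance comparison. It assumes log2-transformed expression ratios, a pooled-variance Student's t statistic, and exhaustive enumeration of all group assignments; all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / sqrt(pooled * (1 / na + 1 / nb))


def permutation_p_value(a, b):
    """Two-sided P value from all reassignments of samples to the two groups."""
    values = list(a) + list(b)
    observed = abs(t_statistic(a, b))
    hits = total = 0
    for idx in combinations(range(len(values)), len(a)):
        chosen = set(idx)
        group_a = [values[i] for i in idx]
        group_b = [values[i] for i in range(len(values)) if i not in chosen]
        total += 1
        if abs(t_statistic(group_a, group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


def is_selected(a, b, alpha=0.02, min_fold=1.5):
    """Apply the P < 0.02 and > 1.5-fold criteria to log2 ratios (assumption)."""
    fold = 2 ** abs(mean(a) - mean(b))
    return permutation_p_value(a, b) < alpha and fold > min_fold


# Hypothetical log2 ratios for one gene in two patient groups.
group_1 = [1.0, 1.2, 0.9, 1.1, 1.0]
group_2 = [3.0, 3.3, 2.9, 3.1, 3.2]
print(permutation_p_value(group_1, group_2), is_selected(group_1, group_2))
```

With five samples per group there are 252 possible assignments, so the smallest attainable two-sided P value is 2/252; larger data sets would normally use random sampling of permutations instead of full enumeration.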
Quantitative Real-Time PCR.
Quantitative real-time PCR was performed in the presence of SYBR Green using the Lightcycler-DNA Master SYBR Green I kit from Roche (Mannheim, Germany). Comparison with housekeeping genes allows relative quantification of monitored genes in different cDNA samples. Briefly, 100 ng of mRNA were converted to cDNA in a total volume of 50 μl using the Superscript Reverse Transcription kit (Clontech). 1 μl of this mixture was used as template for PCR amplification. Thirty-five PCR cycles were performed as follows: 30 s denaturation at 94 °C; 30 s annealing at 65 °C; and 45 s for elongation at 72 °C. All reverse transcription (RT)-PCR reactions were performed on an ABI PRISM 7700 Detector (Perkin-Elmer/Applied Biosystems, Foster City, CA). All plates contained 60 different cDNAs, a dilution series of a plasmid for the gene of interest, and nontemplate controls. Gene-specific primers were used to amplify fragments of about 130 bp. All plates were run in duplicate, and average copy numbers were calculated from the duplicates. Copy numbers were normalised to beta-actin using the following primers for amplification: actin-up, 5'-TGTTTTCTGCGCAAGTTAGG-3'; and actin-do, 5'-GTCCACCTTCCAGCAGATGT-3'.
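The text does not spell out how copy numbers were derived from the plasmid dilution series. One common approach, sketched here purely as an assumption, is to fit a log-linear standard curve of threshold cycle (Ct) against log10 of plasmid copies, read sample copy numbers from that curve, average the duplicates, and divide by the beta-actin copy number. The Ct values and function names below are hypothetical.

```python
from math import log10


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain a copy number."""
    return 10 ** ((ct - intercept) / slope)


def relative_copy_number(gene_cts, actin_cts, gene_curve, actin_curve):
    """Average duplicate copy numbers and normalise the gene to beta-actin."""
    gene = sum(copies_from_ct(ct, *gene_curve) for ct in gene_cts) / len(gene_cts)
    actin = sum(copies_from_ct(ct, *actin_curve) for ct in actin_cts) / len(actin_cts)
    return gene / actin


# Hypothetical plasmid dilution series and Ct readings.
dilution_copies = [1e3, 1e4, 1e5, 1e6]
dilution_cts = [29.5, 26.0, 22.5, 19.0]
curve = fit_standard_curve(dilution_copies, dilution_cts)

# Hypothetical duplicate Ct values for a gene of interest and for beta-actin;
# the same curve is reused for both here only to keep the example short.
ratio = relative_copy_number([26.0, 26.0], [19.0, 19.0], curve, curve)
print(curve, ratio)
```

In practice each amplicon would have its own standard curve, since amplification efficiency differs between primer pairs.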
RESULTS
Histological Characterisation and RNA Extraction.
Before RNA preparation, tumor tissues were histologically assessed to ensure homogeneity and integrity of the tumor tissue and to confirm the histological typing of the tumor samples. Cryosections were taken and stained with H&E. The fraction of tumor cells (in most samples more than 50%), residual normal tissue cells, hyperplastic cells, and infiltrating leukocytes were determined (Table 2). Tumor samples with massive leukocyte infiltration and large necrotic areas were excluded from additional analysis (altogether 50% of tumor samples analysed). For quantification and quality control, all RNA preparations were subjected to capillary electrophoresis (Bioanalyzer 2100; Agilent Technologies) and PCR analyses for potential chromosomal DNA contamination using intron-specific primers for GAPDH. Thirty percent of all RNA preparations were eliminated because of degradation (28S/18S rRNA ratio less than 1.3 plus profile analysis of capillary electrophoresis) or contamination with rRNA (more than 10%) and/or chromosomal DNA (positive signal of GAPDH-intron-specific PCR). Additionally, each poly(A)+ preparation from individual tumor samples was analyzed for RNA degradation by performing a linear amplification of a Cla-oligo(dT)-primed cDNA aliquot (see "Materials and Methods"; Table 2). RNA samples meeting those stringent quality criteria are listed in Table 2 and were subsequently used for microarray experiments. Sixteen of the 20 breast cancer samples used represent pairs of primary tumors and corresponding lymph node metastases (Table 2).
Generation of Tumor-cDNA-Enriched Libraries.
As a major improvement over conventional SSH protocols, the aim was to generate longer cDNA fragments for subtractive hybridisation. A combination of six different restriction enzymes, each recognising a 6-base motif, was used instead of an enzyme recognising a 4-base motif, as described in the original protocol (Diatchenko et al., 1996). This led to an average cDNA length of 800 bp instead of a predicted average fragment size of 256 bp. The efficiency of adapter ligation and suppression PCR was found to be the same whether a 4-base cutter (RsaI) or the present set of 6-base cutters was used. As shown in Fig. 1, the present approach results in a considerable increase in the average length of cDNA fragments. A prominent band in Fig. 1 was identified by gel isolation and sequencing to be keratin 6A. This keratin 6A cDNA fragment was 850 bp long when the present combination of 6-base cutters was applied, but only 580 bp when RsaI was used. Furthermore, sequencing of several hundred of the present clones revealed an average length of about 800 bp. The enrichment for known tumor markers such as keratin 6A in the subtractive LSCC library provided evidence for successful subtractive hybridisation; a high proportion of sequenced clones of this library were found to be members of the cytokeratin protein family (Table 3). For a description of the libraries and the number of clones that were used for microarray experiments, see Table 1.
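The predicted fragment size of 256 bp for a 4-base cutter follows from a simple random-sequence model, sketched below. The model is an assumption made for illustration only: it supposes equal frequencies of the four bases, independent positions, and distinct recognition sites for each enzyme, and it ignores the finite length of real cDNAs. The function name is hypothetical.

```python
def expected_fragment_length(site_length, n_enzymes=1):
    """Mean spacing (bp) between cut sites in a random sequence.

    A given site of `site_length` bases occurs on average once every
    4 ** site_length bases; n_enzymes distinct sites of the same length
    cut n_enzymes times as often.
    """
    return 4 ** site_length / n_enzymes

four_cutter = expected_fragment_length(4)        # single 4-base cutter such as RsaI
six_cutters = expected_fragment_length(6, 6)     # six different 6-base cutters
```

Under these assumptions a single 4-base cutter gives the 256 bp quoted in the text, whereas six different 6-base cutters give roughly 683 bp, which is of the same order as the observed average of about 800 bp.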
Microarray Experiments, Statistics, and Data Confidence.
In total, 9253 clones were derived by SSH from seven different sources and 105 clones by RT-PCR cloning of individual genes that had been shown previously to be at least 6-fold up-regulated in colon tumors; 1682 sequence-verified and tumor-relevant genes were obtained from the I.M.A.G.E. consortium (Table 1). cDNA microarrays were produced from these 11,040 clones by robotic arraying onto poly-L-lysine-coated glass slides. Fifty selected tumor samples [11 lung adenocarcinomas (LACs), 11 LSCCs, 20 breast carcinomas, and 8 RCCs], 16 critical (vital) normal tissues (rectum, bone marrow, lymph node, skeletal muscle, small intestine, thymus, trachea, brain, heart, kidney, liver, lung, pancreas, spleen, stomach, and colon) and 6 noncritical normal tissues (uterus, breast, prostate, fetal brain, fetal lung, and placenta) were analyzed. As a reference probe in the two-color hybridisations performed, an equal mixture of the 16 critical normal tissues was used. Fluorescence scanning and image analysis resulted in ratios of Cy3/Cy5 intensities that were used for additional analyses. To allow comparison of individual hybridisations with each other, data were normalised before data mining. As the present chips comprised 11,040 clones from nine different sources (Table 1), the present clone collection was regarded as sufficiently diverse and balanced. Accordingly, a global normalisation was applied to the present data to balance the total intensities of Cy3 (green) and Cy5 (red) fluorescence by a linear transformation. By this procedure, ratios were divided by the ratio of the global Cy3/Cy5 intensities of each chip. Then, a statistical analysis of the confidence and extent of variation of the present data was performed (Fig. 2). For that purpose, one hybridisation was repeated four times under the same conditions.
More than 95% of all spots displayed coefficients of variation (coefficient of variation = SD as a percentage of the average) smaller than 37%, whereas more than 99% of all spots displayed coefficients of variation of less than 57%. Thus, with a probability of 99%, any Cy3/Cy5 ratio determined has a maximum error of 57%. Therefore, 2-fold changes are statistically significant for more than 99% of all spots. Values with higher intensities are more reliable, as indicated by a smaller coefficient of variation. Accordingly, spots that did not show intensities at least 2-fold higher than the background were excluded from subsequent analyses.
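The global normalisation, the coefficient of variation, and the background filter described above can be sketched as follows. This is an illustrative reading of the text rather than the original analysis code; the function names are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption.

```python
import statistics

def global_normalise(cy3, cy5):
    """Divide each spot's Cy3/Cy5 ratio by the chip-wide ratio of total intensities."""
    global_ratio = sum(cy3) / sum(cy5)
    return [(a / b) / global_ratio for a, b in zip(cy3, cy5)]

def coefficient_of_variation(replicates):
    """SD as a percentage of the mean, e.g. over four repeated hybridisations."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)

def above_background(intensity, background, min_fold=2.0):
    """Keep a spot only if its intensity is at least min_fold times the background."""
    return intensity >= min_fold * background

# Hypothetical three-spot chip.
ratios = global_normalise([200, 400, 600], [100, 100, 400])
cv = coefficient_of_variation([9, 11, 9, 11])
```

A spot whose four replicate ratios give a coefficient of variation below 57% cannot, under the stated error bound, turn an unchanged ratio of 1 into an apparent 2-fold change, which is the reasoning behind the 2-fold cut-off.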
Cluster Analysis.
On the basis of normalised and prefiltered data, first Nmax was determined for each gene, i.e., the highest expression value in any of the 16 critical normal tissues. All normalised expression values of the 11,040 genes in 22 normal tissues and 50 tumor samples are listed in Supplemental Data, Table 1. Subsequently, all genes were selected that showed at least 2-fold higher expression in at least 20% of samples of any tumor type compared to Nmax. Applying these restrictive criteria resulted in a list of 527 clones representing 130 different genes, 116 of which coded for proteins of known functions (summarised in Table 3). Gene-wise hierarchical clustering of these 527 clones clearly separated the different tumor types. This was not necessarily to be expected because these 527 clones were selected based on different expression between tumors and normal tissues but not on different expression between different tumor types. However, the extent of similarity of tumors from the same tissue of origin varied strongly. All 8 RCCs showed nearly identical expression profiles, whereas expression profiles of LACs were remarkably variable among different patients. Sample-wise clustering confirmed these data, as RCCs again clustered together most closely.
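The selection rule stated above (at least 2-fold above Nmax in at least 20% of the samples of any tumor type) can be written down compactly. The sketch below is a hedged illustration; the data layout (a dictionary of expression values per tumor type) and the function name are assumptions.

```python
def is_selected(tumor_values_by_type, critical_normal_values,
                min_fold=2.0, min_fraction=0.2):
    """Return True if a gene passes the Nmax-based selection criterion.

    Nmax is the highest expression value in any critical normal tissue.
    The gene is selected if, for at least one tumor type, at least
    min_fraction of that type's samples reach min_fold * Nmax.
    """
    threshold = min_fold * max(critical_normal_values)
    for values in tumor_values_by_type.values():
        if not values:
            continue
        hits = sum(1 for v in values if v >= threshold)
        if hits / len(values) >= min_fraction:
            return True
    return False

# Hypothetical gene: Nmax = 1.0, so the threshold is 2.0.
normals = [0.5, 1.0, 0.8]
selected = is_selected({"RCC": [2.5, 3.0, 0.9, 1.0, 1.1, 0.7, 0.8, 1.0],
                        "LAC": [1.0] * 11}, normals)
rejected = is_selected({"RCC": [2.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]}, normals)
```

In the first case 2 of 8 RCC samples (25%) exceed the threshold, so the gene is kept; in the second only 1 of 8 (12.5%) does, so it is dropped.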
Additional analysis of the 527 clones (Table 3) revealed that at least half of the clones selected as up-regulated in a given tumor type were derived from the corresponding subtractive library (Fig. 3); for example, 96% of the most highly up-regulated genes in LSCC were derived from the library based on subtraction of normal tissues from LSCC tissues. A very similar situation is found with the differentially regulated genes in LAC and RCC; >50% of these genes are derived from their corresponding subtractive libraries (Fig. 3). These data demonstrate the specificity of the subtractive libraries and underscore that the combination of subtractive hybridisation and cDNA microarray technology is a highly efficient way to identify differences in gene expression profiles. Furthermore, each clone of a given subtractive library is representative of the tumor tissue from which it is derived.
Quantitative Real-Time PCR.
To verify the results of microarray experiments, real-time PCR experiments were performed. Sixty of the total 72 RNA samples were subjected to reverse transcription and subsequent quantification by real-time PCR. Beta-actin was used as a reference because it exhibits the most constant expression level in all the tissues analyzed with microarrays. Fig. 4 shows a side-by-side comparison of expression levels measured by microarray analysis and real-time PCR for EGL nine homologue 3 (EGLN3), a gene that was found to be highly up-regulated in all RCCs. Real-time PCR and microarray analyses show nearly identical expression patterns. The absence of significant expression ("flagged spots") in microarray experiments (absent bars in Fig. 4A) correlates well with absence or low expression levels in real-time PCR (Fig. 4B). Overexpression of EGLN3 in RCCs was also confirmed by immunohistochemistry. Four additional genes, NDRG1, OSF-2, TP73L, and NAT1, were verified in the same way, all exhibiting high agreement between microarray and real-time PCR results. Linear regression analysis revealed R² values of 0.64 (EGLN3), 0.69 (NDRG1), 0.61 (OSF-2), 0.87 (TP73L), and 0.67 (NAT1). Thus far, two splice variants have been demonstrated for EGLN3 (formerly described as 9D7), which recently has been shown to be involved in the regulation of the hypoxia-inducible factor. By RT-PCR studies using RNA samples derived from normal tissues and tumor samples, it could be demonstrated that the shorter splice variant is dominant over the longer one in all tissues. In addition, the shorter splice variant is up-regulated in all RCCs and in half of the LSCCs.
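The R² values quoted above come from a linear regression of one measurement against the other. A minimal sketch of that calculation is given below; whether the original analysis used raw or log-transformed values is not stated, so the inputs here are simply paired numbers, and the function name is hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x.

    For a straight-line fit with intercept this equals the squared
    Pearson correlation coefficient.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical paired measurements (microarray ratio, PCR copy number).
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
weak = r_squared([1, 2, 3, 4], [1, 2, 1, 2])
```

A value of 1 indicates that the two methods agree perfectly up to a linear scaling; values of 0.6 to 0.9, as reported, indicate substantial but imperfect agreement.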
Survival Correlation.
Sixteen of the breast cancer samples represent 8 pairs of primary tumors and lymph node metastases. Three patients succumbed to their disease within 3 years after diagnosis, and 4 patients were still alive at least 9 years after diagnosis. This allowed the patients to be classified into two groups and a statistical filter criterion to be applied to select the genes that are most significantly correlated with overall survival. One patient had an overall survival of 7 years and was not included in this analysis. Starting with the expression values from the complete set of 11,040 clones, this approach resulted in a list of 45 clones that correlated best with overall survival, corresponding to 42 genes; 3 genes were represented twice (Table 4). Then, the expression profile of these 42 genes was subjected to two distinct unsupervised clustering methods: hierarchical clustering (Table 4) and principal component analysis. Both methods were able to correctly separate the samples into two distinct groups. Furthermore, the gene dendrogram clearly separates two groups of genes: those overexpressed in patients with short survival time and those down-regulated in patients with short survival. The former group includes cyclin B1, TGF-beta3, the transcription factors Erg2 and B-Myb, and the cell adhesion molecules VCAM-1 and CD44, whereas genes down-regulated in patients with short survival include MIG-6, Eps15, and CAK.
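The text does not name the statistical filter criterion that was applied. Purely as an illustration of how genes can be ranked by their association with a two-group split (short versus long survival), the sketch below uses Welch's t-statistic; this choice, the function names, and the gene labels are assumptions and do not describe the method actually used. It also treats samples as independent, which paired primary tumor and metastasis samples strictly are not.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))

def rank_genes(expression, short_idx, long_idx):
    """Order genes by |t| between short- and long-survival samples, largest first."""
    scores = {}
    for gene, values in expression.items():
        short = [values[i] for i in short_idx]
        long_ = [values[i] for i in long_idx]
        scores[gene] = welch_t(short, long_)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)

# Hypothetical data: samples 0-2 from short survivors, 3-5 from long survivors.
expression = {"geneA": [3, 4, 5, 1, 2, 3], "geneB": [1, 2, 3, 1, 2, 3.5]}
ranking = rank_genes(expression, [0, 1, 2], [3, 4, 5])
```

Genes at the top of such a ranking would then be passed to the unsupervised clustering step described in the text.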
DISCUSSION
Generation of a Large, Extensively Sequenced Clone Collection of Tumor-Specific Genes Optimised for cDNA Microarray Production.
Two powerful gene expression profiling technologies, SSH and cDNA microarrays, have been combined for the analysis of differential gene expression in human tumors. A combination of subtractive cDNA cloning and cDNA microarrays has previously been used successfully for the rapid identification of differentially expressed genes in estrogen receptor-positive and -negative breast cancer cell lines, in human prostate cancer, in LSCC, and in Ewing's sarcoma. In each case, the use of microarrays to screen the cDNA clones isolated by subtraction cloning resulted in an exceptionally high fraction of differentially expressed cDNA clones present on the arrays and allowed the identification of previously unknown genes. In each of these studies, a clone collection of several hundred cDNAs was generated from an individual type of tumor or cell line, as appropriate to each question addressed. As a major advance, according to the present invention, a large collection of about 9250 clones from seven subtractive cDNA libraries derived from four different tissues of origin was assembled (Table 1). As the majority of cDNA clones were exclusively derived by subtractive gene cloning of tumor versus normal, the library has been significantly enriched for tumor-specific gene expression. Because of its large size and diverse and balanced sources, it seems likely that most genes expressed in the tumor entities analyzed here are represented in this cDNA clone collection. This clone collection is the largest one derived by subtraction cloning published to date and one of the largest human tumor-specific cDNA libraries available. For the production of cDNA microarrays, 1682 individually selected I.M.A.G.E. clones with a known or suspected role in tumor formation were further added, as well as 105 individually cloned genes previously found to be at least 6-fold up-regulated in colon cancer versus normal colon.
For the preparation of the present subtractive libraries, a protocol for the generation of cDNA fragments with increased size was developed and applied. When following the original protocol, which relies on a restriction enzyme recognising only a 4-base motif such as RsaI, a high percentage of fragments was observed to be smaller than 50 bp. Accordingly, a set of 6-base recognising restriction enzymes was used, 3 with A/T-rich recognition sequences (primarily found at the 3'-end of eukaryotic cDNAs) and 3 with G/C-rich recognition sequences (characteristic of the 5'-termini of eukaryotic genes). As shown in Fig. 1, the present approach resulted in a considerable shift toward longer cDNA fragments; sequence analysis of several thousand clones revealed an increase in average length to about 800 bp. Such longer cDNA fragments are more favorable for cDNA microarrays: they warrant efficient hybridisation; minimise cross-hybridisation (a problem often observed with short probes); and facilitate annotation after sequencing. Furthermore, the present cDNA fragments typically are of sufficient length to be used directly in follow-up studies, e.g., involving translation of portions of the respective gene, circumventing the tedious recloning of longer cDNA fragments of the genes of interest.
Quality Control of RNA Samples and cDNA Microarrays.
Intermingled non-tumor cells, particularly residual normal cells and infiltrating leukocytes, may heavily interfere with tumor-specific transcription profiles. Therefore, tissue specimens were characterised before RNA isolation. Cryosections were taken to ensure integrity of tumor tissue, to confirm histological typing of tumor samples, and to assess the percentage of tumor cells, residual normal cells, hyperplastic cells, necrotic areas, and infiltrating leukocytes (Table 2). On the basis of these criteria, only 50% of tumor samples were used for RNA isolation. Next, RNA preparations were subjected to capillary electrophoresis (Bioanalyzer) for quality control and quantification, a method that is more accurate than standard photometric assays, particularly for low amounts of mRNA. On the basis of these studies together with data derived from PCR analyses, an additional 30% of RNA preparations were excluded from further analysis because of inadequate RNA quality or the presence of chromosomal DNA contamination, as determined by PCR analysis. This stringent selection procedure may be considered essential because it directly influences hybridisation efficiency, reproducibility, and statistical analyses of the results. To confirm the results of the present microarray experiments with an independent method, 5 randomly chosen genes (N-myc downstream regulated, OSF-2, TP73L, EGLN3, and NAT1; Table 3) were subjected to real-time RT-PCR. The expression levels quantified by real-time RT-PCR correlated highly with those determined by the present cDNA chip analysis, confirming the reliability of the present microarray experiments (Fig. 4). In these analyses, beta-actin was used as a reference gene because it was the gene with the least fluctuation in the present samples according to the chip data. In contrast, GAPDH (frequently used for normalisation) was found to be expressed at a much higher level in the liver than in any other tissue.
Tissue-Wide Expression Profiling.
In contrast to most, if not all, previously reported gene expression profiling studies of human cancer, the expression profiles of representative tumor samples were compared to those of 16 different critical normal tissues, including all normal tissues corresponding to the tumor samples analyzed (tissue-wide expression profile). Accordingly, an mRNA pool of 16 critical normal tissues was used as a reference probe. In addition, hybridisation experiments were performed not only with tumor samples but also individually with each of the 16 critical normal tissues of the reference pool plus 6 noncritical normal tissues. Application of restrictive criteria for gene selection resulted in characteristic differences of transcription profiles from those reported previously for solid tumors of the breast, lung, and kidney. For example, because of the extensive comparison to multiple normal tissues, expression signatures derived from infiltrating cells of the immune system such as immunoglobulin genes (B cells), T-cell receptor, CD3D (T cells), or lysozyme and chitinase 1 (macrophages/monocytes) could be eliminated from the present list of tumor-specific genes (Table 3). All those genes have been included as up-regulated in tumors in previous transcriptional profiling studies, e.g., for breast cancer. Importantly, this tissue-wide expression profiling approach allows the identification of those genes up-regulated in specific tumors with no or low expression in all 16 critical normal tissues tested, which is important for the development of a chemotherapy with less severe or no side effects and which is an absolute prerequisite for immune therapy. The latter approach aims at the induction of a systemic immune response against a tumor-specific/associated antigen (TAA); if the TAA is expressed not exclusively in tumors but also in critical normal tissues, it may also lead to a destructive autoimmune response generating severe side effects.
For example, prostaglandin D synthetase was found to be slightly up-regulated in breast cancer but most prominently expressed in several vital tissues such as heart (8.6-fold higher than the reference pool) and brain (6.2-fold higher). The nonreceptor tyrosine kinase Etk/Bmx, which was found to be up-regulated about 3-fold in some of the LAC and LSCC samples, has been reported to play an important role in prostate cancer progression and has been suggested as a novel target for chemotherapy in prostate cancer. However, the present tissue-wide expression profiling revealed a strong expression in heart, kidney, and skeletal muscle (6.1-, 5-, and 3-fold higher than in the reference pool, respectively), indicating that severe side effects would have to be expected upon use of Etk/Bmx as a novel therapeutic intervention site. Another example is apolipoprotein D, which has been correlated with malignant transformation and poor prognosis in prostate cancer patients. The present study revealed overexpression of apolipoprotein D only in breast cancer samples derived from patients with more than 9 years of overall survival. Again, strong expression was found in several critical normal tissues such as brain, heart, and trachea (6.8-, 4.9-, and 3.5-fold higher than the reference probe).
Identification of Tumor Type-Specific Genes.
Genes overexpressed in about 100% of the samples of a specific cancer type, such as vascular endothelial growth factor or insulin-like growth factor binding protein 3, were identified in RCC. Other genes were found to be up-regulated only in a subset of a given tumor type, such as stromelysin 3 or thrombospondin 2 in breast cancer. All those selected genes exhibit at least a 2-fold up-regulation in at least 20% of samples of any tumor type compared with the highest expression value in any of the 16 critical normal tissues, which makes them promising putative targets in an anticancer therapy (Table 3). Among the selected candidate genes, a number were identified that have been previously described as tumor markers, e.g., pronapsin A, a gene specifically up-regulated in LAC but absent in LSCC, or NAT-1, which is involved in detoxification and used as a potential breast cancer marker.
Carcinogenesis and Genes Involved in Ca2+ Homeostasis and Bone Matrix Mineralisation.
Serial analysis of gene expression (SAGE) of differentially expressed genes in non-small cell lung cancer and a comparable study of LSCC identified genes that overlap with many genes found in the present study, such as (a) tissue-specific genes, including keratin 6 isoforms, other cytokeratins that have been documented as potential markers for lung cancer, pemphigus vulgaris antigen, and annexin, and (b) tumor-specific genes such as parathyroid hormone-related peptide (PTHrP/PTHLH), which causes humoral hypercalcaemia associated with malignant tumors such as leukemia, RCC, prostate and breast cancer, and LSCC. Importantly, several differentially regulated genes that are known to be involved in Ca2+ homeostasis were identified; for example, RCN1, CALCA, and S100 proteins such as S100A10 and S100A11, a subgroup of the EF-hand Ca2+-binding protein family (Table 3). In metastatic cell lines, an altered intracellular localisation of S100 proteins has been demonstrated, supporting the hypothesis that S100 proteins might play a crucial role in the regulation of Ca2+ homeostasis in cancer cells. For example, S100A2 was found to be highly up-regulated in ovarian cancer together with other members of the S100 protein family, whereas an increase of S100A6 expression correlates with an increased malignancy in colon tumors. Although a more detailed analysis of the expression profile of members of the S100 protein family in tumors of different origin is needed, this protein class provides promising intervention sites for novel therapeutic strategies. Together with genes involved in Ca2+ homeostasis, genes involved in bone matrix mineralisation, such as osteonectin (SPARC and OSN), osteopontin (OPN and SPP1), and OSF-2 (Table 3), were identified in breast cancer but also in lung cancer and RCC.
Although osteoclasts and osteoblasts were not included in the present panel of normal tissues, the observation that specific tumor types not originating from bone express higher levels of these genes than 16 critical normal tissues is intriguing. Notably, the skeleton is the preferred target of metastatic human breast cancer cells. Bone metastases are indeed found in virtually all advanced breast cancer patients. The high osteotropism of breast cancer cells suggests that they exhibit a selective affinity for mineralised tissues. Mammary malignant cells are able to induce deposition of hydroxyapatite crystals within the primary tumor, supporting the hypothesis that they can generate a microenvironment that favors the crystallisation of calcium and phosphate ions into the bone-specific hydroxyapatite. The ectopic expression of bone matrix proteins in breast cancer could be involved in conferring osteotropic properties on circulating metastatic breast cancer cells and opens the possibility of therapeutic interference with microcalcification during the homing process of metastatic breast cancer cells. Interestingly, the osteoclast differentiation/activation factor osteoprotegerin ligand has been shown to be essential for normal mammary gland development and to be responsible for the calcium release from the skeleton that is required for transmission of maternal calcium to neonates in mammals. Therefore, normal cells of the mammary gland may already exhibit some properties of bone remodeling cells, a function that might be recruited/activated in breast tumor cells as well.
Gene Expression Profiles Predicting the Overall Survival of Breast Cancer Patients.
Tumors of lymph node-positive breast cancer patients with known clinical outcome were used to determine gene expression signatures predictive of long or short overall survival. Although only a small number of patients was analysed, the present results led to the identification of novel potential diagnostic marker genes. Furthermore, when taken together with other array studies, the present findings highlight the consistent associations of gene expression profiles with clinical outcome. Several genes found to be overexpressed in patients with short survival (Table 4) have already been discussed in the context of breast cancer, such as TGF-beta3, VCAM-1, CD44, thyroid hormone receptor, and cyclin B1, whereas others have not, such as ERG2, B-Myb, MTH1, and NET-1. Genes down-regulated relative to normal tissues in patients with short survival are, for example, MIG-6, Eps15, and APLP2. Interestingly, both MIG-6 and Eps15 are negative regulators of signaling via the epidermal growth factor receptor, a positive key regulator of breast tumorigenesis. Recently, van 't Veer et al. (Nature 415, 530-536 (2002)) reported on a set of 70 genes with an expression pattern by which breast cancer patients could be classified with high accuracy into those with a poor prognosis and those with a good prognosis. Although these 70 prognostic genes are largely nonidentical to the 42 genes identified here, many of them are functionally closely related and are involved, e.g., in cell cycle regulation, invasion and metastasis, angiogenesis, and signal transduction. Moreover, despite considerable differences in patient populations and technology platforms used, the study reported here independently arrives at the same general conclusions as van 't Veer et al. and van de Vijver et al. (NEJM 347, 1999-2009 (2002)).
Apparently, the ability to metastasise to distant sites, which eventually determines the overall survival, is acquired relatively early during multistep tumorigenesis and thus can be diagnosed in the primary tumor several years before these metastases become manifest. This ability to form hematogenous (distant) metastases appears to be largely independent of the presence or absence of lymph node metastases. All patients analyzed in the present study had lymph node metastases at the time of diagnosis; nevertheless, 4 of them remained free of distant metastases and disease relapse for at least 9 years of follow-up. An important clinical question is whether prognosis profiling is equally useful for all patients with breast cancer or whether it is limited to the specific subgroup(s) of patients for whom it has already been demonstrated. The fact that, despite considerable differences in study design, the results reported here support the major conclusions of van 't Veer et al. and van de Vijver et al. provides supporting evidence that an accurate prediction of clinical outcome based on gene expression profiling could be generally applicable to all breast cancer patients.
In summary, a modified PCR-based cDNA subtraction method allowed the establishment of seven SSH cDNA libraries that subsequently were used for the preparation of cDNA microarrays. Together with 50 samples derived from lung, breast, or renal cell cancer tissues, a panel of 22 samples from normal tissues was hybridised. This detailed tissue-wide expression profiling led to the identification of 130 individual tumor-specific transcripts (527 clones) showing no or very low expression in 16 vital normal tissues. Gene-wise hierarchical clustering of these 130 genes clearly separated the different tumor types. The majority of the identified genes have not yet been brought into context with tumorigenesis, such as genes involved in bone matrix mineralisation or genes controlling calcium homeostasis (RCN1, CALCA, and S100 protein family). Forty-two genes were identified that significantly correlated with the overall survival of breast cancer patients: genes up-regulated in tumors of patients with a poor prognosis, such as cyclin B1, TGF-beta3, B-Myb, and Erg2, and genes down-regulated, such as MIG-6, Eps15, and CAK.
Comparison of the prognosis marker genes of the prior art (van 't Veer et al. (Nature 415, 530-536, 2002)) with the markers according to the present invention:
One of the marker genes discriminating between a good and a poor prognosis identified by us and one identified by van 't Veer et al. (Nature 415, 530-536, 2002) were evaluated in an independent group of 39 breast cancer patients different from the ones in Figure 6 and by an independent method (quantitative real-time RT-PCR). The marker genes thus evaluated were cyclin B1, discovered as a marker gene discriminating between a good and a poor prognosis by the present invention (cyclin B1 is one of the 42 marker genes shown in Table 4), and cyclin B2, which was discovered as one of the 240 best marker genes discriminating between a good and a poor prognosis by van 't Veer et al. (Nature 415, 530-536, 2002). The expression of cyclin B1 and cyclin B2 was determined in each of the 39 breast tumors by quantitative real-time RT-PCR. Patients were then divided into two groups in three different ways: (1) the 77% of these 39 patients with the lowest cyclin B2 levels and the 23% of these patients with the highest cyclin B2 levels; (2) the 59% of these 39 patients with the lowest cyclin B1 levels and the 41% of these patients with the highest cyclin B1 levels; (3) the 79% of these 39 patients with the lowest cyclin B1 levels and the 21% of these patients with the highest cyclin B1 levels. For each group, overall survival curves were calculated, and the two groups of each division were compared with each other (Kaplan-Meier analysis; see Fig. 5). This analysis showed that patients with a low expression level of cyclin B2 on average have a longer overall survival than patients with a high expression level of cyclin B2 (p=0.09), and that patients with a low expression level of cyclin B1 on average have a longer overall survival than patients with a high expression level of cyclin B1 (p<0.05 if the 59% of patients with the lowest expression levels of cyclin B1 were considered, and p<0.01 if the 79% of patients with the lowest expression levels of cyclin B1 were considered).
These results demonstrate that cyclin B1, as one of the 42 marker genes shown in Table 4, remains a very good prognostic marker discriminating between breast cancer patients with a long or a short overall survival even (1) when used alone as a single marker; (2) in a group of patients independent of the one shown in Table 4; and (3) when measured with a different method (quantitative real-time RT-PCR). Furthermore, in this experimental setting, cyclin B1 is a better prognostic marker than cyclin B2, which is one of the 240 best marker genes discriminating between a good and a poor prognosis reported by van 't Veer et al. (Nature 415, 530-536, 2002).
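The grouping of patients by expression level and the calculation of overall survival curves can be sketched as follows. This is an illustrative implementation of the standard Kaplan-Meier product-limit estimator under stated assumptions, not the software used for Fig. 5; the function names and example data are hypothetical, and the significance test behind the reported p-values (typically a log-rank test) is not included.

```python
def split_by_expression(levels, low_fraction):
    """Return (low, high) lists of patient indices.

    `low` holds the low_fraction of patients with the lowest expression,
    `high` the remainder.
    """
    order = sorted(range(len(levels)), key=lambda i: levels[i])
    cut = round(low_fraction * len(levels))
    return order[:cut], order[cut:]

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    times:  follow-up time of each patient
    events: True if death was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical example: four patients, the second one censored at year 2.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
low, high = split_by_expression(list(range(39)), 0.59)
```

For 39 patients, a low fraction of 0.59 yields groups of 23 and 16 patients, matching the 59%/41% division described in the text.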
Tables:
Table 1
Figure imgf000050_0001
Table 2 (I):
Figure imgf000051_0001
Table 2 (II)
Figure imgf000052_0001
Table 3:
Tumor Type | Gene Name | AccNr | k/n | m
Breast Cancer | Secreted protein, acidic, cysteine-rich (osteonectin) (SPARC) | NM_003118 | 14/20 | 8
Breast Cancer | Collagen, type I, alpha 1 (COL1A1) | NM_000088 | 13/20 | 12
Breast Cancer | Collagen, type I, alpha 2 (COL1A2) | NM_000089 | 12/20 | 5
Breast Cancer | Collagen, type III, alpha 1 (COL3A1) | NM_000090 | 11/20 | 9
Breast Cancer | Fibronectin 1 (FN1), transcript variant 1 | NM_002026 | 10/20 | 8
Breast Cancer | N-acetyltransferase 1 (NAT1) | NM_000662 | 10/20 | 1
Breast Cancer | Osteoblast specific factor 2 (fasciclin I-like) (OSF-2) | NM_006475 | 9/20 | 1
Breast Cancer | 5T4 oncofetal trophoblast glycoprotein (5T4) | NM_006670 | 8/20 | 1
Breast Cancer | Thrombospondin 2 (THBS2) | NM_003247 | 8/21 | 1
Breast Cancer | KIAA0225 protein (KIAA0225) | D86978 | 8/22 | 1
Breast Cancer | Pre-B-cell leukemia transcription factor 1 (PBX1) | NM_002585 | 7/20 | 1
Breast Cancer | Collagen, type VI, alpha 3 (COL6A3) | NM_057167 | 7/20 | 7
Breast Cancer | Platelet-derived growth factor receptor, beta polypeptide (PDGFRB) | NM_002609 | 6/20 | 1
Breast Cancer | Similar to glucosamine-6-sulfatases (SULF2) | NM_018837 | 6/20 | 1
Breast Cancer | Matrix metalloproteinase 11 (stromelysin 3) (MMP11) | NM 0059<tO | 6/20 | 1
Breast Cancer | Interferon, alpha-inducible protein (G1P3), transcript variant 3 | NM_022873 | 6/20 | 3
Breast Cancer | Transducin (beta)-like 1 (TBL1) | NM_005647 | 6/20 | 1
Breast Cancer | Fer-1 (C. elegans)-like 3 (myoferlin) (FER1L3) | NM 013 51 | 6/20 | 1
Breast Cancer | Matrix metalloproteinase 13 (collagenase 3) (MMP13) | NM_002427 | 5/20 | 1
Breast Cancer | Cyclin D1 (PRAD1: parathyroid adenomatosis 1) (CCND1) | NM_053056 | 5/20 | 1
Breast Cancer | Melanophilin (MLPH) | NM_024101 | 5/20 | 1
Breast Cancer | Non-metastatic cells 1, protein (NM23A) | NM_000269 | 5/20 | 2
Breast Cancer | Chondroitin sulfate proteoglycan 2 (versican) (CSPG2) | NM_004385 | 5/20 | 3
Breast Cancer | Prolactin receptor (PRLR) | NM_000949 | 5/20 | 1
Breast Cancer | Small inducible cytokine subfamily A (Cys-Cys), member 19 (SCYA19) | NM_006274 | 5/20 | 1
Breast Cancer | Homo sapiens H3 histone, family 3B (H3F3B) | NM_005324 | 4/20 | 1
Breast Cancer | Stanniocalcin 2 (STC2) | NM_003714 | 4/20 | 1
Breast Cancer | Transcription factor AP-2 beta (TFAP2B) | NM_003221 | 4/20 | 1
Breast Cancer | X-box binding protein 1 (XBP1) | NM_005080 | 4/20 | 1
Breast Cancer | Cathepsin K (pycnodysostosis) (CTSK) | NM_000396 | 4/20 | 1
Breast Cancer | Protease, serine, 11 (IGF binding) (PRSS11) | NM_002775 | 4/20 | 1
Breast Cancer | Activated RNA polymerase II transcription cofactor (PC ) | NM_006713 | /20 | 1
Breast Cancer | Chromosome 1 open reading frame 29 (C1orf29) | NM_006820 | 4/20 | 1
Breast Cancer | Collagen, type V, alpha 1 (COL5A1) | NM_000093 | 4/20 | 1
Breast Cancer | Hypothetical protein IMPACT (IMPACT) | NM_018439 | 4/20 | 1
Breast Cancer | Melanoma differentiation associated protein-5 (MDA5) | NM_022168 | 4/20 | 1
Breast Cancer | Non-metastatic cells 2, protein (NM23B) | NM_002512 | 4/20 | 1
Breast Cancer | Plasminogen activator, urokinase (PLAU) | NM_002658 | 4/20 | 1
Breast Cancer | Bone marrow stromal cell antigen 2 (BST2) | NM_004335 | 4/20 | 1
Lung AC | Ornithine decarboxylase 1 (ODC1) | NM_002539 | 4/11 | 13
Lung AC | Surfactant, pulmonary-associated protein A2 (SFTPA2) | NM_006926 | 4/11 | 6
Lung AC | Transmembrane 4 superfamily member 1 (TM4SF1) | NM_014220 | 3/11 | 3
Lung AC | SHC (Src homology 2 domain-containing) transforming protein 1 (SHC1) | NM_003029 | 3/11 | 1
Lung AC | Solute carrier family 34 (sodium phosphate), member 2 (SLC34A2) | NM_006424 | 3/11 | 2
Lung AC | Chromosome 8 open reading frame 4 (C8orf4) | NM_020130 | 3/11 | 3
Lung AC | Prostaglandin-endoperoxide synthase 2 | NM_000963 | 3/11 | 1
Lung AC | Pronapsin A (NAP1) | NM_004851 | 3/11 | 1
Lung AC | Dual specificity phosphatase 6 (DUSP6) | NM_022652 | 3/11 | 2
Lung AC | Aspartate beta-hydroxylase (ASPH) | NM_032466 | 3/11 | 2
Lung AC | Chitinase 3-like 1 (cartilage glycoprotein-39) (CHI3L1) | NM_001276 | 3/11 | 1
Lung AC | Plasminogen activator, urokinase (PLAU) | NM_002658 | 3/11 | 6
Lung AC | Reticulocalbin 1, EF-hand calcium binding domain (RCN1) | NM_002901 | 2/11 | 1
Lung AC | Secreted protein, acidic, cysteine-rich (osteonectin) (SPARC) | NM_003118 | 2/11 | 2
Lung AC | Solute carrier family 2 (facilitated glucose transporter), member 1 (SLC2A1) | NM_006516 | 2/11 | 2
Lung AC | TAF4 RNA polymerase II, TATA box binding protein-associated factor (TAF4) | NM_003185 | 2/11 | 1
Lung AC | Trefoil factor 3 (intestinal) (TFF3) | NM_003226 | 2/11 | 2
Lung AC | Tyrosyl-tRNA synthetase (YARS) | NM_003680 | 2/11 | 1
Lung AC | Tissue factor pathway inhibitor 2 (TFPI2) | NM_006528 | 2/11 | 1
Lung AC | Collagen, type I, alpha 2 (COL1A2) | NM_000089 | 2/11 | 1
Lung AC | Calcitonin/calcitonin-related polypeptide, alpha (CALCA) | NM_001741 | 2/11 | 1
Lung AC | Calumenin (CALU) | NM_001219 | 2/11 | 1
Lung AC | Cytochrome P450, subfamily I (CYP1B1) | NM_000104 | 2/11 | 1
Lung AC | Chondroitin sulfate proteoglycan 2 (versican) (CSPG2) | NM_004385 | 2/11 | 1
Lung AC | Collagen, type III, alpha 1 (COL3A1) | NM_000090 | 2/11 | 1
Lung AC | Cystatin B (stefin B) (CSTB) | NM_000100 | 2/11 | 1
Lung AC | 5-hydroxytryptamine (serotonin) receptor 2B (HTR2B) | NM_000867 | 2/11 | 1
Lung AC | Epiregulin (EREG) | NM_001432 | 2/11 | 2
Lung AC | Keratin 6A (KRT6A) | NM_005554 | 2/11 | 41
Lung SCC | Keratin 6A (KRT6A) | NM_005554 | 9/11 | 80
Lung SCC | RAN, member RAS oncogene family (RAN) | NM_006325 | 8/11 | 1
Lung SCC | Neurotrophic tyrosine kinase, receptor, type 1 (NTRK1) | NM_002529 | 8/11 | 1
Lung SCC | Solute carrier family 2 (facilitated glucose transporter), member 1 (SLC2A1) | NM_006516 | 8/11 | 6
Lung SCC | Bullous pemphigoid antigen 1 (230/240kD) (BPAG1) | NM_001723 | 8/11 | 1
Lung SCC | S100 calcium-binding protein A2 (S100A2) | NM_005978 | 8/11 | 10
Lung SCC | CDNA FLJ33151 fis | AK057713 | 7/11 | 1
Lung SCC | Aldo-keto reductase family 1, member C3 (AKR1C3) | NM_003739 | 7/11 | 3
Lung SCC | Phosphoglycerate kinase 1 (PGK1) | NM_000291 | 7/11 | 1
Lung SCC | P53-induced protein PIGPC1 (PIGPC1) | NM_022121 | 7/11 | 8
Lung SCC | Collagen, type I, alpha 1 (COL1A1) | NM_000088 | 7/11 | 3
Lung SCC | N-myc downstream regulated (NDRG1) | NM_006096 | 7/11 | 4
Lung SCC | Sperm specific antigen 2 (SSFA2) | NM_006751 | 7/11 | 1
Lung SCC | Desmoplakin (DPI, DPII) (DSP) | NM_004415 | 7/11 | 1
Lung SCC | Tumor protein p73-like (TP73L) | NM_003722 | 6/11 | 2
Lung SCC | KIAA2019 protein (KIAA2019) | AB095939 | 6/11 | 1
Lung SCC | Keratin 6B (KRT6B) | NM_005555 | 6/11 | 3
Lung SCC | Tripartite motif-containing 29 (TRIM29) | NM_012101 | 6/11 | 1
Lung SCC | Aldo-keto reductase family 1, member B10 (aldose reductase) (AKR1B10) | NM_020299 | 6/11 | 1
Lung SCC | Glycoprotein (transmembrane) nmb (GPNMB) | NM_002510 | 5/11 | 3
Lung SCC | Keratin 5 (KRT5) | NM_000424 | 5/11 | 2
Lung SCC | Keratin 14 (KRT14) | NM_000526 | 5/11 | 2
Lung SCC | Osteoblast specific factor 2 (fasciclin I-like) (OSF-2) | NM_006475 | 5/11 | 1
Lung SCC | Hypothetical protein MGC5306 (MGC5306) | NM_024116 | 5/11 | 1
Lung SCC | Plasminogen activator, urokinase (PLAU) | NM_002658 | 5/11 | 6
Lung SCC | Translational activator GCN1 | U88836 | 5/11 | 1
Lung SCC | S100 calcium-binding protein A11 (calgizzarin) (S100A11) | NM_005620 | 5/11 | 2
Lung SCC | Parathyroid hormone-like hormone (PTHLH) | NM_002820 | 5/11 | 2
Lung SCC | Secreted protein, acidic, cysteine-rich (osteonectin) (SPARC) | NM_003118 | 5/11 | 3
Lung SCC | Disintegrin and metalloproteinase domain 9 (meltrin gamma) (ADAM9) | NM_003816 | 5/11 | 1
Lung SCC | Aldo-keto reductase family 1, member C1 (AKR1C1) | NM_001353 | 5/11 | 2
Lung SCC | Aldo-keto reductase family 1, member C2 (AKR1C2) | NM_001354 | 5/11 | 3
Lung SCC | Transmembrane protein vezatin (VEZATIN) | NM_017599 | 5/11 | 1
Lung SCC | Chromosome 18, clone RP11-650P15 | AC021549 | 5/11 | 3
Lung SCC | Solute carrier family 5 (sodium-dependent vitamin transporter), member 6 (SLC5A6) | NM_021095 | 4/11 | 1
Lung SCC | Peroxiredoxin 1 (PRDX1) | NM_002574 | 4/11 | 1
Lung SCC | Secreted phosphoprotein 1 (osteopontin) (SPP1) | NM_000582 | 4/11 | 2
Lung SCC | Aspartate beta-hydroxylase (ASPH) | NM_032466 | 4/11 | 2
Lung SCC | Laminin, gamma 2 (LAMC2) | NM_005562 | 4/11 | 2
Lung SCC | Claudin 1 (CLDN1) | NM_021101 | 4/11 | 1
Lung SCC | Annexin A1 (ANXA1) | NM_000700 | 4/11 | 1
Lung SCC | Collagen, type I, alpha 2 (COL1A2) | NM_000089 | 4/11 | 1
Lung SCC | Collagen, type III, alpha 1 (COL3A1) | NM_000090 | 4/11 | 6
Lung SCC | Cystatin B (stefin B) (CSTB) | NM_000100 | 4/11 | 1
Lung SCC | Desmoglein 3 (pemphigus vulgaris antigen) (DSG3) | NM_001944 | 4/11 | 1
Lung SCC | Dual specificity phosphatase 5 (DUSP5) | NM 004 19 | 4/11 | 1
Lung SCC | EGL nine (C. elegans) homolog 3 (EGLN3) | NM_022073 | 4/11 | 1
Lung SCC | Glyoxalase I (GLO1) | NM_006708 | 4/11 | 1
RCC | EGL nine (C. elegans) homolog 3 (EGLN3) | NM_022073 | 8/8 | 1
RCC | N-myc downstream regulated (NDRG1) | NM_006096 | 8/8 | 22
RCC | Vascular endothelial growth factor (VEGF) | NM_003376 | 8/8 | 20
RCC | Insulin-like growth factor binding protein 3 (IGFBP3) | NM_000598 | 8/8 | 32
RCC | Endothelial cell-specific molecule 1 (ESM1) | NM_007036 | 8/8 | 1
RCC | Met proto-oncogene (hepatocyte growth factor receptor) (MET) | NM 0002 5 | 7/8 | 7
RCC | Human DNA sequence from clone RP11-269F20 on chromosome 1 | AL591721 | 7/8 | 1
RCC | Regulator of G protein signalling 5 (RGS5) | NM_003617 | 6/8 | 16
RCC | Alpha glucosidase II alpha subunit (G2AN) | NM_014610 | 6/8 | 1
RCC | Interleukin enhancer binding factor 2, 45kD (ILF2) | NM_004515 | 6/8 | 1
RCC | Epidermal growth factor receptor (EGFR) | NM_005228 | 6/8 | 2
RCC | Transcription factor 19 (SC1) (TCF19) | NM_007109 | 6/8 | 1
RCC | Yes-associated protein 1, 65 kDa (YAP1) | NM_006106 | 6/8 | 1
RCC | Phosphodiesterase 1B, calmodulin-dependent (PDE1B) | NM_000924 | 5/8 | 1
RCC | MSTP032 protein (MSTP032) | NM_025226 | 5/8 | 11
RCC | Aldehyde dehydrogenase 1 family, member A3 (ALDH1A3) | NM_000693 | 5/8 | 1
RCC | Secreted protein, acidic, cysteine-rich (osteonectin) (SPARC) | NM_003118 | 4/8 | 6
RCC | Platelet-derived growth factor receptor, beta polypeptide (PDGFRB) | NM_002609 | 4/8 | 1
Interferon induced
protein with tetratncopeptide repeats 1 (IFIT1) NM 001548 4/8 1 Human DNA sequence from clone 596C15 on chromosome Xq23 AL031387 4/8 1 Cyclin D1 (PRAD1 parathyroid adenomatosis 1) (CCND1) NM 053056 4/8 1 Plasminogen activator, urokinase (PLAU) NM 002658 3/8 1 Phosphoglycerate kinase 1 (PGK1) NM 000291 3/8 4 Inhibin, beta B (activin AB beta polypeptide) (INHBB) NM 002193 3/8 2 Hypothetical protein FLJ13081 (FLJ13081) NM 024834 3/8 1 Heat shock protein 75 (TRAP1) NM 016292 3/8 1 BAC clone RP11-529E15 AC073218 3/8 1 Chromosome 1 open reading frame 8 (C1orf8) NM 004872 3/8 1 Transforming growth factor, alpha (TGFA) NM 003236 2/8 1 hypothetical protein DKFZp434F0318 (DKFZP434F0318) NM 030817 2/8 1 Transmembrane 4 superfamily member 1 (T 4SF1) NM 014220 2/8 1 CD27 binding (Siva) protein (SIVA) NM 006427 2/8 1 Integral, beta 1 (ITGB1) NM 002211 2/8 1 Transforming growth factor beta-induced, 68kD (TGFBI) NM 000358 2/8 4 Dlslntegrin and metalloproteinase domain 9 (meltnn gamma) (ADAM9) NM 003816 2/8 2 Tissue factor pathway Inhibitor 2 (TFPI2) NM 006528 2/8 1 Cytochrome P450 (CYP2J2) NM 000775 2/8 1 Fer-1 (C elegans)-like 3 (myoferlm) (FER1L3) NM 013451 2/8 1 Tumor necrosis factor receptor superfamily, member 21 (TNFRSF21) NM 014452 2/8 1 Table 4
Figure imgf000054_0001
Table 4 shows that gene expression correlates with long or short overall survival of breast cancer patients. The Table lists 42 genes with an expression profile most highly associated with the two survival groups (more than 9 versus less than 3 years), as selected by Student's t test (significance, P less than 0.02), accompanied by P-chance analysis, as described in "Materials and Methods," to eliminate false positives. Only genes with P less than P-chance were selected.
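The selection step described for Table 4 can be illustrated with a minimal sketch. It computes the pooled-variance (Student's) t statistic for each gene between the two survival groups and keeps genes whose P value is below 0.02. Two points are assumptions rather than the method of the application: the P value is obtained here by an exact permutation test instead of the tabulated t distribution (the Python standard library has no t distribution), and the P-chance filter is not reproduced because "Materials and Methods" is not part of this section. The function names, gene labels and patient identifiers are hypothetical.

```python
from itertools import combinations
from math import copysign, inf, sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance (Student's) two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        # Degenerate case: no within-group spread at all.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(pooled * (1 / na + 1 / nb))


def permutation_p_value(a, b):
    """Exact two-sided permutation P value for |t| (assumed stand-in for the t distribution)."""
    values = list(a) + list(b)
    observed = abs(t_statistic(a, b))
    indices = range(len(values))
    hits = total = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [values[i] for i in chosen]
        group_b = [values[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(t_statistic(group_a, group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


def select_genes(expression, long_survivors, short_survivors, p_cutoff=0.02):
    """Return genes whose expression differs between the survival groups at P < p_cutoff.

    expression maps gene -> {patient id -> expression value} (hypothetical layout).
    """
    selected = []
    for gene, by_patient in expression.items():
        a = [by_patient[p] for p in long_survivors]
        b = [by_patient[p] for p in short_survivors]
        if permutation_p_value(a, b) < p_cutoff:
            selected.append(gene)
    return selected
```

With five patients per group there are 252 possible relabellings, so the smallest attainable two-sided P value is 2/252, which is below the 0.02 cutoff; much smaller groups could never pass it under this permutation variant.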

Claims

1. A method for classifying an individual as having a good prognosis (survival more than 9 years after initial diagnosis) or a poor prognosis (survival less than 3 years after initial diagnosis), comprising detecting a difference in the expression of at least one gene of Table 4 in a cell sample taken from the individual relative to a control.
2. A method for classifying a cell sample as being a tumor cell comprising detecting a difference in the expression by said cell sample of at least one gene of Table 4 relative to at least one control cell and classifying the cell sample as a tumor cell, if the at least one gene of Table 4 shows at least 1.5-fold higher expression than the control cell.
3. The method of claim 1 or 2, wherein a difference in the expression of at least 2, preferably at least 3, more preferably at least 5, especially at least 10, genes of Table 4 is detected.
4. The method of any one of claims 1 to 3, wherein the cell sample is classified as a tumor cell, if the at least one gene of Table 4 shows at least 2-fold, preferably 3-fold, especially 5-fold higher expression than the control cell.
5. The method of any one of claims 1 to 4, wherein the tumor is selected from the group consisting of breast cancer (BC), lung squamous cell cancer (LSCC), lung adenocarcinoma (LAC) and renal cell cancer (RCC).
6. The method of any one of claims 1 to 5, wherein the expression of further tumor marker genes, for which an at least 2-fold higher expression has been verified for tumor cells, is detected and compared to a control cell.
7. A method according to any one of claims 1 to 6, wherein at least one of the genes with Acc. No. X77303, U07707, NM018948, NM003379, M17254, X14149, NM078467, X13293, NM000610 and NM031966 is examined with respect to a difference in expression relative to a control, especially for classifying an individual with a good prognosis.
8. A method according to any one of claims 1 to 7, wherein at least one of the genes with Acc. No. NM021238, NM000297, D38594, AL034384, NM002235, AC009433, X13293, NM000610, NM031966, NM020698 and AA627385 is examined with respect to a difference in expression relative to a control, especially for classifying an individual with a poor prognosis.
9. A tumor diagnostics microarray comprising at least one marker, preferably at least two markers, especially at least three markers, of Table 4.
10. The microarray of claim 9, comprising at least 5, preferably at least 10, especially at least 20, markers of Table 4.
11. A microarray for distinguishing cell samples from individuals having a good prognosis and cell samples from individuals having a poor prognosis, comprising a positionally-addressable array of polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality of polynucleotide probes of different nucleotide sequences, each of said different nucleotide sequences comprising a sequence complementary and hybridisable to a different nucleotide sequence, said plurality consisting of at least one, preferably at least two, especially at least three, of the genes corresponding to the markers listed in Table 4, wherein at least 50% of the probes on the microarray are present in said Table 4.
12. The use of a microarray according to any one of claims 9 to 11 for tumor diagnosis in a tissue or body fluid sample from an individual.
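The fold-change criterion of claims 2 to 4 can be sketched as follows. This is only an illustration under stated assumptions: expression levels are taken to be positive, already normalized values on a linear scale, the accession numbers used as keys are borrowed from claim 7 purely as labels, and the function names and the `min_genes` parameter (mirroring the gene counts of claim 3) are hypothetical.

```python
def overexpressed_markers(sample, control, markers, min_fold=1.5):
    """Return the markers whose sample/control expression ratio is at least min_fold.

    sample and control map a marker identifier to a positive expression level
    (assumed normalized, linear scale). Markers missing from either
    measurement, or with a non-positive control level, are skipped.
    """
    hits = []
    for marker in markers:
        if marker not in sample or marker not in control:
            continue
        if control[marker] <= 0:
            continue
        if sample[marker] / control[marker] >= min_fold:
            hits.append(marker)
    return hits


def classify_as_tumor(sample, control, markers, min_fold=1.5, min_genes=1):
    """Classify as tumor if at least min_genes markers reach the fold threshold."""
    return len(overexpressed_markers(sample, control, markers, min_fold)) >= min_genes
```

Raising `min_fold` to 2, 3 or 5 corresponds to the preferred thresholds of claim 4, and raising `min_genes` corresponds to requiring more markers as in claim 3; both make the call more conservative.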
PCT/EP2005/000858 2004-01-30 2005-01-28 A method for classifying a tumor cell sample based upon differential expression of at least two genes WO2005076005A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP04450020.5 2004-01-30
EP04450020 2004-01-30

Publications (2)

Publication Number Publication Date
WO2005076005A2 true WO2005076005A2 (en) 2005-08-18
WO2005076005A3 WO2005076005A3 (en) 2009-02-05

Family

ID=34833853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2005/000858 WO2005076005A2 (en) 2004-01-30 2005-01-28 A method for classifying a tumor cell sample based upon differential expression of at least two genes

Country Status (1)

Country Link
WO (1) WO2005076005A2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001074405A1 (en) * 2000-03-31 2001-10-11 Gene Logic, Inc. Gene expression profiles in esophageal tissue
WO2001094629A2 (en) * 2000-06-05 2001-12-13 Avalon Pharmaceuticals Cancer gene determination and therapeutic screening using signature gene sets

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
AMATSCHEK STEFAN ET AL: "Tissue-wide expression profiling using cDNA subtraction and microarrays to identify tumor-specific gene." CANCER RESEARCH, vol. 64, no. 3, 1 February 2004 (2004-02-01), pages 844-856, XP001206325 ISSN: 0008-5472 *
CHANG K ET AL: "FREQUENT EXPRESSION OF THE TUMOR ANTIGEN CAK1 IN SQUAMOUS-CELL CARCINOMAS" INTERNATIONAL JOURNAL OF CANCER, NEW YORK, NY, US, vol. 51, no. 4, 19 June 1992 (1992-06-19), pages 548-554, XP002038952 ISSN: 0020-7136 *
HUDELIST GERNOT ET AL: "Use of high-throughput protein array for profiling of differentially expressed proteins in normal and malignant breast tissue." BREAST CANCER RESEARCH AND TREATMENT. AUG 2004, vol. 86, no. 3, August 2004 (2004-08), pages 281-291, XP001206326 ISSN: 0167-6806 *
NEMOTO TETSUO ET AL: "Overexpression of protein tyrosine kinases in human esophageal cancer" PATHOBIOLOGY, vol. 65, no. 4, July 1997 (1997-07), pages 195-203, XP009048170 ISSN: 1015-2008 *
SÁNCHEZ-CARBAYO MARTA: "Use of high-throughput DNA microarrays to identify biomarkers for bladder cancer." CLINICAL CHEMISTRY. JAN 2003, vol. 49, no. 1, January 2003 (2003-01), pages 23-31, XP001206327 ISSN: 0009-9147 *
VAN HEEK N TJARDA ET AL: "Gene expression profiling identifies markers of ampullary adenocarcinoma." CANCER BIOLOGY & THERAPY. JUL 2004, vol. 3, no. 7, July 2004 (2004-07), pages 651-656, XP009048026 ISSN: 1538-4047 *
WU L ET AL: "MOLECULAR CLONING OF THE HUMAN CAK1 GENE ENCODING A CYCLIN-DEPENDENT KINASE-ACTIVATING KINASE" ONCOGENE, BASINGSTOKE, HANTS, GB, vol. 9, no. 7, 1 July 1994 (1994-07-01), pages 2089-2096, XP000673011 ISSN: 0950-9232 *

Also Published As

Publication number Publication date
WO2005076005A3 (en) 2009-02-05

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase in:

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase