WO2008103761A2

WO2008103761A2 - Methods and compositions for cancer diagnosis and treatment based on nucleic acid methylation

Info

Publication number: WO2008103761A2
Application number: PCT/US2008/054468
Authority: WO
Inventors: Mathias Ehrich; Dirk J. Van Den Boom
Original assignee: Sequenom, Inc.
Priority date: 2007-02-20
Filing date: 2008-02-20
Publication date: 2008-08-28
Also published as: WO2008103763A3; WO2008103763A2; WO2008103761A9

Abstract

Featured herein are compositions and methods for identifying a subject at risk of cancer, which comprise determining the methylation status of one or more nucleic acid target gene regions in a human nucleic acid sample, wherein the methylation status is associated with the occurrence of cancer. The present invention also provides evidence that PRC2 target methylation is present in multiple forms of cancer. In six out of seven tumor types, hypermethylated genes were found to have promoters that are enriched for PRC2 targets. Thus, the invention further includes nucleic acid target gene regions that comprise one or more PRC2 binding sites.

Description

METHODS AND COMPOSITIONS FOR CANCER DIAGNOSIS AND TREATMENT BASED ON NUCLEIC ACID METHYLATION

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 60/890,822 filed February 20, 2007, entitled "METHODS AND COMPOSITIONS FOR CANCER DIAGNOSIS AND TREATMENT BASED ON NUCLEIC ACID METHYLATION," naming Mathias Ehrich and Dirk van den Boom as inventors, and bearing attorney docket no. SEQ-6006-PV. This application also claims the benefit of U.S. Provisional Patent Application No. 60/890,825 filed February 20, 2007, entitled "METHODS AND COMPOSITIONS FOR CANCER DIAGNOSIS AND TREATMENT BASED ON NUCLEIC ACID METHYLATION," naming Mathias Ehrich, Dirk van den Boom and Aaron Scalia as inventors, and bearing attorney docket no. SEQ-6007-PV. This application is related to U.S. Patent Application No. 10/888,359 filed July 9, 2004, entitled "Methods and Compositions for Phenotype Identification Based on Nucleic Acid Methylation," naming Dirk van den Boom & Mathias Ehrich as inventors, and bearing attorney docket no. SEQ.001.P; and PCT Patent Application No. PCT/US2006/030256 filed August 2, 2006, entitled "Methods and Compositions for Disease Prognosis Based on Nucleic Acid Methylation," naming Dirk van den Boom & Mathias Ehrich as inventors, and bearing attorney docket no. SEQ-4098-PC. This application also is related to International Patent Application No. PCT/US2005/009929 filed March 24, 2005 (attorney docket no. SEQ-2080-PC), U.S. Patent Application No. 11/089,805 filed March 24, 2005 (attorney docket no. SEQ-2080-UT) and U.S. Provisional Patent Application No. 60/556,632 filed March 26, 2004 (attorney docket no. SEQ-2080-PV), each entitled "Base Specific Cleavage Of Methylation-Specific Amplification Products In Combination With Mass Analysis" and each naming Mathias Ehrich and Dirk van den Boom as inventors. These applications are related to subject matter in U.S. Application No. 10/272,665 filed October 15, 2002, entitled "Methods For Generating Databases And Databases For Identifying Polymorphic Genetic Markers" and naming Andreas Braun, Christian Jurinke and Dirk van den Boom as inventors. Each of the foregoing patent applications is incorporated herein by reference in its entirety in jurisdictions allowing incorporation by reference.

FIELD OF THE INVENTION

The present invention relates to diagnostic applications in the field of medicine and biotechnology. More specifically, the invention relates to methods and compositions for the diagnosis of cancer based on the methylation state of nucleic acids alone or in combination with other diagnostic methods.

BACKGROUND

Genetic information is stored not only in the sequential arrangement of nucleotide bases, but also in covalent modification of selected bases (see, e.g., Robertson et al., Nature Rev. Genet. 1:11-19 (2000)). One of these covalent modifications is methylation of cytosine nucleotides, particularly cytosines adjacent to guanine nucleotides in "CpG" dinucleotides. Covalent addition of methyl groups to cytosine within CpG dinucleotides is catalyzed by proteins from the DNA methyltransferase (DNMT) family (Amir et al., Nature Genet. 23:185-88 (1999); Okano et al., Cell 99:247-57 (1999)). In the human genome, CpG dinucleotides are generally underrepresented, and many of them occur in distinct areas called CpG islands. A large proportion of these CpG islands are found in the promoter regions of genes. The conversion of cytosine to 5-methylcytosine in promoter-associated CpG islands has been linked to changes in chromatin structure and often results in transcriptional silencing of the associated gene. Transcriptional silencing by DNA methylation has been linked to mammalian development, imprinting, X-chromosome inactivation, suppression of parasitic DNA and numerous cancer types (see, e.g., Li et al., Cell 69:915-26 (1992); Okano et al., Cell 99:247-57 (1999)).

DNA methylation has become increasingly important in cancer research. Methylation changes have been associated with the occurrence of cancer, and these changes may serve as markers for the early detection of neoplastic events (Costello et al., Nature Genet. 24:132-38 (2000)). The best-known methylation change is promoter-specific hypermethylation with correlated downregulation of the corresponding transcription product (Laird, Nat. Rev. Cancer 3:253-66 (2003)). Another frequent methylation change is global hypomethylation of repetitive elements within the genome (Rollins et al., Genome Res. 16:157-63 (2006)). Studying these changes together offers a powerful tool for cancer research, in terms of diagnostics, prognostics and treatment.

SUMMARY

In embryonic stem cells, a distinct set of developmental regulator genes is silenced through epigenetic modifications to maintain pluripotency. These genes have binding sites for the polycomb repressive complex 2 (PRC2) in their promoter regions. Subunits of the PRC2 complex are able to recruit DNA methyltransferases (DNMTs) and initiate de novo methylation. Provided herein is evidence that PRC2 target methylation is present in multiple forms of cancer. A cell line model was used for discovery, and a subset of the results was further validated using clinical specimens. In six out of seven tumor types, hypermethylated genes were found to have promoters that are enriched for PRC2 targets.
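The summary does not state which statistical test supports the statement that hypermethylated promoters are "enriched" for PRC2 targets. A common choice for this kind of overlap question is a one-sided hypergeometric test, and the minimal Python sketch below assumes that test. The function names and the gene counts in the usage lines are hypothetical and are not data from this application.

```python
from math import comb


def hypergeom_enrichment_p(n_total: int, n_prc2: int,
                           n_hyper: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value for PRC2-target enrichment.

    Probability of finding at least `n_overlap` PRC2 targets among `n_hyper`
    hypermethylated genes drawn at random from `n_total` assayed genes,
    `n_prc2` of which are PRC2 targets.
    """
    upper = min(n_prc2, n_hyper)
    tail = sum(
        comb(n_prc2, k) * comb(n_total - n_prc2, n_hyper - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_hyper)


def fold_enrichment(n_total: int, n_prc2: int,
                    n_hyper: int, n_overlap: int) -> float:
    """Observed overlap divided by the overlap expected by chance."""
    expected = n_hyper * n_prc2 / n_total
    return n_overlap / expected


# Hypothetical counts, for illustration only.
print(fold_enrichment(1000, 100, 50, 20))                # 4.0
print(hypergeom_enrichment_p(1000, 100, 50, 20) < 1e-6)  # True
```

Exact integer arithmetic via `math.comb` avoids the underflow a naive floating-point product could suffer for large gene sets.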

Thus, featured herein are methods for identifying a subject at risk of cancer and/or a risk of cancer in a subject, which comprise determining the methylation status of one or more nucleic acid target gene regions described herein in a human nucleic acid sample, wherein the methylation status is associated with the occurrence of cancer.

The invention in part provides a method for detecting the presence or absence of a cancer in a tissue or cell that correlates with changes in the methylation state of said tissue or cell. In an embodiment of the invention, the method comprises treating a nucleic acid sample from said tissue or cell with a reagent that modifies unmethylated cytosine to produce uracil; amplifying a nucleic acid target gene region using at least one primer that hybridizes to a strand of said nucleic acid target gene region, producing amplified nucleic acids; determining the nucleic acid target gene region sequence and methylation sites by sequence analysis of said amplified nucleic acids, thereby determining a characteristic methylation state; and comparing the percentage of methylated cytosine to unmethylated cytosine for each of said methylation sites of said characteristic methylation state of said sample from said tissue or cell nucleic acid to the percentage of methylated cytosine to unmethylated cytosine for each of said methylation sites of a healthy tissue or cell nucleic acid sample of the same type, thereby detecting the presence or absence of said cancer. In a related embodiment, the nucleic acid target gene region contains one or more PRC2 binding sites.

The invention in part provides a method of classifying the susceptibility of a tissue or cell to a cancer, wherein said cancer is correlated with changes in the methylation state of said tissue or cell. In an embodiment of the invention, the method comprises treating a nucleic acid sample from said tissue or cell with a reagent that modifies unmethylated cytosine to produce uracil; amplifying a nucleic acid target gene region using at least one primer that hybridizes to a strand of said nucleic acid target gene region, producing amplified nucleic acids; determining the nucleic acid target gene region sequence and methylation sites by sequence analysis of said amplified nucleic acids, thereby determining a characteristic methylation state; and comparing the percentage of methylated cytosine to unmethylated cytosine for each of said methylation sites of said characteristic methylation state of said sample from said tissue or cell nucleic acid to the percentage of methylated cytosine to unmethylated cytosine for each of said methylation sites of a healthy tissue or cell nucleic acid sample of the same type, thereby classifying the susceptibility of said tissue or cell to said cancer. In a related embodiment, the nucleic acid target gene region contains one or more PRC2 binding sites.

The methods described herein have been practiced using a novel approach for DNA methylation analysis. This approach employs MALDI-TOF mass spectrometry to overcome the limitations of previous large-scale methylation analysis methods. Using a combination of four base specific cleavage reactions, each CpG of a target region can be analyzed individually and is represented by multiple indicative mass signals. The acquired information about the methylation status of the examined region is therefore based on numerous independent observations. The redundancy of this information can be leveraged to achieve higher confidence in qualitative analysis, and to obtain highly accurate averages with small standard deviations in quantitative analysis. The present methods may be customized to meet individual needs in DNA methylation analysis, for example: discovery of methylation in large stretches of genomic DNA with a single cleavage reaction; methylation ratio analysis, in which fractions of methylated DNA as low as 5% may be detected in mixtures of methylated and non-methylated template; and methylation pattern analysis, in which the methylation status of each CpG within a target region can be determined as a group or independently. The general applicability of these methods has been demonstrated by reconstructing the described methylation sites for IGF2/H19 using cloned DNA as well as genomic DNA (see Examples 1-7). Semi-quantitative assessment of methylation in larger target regions spanning multiple CpG sites was also demonstrated, with methylation accurately analyzed down to ratios of approximately 5%. The large-scale analysis of methylation in AML is a first implementation of the method for quantitative assessment of methylation ratios in a high-throughput format to predict AML patient outcome.
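The comparison step of the detection method described above, combined with the point that each CpG is represented by several independent mass signals, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each CpG is taken to have several independent methylation-fraction estimates (for example, from different cleavage reactions), their mean is compared with a healthy reference of the same tissue type, and a site is flagged when the absolute difference exceeds a threshold. The 0.2 threshold, the function names and the input values are hypothetical.

```python
from statistics import mean, stdev


def summarize_site(observations: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of redundant methylation-fraction
    estimates (0.0 = unmethylated, 1.0 = fully methylated) for one CpG."""
    if len(observations) < 2:
        return observations[0], 0.0
    return mean(observations), stdev(observations)


def compare_to_reference(sample: dict[str, list[float]],
                         reference: dict[str, float],
                         threshold: float = 0.2) -> dict[str, float]:
    """Return CpG sites whose mean methylation differs from the healthy
    reference by more than `threshold` (hypothetical cutoff), mapped to the
    signed difference (positive = hypermethylated in the sample)."""
    flagged = {}
    for site, obs in sample.items():
        site_mean, _ = summarize_site(obs)
        diff = site_mean - reference[site]
        if abs(diff) > threshold:
            flagged[site] = round(diff, 3)
    return flagged


sample = {"CpG_1": [0.82, 0.78, 0.80], "CpG_2": [0.10, 0.12, 0.08]}
healthy = {"CpG_1": 0.15, "CpG_2": 0.11}
print(compare_to_reference(sample, healthy))  # {'CpG_1': 0.65}
```

The standard deviation returned by `summarize_site` is not used in the comparison here; it is kept to show how the redundancy of observations can also report the confidence of each per-site estimate.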

In some embodiments, the number of target gene regions is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 147, 150 or more.

In certain embodiments, the comparison of methylation states or characteristic methylation states is made by use of a classification algorithm.
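The application does not specify which classification algorithm is used. As one simple illustration, the Python sketch below assumes a nearest-centroid classifier: each class (here, hypothetical "normal" and "tumor" labels) is summarized by the mean methylation profile of its training samples, and a new profile is assigned to the class whose centroid is closest by Euclidean distance. All profiles are made-up values.

```python
from math import dist


def train_centroids(profiles: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Per-site mean methylation profile for each class label."""
    centroids = {}
    for label, samples in profiles.items():
        n = len(samples)
        # zip(*samples) groups the values of each CpG site across samples.
        centroids[label] = [sum(site) / n for site in zip(*samples)]
    return centroids


def classify(profile: list[float], centroids: dict[str, list[float]]) -> str:
    """Assign the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(profile, centroids[label]))


training = {
    "normal": [[0.10, 0.05, 0.20], [0.12, 0.08, 0.18]],
    "tumor": [[0.80, 0.70, 0.25], [0.75, 0.65, 0.22]],
}
centroids = train_centroids(training)
print(classify([0.70, 0.60, 0.21], centroids))  # tumor
```

Any other supervised classifier could take the place of the nearest-centroid rule; it is used here only because it is short and transparent.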

In particular embodiments, the reagent that modifies unmethylated cytosine to produce uracil is bisulfite. In certain embodiments, the methylated or unmethylated nucleic acid base is cytosine. In another embodiment, a non-bisulfite reagent modifies unmethylated cytosine to produce uracil.
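The effect of bisulfite treatment on a target sequence can be illustrated in silico. The Python sketch below assumes a simplified, complete conversion: every unmethylated cytosine reads as thymine after conversion and amplification (uracil is copied as thymine during PCR), while cytosines at positions listed as methylated are protected and retained. Real conversion can be incomplete; the sequence, positions and function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Sense strand as it reads after bisulfite treatment and PCR.

    Unmethylated C -> T (via uracil); a C at a 0-based index listed in
    `methylated_positions` (5-methylcytosine) stays C.
    """
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def cpg_positions(seq: str) -> list[int]:
    """0-based indices of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


target = "ACGTTCGACCA"
cpgs = cpg_positions(target)                 # [1, 5]
print(bisulfite_convert(target, set(cpgs)))  # ACGTTCGATTA (all CpGs methylated)
print(bisulfite_convert(target, set()))      # ATGTTTGATTA (fully unmethylated)
```

Non-CpG cytosines (indices 8 and 9 here) are converted in both cases, which is why only CpG positions carry methylation information after treatment.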

In selected embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent that modifies nucleotides of the target nucleic acid molecule as a function of the methylation state of the target nucleic acid molecule; amplifying the treated target nucleic acid molecule; fragmenting the amplified target nucleic acid molecule; detecting one or more amplified target nucleic acid molecule fragments; and, based upon the fragments (for example, their size and/or number), identifying the methylation state of a target nucleic acid molecule or of a nucleotide locus in the nucleic acid molecule, or identifying the nucleic acid molecule or a nucleotide locus therein as methylated or unmethylated.
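How the number of fragments can report methylation is illustrated below with a C-specific cleavage of the converted sense strand. In this hypothetical Python sketch, cleavage is modeled simply as cutting 3' of every occurrence of one base: after bisulfite conversion, a methylated CpG keeps its C and therefore a cleavage site, whereas an unmethylated CpG becomes TpG and loses it. The function names and sequence are assumptions, and the model ignores enzyme-specific details such as the terminal groups left by a real cleavage reagent.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Unmethylated C -> T; a C at a 0-based index in `methylated` is kept."""
    return "".join("T" if b == "C" and i not in methylated else b
                   for i, b in enumerate(seq))


def cleave_after(seq: str, base: str) -> list[str]:
    """Split `seq` immediately 3' of every occurrence of `base`."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


target = "AACGTTACGTTA"  # CpG cytosines at indices 2 and 7; no other C
meth = cleave_after(bisulfite_convert(target, {2, 7}), "C")
unmeth = cleave_after(bisulfite_convert(target, set()), "C")
print(meth)    # ['AAC', 'GTTAC', 'GTTA']
print(unmeth)  # ['AATGTTATGTTA']
```

Three fragments versus a single uncut molecule is the kind of count difference the paragraph above refers to; fragment sizes differ as well.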

Fragmentation can be performed, for example, by treating amplified products under base specific cleavage conditions. Detection of the fragments can be effected by measuring or detecting the mass of one or more amplified target nucleic acid molecule fragments, for example, by mass spectrometry such as MALDI-TOF mass spectrometry. Detection also can be effected, for example, by comparing the measured mass of one or more target nucleic acid molecule fragments to the measured mass of one or more reference nucleic acids, such as the measured mass of fragments of untreated nucleic acid molecules. In an exemplary method, the reagent modifies unmethylated nucleotides, and following modification, the resulting modified target is specifically amplified.
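Mass-based detection can be illustrated with the strand complementary to the converted sense strand: there, a methylated CpG reads CG and an unmethylated one reads CA, so each methylated CpG replaces an A with a G and makes the fragment about 16 Da heavier. The Python sketch below assumes approximate average RNA residue masses and ignores terminal groups, which are the same for both versions of a fragment and cancel in the comparison. The fragment sequences and function names are hypothetical.

```python
# Approximate average residue masses (Da) of RNA nucleotides. Terminal groups
# are ignored: they are identical for two versions of the same fragment and
# cancel when the versions are compared.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
G_A_SHIFT = RESIDUE_MASS["G"] - RESIDUE_MASS["A"]  # about 16.0 Da


def fragment_mass(fragment: str) -> float:
    """Approximate fragment mass as the sum of its residue masses."""
    return sum(RESIDUE_MASS[base] for base in fragment)


def count_methylated(observed_mass: float, unmethylated_mass: float) -> int:
    """Estimate how many CpGs in a fragment are methylated from its mass
    shift relative to the fully unmethylated version of the same fragment."""
    return round((observed_mass - unmethylated_mass) / G_A_SHIFT)


# Hypothetical fragment of the complementary strand: its single CpG reads
# CA if unmethylated and CG if methylated.
unmethylated = "GACAU"
methylated = "GACGU"
print(round(fragment_mass(methylated) - fragment_mass(unmethylated), 2))  # 16.0
print(count_methylated(fragment_mass(methylated), fragment_mass(unmethylated)))  # 1
```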

In some embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent that modifies a selected nucleotide as a function of the methylation state of the selected nucleotide to produce a different nucleotide; contacting the treated target nucleic acid molecule with a primer containing one or more nucleotides complementary to the selected nucleotide, or one or more nucleotides complementary to the different nucleotide; treating the contacted target nucleic acid molecule under nucleic acid synthesis conditions, whereby nucleotides are synthesized onto primers hybridized to the target nucleic acid molecule; treating the synthesized products under base specific cleavage conditions; and detecting the products of the cleavage treatment, where a target nucleic acid molecule containing one or more methylated or unmethylated selected nucleotides is determined according to the number of cleavage products or according to a comparison between one or more cleavage products and one or more references.

In certain embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent that modifies a selected nucleotide as a function of the methylation state of the selected nucleotide to produce a different nucleotide; amplifying the treated target nucleic acid molecule to form an amplification product; contacting the treated target nucleic acid molecule with a primer containing one or more nucleotides complementary to a nucleotide complementary to the selected nucleotide, or one or more nucleotides complementary to a nucleotide complementary to the different nucleotide; treating the contacted target nucleic acid molecule under nucleic acid synthesis conditions, whereby nucleotides are synthesized onto primers hybridized to the target nucleic acid molecule; treating the synthesized products under base specific cleavage conditions; and detecting the products of the cleavage treatment, where a target nucleic acid molecule containing one or more methylated or unmethylated selected nucleotides is determined according to the number of cleavage products or according to a comparison between one or more cleavage products and one or more references.

In some embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent selected from among a reagent that modifies an unmethylated selected nucleotide to produce a different nucleotide, and a reagent that modifies a methylated selected nucleotide to produce a different nucleotide; specifically amplifying the treated target nucleic acid molecule by a method selected from: (i) contacting the treated target nucleic acid molecule with a primer that specifically hybridizes to a target nucleic acid region containing one or more of the selected nucleotides or one or more of the different nucleotides, and treating the contacted target nucleic acid molecule under nucleic acid synthesis conditions, and (ii) amplifying the treated target nucleic acid molecule to form an amplification product, contacting the amplification product with a primer that specifically hybridizes to a target nucleic acid region containing one or more of the selected nucleotides, or one or more of the different nucleotides, and treating the contacted amplification product under nucleic acid synthesis conditions; treating the amplified products under base specific cleavage conditions; and detecting the products of the cleavage treatment, where a target nucleic acid molecule containing one or more methylated or unmethylated selected nucleotides is indicated by an observation selected from among: the presence of two or more cleavage products, the presence of only a single cleavage product, the presence of one or more cleavage products greater than the number of reference nucleic acid molecules, the presence of one or more cleavage products fewer than the number of reference nucleic acid molecules, the presence of the same number of cleavage products as reference nucleic acid molecules, a change in the mass of one or more cleavage products compared to a reference nucleic acid molecule mass, and one or more cleavage products that are the same mass as a reference nucleic acid molecule mass.

In certain embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent that modifies unmethylated cytosine to produce uracil; specifically amplifying the treated target nucleic acid molecule with a primer that contains one or more guanine nucleotides; base specifically cleaving the amplified products; and detecting the cleaved products, where the presence of two or more fragments indicates that the target nucleic acid molecule contains one or more methylated cytosines. Another example includes a method of identifying an unmethylated nucleic acid molecule, by treating a target nucleic acid molecule with a reagent that modifies unmethylated cytosine to produce uracil; specifically amplifying the treated target nucleic acid molecule with a primer that contains one or more adenine nucleotides; base specifically cleaving the amplified products; and detecting the cleaved products, where the presence of two or more fragments indicates that the target nucleic acid molecule contains one or more unmethylated cytosines.

In some embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent that modifies unmethylated cytosine to produce uracil; specifically amplifying the treated target nucleic acid molecule with a primer that contains one or more guanine nucleotides; base specifically cleaving the amplified products; and detecting the mass of the cleaved products, where a change in mass of one or more cleaved products compared to a reference mass indicates that a nucleotide locus in a target is methylated.
A similar exemplary method includes a method for identifying the nucleotide locus of an unmethylated nucleotide in a nucleic acid, by treating a target nucleic acid molecule with a reagent that modifies unmethylated cytosine to produce uracil; specifically amplifying the treated target nucleic acid molecule with a primer that contains one or more adenine nucleotides; base specifically cleaving the amplified products; and detecting the mass of the cleaved products, where a change in mass of one or more cleaved products compared to a reference mass indicates that a nucleotide locus in a target is unmethylated.

In certain embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule to deaminate unmethylated cytosine nucleotides; specifically amplifying the treated target nucleic acid molecule with a primer that specifically hybridizes to a pre-determined first region in the target nucleic acid molecule containing one or more cytosine nucleotides; base specifically cleaving the amplified products; and detecting the mass of the cleaved products, where a change in mass of one or more cleaved products compared to a reference mass indicates that a nucleotide locus in a second region in a target is methylated, and where the first region and second region do not overlap.

In some embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent that modifies unmethylated cytosine to produce uracil; specifically amplifying the treated target nucleic acid molecule with a primer that contains one or more guanine nucleotides; base specifically cleaving the amplified products; cleaving or simulating cleavage of a reference nucleic acid with the same cleavage reagent(s); detecting the mass of the cleaved products; determining differences in the mass signals between the target nucleic acid molecule fragments and the reference fragments; and determining a reduced set of sequence variation candidates from the differences in the mass signals, thereby determining sequence variations in the target compared to the reference nucleic acid, where methylation of a nucleotide locus is indicated by the nucleotide locus of a sequence variation.
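The comparative step described above (simulating cleavage of a reference, comparing mass signals and narrowing down the sequence-variation candidates) can be sketched in Python under strong simplifying assumptions. Here the candidate variations are restricted to the CpG sites of the reference (each either methylated, read as C, or unmethylated, read as T after conversion), cleavage is simulated as cutting 3' of every G, fragment masses use approximate DNA residue masses without terminal groups, and a candidate is kept only if its simulated masses match the observed ones within a tolerance. The brute-force search, the tolerance, the sequence and all function names are illustrative assumptions, not the algorithm of this application.

```python
from itertools import product

# Approximate average DNA residue masses (Da); terminal groups are ignored.
MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def convert(seq: str, methylated: set[int]) -> str:
    """Simplified bisulfite conversion: unmethylated C -> T."""
    return "".join("T" if b == "C" and i not in methylated else b
                   for i, b in enumerate(seq))


def fragment_masses(seq: str, base: str = "G") -> list[float]:
    """Simulate cleavage 3' of every `base`; return sorted fragment masses."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return sorted(round(sum(MASS[b] for b in f), 2) for f in fragments)


def candidate_patterns(reference: str, observed: list[float],
                       tol: float = 0.5) -> list[list[int]]:
    """Keep every CpG methylation pattern whose simulated masses match the
    observed masses within `tol` Da (a brute-force reduced candidate set)."""
    cpgs = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    observed = sorted(observed)
    candidates = []
    for states in product((False, True), repeat=len(cpgs)):
        methylated = {p for p, is_meth in zip(cpgs, states) if is_meth}
        simulated = fragment_masses(convert(reference, methylated))
        if len(simulated) == len(observed) and all(
                abs(s - o) <= tol for s, o in zip(simulated, observed)):
            candidates.append(sorted(methylated))
    return candidates


reference = "TACGATTCGAA"  # hypothetical; CpG cytosines at indices 2 and 7
observed = fragment_masses(convert(reference, {2}))  # stands in for measured peaks
print(candidate_patterns(reference, observed))  # [[2]]
```

Exhaustive enumeration grows as 2^n in the number of CpGs, so it only suits short regions; it is used here to keep the idea of "all candidates consistent with the observed masses" explicit.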
In another example of the methods, combinations and kits provided herein, a method, combination and kit is provided for identifying the nucleotide locus of a methylated nucleotide in a nucleic acid, by treating a target nucleic acid molecule with a reagent that modifies unmethylated cytosine to produce uracil; amplifying the treated target nucleic acid molecule to form a first amplification product; specifically amplifying the first amplification product with a primer that contains one or more cytosine nucleotides to form a second amplification product; base specifically cleaving the second amplification products; cleaving or simulating cleavage of a reference nucleic acid with the same cleavage reagent(s); detecting the mass of the cleaved products; determining differences in the mass signals between the target nucleic acid molecule fragments and the reference fragments; and determining a reduced set of sequence variation candidates from the differences in the mass signals and thereby determining sequence variations in the target compared to the reference nucleic acid, where methylation of a nucleotide locus is indicated by the nucleotide locus of a sequence variation. 
In certain embodiments, the methods for determining the methylation state of one or more target gene regions may include treating two or more different target nucleic acid molecules with a reagent that modifies a selected nucleotide as a function of the methylation state of the selected nucleotide to produce a different nucleotide; contacting the treated target nucleic acid molecules with a primer containing one or more nucleotides complementary to the selected nucleotide, or one or more nucleotides complementary to the different nucleotide; treating the contacted target nucleic acid molecules under nucleic acid synthesis conditions, whereby nucleotides are synthesized onto primers hybridized to the target nucleic acid molecules; treating the synthesized products under base specific cleavage conditions; and detecting the products of the cleavage treatment, where target nucleic acid molecules containing one or more methylated or unmethylated selected nucleotides are determined according to a comparison between one or more cleavage products and one or more references.

In some embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent that modifies a selected nucleotide as a function of the methylation state of the selected nucleotide to produce a different nucleotide; contacting the treated target nucleic acid molecule with a primer containing one or more nucleotides complementary to the selected nucleotide, or one or more nucleotides complementary to the different nucleotide; treating the contacted target nucleic acid molecule under nucleic acid synthesis conditions, whereby nucleotides are synthesized onto primers hybridized to the target nucleic acid molecule; treating the synthesized products under fragmentation conditions; and detecting the products of the fragmentation treatment by mass spectrometry, where a target nucleic acid molecule containing one or more methylated or unmethylated selected nucleotides is determined according to the number of fragmentation products or according to a comparison between one or more fragmentation products and one or more references.
Similarly, methods are provided for identifying one or more methylated or unmethylated nucleotides in a nucleic acid, by treating a target nucleic acid molecule with a reagent that modifies a selected nucleotide as a function of the methylation state of the selected nucleotide to produce a different nucleotide; contacting the treated target nucleic acid molecule with a blocking oligonucleotide containing one or more nucleotides complementary to the selected nucleotide, or one or more nucleotides complementary to the different nucleotide; treating the contacted target nucleic acid molecule under nucleic acid synthesis conditions, where nucleotide synthesis is inhibited when the blocking oligonucleotide is hybridized to a target nucleic acid molecule; treating the synthesized products under base specific cleavage conditions; and detecting the products of the cleavage treatment, where a target nucleic acid molecule containing one or more methylated or unmethylated selected nucleotides is determined according to the number of cleavage products or according to a comparison between one or more cleavage products and one or more references.

In certain embodiments, the methods for determining the methylation state of one or more target gene regions may include treating a target nucleic acid molecule with a reagent that modifies a selected nucleotide as a function of the methylation state of the selected nucleotide to produce a different nucleotide; contacting the target nucleic acid molecule with a cleavage reagent that selectively cleaves the target nucleic acid at a site containing one or more methylated selected nucleotides or one or more unmethylated selected nucleotides, or with a cleavage reagent that selectively cleaves the treated target nucleic acid at a site containing one or more selected nucleotides or one or more different nucleotides; treating the contacted target nucleic acid molecule under nucleic acid synthesis conditions, where a target nucleic acid molecule not cleaved is amplified; treating the amplified products under base specific cleavage conditions; and detecting the products of the cleavage treatment, where a target nucleic acid molecule containing one or more methylated or unmethylated selected nucleotides is determined according to the number of cleavage products or according to a comparison between one or more cleavage products and one or more references.
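The "only uncleaved templates are amplified" logic above can be illustrated with a methylation-sensitive restriction enzyme. The Python sketch assumes HpaII as the cleavage reagent, which is an illustrative choice because the application does not name an enzyme: HpaII recognizes CCGG and does not cut when the internal CpG cytosine is methylated, so a template survives digestion, and can be amplified, only if all of its CCGG sites are methylated. The sequence, positions and function names are hypothetical.

```python
def hpaii_sites(seq: str) -> list[int]:
    """0-based index of the internal C (the CpG cytosine) of each CCGG site."""
    return [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def survives_digestion(seq: str, methylated: set[int]) -> bool:
    """HpaII-like model: a site is cut unless its internal CpG cytosine is
    methylated; the template remains amplifiable only if no site is cut."""
    return all(site in methylated for site in hpaii_sites(seq))


template = "ATCCGGTTACCGGA"
print(hpaii_sites(template))                  # [3, 10]
print(survives_digestion(template, {3, 10}))  # True: amplifiable
print(survives_digestion(template, {3}))      # False: cut at index 10
```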
In some embodiments, the methods for determining the methylation state of one or more target gene regions may include contacting the target nucleic acid molecule with a primer and treating the contacted target nucleic acid molecule under nucleic acid synthesis conditions, where a strand complementary to the target nucleic acid molecule is synthesized; contacting the target nucleic acid-synthesized product duplex with a methyltransferase reagent whereby methylation in a CpG sequence of the target nucleic acid also is present in the complementary CpG sequence of the synthesized product; repeating the primer and methyltransferase reagent contacting steps to form a second synthesized product having the same sequence of nucleotides and methylation state of CpG nucleotides as present in the target nucleic acid molecule; treating synthesized products with a reagent that modifies a selected nucleotide as a function of the methylation state of the selected nucleotide to produce a different nucleotide; treating the reagent-treated products under base specific cleavage conditions; and detecting the products of the cleavage treatment, where a target nucleic acid molecule containing one or more methylated or unmethylated selected nucleotides is determined according to the number of cleavage products or according to a comparison between one or more cleavage products and one or more references.

In certain embodiments, the methods for determining the methylation state of one or more target gene regions may include identifying one or more methylated or unmethylated nucleotides in a nucleic acid, where the amplified products are cleaved by base specific cleavage conditions selected from chemical conditions, physical conditions, enzymatic base specific cleavage conditions, and combinations thereof. For example, the amplified products can be cleaved by an RNase, a DNase, an alkaline compound, piperidine formate, piperidine, dimethyl sulfate, hydrazine, sodium chloride, and combinations thereof.

In some embodiments, the methods for determining the methylation state of one or more target gene regions may include identifying one or more methylated or unmethylated nucleotides in a nucleic acid, where the amplifying step includes transcription. In such methods, the nucleoside triphosphates incorporated into the transcript can include three rNTPs and one dNTP. For example, the one dNTP can be selected from dCTP, dTTP, dATP and dGTP. In another example, the one dNTP can be selected from dCTP and dTTP, and the transcript can be cleaved by RNase A. In certain embodiments, the methods for determining the methylation state of one or more target gene regions may include identifying one or more methylated or unmethylated nucleotides in a nucleic acid, where the intensity of one or more sample measured masses is compared to the intensity of one or more reference masses. Similarly, also provided herein are methods of identifying one or more methylated or unmethylated nucleotides in a nucleic acid, where two or more nucleic acid samples are pooled, and the intensity of one or more sample measured masses is compared to the intensity of one or more reference masses. In such methods an incompletely converted target nucleic acid molecule can be distinguished from a methylated target nucleic acid molecule. 
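Two points in the paragraph above lend themselves to a short sketch: which bases remain cleavable by RNase A when one dNTP replaces its rNTP during transcription, and how relative peak intensities can be turned into a methylation ratio. The Python sketch below relies on the known specificity of RNase A (cleavage 3' of ribo-pyrimidines, C and U; deoxynucleotides are not cleaved) and assumes, as a simplification, that the methylated and unmethylated versions of a fragment ionize equally well, so the ratio is estimated as the methylated peak's share of the summed intensity. Function names and intensities are hypothetical.

```python
def rnase_a_cleavage_bases(dntp: str) -> set[str]:
    """Transcript bases after which RNase A can still cut when `dntp` is
    incorporated instead of the matching rNTP (dTTP replaces UTP)."""
    substituted = {"dCTP": "C", "dTTP": "U", "dATP": "A", "dGTP": "G"}[dntp]
    return {"C", "U"} - {substituted}


def methylation_ratio(i_methylated: float, i_unmethylated: float) -> float:
    """Fraction methylated, assuming equal ionization of both peak versions."""
    total = i_methylated + i_unmethylated
    if total <= 0:
        raise ValueError("no signal for either fragment version")
    return i_methylated / total


print(sorted(rnase_a_cleavage_bases("dCTP")))  # ['U']  (U-specific cleavage)
print(sorted(rnase_a_cleavage_bases("dTTP")))  # ['C']  (C-specific cleavage)
print(methylation_ratio(300.0, 700.0))         # 0.3
```

Substituting dATP or dGTP leaves both pyrimidines cleavable, which is why dCTP and dTTP are the substitutions that make RNase A base specific.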
In some embodiments, the methods for determining the methylation state of one or more target gene regions may be used for distinguishing between a false positive methylation specific amplification and a true methylation specific amplification, by, for example, treating a target nucleic acid molecule with a reagent that modifies an unmethylated selected nucleotide to produce a different nucleotide; contacting the treated target nucleic acid molecule with a methylation state specific primer complementary to a first target nucleic acid region containing one or more of the selected nucleotides; treating the contacted target nucleic acid molecule under nucleic acid synthesis conditions; treating the synthesized products under base specific cleavage conditions; and detecting the mass of the cleaved products, where: a change in mass of one or more cleaved products compared to a reference mass indicates that a nucleotide locus in a second region in a target is methylated, where the second region does not overlap with the first region, whereby presence of one or more methylated loci in the second region confirms true methylation specific amplification.
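The confirmation logic above reduces to checking whether methylation is detected outside the region covered by the methylation-specific primer: the primer itself enforces the methylated sequence within its binding region, so only methylation found elsewhere is independent evidence. A minimal Python sketch, with hypothetical coordinates (half-open intervals) and assuming the methylated loci have already been called from the mass analysis:

```python
def confirms_true_msp(primer_region: tuple[int, int],
                      methylated_loci: set[int]) -> bool:
    """True if at least one methylated CpG lies outside the primer-binding
    region [start, end), i.e. in a non-overlapping second region."""
    start, end = primer_region
    return any(not (start <= locus < end) for locus in methylated_loci)


print(confirms_true_msp((0, 20), {5, 12}))      # False: only primer-region CpGs
print(confirms_true_msp((0, 20), {5, 12, 48}))  # True: methylation at locus 48
```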

In certain embodiments, the methods for determining the methylation state of one or more target gene regions may be used for identifying methylated nucleotides and thereby identifying methylation patterns that can be correlated with a disease, disease outcome, or outcome of a treatment regimen. This may be done, for example, by (a) identifying methylated or unmethylated nucleotides, in accordance with any of the methods provided herein, in one or more nucleic acid molecules from one or more samples collected from one or more subjects having a known disease, disease outcome, or outcome of a treatment regimen; (b) identifying methylated or unmethylated nucleotides, in accordance with any of the methods provided herein, in one or more nucleic acid molecules from one or more samples collected from one or more normal subjects; and (c) identifying the differently methylated or unmethylated nucleotides between the one or more nucleic acid molecules of step (a) and the one or more nucleic acid molecules of step (b). The differently methylated or unmethylated nucleotides then identify methylation correlated with a disease, disease outcome, or outcome of a treatment regimen.

In some embodiments, the methods for determining the methylation state of one or more target gene regions may be used for diagnosing a disease, deciding upon a treatment regimen, or determining a disease outcome in a subject. This may be done, for example, by identifying one or more methylated or unmethylated nucleotides in one or more nucleic acid molecules from one or more samples collected from a subject, and comparing the methylated or unmethylated nucleotides in the one or more nucleic acid molecules with one or more reference nucleic acid molecules correlated with a known disease, disease outcome, or outcome of a treatment regimen. Methylated or unmethylated nucleotides that are the same as in the reference nucleic acid molecules then identify the disease, disease outcome, or outcome of a treatment regimen in the subject.
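The comparison of samples from subjects with a known disease against samples from normal subjects can be sketched as a per-site comparison of group means. This is an illustrative sketch only. The input layout (per-site lists of methylation fractions), the site IDs and the 0.2 cutoff are hypothetical choices. The source does not specify a statistical test, and a practical analysis would usually add one.

```python
from statistics import mean


def differential_sites(disease, normal, min_difference=0.2):
    """Compare per-site methylation between a disease group and a normal group.

    disease, normal: dict mapping a CpG site ID to a list of methylation fractions
    (0.0-1.0), one value per sample. min_difference is a hypothetical cutoff.
    Returns {site: mean(disease) - mean(normal)} for sites passing the cutoff;
    positive values indicate hypermethylation in the disease group.
    """
    result = {}
    for site in disease.keys() & normal.keys():  # only sites measured in both groups
        delta = mean(disease[site]) - mean(normal[site])
        if abs(delta) >= min_difference:
            result[site] = round(delta, 3)
    return result


disease = {"site_1": [0.80, 0.90, 0.70], "site_2": [0.10, 0.15, 0.05]}
normal = {"site_1": [0.10, 0.20, 0.15], "site_2": [0.12, 0.10, 0.08]}
print(differential_sites(disease, normal))  # {'site_1': 0.65}
```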
The methods, combinations and kits provided herein also can be used in deciding upon a treatment regimen, or determining a disease outcome in a subject, by, for example, identifying one or more methylated or unmethylated nucleotides in one or more nucleic acid molecules from one or more samples collected from a subject; and comparing the methylated or unmethylated nucleotides in the one or more nucleic acid molecules with one or more reference nucleic acid molecules correlated with a known disease, disease outcome, or outcome of a treatment regimen; whereby methylated or unmethylated nucleotides that are different from the reference nucleic acid molecules identify the disease, disease outcome, or outcome of a treatment regimen in the subject.
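The comparison against reference nucleic acid molecules in the two diagnostic embodiments above can be sketched as an agreement score between binary methylation calls. The agreement metric, the dictionary layout and the profile names are hypothetical, because the source does not prescribe how the comparison is scored. High agreement with a disease-correlated reference corresponds to the "same as the reference" reading. Low agreement with a normal reference corresponds to the "different from the reference" reading.

```python
def best_matching_reference(subject, references):
    """Return (label, agreement) for the reference profile that best matches the subject.

    subject: dict of CpG site ID -> True (methylated) or False (unmethylated).
    references: dict of profile label -> dict of CpG site ID -> bool.
    Agreement is the fraction of shared sites with identical calls (a hypothetical metric).
    """
    best_label, best_score = None, -1.0
    for label, reference in references.items():
        shared = subject.keys() & reference.keys()
        if not shared:
            continue  # no overlapping sites, nothing to compare
        score = sum(subject[site] == reference[site] for site in shared) / len(shared)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


subject = {"s1": True, "s2": True, "s3": False}
references = {
    "disease_profile": {"s1": True, "s2": True, "s3": True},
    "normal_profile": {"s1": False, "s2": False, "s3": False},
}
label, score = best_matching_reference(subject, references)
print(label, round(score, 2))  # disease_profile 0.67
```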

In certain embodiments, the methods for determining the methylation state of one or more target gene regions may be used in determining a methylation state at one or more nucleotide loci correlated with an allele, by, for example, pooling nucleic acid molecules containing a known allele; identifying one or more methylated or unmethylated nucleotide loci in the nucleic acid molecules containing the known allele; identifying the methylation state of the corresponding nucleotide loci in nucleic acid molecules that do not contain the allele; and comparing the methylation state of the nucleotide loci in allele-containing nucleic acid molecules to the methylation state of nucleotide loci in allele-lacking nucleic acid molecules, whereby differences in methylation state frequency at one or more loci identify the different loci as correlated with the allele. Similarly, the methods, combinations and kits provided herein can be used for determining an allele correlated with a methylation state at one or more nucleotide loci, by forming a first pool of nucleic acid molecules containing one or more known methylated or unmethylated nucleotide loci, which loci were identified in accordance with the methods provided herein; identifying the frequency at which one or more alleles are present in the pooled nucleic acid samples; identifying the allele frequency at which one or more alleles are present in a second pool of nucleic acid molecules having nucleotide loci with different methylation state relative to the first pooled nucleic acid molecules; and comparing the allelic frequency in the first pool of nucleic acid molecules to the allelic frequency in the second pool of nucleic acid molecules, whereby differences in allelic frequency identify the one or more loci as correlated with the allele.
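The pool comparison described above can be sketched with per-locus counts of methylated and unmethylated signal in each pool. The count layout, the locus IDs and the 0.25 threshold are hypothetical. A formal test, for example on the two proportions, could replace the fixed threshold.

```python
def methylation_frequency(counts):
    """counts: (methylated, unmethylated) observations at one locus in one pool."""
    methylated, unmethylated = counts
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no observations at this locus")
    return methylated / total


def allele_correlated_loci(with_allele, without_allele, min_difference=0.25):
    """Flag loci whose methylation frequency differs between the two pools.

    with_allele, without_allele: dict of locus ID -> (methylated, unmethylated) counts.
    min_difference is a hypothetical threshold.
    """
    flagged = {}
    for locus in with_allele.keys() & without_allele.keys():
        diff = (methylation_frequency(with_allele[locus])
                - methylation_frequency(without_allele[locus]))
        if abs(diff) >= min_difference:
            flagged[locus] = round(diff, 3)
    return flagged


pool_with = {"locus_1": (45, 5), "locus_2": (20, 30)}
pool_without = {"locus_1": (10, 40), "locus_2": (22, 28)}
print(allele_correlated_loci(pool_with, pool_without))  # {'locus_1': 0.7}
```

The reverse embodiment, which compares allele frequencies between two pools defined by methylation state, has the same shape. Pass allele counts in place of methylated/unmethylated counts.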

In some embodiments, the methods for determining the methylation state of one or more target gene regions may be used for determining the probable identity of one or more alleles. This may be done, for example, by identifying one or more methylated or unmethylated nucleotides in a nucleic acid molecule, and determining the frequency with which one or more alleles are present together with the one or more methylated or unmethylated nucleotides, whereby the probable identity of the allele is determined.

Also provided herein are combinations and kits for determining the methylation state of a target nucleic acid molecule. Kits can include a reagent that modifies one or more nucleotides of the target nucleic acid molecule as a function of the methylation state of the target nucleic acid molecule, one or more methylation-specific primers capable of specifically hybridizing to a treated target nucleic acid molecule, and one or more compounds capable of fragmenting an amplified target nucleic acid molecule. The one or more compounds capable of fragmenting amplified nucleic acid products can include an RNase, a DNase, an alkaline compound, piperidine formate, piperidine, dimethyl sulfate, hydrazine, sodium chloride, and combinations thereof. For example, kits provided herein can include one or more RNases.

In some embodiments, the methylation state is determined by mass spectrometry. In some embodiments, the methylation state is determined by any of the following:
- multiplexed hME assays
- fluorescence-based real-time PCR
- methylation-sensitive single nucleotide primer extension
- methylated CpG island amplification
- methylation-specific PCR
- restriction landmark genomic scanning
- methylation-sensitive representational difference analysis (MS-RDA)
- methylation-specific AP-PCR (MS-AP-PCR)
- methyl-CpG binding domain column/segregation of partly melted molecules (MBD/SPM)
- direct bisulfite sequencing
Specific methods for determining the methylation state may include combined bisulfite restriction analysis (COBRA), PyroMeth or MethyLight.

In some embodiments, the cancer diagnosis for the subject in the preceding embodiments is combined with another cancer diagnosis based on morphology, cytochemistry, immunophenotype, cytogenetics or other molecular techniques to provide a more accurate diagnosis for the subject. In a related embodiment, the molecular technique is a gene expression profile. In a further related embodiment, the gene expression profile consists of one or more target gene regions and/or genes regulated by one or more target gene regions.

In other embodiments, the invention provides a method for identifying a subject at risk of cancer and then prescribing to the subject a cancer detection procedure, prevention procedure and/or a treatment procedure.

In some embodiments, the method for determining the susceptibility to cancer of a subject further comprises administering a cancer treatment procedure or preventative procedure based upon the cancer diagnosis. In a further related embodiment, the cancer treatment is selected from the group consisting of administering a non-standard, non-aggressive or experimental chemotherapy agent, administering a novel therapy, administering palliative care, and combinations of the foregoing. A "novel therapy" as used herein refers to an investigational treatment (e.g., monoclonal antibodies, new consolidation chemotherapy regimens, multiple drug resistance inhibitors, biological modifier therapies, and demethylating agents). In another related embodiment, the cancer treatment is a standard cancer treatment course. Standard cancer treatments are well known in the art and often vary depending on the type of cancer. Typically, cancer treatment comprises radiation therapy, chemotherapy and/or surgery. In another related embodiment, a subject found to be at an increased risk for cancer may undergo more frequent cancer exams for the detection of cancer.

In certain embodiments, the methods described herein may be utilized to detect the presence or absence of a disease in a tissue or cell that correlates with changes in the methylation state of the tissue or cell. They may also be utilized to classify the susceptibility of a tissue or cell to a disease where the disease is correlated with changes in the methylation state of the tissue or cell. In another embodiment, the methods described herein may be utilized for the early detection of cancer before the cancer is otherwise detectable by current diagnostic methods known in the art. For example, the methods described herein may be utilized to detect an altered methylation state associated with the presence of an oncogenic event before physical indicators manifest (e.g., the presence of a detectable tumor).

In some embodiments, the nucleic acid target gene is one or more of ACSL6, ATP10A, BCL6, BCR, CA3, CCND1, CCND2, CD38, CDKN1C, CHGA, COL5A1, EGFR, ESR1, FLI1, FLJ32447, FLT3, FLT4, FRAT1, GABRB3, GAS7, GNAS, GPC3, HEAB, HIST1H4I, HOXA1, HOXA10, HOXA2, HOXA6, HOXA7, HOXA9, HOXC13, IL10RA, IRF4, KIT, LCK, LEP, LMO2, LYL1, MLLT6, MYH11, MYOD1, NFKB2, OLIG2, PAX5, PAX7, PAX8, PEG3, PHOX2B, PLOD1, PON1, PSIP1, PTPRN2, REL, RET, RUNX1T1, SBDS, SET, SLC22A3, SLC38A4, SNRPN, TAL1, TCL1A, TLX1, TLX3, TMPRSS2, TSPYL5, WDR66, WT1, ZIM2, ZNF198 or ZNF331, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene comprises one or more PRC2 binding sites, wherein the nucleic acid target gene is one or more of CA3, CD38, EGFR, ESR1, FLI1, FLJ32447, FLT3, FRAT1, GAS7, GNAS, GPC3, HIST1H4I, HOXA1, HOXA10, HOXA2, HOXA6, HOXA7, HOXA9, HOXC13, IL10RA, IRF4, MYOD1, OLIG2, PAX5, PAX7, PAX8, PHOX2B, PTPRN2, SLC22A3, TAL1, TLX1, TLX3 or WT1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

In a related embodiment, the disease state is cancer. In a further related embodiment, the cancer is present in a blood, breast, CNS, colon, lung, ovarian, prostate, renal, or skin cell or tissue. As used herein, a target gene may be associated with one or more cancers. See Tables 6 and 7. For example, the following gene targets were found to be hypermethylated in 5 cancer types: TSPYL5, PAX8, LEP, PHOX2B and TMPRSS2. MYOD1 was found to be hypermethylated in 6 cancer types, and PAX5 in 8 cancer types. When the cancer is cancer of the blood, it is preferably a blood-related carcinoma. In one embodiment, the blood-related carcinoma is myeloid leukemia, acute myeloid leukemia (AML), chronic myeloid leukemia (CML), acute lymphoblastic leukemia (ALL), chronic lymphocytic leukemia (CLL), blood myeloproliferative diseases, blood multiple myeloma, blood myelodysplastic syndrome, Hodgkin's disease or non-Hodgkin's lymphoma. The hematologic cancer often is acute myeloid leukemia. In one embodiment, the nucleic acid target gene associated with a blood-related carcinoma is one or more of CCND1, GNAS, MYOD1 or TSPYL5, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene associated with a blood-related carcinoma comprises one or more PRC2 binding sites and is either GNAS or MYOD1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

When the cancer is found in breast cells, it is preferably a breast carcinoma. Breast cancer is typically described as the uncontrolled growth of malignant breast tissue. Breast cancers arise most commonly in the lining of the milk ducts of the breast (ductal carcinoma) or in the lobules where breast milk is produced (lobular carcinoma). Other forms of breast cancer include inflammatory breast cancer and recurrent breast cancer. In one embodiment, the nucleic acid target gene associated with a breast carcinoma is one or more of GNAS, HOXA6, LEP, LYL1, MYOD1, PAX5, PAX8, PTPRN2, TSPYL5 or ZIM2, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene associated with a breast carcinoma comprises one or more PRC2 binding sites and is one or more of GNAS, HOXA6, MYOD1, PAX5, PAX8 or PTPRN2, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

When the cancer is a cancer of the nervous system, it is preferably a glioma. In one embodiment, the glioma is skull osteoma, skull hemangioma, skull granuloma, skull xanthoma, skull osteitis deformans, meningioma, meningiosarcoma, gliomatosis, brain astrocytoma, brain medulloblastoma, brain glioma, brain ependymoma, brain germinoma, brain glioblastoma multiforme, brain oligodendroglioma, brain schwannoma, brain retinoblastoma, brain congenital tumors, spinal cord neurofibroma, spinal cord meningioma, spinal cord glioma, or spinal cord sarcoma. In one embodiment, the nucleic acid target gene associated with a glioma is one or more of BCR, HOXA2, HOXA6, HOXA9, IRF4, LCK, LEP, LYL1, MLLT6, MYOD1, PAX5, PAX8, PHOX2B, SET, SLC22A3, SLC38A4, TLX3, TMPRSS2, WDR66 or ZNF198, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.
In a related embodiment, the nucleic acid target gene associated with a glioma comprises one or more PRC2 binding sites and is one or more of HOXA2, HOXA6, HOXA9, IRF4, MYOD1, PAX5, PAX8, PHOX2B, SLC22A3, SLC38A4 or TLX3, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

When the cancer is a cancer of the colon, it is preferably a colorectal carcinoma. In one embodiment, the nucleic acid target gene associated with a colorectal carcinoma is one or more of ACSL6, ATP10A, CA3, CD38, CDKN1C, CHGA, COL5A1, EGFR, ESR1, FLI1, FLJ32447, FLT3, FLT4, GABRB3, GNAS, GPC3, HOXA2, HOXC13, IRF4, LEP, LMO2, MYH11, MYOD1, OLIG2, PAX5, PAX7, PAX8, PEG3, PHOX2B, PON1, PTPRN2, RET, RUNX1T1, SNRPN, TAL1, TCL1A, TLX3, TSPYL5, WT1, ZIM2 or ZNF331, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene associated with a colorectal carcinoma comprises one or more PRC2 binding sites and is one or more of CA3, CD38, EGFR, ESR1, FLI1, FLJ32447, FLT3, GNAS, GPC3, HOXA2, HOXC13, IRF4, MYOD1, OLIG2, PAX5, PAX7, PAX8, PHOX2B, PTPRN2, TAL1, TLX3 or WT1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

When the cancer is a lung cancer, it is preferably a lung carcinoma. In one embodiment, the lung carcinoma is lung squamous cell carcinoma, lung undifferentiated small cell carcinoma, lung undifferentiated large cell carcinoma, lung adenocarcinoma, alveolar carcinoma, bronchial adenoma, lung sarcoma, lung lymphoma, lung chondromatous hamartoma, lung bronchoalveolar carcinoma or lung mesothelioma. In one embodiment, the nucleic acid target gene associated with a lung carcinoma is one or more of BCR, CA3, CCND2, FLI1, FLJ32447, GPC3, HIST1H4I, HOXA10, HOXA6, HOXA7, HOXA9, MYOD1, OLIG2, PAX5, PAX7, PAX8, PHOX2B, PLOD1, PON1, PTPRN2, SBDS, SLC38A4, TCL1A, TLX3, TMPRSS2 or WT1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene associated with a lung carcinoma comprises one or more PRC2 binding sites and is one or more of CA3, FLI1, FLJ32447, GPC3, HIST1H4I, HOXA10, HOXA6, HOXA7, HOXA9, MYOD1, OLIG2, PAX5, PAX7, PAX8, PHOX2B, PTPRN2, TLX3 or WT1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

When the cancer is an ovarian cancer, it is preferably an ovarian carcinoma. In one embodiment, the ovarian carcinoma is a uterus endometrial carcinoma, cervical carcinoma, pre-tumor cervical dysplasia, ovarian carcinoma, serous cystadenocarcinoma, mucinous cystadenocarcinoma, unclassified ovarian carcinoma, ovarian granulosa-thecal cell tumors, Sertoli-Leydig tumors, ovarian dysgerminoma, ovarian malignant teratoma, vulva squamous cell carcinoma, vulva intraepithelial carcinoma, vulva adenocarcinoma, vulva fibrosarcoma, vulva melanoma, vagina clear cell carcinoma, vagina squamous cell carcinoma, botryoid sarcoma, or fallopian tube carcinoma.
In one embodiment, the nucleic acid target gene associated with an ovarian carcinoma is one or more of BCL6, CA3, FLJ32447, GNAS, GPC3, HOXA10, HOXA7, HOXA9, IRF4, LEP, LYL1, PAX5, PAX7, PHOX2B, SBDS, TMPRSS2, WT1 or ZIM2, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene associated with an ovarian carcinoma comprises one or more PRC2 binding sites and is one or more of CA3, FLJ32447, GNAS, GPC3, HOXA10, HOXA7, HOXA9, IRF4, PAX5, PAX7, PHOX2B or WT1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

When the cancer is a cancer of the prostate, it is preferably a prostate carcinoma. In one embodiment, the nucleic acid target gene associated with a prostate carcinoma is one or more of CDKN1C, HIST1H4I, HOXA1, IL10RA, NFKB2, PAX5 or PAX8, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene associated with a prostate carcinoma comprises one or more PRC2 binding sites and is one or more of HIST1H4I, HOXA1, IL10RA, PAX5 or PAX8, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

When the cancer is a cancer of the kidney, it is preferably a renal cell carcinoma. In one embodiment, the nucleic acid target gene associated with renal cell carcinoma is one or more of BCR, CA3, CDKN1C, CHGA, FLJ32447, FRAT1, GAS7, HOXC13, LEP, LYL1, MYOD1, PAX5, PAX7, PEG3, PHOX2B, PON1, RET, SLC22A3, TLX3, TMPRSS2, TSPYL5 or WT1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene associated with a renal cell carcinoma comprises one or more PRC2 binding sites and is one or more of CA3, FLJ32447, FRAT1, GAS7, HOXC13, MYOD1, PAX5, PAX7, PHOX2B, SLC22A3, TLX3 or WT1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.

When the cancer is a skin cancer, it is preferably a melanoma, skin basal cell carcinoma, skin squamous cell carcinoma, skin Kaposi's sarcoma, skin moles, dysplastic nevi, skin lipoma, skin angioma, dermatofibroma, skin keloids or psoriasis. In one embodiment, the nucleic acid target gene associated with a melanoma is one or more of BCR, CCND2, FLI1, FRAT1, HEAB, HOXA10, HOXA2, HOXA6, HOXA7, HOXA9, KIT, PAX5, PLOD1, PON1, PSIP1, REL, SBDS, TLX1, TMPRSS2, TSPYL5 or WDR66, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer. In a related embodiment, the nucleic acid target gene associated with melanoma comprises one or more PRC2 binding sites and is one or more of FLI1, FRAT1, HOXA10, HOXA2, HOXA6, HOXA7, HOXA9, PAX5 or TLX1, wherein hypermethylation of the nucleic acid target gene is associated with the occurrence of cancer.
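The per-cancer gene lists above can be tallied to cross-check the counts stated for multi-cancer targets: PAX5 in 8 cancer types, MYOD1 in 6, and TSPYL5, PAX8, LEP, PHOX2B and TMPRSS2 in 5. The sketch below transcribes the hypermethylated-gene lists from these embodiments, with OCR-garbled symbols such as "MYODl" normalized to HGNC-style names (MYOD1). The dictionary keys are informal labels chosen for the sketch.

```python
from collections import Counter

# Hypermethylated target genes per cancer type, transcribed from the embodiments above.
HYPERMETHYLATED = {
    "blood": "CCND1 GNAS MYOD1 TSPYL5",
    "breast": "GNAS HOXA6 LEP LYL1 MYOD1 PAX5 PAX8 PTPRN2 TSPYL5 ZIM2",
    "CNS": "BCR HOXA2 HOXA6 HOXA9 IRF4 LCK LEP LYL1 MLLT6 MYOD1 PAX5 PAX8 "
           "PHOX2B SET SLC22A3 SLC38A4 TLX3 TMPRSS2 WDR66 ZNF198",
    "colon": "ACSL6 ATP10A CA3 CD38 CDKN1C CHGA COL5A1 EGFR ESR1 FLI1 FLJ32447 "
             "FLT3 FLT4 GABRB3 GNAS GPC3 HOXA2 HOXC13 IRF4 LEP LMO2 MYH11 MYOD1 "
             "OLIG2 PAX5 PAX7 PAX8 PEG3 PHOX2B PON1 PTPRN2 RET RUNX1T1 SNRPN TAL1 "
             "TCL1A TLX3 TSPYL5 WT1 ZIM2 ZNF331",
    "lung": "BCR CA3 CCND2 FLI1 FLJ32447 GPC3 HIST1H4I HOXA10 HOXA6 HOXA7 HOXA9 "
            "MYOD1 OLIG2 PAX5 PAX7 PAX8 PHOX2B PLOD1 PON1 PTPRN2 SBDS SLC38A4 "
            "TCL1A TLX3 TMPRSS2 WT1",
    "ovarian": "BCL6 CA3 FLJ32447 GNAS GPC3 HOXA10 HOXA7 HOXA9 IRF4 LEP LYL1 PAX5 "
               "PAX7 PHOX2B SBDS TMPRSS2 WT1 ZIM2",
    "prostate": "CDKN1C HIST1H4I HOXA1 IL10RA NFKB2 PAX5 PAX8",
    "renal": "BCR CA3 CDKN1C CHGA FLJ32447 FRAT1 GAS7 HOXC13 LEP LYL1 MYOD1 PAX5 "
             "PAX7 PEG3 PHOX2B PON1 RET SLC22A3 TLX3 TMPRSS2 TSPYL5 WT1",
    "skin": "BCR CCND2 FLI1 FRAT1 HEAB HOXA10 HOXA2 HOXA6 HOXA7 HOXA9 KIT PAX5 "
            "PLOD1 PON1 PSIP1 REL SBDS TLX1 TMPRSS2 TSPYL5 WDR66",
}


def cancer_type_counts(table):
    """Count, for each gene, how many cancer types list it as hypermethylated."""
    counts = Counter()
    for genes in table.values():
        counts.update(set(genes.split()))  # set() guards against duplicates within a list
    return counts


counts = cancer_type_counts(HYPERMETHYLATED)
for n in (8, 6, 5):
    print(n, sorted(g for g, c in counts.items() if c == n))
# 8 ['PAX5']
# 6 ['MYOD1']
# 5 ['LEP', 'PAX8', 'PHOX2B', 'TMPRSS2', 'TSPYL5']
```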

In another embodiment of the invention, the nucleic acid target region is one or more of TCL1A, SLC22A2, TRPM5, KCNQ1, IGF2, PEG3 or DLK1, wherein hypomethylation of the nucleic acid target gene is associated with the occurrence of cancer.

In one embodiment of the invention, the methods of the present invention are combined with known methods of diagnosing cancer, such as transcriptional profiling (Staunton, J.E. et al. Proc Natl Acad Sci USA 98, 10787-92 (2001); and Wang, H. et al. BMC Genomics 7, 166 (2006)), spectral karyotyping (Roschke, A.V. et al. Cancer Res 63, 8634-47 (2003)) or proteomic profiling (Nishizuka, S. et al. Proc Natl Acad Sci USA 100, 14229-34 (2003)). All of the above references are hereby incorporated by reference.

In certain embodiments, the primer sequence further comprises a promoter sequence. In an embodiment, the promoter sequence is obtained from a T7 promoter, an SP6 promoter or a T3 promoter. If the promoter is a T7 promoter, it may have the sequence 5'-CAGTAATACGACTCACTATAGGGAGA-3' (SEQ ID NO.: )
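Attaching the T7 promoter tag quoted above to a primer can be sketched as simple 5' concatenation. The gene-specific primer below is a hypothetical placeholder, not a primer from this disclosure. The check for the core T7 promoter motif TAATACGACTCACTATAG is an added sanity test, and the source does not state which primer of a pair carries the tag.

```python
T7_TAG = "CAGTAATACGACTCACTATAGGGAGA"  # 5'->3' T7 promoter tag quoted in the text
T7_CORE = "TAATACGACTCACTATAG"  # widely used core T7 promoter motif, contained in the tag


def t7_tagged_primer(gene_specific: str) -> str:
    """Prepend the T7 tag to a gene-specific primer; both are written 5'->3'."""
    seq = gene_specific.strip().upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("gene-specific primer must be a non-empty A/C/G/T string")
    return T7_TAG + seq


gene_specific = "aggaagagagttttagtttgggg"  # hypothetical placeholder sequence
primer = t7_tagged_primer(gene_specific)
print(primer)
print(T7_CORE in primer)  # True
```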

In some embodiments, a method is provided for identifying at least one CpG island region in a nucleic acid having a characteristic methylation state that correlates with an unknown disease outcome of an organism, tissue or cell. The method comprises the steps of providing a first CpG island region of the nucleic acid; identifying or discovering at least a second CpG island region within a region of the nucleic acid that includes the first CpG island region and spans about 5 kb 5' and about 5 kb 3' of it; and determining whether at least one of the second CpG island regions has a characteristic methylation state that correlates with the unknown disease outcome of the organism, tissue or cell.
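The search window in this embodiment (about 5 kb on either side of a first CpG island) can be sketched as an interval-overlap query over candidate islands. The sketch assumes island coordinates are already available, for example from an existing CpG island annotation, because the source does not specify how islands are called. Counting an island that only partly overlaps the window is a design choice made here. The `Region` type and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def neighboring_islands(first: Region, candidates: list[Region], flank: int = 5_000) -> list[Region]:
    """Return candidate islands overlapping [first.start - flank, first.end + flank)."""
    window_start = max(0, first.start - flank)
    window_end = first.end + flank
    return [
        c for c in candidates
        if c != first and c.chrom == first.chrom
        and c.start < window_end and c.end > window_start  # half-open overlap test
    ]


first = Region("chr11", 20_000, 21_000)      # hypothetical first island
others = [
    Region("chr11", 24_500, 25_200),         # 3.5 kb downstream: inside the window
    Region("chr11", 30_000, 30_600),         # 9 kb downstream: outside
    Region("chr11", 14_000, 15_500),         # overlaps the 5' edge of the window
]
print(neighboring_islands(first, others))
```

Each island returned would then go on to the methylation-state step of the embodiment.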

In the preceding embodiments, the methylation state of 50 or more gene target regions in the nucleic acid of the subject is determined in 24 hours or less. In some embodiments the methylation state of 50 or more gene target regions in the nucleic acid of the subject is determined in 12 hours or less, 8 hours or less, 6 hours or less, 5 hours or less, 4 hours or less, 3 hours or less, 2 hours or less, or less than 1 hour. In some embodiments the methylation state of 100 or more gene target regions in the nucleic acid of the subject is determined in 24 hours or less. In some embodiments the methylation state of 100 or more gene target regions in the nucleic acid of the subject is determined in 12 hours or less, 8 hours or less, 6 hours or less, 5 hours or less, 4 hours or less, 3 hours or less, 2 hours or less, or less than 1 hour. In some embodiments the methylation state of 150 or more gene target regions in the nucleic acid of the subject is determined in 24 hours or less. In some embodiments the methylation state of 150 or more gene target regions in the nucleic acid of the subject is determined in 12 hours or less, 8 hours or less, 6 hours or less, 5 hours or less, 4 hours or less, 3 hours or less, 2 hours or less, or less than 1 hour. In some embodiments the methylation state of 20 or more gene target regions in the nucleic acid of the subject is determined in 24 hours or less. In some embodiments the methylation state of 20 or more gene target regions in the nucleic acid of the subject is determined in 12 hours or less, 8 hours or less, 6 hours or less, 5 hours or less, 4 hours or less, 3 hours or less, 2 hours or less, or less than 1 hour.

The methods, combinations and kits provided herein can be performed or used in conjunction with any of a variety of other procedures including, but not limited to, any procedures for modifying the target nucleic acid molecule according to the methylation state of the target nucleic acid molecule, any procedures for amplifying a target nucleic acid molecule, any procedures for fragmenting a target nucleic acid molecule, and any procedures for detecting target nucleic acid molecule fragments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1A displays mass signals generated by cytosine-specific cleavage of the forward transcript of the IGF2/H19 region (upper spectrum: methylated template; lower spectrum: non-methylated template). Figure 1B shows the IGF2/H19 RNA transcript sequence in which every CpG is methylated (upper sequence) and the same RNA transcript sequence in which no CpG is methylated (lower sequence).

Figure 2 is an overlay of mass signal patterns generated by cytosine-specific cleavage of the forward transcript of the IGF2/H19 region.

Figure 3 is an overlay of mass spectra generated by uracil-specific cleavage of the reverse transcript of the IGF2/H19 region.

Figure 4 depicts mass spectra representing all four base-specific cleavage reactions of the IGF2/H19 amplicon. Numbers correspond to the CpG positions within this target region. Arrows point to the mass signals that indicate the presence of a methylated cytosine at the marked position. All methylated CpGs in the selected region can be identified by one or more mass signals.

Figure 5 depicts mass spectra generated by uracil-specific cleavage of the reverse transcript of the IGF2/H19 region. Genomic DNA was used for amplification. Dotted lines mark the positions of mass signals representing non-methylated CpGs. Signals shifted by 16 Da (or a multiple thereof) represent methylation events. The area-under-the-curve ratio of methylated to non-methylated template is approximately 1, as expected for hemimethylated target regions.
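The 16 Da signature can be reproduced with a short calculation. After bisulfite treatment, a methylated CpG is read as G in the reverse transcript where an unmethylated one is read as A. G and A residues differ by one oxygen, about 16 Da. Because uracil-specific cleavage sites are unaffected, matching fragments differ only by that amount per methylated site. The sketch uses commonly tabulated average ribonucleotide residue masses and hypothetical sequences. It ignores terminal groups, which are identical for paired fragments and cancel in the difference.

```python
# Average residue masses (Da) of ribonucleotides within an RNA chain (NMP minus H2O).
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}


def cleave_after_u(rna: str) -> list[str]:
    """In silico uracil-specific cleavage: cut 3' of every U."""
    fragments, current = [], []
    for base in rna:
        current.append(base)
        if base == "U":
            fragments.append("".join(current))
            current = []
    if current:
        fragments.append("".join(current))
    return fragments


def residue_sum(fragment: str) -> float:
    return sum(RESIDUE_MASS[base] for base in fragment)


def mass_shifts(methylated: str, unmethylated: str) -> list[float]:
    """Per-fragment mass difference between two templates with identical U positions."""
    pairs = zip(cleave_after_u(methylated), cleave_after_u(unmethylated))
    return [round(residue_sum(m) - residue_sum(u), 2) for m, u in pairs]


# Hypothetical reverse transcripts: each methylated CpG reads G where the unmethylated one reads A.
methylated = "GACGCGUCCGAU"
unmethylated = "GACACAUCCAAU"
print(mass_shifts(methylated, unmethylated))  # [32.0, 16.0]
```

The first fragment carries two methylated sites and so shifts by 32 Da, which is the "multiple thereof" case in Figure 5.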

Figures 6A-6E provide a descriptive analysis of methylation data for normal and tumor cell line samples. (a) A scatterplot depicts the results from a replicate analysis of ERBB2 using two different primer designs; the quantitative measurements are highly concordant. (b) Relationship between CpG density and mean methylation levels in cancer and normal samples. CpG density of amplicons was calculated as the fraction of CpG nucleotides within the total amplicon sequence. The mean methylation value for each amplicon was generated using the methylation values of all individual CpG sites. Amplicons with more than 10% CpG content are likely to have lower methylation values in normal tissues; in cancer cell lines, DNA methylation is observed more frequently in these amplicons. (c) Amplicons were binned based on their average methylation values, each bin containing amplicons within a 5% range of methylation values. Bins from 15% to 85% average methylation contain more amplicons in the set of cancer cell lines. (d) Histogram of methylation differences. For each amplicon, the difference in mean methylation was calculated between the group of normal samples and the group of tumor cell lines. Positive values translate to hypermethylation in cancer cell lines, whereas negative values indicate that the mean methylation was higher in the group of normal samples. The distribution of methylation differences is skewed towards hypermethylated changes. (e) DNA methylation in relation to the closest 5'-UTR. The distance from the 5'-UTR was calculated for every individual CpG site. Each data point contains 1770 individual methylation values for the cancer cell lines and 180 values for the normal samples; the number of data points in each group must be adjusted because the two sets differ in sample number. A window of 1 kbp around the 5'-UTR shows low methylation values in both the normal and the cancer cell line samples. Methylation values in the cancer cell line samples are generally elevated.

Figure 7 is a two-way hierarchical cluster analysis of 59 tumor cell line samples and 6 samples from normal tissues (rows) and DNA methylation of CpG units in 531 promoter regions (columns). DNA methylation values are depicted in this false-color image on a continuous scale from red (non-methylated) to yellow (100% methylated). Poor-quality data are annotated in gray. Samples are color-coded according to their cell line tissue origin (legend at upper left) to simplify identification of potential sample clusters. Strong sample cluster formation is observed for the group of normal samples, the group of colon cancer samples (brown), melanoma samples (green) and CNS tumors (yellow). Less dominant clustering is observed in lung cancers (black), renal carcinoma (orange) and ovarian cancer. The cell line samples derived from breast cancer (pink), leukemia (red) and prostate cancer (grey) do not form obvious clusters. The normal samples are characterized by consistently low methylation levels, whereas the cell line samples show more variable methylation patterns.
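The two quantities behind Figure 6(b) and 6(c) can be sketched in a few lines. CpG density is described only as "the fraction of CpG nucleotides within the total amplicon sequence". The sketch assumes that both the C and the G of each CpG count toward that fraction, which is an interpretation and is not stated in the text. The 5% binning follows panel (c), and the amplicon sequence is hypothetical.

```python
def cpg_density(amplicon: str) -> float:
    """Fraction of amplicon nucleotides that are part of a CpG dinucleotide.

    Assumed definition: both the C and the G of each CpG are counted.
    """
    seq = amplicon.upper()
    if not seq:
        raise ValueError("empty amplicon")
    return 2 * seq.count("CG") / len(seq)  # "CG" matches cannot overlap


def methylation_bin(mean_methylation: float) -> int:
    """5 % bin index: 0 covers 0-5 %, ..., 19 covers 95-100 % (100 % included)."""
    percent = round(mean_methylation * 100, 6)  # absorb float noise at bin edges
    return min(int(percent // 5), 19)


amplicon = "ACGTCGAATT"  # hypothetical amplicon
density = cpg_density(amplicon)
print(density, density > 0.10)  # 0.4 True
print(methylation_bin(0.42))    # 8, i.e. the 40-45 % bin
```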

Figure 8 shows the group of genes significantly differentially methylated in cancer. The genes are divided into subsets according to their number of PRC2 marks. A box-and-whisker plot reveals that genes with a higher number of PRC2 marks tend to be altered in more tumor types, while genes with no PRC2 marks are mostly altered in only one or two tumor types.

Figure 9 shows therapeutic options available to a subject diagnosed with AML.

Throughout the document and in the Figures, CpG sites are referenced according to their CpG ID. The CpG IDs refer to the specific CpG location within the particular genomic region. For example, each CpG ID follows the general schema databaseID_GeneName_AmpliconID_CPG_CPGposition in the amplicon. "GeneName" is the RefSeq gene name of the analyzed promoter region or, for intragenic regions, the name of the nearest gene. "AmpliconID" is the particular amplicon analyzed within the gene or region, which is especially relevant if multiple amplicons were analyzed for this gene. "CPG" is a constant text string. "CPGposition in the amplicon" indicates which CpG sites are enclosed in the measured CpG unit. The numbers given refer to the CpG sites as counted from the 5' end of the analyzed amplicon sequence. The amplicon sequences are provided in Table 10.

DETAILED DESCRIPTION OF THE INVENTION

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of skill in the art to which the invention(s) belong. All patents, patent applications, published applications and publications, GENBANK sequences, websites and other published materials referred to throughout the entire disclosure herein, unless noted otherwise, are incorporated by reference in their entirety. In the event that there are a plurality of definitions for terms herein, those in this section prevail. Where reference is made to a URL or other such identifier or address, it is understood that such identifiers can change and particular information on the internet can come and go, but equivalent information is known and can be readily accessed, such as by searching the internet and/or appropriate databases. Reference thereto evidences the availability and public dissemination of such information.

As used herein, a "nucleic acid target gene region" is a nucleic acid molecule that is examined using the methods disclosed herein. For the purposes of the application, "nucleic acid target gene region", "target gene", "target region", "region" and "gene" may be used interchangeably. A nucleic acid target gene region includes genomic DNA or a fragment thereof, which may or may not be part of a gene; a segment of mitochondrial DNA of a gene; RNA of a gene; and a segment of RNA of a gene. A nucleic acid target gene region may be further defined by its chromosome position range. The chromosome position ranges provided herein were gathered from the March 2006 human reference sequence (NCBI Build 36.1), which was produced by the International Human Genome Sequencing Consortium and can be accessed at World Wide Web URL genome.ucsc.edu/cgi-bin/hgGateway. Information about a particular target gene (e.g., genomic sequence, cDNA sequence, peptide sequence, expression profile, etc.) can be easily obtained by methods well known in the art, for example, by visiting the World Wide Web URL genecards.org/index.shtml.

In the context of methods for diagnosis determination, the invention provides methods for identifying the methylation state of a nucleic acid target gene region and/or the methylation state of a nucleotide locus. A nucleic acid target gene region can also refer to an amplified product of a nucleic acid target gene region, including an amplified product of a treated nucleic acid target gene region, where the nucleotide sequence of such an amplified product reflects the methylation state of the nucleic acid target gene region. One skilled in the art would recognize that the size or length of the nucleic acid target gene region may vary depending on the limitation, or limitations, of the equipment used to perform the analysis. The nucleic acid target gene region may comprise intragenic nucleic acid, a gene of interest, more than one gene of interest, at least one gene of interest or a portion of a gene of interest. Correspondingly a sequential or non-sequential series of nucleic acid target gene regions may be analyzed and exploited to map an entire gene or genome. The intended target will be clear from the context or will be specified.

As used herein, a "nucleic acid target gene molecule" is a molecule comprising a nucleic acid sequence of the nucleic acid target gene region. The nucleic acid target gene molecule may contain less than 10%, less than 20%, less than 30%, less than 40%, less than 50%, greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90% or up to 100% of the sequence of the nucleic acid target gene region. A "target peptide" refers to a peptide encoded by a nucleic acid target gene. As used herein, the "methylation state" or "methylation status" of a nucleic acid target gene region refers to the presence or absence of one or more methylated nucleotide bases or the ratio of methylated cytosine to unmethylated cytosine for a methylation site in a nucleic acid target gene region. For example, a nucleic acid target gene region containing at least one methylated cytosine is considered methylated (i.e., the methylation state of the nucleic acid target gene region is methylated). A nucleic acid target gene region that does not contain any methylated nucleotides is considered unmethylated. Similarly, the methylation state of a nucleotide locus in a nucleic acid target gene region refers to the presence or absence of a methylated nucleotide at a particular locus in the nucleic acid target gene region. For example, the methylation state of a cytosine at the 7th nucleotide in a nucleic acid target gene region is methylated when the nucleotide present at the 7th nucleotide in the nucleic acid target gene region is 5-methylcytosine. Similarly, the methylation state of a cytosine at the 7th nucleotide in a nucleic acid target gene region is unmethylated when the nucleotide present at the 7th nucleotide in the nucleic acid target gene region is cytosine (and not 5-methylcytosine). Correspondingly, the ratio of methylated cytosine to unmethylated cytosine for a methylation site or sites can provide a methylation state of a nucleic acid target gene region.
In certain embodiments the methylation state or status may be expressed as a percentage of methylatable nucleotides (e.g., cytosine) in a nucleic acid (e.g., amplicon or gene region) that are methylated (e.g., about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95% or about 100% methylated; greater than 80% methylated, between 20% and 80% methylated, or less than 20% methylated). A nucleic acid may be "hypermethylated," which refers to the nucleic acid having a greater number of methylatable nucleotides that are methylated relative to a control, and in some embodiments refers to greater than 80% of the methylatable nucleotides being methylated. A nucleic acid may be "hypomethylated," which refers to the nucleic acid having a smaller number of methylatable nucleotides that are methylated relative to a control, and in some embodiments refers to less than 20% of the methylatable nucleotides being methylated. The methylation status or state is determined in a CpG island in certain embodiments. As used herein, a "characteristic methylation state" refers to a unique or specific data set comprising the location of at least one, a portion of the total or all of the methylation sites of a nucleic acid, a nucleic acid target gene region, a gene or a group of genes of a sample obtained from an organism, a tissue or a cell.
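The percentage form of methylation status and the 80%/20% thresholds given for some embodiments can be sketched as follows. This assumes the counts of methylated and methylatable nucleotides are already known; the label "intermediate" for the 20% to 80% band is a hypothetical name, and in general hyper- and hypomethylation are defined relative to a control rather than by fixed cutoffs.

```python
def percent_methylated(n_methylated: int, n_methylatable: int) -> float:
    """Percentage of methylatable nucleotides (e.g., cytosines) that are methylated."""
    if n_methylatable <= 0:
        raise ValueError("need at least one methylatable nucleotide")
    if not 0 <= n_methylated <= n_methylatable:
        raise ValueError("methylated count must lie between 0 and the methylatable count")
    return 100.0 * n_methylated / n_methylatable


def classify(percent: float, hyper_cutoff: float = 80.0, hypo_cutoff: float = 20.0) -> str:
    """Apply the embodiment-specific cutoffs: >80% hyper, <20% hypo, otherwise in between."""
    if percent > hyper_cutoff:
        return "hypermethylated"
    if percent < hypo_cutoff:
        return "hypomethylated"
    return "intermediate"  # hypothetical label for the 20% to 80% band


print(classify(percent_methylated(9, 10)))  # hypermethylated (90%)
print(classify(percent_methylated(1, 10)))  # hypomethylated (10%)
print(classify(percent_methylated(5, 10)))  # intermediate (50%)
```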

As used herein, "methylation ratio" refers to the number of instances in which a molecule or locus is methylated relative to the number of instances the molecule or locus is unmethylated.

Methylation ratio can be used to describe a population of individuals or a sample from a single individual. For example, a nucleotide locus having a methylation ratio of 50% is methylated in 50% of instances and unmethylated in 50% of instances. Such a ratio can be used, for example, to describe the degree to which a nucleotide locus or nucleic acid region is methylated in a population of individuals. Thus, when methylation in a first population or pool of nucleic acid molecules is different from methylation in a second population or pool of nucleic acid molecules, the methylation ratio of the first population or pool will be different from the methylation ratio of the second population or pool. Such a ratio also can be used, for example, to describe the degree to which a nucleotide locus or nucleic acid region is methylated in a single individual. For example, such a ratio can be used to describe the degree to which a nucleic acid target gene region of a group of cells from a tissue sample is methylated or unmethylated at a nucleotide locus or methylation site.

As used herein, a "methylated nucleotide" or a "methylated nucleotide base" refers to the presence of a methyl moiety on a nucleotide base, where the methyl moiety is not present in a recognized typical nucleotide base. For example, cytosine does not contain a methyl moiety on its pyrimidine ring, but 5-methylcytosine contains a methyl moiety at position 5 of its pyrimidine ring. Therefore, cytosine is not a methylated nucleotide and 5-methylcytosine is a methylated nucleotide. In another example, thymine contains a methyl moiety at position 5 of its pyrimidine ring; however, for purposes herein, thymine is not considered a methylated nucleotide when present in DNA since thymine is a typical nucleotide base of DNA. Typical nucleotide bases for DNA are thymine, adenine, cytosine and guanine. Typical bases for RNA are uracil, adenine, cytosine and guanine.
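The methylation ratio defined above can be computed from a list of per-instance observations at one locus (for example, one observation per molecule, cell or individual). The definition speaks of methylated instances relative to unmethylated instances, while the 50% example expresses the ratio as a share of all instances; this sketch assumes the percentage form means the fraction of all observed instances and reports both quantities. The function names are hypothetical.

```python
def methylation_fraction(observations: list[bool]) -> float:
    """Fraction of observed instances that are methylated (True = methylated)."""
    if not observations:
        raise ValueError("no observations")
    return sum(observations) / len(observations)


def methylated_to_unmethylated(observations: list[bool]) -> float:
    """Methylated instances divided by unmethylated instances (inf if none unmethylated)."""
    methylated = sum(observations)
    unmethylated = len(observations) - methylated
    return float("inf") if unmethylated == 0 else methylated / unmethylated


pool_a = [True, False, True, False]  # locus methylated in 50% of instances
pool_b = [True, True, True, False]   # a second pool with different methylation
print(methylation_fraction(pool_a), methylation_fraction(pool_b))  # 0.5 0.75
print(methylated_to_unmethylated(pool_a))                          # 1.0
```

As the text notes, two pools that differ in methylation at the locus yield different ratios, as pool_a and pool_b do here.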
Correspondingly, a "methylation site" is the location in the nucleic acid target gene region where methylation has occurred or can occur. For example, a location containing CpG is a methylation site, wherein the cytosine may or may not be methylated.
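Candidate CpG methylation sites in a sequence can be located by scanning for C immediately followed by G, as in the example above. This sketch returns 0-based positions of each CpG cytosine (the patent text counts nucleotides from 1) and, as a simplifying assumption, considers only the given strand.

```python
def cpg_sites(sequence: str) -> list[int]:
    """0-based positions of the cytosine in every CpG dinucleotide on the given strand."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


print(cpg_sites("ACGTTCGACG"))  # [1, 5, 8]
```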

As used herein, a "methylation site" is a nucleotide within a nucleic acid, nucleic acid target gene region or gene that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro. As used herein, a "methylated nucleic acid molecule" refers to a nucleic acid molecule that contains one or more methylated nucleotides.

As used herein, "CpG island" refers to a G:C-rich region of genomic DNA containing a greater number of CpG dinucleotides relative to total genomic DNA. A CpG island may be about 200 base pairs in length, where the G:C content of the region is at least 50% and the ratio of observed CpG frequency over expected frequency is at least 0.6; typically a CpG island can be at least 500 base pairs in length, where the G:C content of the region is at least 55% and the ratio of observed CpG frequency over expected frequency is at least 0.65. The observed CpG frequency over expected frequency can be calculated according to the method provided in Gardiner-Garden et al., J. Mol. Biol. 196:261-281 (1987). For example, the observed CpG frequency over expected frequency could be calculated according to the formula R = (A × B)/(C × D), where R is the ratio of observed CpG frequency over expected frequency, A is the number of CpG dinucleotides in an analyzed sequence, B is the total number of nucleotides in the analyzed sequence, C is the total number of C nucleotides in the analyzed sequence, and D is the total number of G nucleotides in the analyzed sequence.
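The formula R = (A × B)/(C × D) and the length, G:C content and R thresholds above can be combined into a simple test applied to one analyzed sequence. This is a sketch only: it evaluates a single, already-chosen window (genome-wide island finders also slide windows along the sequence), and the default thresholds follow the 200 bp / 50% / 0.6 criteria, with the stricter 500 bp / 55% / 0.65 criteria passed as arguments.

```python
def cpg_obs_exp(sequence: str) -> float:
    """R = (A x B) / (C x D): CpG count times length over C count times G count."""
    seq = sequence.upper()
    a = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    b = len(seq)
    c = seq.count("C")
    d = seq.count("G")
    if c == 0 or d == 0:
        return 0.0
    return (a * b) / (c * d)


def gc_content(sequence: str) -> float:
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def is_cpg_island(sequence: str, min_length: int = 200,
                  min_gc: float = 0.50, min_obs_exp: float = 0.6) -> bool:
    return (len(sequence) >= min_length
            and gc_content(sequence) >= min_gc
            and cpg_obs_exp(sequence) >= min_obs_exp)


print(cpg_obs_exp("ACGT" * 50))                       # 4.0
print(is_cpg_island("CG" * 100))                      # True
print(is_cpg_island("CG" * 100, 500, 0.55, 0.65))     # False (too short)
```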

As used herein, a first nucleotide that is "complementary" to a second nucleotide refers to a first nucleotide that base-pairs, under high stringency conditions, to a second nucleotide. An example of complementarity is Watson-Crick base pairing in DNA (e.g., A to T and C to G) and RNA (e.g., A to U and C to G). Thus, for example, under high stringency conditions G base-pairs with higher affinity to C than to G, A or T, and, therefore, when C is the selected nucleotide, G is a nucleotide complementary to the selected nucleotide.

As used herein, "treat", "treating" or grammatical variations thereof, refers to the process of exposing an analyte, typically a nucleic acid molecule, to conditions under which physical or chemical analyte modification or other chemical reactions (including enzymatic reactions) can occur. For example, treating a nucleic acid target gene molecule with a reagent that modifies the nucleic acid target gene molecule as a function of its methylation state may include adding a reagent such as bisulfite or an enzyme such as cytosine deaminase to a solution containing the nucleic acid target gene region. In treating the nucleic acid target gene molecule with bisulfite, any unmethylated nucleotide, such as any unmethylated C nucleotide, present in the nucleic acid target gene molecule can be chemically modified, such as deaminated; however, if the nucleic acid target gene molecule contains no unmethylated selected nucleotide, such as no unmethylated C nucleotide, then a nucleic acid target gene molecule treated with such a reagent may not be chemically modified. In another example, treating a nucleic acid target gene molecule under fragmentation or cleavage conditions can include adding a cleavage reagent such as RNase T1, such that in selected nucleic acid target gene molecules, such as nucleic acid target gene molecules containing G nucleotides, cleavage can occur. Cleavage need not occur, however; with nucleic acid target gene molecules not containing G nucleotides, for example, cleavage with RNase T1 may not occur. In another example, treating a nucleic acid target gene molecule under nucleic acid synthesis conditions can include adding a DNA or RNA polymerase and NTPs, such that nucleic acid synthesis can occur if, for example, a primer is hybridized to a nucleic acid target gene molecule; however, no nucleic acid synthesis need occur if, for example, no primer is hybridized to a nucleic acid target gene molecule.
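Bisulfite treatment as described above can be modeled in silico: every unmethylated C is deaminated (read here as U), while methylated C is left unchanged. This sketch assumes the methylated positions are known in advance (0-based) and ignores incomplete conversion; replacing U with T approximates how the converted base reads after amplification.

```python
def bisulfite_convert(sequence: str, methylated: set[int]) -> str:
    """Deaminate every C whose 0-based position is not in `methylated` (C -> U)."""
    out = []
    for i, base in enumerate(sequence.upper()):
        if base == "C" and i not in methylated:
            out.append("U")  # unmethylated cytosine is deaminated to uracil
        else:
            out.append(base)  # methylated C and all other bases are unchanged
    return "".join(out)


treated = bisulfite_convert("ACGTCCGA", methylated={1})
print(treated)                    # ACGTUUGA
print(treated.replace("U", "T"))  # ACGTTTGA, as read after amplification
```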

As used herein, the phrase "hybridizing" or grammatical variations thereof, refers to binding of a first nucleic acid molecule to a second nucleic acid molecule under low, medium or high stringency conditions, or under nucleic acid synthesis conditions. Hybridizing can include instances where a first nucleic acid molecule binds to a second nucleic acid molecule, where the first and second nucleic acid molecules are complementary. As used herein, "specifically hybridizes" refers to preferential hybridization under nucleic acid synthesis conditions of a probe, or primer, to a nucleic acid molecule having a sequence complementary to the probe or primer compared to hybridization to a nucleic acid molecule not having a complementary sequence. For example, specific hybridization includes the hybridization of a probe to a target nucleic acid sequence that is complementary to the probe.

As used herein, "nucleotide synthesis conditions" in the context of primer hybridization refer to conditions in which a primer anneals to the nucleic acid molecule to be amplified. Exemplary nucleotide synthesis conditions are 10 mM Tris-HCl pH 8.3, 1.5 mM MgCl2, 50 mM KCl, 62°C. Other exemplary nucleotide synthesis conditions are 16.6 mM ammonium sulfate, 67 mM Tris pH 8.8, 6.7 mM MgCl2, 10 mM 2-mercaptoethanol, 60°C. Those of skill in the art are familiar with parameters that affect hybridization, such as temperature, probe or primer length and composition, buffer composition and pH, and salt concentration, and can readily adjust these parameters to achieve specific hybridization of a nucleic acid to a target sequence.

As used herein, "complementary base pairs" refer to Watson-Crick base pairs (e.g., G to C and A to T in DNA and G to C and A to U in RNA) or the equivalent thereof when non-natural or atypical nucleotides are used. Two nucleic acid strands that are complementary contain complementary base pairing. A probe is not complementary when mismatches such as G-T, G-A, C-T or C-A arise when a probe or primer hybridizes to a nucleic acid target gene molecule.
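Complementarity and mismatches as described above can be checked by pairing a probe against a target antiparallel and listing any non-Watson-Crick pairs. This sketch assumes both sequences are written 5' to 3' and have equal length; it counts mismatches only and does not model hybridization thermodynamics or wobble pairs.

```python
DNA_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatches(probe: str, target: str, rna: bool = False) -> list[tuple[int, str, str]]:
    """List (probe index, probe base, target base) for each non-Watson-Crick pair.

    Both strands are given 5'->3', so the target is read in reverse to pair antiparallel.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    return [(i, p, t)
            for i, (p, t) in enumerate(zip(probe.upper(), reversed(target.upper())))
            if (p, t) not in pairs]


print(mismatches("ACGT", "ACGT"))  # [] (fully complementary)
print(mismatches("GCGT", "ACGT"))  # [(0, 'G', 'T')] (a G-T mismatch)
```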

As used herein, "substantially complementary" refers to primers that are sufficiently complementary to hybridize with nucleic acid target gene molecules having a desired sequence under nucleic acid synthesis conditions. Primers should have sufficient complementarity to hybridize to a desired nucleic acid target gene molecule and permit amplification of the nucleic acid target gene molecule. For example, a primer used in the methods disclosed herein can be 100% complementary with the nucleic acid target gene molecule desired to be amplified. In another example, a primer can have 1, 2, 3, or more mismatches, provided that the primer can be used to amplify at least one nucleic acid target gene molecule desired to be amplified. For example, a nucleic acid target gene molecule can have three cytosine nucleotides in the region with which a primer hybridizes; when only one of the three C nucleotides is methylated, treatment with bisulfite can convert the two unmethylated C nucleotides to U nucleotides, and a primer 100% complementary to a nucleic acid target gene molecule having three C nucleotides can still hybridize to a nucleic acid target gene molecule having only one C nucleotide, such that the nucleic acid target gene molecule having only one C nucleotide can still be amplified.

As used herein, "nucleic acid" refers to polynucleotides such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The term also includes, as equivalents, derivatives, variants and analogs of either RNA or DNA made from nucleotide analogs, single-stranded ("sense" or "antisense", "plus" strand or "minus" strand, "forward" reading frame or "reverse" reading frame) and double-stranded polynucleotides. Deoxyribonucleotides include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine. For RNA, the base thymine is replaced with uracil.

As used herein, "mass spectrometry" encompasses any suitable mass spectrometric format known to those of skill in the art. Such formats include, but are not limited to, Matrix-Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF), Electrospray (ES), IR-MALDI (see, e.g., published International PCT application No. 99/57318 and U.S. Patent No. 5,118,937), Ion Cyclotron Resonance (ICR), Fourier Transform and combinations thereof.

As used herein, the phrase "mass spectrometric analysis" refers to the determination of the mass to charge ratio of atoms, molecules or molecule fragments.
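For an ion formed by adding or removing protons, the mass-to-charge ratio follows from the neutral mass M and charge z as (M + zH)/|z|, where H is the proton mass and z is negative for deprotonated ions. This sketch assumes protonation or deprotonation is the only source of charge (no salt adducts) and uses a neutral mass chosen purely for illustration.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of [M + zH]^z+ (z > 0) or [M - |z|H]^|z|- (z < 0)."""
    if charge == 0:
        raise ValueError("a neutral molecule has no m/z")
    return (neutral_mass + charge * PROTON_MASS) / abs(charge)


print(mz(5000.0, -1))  # about 4998.9927 (singly deprotonated)
print(mz(5000.0, -2))  # about 2498.9927 (doubly deprotonated)
```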

As used herein, a "reference nucleic acid molecule" refers to a nucleic acid molecule known to be methylated or unmethylated, or a nucleic acid molecule in which the methylation state of one or more nucleotide loci of the nucleic acid molecule is known. A reference nucleic acid can be used to calculate or experimentally derive reference masses. A reference nucleic acid used to calculate reference masses is typically a nucleic acid containing a known sequence with known methylated nucleotide loci. A reference nucleic acid used to experimentally derive reference masses can have, but is not required to have, a known sequence or known methylated nucleotide loci; methods such as those disclosed herein or otherwise known in the art can be used to identify a reference nucleic acid as methylated even when the reference nucleic acid does not have a known sequence.

As used herein, a "correlation" between a nucleic acid target gene molecule and a reference, including a "correlation" between a nucleotide locus in a nucleic acid target gene molecule and a nucleotide locus in a reference, refers to a similarity or identity of the methylation state of a nucleic acid target gene molecule or nucleotide locus to that of a reference, such that the nucleic acid target gene molecule and the reference are expected to have at least one undefined locus with the same methylation state. For example, when the methylation state of fewer than all nucleotide loci of a nucleic acid target gene molecule has been identified, and when there is a correlation between a reference nucleic acid and a nucleic acid target gene molecule, one or more of the unidentified loci of the nucleic acid target gene molecule can be expected to have the same methylation state as the corresponding nucleotide locus in the reference. As used herein, the term "correlates" as between a specific diagnosis of a sample or of an individual and the changes in methylation state of a nucleic acid target gene region refers to an identifiable connection between a particular diagnosis of a sample or of an individual and its methylation state. As used herein, "nucleic acid synthesis" refers to a chemical or biochemical reaction in which a phosphodiester bond is formed between one nucleotide and a second nucleotide or an oligonucleotide. Nucleic acid synthesis can include enzymatic reactions such as DNA replication reactions such as PCR or transcription, or chemical reactions such as solid phase synthesis. Nucleic acid synthesis conditions refer to conditions of a nucleic acid molecule-containing solution in which nucleotide phosphodiester bond formation is possible.
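The use of a correlated reference to infer unidentified loci, as described in the "correlation" definition above, can be sketched under one hypothetical criterion: the target and reference are treated as correlated only when every identified locus of the target matches the reference. Loci are listed in the same order for both molecules, with None marking an unidentified locus; the function name and criterion are illustrative assumptions.

```python
from typing import Optional


def impute_from_reference(target: list[Optional[bool]],
                          reference: list[bool]) -> Optional[list[bool]]:
    """Fill unidentified loci (None) from the reference if all identified loci agree.

    True = methylated, False = unmethylated. Returns None when no correlation is found.
    """
    if len(target) != len(reference):
        raise ValueError("target and reference must cover the same loci")
    known = [(t, r) for t, r in zip(target, reference) if t is not None]
    if not known or any(t != r for t, r in known):
        return None  # hypothetical criterion: every identified locus must match
    return [r if t is None else t for t, r in zip(target, reference)]


print(impute_from_reference([True, None, False, None], [True, True, False, False]))
# [True, True, False, False]
print(impute_from_reference([False, None], [True, True]))  # None
```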
For example, a nucleic acid target gene molecule can be contacted with a primer and treated under nucleic acid synthesis conditions, which can include, for example, PCR or transcription conditions; when the primer hybridizes to the nucleic acid target gene molecule, nucleotides can be synthesized onto the primer, that is, nucleotides can be enzymatically added via phosphodiester linkage to the 3' end of the primer; however, when no primer is hybridized to the nucleic acid target gene molecule, it is possible that no nucleotides are synthesized onto the primer.

As used herein, "amplifying" refers to increasing the amount of a nucleic acid molecule or a number of nucleic acid molecules. Amplification may be performed by one or more cycles of polymerase chain reaction (PCR). Based on the 5' and 3' primers that are chosen, the region or regions of the nucleic acid molecule or nucleic acid molecules to be amplified may be selected. Amplification can be by any means known to those skilled in the art, including PCR, transcription, and other such methods. As used herein, "specifically amplifying" refers to increasing the amount of a particular nucleic acid molecule based on one or more properties of the molecule. For example, a nucleic acid molecule can be specifically amplified using specific hybridization of one or more primers to one or more regions of the nucleic acid molecule in PCR. Typically, specifically amplifying includes nucleic acid synthesis of a nucleic acid target gene molecule where a primer hybridizes with complete complementarity to a nucleotide sequence in the nucleic acid target gene molecule.

As used herein, a "primer" is a polynucleotide such as DNA or RNA that, because of its specific nucleotide sequence, is able to hybridize to a template nucleic acid, whereupon an enzyme can catalyze addition of one or more nucleotides to the 3' hydroxyl group of the primer through formation of a phosphoester or phosphodiester bond in a nucleotide synthesis reaction such as transcription or DNA replication. As used herein, a "methylation specific primer" or "methylation state specific primer" refers to a primer that can specifically hybridize with a nucleic acid target gene region or a methylation-specific reagent-treated nucleic acid target gene molecule in accordance with the methylation state of the nucleic acid target gene molecule. For example, a nucleic acid target gene molecule can be treated with a methylation-specific reagent, resulting in a change in the nucleotide sequence of the nucleic acid target gene molecule as a function of its methylation state, and a methylation state specific primer can specifically hybridize to the treated methylated nucleic acid target gene molecule without hybridizing to a treated unmethylated nucleic acid target gene molecule or to a treated, differently methylated nucleic acid target gene molecule. In another example, a nucleic acid target gene molecule can be treated with a methylation-specific reagent, resulting in a change in the nucleotide sequence of the nucleic acid target gene molecule as a function of its methylation state, and a methylation state specific primer can specifically hybridize to the treated unmethylated nucleic acid target gene molecule without hybridizing to a treated methylated nucleic acid target gene molecule or to a treated, differently methylated nucleic acid target gene molecule.
Methylation specific primers that hybridize to a nucleic acid target gene molecule then can serve as primers for subsequent nucleotide synthesis reactions, such as PCR.
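One common way to derive methylation state specific primer sequences for a bisulfite-treated template is sketched below. It assumes, as a simplification, that every CpG cytosine in the primer binding site is either fully methylated (retained as C) or fully unmethylated (converted, read as T), that every non-CpG cytosine is converted, and that the primer is a forward primer whose sequence matches the converted top strand. Practical primer design also weighs melting temperature and where the CpGs fall within the primer, which this sketch omits.

```python
def msp_forward_primers(binding_site: str) -> tuple[str, str]:
    """Return (methylated-specific, unmethylated-specific) forward primer sequences.

    CpG cytosines stay C in the methylated version and become T in the unmethylated
    version; non-CpG cytosines become T in both. Only CpGs inside the site are seen.
    """
    seq = binding_site.upper()
    m_primer, u_primer = [], []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            m_primer.append("C" if in_cpg else "T")
            u_primer.append("T")
        else:
            m_primer.append(base)
            u_primer.append(base)
    return "".join(m_primer), "".join(u_primer)


print(msp_forward_primers("ACGCCGTCA"))  # ('ACGTCGTTA', 'ATGTTGTTA')
```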

As used herein, an "amplified product" or "amplified nucleic acid" is any product of a nucleotide synthesis reaction using a nucleic acid target gene molecule as the template. Thus, a single-stranded nucleic acid molecule complementary to the treated nucleic acid target gene molecule and formed in the first amplification step is an amplified product. In addition, products of subsequent nucleotide synthesis reactions, which contain the same sequence as the treated nucleic acid target gene molecule, or the complement thereof, are amplification products. An amplification product can be a single-stranded nucleic acid molecule or a double-stranded nucleic acid molecule. As used herein, "fragmentation" or "cleavage" refers to a procedure or conditions in which a nucleic acid molecule, such as a nucleic acid target gene molecule or amplified product thereof, is severed into two or more smaller nucleic acid molecules. Such fragmentation or cleavage can be sequence specific, base specific, or nonspecific, and can be accomplished by any of a variety of methods, reagents or conditions, including, for example, chemical, enzymatic or physical fragmentation. As used herein, "fragments", "cleavage products", "cleaved products" or grammatical variants thereof refer to nucleic acid molecules resulting from a fragmentation or cleavage of a nucleic acid target gene molecule or amplified product thereof. While such fragments or cleaved products can refer to all nucleic acid molecules resulting from a cleavage reaction, typically such fragments or cleaved products refer only to nucleic acid molecules resulting from a fragmentation or cleavage of a nucleic acid target gene molecule or the portion of an amplified product thereof containing the corresponding nucleotide sequence of a nucleic acid target gene molecule.
For example, it is within the scope of the present methods, compounds and compositions, that an amplified product can contain one or more nucleotides more than the amplified nucleotide region of the nucleic acid target gene sequence (e.g., a primer can contain "extra" nucleotides such as a transcriptional initiation sequence, in addition to nucleotides complementary to a nucleic acid target gene molecule, resulting in an amplified product containing "extra" nucleotides or nucleotides not corresponding to the amplified nucleotide region of the nucleic acid target gene molecule). In such an example, the fragments or cleaved products corresponding to the nucleotides not arising from the nucleic acid target gene molecule will typically not provide any information regarding methylation in the nucleic acid target gene molecule. One skilled in the art can therefore understand that the fragments of an amplified product used to provide methylation information in the methods provided herein are fragments containing one or more nucleotides arising from the nucleic acid target gene molecule, and not fragments containing nucleotides arising solely from a sequence other than that in the nucleic acid target gene molecule. Accordingly, one skilled in the art will understand the fragments arising from methods, compounds and compositions provided herein to include fragments arising from portions of amplified nucleic acid molecules containing, at least in part, nucleotide sequence information from or based on the representative nucleic acid target gene molecule. As used herein, "base specific cleavage" refers to selective cleavage of a nucleic acid at the site of a particular base (e.g., A, C, U or G in RNA or A, C, T or G in DNA) or of a particular base type (e.g., purine or pyrimidine). For example, C-specific cleavage refers to cleavage of a nucleic acid at every C nucleotide in the nucleic acid.
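Base-specific cleavage as defined above can be simulated by cutting a sequence at every occurrence of the chosen base. This sketch assumes the cut falls on the 3' side of the recognized base (as with ribonucleases such as RNase T1 at G), and it does not model incomplete cleavage or the chemistry of the fragment ends.

```python
def base_specific_cleave(sequence: str, base: str) -> list[str]:
    """Cut on the 3' side of every occurrence of `base`; return fragments 5'->3'."""
    fragments, start = [], 0
    for i, b in enumerate(sequence):
        if b == base:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # trailing fragment without the base
    return fragments


print(base_specific_cleave("AUCGGCUA", "C"))  # ['AUC', 'GGC', 'UA']
print(base_specific_cleave("AUCGGCUA", "G"))  # ['AUCG', 'G', 'CUA']
```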

As used herein, the phrase "non-specifically cleaved", in the context of nucleic acid cleavage, refers to the cleavage of a nucleic acid target gene molecule at random locations throughout, such that various cleaved fragments of different size and nucleotide sequence content are randomly generated. Cleavage at random locations, as used herein, does not require absolute mathematical randomness, but instead only a lack of sequence-based preference in cleavage. For example, cleavage by irradiative or shearing means can cleave DNA at nearly any position; however, such methods can result in cleavage at some locations with slightly more frequency than at other locations. Nevertheless, cleavage at nearly all positions with only a slight sequence preference is still random for purposes herein. Non-specific cleavage using the methods described herein can result in the generation of overlapping nucleotide fragments.

As used herein, the phrase "statistically range in size" refers to the size range for a majority of the fragments generated using cleavage methods known in the art or disclosed herein, such that some of the fragments can be substantially smaller or larger than most of the other fragments within the particular size range. An example of such a statistical range in sizes of fragments is a Poisson distribution. For example, the statistical size range of 12-30 bases also can include some oligonucleotides as small as 1 nucleotide or as large as 300 nucleotides or more, but these particular sizes occur statistically relatively rarely. In some embodiments, there is no limit to the statistical range of fragments. In other embodiments, a statistical range of fragments can specify a range such that 10%, 20%, 30%, 40% or 50% of the fragments, or 60% or more, 70% or more, 80% or more, 90% or more, or 95% or more of the fragments, are within the specified size range.
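Whether a set of fragments meets one of the size-range criteria above reduces to counting the fraction of fragment lengths that fall inside the range. The fragment lengths below are invented for illustration, and the function names are hypothetical.

```python
def fraction_in_range(lengths: list[int], low: int, high: int) -> float:
    """Fraction of fragment lengths within [low, high], inclusive."""
    if not lengths:
        raise ValueError("no fragments")
    return sum(low <= n <= high for n in lengths) / len(lengths)


def meets_statistical_range(lengths: list[int], low: int, high: int,
                            min_fraction: float) -> bool:
    return fraction_in_range(lengths, low, high) >= min_fraction


lengths = [1, 12, 15, 20, 25, 30, 31, 300]     # illustrative fragment lengths
print(fraction_in_range(lengths, 12, 30))            # 0.625
print(meets_statistical_range(lengths, 12, 30, 0.6))  # True
print(meets_statistical_range(lengths, 12, 30, 0.7))  # False
```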

As used herein, the phrase "set of mass signals" or a "mass peak pattern" refers to two or more mass determinations made for each of two or more nucleic acid fragments of a nucleic acid molecule. A "mass pattern" refers to two or more masses corresponding to two or more nucleic acid fragments of a nucleic acid molecule.

As used herein, a "subject" includes, but is not limited to, an animal, plant, bacterium, virus, parasite and any other organism or entity that has nucleic acid. Among animal subjects are mammals, including primates, such as humans. As used herein, "subject" may be used interchangeably with "patient" or "individual".

As used herein, "normal", when referring to a nucleic acid molecule or sample source, such as an individual or group of individuals, refers to a nucleic acid molecule or sample source that was not selected according to any particular criterion, and generally refers to a typical nucleotide sequence of a nucleic acid molecule or health condition of a sample source (e.g., one or more healthy subjects or one or more subjects that do not have a disease). For example, a normal methylation state of a particular nucleotide locus can be the wild type methylation state of the nucleotide locus. In another example, a group of normal subjects can be a group of subjects not having a particular phenotype (such as a disease). As used herein, a "phenotype" refers to a set of parameters that includes any distinguishable trait of an organism. A phenotype can be physical traits and/or mental traits, such as emotional traits. A phenotype may also include a subject's disease diagnosis, prognosis or therapeutic response.

As used herein, a "methylation" or "methylation state" correlated with a disease, disease outcome or outcome of a treatment regimen refers to a methylation state of a nucleic acid target gene region or nucleotide locus that is present or absent more frequently in subjects with a known disease, disease outcome or outcome of a treatment regimen than in a larger population of individuals (e.g., a population of all individuals). As used herein, a "plurality of genes" or a "plurality of nucleic acid target gene molecules" includes at least two, five, 10, 25, 50, 100, 250, 500, 1000, 2,500, 5,000, 10,000, 100,000, 1,000,000 or more genes or nucleic acid target gene molecules. A plurality of genes or nucleic acid target gene molecules can include complete or partial genomes of an organism or even a plurality thereof. Selecting the organism type determines the genome from among which the gene or nucleic acid target gene molecules are selected.
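The frequency comparison in the definition of a methylation state correlated with a disease can be sketched by comparing how often the state occurs in subjects with the disease and in a larger population. The subject data below are invented, and a real analysis would apply a statistical test to decide whether a difference in frequency is meaningful, which this sketch does not do.

```python
def state_frequency(states: list[bool]) -> float:
    """Fraction of subjects in which the methylation state is present (True)."""
    if not states:
        raise ValueError("no subjects")
    return sum(states) / len(states)


def compare_to_population(cases: list[bool], population: list[bool]) -> tuple[float, float, str]:
    f_case, f_pop = state_frequency(cases), state_frequency(population)
    if f_case > f_pop:
        direction = "present more frequently in cases"
    elif f_case < f_pop:
        direction = "absent more frequently in cases"
    else:
        direction = "no difference"
    return f_case, f_pop, direction


cases = [True, True, True, False]
population = [True, False, False, False, True, False, False, False]
print(compare_to_population(cases, population))
# (0.75, 0.25, 'present more frequently in cases')
```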

As used herein, "sample" refers to a composition containing a material to be detected. Samples include "biological samples", which refer to any material obtained from a living source, for example, an animal such as a human or other mammal, a plant, a bacterium, a fungus, a protist or a virus, or a processed form, such as amplified or isolated material. The biological sample can be in any form, including a solid material such as a tissue, cells, a cell pellet, a cell extract, a biopsy, or feces, or a biological fluid such as urine, whole blood, plasma, serum, interstitial fluid, peritoneal fluid, lymph fluid, ascites, sweat, saliva, follicular fluid, breast milk, non-milk breast secretions, cerebral spinal fluid, seminal fluid, lung sputum, amniotic fluid, exudate from a region of infection or inflammation, a mouth wash containing buccal cells, synovial fluid, or any other fluid sample produced by the subject. In addition, the sample can be solid samples of tissues or organs, such as collected tissues, including bone marrow, epithelium, stomach, prostate, kidney, bladder, breast, colon, lung, pancreas, endometrium, neuron, muscle, and other tissues. Samples can include organs, and pathological samples such as a formalin-fixed sample embedded in paraffin. If desired, solid materials can be mixed with a fluid or purified or amplified or otherwise treated. Samples examined using the methods described herein can be treated in one or more purification steps in order to increase the purity of the desired cells or nucleic acid in the sample. Samples also can be examined using the methods described herein without any purification steps to increase the purity of desired cells or nucleic acid. In particular, herein, the samples include a mixture of matrix used for mass spectrometric analyses and a biopolymer, such as a nucleic acid.

As used herein, "array" refers to a collection of elements, such as nucleic acids. Typically an array contains three or more members. An addressable array is one in which the members of the array are identifiable, typically by position on a solid support. Hence, in general the members of the array will be immobilized to discrete identifiable loci on the surface of a solid phase. Arrays include a collection of elements on a single solid phase surface, such as a collection of nucleotides on a chip.

As used herein, the term "data set" refers to numerical values obtained from the analysis of the nucleic acid target gene region, such as by mass spectral analysis. In the case of mass spectral analysis, these numerical values may include, for example, peak height, area under the curve and molecular mass.
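To make the definition concrete, the sketch below reduces one peak of a mass spectrum to the three values named above. It assumes the spectrum is a list of (mass, intensity) points sorted by mass and that the peak of interest lies inside a known mass window; the area is computed with the trapezoidal rule, which is one choice among several. The `PeakRecord` and `extract_peak` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeakRecord:
    """Hypothetical data-set entry: numerical values describing one spectral peak."""
    apex_mass: float  # mass at the highest point of the peak
    height: float     # intensity at the apex
    area: float       # area under the curve (trapezoidal rule)


def extract_peak(spectrum, low, high):
    """Summarize the peak inside the mass window [low, high].

    spectrum: list of (mass, intensity) pairs sorted by mass (an assumption).
    """
    window = [(m, i) for m, i in spectrum if low <= m <= high]
    if len(window) < 2:
        raise ValueError("need at least two points inside the window")
    apex_mass, height = max(window, key=lambda point: point[1])
    # Trapezoidal integration between consecutive points in the window.
    area = sum(
        (m2 - m1) * (i1 + i2) / 2.0
        for (m1, i1), (m2, i2) in zip(window, window[1:])
    )
    return PeakRecord(apex_mass, height, area)
```

For the five points (1000.0, 0), (1000.5, 4), (1001.0, 10), (1001.5, 4), (1002.0, 0), the record has an apex mass of 1001.0, a height of 10.0 and an area of 9.0.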

As used herein, the term "data structure" refers to data obtained by combining two or more data sets, by applying one or more mathematical manipulations to one or more data sets to obtain one or more new data sets, or by manipulating two or more data sets into a form that provides a new visual illustration of the data. An example of a data structure prepared by manipulating two or more data sets is a hierarchical cluster.
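A hierarchical cluster of the kind named above can be built from several data sets, here one vector of per-site methylation ratios per sample. This is a minimal sketch of agglomerative clustering with average linkage and Euclidean distance; both choices are assumptions for illustration, since the text does not name a linkage or distance, and the sample names and values are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (an illustrative choice).

    profiles: dict mapping sample name -> list of methylation ratios (same length).
    Returns the merge history as (cluster_a, cluster_b, distance), where each
    cluster is a tuple of sample names.
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def cluster_distance(a, b):
        # Mean pairwise Euclidean distance between members of the two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

With two tumor-like profiles, e.g. `[0.9, 0.8]` and `[0.85, 0.75]`, and two normal-like profiles, e.g. `[0.1, 0.2]` and `[0.15, 0.1]`, the tumors merge first, then the normals, and finally the two groups; the merge history is the tree that a dendrogram would draw.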

As used herein, the term "breast cancer" refers to a condition characterized by anomalous rapid proliferation of abnormal cells in one or both breasts of a subject. The abnormal cells often are referred to as "neoplastic cells," which are transformed cells that can form a solid tumor. The term "tumor" refers to an abnormal mass or population of cells (i.e., two or more cells) that results from excessive or abnormal cell division, whether malignant or benign, and includes pre-cancerous and cancerous cells. Malignant tumors are distinguished from benign growths or tumors in that, in addition to uncontrolled cellular proliferation, they can invade surrounding tissues and can metastasize. In breast cancer, neoplastic cells may be identified in one or both breasts only and not in another tissue or organ, in one or both breasts and one or more adjacent tissues or organs (e.g., lymph node), or in a breast and one or more non-adjacent tissues or organs to which the breast cancer cells have metastasized.

The term "invasion" as used herein refers to the spread of cancerous cells to adjacent surrounding tissues. The term "invasion" often is used synonymously with the term "metastasis," which as used herein refers to a process in which cancer cells travel from one organ or tissue to another non-adjacent organ or tissue.

In the case of breast cancer, cancer cells in the breast(s) can spread to other tissues and organs of a subject, and conversely, cancer cells from other organs or tissues can invade or metastasize to a breast. Cancerous cells from the breast(s) may invade or metastasize to any other organ or tissue of the body. Breast cancer cells often invade lymph node cells and/or metastasize to the liver, brain and/or bone and spread cancer in these tissues and organs. Breast cancers can spread to other organs and tissues and cause lung cancer, prostate cancer, colon cancer, ovarian cancer, cervical cancer, gastrointestinal cancer, pancreatic cancer, glioblastoma, bladder cancer, hepatoma, colorectal cancer, uterine cervical cancer, endometrial carcinoma, salivary gland carcinoma, kidney cancer, vulval cancer, thyroid cancer, hepatic carcinoma, skin cancer, melanoma, neuroblastoma, myeloma, various types of head and neck cancer, acute lymphoblastic leukemia, acute myeloid leukemia, Ewing sarcoma and peripheral neuroepithelioma, and other carcinomas, lymphomas, blastomas, sarcomas, and leukemias.
The present invention also provides a method for identifying an unknown phenotype of a tissue or cell that correlates with changes in the methylation state of the tissue or cell, comprising: treating a nucleic acid sample from said tissue or cell with a reagent that modifies unmethylated cytosine to produce uracil; amplifying a nucleic acid target gene region using at least one primer that hybridizes to a strand of the nucleic acid target gene region, producing amplified nucleic acids; determining the characteristic methylation state of the nucleic acid target gene region by base-specific cleavage of the amplified nucleic acids and identification of their methylation sites; and comparing the ratio of methylated cytosine to unmethylated cytosine for each of the methylation sites of the characteristic methylation state of the tissue or cell nucleic acid sample to the ratio of methylated cytosine to unmethylated cytosine for each of the methylation sites of a tissue or cell nucleic acid sample of the same type having a known phenotype, thereby identifying the unknown phenotype.
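The final comparison step of this method can be sketched as follows, assuming per-site counts of methylated and unmethylated molecules are already available (for example, derived from spectral peak areas). Choosing the reference phenotype with the smallest mean absolute difference in per-site ratios is an illustrative assumption, since the text does not specify a distance or decision rule; all names are hypothetical.

```python
def methylation_ratios(counts):
    """Map each site to its fraction of methylated cytosine.

    counts: dict of site -> (methylated, unmethylated). Sites with no coverage are skipped.
    """
    return {
        site: methylated / (methylated + unmethylated)
        for site, (methylated, unmethylated) in counts.items()
        if methylated + unmethylated > 0
    }


def identify_phenotype(sample_counts, references):
    """Hypothetical rule: return (phenotype, score) for the closest reference.

    references: dict of known phenotype -> per-site counts from a sample of the
    same tissue or cell type. score is the mean absolute difference in
    methylation ratio over the sites shared with the unknown sample.
    """
    sample = methylation_ratios(sample_counts)
    best = (None, float("inf"))
    for phenotype, ref_counts in references.items():
        reference = methylation_ratios(ref_counts)
        shared = sample.keys() & reference.keys()
        if not shared:
            continue  # nothing to compare against this reference
        score = sum(abs(sample[s] - reference[s]) for s in shared) / len(shared)
        if score < best[1]:
            best = (phenotype, score)
    return best
```

A sample with ratios 0.9 and 0.8 at two sites lies closer to a reference with 0.85 and 0.75 (mean difference 0.05) than to one with 0.10 and 0.05, so the first reference's phenotype is returned. In practice a statistical classifier trained on many references would likely replace this nearest-reference rule.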

In one preferred aspect of the present invention, analysis of the DNA methylation of a nucleic acid target gene region is obtained by MALDI-TOF MS analysis of base-specific cleavage products derived from amplified nucleic acid target gene molecules. In general, a PCR amplification product is generated from bisulfite-treated DNA, transcribed in vitro into a single-stranded RNA molecule, and subsequently cleaved base-specifically by an endoribonuclease. The conversion of unmethylated cytosine to uracil during bisulfite treatment generates different base-specific cleavage patterns that can be readily analysed by MALDI-TOF MS. These spectral analyses may be used to determine the ratio of methylated to non-methylated nucleotide at each methylation site of the nucleic acid target gene region. One skilled in the art will recognise that the methylation state of any nucleic acid, nucleic acid target gene region or gene of interest may be determined using the methods of the present invention. In addition, one skilled in the art would recognise the importance of the location of CpG islands in identifying novel, unique or specific methylation states for diagnostic purposes. Correspondingly, the location of a CpG island in a nucleic acid of interest may indicate other CpG islands of significance located in, around, or in close proximity to the initially identified CpG island. Consequently, one skilled in the art would reasonably look to other areas in proximity to an initially identified CpG island to locate other CpG islands of interest.
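The cleavage principle above can be illustrated in silico. The sketch below is a deliberate simplification, assuming that (i) every unmethylated cytosine is converted and every methylated cytosine is protected, (ii) the RNA transcript has the same sense as the converted strand, and (iii) the endoribonuclease cuts after every U. The actual assay chemistry, strand choice and fragment mass calculation are not modelled, and all function names, including the peak-area ratio helper, are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion followed by PCR on one DNA strand.

    Unmethylated C becomes U, which PCR copies as T; cytosines whose 0-based
    index is in methylated_positions are protected and stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def transcribe(dna):
    """Simplified in vitro transcription: same-sense RNA copy (T -> U)."""
    return dna.replace("T", "U")


def cleave_after(rna, base="U"):
    """Hypothetical base-specific cleavage: cut the RNA after every occurrence of base."""
    fragments, current = [], ""
    for nt in rna:
        current += nt
        if nt == base:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def methylated_fraction(methylated_area, unmethylated_area):
    """Share of methylated signal at one site, from the two corresponding peak areas."""
    return methylated_area / (methylated_area + unmethylated_area)
```

With both CpG cytosines of `ACGTACGA` methylated (positions 1 and 5), the fragments are `ACGU` and `ACGA`; with neither methylated they are `AU`, `GU`, `AU` and `GA`, so the two templates give distinguishable fragment sets. A site whose methylated and unmethylated peaks have areas 30 and 10 would be scored as 0.75 methylated.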

The Polycomb Repressive Complex 2 (PRC2)

The invention provides, in part, evidence of de novo DNA methylation mediated through genes in the polycomb group family. Polycomb repressive complex 2 (PRC2) comprises histone deacetylases and methyltransferases and is involved in initiating gene silencing through chromatin modification. PRC2 establishes methylation of histone H3 lysine 27 (H3K27), which can be recognized by polycomb repressive complex 1 (PRC1) to maintain gene silencing (Valk-Lingbeek, M.E., Bruggeman, S.W. & van Lohuizen, M. Cell 118, 409-18 (2004)). PRC2 is of particular interest because of its ability to initiate gene silencing. PRC2 recruits DNA methyltransferases to establish cytosine methylation on the basis of H3K27 marks (Vire, E. et al. Nature 439, 871-4 (2006)) and (Schlesinger, Y. et al. Nat Genet 39, 232-6 (2007)). Hypermethylation of PRC2 target genes in colon cancer has been reported, but it remains unclear how frequently this biological mechanism is employed in other types of cancer such as breast cancer, ovarian cancer, brain cancer, prostate cancer, colon cancer, lung cancer, kidney cancer, leukemias and melanoma.

Acute Myeloid Leukemia (AML)

Acute myelogenous leukemia (AML) is the most common form of leukemia with more than 10,000 people diagnosed each year, according to National Cancer Institute estimates.

Etiology

Heredity, radiation, chemical and other occupational exposures, and drugs have been implicated in the development of AML. There is no direct evidence of a viral etiology in AML.

Heredity: Certain syndromes with somatic cell chromosome aneuploidy, e.g., Down Syndrome, are associated with an increased incidence of AML. Inherited diseases with excessive chromatin fragility, e.g., ataxia telangiectasia, are also associated with AML.

Chemical and Other Exposures: Exposure to benzene, which is used as a solvent in the chemical, plastic, rubber, and pharmaceutical industries, is associated with an increased incidence of AML. Smoking and exposure to petroleum products, paint, embalming fluids, ethylene oxide, herbicides, pesticides, and electromagnetic fields have also been associated with an increased risk of AML.

Drugs: Antineoplastic drugs are the leading cause of drug-related (or treatment-associated) AML. Alkylating agent-associated leukemia occurs on average 48-72 months after exposure and demonstrates aberrations in chromosomes 5 and 7. Topoisomerase II inhibitor-associated leukemias occur 1-3 years after exposure and usually have aberrations involving chromosome band 11q23. Similarly, chloramphenicol, phenylbutazone, and less commonly chloroquine and methoxypsoralen have been reported to result in bone marrow failure that may evolve into AML.

Classification

Currently, the categorization of acute leukemia into biologically distinct groups is based on morphology, cytochemistry and immunophenotype as well as cytogenetic and molecular techniques. See Table 1 below:

Source: BD Cheson et al., J Clin Oncol 8:813, 1990.

Morphologic and Cytochemical Classification: The diagnosis of AML is established by the presence of at least 20% myeloblasts in blood and/or bone marrow according to the World Health Organization classification. Once diagnosed, AML is classified based on morphology and cytochemistry according to the FAB schema (see Table 1), which includes eight major subtypes, M0-M7.

Immunophenotypic Classification: The phenotype of human myeloid leukemia cells can be studied by multiparameter flow cytometry following labeling with monoclonal antibodies to cell-surface antigens. While the results are useful for both diagnosis and prognosis, the process is complicated, time consuming and expensive. For example, M7 can often be diagnosed only by expression of the platelet-specific antigen cluster designation (CD) 41 or by electron-microscopic demonstration of platelet peroxidase.

Chromosomal Classification: Chromosomal analysis of the leukemic cell currently provides the most important pretreatment prognostic information for AML, but its resolution is limited, especially for patients who fall into an "intermediate" risk group. Therefore, any improvement of existing AML classification methods (in terms of accuracy, speed and cost) has tremendous utility for AML diagnosis, prognosis and therapy. Two cytogenetic abnormalities have been invariably associated with a specific FAB group, t(15;17)(q22;q12) with M3 and inv(16)(p13q22) with M4Eo, and many chromosomal abnormalities have been associated primarily with one FAB group, including t(8;21)(q22;q22) with M2. Many of the recurring chromosomal abnormalities in AML have been associated with specific clinical characteristics. Chromosomal changes in leukemia cells can be identified in 80% of children with AML. The abnormalities t(8;21) and t(15;17) are more commonly associated with younger age at onset, and del(5q) and del(7q) with older age at onset. With currently available treatments, 30-50% of children with AML are cured. It is important to identify those children who can be cured with standard treatments and those who should receive more individualized or more aggressive treatment. The type of chromosomal abnormality present at diagnosis has been shown to help identify patients with a "good" or "bad" outcome.

For example, one Pediatric Oncology Group study reported outcomes for 478 children with AML. Children with an inversion of chromosome 16 had a survival rate without relapse of 58%, those with a translocation of chromosomes 8 and 21 had a survival rate without relapse of 45%, and patients with no chromosomal abnormalities had a survival rate without relapse of 45%. Children with a translocation of chromosomes 15 and 17 had a survival rate without relapse of 20%, and children with 11q23 abnormalities had a survival rate of 24%. This study demonstrates the benefit of using clinical data to decide which treatment regimen is best suited for patients with AML.

Molecular Classification: Molecular studies of many recurring cytogenetic abnormalities have revealed genes that may be involved in leukemogenesis. The t(15;17) translocation encodes a chimeric protein, Pml-Rarα, which is formed by the fusion of the retinoic acid receptor-α (RARα) gene from chromosome 17 and the promyelocytic leukemia (PML) gene from chromosome 15. The Pml-Rarα fusion protein tends to suppress gene transcription and blocks differentiation of the cells. Pharmacologic doses of the Rarα ligand, all-trans-retinoic acid (tretinoin), relieve the block and promote differentiation.

Similar chromosomal rearrangements resulting in molecular aberrations involved in leukemogenesis include inv(16), t(8;21) and 11q23, all of which are increasingly being used for diagnosis and for detection of residual disease after treatment. Molecular aberrations are also being identified that are useful for classifying the risk of relapse in patients without cytogenetic abnormalities. A partial tandem duplication (PTD) of the MLL gene is found in 5-10% of patients with normal cytogenetics and is associated with short remission duration.

Recently, wider-scale gene expression profiling has been used to improve the molecular classification of AML. Initial studies have provided useful results, identifying novel AML subgroups and prognostic gene expression signatures (Bullinger L. et al. N Engl J Med 350:1605-16 (2004)) and (Valk PJ et al. N Engl J Med 350:1617-28 (2004)). In addition, Bullinger et al. observed differential expression of the DNA methylation enzymes (regulators) DNMT3A and DNMT3B in AML patients. DNA methylation is recognized as a key regulatory element of gene expression (Feinberg, AP Nat Genet 27:9-10 (2001)); therefore, these findings point to a potential pathogenic role of aberrant DNA methylation patterns in subgroups of AML patients, resulting in distinct gene expression signatures. In particular, aberrant promoter hypermethylation represents an important mechanism in the initiation and progression of human cancer. Aberrant methylation patterns in AML have also been described by Toyota, M. et al. (Blood 97:2823-9 (2001)) and Issa, JP (Nat Rev Cancer 4:988-93 (2004)).

Thus, in an embodiment of the invention, the methods described herein may be used alone or in combination with currently used morphology (e.g., the percent of myeloblasts in blood and/or bone marrow), cytochemistry, immunophenotype (e.g., platelet-specific antigen cluster designation) as well as cytogenetic and molecular techniques (e.g., gene expression) to provide a better means to stratify AML patients into different risk groups and accordingly administer the proper treatment regimen as determined by one skilled in the art.

Clinical Presentation

Symptoms: Patients with AML most often present with nonspecific symptoms that begin gradually or abruptly and are the consequence of anemia, leukocytosis, leukopenia or leukocyte dysfunction, or thrombocytopenia. Nearly half have had symptoms for greater than three months before the leukemia was diagnosed.

Half of leukemia patients mention fatigue as the first symptom, but most complain of fatigue or weakness at the time of diagnosis. Anorexia and weight loss are common. Fever with or without an identifiable infection is the initial symptom in ~10% of patients. Signs of abnormal hemostasis are noted in 5% of patients. On occasion, bone pain, lymphadenopathy, non-specific cough, headache, or diaphoresis is the presenting symptom.

Physical Findings: Fever, splenomegaly, hepatomegaly, lymphadenopathy, sternal tenderness, and evidence of infection and hemorrhage are often found at diagnosis. Significant gastrointestinal bleeding, intrapulmonary hemorrhage, or intracranial hemorrhage occur most often in acute promyelocytic leukemia (APL). Retinal hemorrhages are detected in 15% of patients.

Hematologic Findings: Anemia is usually present at diagnosis and can be severe. The degree varies considerably irrespective of other hematologic findings, splenomegaly, or the duration of symptoms. Decreased erythropoiesis often results in a reduced reticulocyte count, and erythrocyte survival is decreased by accelerated destruction. Active blood loss also contributes to the anemia.

The median presenting leukocyte count is about 15,000/μl. Between 25 and 40% of patients have counts <5,000/μl, and 20% have counts >100,000/μl. Fewer than 5% have no detectable leukemic cells in the blood. Poor neutrophil function may be noted functionally by impaired phagocytosis and migration and morphologically by abnormal lobulation and deficient granulation. Platelet counts <100,000/μl are found at diagnosis in ~75% of patients, and about 25% have counts <25,000/μl.

Pretreatment Evaluation: Once the diagnosis of AML is suspected, a rapid evaluation and initiation of appropriate therapy should follow. Factors that have prognostic significance, for example, for achieving complete remission (CR), for predicting the duration of CR or for predicting survivability, should also be assessed before initiating treatment.

Prognostic Factors

Although 70-80% of younger AML patients achieve complete remission (CR) with current chemotherapy induction regimens, more than half of these patients relapse and die of their disease. More intensive consolidation treatments, such as allogeneic stem cell transplantation, often prevent relapse, but are themselves associated with high treatment-related mortality (Giles, FJ. et al. Acute myeloid leukemia. Hematology (Am Soc Hematol Educ Program), 73-110 (2002)). Therefore, it is crucial to stratify patients by risk in order to prescribe the appropriate treatment regimen that matches their risk profile. For example, a patient with a poor prognosis (i.e., high risk) may be more willing to assume the risks associated with intensive consolidation treatments, such as allogeneic stem cell transplantation.

Many factors influence the likelihood of entering CR, the length of CR, and the curability of AML. In an embodiment of the invention, the methylation-based prognostic methods provided herein may be used to predict a subject's likelihood of complete remission following induction therapy, wherein said likelihood of complete remission is correlated with changes in the methylation state of said subject. CR is defined after examination of both blood and bone marrow. The blood neutrophil count must be >1500/μl and the platelet count >100,000/μl. Hemoglobin concentration and hematocrit are not considered in determining CR. Circulating blasts should be absent. While rare blasts may be detected in the blood during marrow regeneration, they should disappear on successive studies. Bone marrow cellularity should be >20% with trilineage maturation. The bone marrow should contain <5% blasts, and Auer rods should be absent. For patients in CR, reverse transcriptase PCR to detect AML-associated molecular abnormalities and FISH to detect AML-associated cytogenetic aberrations are currently used to detect residual disease. Methods to detect minimal residual disease may become a reliable discriminator between patients in CR who do or do not require additional and/or alternative therapies. Prognostic factors are influenced by the treatment used.
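The CR criteria listed above can be encoded as a checklist. This is a minimal sketch, assuming the blood and marrow findings have already been measured; the `RemissionWorkup` record, its field names and the wording of the returned messages are hypothetical. Hemoglobin and hematocrit are intentionally omitted because the text excludes them from the CR determination.

```python
from dataclasses import dataclass


@dataclass
class RemissionWorkup:
    """Hypothetical record of blood and bone-marrow findings used to judge CR."""
    neutrophils_per_ul: float
    platelets_per_ul: float
    circulating_blasts: bool      # True if blasts persist in blood on successive studies
    marrow_cellularity_pct: float
    trilineage_maturation: bool
    marrow_blasts_pct: float
    auer_rods_present: bool


def check_complete_remission(w):
    """Return (is_cr, unmet), where unmet lists the CR criteria not satisfied.

    Hemoglobin and hematocrit are deliberately not inputs: they are not used to determine CR.
    """
    unmet = []
    if not w.neutrophils_per_ul > 1500:
        unmet.append("blood neutrophils must exceed 1500/ul")
    if not w.platelets_per_ul > 100_000:
        unmet.append("platelets must exceed 100,000/ul")
    if w.circulating_blasts:
        unmet.append("circulating blasts must be absent")
    if not (w.marrow_cellularity_pct > 20 and w.trilineage_maturation):
        unmet.append("marrow cellularity must exceed 20% with trilineage maturation")
    if not w.marrow_blasts_pct < 5:
        unmet.append("marrow blasts must be below 5%")
    if w.auer_rods_present:
        unmet.append("Auer rods must be absent")
    return not unmet, unmet
```

Returning the unmet criteria, rather than a bare yes/no, keeps the reason visible when a workup narrowly fails, for example a marrow with 7% blasts but otherwise normal findings.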

Other prognostic factors include the following: age at diagnosis, chromosome findings at diagnosis, history of an antecedent hematologic disorder, history of a previous malignancy, a high presenting leukocyte count, and other factors described in the FAB classification diagnosis of Table 1 (e.g., leukemic cell characteristics such as ultrastructural features, immunophenotype, expression of the MDR1 gene, etc.). In addition to pretreatment variables, several treatment factors correlate with prognosis in AML, including the speed with which the blast cells disappear from the blood after the institution of therapy. In addition, patients who achieve CR after one induction cycle have longer CR durations than those requiring multiple cycles.

Treatment Options for AML

Although treatment of acute myeloid leukemia (AML) has improved dramatically over the past 30 years, the majority of patients with this disease will die within two years of diagnosis. Researchers have learned that the best way to cure patients with AML is to administer large doses of chemotherapeutic agents in a short period of time. The aim is to kill leukemia cells within 6 months, before resistance to the drugs develops. Therapy is divided into two phases: remission induction and post-remission consolidation/maintenance. Induction chemotherapy is administered to produce a complete remission (CR) in the bone marrow. Once CR is obtained, further therapy must be used to prolong survival and achieve cure. The initial induction treatment and subsequent consolidation therapy are often chosen based upon the prognostic factors described above. In an embodiment of the invention, the initial induction treatment may be chosen based solely upon the methylation-based prognostic methods provided herein or in combination with existing prognostic factors or markers. Intensifying therapy with traditional chemotherapy agents such as cytarabine and anthracyclines in younger and/or lower risk patients appears to increase the cure rate of AML. In older and/or higher risk patients, the benefit of intensive therapy has been more difficult to document, and therefore novel therapies as consolidation for these patients are being actively pursued.

Remission Induction Therapy: During remission induction therapy, patients are given large doses of chemotherapy over a period of 5-7 days. These chemotherapy drugs kill leukemia cells and normal bone marrow cells. The major side effects of these drugs are related to toxicities of rapidly growing cells in the body, i.e., normal bone marrow, skin and the gastrointestinal tract. Each drug also has specific side effects for other organs.

Figure 9 is a flow chart outlining the therapeutic options available to a newly diagnosed AML patient. For all forms of AML except APL, standard therapy includes a 7-day continuous infusion of cytarabine and a 3-day course of an anthracycline. The anthracyclines include daunorubicin (Cerubidine), doxorubicin (Adriamycin, Rubex), epirubicin (Ellence, Pharmorubicin), and idarubicin (Idamycin). Following induction, patients typically require 2-3 weeks for bone marrow blood cell production to recover. During this time, patients often require blood and platelet transfusions to maintain red blood cell and platelet levels. In order to reduce the risk of infection, antibiotics and blood cell growth factors that stimulate the bone marrow to produce normal white blood cells are often given during this period. Neupogen® and Leukine® are white blood cell growth factors currently approved by the Food and Drug Administration to facilitate white blood cell production. After 2-3 weeks, blood counts begin to recover and often return to normal. A bone marrow examination is repeated to determine whether a remission has been achieved. For patients in remission, consolidation therapy will begin. If patients have not achieved a remission, another induction course of treatment will be given immediately. However, for patients with an HLA-compatible marrow donor, consideration should be given to an immediate allogeneic stem cell transplant without a second course of induction therapy. This decision depends on the chance of achieving a remission with a second cycle of chemotherapy. However, even if a remission is achieved with a second cycle of chemotherapy, remission duration is often very short despite consolidation. For patients with acute promyelocytic leukemia (M3), all-trans-retinoic acid (Vesanoid®) may be included in the remission induction regimen. Patients with acute promyelocytic leukemia typically receive Vesanoid® at some time during their treatment course.
There are ongoing clinical trials to determine the optimal time to administer this drug.

Strategies to Improve Remission Induction

New Drug Development: All new drugs for the treatment of patients with AML are tested first in patients with relapsed or refractory disease. When they are found to be effective, they are then evaluated in remission induction regimens.

Mylotarg®: Mylotarg® is a targeted chemotherapy composed of a monoclonal antibody attached to calicheamicin, an antibiotic that kills cancer cells. Monoclonal antibodies are proteins that can be produced in a laboratory and are able to identify specific antigens (small carbohydrates and/or proteins) on the surface of certain cells and bind to them. This binding stimulates the immune system to attack and kill the cells to which the monoclonal antibody is bound. Mylotarg® is targeted against the CD33 antigen, a protein found on the surface of cancerous blood cells. Calicheamicin is an antibiotic substance that is toxic to cancer cells. Once the monoclonal antibody binds to the cancer cells, calicheamicin is absorbed into the cells and kills them. A significant benefit of this approach is that Mylotarg® mainly targets cancer cells, thereby sparing healthy cells from destruction. This is in contrast to chemotherapy or radiation, which do not differentiate between cancer cells and healthy cells in the body, a characteristic that leads to potentially intolerable side effects. The European Organization for Research and Treatment of Cancer (EORTC) is currently conducting a clinical trial evaluating Mylotarg® plus intensive chemotherapy consisting of mitoxantrone, cytarabine and etoposide (MICE) as induction therapy for AML patients over the age of 60. Of the 34 patients in this trial so far, nearly 50% achieved an anti-cancer response to Mylotarg® alone. Approximately two months following Mylotarg® plus chemotherapy, over 40% of patients in the trial were in complete remission (disappearance of cancer). At four and six months following therapy, the estimated survival rates are 65% and 57%, respectively. All patients had low blood cell levels from treatment, with other side effects consistent with standard intensive chemotherapy regimens. Other clinical trials are ongoing to evaluate Mylotarg® either alone or in combination with other therapies.
Multiple Drug Resistance Inhibitors: Patients with AML may fail to achieve a remission or may relapse because of chemotherapy drug resistance genes that can be present at the time of diagnosis or are induced by treatment. Several drugs are being tested to determine if they will overcome or prevent the development of multiple drug resistance in AML as part of remission induction strategies.

Post-Remission Therapy for Acute Myeloid Leukemia

If a complete remission is achieved and no further therapy is given, over 90% of patients will have a recurrence of disease in weeks to months. Therefore, patients who achieve complete remission almost always undergo some form of consolidation therapy, including sequential courses of high-dose cytarabine, high-dose combination therapy with allogeneic stem cell transplant (SCT), or novel therapies, based on their predicted risk of relapse (i.e., risk-stratified therapy), their perceptions of the outcomes associated with each treatment, the availability of an HLA-matched sibling stem cell donor, their physician's bias concerning the appropriateness of each treatment option, and the geographic availability of each treatment. In an embodiment of the invention, the consolidation therapy may be chosen based solely upon the methylation-based prognostic methods provided herein or in combination with existing factors or markers provided above.

Post-remission therapy treatments are given as close together as possible. The more intensive the chemotherapy and the closer together the courses of therapy are given, the less chance the leukemia has of returning (i.e., lower doses of drugs do not work as well as higher doses of drugs). In two randomized studies, high-dose cytarabine with an anthracycline produced CR rates similar to those achieved with standard 7 and 3 regimens. However, the CR duration was longer after high-dose cytarabine than after standard-dose cytarabine.

Risks and Benefits of an Allogeneic Stem Cell Transplant: If an allogeneic stem cell transplant is performed as consolidation, patients may proceed directly to the transplant following remission induction, as there does not appear to be an advantage to receiving chemotherapy in addition to that related to the transplant itself. In essence, the transplant is the consolidation treatment. Additional chemotherapy not related to the transplant procedure for consolidation before the allogeneic transplant may increase toxicity without preventing relapses.

Patients with a suitable stem cell donor who should consider an allogeneic transplant as consolidation immediately after remission induction include patients with normal cytogenetics or adverse cytogenetic abnormalities, patients who require more than one induction cycle to achieve a remission, and patients who refuse to undergo the 3-4 cycles of consolidation and maintenance required for adequate control of disease with conventional chemotherapy alone. In an embodiment of the invention, patients with a suitable stem cell donor who should consider an allogeneic transplant as consolidation immediately after remission induction may further include patients with a poor prognosis based solely upon the methylation-based prognostic methods provided herein or in combination with existing factors or markers provided above. Some patients with a suitable stem cell donor may consider delaying allogeneic transplant until first relapse. Patients over the age of 50-60, depending on other risk factors and general condition, patients with acute promyelocytic leukemia, and patients with "good" cytogenetic abnormalities (t(8;21) and inv(16)) who can tolerate all prescribed consolidation therapy may not need to expose themselves to the immediate risk of an allogeneic stem cell transplant. In an embodiment of the invention, patients with a good prognosis based on the methylation-based methods provided herein may choose not to undergo allogeneic transplant or may consider delaying allogeneic transplant until first relapse in order to avoid the immediate risk of an allogeneic stem cell transplant.
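The factors listed in the paragraph above can be tallied in code. This is a minimal sketch, assuming a hypothetical `Patient` record whose fields mirror the listed criteria; it collects the factors that favor an immediate transplant and those that favor delaying it, rather than resolving conflicts, because the text leaves that weighing to the patient and physician. The default age cutoff of 60 is an assumption chosen from the 50-60 range given, and applying "can tolerate all prescribed consolidation" only to the good-cytogenetics group is one reading of the sentence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical record of the factors named in the text."""
    has_suitable_donor: bool
    cytogenetics: str                  # "good", "normal" or "adverse"
    induction_cycles_to_cr: int
    refuses_consolidation: bool
    age: int
    is_apl: bool                       # acute promyelocytic leukemia
    tolerates_consolidation: bool
    methylation_prognosis: Optional[str] = None  # "poor", "good" or None if not tested


def transplant_timing_factors(p, age_cutoff=60):
    """Return (favor_immediate, favor_delay) lists of matching factors.

    age_cutoff is an assumption within the 50-60 range stated in the text.
    """
    immediate, delay = [], []
    if not p.has_suitable_donor:
        return immediate, delay  # the criteria apply only to patients with a donor
    if p.cytogenetics in ("normal", "adverse"):
        immediate.append(f"{p.cytogenetics} cytogenetics")
    if p.induction_cycles_to_cr > 1:
        immediate.append("more than one induction cycle needed")
    if p.refuses_consolidation:
        immediate.append("declines 3-4 cycles of consolidation")
    if p.methylation_prognosis == "poor":
        immediate.append("poor methylation-based prognosis")
    if p.age > age_cutoff:
        delay.append(f"age over {age_cutoff}")
    if p.is_apl:
        delay.append("acute promyelocytic leukemia")
    if p.cytogenetics == "good" and p.tolerates_consolidation:
        delay.append("good cytogenetics and tolerates consolidation")
    if p.methylation_prognosis == "good":
        delay.append("good methylation-based prognosis")
    return immediate, delay
```

A donor-matched patient with normal cytogenetics who needed two induction cycles yields two factors favoring immediate transplant and none favoring delay; a patient with APL and a good methylation-based prognosis yields only factors favoring delay.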

For patients who choose to have a stem cell transplant only if they relapse, it is important that it be performed at the very first sign of relapse. This requires bone marrow examinations every 4-6 weeks for the first 2 years after diagnosis. This strategy offers the best chance to catch the leukemia early when treatment will be more effective.

Consolidation Chemotherapy: Consolidation chemotherapy typically consists of 3 to 4 cycles of cytarabine given in high doses over 5 days in conjunction with additional chemotherapy drugs such as etoposide, daunomycin or idarubicin. Remission duration has been correlated with the dose of cytarabine and the number of cycles administered. In general, the more intensive the consolidation, the higher the cure rate.

The administration of consolidation chemotherapy interferes with the production of blood cells by the bone marrow, resulting in low white cell counts in the blood. There is usually a delay of one to two weeks after the administration of chemotherapy before the bone marrow resumes function, leaving patients with low blood counts for days or weeks. During this time, patients are often hospitalized and given antibiotics and observed for infections. Neupogen® and Leukine® are growth factors that hasten the recovery of white blood cells after the administration of chemotherapy.

Consolidation chemotherapy is typically associated with 14-21 days of myelosuppression, similar to induction, for each of 3-4 courses. For patients who are unwilling or unable to undergo the complex and intensive chemotherapy required for consolidation therapy, either an autologous or allogeneic transplant may be considered, since these treatments condense the therapy and produce results that are equivalent or superior to the best chemotherapy regimens.

Strategies to Improve Post-Remission Therapy

Allogeneic stem cell transplantation (SCT) in first complete remission (CR) should be strongly considered by patients with high-risk karyotypes.

Patients with normal karyotypes who have other poor-risk factors (antecedent hematologic disorder, failure to attain remission with a single induction course, hyperleukocytosis, partial tandem duplication (PTD) of the MLL gene, and FLT3 abnormalities) are also potential candidates. If a suitable HLA-matched donor does not exist, autologous SCT or novel therapeutic approaches are considered. In each of the above cases, a patient's methylation state as determined by the methods provided herein offers the patient and doctor additional information to consider when deciding whether to pursue allogeneic SCT or any other available AML treatment.

Possible Future Treatments

While significant progress has been made in the treatment of leukemia, many patients still succumb to the disease or to the complications of treatment, and better treatment strategies are still needed. Future progress in the treatment of leukemia will result from continued participation in appropriate clinical studies. Several areas of active exploration are currently aimed at improving the treatment of leukemia.

Monoclonal Antibodies: Another approach is to deliver additional treatment directed specifically at cancer cells while avoiding harm to normal cells. Monoclonal antibodies are laboratory-produced proteins that can locate cancer cells and either kill them directly or stimulate the immune system to kill them. Some monoclonal antibodies must be linked to a radioactive isotope or a toxin in order to kill cells; in that case, the antibody essentially serves as a delivery system. Monoclonal antibodies such as Mylotarg® can be administered alone or with chemotherapy and are being evaluated to determine whether they can improve cure rates.

Mylotarg® is the first antibody-targeted chemotherapy and represents a breakthrough technology in the treatment of AML. It is currently approved by the FDA for the treatment of elderly patients with recurrent AML and is in clinical trials to evaluate its efficacy alone and in combination with other therapies in different stages of AML. Mylotarg® consists of a monoclonal antibody directed against the CD33 antigen, a protein found on the surface of cancerous blood cells, attached to calicheamicin, an antibiotic that is toxic to cancer cells. Once the monoclonal antibody binds to the cancer cells, calicheamicin is absorbed into the cells and kills them.

Researchers from Saint Louis University Health Sciences recently conducted a small trial to evaluate the effectiveness of Mylotarg® as consolidation therapy for patients with AML in first remission (disappearance of cancer). In this trial, five patients received Mylotarg® within one to four months of achieving complete remission following standard induction and consolidation therapy. Four patients remained in complete remission for 10 to 15 months. Two of these patients later received an allogeneic stem cell transplant and were free of cancer nine months after the transplant. All patients had severely low white blood cell counts following treatment with Mylotarg®; however, there were no treatment-related deaths. Future clinical trials will evaluate the effectiveness of incorporating Mylotarg® into consolidation therapy for AML.

Supportive Care: Supportive care refers to treatments designed to prevent and control the side effects of cancer and its treatment. Side effects not only cause patients discomfort, but also may prevent the optimal delivery of therapy at its planned dose and schedule. In order to achieve optimal outcomes from treatment and improve quality of life, it is imperative that side effects resulting from cancer and its treatment are appropriately managed.

Stem Cell Transplant: High-dose chemotherapy followed by autologous or allogeneic stem cell transplantation is currently a superior consolidation treatment option for many patients.

New Consolidation Chemotherapy Regimens: Development of new multi-drug chemotherapy regimens that incorporate new or additional anti-cancer therapies is an active area of clinical research. New anti-cancer therapies being evaluated in combination with consolidation chemotherapy include the following:

Multiple Drug Resistance Inhibitors: Patients with AML may fail to achieve a remission, or may relapse, because of resistance to chemotherapy drugs that can be present at the time of diagnosis or induced by treatment. Several drugs are being tested, as part of remission induction strategies, to determine whether they can overcome or prevent the development of multiple drug resistance in AML.

Biological Modifier Therapy: Biologic response modifiers are naturally occurring or synthesized substances that direct, facilitate or enhance the body's normal immune defenses. Biologic response modifiers include interferons, interleukins and monoclonal antibodies. In an attempt to improve survival rates, these and other agents are being tested alone or in combination with chemotherapy in clinical studies. Interleukin-2 is currently being evaluated as a maintenance agent after consolidation therapy.

Newer biologic agents are in the developmental phase.

Treatment for Minimal Residual Disease: Following post-remission treatment, patients typically achieve a complete remission (complete disappearance of the cancer). Unfortunately, many patients in remission still experience a relapse of leukemia. This is because not all the leukemia cells were destroyed.

Doctors refer to this as a state of "minimal residual disease." Many doctors believe that applying additional treatments when only a few leukemia cells remain represents the best opportunity to prevent the leukemia from returning. Immunotherapy to activate the body's anti-cancer defense system, or other agents including monoclonal antibodies, biologic response modifiers and chemotherapy drugs, can be administered over several weeks to months in an attempt to eliminate any leukemia cells remaining in the body.

Relapsed Acute Myeloid Leukemia

If a remission is not achieved or the disease recurs, there are essentially two choices of therapy. Since subsequent treatment with chemotherapy is rarely curative, a palliative approach can be adopted in which biologic agents, such as Mylotarg®, or chemotherapy drugs are administered at low, well-tolerated doses to keep the disease under control for as long as possible. In this situation, the emphasis is on quality of life and supportive care measures.

The alternative approach is to receive more intensive treatment in an attempt to produce a complete remission. There are two main intensive strategies available. For younger patients, a bone marrow or blood stem cell transplant offers a possibility for control or cure of the leukemia. The other approach is to participate in clinical trials evaluating new treatments.

The most important factors predicting response at relapse are the length of the previous CR, whether initial CR was achieved with one or two courses of chemotherapy, and the type of post-remission therapy. When predicting response at relapse, a patient's methylation state as determined by the methods provided herein offers the patient and doctor additional information to consider while deciding which post-remission therapy to select.

Breast Cancer

Breast cancer is typically described as the uncontrolled growth of malignant breast tissue. Breast cancers arise most commonly in the lining of the milk ducts of the breast (ductal carcinoma), or in the lobules where breast milk is produced (lobular carcinoma). Other forms of breast cancer include inflammatory breast cancer and recurrent breast cancer. Inflammatory breast cancer is a rare but very serious, aggressive type of breast cancer. The breast may look red and feel warm, with ridges, welts, or hives on the breast, or the skin may look wrinkled. It is sometimes misdiagnosed as a simple infection. Recurrent disease means that the cancer has come back after it has been treated. It may come back in the breast, in the soft tissues of the chest (the chest wall), or in another part of the body.

In an effort to detect breast cancer as early as possible, regular physical exams and screening mammograms often are prescribed and conducted. A diagnostic mammogram often is performed to evaluate a breast complaint or abnormality detected by physical exam or routine screening mammography. If an abnormality seen with diagnostic mammography is suspicious, additional breast imaging (with exams such as ultrasound) or a biopsy may be ordered. A biopsy followed by pathological (microscopic) analysis is a definitive way to determine whether a subject has breast cancer. Excised breast cancer samples often are subjected to the following analyses: diagnosis of the breast tumor and confirmation of its malignancy; maximum tumor thickness; assessment of completeness of excision of invasive and in situ components and microscopic measurements of the shortest extent of clearance; level of invasion; presence and extent of regression; presence and extent of ulceration; histological type and special variants; pre-existing lesion; mitotic rate; vascular invasion; neurotropism; cell type; tumor lymphocyte infiltration; and growth phase.

The stage of a breast cancer can be classified as a range of stages from Stage 0 to Stage IV based on its size and the extent to which it has spread. The following table summarizes the stages:

TABLE 2

Stage 0 cancer is a contained cancer that has not spread beyond the breast ductal system. Fifteen to twenty percent of breast cancers detected by clinical examination or testing are Stage 0 (the earliest form of breast cancer). The two types of Stage 0 cancer are lobular carcinoma in situ (LCIS) and ductal carcinoma in situ (DCIS). LCIS indicates a high risk for breast cancer. Many physicians do not classify LCIS as a malignancy, and LCIS is often found by chance on a breast biopsy performed to investigate another area of concern. While the microscopic features of LCIS are abnormal and resemble malignancy, LCIS does not behave as a cancer (and therefore is not treated as one); it is merely a marker for a significantly increased risk of cancer anywhere in the breast. However, bilateral simple mastectomy may occasionally be performed if an LCIS patient has a strong family history of breast cancer. In DCIS, the cancer cells are confined to the milk ducts of the breast and have not spread into the fatty breast tissue or to any other part of the body (such as the lymph nodes). About 80% of the time, DCIS is detected on a mammogram as tiny specks of calcium (known as microcalcifications). Less commonly, DCIS presents as a mass with calcifications (15% of the time), and even less often as a mass without calcifications (<5% of the time). A breast biopsy is used to confirm DCIS. Standard DCIS treatment is either breast-conserving therapy (BCT), which is lumpectomy followed by radiation, or mastectomy. To date, DCIS patients have chosen equally between lumpectomy and mastectomy, though specific cases may favor one over the other. In Stage I, the primary (original) cancer is 2 cm or less in diameter and has not spread to the lymph nodes. In Stage IIA, the primary tumor is between 2 and 5 cm in diameter and has not spread to the lymph nodes.
In Stage IIB, the primary tumor is between 2 and 5 cm in diameter and has spread to the axillary (underarm) lymph nodes, or the primary tumor is over 5 cm and has not spread to the lymph nodes. In Stage IIIA, a primary breast cancer of any size has spread to the axillary (underarm) lymph nodes and to axillary tissues. In Stage IIIB, the primary breast cancer is of any size, has attached itself to the chest wall, and has spread to the pectoral (chest) lymph nodes. In Stage IV, the primary cancer has spread beyond the breast to other parts of the body (such as bone, lung, liver or brain). The treatment of Stage IV breast cancer focuses on extending survival time and relieving symptoms.

Based in part upon the selection criteria set forth above, individuals having breast cancer can be selected for genetic studies. Individuals having no history of cancer or breast cancer also often are selected for genetic studies. Other selection criteria can include: the tissue or fluid sample was derived from an individual characterized as Caucasian; the sample was derived from an individual of German paternal and maternal descent; the database included relevant phenotype information for the individual; case samples were derived from individuals diagnosed with breast cancer; control samples were derived from individuals free of cancer and with no family history of breast cancer; and sufficient genomic DNA was extracted from each blood sample for all allelotyping and genotyping reactions performed during the study.
Phenotype information included pre- or post-menopausal status, familial predisposition, country of origin of mother and father, diagnosis with breast cancer (date of primary diagnosis, age of individual at primary diagnosis, grade or stage of development, occurrence of metastases, e.g., lymph node metastases or organ metastases), condition of body tissue (skin tissue, breast tissue, ovary tissue, peritoneum tissue and myometrium), and method of treatment (surgery, chemotherapy, hormone therapy, radiation therapy).
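The size- and node-based breast cancer staging rules described earlier in this section can be expressed as a small function. This is a minimal sketch under stated assumptions: the function name and arguments are hypothetical, a tumor of exactly 5 cm is placed in the "2 to 5 cm" group, and Stage III, whose description depends on chest-wall attachment and axillary tissue involvement rather than size alone, is not modeled (the function returns None for such combinations).

```python
from typing import Optional


def breast_stage(size_cm: float, axillary_nodes: bool, distant_mets: bool,
                 in_situ: bool = False) -> Optional[str]:
    """Simplified stage from the size/node descriptions in the text.

    Returns None for combinations the text does not describe in size/node
    terms (e.g. node-positive tumors of 2 cm or less); Stage III criteria
    such as chest-wall attachment are not modeled here.
    """
    if in_situ:
        return "0"            # confined to ducts or lobules (DCIS/LCIS)
    if distant_mets:
        return "IV"
    if not axillary_nodes:
        if size_cm <= 2:
            return "I"
        if size_cm <= 5:
            return "IIA"
        return "IIB"          # over 5 cm, node-negative
    if 2 < size_cm <= 5:
        return "IIB"          # 2-5 cm with axillary nodes
    return None


print(breast_stage(3.2, axillary_nodes=True, distant_mets=False))  # IIB
```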

Lung Cancer

Lung cancer is the rapid proliferation of abnormal cells in one or both of the lungs. While normal lung tissue cells reproduce and develop into healthy lung tissue, these abnormal cells proliferate rapidly and rarely form normal lung tissue. Instead, the abnormal cells proliferate, form tumors, and disrupt the lung, thereby decreasing lung function and eventually leading to death. As used herein, the term "lung cancer" refers to a condition characterized by anomalous rapid proliferation of abnormal cells in one or both lungs of a subject. The abnormal cells often are referred to as "neoplastic cells," which are transformed cells that can form a solid tumor. The term "tumor" refers to an abnormal mass or population of cells (i.e., two or more cells) that results from excessive or abnormal cell division, whether malignant or benign, and includes pre-cancerous and cancerous cells. Malignant tumors are distinguished from benign growths or tumors in that, in addition to uncontrolled cellular proliferation, they can invade surrounding tissues and can metastasize. In lung cancer, neoplastic cells may be identified in one or both lungs only and not in another tissue or organ, in one or both lungs and one or more adjacent tissues or organs (e.g., a lymph node), or in a lung and one or more non-adjacent tissues or organs to which the lung cancer cells have metastasized. Lung cancer includes squamous cell carcinoma (SCC), adenocarcinoma, small-cell lung cancer, and non-small cell lung cancer.

The World Health Organization classifies lung cancer into four major histological types: (1) squamous cell carcinoma (SCC), (2) adenocarcinoma, (3) large cell carcinoma, and (4) small cell lung carcinoma (SCLC). (see e.g. "The World Health Organization histological typing of lung tumours," Am J Clin Pathol 1982; 77:123-136; see also the World Wide Web URL lungcancer.org). However, there is a great deal of tumor heterogeneity even within the various subtypes, and it is not uncommon for lung cancer to have features of more than one morphologic subtype. The term non-small cell lung carcinoma (NSCLC) includes squamous, adenocarcinoma and large cell carcinomas.

Lung cancer can be categorized into the following stages:

Stage IA: Tumor of any size found in the lung only. Tumor less than 3 cm without evidence of lymph node or metastatic spread.

Stage IB: Tumor of any size found in the lung only. Tumor greater than 3 cm plus atelectasis without evidence of lymph node or metastatic spread. Tumor greater than 3 cm, involves main bronchus, invades the visceral pleura, or associated with atelectasis or pneumonitis that does not involve entire lung; no evidence of lymph node or distant metastases.

Stage IIA: Tumor has spread to lymph nodes associated with the lung. Tumor less than 3 cm; ipsilateral peribronchial and/or hilar lymph node disease; no distant metastases.

Stage IIB: Tumor has spread to lymph nodes associated with the lung. Tumor greater than 3 cm, involves main bronchus, invades visceral pleura, or is associated with atelectasis or pneumonitis that does not involve the whole lung; ipsilateral peribronchial and/or hilar lymph node disease; no distant metastases.

Stage IIIA: Tumor has spread to lymph nodes in the tracheal area, including chest wall and diaphragm. Tumor less than 3 cm; metastases in ipsilateral hilar and mediastinal lymph nodes; no distant metastases. In advanced stages of Stage IIIA, tumor greater than 3 cm, involves main bronchus, invades visceral pleura, with atelectasis or pneumonitis that does not involve the entire lung; metastases in ipsilateral hilar and mediastinal lymph nodes; no distant metastases. In further advanced stages of Stage IIIA, tumor of any size that invades the chest wall, diaphragm, mediastinal pleura, parietal pericardium or superior sulcus tumors; or tumor in the main bronchus less than 2 cm distal to the carina but not involving the carina; or atelectasis or pneumonitis of entire lung; no lymph node or distant metastases.

Stage IIIB: Tumor has spread to lymph nodes on the opposite side of the chest or in the neck. Any size tumor; metastases in contralateral mediastinal or hilar lymph nodes, or in any scalene or supraclavicular lymph node(s); no distant metastases. In advanced stages of Stage IIIB, tumor of any size that invades any of the following: liver, mediastinum, heart, great vessels, trachea, esophagus, vertebral body, carina; or malignant pleural effusion; any nodal status (N0-N3); no distant metastases.

Stage IV: Tumor has spread beyond the chest, most often to the brain or bones. Tumor of any size; any nodal involvement; evidence of distant metastasis (M1).
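The first, summary criteria given for each lung cancer stage above (tumor size relative to 3 cm, which lymph node groups are involved, and whether there is distant spread) can be sketched as a lookup. This is a simplified illustration under stated assumptions: the node-category labels and function name are hypothetical, a tumor of exactly 3 cm is treated as the smaller category, and the local-invasion criteria (chest wall, main bronchus, pleura, atelectasis) are ignored.

```python
NODE_CATEGORIES = {
    "none",                              # no lymph node involvement
    "ipsilateral_hilar",                 # ipsilateral peribronchial and/or hilar
    "ipsilateral_mediastinal",           # ipsilateral hilar and mediastinal
    "contralateral_or_supraclavicular",  # opposite side of chest, scalene or supraclavicular
}


def lung_stage(size_cm: float, nodes: str, distant_mets: bool) -> str:
    """Simplified stage using only size, node group and distant spread."""
    if nodes not in NODE_CATEGORIES:
        raise ValueError(f"unknown node category: {nodes!r}")
    if distant_mets:
        return "IV"
    if nodes == "contralateral_or_supraclavicular":
        return "IIIB"
    if nodes == "ipsilateral_mediastinal":
        return "IIIA"
    small = size_cm <= 3.0  # assumption: exactly 3 cm counts as "less than 3 cm"
    if nodes == "ipsilateral_hilar":
        return "IIA" if small else "IIB"
    return "IA" if small else "IB"


print(lung_stage(4.0, "ipsilateral_hilar", distant_mets=False))  # IIB
```

Checking distant spread first, then the most distant node group, follows the text's ordering, in which a higher stage overrides the size-based distinction.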

Early detection of lung cancer is critical to improving chances of survival. Physicians use a number of different tests to detect and diagnose lung cancer, including imaging scans that provide more accurate and sensitive results than conventional X-rays. Information from these tests enables the physician to determine the type and stage of the cancer and the best way to treat the disease. Tests include physical examination (e.g., detecting signs such as swollen lymph nodes in the neck or collarbone area); chest examination (e.g., examining the chest and listening to the lungs for abnormal breathing sounds or patterns); chest X-ray (e.g., identifying abnormal growths); CT and MRI scans; PET scans (e.g., locating cancerous tumors by their increased uptake of radioactively labeled sugar); sputum cytology (e.g., examining phlegm coughed up from the lungs for abnormal or cancerous cells); bronchoscopy (e.g., viewing the lungs through a hollow, flexible tube (bronchoscope) that is passed through the nose and throat into the main airway of the lungs); and biopsy (e.g., removing lung tissue for examination under a microscope).

Genetics plays a role in the development of lung cancer. For example, some subjects may be more susceptible to developing lung cancer despite common risk factors such as smoking. Others may be susceptible to developing aggressive forms of lung cancer more likely to metastasize or invade surrounding tissues, thus making the disease more difficult to treat.

Prostate Cancer

Prostate cancer is the rapid proliferation of abnormal cells in the prostate gland. While normal prostate cells reproduce and develop into healthy prostate tissue, these abnormal cells proliferate rapidly and rarely form normal prostate tissue. Instead, the abnormal cells proliferate, form tumors, disrupt the prostate, and spread to surrounding tissues.

As used herein, the term "prostate cancer" refers to a condition characterized by anomalous rapid proliferation of abnormal cells in the prostate gland of a subject. In prostate cancer, neoplastic cells may be identified in the prostate only and not in another tissue or organ, in the prostate and one or more adjacent tissues or organs, or in the prostate and one or more non-adjacent tissues or organs (e.g., spine, lungs, liver or brain) to which the prostate cancer cells have metastasized. Prostate cancer cells often invade the spine (e.g., the vertebral column) and/or metastasize to the lungs, liver and/or brain, spreading cancer in these tissues and organs. The prostate is about the size of a walnut and can be divided into two parts, referred to as the right and left lobes. It lies just below the urinary bladder and surrounds the upper part of the urethra, the tube that carries urine from the bladder and semen from the sex glands out through the penis. As one of a man's sex glands, the prostate is affected by male sex hormones (most notably testosterone). These hormones stimulate the activity of the prostate and the replacement of prostate cells as they wear out.

The prostate gland surrounds the neck of the bladder and urethra, and most prostate cancers initially occur in the peripheral zone of the prostate gland, away from the urethra. Tumors within this zone may not produce any symptoms and, as a result, most men with early-stage prostate cancer will not present clinical symptoms of the disease until significant progression has occurred. Tumor progression into the transition zone of the prostate may lead to urethral obstruction, thus producing the first symptoms of the disease. However, these clinical symptoms are indistinguishable from the common non-malignant condition of benign prostatic hyperplasia (BPH).

Early detection of the disease has proven to be critical for survival among prostate cancer patients. For example, the five-year survival rate for prostate cancer patients diagnosed at the local and regional stage is 100%, whereas the five-year survival rate for patients whose cancer has metastasized is only 34%. Early detection and diagnosis of prostate cancer currently rely on digital rectal examination (DRE), prostate-specific antigen (PSA) measurement, transrectal ultrasonography (TRUS), and transrectal needle biopsy (TRNB). At present, serum PSA measurement and DRE are the most common tools used together to detect and diagnose prostate cancer. Both have major limitations, which have fueled intensive research into finding better diagnostic markers for prostate cancer. The most common method of staging prostate cancer uses the TNM system, which stands for Tumor, Node, Metastases. Tables 3A, 3B and 3C below describe the characteristics of each stage and the available treatment options:

TABLE 3A: size of the primary tumor

TABLE 3B: extent of lymph node involvement

TABLE 3C: presence or absence of metastases

In addition, the aggressiveness of prostate cancer may be measured using the Gleason scale (2-10), e.g., 2 = normal-looking tumor, 10 = very abnormal-looking tumor. The Gleason grading system involves assigning numbers (called Gleason grades), ranging from 1 through 5, to cancerous prostate tissue based on how closely the arrangement of the cancer cells mimics the way normal prostate cells form glands. Grades are assigned to the two most common patterns of cells that appear; these two grades (which can be the same or different) are then added together to determine the Gleason score (a number from 2 to 10).
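The Gleason arithmetic described above is a simple sum of two bounded grades, which is why the score ranges from 2 to 10. A minimal sketch (the function name is hypothetical) that validates each grade before summing:

```python
def gleason_score(primary_grade: int, secondary_grade: int) -> int:
    """Sum the Gleason grades (each 1-5) of the two most common cell patterns."""
    for grade in (primary_grade, secondary_grade):
        if not 1 <= grade <= 5:
            raise ValueError(f"Gleason grade must be between 1 and 5, got {grade}")
    return primary_grade + secondary_grade


print(gleason_score(3, 4))  # 7
```

Because each grade is bounded to 1-5, the smallest possible score is 1 + 1 = 2 and the largest is 5 + 5 = 10, matching the 2-10 scale.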

A high Gleason score indicates that the prostate cancer is aggressive and likely to metastasize to surrounding organs or tissues. Prostate cancer most commonly spreads to the surrounding bones, including the pelvis, hips, pubic bone and spine. In 90% of prostate cancer metastases, the cancer spreads to the spine, often involving the vertebral column. In 50% of prostate cancer metastases, the cancer spreads to one or both lungs, while in 25%, the cancer spreads to the liver. In rare cases, prostate cancer may spread to the brain, which carries a poor prognosis (average survival 7.6 months).

There is no available marker that can predict the emergence of the typically fatal metastatic stage of prostate cancer. Diagnosis of metastatic stage is presently achieved by open surgical or laparoscopic pelvic lymphadenectomy, whole-body radionuclide scans, skeletal radiography, and/or bone lesion biopsy analysis. Identification of susceptibility genes and other, less invasive diagnostic methods offers the promise of easing the burden those procedures place on a patient, as well as improving diagnostic accuracy and opening therapeutic options. A related problem is the lack of effective prognostic markers for determining which cancers are indolent and which are or will become aggressive. PSA, for example, fails to discriminate accurately between indolent and aggressive cancers. Until there are prostate cancer markers capable of predicting susceptibility to development of the disease, reliably identifying early-stage disease, and predicting susceptibility to metastasis, the management of prostate cancer will continue to be extremely difficult.

Inclusion or exclusion of samples for a prostate cancer pool to be used in a genetic study may be based upon the following criteria: relevant phenotype information for the individual (e.g., case samples derived from individuals diagnosed with prostate cancer) or the type of prostate cancer diagnosed. Control samples may be selected based on relevant phenotype information for the individual (e.g., derived from individuals free of any cancer) and on the absence of a family history of cancer.
Additional phenotype information collected for both cases and controls may include the individual's age, gender, date of primary diagnosis, age at primary diagnosis, age when the sample was collected, method of treatment, height, weight, disease status (such as heart disease, hypertension, vascular disease, CNS disease, gastrointestinal disease, urogenital disease, asthma, other cancers or diabetes), and smoking status. In an embodiment, the same phenotypic information is collected for the parents of cases and controls, making additional phenotypic analysis possible.

Based in part upon the selection criteria set forth above, individuals suffering from prostate cancer can be selected for genetic studies. Individuals having no history of cancer, particularly prostate cancer, also often are selected for genetic studies as controls.

Methods for Treating Prostate Cancer

Despite the prevalence of prostate cancer among men in North America and Europe, there is still no effective or specific treatment for metastatic prostate cancer. Surgical prostatectomy, radiation therapy, hormone ablation therapy, and chemotherapy continue to be the main treatment modalities. Surgery is often effective for early-stage or non-aggressive prostate cancer. Radiation can be used for both early-stage and advanced prostate cancer. Hormonal therapy may be used to deprive the cancer of androgens (using LHRH agonists or anti-androgens); however, all prostate cancers eventually become resistant to hormonal therapy. Chemotherapy may be effective in advanced cases that no longer respond to hormonal therapy. Chemotherapy regimens used to treat prostate cancer include mitoxantrone plus corticosteroids, and estramustine plus taxanes. Thus, provided are methods for identifying a predisposition to prostate cancer in an individual as described herein and, if a genetic predisposition is identified, treating that individual to delay, reduce or prevent the development of prostate cancer. Treatment of prostate cancer includes prostatectomy, radiation therapy, chemotherapy, cryotherapy and hormonal therapy.

Prostatectomy is the surgical removal of the prostate. A radical prostatectomy can be performed to remove the cancer from the prostate and from nearby areas where the cancer has spread. This type of surgery may help prevent further spread of the cancer. Prostatectomy may be used alone or in combination with hormonal therapy for the treatment of prostate cancer. It is most often used during early stages (Stages T1 and T2), when prostate cancer is located only within the prostate.

Radiation therapy uses high-energy rays to kill prostate cancer cells, shrink tumors, or prevent cancer cells from dividing and spreading. It is nearly impossible to direct these rays only at the cancer cells. As a result, they may damage both cancer cells and healthy cells nearby. Radiation doses are usually small and spread out over time. This allows the healthy cells to recover and survive, while the cancer cells eventually die. Radiation therapy may be used when prostate cancer has not spread beyond the prostate (Stages T1-T2). Like prostatectomy, radiation therapy works best when the cancer is located in a small area, and it can help prevent the cancer from spreading further. In early stages of prostate cancer, radiation therapy may cure the disease.

Chemotherapy is the use of powerful and toxic drugs to attack cancer cells. The drugs circulate throughout the body in the bloodstream and may kill any rapidly growing cells, including healthy ones. Chemotherapy is generally reserved for patients with advanced stage prostate cancer (Stage M+) that no longer responds to hormonal therapy. Cryotherapy sometimes is referred to as cryosurgery, and it is a procedure where the tumor is frozen, allowed to thaw, and then frozen again. It may be used to treat early stages of prostate cancer in which the tumor has not spread outside the prostate.

The primary strategy of hormonal therapy is to decrease the production of testosterone by the testes or to block the actions of testosterone on prostate cells. Hormonal therapy cannot cure prostate cancer; instead, it slows the cancer's growth and reduces the size of the tumor(s). It uses drugs that prevent the production, or block the action, of testosterone and other male hormones, called androgens. The two classes of drugs most commonly used as hormonal therapy in prostate cancer are: LHRH analogs (luteinizing hormone-releasing hormone analogs), also called medical castration, a class of drugs that prevents testosterone production by the testes; and nonsteroidal antiandrogens (also called antiandrogens), a class of drugs that blocks the action of testosterone at the prostate.

Identifying Nucleic Acid Target Gene Regions

Selecting nucleic acid target gene regions of interest that harbor potential methylated sites may be based on a variety of characteristics, known or available to those skilled in the art, of the target gene of interest. Selection criteria may include, for example, the gene's physiological role or function in a biological pathway related to the disease/phenotype of interest, or the existence of mutations affecting the disease/phenotype or of sequence polymorphisms conferring predisposition to the disease/phenotype of interest. Selection may also be based on known expression status or on sequence motifs that bind specific proteins relevant to methylation of gene regions/chromosomal regions. One skilled in the art would recognize that a considerable amount of information may be obtained from published data and experiments, which may provide key indications that the methylation state of a particular gene may be of importance for the prognostic or diagnostic purposes that are the subject of the present invention.

Any type of disease condition that can be correlated with changes in the methylation state of a sample organism, tissue or cell can be analyzed with the methods of the present invention. Such disease conditions include, for example, cancer, cardiovascular disease (CVD), central nervous system (CNS) disease, metabolic disease, inflammation, aging, morbidity, osteoarthritis, infection and drug response. Of particular interest are hematologic cancers, including, for example, acute myeloid leukemia and chronic myeloid leukemia.

Any nucleic acid, nucleic acid target gene region or gene may have a potentially significant characteristic methylation state for diagnostic purposes. Consequently, any nucleic acid of interest may be analyzed using the methods described herein. Some examples of particular genes of interest include ACSL6, ATP10A, BCL6, BCR, CA3, CCND1, CCND2, CD38, CDKN1C, CHGA, COL5A1, EGFR, ESR1, FLI1, FLJ32447, FLT3, FLT4, FRAT1, GABRB3, GAS7, GNAS, GPC3, HEAB, HIST1H4I, HOXA1, HOXA10, HOXA2, HOXA6, HOXA7, HOXA9, HOXC13, IL10RA, IRF4, KIT, LCK, LEP, LMO2, LYL1, MLLT6, MYH11, MYOD1, NFKB2, OLIG2, PAX5, PAX7, PAX8, PEG3, PHOX2B, PLOD1, PON1, PSIP1, PTPRN2, REL, RET, RUNX1T1, SBDS, SET, SLC22A3, SLC38A4, SNRPN, TAL1, TCL1A, TLX1, TLX3, TMPRSS2, TSPYL5, WDR66, WT1, ZIM2, ZNF198 and ZNF331. Each gene may have particular regions of interest, selected by a variety of methods including, for example, the presence of CpG islands. In a related embodiment, the gene of interest may comprise one or more PRC2 binding sites, wherein the target gene is selected from one or more of the following targets: CA3, CD38, EGFR, ESR1, FLI1, FLJ32447, FLT3, FRAT1, GAS7, GNAS, GPC3, HIST1H4I, HOXA1, HOXA10, HOXA2, HOXA6, HOXA7, HOXA9, HOXC13, IL10RA, IRF4, MYOD1, OLIG2, PAX5, PAX7, PAX8, PHOX2B, PTPRN2, SLC22A3, TAL1, TLX1, TLX3 and WT1.
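The text mentions selecting regions by the presence of CpG islands but does not say how islands are called. As one illustration, the sketch below applies the widely used Gardiner-Garden and Frommer thresholds (a window of 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) with a sliding window and merges passing windows. These thresholds, the sliding-window approach and the function names are assumptions for illustration, not the selection procedure used for the genes listed above.

```python
def cpg_island_windows(seq: str, window: int = 200, min_gc: float = 0.5,
                       min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Return (start, end) half-open windows that pass CpG-island thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
        gc_fraction = (c + g) / window
        # Observed/expected CpG ratio = CpG count * length / (C count * G count)
        obs_exp = cpg * window / (c * g) if c and g else 0.0
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            hits.append((start, start + window))
    return hits


def merge_windows(windows: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching windows into candidate islands."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    demo = "AT" * 150 + "CG" * 150 + "AT" * 150  # synthetic 900 bp sequence
    print(merge_windows(cpg_island_windows(demo)))
```

A per-window scan is simple but quadratic in window size; for whole-genome use, running counts would be updated incrementally instead.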

Isolated Nucleic Acids

Featured herein are isolated nucleic acid target gene regions, which include nucleic acids having the nucleotide sequences of the targets provided in Table 10, nucleic acid variants, and substantially identical nucleic acids of the foregoing. Table 10 provides the amplicon sequences, and Table 11 provides the locations of the CpG units that were analyzed. Throughout the document and in the Figures and Tables, the target gene regions may be referenced according to their amplicon ID, which follows the general schema databaseID_GeneName_AmpliconID. Also throughout the document and in the Figures and Tables, CpG sites are referenced according to their CpG ID, which refers to the specific CpG location within the particular genomic region. Each CpG ID follows the general schema databaseID_GeneName_AmpliconID_CPG_CPGposition in the amplicon. "GeneName" is the RefSeq gene name of the analyzed promoter region or, in the case of intragenic regions, of the nearest gene. "AmpliconID" identifies the particular amplicon analyzed within the gene or region, which is especially relevant if multiple amplicons were analyzed for this gene. "CPG" is a constant text string. "CPG position in the amplicon" indicates which CpG sites are enclosed in the measured CpG unit; the numbers given refer to the CpG sites as counted from the 5' end of the analyzed amplicon sequence. The amplification primers are also provided in Table 10, where the left and right primers for each amplicon can be determined according to the numbers provided. For example, for ABL1, the left primer corresponds to the first 24 nucleotides of the amplicon sequence provided, and the right primer corresponds to the last 26 nucleotides of the amplicon sequence provided. However, because single-sequence amplification occurs, the right primer sequence is complementary to the amplicon sequence provided, and the left primer sequence is the actual amplicon sequence (and binds to the extension product generated from the right primer).
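A minimal Python sketch of the ID and primer conventions described above, assuming underscore-delimited fields and a hypothetical "." separator when a CpG unit spans several CpG sites (the source does not state the separator). The ID `DB_001_ABL1_A1_CPG_5.6` is invented for illustration. The primer helper assumes primers are written 5' to 3', so the right primer is taken as the reverse complement of the amplicon's last 26 nucleotides; the 24/26 lengths come from the ABL1 example and will differ for other amplicons.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G), read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class CpgId:
    database_id: str
    gene_name: str
    amplicon_id: str
    cpg_positions: list[int]


def parse_cpg_id(cpg_id: str, position_sep: str = ".") -> CpgId:
    """Split an ID of the form databaseID_GeneName_AmpliconID_CPG_positions.

    The constant "CPG" token is located from the right, so a database ID that
    itself contains underscores is kept intact.
    """
    tokens = cpg_id.split("_")
    if "CPG" not in tokens:
        raise ValueError(f"no CPG token in {cpg_id!r}")
    idx = len(tokens) - 1 - tokens[::-1].index("CPG")
    if idx < 3 or idx != len(tokens) - 2:
        raise ValueError(f"unexpected CpG ID layout: {cpg_id!r}")
    positions = [int(p) for p in tokens[idx + 1].split(position_sep)]
    return CpgId("_".join(tokens[:idx - 2]), tokens[idx - 2], tokens[idx - 1], positions)


def primers_from_amplicon(amplicon: str, left_len: int = 24,
                          right_len: int = 26) -> tuple[str, str]:
    """Left primer: first left_len nt. Right primer: reverse complement of last right_len nt."""
    if len(amplicon) < left_len + right_len:
        raise ValueError("amplicon shorter than the two primers combined")
    return amplicon[:left_len], reverse_complement(amplicon[-right_len:])
```

For example, `parse_cpg_id("DB_001_ABL1_A1_CPG_5.6")` yields database ID `DB_001`, gene `ABL1`, amplicon `A1` and CpG sites `[5, 6]`.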

As used herein, the term "nucleic acid" includes DNA molecules (e.g., complementary DNA (cDNA) and genomic DNA (gDNA)), RNA molecules (e.g., mRNA, rRNA, and tRNA), and analogs of DNA or RNA generated, for example, by use of nucleotide analogs. The nucleic acid molecule can be single-stranded, and it is often double-stranded. The term "isolated or purified nucleic acid" refers to nucleic acids that are separated from other nucleic acids present in the natural source of the nucleic acid. For example, with regard to genomic DNA, the term "isolated" includes nucleic acids that are separated from the chromosome with which the genomic DNA is naturally associated. An "isolated" nucleic acid is often free of sequences that naturally flank the nucleic acid (i.e., sequences located at the 5' and/or 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of 5' and/or 3' nucleotide sequences that flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. Also included herein are nucleic acid fragments. These fragments typically are a nucleotide sequence identical to a nucleotide sequence in Table 10, a nucleotide sequence substantially identical to a nucleotide sequence in Table 10, a sequence (genomic or cDNA) corresponding to the target genes disclosed in Table 9, or a nucleotide sequence that is complementary to any of the foregoing.
The nucleic acid fragment may be identical, substantially identical or homologous to a nucleotide sequence corresponding to the genomic sequence of any target gene or target gene region of the invention, and may encode a domain, part of a domain, or a motif of a target gene polypeptide. Sometimes the fragment will comprise the polymorphic variation described herein as being associated with cancer. The nucleic acid fragment sometimes is 50, 100, or 200 or fewer base pairs in length, and is sometimes about 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3800, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 15000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000, 150000 or 160000 base pairs in length. A nucleic acid fragment that is complementary to a nucleotide sequence identical or substantially identical to a nucleotide sequence of Table 10, and that hybridizes to such a nucleotide sequence under stringent conditions, often is referred to as a "probe." Nucleic acid fragments often include one or more polymorphic sites, or sometimes have an end that is adjacent to a polymorphic site as described hereafter.

An example of a nucleic acid fragment is an oligonucleotide. As used herein, the term "oligonucleotide" refers to a nucleic acid comprising about 8 to about 50 covalently linked nucleotides, often comprising from about 8 to about 35 nucleotides, and more often from about 10 to about 25 nucleotides. The backbone and nucleotides within an oligonucleotide may be the same as those of naturally occurring nucleic acids, or analogs or derivatives of naturally occurring nucleic acids, provided that oligonucleotides having such analogs or derivatives retain the ability to hybridize specifically to a nucleic acid comprising a targeted polymorphism. Oligonucleotides described herein may be used as hybridization probes or as components of diagnostic assays, for example, as described herein.

Oligonucleotides are typically synthesized using standard methods and equipment, such as the ABI 3900 High Throughput DNA Synthesizer and the EXPEDITE™ 8909 Nucleic Acid Synthesizer, both of which are available from Applied Biosystems (Foster City, CA). Analogs and derivatives are exemplified in U.S. Pat. Nos. 4,469,863; 5,536,821; 5,541,306; 5,637,683; 5,637,684; 5,700,922; 5,717,083; 5,719,262; 5,739,308; 5,773,601; 5,886,165; 5,929,226; 5,977,296; 6,140,482; WO 00/56746; WO 01/14398, and related publications. Methods for synthesizing oligonucleotides comprising such analogs or derivatives are disclosed, for example, in the patent publications cited above and in U.S. Pat. Nos. 5,614,622; 5,739,314; 5,955,599; 5,962,674; 6,117,992; in WO 00/75372; and in related publications.

Oligonucleotides also may be linked to a second moiety. The second moiety may be an additional nucleotide sequence such as a tail sequence (e.g., a polyadenosine tail), an adapter sequence (e.g., phage M13 universal tail sequence), and others. Alternatively, the second moiety may be a non-nucleotide moiety such as a moiety which facilitates linkage to a solid support or a label to facilitate detection of the oligonucleotide. Such labels include, without limitation, a radioactive label, a fluorescent label, a mass label, a chemiluminescent label, a paramagnetic label, and the like. The second moiety may be attached to any position of the oligonucleotide, provided the oligonucleotide can hybridize to the nucleic acid comprising the polymorphism. As used herein, "polymorphism" may refer to a CpG site that undergoes a methylation change.

Uses for Nucleic Acid Sequences

Nucleic acid coding sequences described herein may be used for diagnostic purposes. Also included herein are oligonucleotide sequences such as antisense RNA, small-interfering RNA (siRNA), and DNA molecules and ribozymes that function to inhibit translation of a polypeptide. Antisense techniques and RNA interference techniques are known in the art and are described herein.

Ribozymes are enzymatic RNA molecules capable of catalyzing the specific cleavage of RNA. The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA, followed by an endonucleolytic cleavage. Ribozymes may be engineered hammerhead motif ribozyme molecules that specifically and efficiently catalyze endonucleolytic cleavage of RNA sequences corresponding to or complementary to the nucleotide sequences set forth in Table 10. Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites, which include the sequences GUA, GUU and GUC. Once identified, short RNA sequences of between fifteen (15) and twenty (20) ribonucleotides corresponding to the region of the target gene containing the cleavage site may be evaluated for predicted structural features, such as secondary structure, that may render the oligonucleotide sequence unsuitable. The suitability of candidate targets may also be evaluated by testing their accessibility to hybridization with complementary oligonucleotides, using ribonuclease protection assays. Antisense RNA and DNA molecules, siRNA and ribozymes may be prepared by any method known in the art for the synthesis of RNA molecules. These include techniques well known in the art for chemically synthesizing oligodeoxyribonucleotides, such as solid phase phosphoramidite chemical synthesis. Alternatively, RNA molecules may be generated by in vitro and in vivo transcription of DNA sequences encoding the antisense RNA molecule. Such DNA sequences may be incorporated into a wide variety of vectors that incorporate suitable RNA polymerase promoters, such as the T7 or SP6 polymerase promoters. Alternatively, antisense cDNA constructs that synthesize antisense RNA constitutively or inducibly, depending on the promoter used, can be introduced stably into cell lines.
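The site-scanning step described above can be sketched as follows. This is a simplified illustration only: it looks for the GUA, GUU and GUC triplets and extracts a 15-20 nt window around each. Centering the window on the triplet is an assumption, the function names are hypothetical, and the secondary-structure and accessibility checks mentioned in the text are not modeled.

```python
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")


def find_cleavage_sites(rna: str) -> list[int]:
    """0-based start positions of GUA, GUU and GUC triplets (T is read as U)."""
    rna = rna.upper().replace("T", "U")
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] in CLEAVAGE_TRIPLETS]


def candidate_windows(rna: str, length: int = 15) -> list[tuple[int, str]]:
    """Return (site, window) pairs: a length-nt stretch roughly centered on each site."""
    if not 15 <= length <= 20:
        raise ValueError("window length should be 15-20 nt")
    rna = rna.upper().replace("T", "U")
    if len(rna) < length:
        return []
    windows = []
    for site in find_cleavage_sites(rna):
        # Center on the middle base of the triplet, then clamp to the sequence ends.
        start = min(max(0, site + 1 - length // 2), len(rna) - length)
        windows.append((site, rna[start:start + length]))
    return windows
```

Each returned window would then go on to the structure and accessibility screening the text describes.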

PNAs can be used in prognostic, diagnostic, and therapeutic applications. As used herein, the terms "peptide nucleic acid" and "PNA" refer to a nucleic acid mimic, such as a DNA mimic, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral backbone of a PNA can allow for specific hybridization to DNA and RNA under conditions of low ionic strength. Synthesis of PNA oligomers can be performed using standard solid phase peptide synthesis protocols as described, for example, in Hyrup et al., (1996) supra and Perry-O'Keefe et al., Proc. Natl. Acad. Sci. 93: 14670-675 (1996). PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene expression by, for example, inducing transcription or translation arrest or inhibiting replication. PNA nucleic acid molecules can also be used in the analysis of single base pair mutations in a gene (e.g., by PNA-directed PCR clamping), or as "artificial restriction enzymes" when used in combination with other enzymes.

DNA encoding a polypeptide also may have a number of uses for the diagnosis of diseases, including cancer, resulting from aberrant expression of a target gene described herein. For example, the nucleic acid sequence may be used in hybridization assays of biopsies or autopsies to diagnose abnormalities of expression or function (e.g., Southern or Northern blot analysis, in situ hybridization assays).

Substantially Identical Nucleic Acids and Polypeptides

Nucleotide sequences and polypeptide sequences that are substantially identical to a target nucleotide sequence, and the target polypeptide sequences encoded by those nucleotide sequences, are included herein. The term "substantially identical" as used herein refers to two or more nucleic acids or polypeptides sharing one or more identical nucleotide sequences or polypeptide sequences, respectively. Included are nucleotide sequences or polypeptide sequences that are 55% or more, 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, or 95% or more (each often within a 1%, 2%, 3% or 4% variability) identical to the nucleotide sequences in Table 10 or the encoded target polypeptide amino acid sequences. One test for determining whether two nucleic acids are substantially identical is to determine the percent identity between the nucleotide sequences or polypeptide sequences.

Calculations of sequence identity are often performed as described below. Sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment, and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is sometimes 30% or more, 40% or more, 50% or more, often 60% or more, and more often 70% or more, 80% or more, 90% or more, or 100% of the length of the reference sequence. The nucleotides or amino acids at corresponding nucleotide or polypeptide positions, respectively, are then compared between the two sequences. When a position in the first sequence is occupied by the same nucleotide or amino acid as the corresponding position in the second sequence, the nucleotides or amino acids are deemed to be identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, introduced for optimal alignment of the two sequences. Comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. Percent identity between two amino acid or nucleotide sequences can be determined using the algorithm of Meyers & Miller, CABIOS 4: 11-17 (1989), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4. Also, percent identity between two amino acid sequences can be determined using the Needleman & Wunsch, J. Mol. Biol. 48: 444-453 (1970) algorithm, which has been incorporated into the GAP program in the GCG software package (available at the World Wide Web URL gcg.com), using either a Blosum 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. Percent identity between two nucleotide sequences can be determined using the GAP program in the GCG software package (available at the World Wide Web URL gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. A set of parameters often used is a Blosum 62 scoring matrix with a gap open penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5. Another way of determining whether two nucleic acids are substantially identical is to assess whether a polynucleotide homologous to one nucleic acid will hybridize to the other nucleic acid under stringent conditions. As used herein, the term "stringent conditions" refers to conditions for hybridization and washing. Stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6 (1989). Aqueous and non-aqueous methods are described in that reference and either can be used. An example of stringent hybridization conditions is hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 50°C. Another example of stringent hybridization conditions is hybridization in 6X SSC at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 55°C. A further example of stringent hybridization conditions is hybridization in 6X SSC at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 60°C. Often, stringent hybridization conditions are hybridization in 6X SSC at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 65°C. More often, stringency conditions are 0.5M sodium phosphate, 7% SDS at 65°C, followed by one or more washes at 0.2X SSC, 1% SDS at 65°C.
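To make the percent-identity calculation concrete, here is a small Needleman-Wunsch global alignment with a linear gap penalty, followed by percent identity computed over the alignment. It is a teaching sketch, not the ALIGN or GCG GAP programs: the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, there is no substitution matrix or affine gap model, and identity is divided by the full alignment length, which is only one of several conventions.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str, **scoring: int) -> float:
    """Identical aligned positions / alignment length (gap columns included) * 100."""
    aln_a, aln_b = global_align(a.upper(), b.upper(), **scoring)
    if not aln_a:
        raise ValueError("cannot compute identity of two empty sequences")
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a)
```

For example, `percent_identity("ACGT", "AGT")` aligns the sequences as `ACGT` / `A-GT` and returns 75.0 under these scores; other gap penalties or denominators would give different values.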

An example of a nucleotide sequence substantially identical to a target nucleotide sequence is one that has a different nucleotide sequence but still encodes the same polypeptide sequence encoded by the target nucleotide sequence. Another example is a nucleotide sequence that encodes a polypeptide having a polypeptide sequence that is 70% or more identical, sometimes 75% or more, 80% or more, or 85% or more identical, and often 90% or more or 95% or more identical, to a polypeptide sequence encoded by a target nucleotide sequence.

Target nucleotide sequences and target amino acid sequences can be used as "query sequences" to perform a search against public databases to identify other family members or related sequences, for example. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul et al., J. Mol. Biol. 215: 403-10 (1990). BLAST nucleotide searches can be performed with the NBLAST program, score = 100, wordlength = 12 to obtain nucleotide sequences homologous to nucleotide sequences from SEQ ID NO: 1. BLAST polypeptide searches can be performed with the XBLAST program, score = 50, wordlength = 3 to obtain amino acid sequences homologous to polypeptides encoded by a target nucleotide sequence. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., Nucleic Acids Res. 25(17): 3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used (see the World Wide Web URL ncbi.nlm.nih.gov).
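BLAST begins by locating short word matches between query and subject, which is what the wordlength parameter above controls. The sketch below shows only a simplified seeding step for nucleotide sequences (exact shared words), with hypothetical function names; real BLAST then extends and scores the seeds (and, for proteins, uses scored neighborhood words rather than exact matches), none of which is reproduced here.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, wordlength: int = 12) -> list[tuple[int, int]]:
    """Exact shared words of length `wordlength` as (query_pos, subject_pos) pairs."""
    query, subject = query.upper(), subject.upper()
    # Index every word in the subject by its start positions.
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(subject) - wordlength + 1):
        index[subject[j:j + wordlength]].append(j)
    hits = []
    for i in range(len(query) - wordlength + 1):
        # .get avoids inserting empty lists for words absent from the subject
        for j in index.get(query[i:i + wordlength], []):
            hits.append((i, j))
    return hits
```

A larger word length (such as 12 for nucleotides) yields fewer, more specific seeds; a smaller one is more sensitive but slower to extend.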

Substantially identical nucleotide and polypeptide sequences include those that are naturally occurring, such as allelic variants (same locus), splice variants, homologs (different locus), and orthologs (different organism), or can be non-naturally occurring. Non-naturally occurring variants can be generated by mutagenesis techniques, including those applied to polynucleotides, cells, or organisms. The variants can contain nucleotide substitutions, deletions, inversions and insertions. Variation can occur in either or both the coding and non-coding regions. The variations can produce both conservative and non-conservative amino acid substitutions (as compared in the encoded product). Orthologs, homologs, allelic variants, and splice variants can be identified using methods known in the art. These variants normally comprise a nucleotide sequence encoding a polypeptide that is 50% or more, about 55% or more, often about 70-75% or more, more often about 80-85% or more, and typically about 90-95% or more identical to the amino acid sequences of target polypeptides or a fragment thereof. Such nucleic acid molecules readily can be identified as being able to hybridize under stringent conditions to a target nucleotide sequence or a fragment thereof. Nucleic acid molecules corresponding to orthologs, homologs, and allelic variants of a target nucleotide sequence are also included herein.

Also, substantially identical nucleotide sequences may include codons that are altered with respect to the naturally occurring sequence to enhance expression of a target polypeptide in a particular expression system. For example, the nucleic acid can be one in which one or more codons are altered, and often 10% or more or 20% or more of the codons are altered, for optimized expression in bacteria (e.g., E. coli), yeast (e.g., S. cerevisiae), human (e.g., 293 cells), insect, or rodent (e.g., hamster) cells.
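Codon alteration of this kind, where the nucleotide sequence changes but the encoded polypeptide does not, can be checked with a short sketch: translate both sequences with the standard genetic code, confirm they encode the same polypeptide, and report the fraction of codons that differ. The function names are hypothetical, and no host-specific codon-usage table is included.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence with the standard code."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def fraction_codons_altered(original: str, recoded: str) -> float:
    """Fraction of codons changed, requiring that both encode the same polypeptide."""
    if len(original) != len(recoded) or len(original) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    if translate(original) != translate(recoded):
        raise ValueError("recoded sequence is not synonymous with the original")
    n_codons = len(original) // 3
    altered = sum(1 for i in range(0, len(original), 3)
                  if original[i:i + 3].upper() != recoded[i:i + 3].upper())
    return altered / n_codons
```

For example, `ATGCTGAAA` and `ATGTTAAAG` both encode `MLK`, and two of their three codons differ.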

Samples

The methods described herein can be applied to samples that contain nucleic acids, preferably a nucleic acid target gene region of interest, from any of a variety of sources, for any of a variety of purposes. Typically, the methods described herein are used to determine information regarding a subject, or to determine a relationship between nucleic acid methylation and disease. The samples used in the methods described herein will be selected according to the purpose of the method to be applied. For example, samples can contain nucleic acid from a plurality of different organisms when a phenotype of the organisms is to be correlated with the presence or absence of a methylated nucleic acid molecule or nucleotide locus. In another example, samples can contain nucleic acid from one individual, where the sample is examined to determine the disease state or tendency toward disease of the individual. One skilled in the art can use the methods described herein to determine the desired sample to be examined.

A sample may be from any subject, including, for example, an animal, plant, bacterium, fungus, virus or parasite. Animals may include, for example, mammals, birds, reptiles, amphibians or fish. Preferably, mammalian subjects are humans. A sample from a subject can be in any form that provides a desired nucleic acid to be analyzed, including a solid material such as a tissue, cells, a cell pellet, a cell extract, feces, or a biopsy, or a biological fluid such as urine, whole blood, serum, plasma, interstitial fluid, peritoneal fluid, lymph fluids, ascites, sweat, saliva, follicular fluid, breast milk, non-milk breast secretions, cerebral spinal fluid, seminal fluid, lung sputum, amniotic fluid, exudate from a region of infection or inflammation, a mouth wash containing buccal cells, synovial fluid, or any other fluid sample produced by the subject. In addition, the sample can be collected from tissues, including bone marrow, epithelium, stomach, prostate, kidney, bladder, breast, colon, lung, pancreas, endometrium, neuron, and muscle. Samples can include tissues, organs, and pathological samples such as a formalin-fixed sample embedded in paraffin.

As one of skill in the art will recognize, some samples may be used directly in the methods provided herein. For example, samples can be examined using the methods described herein without any purification or manipulation steps to increase the purity of desired cells or nucleic acid molecules. If desired, a sample may be prepared using known techniques, such as that described by Maniatis, et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y., pp. 280-281 (1982)). For example, samples examined using the methods described herein can be treated in one or more purification steps in order to increase the purity of the desired cells or nucleic acid in the sample. If desired, solid materials may be mixed with a fluid. Methods for isolating nucleic acid in a sample from essentially any organism or tissue or organ in the body, as well as from cultured cells, are well known. For example, the sample can be treated to homogenize an organ, tissue or cell sample, and the cells may be lysed using known lysis buffers, sonication, electroporation and combinations thereof. Further purification can be performed as needed, as will be appreciated by those skilled in the art. In addition, sample preparation may include a variety of reagents, which can be included in subsequent steps. These include reagents such as salts, buffers, neutral proteins (e.g., albumin), detergents and similar reagents, which can be used to facilitate optimal hybridization or enzymatic reactions and/or reduce non-specific or background interactions. Also, reagents that otherwise improve the efficiency of the assay, such as protease inhibitors, nuclease inhibitors and anti-microbial agents, can be used, depending on the sample preparation methods and purity of the nucleic acid target gene molecule.

Nucleic Acid Target Gene Molecule

The methods provided herein are used to determine methylation states, including whether a nucleic acid target gene molecule contains a methylated or unmethylated nucleotide and determination of methylation ratios (methylated versus unmethylated) for one or more methylation sites or groups of methylation sites. Thus, nucleic acid target gene molecules used in the methods provided herein include any nucleic acid molecule. One or more methods provided herein may be practiced to provide information regarding methylated nucleotides in the nucleic acid target gene molecule.

The methods provided herein permit any nucleic acid-containing sample or specimen, in purified or non-purified form, to be used. Thus, the process may employ for example, DNA or RNA, including messenger RNA, wherein DNA or RNA can be single stranded or double stranded.

The specific nucleic acid sequence to be examined (i.e., the nucleic acid target gene molecule) may be a fraction of a larger molecule or may be present initially as a discrete molecule, so that the specific nucleic acid target gene molecule constitutes the entire nucleic acid component of a sample. It is not necessary that the nucleic acid target gene molecule to be examined be present initially in a pure form; it may be a minor fraction of a complex mixture, such as that contained in whole organism DNA. The nucleic acid target gene molecule for which methylation status is to be determined may be an isolated molecule or part of a mixture of nucleic acid molecules.

The nucleic acid target gene molecule to be analyzed may include one or more protein-encoding regions of genomic DNA or a portion thereof. The nucleic acid target gene molecule can contain one or more gene promoter regions, one or more CpG islands, one or more sequences related to chromatin structure, or other regions of cellular nucleic acid. The nucleic acid target gene molecule can be methylated or unmethylated at individual nucleotides, such as cytosines; at small groups of nucleotides, such as cytosine-rich sequences, or at one or more CpG islands.

The length of the nucleic acid target gene molecule that may be used in the current methods may vary according to the sequence of the nucleic acid target gene molecule, the particular methods used for methylation identification, and the particular methylation state identification desired, but will typically be limited to a length at which fragmentation and detection methods disclosed herein can be used to identify the methylation state of one or more nucleotide loci of the nucleic acid target gene molecule.

In one embodiment, the nucleic acid target gene molecule is of a length in which the methylation state of two or more nucleotide loci can be identified. For example, a nucleic acid target gene molecule may be at least about 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 225, 250, 275, 300, 350, 400, 450, 500, 550, 600, 700, 800, 900, 1000, 1200, 1400, 1600, 1800, 2000, 2500 or 3000 bases in length. Typically, a nucleic acid target gene molecule will be no longer than about 10,000, 5000, 4000, 3000, 2500, 2000, 1500, 1000, 900, 800, 700, 600, 500, 450, 400, 350, 280, 260, 240, 220, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110 or 100 bases in length.

A nucleic acid target gene molecule examined using the methods disclosed herein may contain one or more methylated nucleotides, but is not required to contain any methylated nucleotides. The methods disclosed herein may be used to identify whether or not a nucleic acid target gene molecule contains methylated or unmethylated nucleotides, to identify the nucleotide locus of a methylated or unmethylated nucleotide in the nucleic acid target gene molecule, and to determine the ratio of methylated versus unmethylated nucleotides at one or more methylation sites. One nucleotide that has been identified as methylated in genomic DNA is cytosine. Methylated cytosines can be present in any of a variety of regions of genomic DNA, and the methods provided herein may be used to determine the methylation state of a cytosine in any of these regions. For example, methylcytosine is commonly found in cytosine-guanine dinucleotides termed "CpG" dinucleotides. In one embodiment, the methylation state of a cytosine nucleotide in one or more CpG dinucleotides in the nucleic acid target gene molecule is identified. Such dinucleotides are enriched in some regions of the genome, and these enriched regions are termed CpG islands. CpG islands may be found near promoter regions for some genes, including promoter regions for tumor suppressor genes, oncogenes, developmental regulatory genes, and housekeeping genes. Thus, the methods disclosed herein can be used to identify whether a cytosine in a CpG dinucleotide in a nucleic acid target gene molecule is methylated where the CpG dinucleotide is located in a gene promoter region, such as a tumor suppressor gene, oncogene, developmental regulatory gene, or housekeeping gene promoter region. The methods disclosed herein also may be used to identify whether one or more cytosines in a CpG island in a nucleic acid target gene molecule are methylated.
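Because CpG sites in this document are numbered from the 5' end of the analyzed amplicon, a helper that lists CpG dinucleotides in that order can be useful when interpreting per-site results. The sketch below is a hypothetical helper, not part of the described methods; it scans one strand only and reports the 0-based position of each CpG cytosine alongside its 1-based CpG number.

```python
def numbered_cpg_sites(seq: str) -> list[tuple[int, int]]:
    """Return (CpG number counted from the 5' end, 0-based position of the C)."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    return [(n, pos) for n, pos in enumerate(positions, start=1)]


if __name__ == "__main__":
    for number, pos in numbered_cpg_sites("ACGTTCGACG"):
        print(f"CpG {number} at position {pos}")
```

Only the given strand is scanned; the CpG on the complementary strand sits at the adjacent position, since CpG is palindromic.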

The methods provided herein may be used to identify the methylation of a plurality of nucleotide loci. Accordingly, methylation of one or more, up to all, nucleotide loci of a large nucleic acid target gene region may be identified using the methods provided herein. For example, the methylation state of a plurality of nucleotide loci, up to all nucleotide loci of an entire CpG island may be identified using the methods provided herein.

Nucleic acid molecules can contain nucleotides with modifications, such as methylation, that do not change the nucleotide sequence of the nucleic acid molecule. Amplification of a nucleic acid molecule containing such a modified nucleotide can result in an amplified product complementary to the unmodified nucleotide, so that the amplified product does not contain the information regarding the nucleotide modification. For example, amplification of a nucleic acid molecule containing a methylated cytosine yields an amplified product containing either an unmodified guanine (in the complementary strand) or an unmodified cytosine at the location of the methylated cytosine. Reagents are known that can modify the nucleotide sequence of a nucleic acid target gene molecule according to the presence or absence of modifications in one or more nucleotides, where the modification itself does not change the nucleotide sequence. For example, bisulfite may be used in a process to convert unmethylated cytosine into uracil, thus modifying the nucleotide sequence of a nucleic acid target gene molecule according to the presence of unmethylated cytosines in the nucleic acid target gene molecule. In performing the methods disclosed herein, the nucleic acid target gene molecule is treated with a reagent that can modify the nucleic acid target gene molecule as a function of its methylation state. The treated nucleic acid target gene molecule can have a resulting sequence that reflects the methylation state of the untreated nucleic acid target gene molecule. In one embodiment, the reagent can be used to modify an unmethylated selected nucleotide to produce a different nucleotide. For example, the reagent may be used to modify unmethylated cytosine to produce uracil.

Reagents for Sequence Modification

A method for determining the methylation state of a nucleic acid molecule or nucleotide locus includes contacting a nucleic acid target gene molecule-containing sample with a reagent that can modify the nucleic acid target gene molecule nucleotide sequence as a function of its methylation state. A variety of reagents for modifying the nucleotide sequence of nucleic acid molecules are known in the art and can be used in conjunction with the methods provided herein. For example, a nucleic acid target gene molecule can be contacted with a reagent that modifies unmethylated bases but not methylated bases, such as unmethylated cytosines but not methylated cytosines, in such a manner that the nucleotide sequence of the nucleic acid target gene molecule is modified at the location of an unmethylated base but not at the location of the methylated base, such as at the location of an unmethylated cytosine but not at the location of a methylated cytosine. An exemplary reagent that modifies unmethylated bases but not methylated bases is sodium bisulfite, which modifies unmethylated cytosines but not methylated cytosines. Methods for modifying a nucleic acid target gene molecule in a manner that reflects the methylation pattern of the nucleic acid target gene molecule are known in the art, as exemplified in U.S. Pat. No. 5,786,146 and U.S. patent publications 20030180779 and 20030082600.

In one embodiment, the reagent can be used to modify unmethylated cytosine to uracil. An exemplary reagent used for modifying unmethylated cytosine to uracil is sodium bisulfite. Sodium bisulfite (NaHSO3) reacts with the 5,6-double bond of cytosine to form a sulfonated cytosine reaction intermediate that is susceptible to deamination, giving rise to a sulfonated uracil. The sulfonate group of the sulfonated uracil can be removed under alkaline conditions, resulting in the formation of uracil. Uracil is recognized as thymine by DNA polymerase enzymes such as Taq polymerase; therefore, upon amplification of the nucleic acid target gene molecule using methods such as PCR, the resultant amplified nucleic acid target gene molecule contains thymine at positions where unmethylated cytosine occurs in the starting template nucleic acid target gene molecule, and the complementary strand contains adenine at positions complementary to positions where unmethylated cytosine occurs in the starting nucleic acid target gene molecule. Further, amplification methods such as PCR can yield an amplified nucleic acid target gene molecule containing cytosine where the starting nucleic acid target gene molecule contains 5-methylcytosine, and the complementary strand maintains guanine at positions complementary to positions where methylated cytosine occurs in the starting nucleic acid target gene molecule. Thus, in amplification methods such as PCR, cytosine in the amplified product can mark the location of 5-methylcytosine, and thymine in the amplified product can mark the location of unmethylated cytosine. Similarly, in the amplified product strands complementary to the treated nucleic acid target gene molecule, guanine can mark the location of 5-methylcytosine and adenine can mark the location of unmethylated cytosine.
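
The readout rule in the preceding paragraph (C in the amplified product marks 5-methylcytosine, T marks unmethylated cytosine) can be expressed as a short Python sketch. It assumes the amplified sequence is already aligned base-for-base with the untreated reference on the same strand, with no insertions or deletions; the function name `call_methylation` and its output format are hypothetical.

```python
def call_methylation(reference: str, amplicon: str) -> dict[int, str]:
    """Call each reference cytosine from a PCR product of the bisulfite-treated strand.

    Assumes `amplicon` is aligned base-for-base with `reference` (same strand,
    same length, no indels). In the product, C marks 5-methylcytosine and
    T marks an unmethylated cytosine (uracil copied as thymine).
    """
    if len(reference) != len(amplicon):
        raise ValueError("sequences must be aligned and of equal length")
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), amplicon.upper())):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = "methylated"
        elif obs == "T":
            calls[i] = "unmethylated"
        else:
            calls[i] = "undetermined"  # unexpected base, e.g., a sequencing error
    return calls


print(call_methylation("ACGTCCGA", "ACGTTTGA"))
# {1: 'methylated', 4: 'unmethylated', 5: 'unmethylated'}
```

A read from the complementary strand, where G and A carry the same information, could first be reverse-complemented so that the same rule applies.
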
Exemplary methods for bisulfite treatment of target DNA can include contacting denatured DNA with a bisulfite solution that also may contain urea and hydroquinone, and incubating the mix for 30 seconds at 95°C and 15 minutes at 55°C, for 20 cycles. In one alternative method, the bisulfite treatment may be performed in agarose, and precipitation steps may be replaced with dialysis steps (U.S. Pat. No. 6,214,556 and Olek et al., Nucl. Acids Res. 24:5064-66 (1996)). Variations of bisulfite treatment of a nucleic acid target gene molecule are known in the art, as exemplified in U.S. Pat. Nos. 5,786,146 and 6,214,556, U.S. patent publication 20030082600, Tost et al., Nucl. Acids Res. 31:e50 (2003), Olek et al., Nucl. Acids Res. 24:5064-66 (1996), and Grunau et al., Nucl. Acids Res. 29:e65 (2001).

In the methods provided herein, a methylation-specific reagent-treated nucleic acid target gene molecule can have a different nucleotide sequence compared to the nucleotide sequence of the nucleic acid target gene molecule prior to treatment. Since the methylation-specific reagent modifies the nucleotide sequence of a nucleic acid target gene molecule as a function of the methylation state of the nucleic acid target gene molecule, the treated nucleic acid target gene molecule will have a nucleotide sequence related to the nucleotide sequence of the untreated nucleic acid target gene molecule, which reflects the methylation state of the untreated nucleic acid target gene molecule.

Amplification of Treated Nucleic Acid Target Gene Molecule

The methods provided herein also may include a step of amplifying the treated nucleic acid target gene molecule using one or more primers. In one embodiment, at least one primer is a methylation specific primer. In another embodiment, the primer contains one or more nucleotides complementary to the nucleotide treated using the methylation-specific reagent. For example, bisulfite is cytosine specific; when bisulfite is used, a primer used in a method of identifying methylated nucleotides can contain one or more guanine nucleotides. The amplification methods can serve to selectively amplify nucleic acid target gene molecules complementary to the primers while not amplifying one or more other nucleic acid molecules in a nucleic acid sample.

Methylation-specific primers, which are also referred to herein as methylation state specific primers, are designed to distinguish between nucleotide sequences of treated nucleic acid target gene molecules based on the methylation state of one or more nucleotides in the untreated nucleic acid target gene molecule. For example, methylation specific primers may be designed to hybridize to a nucleotide sequence of a reagent-treated nucleic acid target gene molecule arising from a nucleic acid target gene molecule that contained methylated nucleotides in preference to hybridizing to a nucleotide sequence of a reagent-treated nucleic acid target gene molecule arising from a nucleic acid target gene molecule that contained unmethylated nucleotides. Correspondingly, methylation specific primers may be designed to hybridize to a nucleotide sequence of a reagent-treated nucleic acid target gene molecule arising from a nucleic acid target gene molecule that contained unmethylated nucleotides in preference to hybridizing to a nucleotide sequence of a reagent-treated nucleic acid target gene molecule arising from a nucleic acid target gene molecule that contained methylated nucleotides.
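
The hybridization preference described above can be approximated, for illustration only, by counting base-pairing mismatches between a primer and each possible treated template. This Python sketch assumes that the methylated cytosines are those in CpG dinucleotides and uses a simple mismatch count as a crude stand-in for hybridization preference; practical primer design also weighs mismatch position (especially at the primer 3' end) and melting temperature. The binding region and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Unmethylated C -> U; C at a 0-based index in `methylated` stays C."""
    return "".join(
        "U" if b == "C" and i not in methylated else b for i, b in enumerate(seq.upper())
    )


def cpg_cytosines(seq: str) -> set[int]:
    """0-based indices of cytosines that are followed by G (CpG sites)."""
    s = seq.upper()
    return {i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"}


def reverse_complement(seq: str) -> str:
    """DNA reverse complement; U in the template pairs with A, as T does."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def mismatches(primer: str, site: str) -> int:
    """Count bases of `primer` (5'->3') that cannot pair with template `site` (5'->3')."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    # The primer anneals antiparallel: its reverse complement should equal the site.
    expected = reverse_complement(primer)
    site_dna = site.upper().replace("U", "T")
    return sum(e != s for e, s in zip(expected, site_dna))


site = "TCGACCGTCA"  # hypothetical untreated primer-binding region
meth_template = bisulfite_convert(site, cpg_cytosines(site))  # CpG cytosines methylated
unmeth_template = bisulfite_convert(site, set())               # no cytosines methylated
m_primer = reverse_complement(meth_template)                   # matches the methylated form

print(m_primer, mismatches(m_primer, meth_template), mismatches(m_primer, unmeth_template))
# TAACGATCGA 0 2
```

The primer pairs perfectly with the template derived from the methylated molecule and carries two mismatches against the template derived from the unmethylated molecule, one per CpG in the binding region.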

The primers used for amplification of the treated nucleic acid target gene molecule in the sample can hybridize to the treated nucleic acid target gene molecule under conditions in which a nucleotide synthesis reaction, such as PCR, can occur. Typically, two or more nucleotide synthesis reaction cycles are performed to produce sufficient quantities of nucleic acid target gene molecule for subsequent steps, including fragmentation and detection. In methods of selectively amplifying a nucleic acid target gene molecule using a methylation specific primer, at least one primer used in the amplification method will be methylation specific. In other embodiments, the primers used in the amplification method are not methylation specific.

Primers used in the methods disclosed herein are of sufficient length and appropriate sequence to permit specific primer extension using a nucleic acid target gene molecule template. The primers are typically designed to be complementary to each strand of the nucleic acid target gene molecule to be amplified. The primer can be an oligodeoxyribonucleotide, an oligoribonucleotide, or an oligonucleotide containing both deoxyribonucleotides and ribonucleotides. In some embodiments, a primer can contain one or more nucleotide analogs. The length of a primer can vary, depending on any of a variety of factors, including temperature, buffer, desired selectivity and nucleotide composition. The primer can contain at least about 5, 8, 10, 15, 20, 25, 30, 40, 50, 60, 70 or 80 nucleotides, and typically contains no more than about 120, 110, 100, 90, 70, 60, 50, 40, 30, 20 or 10 nucleotides.
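
The dependence of primer length on temperature and nucleotide composition can be made concrete with a quick screen. This Python sketch swaps in the Wallace rule (Tm of about 2°C per A/T plus 4°C per G/C), a rough rule of thumb for short oligonucleotides that is not taken from this document; the length and Tm bounds in `acceptable` are hypothetical placeholders, not recommended values.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    s = primer.upper()
    return (s.count("G") + s.count("C")) / len(s)


def acceptable(primer: str, min_len: int = 15, max_len: int = 30,
               tm_range: tuple[int, int] = (52, 64)) -> bool:
    """Hypothetical screen: length and estimated Tm must both fall inside chosen bounds."""
    tm = wallace_tm(primer)
    return min_len <= len(primer) <= max_len and tm_range[0] <= tm <= tm_range[1]


candidate = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
print(len(candidate), wallace_tm(candidate), gc_fraction(candidate), acceptable(candidate))
# 20 60 0.5 True
```
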

The oligonucleotide primers used herein can be prepared using any suitable method, such as conventional phosphotriester and phosphodiester methods or automated embodiments thereof. In one such automated embodiment, diethylphosphoramidites are used as starting materials and can be synthesized as described by Beaucage et al., Tetrahedron Letters 22:1859-1862 (1981). Methods for synthesizing oligonucleotides on a solid support are known in the art, as exemplified in U.S. Pat. No. 4,458,066.

A primer used in accordance with the disclosed amplification and nucleic acid synthesis methods can specifically hybridize to a nucleic acid target gene molecule.

In methods provided herein, the nucleotide sequence of a nucleic acid target gene molecule can be modified as a function of the methylation state of the nucleic acid target gene molecule. Accordingly, the primer binding region of a methylation-specific reagent-treated nucleic acid target gene molecule that corresponds to a region of an untreated nucleic acid target gene molecule can have a nucleotide sequence that reflects the methylation state of that region in the untreated nucleic acid target gene molecule. For example, a region of an untreated nucleic acid target gene molecule that contains a methylcytosine at the 4th nucleotide and an unmethylated cytosine at the 7th nucleotide can be treated with bisulfite, which will convert the cytosine at the 7th nucleotide to uracil without changing the methylcytosine at the 4th nucleotide. Thus, a primer binding region of the treated nucleic acid target gene molecule that corresponds to that region of the untreated nucleic acid target gene molecule will contain a cytosine at the 4th nucleotide and a uracil (or thymine) at the 7th nucleotide, and a primer complementary to such a primer binding region will contain a guanine at the locus complementary to the 4th nucleotide and an adenine at the locus complementary to the 7th nucleotide.
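
The 4th/7th nucleotide example can be worked through with a short Python sketch. The 10-nucleotide region below is hypothetical, chosen only so that its 4th and 7th nucleotides are cytosines; positions are 1-based to match the text, and the primer is first written 3'->5' so that each primer base sits directly under the template base it pairs with.

```python
PAIR = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def treat(region: str, methylated_1based: set[int]) -> str:
    """Bisulfite-convert `region`: unmethylated C -> U; listed 1-based positions stay C."""
    return "".join(
        "U" if b == "C" and pos not in methylated_1based else b
        for pos, b in enumerate(region.upper(), start=1)
    )


region = "AGTCGTCGAT"            # hypothetical region: C at the 4th and 7th nucleotides
treated = treat(region, {4})     # only the 4th nucleotide is 5-methylcytosine
# Primer bases written 3'->5' so each sits directly under the template base it pairs with.
primer_3to5 = "".join(PAIR[b] for b in treated)
primer_5to3 = primer_3to5[::-1]  # conventional 5'->3' orientation

print(treated)                          # AGTCGTUGAT
print(primer_3to5[3], primer_3to5[6])   # G A  (partners of the 4th and 7th nucleotides)
print(primer_5to3)                      # ATCAACGACT
```
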

The methylation specific primers may be used in methods to specifically amplify nucleic acid target gene molecules according to the methylation state of the nucleic acid target gene molecule, and to thereby selectively increase the amount of nucleic acid target gene in a sample. Methylation state specific amplification methods include one or more nucleic acid synthesis steps, using one or more methylation specific primers.

In accordance with the methods disclosed herein, a nucleic acid target gene sequence can serve as a template for one or more steps of nucleic acid synthesis. The nucleic acid synthesis step or steps can include primer extension, DNA replication, polymerase chain reaction (PCR), reverse transcription, reverse transcription polymerase chain reaction (RT-PCR), rolling circle amplification, whole genome amplification, strand displacement amplification (SDA), and transcription based reactions.

In one embodiment, an amplification step can be performed that amplifies one or more nucleic acids without distinguishing between methylated and unmethylated nucleic acid molecules or loci. Such an amplification step can be performed, for example, when the amount of nucleic acid in a sample is very low and detection of methylated nucleic acid target gene molecules can be improved by a preliminary amplification step that does not distinguish methylated nucleic acid target gene molecules from unmethylated nucleic acid target gene molecules or other nucleic acids in the sample. Typically, such an amplification step is performed subsequent to treating the nucleic acid sample with a reagent that modifies the nucleotide sequence of nucleic acid molecules as a function of the methylation state of the nucleic acid molecules. Although this method does not distinguish according to methylation state, the primers used in such an amplification step nevertheless may be used to increase the amount of nucleic acid molecules of a particular nucleic acid target gene region to be examined relative to the total amount of nucleic acid in a sample. For example, primers can be designed to hybridize to a pre-determined region of a nucleic acid target gene molecule in order to increase the relative amount of that nucleic acid target gene molecule in the sample, but without amplifying the nucleic acid target gene molecule according to its methylation state. One skilled in the art may select the primers used in such a preamplification, or amplification, step according to various known factors, including the desired selectivity of the amplification step and any known nucleotide sequence information.
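
One way to obtain primers that do not distinguish methylation state, offered here only as an illustrative assumption rather than as the method of this document, is to place them where the reference contains no CpG. If methylation is assumed to occur only at CpG cytosines, the treated sequence in such a window is identical whatever the methylation state. The Python sketch below finds such windows; the sequence and window length are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Unmethylated C -> U; C at a 0-based index in `methylated` stays C."""
    return "".join(
        "U" if b == "C" and i not in methylated else b for i, b in enumerate(seq.upper())
    )


def cpg_cytosines(seq: str) -> set[int]:
    """0-based indices of cytosines that are followed by G (CpG sites)."""
    s = seq.upper()
    return {i for i in range(len(s) - 1) if s[i : i + 2] == "CG"}


def cpg_free_windows(seq: str, k: int) -> list[int]:
    """Start indices of length-k windows containing no CpG cytosine.

    The base after the window is included in the check, because a C at the
    window's last position is still a CpG cytosine if a G follows it.
    """
    s = seq.upper()
    return [i for i in range(len(s) - k + 1) if "CG" not in s[i : i + k + 1]]


seq = "ACGTTTCAATTGCAT"  # hypothetical untreated reference
k = 5
windows = cpg_free_windows(seq, k)
fully_meth = bisulfite_convert(seq, cpg_cytosines(seq))
unmeth = bisulfite_convert(seq, set())
# Within every CpG-free window the treated sequence is the same for both states.
assert all(fully_meth[i : i + k] == unmeth[i : i + k] for i in windows)
print(windows)  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
```
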

In the methods of nucleic acid synthesis using a double-stranded nucleic acid molecule, the strands are first separated before any nucleic acid synthetic steps. Following strand separation, one or more primers can be hybridized to one or more treated single-stranded nucleic acid molecules to be amplified, and nucleotide synthesis can be performed to add nucleotides to each primer to form a strand complementary to the strand of the nucleic acid target gene molecule. In one embodiment, nucleic acid synthesis can be performed to selectively amplify one of two strands of a treated nucleic acid target gene molecule. In another embodiment, the step of synthesizing a strand complementary to each strand of a double-stranded treated nucleic acid target gene molecule is performed in the presence of two or more primers, such that at least one primer can hybridize to each strand and prime additional nucleotide synthesis. In the methods of nucleic acid synthesis using a single-stranded nucleic acid molecule, a primer can be hybridized to the single-stranded nucleic acid molecule to be amplified, and nucleotide synthesis may be performed to add nucleotides to the primer to form a strand complementary to the single-stranded nucleic acid molecule. In one embodiment, the step of synthesizing a strand complementary to a single-stranded nucleic acid molecule is performed in the presence of two or more primers, such that one primer can hybridize to the nucleotide sequence of the strand of the nucleic acid target gene molecule, and one primer can hybridize to the synthesized complementary strand and prime additional nucleotide synthesis. For example, after synthesis of the complementary strand, PCR amplification of the nucleic acid molecule can be immediately performed without further manipulation of the sample.
In another embodiment, the step of synthesizing a strand complementary to a single-stranded nucleic acid molecule is performed separately from additional nucleotide synthetic reactions. For example, the complementary strand can be synthesized to form a double-stranded nucleic acid molecule, and the sample may be subjected to one or more intermediate steps prior to amplifying the double-stranded nucleic acid molecule. Intermediate steps may include any of a variety of methods of manipulating a nucleic acid sample, including increasing the purity of the nucleic acid molecule, removing excess primers, changing the reaction conditions (e.g., the buffer conditions, enzyme or reactants present in the sample), and other parameters. In one example, the sample may be subjected to one or more purification steps of the nucleic acid molecule. For example, the primer used to create the strand complementary to the nucleic acid molecule can contain a moiety at its 5' end that permits identification or isolation of the primer or of a nucleic acid into which the primer is incorporated. Such a moiety may be, for example, a bindable moiety such as biotin, polyhistidine, magnetic bead, or other suitable substrate, whereby contacting the sample with the binding partner of the bindable moiety may result in selective binding of nucleic acid molecules into which the primer has been incorporated. Such selective binding may be used to separate the nucleic acid molecule from sample impurities, thereby increasing the purity of the nucleic acid molecule. After performing one or more intermediate steps, such as purity enhancing steps, the nucleic acid molecule may be amplified according to the methods provided herein and as known in the art.

After formation of the strand complementary to the single-stranded nucleic acid target gene molecules, subsequent nucleic acid target gene molecule amplification steps may be performed in which the complementary strands are separated, primers are hybridized to the strands, and nucleotides are added to the primers to form new complementary strands. Strand separation may be effected either as a separate step or simultaneously with the synthesis of the primer extension products. This strand separation may be accomplished using various suitable denaturing conditions, including physical, chemical, or enzymatic means; the term "denaturing" includes all such means. One physical method of separating nucleic acid strands involves heating the nucleic acid target gene molecule until it is denatured. Typical heat denaturation may involve temperatures ranging from about 80°C to 105°C, for times ranging from about 1 to 10 minutes. Strand separation also may be accomplished by chemical means, including high salt conditions or strongly basic conditions. Strand separation also may be induced by an enzyme from the class of enzymes known as helicases, or by the enzyme RecA, which has helicase activity and, in the presence of riboATP, is known to denature DNA. The reaction conditions suitable for strand separation of nucleic acids with helicases are described by Kuhn Hoffmann-Berling, CSH-Quantitative Biology, 43:63 (1978), and techniques for using RecA are reviewed in C. Radding, Ann. Rev. Genetics 16:405-437 (1982).

After each amplification step, the amplified product will be double stranded, with each strand complementary to the other. The complementary strands may be separated, and both separated strands may be used as templates for the synthesis of additional nucleic acid strands. This synthesis may be performed under conditions allowing hybridization of primers to templates to occur. Generally, synthesis occurs in a buffered aqueous solution, typically at about pH 7-9, such as about pH 8. Typically, a molar excess of two oligonucleotide primers can be added to the buffer containing the separated template strands. In some embodiments, the amount of target nucleic acid is not known (for example, when the methods disclosed herein are used for diagnostic applications), so that the amount of primer relative to the amount of complementary strand cannot be determined with certainty.
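
Because both strands of each product can serve as templates, the amount of product can at most double per cycle. The Python sketch below uses the standard idealized model N = N0 x (1 + E)^n, where E is a per-cycle efficiency between 0 and 1; this model is an assumption added for illustration, and it ignores the plateau phase, primer depletion and efficiency drift.

```python
import math


def copies_after(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number: each cycle multiplies the template by (1 + efficiency)."""
    return n0 * (1.0 + efficiency) ** cycles


def cycles_to_reach(n0: float, target: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles for which copies_after(...) >= target."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    if target <= n0:
        return 0
    exact = math.log(target / n0) / math.log(1.0 + efficiency)
    # Small tolerance guards against floating-point round-up at exact powers.
    return math.ceil(exact - 1e-9)


print(copies_after(100, 30))            # 107374182400.0 (perfect doubling)
print(cycles_to_reach(100, 1e9))        # 24
print(cycles_to_reach(100, 1e9, 0.9))   # 26
```
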

In an exemplary method, deoxyribonucleoside triphosphates dATP, dCTP, dGTP, and dTTP can be added to the synthesis mixture, either separately or together with the primers, and the resulting solution can be heated to about 90°C-100°C for about 1 to 10 minutes, typically for 1 to 4 minutes. After this heating period, the solution can be allowed to cool to about room temperature. To the cooled mixture can be added an appropriate enzyme for effecting the primer extension reaction (called herein "enzyme for polymerization"), and the reaction can be allowed to occur under conditions known in the art. This synthesis (or amplification) reaction can occur at temperatures from room temperature up to the temperature above which the enzyme for polymerization no longer functions; the enzyme for polymerization may be used at temperatures greater than room temperature if the enzyme is heat stable. In one embodiment, the method of amplifying is by PCR, as described herein and as is commonly used by those of skill in the art. Alternative methods of amplification have been described and also may be employed. A variety of suitable enzymes for this purpose are known in the art and include, for example, E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, other available DNA polymerases, polymerase muteins, reverse transcriptase, and other enzymes, including thermostable enzymes (i.e., those enzymes that perform primer extension at elevated temperatures, typically temperatures that cause denaturation of the nucleic acid to be amplified).

Manipulation of Both Strands of a Nucleic Acid Target Gene Molecule

Methods of manipulating a nucleic acid target gene molecule subsequent to methylation-based sequence modification treatment, such as amplification and fragmentation, may be performed using only one strand of the treated nucleic acid target gene molecule, or using both strands of the treated nucleic acid target gene molecule. For example, primers used for amplification steps may be complementary to only one strand of the treated nucleic acid target gene molecule, or may be complementary to both strands of the treated nucleic acid target gene molecule. Accordingly, amplification steps may be performed to create at least two different amplified double-stranded products, where each strand of the treated nucleic acid target gene molecule is amplified into a separate double-stranded product. Alternatively, amplification may be performed such that only one of the two strands of the treated nucleic acid target gene molecule is amplified. For example, when amplification is performed using at least one primer that is selective for the sequence of one of the two strands, the strand hybridized to the primer may be selectively amplified.
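
The reason the two treated strands give separate products can be seen in a small Python sketch: each strand is converted independently, so after treatment the strands are no longer complementary and a primer matched to one strand does not match the other. The 6-nucleotide duplex is hypothetical and is taken to be fully unmethylated.

```python
COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """DNA reverse complement; U pairs with A, as T does."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Unmethylated C -> U; C at a 0-based index in `methylated` stays C."""
    return "".join(
        "U" if b == "C" and i not in methylated else b for i, b in enumerate(seq.upper())
    )


top = "GACCTG"                        # hypothetical top strand, 5'->3', no methylation
bottom = reverse_complement(top)      # its original partner strand, 5'->3'
top_t = bisulfite_convert(top, set())
bottom_t = bisulfite_convert(bottom, set())

print(bottom, top_t, bottom_t)                # CAGGTC GAUUTG UAGGTU
print(reverse_complement(top_t) == bottom_t)  # False: treated strands no longer pair
```
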

After one or more steps of amplification, the amplified products may be subjected to one or more manipulation steps prior to additional amplification steps or prior to cleavage steps. For example, amplified products can be subjected to one or more purification steps prior to additional amplification or prior to cleavage.

Methods for purifying nucleic acid molecules are known in the art and include precipitation, dialysis or other solvent exchange, gel electrophoresis, enzymatic degradation of impurities (e.g., protease treatment, or RNase treatment for a DNA nucleic acid target gene molecule sample), liquid chromatography including ion exchange chromatography and affinity chromatography, and other methods of specifically binding nucleic acid target gene molecules to separate them from impurities (e.g., hybridization, biotin binding). Purification steps also may include separating complementary strands of amplification products. One skilled in the art can select which purification steps, if any, to use according to the desired level of purity and/or the desired sample composition for subsequent amplification, modification or cleavage steps. Methods for determining methylation in a nucleic acid target gene may include methods in which a single sample is treated in one or more steps and then divided into two or more aliquots for parallel treatment in subsequent steps.

Amplified products may be split into two or more aliquots after amplification. For example, amplified products may be split into two or more aliquots after amplification but prior to cleavage. Amplified products also may be split into two or more aliquots after amplification and then subjected to further steps, such as one or more amplified product purification steps.

When amplified products are split into two or more aliquots prior to cleavage, different cleavage methods may be applied to each of the two or more aliquots. For example, a first nucleic acid target gene molecule aliquot may be base specifically fragmented with RNase A, while a second nucleic acid target gene molecule aliquot may be base specifically fragmented with RNase T1. In another example, amplified nucleic acid target gene molecule may be split into four aliquots and each aliquot may be treated with a different base-specific reagent to produce four different sets of base specifically cleaved nucleic acid target gene molecule fragments. Separation into two or more aliquots permits different cleavage reactions to be performed on the same amplification product. Use of different cleavage reactions on the same amplification product is further described in the cleavage methods provided herein.

A sample may be divided into two or more aliquots so that different strands of a nucleic acid target gene molecule can be specifically amplified in different aliquots. For example, a treated nucleic acid target gene molecule can have non-complementary strands that can be separately treated with different primers, such as different methylation state specific primers, so that the different strands are amplified separately in different aliquots. In another embodiment, complementary strands of an amplified nucleic acid target gene molecule can be separately amplified in different aliquots, according to the primers used in each aliquot. For example, a sample of amplified nucleic acid target gene molecules can be separated into two or more aliquots, where the forward strand is transcribed in a first set of aliquots and the reverse strand is transcribed in a second set of aliquots. As will be appreciated by one skilled in the art, a sample can be divided into any of a plurality of aliquots in which any combination of the parallel reactions described herein may be performed.

Fragmentation in Conjunction with Nucleotide Synthesis

Selective nucleotide synthesis also may be performed in conjunction with fragmentation. A nucleic acid target gene amplified through a plurality of nucleic acid synthesis cycles will utilize primers hybridizing to two separate regions of the nucleic acid target gene molecule. Fragmentation of a nucleic acid target gene molecule in the center region between the two primer hybridization sites will prevent amplification of the nucleic acid target gene molecule. Hence, selective fragmentation of the center region of nucleic acid molecules may result in selective amplification of a nucleic acid target gene molecule even if the primers used in the nucleic acid synthesis reactions are not selective. In one example, the sample may be treated with fragmentation conditions prior to being treated with nucleic acid synthesis conditions, and prior to being treated with a reagent that modifies the nucleic acid target gene molecule sequence as a function of the methylation state of the nucleic acid target gene. In such an example, the fragmentation conditions may be selective for methylated or unmethylated nucleotides. For example, a sample can have added thereto a methylation sensitive endonuclease, such as HpaII, which cleaves at an unmethylated recognition site but not at a methylated recognition site. This results in a sample containing intact nucleic acid target gene molecules that are methylated at the recognition site and cleaved nucleic acid target gene molecules that are unmethylated at the recognition site. The sample then may be treated with nucleic acid synthesis conditions using primers designed so that only uncleaved nucleic acid target gene molecules are amplified. As a result of the cleavage, amplification will be selective for nucleic acid target gene molecules that are methylated at the recognition site.
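
The HpaII example can be sketched in Python to show why only methylated molecules remain amplifiable. HpaII recognizes CCGG; the model here is deliberately simplified and hypothetical: a site is treated as protected only when its internal CpG cytosine is methylated, and a molecule counts as amplifiable only if no cleavable site lies wholly within the inter-primer region `seq[fwd_end:rev_start]`.

```python
def hpaii_sites(seq: str) -> list[int]:
    """0-based start indices of the HpaII recognition sequence CCGG."""
    s = seq.upper()
    return [i for i in range(len(s) - 3) if s[i : i + 4] == "CCGG"]


def amplifiable_after_hpaii(seq: str, methylated: set[int], fwd_end: int, rev_start: int) -> bool:
    """True if no cleavable HpaII site lies wholly between the two primer sites.

    Simplified assumptions: a site is protected only when its internal CpG
    cytosine (index start + 1) is in `methylated`, and the inter-primer region
    is seq[fwd_end:rev_start].
    """
    for start in hpaii_sites(seq):
        between_primers = fwd_end <= start and start + 4 <= rev_start
        if between_primers and (start + 1) not in methylated:
            return False  # cut between the primers: no full-length product
    return True


template = "AAAAACCGGTTTTT"  # hypothetical amplicon; primers cover indices 0-4 and 9-13
print(amplifiable_after_hpaii(template, set(), 5, 9))  # False: unmethylated site is cut
print(amplifiable_after_hpaii(template, {6}, 5, 9))    # True: methylated site survives
```
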

In another example, the sample may be treated with fragmentation conditions prior to treatment with nucleic acid synthesis conditions, but subsequent to treatment with a reagent that modifies the nucleic acid target gene molecule sequence as a function of the methylation state of the nucleic acid target gene. For example, a sample can have added thereto an endonuclease that cleaves at a recognition site that includes a C nucleotide at a particular locus, but not at a recognition site that contains a T or U nucleotide at that particular locus. Conversely, a sample can have added thereto an endonuclease that cleaves at a recognition site that includes a T or U nucleotide at a particular locus, but not at a recognition site that contains a C nucleotide at that particular locus. The sample can first be treated with a reagent that modifies the nucleic acid target gene molecule sequence as a function of the methylation state of the nucleic acid target gene molecule, and then treated with such an endonuclease. The resulting sample will contain intact nucleic acid target gene molecules that have the desired methylation state at the recognition site and cleaved nucleic acid target gene molecules that have the undesired methylation state at the recognition site. The sample then can be treated with nucleic acid synthesis conditions using primers designed so that only uncleaved nucleic acid target gene molecules are amplified. As a result of the cleavage, amplification will be selective for nucleic acid target gene molecules that have the desired methylation state at the recognition site. For example, because bisulfite treatment leaves a C only where the original cytosine was methylated, an endonuclease whose recognition site includes that C will cleave the methylated molecules, so that amplification is selective for the unmethylated molecules.

Transcription

Transcription of template DNA such as a nucleic acid target gene molecule, or an amplified product thereof, may be performed for one strand of the template DNA or for both strands of the template DNA. In one embodiment, the nucleic acid molecule to be transcribed contains a moiety to which an enzyme capable of performing transcription can bind; such a moiety may be, for example, a transcriptional promoter sequence.

Transcription reactions may be performed using any of a variety of methods known in the art, using any of a variety of enzymes known in the art. For example, mutant T7 RNA polymerase (T7 R&DNA polymerase; Epicentre, Madison, WI) with the ability to incorporate both dNTPs and rNTPs may be used in the transcription reactions. The transcription reactions may be run under standard reaction conditions known in the art, for example, 40 mM Tris-Ac (pH 7.5), 10 mM NaCl, 6 mM MgCl2, 2 mM spermidine, 10 mM dithiothreitol, 1 mM of each rNTP, 5 mM of dNTP (when used), 40 nM DNA template, and 5 U/µL T7 R&DNA polymerase, incubating at 37°C for 2 hours. After transcription, shrimp alkaline phosphatase (SAP) may be added to the cleavage reaction to reduce the quantity of cyclic monophosphate side products. Use of T7 R&DNA polymerase is known in the art, as exemplified by U.S. Pat. Nos. 5,849,546 and 6,107,037, and Sousa et al., EMBO J. 14:4609-4621 (1995), Padilla et al., Nucl. Acids Res. 27:1561-1563 (1999), Huang et al., Biochemistry 36:8231-8242 (1997), and Stanssens et al., Genome Res. 14:126-133 (2004). In addition to transcription with the four regular ribonucleotide substrates (rCTP, rATP, rGTP and rUTP), reactions may be performed replacing one or more ribonucleoside triphosphates with nucleoside analogs, such as those provided herein and known in the art, or with corresponding deoxyribonucleoside triphosphates (e.g., replacing rCTP with dCTP, or replacing rUTP with either dUTP or dTTP). In one embodiment, one or more rNTPs are replaced with a nucleoside or nucleoside analog that, upon incorporation into the transcribed nucleic acid, is not cleavable under the fragmentation conditions applied to the transcribed nucleic acid.
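
Assembling a reaction with the final concentrations listed above is a C1 x V1 = C2 x V2 calculation for each component. In the Python sketch below, the final concentrations come from the text, while the 20 µL reaction volume and every stock concentration are hypothetical placeholders; the optional dNTP is omitted, and `pipetting_volumes` is an illustrative name, not part of any kit or protocol.

```python
def pipetting_volumes(components: dict[str, tuple[float, float]], total_ul: float) -> dict[str, float]:
    """Volume of each stock (uL) from C1*V1 = C2*V2, plus water to reach `total_ul`.

    `components` maps a name to (final concentration, stock concentration), both
    in the same unit for that component.
    """
    volumes = {name: final * total_ul / stock for name, (final, stock) in components.items()}
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this final volume")
    volumes["water"] = water
    return volumes


# Final concentrations from the text; every stock concentration is hypothetical.
recipe = {
    "Tris-Ac pH 7.5 (mM)": (40, 1000),
    "NaCl (mM)": (10, 1000),
    "MgCl2 (mM)": (6, 100),
    "spermidine (mM)": (2, 100),
    "dithiothreitol (mM)": (10, 100),
    "rNTP mix, each (mM)": (1, 25),
    "DNA template (nM)": (40, 400),
    "T7 R&DNA polymerase (U/uL)": (5, 50),
}
for name, vol in pipetting_volumes(recipe, total_ul=20).items():
    print(f"{name:28s} {vol:5.2f} uL")
```

With these assumed stocks, the components take 9.4 µL and water makes up the remaining 10.6 µL.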

In one embodiment, transcription is performed subsequent to one or more nucleic acid synthesis reactions, including one or more nucleic acid synthesis reactions using methylation specific primers. For example, transcription of an amplified product can be performed subsequent to amplification of a nucleic acid target gene molecule, including methylation specific amplification of the nucleic acid target gene molecule. In another embodiment, the treated nucleic acid target gene molecule is transcribed without any preceding nucleic acid synthesis steps.

Fragmentation of Nucleic Acid Molecules

The methods provided herein also include steps of fragmentation and/or cleavage of nucleic acid target gene molecules or amplified products. Any method for cleaving a nucleic acid molecule into fragments with a suitable fragment size distribution may be used to generate the nucleic acid fragments. Fragmentation of nucleic acid molecules is known in the art and may be achieved in many ways. For example, nucleic acid molecules composed of DNA, RNA, analogs of DNA and RNA or combinations thereof, can be fragmented physically, chemically, or enzymatically. In one embodiment, enzymatic cleavage at one or more specific cleavage sites can be used to produce the nucleic acid molecule fragments utilized herein. Typically, cleavage is effected after amplification such that once a sufficient quantity of amplified products is generated using the methods provided herein, the amplified products can be cleaved into two or more fragments.

In embodiments where restriction enzymes are used, the average length of the fragments generated may be controlled within a specified range, depending on the number and type of restriction enzymes used and the particular reaction conditions selected. In particular embodiments, fragments of nucleic acid molecules prepared for use herein may fall within any of the following size ranges: about 1-50 bases, about 2-40 bases, about 3-35 bases, and about 5-30 bases. Other size ranges contemplated for use herein include about 50 to about 150 bases, about 25 to about 75 bases, and about 12 to about 30 bases. In one particular embodiment, fragments of about 3 to about 35 bases are used. Generally, the fragment size range is selected so that the masses of the fragments can be accurately determined using the mass measurement methods described herein and known in the art; in some embodiments, the size range also is selected to facilitate the desired desorption efficiencies in MALDI-TOF MS.

Base-specific fragmentation using nucleases is a preferred fragmentation method. Nucleic acid target gene molecules may be fragmented using nucleases that selectively cleave at a particular base (e.g., A, C, T or G for DNA and A, C, U or G for RNA) or base type (i.e., pyrimidine or purine). In one embodiment, RNases that specifically cleave at 3 RNA nucleotides (e.g., U, G and A), 2 RNA nucleotides (e.g., C and U) or 1 RNA nucleotide (e.g., A) may be used to base specifically cleave transcripts of a nucleic acid target gene molecule. For example, RNase T1 cleaves ssRNA (single-stranded RNA) at G ribonucleotides, RNase U2 digests ssRNA at A ribonucleotides, RNase CL3 and cusativin cleave ssRNA at C ribonucleotides, PhyM cleaves ssRNA at U and A ribonucleotides, and RNase A cleaves ssRNA at pyrimidine ribonucleotides (C and U). The use of mono-specific RNases such as RNase T1 (G-specific) and RNase U2 (A-specific) is known in the art (Donis-Keller et al., Nucl. Acids Res. 4:2527-2537 (1977); Gupta and Randerath, Nucl. Acids Res. 4:1957-1978 (1977); Kuchino and Nishimura, Methods Enzymol. 180:154-163 (1989); and Hahner et al., Nucl. Acids Res. 25(10):1957-1964 (1997)). Another enzyme, chicken liver ribonuclease (RNase CL3), has been reported to cleave preferentially at cytidine, but the enzyme's proclivity for this base has been reported to be affected by the reaction conditions (Boguski et al., J. Biol. Chem. 255:2160-2163 (1980)). Reports also claim cytidine specificity for another ribonuclease, cusativin, isolated from dry seeds of Cucumis sativus L. (Rojo et al., Planta 194:328-338 (1994)). Alternatively, the identification of pyrimidine residues by use of RNase PhyM (A and U specific) (Donis-Keller, H., Nucleic Acids Res. 8:3133-3142 (1980)) and RNase A (C and U specific) (Simoncsits et al., Nature 269:833-836 (1977); Gupta and Randerath, Nucl. Acids Res. 4:1957-1978 (1977)) has been demonstrated.
Examples of such cleavage patterns are given in Stanssens et al., WO 00/66771.
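The specificities listed above can be captured as a lookup table driving a single cleavage routine. This is a minimal sketch that assumes each RNase cuts on the 3' side of every ribonucleotide it recognizes, with no missed or non-specific cuts (real digests are less ideal, e.g. RNase CL3 specificity depends on conditions). The enzyme labels are plain dictionary keys, not a library API.

```python
"""Idealized base specific RNA cleavage using the specificities in the text."""

# RNA alphabet: A, C, G, U. Each enzyme cuts 3' of the bases in its set.
RNASE_SPECIFICITY: dict[str, frozenset[str]] = {
    "RNase T1": frozenset("G"),
    "RNase U2": frozenset("A"),
    "RNase CL3": frozenset("C"),
    "cusativin": frozenset("C"),
    "RNase PhyM": frozenset("AU"),
    "RNase A": frozenset("CU"),
}


def cleave(rna: str, enzyme: str) -> list[str]:
    """Split `rna` after every base recognized by `enzyme`."""
    cut_after = RNASE_SPECIFICITY[enzyme]
    fragments, current = [], []
    for base in rna:
        current.append(base)
        if base in cut_after:
            fragments.append("".join(current))
            current = []
    if current:  # 3'-terminal fragment that does not end in a cleaved base
        fragments.append("".join(current))
    return fragments


if __name__ == "__main__":
    transcript = "GAUCCGAUAG"  # hypothetical transcript
    for enzyme in ("RNase T1", "RNase U2", "RNase A"):
        print(enzyme, cleave(transcript, enzyme))
```

Because the enzymes differ only in their recognized base set, adding another base specific RNase is a one-line table entry.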

Base specific cleavage reaction conditions using an RNase are known in the art and can include, for example, 4 mM Tris-Ac (pH 8.0), 4 mM KAc, 1 mM spermidine, 0.5 mM dithiothreitol and 1.5 mM MgCl₂.

In one embodiment, amplified product can be transcribed into a single-stranded RNA molecule and then cleaved base specifically by an endoribonuclease. Treatment of the target nucleic acid, for example with bisulfite, which converts unmethylated cytosine to uracil without modifying methylated cytosine, can be used to generate differences in base specific cleavage patterns that can be analyzed by mass analysis methods, such as mass spectrometry, and used to identify methylated sites. In one embodiment, transcription of a nucleic acid target gene molecule can yield an RNA molecule that can be cleaved using specific RNA endonucleases. For example, base specific cleavage of the RNA molecule can be performed using two different endoribonucleases, such as RNase T1 and RNase A. RNase T1 specifically cleaves at G nucleotides, and RNase A specifically cleaves at pyrimidine ribonucleotides (i.e., cytosine and uracil residues). In one embodiment, when an enzyme that cleaves at more than one nucleotide, such as RNase A, is used for cleavage, non-cleavable nucleosides, such as dNTPs, may be incorporated during transcription of the nucleic acid target gene molecule or amplified product. For example, dCTPs may be incorporated during transcription of the amplified product, and the resultant transcribed nucleic acid can be cleaved by RNase A at U ribonucleotides but is resistant to cleavage by RNase A at C deoxyribonucleotides. In another example, dTTPs can be incorporated during transcription of the nucleic acid target gene molecule, and the resultant transcribed nucleic acid can be cleaved by RNase A at C ribonucleotides but is resistant to cleavage by RNase A at T deoxyribonucleotides.
By selective use of non-cleavable nucleosides such as dNTPs, and by performing base specific cleavage using RNases such as RNase A and RNase T1, cleavage specific to three different nucleotide bases can be performed on different transcripts of the same target nucleic acid sequence. For example, the transcript of a particular nucleic acid target gene molecule can be subjected to G-specific cleavage using RNase T1; a transcript can be subjected to C-specific cleavage by including dTTP in the transcription reaction, followed by digestion with RNase A; and a transcript can be subjected to T-specific cleavage by including dCTP in the transcription reaction, followed by digestion with RNase A. These types of base specific cleavage patterns can be illustrated by the theoretical cleavage products of a given nucleotide sequence, TAAACGCAT, which bisulfite treatment converts to TAAACGTAT if the CpG cytosine is methylated and to TAAATGTAT if it is not methylated.
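The three schemes can be simulated on the bisulfite-converted sequences TAAACGTAT (methylated) and TAAATGTAT (unmethylated) named above. This is a sketch under stated assumptions, not the patent's analysis software: the given strand is taken as the sense sequence the transcript copies, every cleavable ribonucleotide is cut on its 3' side, and no other cuts occur. Deoxynucleotides supplied as a dNTP are written in lower case and are never cut, which is what turns RNase A into a mono-specific enzyme here.

```python
"""Theoretical products of G-, C- and T-specific cleavage (idealized)."""
from typing import Optional

# scheme name: (base supplied as dNTP, ribonucleotides the RNase cuts after)
SCHEMES = {
    "G-specific (RNase T1)": (None, "G"),
    "C-specific (RNase A + dTTP)": ("T", "CU"),
    "T-specific (RNase A + dCTP)": ("C", "CU"),
}


def transcribe(dna: str, deoxy_base: Optional[str]) -> str:
    """RNA copy of `dna`: T -> U; `deoxy_base` is kept as lower-case deoxy."""
    out = []
    for base in dna:
        if base == deoxy_base:
            out.append(base.lower())  # non-cleavable deoxynucleotide
        else:
            out.append("U" if base == "T" else base)
    return "".join(out)


def cleave(transcript: str, cut_after: str) -> list[str]:
    """Cut 3' of every ribonucleotide (upper case) listed in `cut_after`."""
    fragments, start = [], 0
    for i, base in enumerate(transcript):
        if base in cut_after:
            fragments.append(transcript[start:i + 1])
            start = i + 1
    if start < len(transcript):
        fragments.append(transcript[start:])
    return fragments


def products(dna: str) -> dict[str, list[str]]:
    """Cleavage products of each scheme, written back in the DNA alphabet."""
    result = {}
    for name, (deoxy_base, cut_after) in SCHEMES.items():
        fragments = cleave(transcribe(dna, deoxy_base), cut_after)
        result[name] = [f.upper().replace("U", "T") for f in fragments]
    return result


if __name__ == "__main__":
    methylated, unmethylated = products("TAAACGTAT"), products("TAAATGTAT")
    for name in SCHEMES:
        only_m = sorted(set(methylated[name]) - set(unmethylated[name]))
        only_u = sorted(set(unmethylated[name]) - set(methylated[name]))
        print(f"{name}: methylated only {only_m}, unmethylated only {only_u}")
```

In this idealized model every scheme yields at least one fragment unique to each methylation state; the C-specific scheme is the starkest case, since the unmethylated sequence has no cleavable C and stays intact.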

In another embodiment, the use of dNTPs, different RNAses, and both orientations of the nucleic acid target gene molecule can allow for six different cleavage schemes. For example, a double stranded nucleic acid target gene molecule can yield two different single stranded transcription products, which can be referred to as a transcript product of the forward strand of the nucleic acid target gene molecule and a transcript product of the reverse strand of the nucleic acid target gene molecule. Each of the two different transcription products can be subjected to three separate base specific cleavage reactions, such as G-specific cleavage, C-specific cleavage and T-specific cleavage, as described herein, to result in six different base specific cleavage reactions. The six possible cleavage schemes are listed below.

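RNase T1: G-specific cleavage of the forward-strand transcript; G-specific cleavage of the reverse-strand transcript

RNase A with dTTP: C-specific cleavage of the forward-strand transcript; C-specific cleavage of the reverse-strand transcript

RNase A with dCTP: T-specific cleavage of the forward-strand transcript; T-specific cleavage of the reverse-strand transcript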

Use of four different base specific cleavage reactions can yield information on all four nucleotide bases of one strand of the nucleic acid target gene molecule. That is, because cleavage at a base of the forward strand can be mimicked by cleaving at the complementary base on the reverse strand, base specific cleavage at each of the four nucleotides of the forward strand can be achieved by reference to cleavage of the reverse strand. For example, three base specific cleavage reactions can be performed on the transcript of the forward strand of the nucleic acid target gene molecule to yield G-, C- and T-specific cleavage of the forward strand, and a fourth base specific cleavage reaction can be a T-specific cleavage reaction of the transcript of the reverse strand, the results of which will be equivalent to A-specific cleavage of the transcript of the forward strand. One skilled in the art will appreciate that base specific cleavage yielding information on all four nucleotide bases of one strand of a nucleic acid target gene molecule can be accomplished using a variety of combinations of base specific cleavage reactions. These include the cleavage reactions listed above for RNases T1 and A, as well as additional cleavage reactions of forward or reverse strands, optionally using non-hydrolyzable nucleotides, performed with other base specific RNases known in the art or disclosed herein.
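The strand equivalence described above can be checked mechanically. In this sketch (assuming idealized cuts on the 3' side of the cleaved base and no missed cleavages), a cut 3' of T on the reverse-strand transcript falls immediately 5' of the paired A on the forward strand, so mapping the reverse-strand fragments back should reproduce the forward sequence split just before each A. The helper names are illustrative only.

```python
"""Reverse-strand cleavage viewed in forward-strand coordinates (idealized)."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def cut_after(seq: str, base: str) -> list[str]:
    """Fragments from cutting 3' of every occurrence of `base`."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def cut_before(seq: str, base: str) -> list[str]:
    """Fragments from cutting 5' of every occurrence of `base`."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base and i > start:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])
    return fragments


def forward_view_of_reverse_cleavage(forward: str, base: str) -> list[str]:
    """Cleave the reverse strand after `base`, then map fragments to forward."""
    reverse_fragments = cut_after(reverse_complement(forward), base)
    return [reverse_complement(f) for f in reversed(reverse_fragments)]


if __name__ == "__main__":
    fwd = "TAAACGTAT"
    print(forward_view_of_reverse_cleavage(fwd, "T"))  # T-specific, reverse
    print(cut_before(fwd, "A"))                        # A-specific, forward
```

Both calls print the same fragment list, which is the sense in which a T-specific reaction on the reverse strand stands in for an A-specific reaction on the forward strand.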

In one example, RNase U2 can be used to base specifically cleave nucleic acid target gene molecule transcripts. RNase U2 base specifically cleaves RNA at A nucleotides. Thus, by use of RNases T1, U2 and A, and of the appropriate dNTPs (in conjunction with RNase A), all four base positions of a nucleic acid target gene molecule can be examined by base specifically cleaving transcripts of only one strand of the nucleic acid target gene molecule. In some embodiments, non-cleavable nucleoside triphosphates are not required when base specific cleavage is performed using RNases that cleave at only one of the four ribonucleotides. For example, use of RNase T1, RNase CL3, cusativin, or RNase U2 for base specific cleavage does not require the presence of non-cleavable nucleotides in the nucleic acid target gene molecule transcript. Use of RNases such as RNase T1 and RNase U2 can yield information on all four nucleotide bases of a nucleic acid target gene molecule. For example, transcripts of both the forward and reverse strands of a nucleic acid target gene molecule or amplified product can be synthesized, and each transcript can be subjected to base specific cleavage using RNase T1 and RNase U2. The resulting cleavage patterns of the four cleavage reactions will yield information on all four nucleotide bases of one strand of the nucleic acid target gene molecule. In such an embodiment, two transcription reactions can be performed: a first transcription of the forward strand of the nucleic acid target gene molecule and a second of the reverse strand.
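The four-reaction design with RNase T1 and RNase U2 can be checked as a small coverage table. This sketch assumes RNase T1 is strictly G-specific and RNase U2 strictly A-specific; a reaction on the reverse-strand transcript is counted as reporting the complementary base of the forward strand.

```python
"""Which forward-strand bases a set of (strand, RNase) reactions reports on."""

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
ENZYME_BASE = {"RNase T1": "G", "RNase U2": "A"}  # idealized specificities


def forward_base_reported(strand: str, enzyme: str) -> str:
    """Forward-strand base probed by cleaving `strand` with `enzyme`."""
    base = ENZYME_BASE[enzyme]
    return base if strand == "forward" else COMPLEMENT[base]


def coverage(reactions: list[tuple[str, str]]) -> set[str]:
    return {forward_base_reported(strand, enzyme) for strand, enzyme in reactions}


if __name__ == "__main__":
    four = [(s, e) for s in ("forward", "reverse") for e in ENZYME_BASE]
    print(sorted(coverage(four)))  # ['A', 'C', 'G', 'T']
```

Two transcription reactions and two mono-specific RNases thus suffice to probe every base, without non-cleavable nucleotides.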

Also contemplated for use in the methods are a variety of other base specific cleavage methods, which are known in the art and described herein, including enzymatic base specific cleavage of RNA, enzymatic base specific cleavage of modified DNA, and chemical base specific cleavage of DNA. For example, enzymatic base specific cleavage methods, such as cleavage using uracil-DNA glycosylase (UDG) or methylcytosine deglycosylase (MCDG), are known in the art and described herein, and can be performed in conjunction with the enzymatic RNase-mediated base specific cleavage reactions described herein.

Methods for using restriction endonucleases to fragment nucleic acid molecules are widely known in the art. In one exemplary protocol, a reaction mixture of 20-50 µL is prepared containing: 1-3 µg DNA; 1X restriction enzyme buffer; and 2 units of restriction endonuclease per 1 µg of DNA. Suitable buffers also are known in the art and include suitable ionic strength, cofactors, and optionally, pH buffers to provide optimal conditions for enzymatic activity. Specific enzymes may require specific buffers that are generally available from commercial suppliers of the enzyme. An exemplary buffer is potassium glutamate buffer (KGB). Hannish, J. and M. McClelland, "Activity of DNA modification and restriction enzymes in KGB, a potassium glutamate buffer," Gene Anal. Tech. 5:105 (1988); McClelland, M. et al., "A single buffer for all restriction endonucleases," Nucl. Acids Res. 16:364 (1988). The reaction mixture is incubated at 37°C for 1 hour or for any time period needed to produce fragments of a desired size or range of sizes. The reaction may be stopped by heating the mixture at 65°C or 80°C as needed. Alternatively, the reaction may be stopped by chelating divalent cations such as Mg²⁺ with, for example, EDTA.

DNases also may be used to generate nucleic acid molecule fragments. Anderson, S., "Shotgun DNA sequencing using cloned DNase I-generated fragments," Nucl. Acids Res. 9:3015-3027 (1981). DNase I (deoxyribonuclease I) is an endonuclease that non-specifically digests double- and single-stranded DNA into poly- and mono-nucleotides.

Catalytic DNA and RNA are known in the art and can be used to cleave nucleic acid molecules to produce nucleic acid molecule fragments. Santoro, S. W. and Joyce, G. F., "A general purpose RNA-cleaving DNA enzyme," Proc. Natl. Acad. Sci. USA 94:4262-4266 (1997). As a single-stranded molecule, DNA can fold into three-dimensional structures similar to those of RNA, and the 2'-hydroxy group is dispensable for catalytic action. Like ribozymes, DNAzymes also can be made, by selection, to depend on a cofactor. This has been demonstrated for a histidine-dependent DNAzyme for RNA hydrolysis. U.S. Patent Nos. 6,326,174 and 6,194,180 disclose deoxyribonucleic acid enzymes (catalytic and enzymatic DNA molecules) capable of cleaving nucleic acid sequences or molecules, particularly RNA.

Fragmentation of nucleic acid molecules may be achieved using physical or mechanical forces, including mechanical shear forces and sonication. Physical fragmentation of nucleic acid molecules may be accomplished, for example, using hydrodynamic forces. Typically, nucleic acid molecules in solution are sheared by repeatedly drawing the solution containing the nucleic acid molecules into and out of a syringe equipped with a needle. Thorstenson, Y. R. et al., "An Automated Hydrodynamic Process for Controlled, Unbiased DNA Shearing," Genome Research 8:848-855 (1998); Davison, P. F., Proc. Natl. Acad. Sci. USA 45:1560-1568 (1959); Davison, P. F., Nature 185:918-920 (1960); Schriefer, L. A. et al., "Low pressure DNA shearing: a method for random DNA sequence analysis," Nucl. Acids Res. 18:7455-7456 (1990). Shearing of DNA, for example with a hypodermic needle, typically generates a majority of fragments ranging from 1-2 kb, although a minority of fragments can be as small as 300 bp.

The hydrodynamic point-sink shearing method developed by Oefner et al. is one method of shearing nucleic acid molecules that utilizes hydrodynamic forces. Oefner, P. J. et al., "Efficient random subcloning of DNA sheared in a recirculating point-sink flow system," Nucl. Acids Res. 24(20):3879-3886 (1996).

Nucleic acid molecule fragments also may be obtained by agitating large nucleic acid molecules in solution, for example, by mixing, blending, stirring, or vortexing the solution. Hershey, A. D. and Burgi, E., J. Mol. Biol. 2:143-152 (1960); Rosenberg, H. S. and Bendich, A., J. Am. Chem. Soc. 82:3198-3201 (1960).

One suitable method of physically fragmenting nucleic acid molecules is based on sonicating the nucleic acid molecule. Deininger, P. L., "Approaches to rapid DNA sequence analysis," Anal. Biochem. 129:216-223 (1983). Fragmentation of nucleic acid molecules also may be achieved using a nebulizer. Bodenteich, A., Chissoe, S., Wang, Y.-F. and Roe, B. A. (1994) In Adams, M. D., Fields, C. and Venter, J. C. (eds.) Automated DNA Sequencing and Analysis. Academic Press, San Diego, CA. Nebulizers are known in the art and commercially available.

Another method for fragmenting nucleic acid molecules employs repeatedly freezing and thawing a buffered solution of nucleic acid molecules. The sample of nucleic acid molecules may be frozen and thawed as necessary to produce fragments of a desired size or range of sizes. Nucleic acid molecule fragmentation also may be achieved by irradiating the nucleic acid molecules. Typically, radiation such as gamma or X-ray radiation will be sufficient to fragment the nucleic acid molecules.

Chemical fragmentation may be used to fragment nucleic acid molecules either with or without base specificity. Nucleic acid molecules may be fragmented by chemical reactions including, for example, hydrolysis reactions such as base and acid hydrolysis. Exemplary acid/base hydrolysis protocols for producing nucleic acid molecule fragments are known (see, e.g., Sargent et al., Meth. Enz. 152:432 (1988)).

Mass Spectrometry

When analyses are performed using mass spectrometry, such as MALDI, nanoliter volumes of sample can be loaded on chips. Use of such volumes can permit quantitative or semi-quantitative mass spectrometric results. For example, the area under the peaks in the resulting mass spectra is proportional to the relative concentrations of the components of the sample. Methods for preparing and using such chips are known in the art, as exemplified in U.S. Patent No. 6,024,925, U.S. Publication 20010008615, and PCT Application No. PCT/US97/20195 (WO 98/20020); methods for preparing and using such chips also are provided in co-pending U.S. Application Serial Nos. 08/786,988, 09/364,774, and 09/297,575. Chips and kits for performing these analyses are commercially available from SEQUENOM under the trademark MassARRAY™. MassARRAY™ systems contain a miniaturized array such as a SpectroCHIP® useful for MALDI-TOF (Matrix-Assisted Laser Desorption Ionization-Time of Flight) mass spectrometry to deliver results rapidly. The system accurately distinguishes single base changes in the size of DNA fragments relating to genetic variants without tags.

In one embodiment, the mass of all nucleic acid molecule fragments formed in the step of fragmentation is measured. The measured mass of a nucleic acid target gene molecule fragment or fragment of an amplification product also can be referred to as a "sample" measured mass, in contrast to a "reference" mass which arises from a reference nucleic acid fragment.

In another embodiment, the length of nucleic acid molecule fragments whose mass is measured using mass spectrometry is no more than 75 nucleotides in length, no more than 60 nucleotides in length, no more than 50 nucleotides in length, no more than 40 nucleotides in length, no more than 35 nucleotides in length, no more than 30 nucleotides in length, no more than 27 nucleotides in length, no more than 25 nucleotides in length, no more than 23 nucleotides in length, no more than 22 nucleotides in length, no more than 21 nucleotides in length, no more than 20 nucleotides in length, no more than 19 nucleotides in length, or no more than 18 nucleotides in length. In another embodiment, the length of the nucleic acid molecule fragments whose mass is measured using mass spectrometry is no less than 3 nucleotides in length, no less than 4 nucleotides in length, no less than 5 nucleotides in length, no less than 6 nucleotides in length, no less than 7 nucleotides in length, no less than 8 nucleotides in length, no less than 9 nucleotides in length, no less than 10 nucleotides in length, no less than 12 nucleotides in length, no less than 15 nucleotides in length, no less than 18 nucleotides in length, no less than 20 nucleotides in length, no less than 25 nucleotides in length, no less than 30 nucleotides in length, or no less than 35 nucleotides in length.

In one embodiment, the nucleic acid molecule fragment whose mass is measured is RNA. In another embodiment, the nucleic acid target gene molecule fragment whose mass is measured is DNA. In yet another embodiment, the nucleic acid target gene molecule fragment whose mass is measured contains one modified or atypical nucleotide (i.e., a nucleotide other than deoxy-C, T, G or A in DNA, or other than C, U, G or A in RNA). For example, a nucleic acid molecule product of a transcription reaction may contain a combination of ribonucleotides and deoxyribonucleotides. In another example, a nucleic acid molecule can contain typically occurring nucleotides and mass-modified nucleotides, or can contain typically occurring nucleotides and non-naturally occurring nucleotides.
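To make the mixed ribo/deoxy case concrete, the sketch below estimates the mass of a fragment from its composition. It uses average (not monoisotopic) residue masses and assumes a neutral, adduct-free linear fragment with a 5'-OH and a 3'-phosphate, a common end structure for RNase cleavage products but an assumption here (2',3'-cyclic phosphate ends, for example, are one water lighter). Upper case denotes ribonucleotides and lower case denotes deoxyribonucleotides, so "c" is a dC incorporated from dCTP.

```python
"""Average mass of a fragment mixing ribo- and deoxyribonucleotides (sketch)."""

# Average residue masses in Da (nucleoside monophosphate minus H2O).
RESIDUE_MASS = {
    "A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17,  # ribonucleotides
    "a": 313.21, "c": 289.18, "g": 329.21, "t": 304.20,  # deoxyribonucleotides
}
WATER = 18.02  # closes the chain: 5'-OH ... 3'-phosphate


def fragment_mass(fragment: str) -> float:
    """Neutral average mass of a linear 5'-OH / 3'-phosphate fragment."""
    return sum(RESIDUE_MASS[n] for n in fragment) + WATER


if __name__ == "__main__":
    for frag in ("AcGU", "ACGU"):
        print(frag, round(fragment_mass(frag), 2))
```

Each deoxy substitution removes one oxygen (about 16 Da), which is large enough to be resolved by MALDI-TOF in the fragment size ranges discussed above.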

Prior to mass spectrometric analysis, nucleic acid molecules can be treated to improve resolution. Such processes are referred to as conditioning of the molecules. Molecules can be "conditioned," for example, to decrease the laser energy required for volatilization and/or to minimize fragmentation. A variety of methods for nucleic acid molecule conditioning are known in the art. An example of conditioning is modification of the phosphodiester backbone of the nucleic acid molecule (e.g., by cation exchange), which can be useful for eliminating peak broadening due to heterogeneity in the cations bound per nucleotide unit. In another example, contacting a nucleic acid molecule with an alkylating agent such as an alkyl iodide, iodoacetamide, β-iodoethanol, or 2,3-epoxy-1-propanol can transform monothiophosphodiester bonds of a nucleic acid molecule into phosphotriester bonds. Likewise, phosphodiester bonds can be transformed to uncharged derivatives employing, for example, trialkylsilyl chlorides. Further conditioning can include incorporating nucleotides that reduce sensitivity to depurination (fragmentation during MS), e.g., a purine analog such as N7- or N9-deazapurine nucleotides, or RNA building blocks; using oligonucleotide triesters; incorporating phosphorothioate functions that are alkylated; or employing oligonucleotide mimetics such as PNA.

For some applications, simultaneous detection of more than one nucleic acid molecule fragment may be performed. In other applications, parallel processing can be performed using, for example, oligonucleotide or oligonucleotide mimetic arrays on various solid supports. "Multiplexing" can be achieved by several different methodologies. For example, fragments from several different nucleic acid molecules can be simultaneously subjected to mass measurement methods. Typically, in multiplexed mass measurements, the nucleic acid molecule fragments should be sufficiently distinguishable that simultaneous detection of the multiplexed fragments is possible. Nucleic acid molecule fragments may be made distinguishable by ensuring that their masses can be resolved by the mass measurement method to be used. This may be achieved either by the sequence itself (composition or length) or by the introduction of mass-modifying functionalities into one or more nucleic acid molecules. In one embodiment, the nucleic acid molecule to be mass-measured has one or more mass-modifying moieties attached to it. Mass-modifying moieties are known in the art and may be attached to the 3' end or 5' end of a nucleic acid molecule fragment, may be attached to a nucleobase or to a sugar moiety of a nucleotide, or may be attached to or substitute for the phosphodiester linkage between nucleotides. A simple mass-modification may be achieved by substituting halogens such as F, Cl, Br and/or I, or pseudohalogens such as SCN or NCS, for H, or by using different alkyl, aryl or aralkyl moieties such as methyl, ethyl, propyl, isopropyl, t-butyl, hexyl, phenyl, substituted phenyl, or benzyl, or functional groups such as N₃, CH₂F, CHF₂, CF₃, Si(CH₃)₃, Si(CH₃)₂(C₂H₅), Si(CH₃)(C₂H₅)₂, or Si(C₂H₅)₃. Yet another mass-modification can be obtained by attaching homo- or heteropeptides to the nucleic acid molecule (e.g., detector (D)) or to nucleoside triphosphates.
One example useful in generating mass-modified species with a mass increment of 57 is the attachment of oligoglycines, which yield mass-modifications of, e.g., 74, 131, 188, 245. Simple oligoamides also can be used; e.g., mass-modifications of 74, 88, 102, 116, ... are obtainable.
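The mass-tag ladders above are arithmetic series, and the multiplexing requirement reduces to a pairwise separation check. In this sketch the ladder parameters come from the text (start 74, step 57 for oligoglycines; start 74, step 14 for oligoamides), while the fragment masses and the 5 Da minimum separation in the example are hypothetical values, not instrument specifications.

```python
"""Mass-tag ladders and a pairwise resolvability check for a multiplex."""
from itertools import combinations


def ladder(start: float, step: float, n: int) -> list[float]:
    """First `n` members of a mass-modification series."""
    return [start + k * step for k in range(n)]


def resolvable(masses: list[float], min_separation: float) -> bool:
    """True if every pair of masses differs by at least `min_separation`."""
    return all(abs(a - b) >= min_separation for a, b in combinations(masses, 2))


if __name__ == "__main__":
    print(ladder(74, 57, 4))  # oligoglycine series
    print(ladder(74, 14, 4))  # simple oligoamide series
    # two hypothetical fragments of equal mass, one carrying a 74 Da tag
    print(resolvable([2500.0, 2500.0], 5.0), resolvable([2500.0, 2574.0], 5.0))
```

Tagging one of two isobaric fragments is enough to make the pair resolvable under this check.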

Mass-modifications also may include oligo/polyethylene glycol derivatives. The oligo/polyethylene glycols also can be monoalkylated by a lower alkyl such as methyl, ethyl, propyl, isopropyl, t-butyl and other suitable substituents. Other chemistries also can be used in the mass-modified compounds (see, e.g., those described in Oligonucleotides and Analogues, A Practical Approach, F. Eckstein, editor, IRL Press, Oxford, 1991).

Mass-modifying moieties can be attached, for instance, to the 5'-end of the oligonucleotide, to the nucleobase (or bases), to the phosphate backbone, to the 2'-position of the nucleoside (or nucleosides), and/or to the terminal 3'-position. Examples of mass-modifying moieties include a halogen, an azido, or a moiety of the type XR, wherein X is a linking group and R is a mass-modifying functionality. A mass-modifying functionality can, for example, be used to introduce defined mass increments into the oligonucleotide molecule, as described herein. Modifications introduced at the phosphodiester bond, such as with alpha-thio nucleoside triphosphates, have the advantage that they do not interfere with accurate Watson-Crick base-pairing and additionally allow for one-step post-synthetic site-specific modification of the complete nucleic acid molecule, e.g., via alkylation reactions (see, e.g., Nakamaye et al., Nucl. Acids Res. 16:9947-9959 (1988)). Exemplary mass-modifying functionalities are boron-modified nucleic acids, which can be efficiently incorporated into nucleic acids by polymerases (see, e.g., Porter et al., Biochemistry 34:11963-11969 (1995); Hasan et al., Nucl. Acids Res. 24:2150-2157 (1996); Li et al., Nucl. Acids Res. 23:4495-4501 (1995)).

Furthermore, the mass-modifying functionality may be added so as to effect chain termination, such as by attaching it to the 3'-position of the sugar ring in the nucleoside triphosphate. Those skilled in the art will recognize that many combinations can be used in the methods provided herein. In the same way, those skilled in the art will recognize that chain-elongating nucleoside triphosphates also can be mass-modified in a similar fashion, with numerous variations and combinations in functionality and attachment positions. Different mass-modified nucleotides may be used to detect a variety of different nucleic acid fragments simultaneously. In one embodiment, mass modifications can be incorporated during the amplification process. In another embodiment, multiplexing of different nucleic acid target gene molecules may be performed by mass modifying one or more nucleic acid target gene molecules, where each different nucleic acid target gene molecule can be differently mass modified, if desired. Other mass measurement methods known in the art also may be used, including electrophoretic methods such as gel electrophoresis and capillary electrophoresis, and chromatographic methods including size exclusion chromatography and reverse phase chromatography.

Using methods of mass analysis such as those described herein, information relating to the mass of the nucleic acid target gene molecule fragments can be obtained. Additional mass peak information that can be obtained from mass measurements includes the signal-to-noise ratio of a peak, the peak area (represented, for example, by the area under the peak or by the peak width at half-height), peak height, peak width, peak area relative to one or more additional mass peaks, peak height relative to one or more additional mass peaks, and peak width relative to one or more additional mass peaks. Such mass peak characteristics may be used in the present methylation identification methods, for example, in a method of identifying the methylation state of a nucleotide locus of a nucleic acid target gene molecule by comparing at least one mass peak characteristic of an amplification fragment with one or more mass peak characteristics of one or more reference nucleic acids.
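Several of the peak characteristics listed above can be computed from a sampled spectrum with elementary numerics. This is a simplified sketch, not the MassARRAY software: it assumes a baseline-subtracted spectrum sorted by m/z, a single peak maximum inside the chosen window, a window wide enough to contain both half-height crossings, and a noise level supplied by the caller (e.g., the standard deviation of a peak-free region).

```python
"""Height, area, FWHM and signal-to-noise of one peak (simplified sketch)."""
from dataclasses import dataclass


@dataclass
class PeakStats:
    height: float
    area: float
    fwhm: float
    snr: float


def _crossing(x0: float, y0: float, x1: float, y1: float, level: float) -> float:
    """Linearly interpolate the m/z at which the signal reaches `level`."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def characterize(mz, intensity, lo, hi, noise) -> PeakStats:
    idx = [i for i, x in enumerate(mz) if lo <= x <= hi]  # contiguous window
    top = max(idx, key=lambda i: intensity[i])
    height = intensity[top]
    # trapezoidal area under the signal across the window
    area = sum((mz[i + 1] - mz[i]) * (intensity[i] + intensity[i + 1]) / 2
               for i in idx[:-1])
    half = height / 2
    left, right = top, top
    while left > idx[0] and intensity[left] >= half:
        left -= 1
    while right < idx[-1] and intensity[right] >= half:
        right += 1
    if intensity[left] >= half or intensity[right] >= half:
        raise ValueError("window must contain both half-height crossings")
    x_left = _crossing(mz[left], intensity[left], mz[left + 1], intensity[left + 1], half)
    x_right = _crossing(mz[right - 1], intensity[right - 1], mz[right], intensity[right], half)
    return PeakStats(height, area, x_right - x_left, height / noise)


if __name__ == "__main__":
    # hypothetical triangular peak sampled at unit m/z spacing
    print(characterize([0, 1, 2, 3, 4, 5, 6], [0, 0, 5, 10, 5, 0, 0], 0, 6, 0.5))
```

Relative characteristics (area, height or width relative to another peak) then follow by dividing the corresponding fields of two `PeakStats` results.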

Methylation State Identification

Fragment measurements may be used to identify the methylation state of a nucleic acid target gene molecule or of a particular nucleotide locus of a nucleic acid target gene molecule. Fragment measurements may be used to identify whether or not a nucleic acid target gene molecule contains one or more methylated or unmethylated nucleotides, such as methylcytosine or cytosine, respectively; to determine the number of methylated or unmethylated nucleotides, such as methylcytosine or cytosine, respectively, present in a nucleic acid target gene molecule; to identify whether or not a nucleotide locus, such as a cytosine locus, is methylated or unmethylated in a nucleic acid target gene molecule; to identify the nucleotide locus of a methylated or unmethylated nucleotide, such as methylcytosine or cytosine, respectively, in a nucleic acid target gene molecule; to determine the ratio of methylated to unmethylated nucleic acid target gene molecule in a sample; to determine the ratio of methylated to unmethylated nucleotide at a particular nucleotide locus on a nucleic acid target gene molecule; and to provide redundant information to further confirm any of the determinations provided herein.
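Building on the statement that peak areas are proportional to the relative concentrations of sample components, the methylated/unmethylated ratio at a locus can be estimated from the areas of the fragments specific to each state. This sketch additionally assumes that both fragments ionize and are detected with equal efficiency, which in practice may need correction against reference nucleic acids; the function names are illustrative.

```python
"""Methylation fraction and ratio from two state-specific peak areas."""


def methylation_fraction(area_methylated: float, area_unmethylated: float) -> float:
    """Fraction of molecules methylated at the locus (0 to 1)."""
    total = area_methylated + area_unmethylated
    if total <= 0:
        raise ValueError("no signal for either fragment")
    return area_methylated / total


def methylation_ratio(area_methylated: float, area_unmethylated: float) -> float:
    """Methylated-to-unmethylated ratio at the locus."""
    if area_unmethylated <= 0:
        raise ValueError("unmethylated fragment not detected")
    return area_methylated / area_unmethylated


if __name__ == "__main__":
    # hypothetical peak areas for one CpG-containing fragment pair
    print(methylation_fraction(30.0, 70.0), methylation_ratio(30.0, 70.0))
```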

Additional Methylation Analysis Methods

Various methylation assay procedures are known in the art and can be used in conjunction with the present invention. These assays allow for determination of the methylation state of one or a plurality of CpG islands within a DNA sequence. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), Southern blot analysis, use of methylation-sensitive restriction enzymes, etc. For example, genomic sequencing has been simplified for analysis of DNA methylation patterns and 5-methylcytosine distribution by using bisulfite treatment (Frommer et al., Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992). Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA is used, e.g., the method described by Sadri & Hornsby (Nucl. Acids Res. 24:5058-5059, 1996), or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 25:2532-2534, 1997).

COBRA analysis is a quantitative methylation assay useful for determining DNA methylation levels at specific gene loci in small amounts of genomic DNA (Xiong & Laird, Nucleic Acids Res. 25:2532-2534, 1997). Briefly, restriction enzyme digestion is used to reveal methylation-dependent sequence differences in PCR products of sodium bisulfite-treated DNA. Methylation-dependent sequence differences are first introduced into the genomic DNA by standard bisulfite treatment according to the procedure described by Frommer et al. (Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992). PCR amplification of the bisulfite-converted DNA is then performed using primers specific for the CpG islands of interest, followed by restriction endonuclease digestion, gel electrophoresis, and detection using specific, labeled hybridization probes. Methylation levels in the original DNA sample are represented by the relative amounts of digested and undigested PCR product in a linearly quantitative fashion across a wide spectrum of DNA methylation levels. In addition, this technique can be reliably applied to DNA obtained from microdissected paraffin-embedded tissue samples. Typical reagents (e.g., as might be found in a typical COBRA-based kit) for COBRA analysis may include, but are not limited to: PCR primers for a specific gene (or methylation-altered DNA sequence or CpG island); restriction enzyme and appropriate buffer; gene-hybridization oligo; control hybridization oligo; kinase labeling kit for oligo probe; and radioactive nucleotides. Additionally, bisulfite conversion reagents may include: DNA denaturation buffer; sulfonation buffer; DNA recovery reagents or kits (e.g., precipitation, ultrafiltration, affinity column); desulfonation buffer; and DNA recovery components.
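The COBRA principle can be sketched in two steps: a restriction site containing a CpG survives bisulfite conversion only if that CpG is methylated, and the methylation level is read from the digested fraction of the PCR product. The sketch assumes complete conversion, methylation only at CpG cytosines, all CpGs in the sequence sharing one methylation state, and analysis of the top strand only. BstUI (CGCG) and TaqI (TCGA) are used as example enzymes because they are commonly chosen for COBRA; they are not named in the text above, and the sequences are hypothetical.

```python
"""Bisulfite-dependent restriction site retention and COBRA quantitation."""


def bisulfite_convert(dna: str, methylated: bool) -> str:
    """C -> T everywhere except CpG cytosines when `methylated` is True."""
    out = []
    for i, base in enumerate(dna):
        is_cpg_c = base == "C" and i + 1 < len(dna) and dna[i + 1] == "G"
        out.append("T" if base == "C" and not (methylated and is_cpg_c) else base)
    return "".join(out)


def site_retained(dna: str, site: str, methylated: bool) -> bool:
    """True if the recognition `site` is present after conversion (top strand)."""
    return site in bisulfite_convert(dna, methylated)


def cobra_methylation_level(digested: float, undigested: float) -> float:
    """Fraction methylated, from band intensities of cut and uncut product."""
    total = digested + undigested
    if total <= 0:
        raise ValueError("no PCR product signal")
    return digested / total


if __name__ == "__main__":
    for seq, site in (("AATCGAGG", "TCGA"), ("ACGCGT", "CGCG")):
        print(seq, site, site_retained(seq, site, True), site_retained(seq, site, False))
    print(cobra_methylation_level(25.0, 75.0))  # hypothetical band intensities
```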

Preferably, assays such as MethyLight™ (a fluorescence-based real-time PCR technique) (Eads et al., Cancer Res. 59:2302-2306, 1999), Ms-SNuPE (Methylation-sensitive Single Nucleotide Primer Extension) reactions (Gonzalgo & Jones, Nucleic Acids Res. 25:2529-2531, 1997), methylation-specific PCR ("MSP"; Herman et al., Proc. Natl. Acad. Sci. USA 93:9821-9826, 1996; U.S. Pat. No. 5,786,146), and methylated CpG island amplification ("MCA"; Toyota et al., Cancer Res. 59:2307-12, 1999) are used, alone or in combination with one another.

The MethyLight™ assay is a high-throughput quantitative methylation assay that utilizes fluorescence-based real-time PCR (TaqMan®) technology that requires no further manipulations after the PCR step (Eads et al., Cancer Res. 59:2302-2306, 1999). Briefly, the MethyLight™ process begins with a mixed sample of genomic DNA that is converted, in a sodium bisulfite reaction, to a mixed pool of methylation-dependent sequence differences according to standard procedures (the bisulfite process converts unmethylated cytosine residues to uracil). Fluorescence-based PCR is then performed either in an "unbiased" reaction (with primers that do not overlap known CpG methylation sites) or in a "biased" reaction (with PCR primers that overlap known CpG dinucleotides). Sequence discrimination can occur at the level of the amplification process, at the level of the fluorescence detection process, or both. The MethyLight™ assay may be used as a quantitative test for methylation patterns in the genomic DNA sample, wherein sequence discrimination occurs at the level of probe hybridization. In this quantitative version, the PCR reaction provides for unbiased amplification in the presence of a fluorescent probe that overlaps a particular putative methylation site. An unbiased control for the amount of input DNA is provided by a reaction in which neither the primers nor the probe overlie any CpG dinucleotides. Alternatively, a qualitative test for genomic methylation is achieved by probing the biased PCR pool with either control oligonucleotides that do not "cover" known methylation sites (a fluorescence-based version of the "MSP" technique) or oligonucleotides covering potential methylation sites.
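The quantitative version above normalizes the methylation-specific reaction against a CpG-free control reaction that reports the amount of input DNA. A minimal sketch of that normalization, assuming both reactions amplify with the same per-cycle efficiency (a ΔCt-style calculation that is not spelled out in the text), could look like this; the function name and threshold cycles are hypothetical. Published MethyLight workflows commonly also normalize to a fully methylated reference DNA, a step omitted here.

```python
def methylight_relative_methylation(ct_methylated: float, ct_control: float,
                                    efficiency: float = 2.0) -> float:
    """Methylated signal relative to the CpG-free input-DNA control reaction.

    Assumes both reactions share the same per-cycle amplification efficiency
    (2.0 = perfect doubling), so each cycle of delay corresponds to
    `efficiency`-fold less starting template.
    """
    return efficiency ** (ct_control - ct_methylated)


# Hypothetical threshold cycles: the methylation-specific reaction crosses
# threshold 3 cycles after the control, i.e. about 1/8 of the input signal.
print(methylight_relative_methylation(ct_methylated=28.0, ct_control=25.0))  # 0.125
```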

The MethyLight™ process can be used with a TaqMan® probe in the amplification process. For example, double-stranded genomic DNA is treated with sodium bisulfite and subjected to one of two sets of PCR reactions using TaqMan® probes, i.e., either biased primers and a TaqMan® probe, or unbiased primers and a TaqMan® probe. The TaqMan® probe is dual-labeled with fluorescent "reporter" and "quencher" molecules, and is designed to be specific for a region of relatively high GC content so that it melts out at a temperature about 10 °C higher in the PCR cycle than the forward or reverse primers. This allows the TaqMan® probe to remain fully hybridized during the PCR annealing/extension step. As the Taq polymerase enzymatically synthesizes a new strand during PCR, it will eventually reach the annealed TaqMan® probe. The 5'-to-3' exonuclease activity of Taq polymerase then displaces the TaqMan® probe by digesting it, releasing the fluorescent reporter molecule for quantitative detection of its now unquenched signal using a real-time fluorescent detection system.
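The design rule above (probe melting about 10 °C above the primers) can be checked coarsely in silico. The sketch below uses the basic GC-content approximation Tm = 64.9 + 41(G+C − 16.4)/N, which is an assumption swapped in for illustration; it ignores salt, oligo concentration and nearest-neighbour effects, and the sequences are hypothetical.

```python
def basic_tm(seq: str) -> float:
    """Rough melting temperature (deg C) from the basic GC-content formula.

    Ignores salt, oligo concentration and nearest-neighbour effects, so it is
    only suitable for a coarse comparison of probe vs. primer stability.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


def probe_margin_ok(probe: str, primers: list[str], margin: float = 10.0) -> bool:
    """True if the probe's estimated Tm exceeds every primer's by >= margin."""
    return all(basic_tm(probe) - basic_tm(p) >= margin for p in primers)


# Hypothetical oligos (not from this document)
PRIMER_F = "GATTAGCATCGATTAGCTAA"
PRIMER_R = "TTGAGTTAGGATTTGATTGG"
PROBE = "CGGCGCGATCGCCGTTAGCGGCCGAT"

print(probe_margin_ok(PROBE, [PRIMER_F, PRIMER_R]))  # True
```

A nearest-neighbour thermodynamic model would be the more realistic choice for actual probe design; the simple formula only shows how the margin criterion is applied.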

Typical reagents (e.g., as might be found in a typical MethyLight™-based kit) for MethyLight™ analysis may include, but are not limited to: PCR primers for specific gene (or methylation-altered DNA sequence or CpG island); TaqMan® probes; optimized PCR buffers and deoxynucleotides; and Taq polymerase.

The Ms-SNuPE technique is a quantitative method for assessing methylation differences at specific CpG sites based on bisulfite treatment of DNA, followed by single-nucleotide primer extension (Gonzalgo & Jones, Nucleic Acids Res. 25:2529-2531, 1997).

Briefly, genomic DNA is reacted with sodium bisulfite to convert unmethylated cytosine to uracil while leaving 5-methylcytosine unchanged. Amplification of the desired target sequence is then performed using PCR primers specific for bisulfite-converted DNA, and the resulting product is isolated and used as a template for methylation analysis at the CpG site(s) of interest. Small amounts of DNA can be analyzed (e.g., microdissected pathology sections), and the method avoids the use of restriction enzymes for determining the methylation status at CpG sites.
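The bisulfite chemistry described above (unmethylated C read as T after PCR, 5-methylcytosine retained) can be modelled in silico to obtain the converted templates against which bisulfite-specific, and also methylation-specific (MSP), primers are designed. This minimal sketch models only the top strand and the two extreme cases (all CpGs methylated or none), and assumes non-CpG cytosines are unmethylated; the function name and sequence are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_cpg: bool) -> str:
    """In silico sodium bisulfite conversion of the top strand.

    Every C outside a CpG dinucleotide is treated as unmethylated and becomes
    T (uracil, read as T after PCR). C's in CpG dinucleotides are kept when
    `methylated_cpg` is True and converted otherwise. Real samples are
    mixtures; this only models the two extreme templates.
    """
    s = seq.upper()
    out = []
    for i, base in enumerate(s):
        if base == "C":
            in_cpg = i + 1 < len(s) and s[i + 1] == "G"
            out.append("C" if (in_cpg and methylated_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)


target = "ACGTTCCGAC"  # hypothetical top-strand sequence
print(bisulfite_convert(target, methylated_cpg=True))   # ACGTTTCGAT
print(bisulfite_convert(target, methylated_cpg=False))  # ATGTTTTGAT
```

The two outputs differ only at CpG positions, which is why primers overlapping those positions can discriminate methylated from unmethylated template.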

Typical reagents (e.g., as might be found in a typical Ms-SNuPE-based kit) for Ms-SNuPE analysis may include, but are not limited to: PCR primers for specific gene (or methylation-altered DNA sequence or CpG island); optimized PCR buffers and deoxynucleotides; gel extraction kit; positive control primers; Ms-SNuPE primers for specific gene; reaction buffer (for the Ms-SNuPE reaction); and radioactive nucleotides. Additionally, bisulfite conversion reagents may include: DNA denaturation buffer; sulfonation buffer; DNA recovery reagents or kit (e.g., precipitation, ultrafiltration, affinity column); desulfonation buffer; and DNA recovery components.

MSP (methylation-specific PCR) allows for assessing the methylation status of virtually any group of CpG sites within a CpG island, independent of the use of methylation-sensitive restriction enzymes (Herman et al., Proc. Natl. Acad. Sci. USA 93:9821-9826, 1996; U.S. Pat. No. 5,786,146). Briefly, DNA is modified by sodium bisulfite, which converts all unmethylated, but not methylated, cytosines to uracil, and is subsequently amplified with primers specific for methylated versus unmethylated DNA. MSP requires only small quantities of DNA, is sensitive to 0.1% methylated alleles of a given CpG island locus, and can be performed on DNA extracted from paraffin-embedded samples. Typical reagents (e.g., as might be found in a typical MSP-based kit) for MSP analysis may include, but are not limited to: methylated and unmethylated PCR primers for specific gene (or methylation-altered DNA sequence or CpG island); optimized PCR buffers and deoxynucleotides; and specific probes.

The MCA technique is a method that can be used to screen for altered methylation patterns in genomic DNA, and to isolate specific sequences associated with these changes (Toyota et al., Cancer Res. 59:2307-12, 1999).
Briefly, restriction enzymes with different sensitivities to cytosine methylation in their recognition sites are used to digest genomic DNAs from primary tumors, cell lines, and normal tissues prior to arbitrarily primed PCR amplification. Fragments that show differential methylation are cloned and sequenced after resolving the PCR products on high-resolution polyacrylamide gels. The cloned fragments are then used as probes for Southern analysis to confirm differential methylation of these regions. Typical reagents (e.g., as might be found in a typical MCA-based kit) for MCA analysis may include, but are not limited to: PCR primers for arbitrary priming of genomic DNA; PCR buffers and nucleotides; restriction enzymes and appropriate buffers; gene-hybridization oligos or probes; and control hybridization oligos or probes.

Another method for analyzing methylation sites is a primer extension assay, which includes an optimized PCR amplification reaction that produces amplified targets for subsequent primer extension genotyping analysis using mass spectrometry. The assay can also be performed in multiplex. This method (particularly as it relates to genotyping single nucleotide polymorphisms) is described in detail in PCT publication WO05012578A1 and US publication US20050079521A1. For methylation analysis, the assay can be adapted to detect bisulfite-introduced, methylation-dependent C-to-T sequence changes. These methods are particularly useful for performing multiplexed amplification reactions and multiplexed primer extension reactions (e.g., multiplexed homogeneous primer mass extension (hME) assays) in a single well to further increase throughput and reduce the cost per reaction for primer extension reactions.

Four additional methods for DNA methylation analysis include restriction landmark genomic scanning (RLGS; Costello et al., 2000), methylation-sensitive representational difference analysis (MS-RDA), methylation-specific AP-PCR (MS-AP-PCR) and methyl-CpG binding domain column/segregation of partly melted molecules (MBD/SPM).

Additional methylation analysis methods that may be used in conjunction with the present invention are described in the following papers: Laird, P. W. Nature Reviews Cancer 3, 253-266 (2003); Biotechniques; Uhlmann, K. et al. Electrophoresis 23:4072-4079 (2002) - PyroMeth; Colella et al. Biotechniques. 2003 Jul;35(1):146-50; Dupont JM, Tost J, Jammes H, and Gut IG. Anal Biochem, Oct 2004; 333(1):119-27; Tooke N and Pettersson M. IVDT. Nov 2004; 41; and the following published patents and patent applications: WO03080863A1, WO03057909A2, US2005/0153347, US20050009059A1, US20050069879A1, US20050064428A1, US20050064406A1, WO02086163C1, US20050019762A1, US6884586, WO04013284A2, US20050153316A1 and WO05040399A2.

Disease-Related Discovery

In one embodiment, the presence or absence of one or more methylated or unmethylated nucleotides may be identified as indicative of a particular disease outcome associated with methylated or unmethylated DNA. In another embodiment, the presence or absence of one or more methylated or unmethylated nucleotides may be identified as indicative of a normal, healthy or disease-free state. In another embodiment, an abnormal ratio of methylated nucleic acid target gene molecules relative to unmethylated nucleic acid target gene molecules in a sample may be indicative of a particular disease outcome associated with methylated or unmethylated DNA. For example, a relatively high or relatively low number of methylated nucleic acid target gene molecules, or of methylated nucleotide loci, compared to the relative amount in a normal individual may be indicative of the presence or absence of a disease state (e.g., cancer) associated with methylated or unmethylated DNA.

Disease-Related Analysis

Increased or decreased levels of methylation have been associated with a variety of diseases.

Methylation or lack of methylation at defined positions can be associated with a disease or a disease outcome. The methods disclosed herein can be used in methods of determining the propensity of a subject to disease, diagnosing a disease, prognosing a disease and determining a treatment regimen for a subject having a disease. Diseases associated with a modification of the methylation of one or more nucleotides include, for example, leukemia (Aoki E. et al., "Methylation status of the p15INK4B gene in hematopoietic progenitors and peripheral blood cells in myelodysplastic syndromes," Leukemia 14(4):586-593 (2000); Nosaka, K. et al., "Increasing methylation of the CDKN2A gene is associated with the progression of adult T-cell leukemia," Cancer Res. 60(4):1043-1048 (2000); Asimakopoulos FA et al., "ABL1 methylation is a distinct molecular event associated with clonal evolution of chronic myeloid leukemia," Blood 94(7):2452-2460 (1999); Fajkusova L. et al., "Detailed Mapping of Methylcytosine Positions at the CpG Island Surrounding the Pa Promoter at the bcr-abl Locus in CML Patients and in Two Cell Lines, K562 and BV173," Blood Cells Mol. Dis. 26(3):193-204 (2000); Litz C. E. et al., "Methylation status of the major breakpoint cluster region in Philadelphia chromosome negative leukemias," Leukemia 6(1):35-41 (1992)).

The methylation state of a variety of nucleotide loci and/or nucleic acid regions is known to be correlated with a disease, disease outcome, and success of treatment of a disease, and also may be used to distinguish disease types that are difficult to distinguish by symptoms, histologic samples, or blood or serum samples. For example, the CpG island methylator phenotype (CIMP) is present in some types of ovarian carcinomas, but not in others (Strathdee et al., Am. J. Pathol. 158:1121-1127 (2001)). In another example, methylation may be used to distinguish between a carcinoid tumor and a pancreatic endocrine tumor, which may have different expected outcomes and disease treatment regimens (Chan et al., Oncogene 22:924-934 (2003)). In another example, H. pylori-dependent gastric mucosa-associated lymphoid tissue (MALT) lymphomas are characterized by several methylated nucleic acid regions, while those nucleic acid regions are not methylated in H. pylori-independent MALT lymphomas (Kaneko et al., Gut 52:641-646 (2003)). Similar relationships with disease, disease outcome and disease treatment have been correlated with hypomethylated or unmethylated nucleic acid regions or unmethylated nucleotide loci.

Methods related to the disease state of a subject may be performed by collecting a sample from a subject, treating the sample with a reagent that modifies a nucleic acid target gene molecule sequence as a function of the methylation state of the nucleic acid target gene molecule, subjecting the sample to methylation specific amplification, then detecting one or more fragments that are associated with a disease outcome (measured as survivability). In another embodiment, the fragments are detected by measuring the mass of the nucleic acid target gene molecule or nucleic acid target gene molecule fragments. Detection of a nucleic acid target gene molecule or nucleic acid target gene molecule fragment can identify the methylation state of a nucleic acid target gene molecule or the methylation state of one or more nucleotide loci of a nucleic acid target gene molecule. Identification of the methylation state of a nucleic acid target gene molecule or the methylation state of one or more nucleotide loci of a nucleic acid target gene molecule can indicate the propensity of the subject toward one or more diseases, the disease state of a subject, likelihood of survival or an appropriate or inappropriate course of disease treatment or management for a subject.

Applications of Prognostic and Diagnostic Results to Pharmacogenomic Methods

Pharmacogenomics is a discipline that involves tailoring a treatment for a subject according to the subject's genetic profile (e.g., genotype, methylation state or characteristic methylation state). For example, based upon the outcome of a prognostic test described herein, a clinician or physician may target pertinent information and preventative or therapeutic treatments to a subject who would benefit from the information or treatment and avoid directing such information and treatments to a subject who would not benefit (e.g., the treatment has no therapeutic effect, the subject experiences adverse side effects, and/or the treatment poses unnecessary risks given the prognosis).

The following is an example of a pharmacogenomic embodiment. A particular treatment regimen can exert a differential effect depending upon the subject's characteristic methylation state. Where a candidate therapeutic response is correlated with a given methylation state, a therapeutic typically would not be administered to a subject determined to have a methylation state that correlates with a poor response, and conversely may be administered to a subject determined to have a methylation state that correlates with a positive response. In another example, where a candidate therapeutic is significantly toxic (e.g., a chemotherapeutic agent) when administered to subjects, a subject with a good prognosis may be more willing to endure the adverse effects and risks associated with the toxic therapeutic than a patient with a poor prognosis who is unlikely to survive regardless of the therapeutic administered. The methods described herein are applicable to pharmacogenomic methods for preventing, alleviating or treating cancer. For example, a nucleic acid sample from an individual may be subjected to a prognostic test described herein. Where a methylation state or characteristic methylation state that is predictive of cancer outcome is identified in a subject, information for preventing or treating cancer and/or one or more cancer treatment regimens then may be prescribed to that subject.

In certain embodiments, a treatment or preventative regimen is specifically prescribed and/or administered to individuals who will most benefit from it based upon their likelihood of survival assessed by the methods described herein. Thus, provided are methods for determining a prognosis for cancer patients and then prescribing a therapeutic or preventative regimen to individuals according to their prognosis.

Pharmacogenomics methods also may be used to analyze and predict a response to a cancer treatment or a drug. For example, if pharmacogenomics analysis indicates a likelihood that an individual will respond positively to a cancer treatment with a particular drug or combination of drugs, the drug(s) may be administered to the individual. Conversely, if the analysis indicates that an individual is likely to respond negatively to treatment with a particular drug or combination of drugs, an alternative course of treatment may be prescribed. A negative response may be defined as either the absence of an efficacious response or the presence of toxic side effects. The response to a therapeutic treatment can be predicted in a background study in which the methylation state of subjects in any of the following populations is determined: a population that responds favorably to a treatment regimen, a population that does not respond significantly to a treatment regimen, and a population that responds adversely to a treatment regimen (e.g., exhibits one or more side effects). These populations are provided as examples, and other populations and subpopulations may be analyzed. The tests described herein also are applicable to clinical drug trials. A subject's diagnosis or prognosis may be determined using the methods described herein. Thereafter, subjects with a poor prognosis or a diagnosis of an aggressive form of cancer may choose to participate in clinical trials that may increase their probability of survival but have unknown or high-risk side effects, whereas subjects with a good prognosis or a diagnosis of a less aggressive form of cancer may choose to undergo treatments that have higher success rates but expose the subject to adverse side effects.
Alternatively, subjects with a good prognosis or a diagnosis of a less aggressive form of cancer might choose to enroll in a clinical trial for a treatment that decreases the risk of relapse, or in a clinical trial with known or low-risk side effects.

Also provided herein is a method of partnering between a diagnostic/prognostic testing provider and a provider of a consumable product, which comprises: (a) the diagnostic/prognostic testing provider determines a subject's prognosis; (b) the diagnostic/prognostic testing provider forwards information to the subject about a particular product which may be obtained and consumed or applied by the subject given their prognosis; and (c) the provider of a consumable product forwards to the diagnostic test provider a fee every time the diagnostic/prognostic test provider forwards information to the subject as set forth in step (b) above.

Results from prognostic tests may be combined with other test results to diagnose cancer. For example, prognostic results may be gathered, a patient sample may be ordered based on a determined predisposition to cancer, the patient sample may be analyzed, and the results of the analysis may be utilized to diagnose cancer. Cancer diagnostic methods also can be developed from studies used to generate prognostic/diagnostic methods in which populations are stratified into subpopulations having different progressions of cancer. In another embodiment, prognostic results may be gathered, a patient's risk factors for developing cancer (e.g., age, race, family history) analyzed, and a patient sample ordered based on a determined predisposition to cancer. In an alternative embodiment, the results from predisposition analyses described herein may be combined with other test results indicative of cancer, which were gathered previously, concurrently, or subsequently with respect to the predisposition testing. In these embodiments, the combination of the prognostic test results with other test results can be probative of cancer, and the combination can be utilized as a cancer diagnostic. The results of any test indicative of cancer known in the art may be combined with the methods described herein.

Examples of such tests are provided herein. For breast cancer, they include mammography (e.g., a more frequent and/or earlier mammography regimen may be prescribed); breast biopsy and optionally a biopsy from another tissue; breast ultrasound and optionally an ultrasound analysis of another tissue; breast magnetic resonance imaging (MRI) and optionally an MRI analysis of another tissue; electrical impedance (T-scan) analysis of breast and optionally of another tissue; ductal lavage; nuclear medicine analysis (e.g., scintimammography); BRCA1 and/or BRCA2 sequence analysis results; and thermal imaging of the breast and optionally of another tissue. Testing may be performed on tissue other than breast to diagnose the occurrence of metastasis (e.g., testing of the lymph node).

Risk of cancer sometimes is expressed as a probability, such as an odds ratio, percentage, or risk factor. The risk is based upon the methylation status of a target nucleic acid as described herein, and also may be based in part upon phenotypic traits of the individual being tested. Methods for calculating predispositions based upon patient data are well known (see, e.g., Agresti, Categorical Data Analysis, 2nd Ed. 2002. Wiley). Allelotyping and genotyping analyses may be carried out in populations other than those exemplified herein to enhance the predictive power of the prognostic method. These further analyses are executed in view of the exemplified procedures described herein, and may be based upon the same polymorphic variations (e.g., methylation sites) or additional polymorphic variations. Risk determinations for cancer are useful in a variety of applications. In one embodiment, cancer risk determinations are used by clinicians to direct appropriate detection, preventative and treatment procedures to the subjects who most require them.
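As one concrete instance of expressing risk as an odds ratio, the sketch below computes the odds ratio of a methylation mark between cases and controls from a 2x2 table, with a Wald (Woolf) confidence interval on the log scale of the kind covered in the Agresti text cited above. The function name and counts are hypothetical and are not data from this study; all cells are assumed to be non-zero.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: cases with the methylation mark,    b: cases without it,
    c: controls with the methylation mark, d: controls without it.
    Assumes all cells are non-zero (no continuity correction applied).
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical counts, not data from this study
or_, lo, hi = odds_ratio_ci(40, 10, 20, 30)
print(f"OR = {or_:.1f} (95% CI {lo:.2f}-{hi:.2f})")
```

The interval is symmetric on the log scale, so its bounds multiply to the square of the odds ratio.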

Combinations and Kits

In another embodiment, there are provided prognostic or diagnostic systems, typically in combination or kit form, containing a reagent that modifies one or more nucleotides of the nucleic acid target gene molecule as a function of the methylation state of the nucleic acid target gene molecule, such as bisulfite; one or more methylation-specific primers for specifically hybridizing to a reagent-treated nucleic acid target gene molecule, such as one or more methylation-specific PCR primers; and one or more compounds for fragmenting amplified nucleic acid target gene molecules, such as RNases, including RNase A or RNase T1. A kit also may include the appropriate buffers and solutions for performing the methylation identification methods described herein. For example, a kit can include a glass vial used to contain milligram quantities of a primer or enzyme. A kit also may include substrates, supports or containers for performing the methylation identification methods, including vials or tubes, or a mass spectrometry substrate such as a Sequenom SpectroCHIP substrate.

EXAMPLES

The following Examples describe a novel technique that uses base-specific cleavage of amplification products and Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry (MALDI-TOF MS) to perform large-scale quantitative DNA methylation analysis across a set of candidate genes (n=147). This method led to the identification of clinically relevant AML subclasses, while highlighting methylated genes of potential pathogenic relevance. Also described is a methylation-based outcome predictor, derived from AML-associated promoter methylation patterns, that provides a basis for improved outcome prediction in AML.

Example 1 Bisulfite Treatment of a Nucleic Acid Target Gene Region

Bisulfite treatment of genomic DNA was performed with a commercial kit from Zymo Research Corporation (Orange, CA) that combines bisulfite conversion and DNA clean-up. The kit follows a protocol from Paulin, R. et al., Nucleic Acids Res. 26:5009-5010, 1998. Briefly, in this protocol 2 μg of genomic DNA is digested with a restriction endonuclease (EcoRI), then denatured by the addition of 3 M sodium hydroxide and incubated for 15 min at 37°C. A 6.24 M urea/2 M sodium metabisulfite (4 M bisulfite) solution is prepared and added, together with 10 mM hydroquinone, to the denatured DNA. The corresponding final concentrations of urea, bisulfite and hydroquinone are 5.36 M, 3.44 M and 0.5 mM, respectively. The reaction is performed in a 0.5 ml tube overlaid with mineral oil. This reaction mix is cycled 20 times between 55°C for 15 min and 95°C for 30 s in a PCR machine (MJ Tetrad). DNA purification was done using the commercially available GENECLEAN kit from Q-biogene.
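The stated stock and final concentrations can be cross-checked with simple dilution arithmetic. The sketch below assumes concentrations combine by volume dilution only (the actual pipetting volumes are not stated in the text); under that assumption, urea and bisulfite imply the same volume fraction (about 0.86), as expected for two components of one solution, and the remainder of the reaction volume is the denatured DNA. The function name is hypothetical.

```python
def implied_volume_fraction(stock: float, final: float) -> float:
    """Fraction of the final reaction volume contributed by a stock solution,
    assuming simple dilution: final = stock * v_stock / v_total."""
    return final / stock


# Stock and final concentrations as stated in Example 1
urea = implied_volume_fraction(6.24, 5.36)         # M, urea in bisulfite solution
bisulfite = implied_volume_fraction(4.0, 3.44)     # M, bisulfite in the same solution
hydroquinone = implied_volume_fraction(10.0, 0.5)  # mM
remaining = 1.0 - urea - hydroquinone              # denatured DNA in NaOH

print(f"{urea:.3f} {bisulfite:.3f} {hydroquinone:.3f} {remaining:.3f}")
# 0.859 0.860 0.050 0.091
```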

All bisulfite-based methylation analysis methods suffer from considerable measurement variability introduced by the chemical treatment of genomic DNA (e.g., bisulfite-induced DNA degradation) (Ehrich, M., Zoll, S., Sur, S. & van den Boom, D. Nucleic Acids Res (2007)). To determine whether this variability affects the model system used in this study, duplicate measurements of four control DNAs were performed in 96 amplicons, and sufficient data stability was observed (R-squared = 0.96, Figure 6A). The effect of primer design on the quantitative measurements was also analyzed by designing two different but overlapping amplification regions for the ERBB2 gene. The quantitative values from both reactions were almost identical and showed a high correlation (R-squared = 0.96).
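The reproducibility figures above are R-squared values between paired measurements. A minimal sketch of that statistic, assuming the reported value is the coefficient of determination of a simple linear fit (which equals the squared Pearson correlation), is shown below; the function name and duplicate values are hypothetical.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between paired duplicate measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical duplicate methylation ratios for the same CpG units
run1 = [0.05, 0.20, 0.48, 0.75, 0.92]
run2 = [0.07, 0.18, 0.50, 0.71, 0.95]
print(round(r_squared(run1, run2), 3))
```

Note that R-squared measures linear association only; a constant offset between duplicates would not lower it, so agreement is often also inspected on a scatter plot such as Figure 6A.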

Example 2 PCR and in vitro Transcription of a Nucleic Acid Target Gene Region

The IGF2/H19 gene region (human genome chromosome 11: 1,983,678-1,984,097) serves as an exemplary gene to demonstrate the effectiveness and feasibility of the methylation analysis methods disclosed herein. The IGF2/H19 region provides an ideal test case because of its hemi-methylated status. In a hemi-methylated region, the paternal allele is usually silenced by methylation, which results in an ideal 50/50 ratio. The presence of an expected 50/50 ratio validates the approach. As the following Examples demonstrate, this is in fact the case, and the methods used to analyze IGF2/H19 were applied to the AML target genes disclosed herein.

IGF2/H19 was PCR-amplified from bisulfite-treated human genomic DNA using primers that incorporate the T7 promoter sequence [5'-CAG TAA TAC GAC TCA CTA TAG GGA GA-3']. Two sets of primers were designed to incorporate the T7 promoter sequence either into the forward (5'-CAG TAA TAC GAC TCA CTA TAG GGA GAA GGC TGT TAG TTT TTA TTT TAT TTT TAA T-3'; 5'-AGG AAG AGA GAA CCA CTA TCT CCC CTC AAA AAA-3') or into the reverse (5'-AGG AAG AGA GGT TAG TTT TTA TTT TAT TTT TAA T-3'; 5'-CAG TAA TAC GAC TCA CTA TAG GGA GAA GGC TAA CCA CTA TCT CCC CTC AAA AAA-3') strand. Alternatively, the PCR product was cloned into a pGEM-T vector system (Promega, Madison, WI) and re-amplified from the cloned DNA. The PCR reactions were carried out in a total volume of 5 μl using 1 pmol of each primer, 40 μM dNTP, 0.1 U HotStarTaq DNA polymerase (Qiagen, Valencia, CA), 1.5 mM MgCl₂ and the buffer supplied with the enzyme (final concentration 1×). The reaction mix was pre-activated for 15 min at 95°C. The reactions were amplified in 45 cycles of 95°C for 20 s, 62°C for 30 s and 72°C for 30 s, followed by 72°C for 3 min. Unincorporated dNTPs were dephosphorylated by adding 1.7 μl H₂O and 0.3 U Shrimp Alkaline Phosphatase (SAP). The reaction was incubated at 37°C for 20 min, and SAP was then heat-inactivated for 10 minutes at 85°C.
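The thermal steps of Example 2 can be written down as data, which makes the total hold time easy to check when planning a run. This is a minimal sketch: ramp times between temperatures are ignored, and the representation and function name are hypothetical.

```python
def total_hold_seconds(steps: list[tuple[list[tuple[float, int]], int]]) -> int:
    """Sum hold times of a thermal program.

    Each step is (holds, repeats), where holds is a list of
    (temperature_C, seconds). Ramp times are ignored.
    """
    return sum(repeats * sum(sec for _, sec in holds) for holds, repeats in steps)


program = [
    ([(95, 15 * 60)], 1),                  # polymerase pre-activation
    ([(95, 20), (62, 30), (72, 30)], 45),  # denature / anneal / extend
    ([(72, 3 * 60)], 1),                   # final extension
    ([(37, 20 * 60)], 1),                  # SAP dephosphorylation
    ([(85, 10 * 60)], 1),                  # SAP heat inactivation
]
print(total_hold_seconds(program) / 60)  # 108.0
```

Of the 108 minutes of hold time, 60 minutes come from the 45 amplification cycles (45 × 80 s).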

Typically, two microliters of the PCR reaction were directly used as template in a 4 μl transcription reaction. Twenty units of T7 R&DNA polymerase (Epicentre, Madison, WI) were used to incorporate either dCTP or dTTP in the transcripts. Ribonucleotides were used at 1 mM and the dNTP substrate at 2.5 mM; other components in the reaction were as recommended by the supplier. Following the in vitro transcription, RNase A (SEQUENOM, San Diego) was added to cleave the in vitro transcript. The mixture was then further diluted with H₂O to a final volume of 27 μl. Conditioning of the phosphate backbone prior to MALDI-TOF MS was achieved by the addition of 6 mg CLEAN Resin (SEQUENOM Inc., San Diego, CA).

Example 3 Mass Spectral Measurements of Transcribed Nucleic Acid Target Gene Region

Conditioning of the phosphate backbone was achieved by the addition of 6 mg CLEAN Resin (Sequenom Inc., San Diego, CA) to the transcription sample. A 15 nl aliquot of the cleavage reaction was robotically dispensed onto a silicon chip preloaded with matrix (SpectroCHIP; Sequenom Inc., San Diego, CA). Mass spectra were collected using a MassARRAY mass spectrometer (Bruker-Sequenom). Spectra were analyzed using proprietary peak picking and spectra interpretation tools (Little et al., Nat Med 3:1413-6 (1997)).

Example 4

Identification of Methylation Sites in IGF2/H19

The difference in the mass spectra resulting from a C-specific cleavage reaction of the forward transcript may be seen in Figure 1. The mass spectrum derived from the methylated template shows signals corresponding to the expected methylation sites. In this spectrum each mass signal represents at least two CpG sites (cleavage at the beginning and at the end of the fragment), and each methylated CpG site is therefore represented by two cleavage products. The non-methylated template creates a mass spectrum that is devoid of any sequence/methylation-associated signals. Figure 1 displays mass signals generated by cytosine-specific cleavage of the forward transcript of the IGF2/H19 region (upper spectrum: methylated template; lower spectrum: non-methylated template). Methylation of the target sequence results in the generation of rCTP-containing transcripts; every methylated CpG is represented in the transcript by a cleavage site. Each of the cleavage products is labeled with a number, which indicates the CpG position in the template. These numbers can be cross-referenced with the cleavage products listed in Tables 4 and 5. The non-methylated target sequence does not contain cytosine and therefore does not contain cleavage sites. Mass signals are labeled with letters, and the corresponding explanations are listed in Figure 1(B). A full list of expected cleavage products illustrates the predicted difference between methylated and non-methylated template. Predicted mass signals 12 and 13 are not found in the experimental spectrum because the corresponding CpGs 23 and 24 are not methylated, which results in concatenation of fragments 5167 and 12616 into a much larger fragment that cannot be detected. The tables below show the cleavage products of mass signals generated by cytosine-specific cleavage of the forward transcript of IGF2/H19 for the methylated (Table 4) and non-methylated (Table 5) transcript sequences.
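The C-specific cleavage logic described above can be sketched in a few lines: after bisulfite treatment, only methylated CpG positions still carry C, so cutting the transcript 3' of every C yields fragments bounded by methylated CpGs, a fully non-methylated transcript stays uncut, and an unmethylated CpG between two methylated ones causes neighbouring fragments to merge. The sketch models cleavage positions only, not fragment masses, and the function name and transcript sequences are hypothetical.

```python
def cleave_after(transcript: str, base: str = "C") -> list[str]:
    """Split a transcript 3' of every occurrence of `base`.

    Models a base-specific RNase A cleavage in which only `base` is a
    ribonucleotide cleavage site (the other pyrimidine having been
    incorporated as its deoxy form). Fragments keep the cleaved base at
    their 3' end; the last fragment is whatever follows the final site.
    """
    fragments, start = [], 0
    for i, b in enumerate(transcript):
        if b == base:
            fragments.append(transcript[start:i + 1])
            start = i + 1
    if start < len(transcript):
        fragments.append(transcript[start:])
    return fragments


# Hypothetical transcripts of a bisulfite-converted template with two CpGs
print(cleave_after("ACGUUUCGAU"))  # both CpGs methylated: ['AC', 'GUUUC', 'GAU']
print(cleave_after("ACGUUUUGAU"))  # second CpG unmethylated: ['AC', 'GUUUUGAU']
print(cleave_after("AUGUUUUGAU"))  # non-methylated: ['AUGUUUUGAU']
```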

TABLE 4

TABLE 5

Cleavage product characterization legend:

MAIN = regular cleavage product

OOMR = out of mass range (molecular mass either too low or too high to be detected within the automated data acquisition)

DBLC = double charged molecular ion species (at half mass of parent molecular ion)

ACYC = Abortive cycling (incomplete transcription products generated during the first 10 nt of transcription)

All masses below 1300 Da cannot be detected reliably in the chosen mass window. The mass signal labeled A is the doubly charged molecular ion of signal E. Mass signals labeled B and D represent so-called abortive cycling products. Abortive cycling is the premature termination of transcription while the polymerase is still in the initiation complex and has not yet formed the more stable elongation complex. During that phase, transcription may occasionally terminate without generating a full-length transcript. Mass signals labeled C and E are expected main signals generated by cleavage of the transcription product.

The reactions described above provide mass signal patterns that are well suited to identifying methylation in mixtures containing as little as 5% methylated DNA, without selective PCR amplification. Figure 2 is an overlay of mass signal patterns generated by cytosine-specific cleavage of the forward transcript of the IGF2/H19 region. In the depicted case, the template used for PCR amplification consisted of a mixture of methylated and non-methylated DNA. The mass spectra reveal increasing signal intensity of cleavage products with increasing amounts of methylated template DNA. Methylation-specific mass signals can be detected in mixtures containing as little as 5% methylated DNA.

Example 5 Statistical Methods

Base-specific cleavage reactions also can be used to determine methylation ratios. For example, methylation-induced C/T changes on the forward strand are represented as G/A changes on the complementary strand. These changes lead to a mass shift of 16 Da (G/A mass shift), or multiples thereof when multiple CpGs are enclosed in one cleavage product. In reactions where methylation results in a mass shift of nucleic acid target gene molecule fragments, one fragment represents the methylated template and a second fragment represents the non-methylated template. The intensities of the measured masses of these fragments can be compared to determine the ratio of methylated to non-methylated nucleic acid target gene molecules. Because the base composition of the measured fragments differs by only one or a few nucleotides, their desorption and ionization behavior during MALDI-TOF measurement is comparable. Methods for estimating the intensity of mass measurements, such as "area under the peak" and "signal to noise", can yield similar results. Depending on the sequence of the nucleic acid target gene molecule, multiple signal pairs can be used to determine the ratio between signal intensities. This information can be used to assess the degree of methylation for each CpG site independently or, if all CpG sites are methylated to approximately the same degree, to average the methylation content over the complete target region. A direct correlation between signal intensity ratios and the ratio of the input DNAs can be determined over a range of 10% to 90% methylated template. If the proportion of methylated template is below 10% or above 90%, the signals that represent the lower amount of template can still be detected, but the quantitation can be subject to higher error.

All statistical analyses were performed using the R statistical environment. Distances from gene start sites were calculated using the 'RMySQL' package and the SQL database version of the UCSC genome browser (genome-mysql.cse.ucsc.edu). Two-dimensional clustering was performed using the 'heatmap.2' function in the 'gregmisc' package. Classical multidimensional scaling was performed using the 'cmdscale' function, and visualization was done with the 'scatterplot3d' function in the package of the same name. Tests for statistical significance (t-test, Wilcoxon test and Fisher's exact test) were performed using the standard functions built into R's 'stats' package. For sequence motif detection, a permutation-based method was used. N sequences were randomly sampled from the pool of all analyzed sequences, where N equals the number of sequences in the low or high methylation group, respectively. It was then determined how often every possible 6mer (n = 4096) is present in the sampled subset. One thousand permutations were performed for each analysis. A sequence motif was identified as overrepresented if it occurred more often in the analyzed group of sequences than in any of the 1000 random draws.
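The permutation scheme for motif detection can be sketched as follows. This is a minimal Python illustration, not the R code actually used; it assumes that 6mers are counted as overlapping occurrences on the given strand only, and the function names `kmer_counts` and `overrepresented_kmers` are hypothetical.

```python
import itertools
import random
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count overlapping k-mer occurrences across all sequences (given strand only)."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented_kmers(group, pool, k=6, n_perm=1000, seed=0):
    """Return k-mers counted more often in `group` than in every random draw.

    Each draw samples len(group) sequences without replacement from `pool`
    (a list), mirroring the permutation scheme described in the text.
    """
    rng = random.Random(seed)
    observed = kmer_counts(group, k)
    max_random = Counter()  # highest count seen for each k-mer across all draws
    for _ in range(n_perm):
        draw = kmer_counts(rng.sample(pool, len(group)), k)
        for kmer, c in draw.items():
            if c > max_random[kmer]:
                max_random[kmer] = c
    all_kmers = ("".join(p) for p in itertools.product("ACGT", repeat=k))  # 4**k candidates
    return sorted(km for km in all_kmers if observed[km] > max_random[km])
```

In the described analysis, `pool` would hold all analyzed amplicon sequences and `group` the low- or high-methylation set; a motif is reported only if its count beats the maximum over all random draws.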

Graphical representation of the gene tissue relationships was performed using the 'dot' algorithm implemented in Graphviz.

Example 6 Methylation Ratio Analysis

Determination of methylation ratios is enabled by a different base-specific cleavage reaction.

Methylation-induced C/T changes on the forward strand are represented as G/A changes on the reverse strand. Since cleavage schemes were restricted to C- and T-specific cleavage, methylation events led to a mass shift of 16 Da (G/A mass shift), or multiples thereof when multiple CpGs are enclosed in one cleavage product. The signal pair shown in Figure 3 demonstrates this. Figure 3 is an overlay of mass spectra generated by uracil-specific cleavage of the reverse transcript of the IGF2/H19 region. Cleavage products derived from the methylated template contain rGTP at every position where the cytosine of the forward strand was methylated. In contrast, the bisulfite conversion of non-methylated cytosine to uracil results in incorporation of rATP on the reverse strand. This 16 Da difference between rGTP and rATP, or multiples thereof when several CpGs are embedded in one cleavage product, can be detected unambiguously. The area under the curve of mass signals specific for the methylated and non-methylated templates can be used to determine the ratio between methylated and non-methylated DNA used for amplification.

The cleavage product derived from the non-methylated template (CACAACCACT) was detected at 3132 Da, while its equivalent derived from the methylated template (CGCAACCACT) was found at 3148 Da.
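The pairing of methylated and non-methylated signals can be illustrated with a small Python sketch. It assumes approximate average residue masses of 345.21 Da for rG and 329.21 Da for rA, whose difference gives the roughly 16 Da shift per methylated CpG, and a hypothetical peak-matching tolerance of 2 Da; the function names are illustrative only.

```python
# Approximate average residue masses (Da) of rG and rA within an RNA chain;
# only their difference matters for the methylation-dependent shift.
RESIDUE_G = 345.21
RESIDUE_A = 329.21
GA_SHIFT = RESIDUE_G - RESIDUE_A  # about 16 Da per G/A exchange


def expected_masses(unmethylated_mass, n_cpg):
    """Expected mass for 0..n_cpg methylated CpGs within one cleavage product."""
    return [round(unmethylated_mass + k * GA_SHIFT, 2) for k in range(n_cpg + 1)]


def assign_peaks(observed, unmethylated_mass, n_cpg, tol=2.0):
    """Map each number of methylated CpGs to the closest observed peak within tol, or None."""
    result = {}
    for k, target in enumerate(expected_masses(unmethylated_mass, n_cpg)):
        hits = [m for m in observed if abs(m - target) <= tol]
        result[k] = min(hits, key=lambda m: abs(m - target)) if hits else None
    return result
```

With the masses reported above for a cleavage product containing one CpG, `expected_masses(3132.0, 1)` returns `[3132.0, 3148.0]`.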

Reactions where one signal represents the methylated template and a second signal represents the non-methylated template can be used to determine the ratio of methylated vs. non-methylated template by comparing their signal intensities. The nucleotide composition of the measured fragments differs only by a single nucleotide, which ensures equivalent desorption and ionization behavior during MALDI-TOF measurement. Depending on the reference sequence of the target region, multiple signal pairs are available for determining the ratio between signal intensities. This information can be used to assess the degree of methylation for each CpG site independently or, if all CpG sites are methylated approximately to the same degree, to average the methylation content over the complete target region.

A direct correlation can be seen between signal intensity ratios and the ratio of the input DNAs. This correlation was linear from 10% to 90% methylated template. The average standard deviation across the investigated concentrations was approximately 3%, with higher standard deviations towards both ends of the scale. If the proportion of methylated template is below 10% or above 90%, the signals representing the lower amount of template can still be detected, but their intensity no longer correlates exactly with the actual ratio.
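The ratio estimate described above can be sketched in Python. This assumes that each signal pair yields a methylated fraction M / (M + U) from the areas under the methylated (M) and non-methylated (U) peaks, that pairs are averaged only when all CpGs are methylated to a similar degree, and that the 10% to 90% linear range is used as a reliability flag; the function names are hypothetical.

```python
def methylation_fraction(area_methylated, area_unmethylated):
    """Fraction of methylated template estimated from one signal pair."""
    total = area_methylated + area_unmethylated
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return area_methylated / total


def region_methylation(pairs, low=0.10, high=0.90):
    """Average per-pair fractions over a region and flag the linear range.

    pairs: list of (methylated_area, unmethylated_area) tuples, one per signal pair.
    Returns (mean_fraction, within_linear_range).
    """
    if not pairs:
        raise ValueError("at least one signal pair is required")
    fractions = [methylation_fraction(m, u) for m, u in pairs]
    mean = sum(fractions) / len(fractions)
    return mean, low <= mean <= high
```

For a hemi-methylated region, equal peak areas give a fraction of 0.5, i.e. a methylated-to-non-methylated ratio of about 1.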

Example 7

Methylation Pattern Analysis of IGF2/H19

The ability of base-specific cleavage to determine the methylation status of every CpG within a given target region was evaluated. As outlined above, the C-specific forward reaction incorporates a cleavage nucleotide for each methylated CpG within the amplicon. Each resulting cleavage product therefore reflects two cleavage nucleotides (except for the first and last fragments), in this case two methylated Cs. Given the current limitations of MALDI-TOF instrumentation, a practical mass window ranges from about 1000 Da to 10000 Da, which corresponds to cleavage products of roughly 4 to 30 nucleotides. When the distance between two methylated cytosines is shorter or longer than this range, the resulting cleavage product may be too light or too heavy to be detected under standard conditions. A single reaction still allows determination of the methylation status of approximately 75% (depending on the reference sequence) of all CpG sites within the amplified nucleic acid molecule. To obtain information about all CpG sites, a set of four reactions was performed: C- and T-specific cleavage of the forward and reverse transcription products. This combination enables base-specific cleavage after each nucleotide (C-specific cleavage on the reverse strand equals G-specific cleavage on the forward strand; T-specific cleavage on the reverse strand equals A-specific cleavage on the forward strand). The combined information from these four cleavage reactions allows compilation of the exact methylation pattern. For the IGF2/H19 region described here, two reactions were sufficient to obtain the methylation status of each CpG site. However, using four reactions provides the advantage of information redundancy. In this system, 92% of all CpG sites were represented by more than one signal, meaning that each such methylation event is independently confirmed by more than one observation in one or more reactions.
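How methylation determines fragment sizes in the C-specific forward reaction can be illustrated with a simplified in-silico model. This Python sketch assumes complete cleavage 3' of every remaining C, treats the transcript as the bisulfite-converted forward sequence, and uses approximate average residue masses with rC as the cleavable and dT as the non-cleavable nucleotide (an assumption about the reaction chemistry); end-group chemistry is ignored. All names are hypothetical.

```python
# Approximate average residue masses (Da); assumes rC (cleavable) and dT
# (non-cleavable) in the C-specific reaction. End groups are ignored.
RESIDUE_MASS = {"A": 329.21, "G": 345.21, "C": 305.18, "T": 304.19}
WATER = 18.02


def bisulfite_convert(seq, methylated_c):
    """Convert every C to T except Cs at 0-based positions in methylated_c that start a CpG."""
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        protected = i in methylated_c and seq[i + 1:i + 2] == "G"
        out.append("T" if base == "C" and not protected else base)
    return "".join(out)


def c_specific_fragments(converted):
    """Cut 3' of every remaining C (idealized complete C-specific cleavage)."""
    fragments, start = [], 0
    for i, base in enumerate(converted):
        if base == "C":
            fragments.append(converted[start:i + 1])
            start = i + 1
    if start < len(converted):
        fragments.append(converted[start:])
    return fragments


def fragment_mass(fragment):
    """Rough fragment mass: summed residue masses plus one water."""
    return sum(RESIDUE_MASS[b] for b in fragment) + WATER


def detectable(fragments, low=1000.0, high=10000.0):
    """Flag which fragments fall inside the practical MALDI-TOF mass window."""
    return [(f, low <= fragment_mass(f) <= high) for f in fragments]
```

In this model, leaving a CpG unmethylated removes a cleavage site and joins the neighboring fragments, as described above for CpGs 23 and 24, while very short fragments drop below the mass window.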
This redundancy is an important aspect for potential diagnostic use. Figure 4 shows mass spectra representing all four base-specific cleavage reactions of the IGF2/H19 amplicon. Numbers correspond to the CpG positions within this target region. Arrows point to the mass signals that indicate the presence of a methylated cytosine at the marked position. All methylated CpGs in the selected region were identified by one or more mass signals, and approximately 75% were identified by more than two mass signals.

The methylation pattern of the IGF2/H19 imprinted region in adult blood samples confirmed the segregation into methylated and non-methylated template strands reported by Vu et al. (Genomics 64(2):p.29331-40, 1999). Of the 24 clones analyzed, 13 (54%) were identified as methylated and 11 (46%) as non-methylated. No sequence changes were observed. Vu et al. (supra) showed by dideoxy sequencing of bisulfite-treated DNA that 25 of the 26 CpG sites within the amplicon are methylated, the only non-methylated CpG being at position 470. The present results indicated a slightly different methylation pattern in the studied sample DNA, in which all CpG sites were methylated. The data also confirmed methylation of the CpNpG site at position 347. Because of the variability in individual methylation patterns observed by other groups, minor differences are anticipated. The results demonstrate the capability of the method to discriminate methylated and non-methylated nucleic acid target gene regions simultaneously and to reconstruct the exact methylation pattern. To support this contention, bisulfite-treated genomic DNA was analyzed directly. The resulting mass spectra showed signal patterns representative of the methylated template as well as patterns characteristic of the non-methylated template. The signal intensities of methylation-specific and non-methylation-specific signals were compared, and the 50/50 ratio expected for hemi-methylated DNA, as in control blood samples, was confirmed. Figure 5 is a mass spectrum generated by uracil-specific cleavage of the reverse transcript of the IGF2/H19 region. Genomic DNA was used for amplification. Dotted lines mark the positions of mass signals representing non-methylated CpGs. Signals shifted by 16 Da (or multiples thereof) represent methylation events.
The area-under-the-curve ratio of methylated to non-methylated template is approximately 1, as expected for hemi-methylated nucleic acid target gene regions. This indicates unbiased amplification of methylated and non-methylated template for the analyzed region and supports the semi-quantitative capability of the method.

Example 8

Analysis of Methylation in Cancer Samples

More than 400 genes were included in a high-resolution scan to investigate DNA methylation changes in cancer samples. The genes were selected to include a majority of the cancer census genes described by Futreal and colleagues and a subset of known imprinted genes (Futreal, P. A. et al. Nat Rev Cancer 4, 177-83 (2004)). This set contained more than 70 genes with associated PRC2 binding sites. All genes were analyzed in 59 cell lines derived from 9 different tumor types and in control DNA from 6 normal tissues. The cancer cell lines belong to the NCI-60 panel compiled by the National Cancer Institute (NCI).

DNA methylation status was determined by gene-specific amplification of bisulfite-treated DNA followed by in vitro transcription and MALDI-TOF analysis (Ehrich, M. et al. Proc Natl Acad Sci USA 102, 15785-90 (2005)). The amplification regions were primarily designed to cover CpG islands (CGIs) overlapping the 5'UTR of the target genes. When no CGI was annotated in close proximity, the sequence directly surrounding the 5'UTR was used for primer design. If no CpG dinucleotides were found in this region, the upstream flanking CGI was used.

The initial methylation data were filtered to exclude poor-quality measurements: amplicons with data available for fewer than 75% of all samples were considered poor quality and excluded from further analysis. The filtered dataset contained 531 amplification regions, representing 430 genes. For the excluded amplicons, failed PCR amplification was identified as the leading cause of reaction failure.
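The quality filter can be expressed as a short Python sketch. It assumes the raw data are held as a mapping from a hypothetical amplicon ID to per-sample values, with `None` marking a missing measurement; amplicons with data for at least 75% of samples are kept, matching the exclusion rule above.

```python
def filter_amplicons(measurements, min_fraction=0.75):
    """Keep amplicons with data for at least `min_fraction` of samples.

    measurements: dict mapping amplicon ID -> list of per-sample values,
    with None marking a failed or missing measurement.
    """
    kept = {}
    for amplicon, values in measurements.items():
        if not values:
            continue  # no samples at all: nothing to evaluate
        present = sum(v is not None for v in values)
        if present / len(values) >= min_fraction:
            kept[amplicon] = values
    return kept
```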

All autosomes and the X chromosome are represented in the current gene set. The median amplicon length was 413 bp (range 171 bp to 683 bp), and the median CpG content was 33 CpGs per amplicon (range 6 to 81 CpGs per amplicon). For each sample, a total of 11723 CpG sites were analyzed in 7216 CpG units (see Table 11). A CpG unit is defined as an analytical unit that encompasses one or more consecutive CpG sites.

DNA methylation was analyzed in the 59 cell lines of the NCI-60 panel, and 6 commercially available DNAs from adult tissues were used to represent 'healthy' control samples. The resulting data provide a comprehensive panel of cancer-related DNA methylation changes and can be integrated with previous datasets on mutational, transcriptional and proteomic profiles to obtain additional information about neoplastic transformation.

Clinical Colon Cancer Tissue

Colorectal cancer tissue and normal colon tissue samples were obtained from the Royal Melbourne Hospital (RMH) Tissue Bank as part of the Ludwig Colon Cancer Initiative biomarker project. Samples were obtained with informed consent, under the approved protocol, from patients undergoing resection for histologically confirmed colorectal cancer; the matched normal tissue was obtained from the same resection specimen at a site adjacent to the tumor.

Tissue samples were snap frozen in liquid nitrogen within 30 minutes of collection and stored at minus 80 degrees Celsius. Matched tissue sample pairs were cut into 2 mm cube sections weighing approximately 10-15 mg. After manual dissection, DNA was extracted from the tissue sections using a QIAGEN DNeasy® Blood and Tissue Kit. Briefly, samples were first lysed by Proteinase K digestion for 3 hours at 56 degrees Celsius, followed by selective binding of DNA to a membrane; a final spin-column procedure allowed washing and subsequent elution of the DNA, with precipitated DNA resuspended in a buffer solution. DNA was quantified using a biophotometer, with A260/A280 ratios in the range 1.7-2.0. DNA samples were normalized to a concentration of 50 μg/ml.

Methods

Amplification of bisulfite-treated DNA was performed as described in Examples 1 and 2 using the primers described in Table 10. Some regions have more than one primer set because more than one amplicon in that region was amplified. The PCR primer locations are provided in Table 10. To derive the primer nucleotide sequence (in the 5'-3' direction) from Table 10, the following steps can be performed:

For the left primer:

1) Convert the genomic sequence to bisulfite-modified genomic sequence (changing every C to T).

2) Take x nucleotides (x refers to the number given in the table under Left primer length) from the 5' end (left end) of the bisulfite-modified genomic sequence.

3) (For hMC) Attach the 10mer tag to the 5' end.

For the right primer:

1) Convert the genomic sequence into its reverse complement sequence (rcGS), including the directional change so that the right (3') end of the genomic sequence becomes the left (5') end of the rcGS.

2) Convert the rcGS into bisulfite-treated sequence (changing every C to T).

3) Take x nucleotides (x refers to the number given in the table under Right primer length) from the 5' end of the bisulfite-modified rcGS.

4) (For hMC) Attach the T7 tag to the 5' end.
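The primer derivation steps listed above can be written as a short Python sketch. It follows the steps literally, including the order of operations for the right primer (reverse complement first, then C-to-T conversion). The tag sequences are not given in this section, so `tag` is a hypothetical parameter, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bisulfite_modify(seq):
    """Change every C to T, as in the bisulfite-modification steps."""
    return seq.upper().replace("C", "T")


def reverse_complement(seq):
    """Reverse complement, so the 3' end of the input becomes the 5' end."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def left_primer(genomic, length, tag=""):
    """Take `length` nt from the 5' end of the bisulfite-modified sequence; prepend an optional tag."""
    return tag + bisulfite_modify(genomic)[:length]


def right_primer(genomic, length, tag=""):
    """Reverse-complement first, then bisulfite-modify, then take `length` nt from the 5' end."""
    rcgs = reverse_complement(genomic)
    return tag + bisulfite_modify(rcgs)[:length]
```

Here `length` corresponds to the Left or Right primer length column of Table 10, and `genomic` to the amplicon's genomic sequence.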

Methylation data distribution for cell lines

The analyzed target regions were primarily designed within annotated CpG islands, which are frequently unmethylated in normal tissue. The data presented herein are in agreement with previous findings for normal tissue samples and confirm that CGIs with a CpG density >10% are generally unmethylated (<15% methylation)¹⁷ (Figure 6B). In the normal tissue samples, 76% of all amplicons showed mean methylation values below 15%; only 5% of amplicons showed methylation levels above 85%, and the remaining 19% showed methylation levels between 15% and 85%.

In the cancer cell lines, a shift towards medium (20-80%) but not high (>80%) methylation levels was observed. Here, only 49% of all amplicons showed methylation levels below 15%, 2% showed methylation levels above 85%, and 49% showed methylation between 15% and 85%. Differences in the distribution of CGI methylation between normal samples and cancer cell lines were further assessed by grouping the analyzed genomic regions into 10 bins based on methylation value deciles. Every bin covering methylation values between 20% and 80% contained 2- to 4-fold more amplicons in cancer samples than in normal samples, whereas the bins below 20% and above 80% contained fewer amplicons in cancer samples (Figure 6C).
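The summary statistics above (fractions below 15%, between 15% and 85%, and above 85%, plus decile bins compared between cancer and normal samples) can be sketched in Python. This assumes mean methylation per amplicon on a 0 to 1 scale; the function names are hypothetical.

```python
def methylation_classes(means, low=0.15, high=0.85):
    """Fractions of amplicons below `low`, between the cut-offs, and above `high`."""
    n = len(means)
    below = sum(1 for m in means if m < low)
    above = sum(1 for m in means if m > high)
    return {"low": below / n, "medium": (n - below - above) / n, "high": above / n}


def decile_counts(means):
    """Number of amplicons in each of 10 methylation bins (0-10%, ..., 90-100%)."""
    counts = [0] * 10
    for m in means:
        counts[min(int(m * 10), 9)] += 1  # a value of exactly 1.0 goes into the top bin
    return counts


def fold_change_per_bin(cancer_means, normal_means):
    """Ratio of cancer to normal amplicon counts per decile (None where normal is empty)."""
    pairs = zip(decile_counts(cancer_means), decile_counts(normal_means))
    return [c / n if n else None for c, n in pairs]
```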

These findings raise the question of whether a non-methylated gene is more likely to become hypermethylated, or a methylated gene more likely to become hypomethylated, in cancer. Methylation differences between the group of normal samples and the group of cancer samples were calculated for each amplicon; the resulting distribution is skewed towards higher methylation in cancer cell lines. Thus, hypermethylation of CGIs is the most frequent event in the analyzed cancer samples (Figure 6D). However, when the analysis is restricted to amplicons that are methylated at low (<20%) or high (>80%) levels in normal samples, hypermethylation of low-methylated amplicons is just as likely as hypomethylation of highly methylated amplicons (p value = 1, Fisher's exact test). Because amplicon-specific methylation changes might differ significantly between tumor types, each type of tumor cell line was also analyzed individually. The results differ slightly for each tumor type, but in general the analysis confirms the previous findings at the level of individual tumor types (see Tables 6 and 7).

Recent studies have shown a decrease of epigenetic marking in a 1 kb window around the transcription start site. In active Drosophila melanogaster promoters, histone occupancy is decreased (Mito, Y., Henikoff, J.G. & Henikoff, S. Nat Genet 37, 1090-7 (2005)), while in normal human tissue samples DNA methylation is reduced within this core region (Eckhardt, F. et al. Nat Genet 38, 1378-85 (2006)). To further investigate this relationship, the distance from the 5'-UTR was calculated for each measured CpG in the dataset (>700,000 data points). CpG methylation in normal samples showed the expected core window of unmethylated CpG sites within 1 kb around the 5'-UTR (Figure 6E). In cancer cell lines, methylation averages are generally elevated, but the same symmetrical decrease in methylation is observed. These results thus confirm previous findings and extend their applicability to cancer cell lines.
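A methylation profile around the 5'-UTR, as used for this comparison, can be sketched by averaging CpG methylation in distance bins. This Python sketch assumes the signed distances have already been computed (the text used RMySQL and the UCSC database for that step), uses negative values for upstream positions, and picks a hypothetical 100 bp bin size.

```python
from collections import defaultdict


def methylation_by_distance(points, bin_size=100):
    """Mean methylation per distance bin.

    points: iterable of (distance_bp, methylation) pairs, where distance_bp is the
    signed distance of a CpG from the 5'-UTR (negative = upstream).
    Bins are labelled by their lower edge, e.g. -200 covers -200..-101.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, methylation in points:
        edge = (distance // bin_size) * bin_size  # floor division also handles negatives
        sums[edge] += methylation
        counts[edge] += 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}
```

Computing this separately for normal samples and cancer cell lines allows the two profiles to be compared bin by bin.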

Sequence motif confirmation and new motif detection

A set of CG-rich sequence motifs has been reported to be enriched in non-methylated CGIs (Das, R. et al. Proc Natl Acad Sci USA 103, 10713-6 (2006)). The amplicons of the invention were divided into sequences with low (<20%, n = 300) and high (>80%, n = 16) average methylation. The non-methylated amplicons contained significantly more of these sequence motifs (p < 0.001, Wilcoxon test; mean in the low methylation set = 22, mean in the high methylation set = 8).

Next, the amplicons were split into two groups based on the observed differences in DNA methylation between normal and cancer cell line samples. The group with high average methylation differences (ΔM > 20%; p-value < 0.001, two-sided t-test) comprised 89 sequences, and the group with low differences comprised 121 sequences (ΔM < 5%; p-value > 0.05, two-sided t-test). During amplicon design, target sequences were preferably selected in CG-rich regions, so sequence motifs are not evenly distributed. A permutation method was therefore used to compare the distribution of 6mer oligonucleotides and to identify sequence motifs that are overrepresented in one of the two sets. Five sequence motifs were overrepresented in the set of non-differentially methylated sequences (ATACCG, ATACTA, ATAGAT, TATACT, TCATGG). A smaller set of four motifs was enriched in the differentially methylated genes (GACCTG, GCCAAG, GTCCCA, TTGAAG). The two sequence motif sets were annotated. Based on the annotations, binding sites for Sp1, a zinc finger transcription factor, were found to be enriched in the set of non-differentially methylated sequences (p value = 0.001, Fisher's exact test).

Confirmation of cell line results in colon cancer samples

To verify the validity of the cell line-based results in clinical samples, the colon cancer cell line model was tested in clinical samples. A set of 50 genes was selected that showed significant differential methylation (ΔM > 20%, p value < 0.001, two-sided t-test) in the colon cancer cell lines. To assess the specificity of the findings, 14 genes that did not show any methylation differences in the cell lines were also selected. The methylation status of these genes was investigated in 48 matched sample pairs of colon cancer tissue and adjacent normal colon tissue. The majority of patients were male (M = 30, F = 18); the median age at diagnosis was 65 years (range 46 to 83). Fourteen patients had experienced local or distant cancer recurrence, and stages I to IV were evenly represented. The analysis of methylation differences between normal and cancer tissue samples confirmed the previous cell line findings for the majority of genes. In the set of differentially methylated genes, 42 of 50 (84%) were found to be significantly differentially methylated in the clinical tissue samples. In addition, none of the 14 genes without a methylation difference in the cell lines was differentially methylated in the clinical samples.

Next, the methylation patterns were used to characterize relationships among the colon cancer samples and to explore potential associations with their clinical features. None of the clinical features (e.g., age, stage, recurrence, location) showed a strong correlation with the resulting colon cancer methylation groups. The degree of similarity between methylation patterns derived from cell line samples and their tissue counterparts was explored using hierarchical clustering. As expected, the normal tissues grouped with the normal colon tissue samples, and the colon cancer cell lines grouped with the colon cancer tissue samples. However, the segregation of normal and colon cancer tissue samples was not complete: a subset (n = 10) of colon cancer samples was found in the group of normal tissue samples. In general, methylation differences between normal and cancer samples were larger in the cell line model than in the clinical tissue samples. This effect might be caused by heterogeneity of the clinical specimens, which also include, for example, connective tissue or white blood cells. An additional explanation might be amplification of the methylation difference during cell culture.

Finally, the findings were compared with results from a recent methylation study of colon cancer tumors that analyzed DNA methylation with a different technology (Weisenberger, D.J. et al. Nat Genet 38, 787-93 (2006)). A total of 38 genes were shared by both datasets, and the results of both studies are in good agreement (92% concordance). Nine genes were found to be hypermethylated in colon cancer in both datasets, and 26 genes showed no colon cancer-specific methylation in either dataset. Two genes were identified as hypermethylated only in the previous study, and one gene was found to be hypermethylated only in the present study. Altogether, the findings indicate that results obtained from the cell line model are highly reproducible and applicable to clinical tissue samples.

Assigning classes to significantly differentially methylated genes

Genes that were differentially methylated between each type of cancer cell line and normal tissues were identified. The individual groups were examined to determine which genes overlap across multiple cancer types and which are found in specific tumor types only. Because several genetic loci were tested in many separate runs (one for each cell line), the results will contain false positives arising from multiple testing in high-dimensional datasets. Although it does not fully resolve this issue, a minimum methylation difference of 20% was applied as an additional selection criterion to filter out false positives.

A total of 71 genes were statistically significantly hypermethylated in at least one tumor type. A large fraction of these genes (n = 30, 42%) were found in only one tumor type, and nearly 10 percent were found in 5 or more tumor types (found in 5: TSPYL, PAX8, LEP, PHOX2B, TMPRSS2; found in 6: MYOD1; found in 8: PAX5). Tables 6 and 7 list target genes that are hypermethylated or hypomethylated in cancer cell lines, respectively. In Tables 6 and 7, a "1" indicates an association with a p-value less than 0.05.
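The two selection criteria (p < 0.05 and a methylation difference of at least 20%) and the count of tumor types per gene can be sketched in Python. This assumes per-comparison p-values and mean differences have already been computed, with methylation on a 0 to 1 scale; the function name and the example rows in the usage note are hypothetical.

```python
from collections import defaultdict


def genes_by_tumor_count(results, alpha=0.05, min_delta=0.20):
    """Count, per gene, the tumor types in which it passes both selection criteria.

    results: iterable of (gene, tumor_type, p_value, delta) where delta is the mean
    methylation of the tumor cell lines minus that of normal tissue (0-1 scale).
    """
    hits = defaultdict(set)
    for gene, tumor, p_value, delta in results:
        if p_value < alpha and delta >= min_delta:
            hits[gene].add(tumor)
    return {gene: len(tumors) for gene, tumors in hits.items()}
```

For instance, a gene passing both criteria in colon and breast cell lines would be counted in two tumor types, while a gene with a significant p-value but only a 10% difference would be dropped by the minimum-difference filter.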

TABLE 6: HYPERMETHYLATED TARGET GENE REGIONS

Seven genes were hypomethylated (found in 1 tumor type: TCL1A, SLC22A2, TRPM5, IGF2, PEG3; found in 2: KCNQ1, DLK1). As indicated in the previous analysis, CNS neoplasms (n = 4) and melanoma (n = 3) had the highest number of hypomethylated genes. Interestingly, almost all of the hypomethylated genes are known to be imprinted, which points to a loss of imprinting in these cases (see Table 7).

TABLE 7: HYPOMETHYLATED TARGET GENE REGIONS

PRC2 target identification for colon cancer and all others

A comprehensive analysis by Lee et al. recently mapped the SUZ12 (suppressor of zeste 12) subunit of the Polycomb Repressive Complex 2 (PRC2) in embryonic stem cells and found its binding sites to be preferentially located at a distinct set of developmental genes (Lee, T.I. et al. Cell 125, 301-13 (2006)). A second study found that another Polycomb group protein, EZH2 (Enhancer of Zeste Homolog 2), acts within PRC2 to recruit DNA methyltransferases and control CpG methylation (Vire, E. et al. Nature 439, 871-4 (2006)). Taken together, it is believed that PRC2 target genes are silenced by DNA methylation in cells that maintain pluripotency, and that this silencing might facilitate abnormal clonal expansion. A retrospective analysis of DNA methylation data provided further evidence that PRC2 target genes are silenced in human colon cancer (Widschwendter, M. et al. Nat Genet 39, 157-8 (2007)). In addition to SUZ12, PRC2 contains two more core subunits: EED (embryonic ectoderm development) and EZH2. PRC2 has been shown to mediate methylation of histone H3 lysine 27 (H3K27). Chromatin immunoprecipitation studies have mapped EED binding and H3K27 methylation in ES cells and found high concordance with SUZ12 binding (Wang, H. et al. BMC Genomics 7, 166 (2006)). The SUZ12, EED and H3K27 marks were used to determine PRC2 marks in the dataset of 531 genomic regions. The target genes are provided in Table 9, the genomic amplicon sequences in Table 10, and the CpG site locations in Table 11. PRC2 binding information was found for 440 amplicons, including 79 amplicons with one or more PRC2 marks. The fraction of amplicons containing one or more PRC2 binding sites was calculated both for the set of genes that did not show significant methylation differences and for the set of genes that did show significant methylation differences in cancer cell lines versus normal tissue. The findings show a significant (p < 0.001, Fisher's exact test) enrichment for PRC2 targets in the set of significantly hypermethylated genes in six out of the nine tumor types. Only the melanoma-specific gene set is not enriched for PRC2 targets, which may be explained by the high level of differentiation of melanomas. All other tumor types are 2- to 6-fold enriched for PRC2 targets (Table 8).
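The enrichment test can be illustrated with a self-contained Python implementation of the two-sided Fisher's exact test on a 2x2 table (significant vs. non-significant genes by PRC2 target vs. non-target). This is a sketch, not the R code used in the analysis: it sums the hypergeometric probabilities of all tables no more probable than the observed one, and it reports the sample cross-product odds ratio, which can differ from the conditional maximum-likelihood estimate reported by R's `fisher.test`. The function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: significantly vs. non-significantly methylated genes.
    Columns: PRC2 target vs. non-target.
    Returns (sample_odds_ratio, p_value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x PRC2 targets in the first row, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables no more probable than the observed one; the small relative
    # tolerance guards against floating-point ties.
    p_value = sum(p for p in map(prob, support) if p <= p_observed * (1 + 1e-7))
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, min(p_value, 1.0)
```

The fold enrichment quoted above corresponds to the ratio of the PRC2-target fractions in the two rows, a / (a + b) divided by c / (c + d), which is a different quantity from the odds ratio.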

TABLE 8

Tissue | Fraction of PRC target genes in significant genes | Fraction of PRC target genes in non-significant genes | P-value, Fisher's exact test | Odds ratio

A graphical representation of gene-tumor associations reveals that highly connected genes tend to be PRC2 targets. In contrast, the majority of genes without PRC2 marks are found in only one or two tumor types. These findings strongly suggest a common involvement of PRC2 targets in neoplastic development.

TABLE 9 - TARGET GENE REGIONS

TABLE 10 - TARGET GENE SEQUENCES & PRIMERS

The entirety of each patent, patent application, publication and document referenced herein hereby is incorporated by reference, including all tables, drawings, and figures. All patents and publications are herein incorporated by reference to the same extent as if each was specifically and individually indicated to be incorporated by reference. Citation of the above patents, patent applications, publications and documents is not an admission that any of the foregoing is pertinent prior art, nor does it constitute any admission as to the contents or date of these publications or documents. All patents and publications mentioned herein are indicative of the skill levels of those of ordinary skill in the art to which the invention pertains.

Modifications may be made to the foregoing without departing from the scope, spirit and basic aspects of the invention. Although the invention has been described in substantial detail with reference to one or more specific embodiments, those of ordinary skill in the art will recognize that changes may be made to the embodiments specifically disclosed in this application, and yet these modifications and improvements are within the scope and spirit of the invention. One skilled in the art readily appreciates that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. The examples provided herein are representative of specific embodiments, are exemplary, and are not intended as limitations on the scope of the invention.

The invention illustratively described herein suitably may be practiced in the absence of any element(s) not specifically disclosed herein. Thus, for example, in each instance herein any of the terms "comprising", "consisting essentially of", and "consisting of" may be replaced with either of the other two terms. Thus, the terms and expressions which have been employed are used as terms of description and not of limitation, equivalents of the features shown and described, or portions thereof, are not excluded, and it is recognized that various modifications are possible within the scope of the invention. Embodiments of the invention are set forth in the following claims.

Claims

What is claimed is:

1. A method for identifying a subject at risk of cancer, which comprises determining the methylation status of one or more nucleic acid target gene regions, wherein the one or more nucleic acid target gene regions comprises one or more PRC2 binding complexes and is disclosed in Table 9, and further wherein the methylation status is associated with the occurrence of cancer.