WO2014204861A1

WO2014204861A1 - Universal methylation profiling methods

Info

Publication number: WO2014204861A1
Application number: PCT/US2014/042567
Authority: WO
Inventors: Timothy H. Bestor; Jingyue Ju; Xiaoxu Li
Original assignee: The Trustees Of Columbia University In The City Of New York
Priority date: 2013-06-17
Filing date: 2014-06-16
Publication date: 2014-12-24
Also published as: CN105408342A; EP3010929A1; EP3010929A4

Abstract

The present invention provides a compound having the adenosylmethionine structure, wherein the sulfur atom of the methionine moiety is modified with alkene or succinimide groups, wherein R1, R2 and R3 are independently H, alkyl, aryl, C(O)NH2, C(O)R', CN, NO2, C(O)R', S(O)2NHR'; wherein X is O or NR'; wherein R' is H, alkyl or aryl; and wherein n is an integer from 1 to 8, with the proviso that, when the substituent is a propene derivative, and n is 1, at least one of R1, R2, or R3 is other than H.

Description

UNIVERSAL METHYLATION PROFILING METHODS

This application claims priority of U.S. Provisional Application No. 61/836,060, filed June 17, 2013, the contents of which are hereby incorporated by reference in their entirety.

Throughout this application, various publications are referenced. The disclosures of these publications in their entireties are hereby incorporated by reference into this application to more fully describe the state of the art to which this invention pertains.

Background of the Invention

The human genome contains ~28 million CpG sites, about 70% of which are methylated at the 5 position of the cytosine (Edwards et al., 2010). Aberrant DNA methylation has been linked to a growing list of developmental diseases, age-related neurodegenerative disorders, diabetes and cancer (Robertson et al., 2005). Hence, epigenetic changes in DNA methylation status are increasingly being studied for their role in both normal and disease-associated phenotypic changes, including the Roadmap Epigenomics Project being launched by NIH to create reference epigenomes for a variety of cell types. To fulfill this goal, genome-wide methods and techniques that can comprehensively profile DNA methylation status with single base resolution and high throughput are essential (Suzuki et al., 2008; Laird, 2010).

Over 30 methylation analysis technologies have been developed, but all have shortcomings (Laird, 2010). Bisulfite genomic sequencing (BGS), reported by Susan Clark and Marianne Frommer in 1994 (Clark et al., 1994), is regarded as the best available method. However, BGS has several serious shortcomings: (1) there is a severe loss of sequence information upon bisulfite conversion, which produces sequences that cannot be aligned to the genome; (2) strong biases against GC-rich sequences (Edwards et al., 2010); (3) bisulfite conversion artifacts; and (4) the need for large amounts of long starting DNA due to high rates of strand cleavage under the harsh reaction conditions (Warnecke et al., 2002, 1997). Major improvements or new approaches overcoming the above-mentioned limitations are needed to further advance genome-wide DNA methylation profiling.

Summary of the Invention

The present invention provides a compound having the structure:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H.

The present invention also provides a composition of matter comprising a compound having the structure:

wherein R is

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H,

attached to a CpG methyltransferase.

The present invention also provides a process of producing a derivative of a double-stranded DNA comprising contacting the double-stranded DNA with a CpG methyltransferase and an S-adenosylmethionine analog having the structure:

wherein R is a chemical group capable of being transferred from the S-adenosylmethionine analog by the CpG methyltransferase to a 5-carbon of a non-methylated cytosine of the double-stranded DNA, under conditions such that the chemical group covalently binds to the 5-carbon of the non-methylated cytosine of the double-stranded DNA, and thereby produces the derivative of the double-stranded DNA, wherein the chemical group has the structure:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H.

The present invention also provides a method of determining whether a cytosine present within a double-stranded DNA sequence of known sequence is non-methylated comprising:

a) producing a derivative of the double-stranded DNA by contacting the double-stranded DNA with a CpG methyltransferase and an S-adenosylmethionine analog having the structure:

wherein R is a chemical group capable of being transferred from the S-adenosylmethionine analog by the CpG methyltransferase to a 5 carbon of a non-methylated cytosine of the double-stranded DNA so as to covalently bond the chemical group to the 5 carbon of the non-methylated cytosine of the double-stranded DNA, thereby making a derivatized double stranded DNA, wherein the chemical group has the structure:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8;

b) separately obtaining a single strand of the derivative of the double-stranded DNA;

c) sequencing the single strand so obtained in step b); and

d) comparing the sequence of the single strand determined in step c) to the sequence of a corresponding strand of the double-stranded DNA of which a derivative has not been produced, wherein the presence of a uracil analog in the single strand of the derivative single strand instead of a cytosine at a predefined position in the corresponding strand of the double-stranded DNA of which a derivative has not been produced indicates that the cytosine at that position in the double-stranded DNA is non-methylated.

The present invention also provides a derivatized DNA molecule, wherein the derivatized DNA molecule differs from DNA by comprising a nucleotide residue which comprises a base having the following structure:

wherein R' is

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R'', CN, NO₂, C(O)R'', S(O)₂NHR'';

wherein X is O or NR'';

wherein R'' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R' is

and n is 1, at least one of R₁, R₂ or R₃ is other than H, and wherein the sugar is a sugar of the nucleotide residue.

is

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

and wherein the sugar is a sugar of the nucleotide residue.

The present invention also provides a kit for derivatizing a double-stranded DNA molecule or for determining whether a cytosine present within a double-stranded DNA sequence of known sequence is non-methylated comprising:

a) a compound having the structure:

wherein R is

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8; and

b) instructions for use.

The present invention also provides a method of determining whether a cytosine present within a double-stranded DNA sequence of known sequence is non-methylated comprising:

a) producing a derivative of the double-stranded DNA by contacting the double-stranded DNA with a CpG methyltransferase and an S-adenosylmethionine analog having the structure:

wherein R is a chemical group capable of being transferred from the S-adenosylmethionine analog by the CpG methyltransferase to a 5 carbon of a non-methylated cytosine of the double-stranded DNA so as to covalently bond the chemical group to the 5 carbon of the non-methylated cytosine of the double-stranded DNA, thereby making a derivatized double stranded DNA, wherein the chemical group has the structure:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8; and

b) determining whether a cytosine at a predefined position in the double-stranded DNA has been modified with the chemical group R,

wherein modification with the chemical group R on the cytosine at a predefined position in the double-stranded DNA indicates that the cytosine at that position in the double-stranded DNA is non-methylated.

Brief Description of the Figures

Figure 1. Conversion of C to U by DNA methyltransferase (MTase) using AdoMet analog. MTase transfers a chemical conversion group R from AdoMet analog to the 5 position of cytosine. After transfer, photochemically triggered intramolecular reaction between the R group and C facilitates deamination at the 4 position to form a U analog. DNA MTases are able to transfer a wide variety of functional groups to the 5 position of cytosines in double stranded DNA with high sequence specificity.

Figure 2. Synthetic scheme of 5-AOMC.

Figure 3. HPLC profile during time course of photo-irradiation of 5-AOMC. After 3h of irradiation, the main peak is the starting material 5-AOMC (MS-MW found 298) (Left). After 12h photo-irradiation, the starting material is mostly consumed yielding a new product 5-AOMU with MS-MW 299 (Right).

Figure 4. UV absorption spectra of the new photochemically generated product (5-AOMU) reveal maximum absorption at 265 nm, typical of U, while the starting material (5-AOMC) shows the expected absorption of C with λmax = 274 nm.

Figure 5. The mixture of the photochemically generated product from 5-AOMC and the synthesized 5-AOMU shows a single peak in HPLC analysis.

Figure 6. Synthesis of 5-AOMC phosphoramidite.

Figure 7. Single base extension MS analyses of primer elongation show conversion of 5-AOMC to 5-AOMU (C to U) in a DNA strand after photo-irradiation, resulting in a primer extension product (MW 3261) incorporating a ddA (A) , while in a counterpart DNA strand, unmodified C remains intact, resulting in a ddG incorporated extension product (MW 3277) (B) .

Figure 8. Example structures of photoreactive moiety containing AdoMet analogs. R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR'; X is O or NR'; R' is H or alkyl; and n = 1-8.

Figure 9. Example scheme for the syntheses of AdoMet analogs containing photoreactive groups.

Figure 10. Example scheme for the syntheses of bromides of the photoreactive groups.

Figure 11. DNA methylation profiling method based on DNA methyltransferase aided CpG site-specific conversion of C to U. An optimized AdoMet analog is used to deliver the conversion group to an unmethylated CpG so that only modified C can be further converted to U via photo-triggered intramolecular reaction. Subsequent sequencing permits DNA methylation status to be read out at single base resolution.

Figure 12. Two-path photocatalysis mechanism of Ru(bpy)₃²⁺, a versatile photocatalyst which can engage [2+2] cycloaddition of both electron-deficient and electron-rich olefins.

Figure 13. A. Ru(bpy)₃²⁺-visible light catalyzed photo-conversion of a 5-position modified C in DNA to a U via a cycloaddition intermediate when C is modified with an electron-deficient double bond. B. Ru(bpy)₃²⁺-visible light catalyzed photo-conversion of a 5-position modified C in DNA to a U via a cycloaddition intermediate when C is modified with an electron-rich double bond.

Figure 14. Photo-Conversion Assay. Oligonucleotides are synthesized bearing a photo-convertible 5-modified cytosine (C') on each strand within the context of the same CpG site. Prior to irradiation, or in the absence of photo-conversion to the U analog (U'), the site can be cut by the restriction enzyme HpaII following PCR amplification, which results in replacement of the C' by a normal C. After photo-conversion, PCR will convert the resulting U' to a normal U, in which case the site can be cut with BfaI. Other restriction sites are included in the DNA to produce fragment sizes that allow easy discrimination of the bands on gels or the peaks obtained with mass spectroscopy.

Detailed Description of the Invention

Terms

As used herein, and unless stated otherwise, each of the following terms shall have the definition set forth below.

A - Adenine;

C - Cytosine;

DNA - Deoxyribonucleic acid;

G - Guanine;

RNA - Ribonucleic acid;

T - Thymine; and

U - Uracil.

"Nucleic acid" shall mean any nucleic acid molecule, including, without limitation, DNA, RNA and hybrids thereof. The nucleic acid bases that form nucleic acid molecules can be the bases A, C, G, T and U, as well as derivatives thereof. Derivatives of these bases are well known in the art, and are exemplified in PCR Systems, Reagents and Consumables (Perkin Elmer Catalogue 1996-1997, Roche Molecular Systems, Inc., Branchburg, New Jersey, USA).

"Type" of nucleotide refers to A, G, C, T or U. "Type" of base refers to adenine, guanine, cytosine, uracil or thymine.

"Mass tag" shall mean a molecular entity of a predetermined size which is capable of being attached by a cleavable bond to another entity.

"Solid substrate" shall mean any suitable medium present in the solid phase to which a nucleic acid or an agent may be affixed. Non-limiting examples include chips, beads and columns.

"Hybridize" shall mean the annealing of one single-stranded nucleic acid to another nucleic acid based on sequence complementarity. The propensity for hybridization between nucleic acids depends on the temperature and ionic strength of their milieu, the length of the nucleic acids and the degree of complementarity. The effect of these parameters on hybridization is well known in the art (see Sambrook J, Fritsch EF, Maniatis T. 1989. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, New York.)

Embodiments of the Invention

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H.

In one or more embodiments, R is

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when

or when R is

and n is 1, at least one of R₁ or R₂ is other than H.

In one or more embodiments, R is

In one or more embodiments, R' is H or alkyl.

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H,

attached to a CpG methyltransferase.

In one or more embodiments, R is

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when

or when R is

and n is 1, at least one of R₁ or R₂ is other than H.

In one or more embodiments, R' is H or alkyl.

In one or more embodiments, the compound is attached to the active site of the CpG methyltransferase.

In one or more embodiments, the CpG methyltransferase is SssI methyltransferase.

In one or more embodiments, the CpG methyltransferase is HhaI methyltransferase.

In one or more embodiments, the CpG methyltransferase is CviJI methyltransferase.

The present invention also provides a process of producing a derivative of a double-stranded DNA comprising contacting the double-stranded DNA with a CpG methyltransferase and an S-adenosylmethionine analog having the structure:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H.

In one or more embodiments, the chemical group has the structure

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when

or when R is

and n is 1, at least one of R₁ or R₂ is other than H.

In one or more embodiments, the chemical group has the structure

In one or more embodiments, R' is H or alkyl.

In one or more embodiments, the CpG methyltransferase is SssI methyltransferase.

In one or more embodiments, the CpG methyltransferase is HhaI methyltransferase.

In one or more embodiments, the CpG methyltransferase is CviJI methyltransferase.

In one or more embodiments, the chemical group capable of being transferred from the S-adenosylmethionine analog by the CpG methyltransferase to the 5-carbon of the non-methylated cytosine of the double-stranded DNA permits photochemical deamination of a 4 position of the non-methylated cytosine when it is covalently bound to the 5-carbon of the non-methylated cytosine of the double-stranded DNA.

In one or more embodiments, the non-methylated cytosine is immediately adjacent in sequence to a guanine in a single strand of the double-stranded DNA.

a) producing a derivative of the double-stranded DNA by contacting the double-stranded DNA with a CpG methyltransferase and an S-adenosylmethionine analog having the structure:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8;

b) separately obtaining a single strand of the derivative of the double-stranded DNA;

c) sequencing the single strand so obtained in step b); and

d) comparing the sequence of the single strand determined in step c) to the sequence of a corresponding strand of the double-stranded DNA of which a derivative has not been produced, wherein the presence of a uracil analog in the single strand of the derivative single strand instead of a cytosine at a predefined position in the corresponding strand of the double-stranded DNA of which a derivative has not been produced indicates that the cytosine at that position in the double-stranded DNA is non-methylated.
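The comparison in step d) can be illustrated with a minimal Python sketch. It is not part of the disclosed method. It assumes the two sequences are already aligned and of equal length, that the uracil analog is reported by the sequencer as U or T, and the function name and toy sequences are hypothetical.

```python
def call_unmethylated_cpgs(reference, derivatized_read):
    """Return 0-based positions of CpG cytosines that read as U/T
    in the derivatized strand (scored here as non-methylated)."""
    if len(reference) != len(derivatized_read):
        raise ValueError("sequences must be pre-aligned and of equal length")
    ref = reference.upper()
    read = derivatized_read.upper()
    unmethylated = []
    for i in range(len(ref) - 1):
        # Only a cytosine followed by a guanine (CpG) is an enzyme target.
        in_cpg = ref[i] == "C" and ref[i + 1] == "G"
        if in_cpg and read[i] in ("U", "T"):
            unmethylated.append(i)
    return unmethylated


# Toy example (assumed data): CpG cytosines sit at positions 2, 6 and 10.
reference = "ATCGGACGTTCGA"
read = "ATUGGACGTTUGA"
print(call_unmethylated_cpgs(reference, read))  # [2, 10]
```

In this toy case the cytosine at position 6 still reads as C, so under the method's logic it would be taken as methylated in the starting DNA.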

In one or more embodiments, the chemical group has the structure

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

In one or more embodiments, the chemical group has the structure

In one or more embodiments, R' is H or alkyl.

In one or more embodiments, the chemical group capable of being transferred from the S-adenosylmethionine analog by the CpG methyltransferase to the 5 carbon of the non-methylated cytosine of the double-stranded DNA permits photochemical deamination of a 4 position of the non-methylated cytosine when it is covalently bound to the 5 carbon of the non-methylated cytosine of the double-stranded DNA.

In one or more embodiments, in step c) the sequencing is sequencing by synthesis.

In one or more embodiments, the sequencing by synthesis comprises contacting the derivatized single strand with a DNA polymerase, a primer oligonucleotide, dATP, dCTP, dGTP, dTTP, and a dideoxynucleotide triphosphate having a detectable label attached thereto.

In one or more embodiments, the detectable label is radioactive or fluorescent.

In one or more embodiments, the detectable label is a mass tag.

In one or more embodiments, the method further comprises attaching the single strand to a solid support prior to step c) .

The present invention also provides a derivatized DNA molecule, wherein the derivatized DNA molecule differs from DNA by comprising a nucleotide residue which comprises a base having the following structure:

R'

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R'', CN, NO₂, C(O)R'', S(O)₂NHR'';

wherein X is O or NR'';

wherein R'' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R' is

and n is 1, at least one of R₁, R₂ or R₃ is other than H, and wherein the sugar is a sugar of the nucleotide residue.

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R'', CN, NO₂, C(O)R'', S(O)₂NHR'';

wherein X is O or NR'';

wherein R'' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

In one or more embodiments, R'' is H or alkyl.

structure:

wherein R'' is

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

and wherein the sugar is a sugar of the nucleotide residue.

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

In one or more embodiments, R' is H or alkyl.

The present invention also provides a kit for derivatizing a double-stranded DNA molecule or for determining whether a cytosine present within a double-stranded DNA sequence of known sequence is non-methylated comprising:

a) a compound having the structure:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8; and

b) instructions for use.

In one or more embodiments, R is

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

In one or more embodiments, R' is H or alkyl.

In one or more embodiments, the kit further comprises a CpG methyltransferase.

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8; and

wherein modification with the chemical group R on the cytosine at a predefined position in the double-stranded DNA indicates that the cytosine at that position in the double-stranded DNA is non-methylated.

In one or more embodiments the chemical group has the structure

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

In one or more embodiments the chemical group has the structure

In one or more embodiments the CpG methyltransferase is SssI methyltransferase.

In one or more embodiments the CpG methyltransferase is HhaI methyltransferase.

In one or more embodiments the CpG methyltransferase is CviJI methyltransferase.

In one or more embodiments the CpG methyltransferase is M.SssI methyltransferase.

In one or more embodiments the CpG methyltransferase is M.HhaI methyltransferase.

In one or more embodiments the CpG methyltransferase is M.CviJI methyltransferase.

In one or more embodiments determining whether a cytosine at a predefined position in the double-stranded DNA has been modified with the chemical group R comprises converting the modified cytosine to a uracil analog.

In one or more embodiments, the method further comprises conversion of a modified cytosine residue in the DNA derivative to a uracil analog by a photo-catalyzed reaction.

In one or more embodiments the photo-catalyzed reaction is carried out using a Tris(bipyridine)ruthenium(II) chloride (Ru(bpy)₃²⁺) catalyst.

In one or more embodiments the (Ru(bpy)₃²⁺) catalyst is Ru(bpy)₃Cl₂.

In one or more embodiments the (Ru(bpy)₃²⁺) catalyst is Ru(bpy)₃(PF₆)₂.

In one or more embodiments the light source for the photo-catalyzed reaction is a household bulb.

In one or more embodiments the light source for the photo-catalyzed reaction is a laser.

In one or more embodiments the laser has a wavelength of 400 nm-600 nm.

This invention provides methods for methylation profiling. Methods for methylation profiling are disclosed in U.S. Patent Application Publication No. US 2011-0177508 A1, which is hereby incorporated by reference.

This invention provides the use of DNA methyltransferases. Examples of DNA methyltransferases include but are not limited to SssI, HhaI and CviJI as well as modified SssI, HhaI and CviJI (M.SssI, M.HhaI and M.CviJI, respectively). These enzymes are modified mainly to have reduced specificity such that R groups on AdoMet analogs can be more efficiently transferred to unmethylated C residues, including in the context of a CpG site in DNA. Examples of such modified M.SssI and M.HhaI genes have been described in the literature (Lukinavicius et al (2012) Engineering the DNA cytosine-5 methyltransferase reaction for sequence-specific labeling of DNA. Nucleic Acids Res 40:11594-11602; Kriukene et al (2013) DNA unmethylome profiling by covalent capture of CpG sites. Nature Commun 4: doi:10.1038/ncomms3190).

This invention provides the instant methods and processes, wherein the detectable label bound to the base via a cleavable linker is a dye, a fluorophore, a chromophore, a combinatorial fluorescence energy transfer tag, a mass tag, or an electrophore. Combinatorial fluorescence energy tags and methods for production thereof are disclosed in U.S. Patent No. 6,627,748, which is hereby incorporated by reference.

Detectable tags and methods of affixing nucleic acids to surfaces which can be used in embodiments of the methods described herein are disclosed in U.S. Patent Nos. 6,664,079 and 7,074,597, which are hereby incorporated by reference.

This invention also provides the instant methods and processes, wherein the DNA is bound to a solid substrate. This invention also provides the instant method, wherein the DNA is bound to the solid substrate via 1,3-dipolar azide-alkyne cycloaddition chemistry. This invention also provides the instant methods and processes, wherein the DNA is bound to the solid substrate via a polyethylene glycol molecule. This invention also provides the instant methods and processes, wherein the DNA is alkyne-labeled. This invention also provides the instant method and processes, wherein the DNA is bound to the solid substrate via a polyethylene glycol molecule and the solid substrate is azide-functionalized. This invention also provides the instant methods and processes, wherein the DNA is immobilized on the solid substrate via an azido linkage, an alkynyl linkage, or biotin-streptavidin interaction. Immobilization of nucleic acids is described in Immobilization of DNA on Chips II, edited by Christine Wittmann (2005), Springer Verlag, Berlin, which is hereby incorporated by reference. This invention also provides the instant methods and processes, wherein the DNA is bound to the solid substrate via a polyethylene glycol molecule and the solid substrate is azide-functionalized or the DNA is immobilized on the solid substrate via an azido linkage, an alkynyl linkage, or biotin-streptavidin interaction. In an embodiment, the DNA or nucleic acid is attached/bound to the solid surface by covalent site-specific coupling chemistry compatible with DNA.

This invention also provides the instant methods and processes, wherein the solid substrate is in the form of a chip, a bead, a well, a capillary tube, a slide, a wafer, a filter, a fiber, a porous medium, or a column. This invention also provides the instant methods and processes, wherein the solid substrate is gold, quartz, silica, plastic, glass, nylon, diamond, silver, metal, or polypropylene. This invention also provides the instant method, wherein the solid substrate is porous. Chips or beads may be made from materials common for DNA microarrays, for example glass or nylon. Beads/micro-beads may be in turn immobilized to chips.

This invention also provides the instant methods and processes, wherein about 1000 or fewer copies of the DNA are bound to the solid substrate. This invention also provides the instant methods and processes wherein 2×10⁷, 1×10⁷, 1×10⁶ or 1×10⁴ or fewer copies of the DNA are bound to the solid substrate.

This invention also provides the instant methods and processes, wherein the nucleotide analogues comprise one of the fluorophores Cy5, Bodipy-FL-510, ROX and R6G.

This invention also provides the instant methods and processes, wherein the DNA polymerase is a 9°N polymerase or a variant thereof. DNA polymerases which can be used in the instant invention include, for example, E. coli DNA polymerase I, Bacteriophage T4 DNA polymerase, Sequenase™, Taq DNA polymerase and 9°N polymerase (exo-) A485L/Y409V. RNA polymerases which can be used in the instant invention include, for example, Bacteriophage SP6, T7 and T3 RNA polymerases.

Methods for production of cleavably capped and/or cleavably linked nucleotide analogues are disclosed in U.S. Patent No. 6,664,079, which is hereby incorporated by reference.

DNA methylation is described in U.S. Patent Application Publication No. 2003-0232371 A1, which is hereby incorporated by reference in its entirety.

All combinations and subcombinations of the various elements described herein are within the scope of the invention.

This invention will be better understood by reference to the Experimental Details which follow, but those skilled in the art will readily appreciate that the specific experiments detailed are only illustrative of the invention as described more fully in the claims which follow thereafter.

Experimental Details

DNA methylation at specific sequences was first analyzed by Southern blotting after cleavage with methylation-sensitive restriction endonucleases (MSREs) such as HpaII, which fails to cleave the sequence 5'-CCGG-3' when the central CpG dinucleotide is methylated (Waalwijk and Flavell, 1978). This method is robust and provides an internal control for complete digestion when the blot is reprobed for mitochondrial DNA, which is not methylated and is present in many copies. However, the MSRE method is tedious, expensive, requires relatively large amounts of radioactive nucleotides, and can test only a small number of CpG sites per fragment because only ~20% of all CpG sites fall within the recognition sequence of a known MSRE. If a given fragment contains many CpG sites and only one or a few are unmethylated, the sequence is often scored as unmethylated. MSRE provides the best-controlled method of methylation analysis, but low throughput and other shortcomings mean that it cannot form the basis for a whole-genome methylation profiling platform.
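The coverage limitation of the MSRE approach can be sketched in a few lines of Python. This is an illustration only: it considers a single enzyme (HpaII, 5'-CCGG-3'), the toy sequence and function names are hypothetical, and the result for a short toy sequence says nothing about the genome-wide ~20% figure quoted above.

```python
def cpg_positions(seq):
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def hpaii_testable_cpgs(seq):
    """CpG positions that form the central dinucleotide of 5'-CCGG-3'."""
    seq = seq.upper()
    testable = []
    for i in cpg_positions(seq):
        # The CpG must be flanked by C on the 5' side and G on the 3' side.
        if i >= 1 and seq[i - 1:i + 3] == "CCGG":
            testable.append(i)
    return testable


toy = "ACGTCCGGTACGCGATCCGGA"  # assumed toy fragment
all_cpgs = cpg_positions(toy)
testable = hpaii_testable_cpgs(toy)
print(f"{len(testable)} of {len(all_cpgs)} CpG sites lie within an HpaII site")
```

CpG sites outside any recognition sequence are invisible to the enzyme, which is why a restriction-based assay samples only a subset of the CpG sites in a fragment.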

Numerous other PCR-based methods for rapid methylation profiling of single or small numbers of CpG sites have been developed; examples are methylation-sensitive PCR (MSP; Steigerwald et al., 1990), COBRA (Eads and Laird, 2002) and methyl-light (Trinh et al., 2001). These methods are fast and inexpensive but can test only small numbers of CpG sites; they are unsuitable for unbiased whole-genome methylation profiling. After specific methylation abnormalities have been found to be associated with a given disorder, these focused methods might be found to be appropriate for diagnostic and prognostic tests in clinical samples.

Microarray analysis has been applied with considerable success (e.g., Gitan et al., 2002). However, microarray methods cannot address the methylation status of repeated sequences (which contain the majority of 5-methylcytosine in the genome; Rollins et al., 2006), and CpG islands give rise to high noise levels as a result of their high G + C contents. Microarrays cannot examine the methylation status of each CpG dinucleotide. Again, while this method has its advantages, it is not suited to whole-genome methylation profiling.

An important advance in methylation profiling came with the introduction of bisulfite genomic sequencing (BGS) by Susan Clark and Marianne Frommer in 1994 (Clark et al., 1994). BGS depends on the ability of sodium bisulfite to hydrolytically deaminate the 4 position of cytosine, thereby converting the base to uracil. A methyl group at the 5 position prevents bisulfite from adding across the 5-6 double bond, which renders 5-methylcytosine resistant to bisulfite conversion. PCR amplification followed by DNA sequencing produces a C lane in which each band corresponds to what was a 5-methylcytosine in the starting DNA; all unmethylated cytosines are sequenced as thymines. BGS was an important advance over earlier methods of genomic sequencing (Church and Gilbert, 1984).
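The read-out of BGS described above can be modeled on a single strand. This is a minimal sketch, assuming complete conversion, a hypothetical toy sequence, and a hypothetical set of methylated positions; it does not model incomplete conversion or strand breakage.

```python
def bisulfite_convert(seq, methylated):
    """Model the sequenced read-out of one strand after bisulfite and PCR.

    seq        -- DNA string (A/C/G/T)
    methylated -- set of 0-based indices of 5-methylcytosines
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")   # unmethylated C becomes U and is sequenced as T
        else:
            out.append(base)  # 5-methylcytosine resists conversion
    return "".join(out)

toy = "ACGTCCGATCG"   # hypothetical sequence
meth = {1}            # assumption: only the first CpG cytosine is methylated
converted = bisulfite_convert(toy, meth)
print(converted)
```

Every cytosine that survives in the output marks a position that was methylated in the input, which is the basis of the C lane described above.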

However, BGS has severe drawbacks when applied to whole genome methylation profiling. First, it cannot be known if the thymines in the final sequence were thymines or cytosines in the starting material unless one sequences both in the presence and in the absence of bisulfite treatment and compares the results. This severely reduces the information content of DNA. As a result, the new ultrahigh throughput DNA sequencing methods cannot be used, as sequence reads are short and a large percentage of the sequences cannot be mapped to a single position in the genome. Very few repetitive sequences can be mapped at all. BGS is largely restricted to pre-selected regions of the genome where primers can be designed to selectively amplify the region of interest. Whole-genome methylation profiles cannot be obtained by this method, as many regions of the genome do not allow design of unique primer sets. CpG islands are especially problematic, as primer sites free of CpG dinucleotides cannot be found in most CpG islands. Second, bisulfite conversion requires that the DNA be single stranded; any double stranded DNA will be resistant to conversion and will be scored as methylated. As a result, bisulfite treatment must be performed under very harsh conditions (0.2 N sodium hydroxide at elevated temperature for several hours). Under these conditions bisulfite conversion and chain breakage are competing reactions, and bisulfite conversion only approaches completion when >95% of the DNA has been cleaved to less than 350 bp (Warnecke et al., 2002). This means that large amounts of starting DNA must be used and the DNA must be long. This prevents the use of DNA from paraffin sections, where the DNA is almost all <300 bp, and also prevents the use of small amounts of DNA, as in the case of early embryos, small tissue biopsies, and other cases in which large amounts of DNA are not available. Third, CpG dinucleotides in certain sequence contexts are inherently resistant to bisulfite conversion (Warnecke et al., 2002), and are scored as spurious sites of methylation. Fourth, the loss of all C-G base pairs introduces a large bias in the PCR amplification step in favor of PCR product derived from unconverted or methylated starting material (Warnecke et al., 1997). Each of these artifacts can be severe.

Together, the loss of sequence information upon bisulfite conversion, the strong PCR biases, the artifacts of bisulfite conversion, and the need for large amounts of long starting DNA render conventional BGS inappropriate for whole-genome methylation profiling by ultrahigh throughput DNA sequencing.
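The loss of sequence information can be made concrete by counting how many distinct short sequences remain distinguishable after conversion. The following is a minimal sketch, assuming fully unmethylated DNA and complete conversion, so that every C reads as T and the four-letter alphabet collapses to three letters; the word length k is an arbitrary illustrative choice.

```python
from itertools import product

def collapse(kmer):
    """Full conversion of an unmethylated k-mer: every C reads as T."""
    return kmer.replace("C", "T")

k = 4  # illustrative word length
all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
distinct_after = {collapse(x) for x in all_kmers}

# 4**k words exist before conversion, but only 3**k remain distinguishable.
print(len(all_kmers), "k-mers before,", len(distinct_after), "after conversion")
```

Because the number of distinguishable words shrinks from 4^k to 3^k, short reads from converted DNA are more likely to match several genomic positions.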

Over the past few years this laboratory has developed new methods to fractionate the normal human genome into methylated and unmethylated compartments and has determined the methylation status of CpG dinucleotides in excess of 30 million base pairs from the fractionated genomes in order to characterize the methylation landscape of the normal human genome (Rollins et al., 2006). In that work, new computational methods were developed that mapped annotated features of the genome onto very large assemblages of sequence data. Although this method, which depends on the enzymatic fractionation of DNA into methylated and unmethylated compartments, has provided information on the methylation status of more CpG sites than the sum total of all other methods, it remains incapable of whole-genome methylation profiling because of shortcomings that cannot be overcome with existing technology.

Examples of methylation abnormalities are identified by the method of Rollins et al. (2006). It should be noted that the method disclosed herein can be applied to any sequenced genome; mammary carcinoma is shown because highly abnormal methylation patterns are known to be present in the genomes of these cells and these genomes provide an excellent test system.

Previous studies from the Klimasauskas and Weinhold groups (Dalhoff et al., 2006a, 2006b) have shown that a wide variety of functional groups can be efficiently transferred by DNA methyltransferases to the 5 position of cytosines in DNA by means of synthetic AdoMet analogs in which the methyl group has been replaced by any of a wide variety of functional groups. Bulky groups such as biotin can be added to every recognition site for a given methyltransferase. Here DNA methyltransferase SssI can be used to transfer specific reactive groups to the 5 position of cytosines in every unmethylated CpG dinucleotide; non-CpG cytosines are not modified. If the cytosine is methylated, this reaction is blocked: only unmethylated CpG dinucleotides are derivatized. The most important aspect of the transferred group is that it alters base pairing during sequencing or during amplification by PCR so as to allow discrimination of CpG dinucleotides that were methylated or unmethylated in the starting DNA. The method is conceptually related to bisulfite genomic sequencing, but does not suffer from the deficiencies that render BGS unusable in whole-genome methylation profiling.

Example 1: Methods for genome-wide DNA methylation profiling based on DNA methyltransferase aided site-specific conversion of Cytosine in CpG islands

A superior method of methylation profiling based on a unique and innovative approach is developed. The Klimasauskas and Weinhold groups (Dalhoff et al., 2006a, 2006b) synthesized S-adenosyl-L-methionine (AdoMet) analogs in which the methyl group has been replaced by a variety of functional groups, and showed that DNA methyltransferases are able to specifically transfer these functional groups to the 5 position of cytosines in DNA CpG sites. Efficiency is essentially 100% (Dalhoff et al., 2006a, 2006b). Thus DNA methyltransferases have been used for sequence-specific, covalent attachment of larger chemical groups to DNA, providing new molecular tools for precise, targeted functionalization and labeling of large natural DNAs.

We take advantage of this capacity to use CpG site-specific DNA methyltransferases to modify the 5 position of C with suitable reactive functionalities that are subsequently capable of converting C to U. Such a conversion strategy meets the above mentioned criteria and overcomes the drawbacks in existing methods, therefore advancing the field of genome-wide DNA methylation profiling.

A novel genome-wide methylation profiling approach based on DNA methyltransferase aided site-specific conversion of C in unmethylated CpG dinucleotides is developed. In this approach, AdoMet analogs derivatized with C-reactive functionalities are used as substrates of DNA methyltransferases to transfer the reactive functionalities to the 5 position of C in unmethylated CpG dinucleotides (Fig. 1). These enzymatically attached reactive functionalities initiate highly efficient intramolecular reactions leading to conversion of C to U analogs in neutral aqueous solution. After conversion, high throughput DNA sequencing of the converted DNA provides a single base-resolution methylation profile; unmethylated cytosines are sequenced as thymines, while methylated cytosines are sequenced as cytosines.
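The expected read-out of this approach can be modeled on a single strand. The following is a minimal sketch, assuming complete enzymatic transfer and complete conversion, with a hypothetical toy sequence and a hypothetical set of methylated positions; only cytosines in unmethylated CpG dinucleotides are changed.

```python
def enzyme_aided_convert(seq, methylated):
    """Model the read-out after CpG-specific C to U conversion.

    seq        -- DNA string (A/C/G/T), one strand read 5'->3'
    methylated -- set of 0-based indices of 5-methylcytosines
    """
    out = list(seq.upper())
    for i in range(len(out) - 1):
        # Only a C followed by G (a CpG cytosine) is a substrate, and only
        # when it does not already carry a methyl group.
        if out[i] == "C" and out[i + 1] == "G" and i not in methylated:
            out[i] = "T"
    return "".join(out)

toy = "ACGTCCGATCG"   # hypothetical sequence
meth = {1}            # assumption: only the first CpG cytosine is methylated
converted = enzyme_aided_convert(toy, meth)
print(converted)
```

In this toy case the non-CpG cytosine at index 4 is retained, so the converted sequence keeps more of the original four-letter information than it would under conversion of all unmethylated cytosines.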

There are a number of innovative aspects to this method. First, in this new approach, the CpG site-specific bacterial DNA methyltransferase M.SssI, which methylates all CpG dinucleotides, transfers conversion chemical groups to the 5 position of cytosines in every unmethylated CpG dinucleotide; non-CpG cytosines and 5-methyl Cs are not modified due to the enzyme's strict CpG site recognition and high regioselectivity for the 5 position of C. This has distinctive advantages over traditional bisulfite chemistry including the specific conversion of only unmethylated CpG sites, instead of all unmethylated cytosines when bisulfite is used. This increases the information content in the sequences and facilitates alignment of reads to the genome. In addition, by not directly affecting all cytosines the conversion chemistry is less toxic and causes far less DNA damage than does BGS, which enables the sequencing of longer DNA fragments (current bisulfite protocols are often limited to the analysis of DNA fragments <500 bp).

Second, upon being modified with reactive groups at the 5 position of cytosines, further conversion is limited to the modified C's and occurs in an intramolecular fashion with high efficiency. Moreover, the versatility of conversion chemistries allows us to finely optimize the conversion to secure the least extent of DNA damage and retain the maximum DNA sequence information.

Bisulfite sequencing and the limitations thereof are discussed above. In contrast, due to its specificity, high efficiency and mild conversion conditions, DNA methyltransferase-aided conversion of C retains the important advantage of BGS (the ability to test every CpG dinucleotide for methylation) while avoiding its deficiencies such as loss of information content, reactivity of non-CpG cytosines, and the requirement for large amounts of long DNA as starting material. The new method is thus simpler and more amenable to automation and DNA sample preparation for high throughput next-generation sequencing than existing methods, providing a novel and robust approach to comprehensively profile genome-wide DNA methylation patterns.

Advances in chemistry and biology research into DNA methyltransferases have generated a novel tool kit to chemically modify the 5 position of C in unmethylated CpG's in order to initiate a conversion reaction on this specific C. Versatile chemical mechanisms are explored for suitable C to U conversion in DNA. For example, it has been reported that photo-irradiation can generate a 5,6-cyclobutane intermediate between an alkene and the 5,6 position double bond of a C via a 2+2 cycloaddition, leading to the interruption of the conjugate system, and deamination at the 4 position (Haga et al., 1993). Notably, this chemistry has been applied to on-DNA conversion of C to U (Matsumura et al., 2008, Fujimoto et al., 2010).

Example 2: Photochemical conversion of 5 position-modified deoxycytidine to deoxyuridine analog.

For the enzyme-aided site-specific C to U conversion, it is a prerequisite that the chemical conversion moiety attached at the 5 position be able to trigger an intramolecular reaction resulting in C to U conversion. Since such an example has never been reported, we started our research by designing and synthesizing deoxycytidines with their 5 position derivatized with photo-reactive moieties. We designed 5 position-modified C by attaching a double bond-containing chain with 1 to 5 carbon units inserted between the double bond and the 5 position of C. We have synthesized one of the model compounds, 5-allyloxymethyl-dC (5-AOMC), in which the double bond is 3 carbon units from the 5 position, following the synthetic route shown in Fig. 2. The product was fully characterized by HRMS and ¹H NMR.

Photo-irradiation of 5-AOMC at 300 nm was then conducted in aqueous solution for up to 12 h, and the reaction products separated and monitored by HPLC and UV absorption. MS and HPLC profiles indicate that 5-AOMC (MW 298) is converted photochemically to a new product 5-AOMU (MW 299) with high efficiency (Fig. 3).
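The one-unit mass difference between 5-AOMC (MW 298) and 5-AOMU (MW 299) is what is expected when the exocyclic amino group of a cytosine is replaced by a carbonyl oxygen. A minimal sketch, assuming rounded average atomic masses and using the unmodified bases cytosine (C4H5N3O) and uracil (C4H4N2O2) as stand-ins for the modified nucleosides, since the 5 position substituent is unchanged by deamination:

```python
# Rounded average atomic masses in g/mol (assumed values for illustration).
MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def molar_mass(formula):
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(MASS[el] * n for el, n in formula.items())

cytosine = {"C": 4, "H": 5, "N": 3, "O": 1}   # C4H5N3O
uracil = {"C": 4, "H": 4, "N": 2, "O": 2}     # C4H4N2O2

# Deamination replaces NH (net) with O, so the shift is O - N - H.
shift = molar_mass(uracil) - molar_mass(cytosine)
print(f"mass shift on C to U conversion: {shift:+.3f}")
```

The shift of about +1 mass unit is independent of the substituent at the 5 position, which is consistent with the reported values of 298 and 299.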

Both C and 5-methyl-C remain intact after photo-irradiation under the same conditions. UV absorption spectra of the new photochemically generated product (5-AOMU) reveal maximum absorption at 265 nm, typical of U, while the starting material (5-AOMC) shows the expected absorption of C with λmax = 274 nm (Fig. 4). The results show that 5-AOMC is converted to a U analog by means of photo-irradiation. To further verify this conversion, 5-AOMU was synthesized and characterized. The mixture of the photochemically generated product from 5-AOMC and the synthesized 5-AOMU shows a single peak in HPLC (Fig. 5), indicating that the new product is identical to 5-AOMU. The photochemically generated 5-AOMU was also characterized by ¹H NMR. These data clearly demonstrate that modification at the 5 position of C with an alkene moiety can lead to formation of a U analog by photo-irradiation under mild conditions with high efficiency.

Example 3: Site specific photochemical conversion of C to U analog in DNA

To test the feasibility of the photochemistry approach for converting C to U on a real DNA chain, a fully protected 5-AOMC phosphoramidite was synthesized (Fig. 6) and 5-AOMC (C*) was incorporated into a 16mer oligonucleotide 5'-TACGA(C*)GAGTGCGGCA-3' via standard oligonucleotide synthesis. After HPLC purification, the modified 16mer was characterized by MALDI-TOF MS, yielding a peak with MW of 5001, equal to the calculated mass. Then photo-irradiation of oligo 5'-TACGA(C*)GAGTGCGGCA-3' was conducted to convert C* to a U analog. The outcome for conversion of C* in DNA was detected by using single base extension-MS analysis (Fig. 7). In this analysis, the photo-irradiated oligonucleotide was annealed with a primer 5'-TGCCGCACTC-3' (MW 2964), then incubated with ddNTPs and DNA polymerase. With the aid of DNA polymerase, one of the 4 ddNTPs can be selectively incorporated onto the 3' end of the primer. Photochemistry-induced conversion of C to U should direct the incorporation of ddA, resulting in an elongated primer with a MW of 3261; otherwise ddG should be incorporated, giving an elongated primer of MW 3277.

Actual MS measurement of the one-base elongated primer shows the MW as 3261, which matches the MW of the primer extended with a single ddA; thus the existence of a complementary U on the template was confirmed, and the photochemical conversion of the C* to a U analog on DNA demonstrated (Fig. 7A). In the control experiment, photo-irradiation of the unmodified counterpart 5'-TACGACGAGTGCGGCA-3' was conducted under the same conditions. The single base extension MS analysis of the resulting primer elongation product gave a MW of 3277, which matches the MW of the primer extended with a single ddG (Fig. 7B), unambiguously showing that C can be converted to the U analog in a DNA fragment with high efficiency only if its 5 position is modified with a photo-reactive moiety. We also measured possible photoirradiation-induced DNA damage. Under the conditions we used (300 nm, over 10 h) no DNA damage was observed.
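The logic of the single base extension analysis can be sketched as follows. This is a minimal illustration, assuming the template and primer sequences given above, writing the U analog simply as U, and taking the mass added by each dideoxynucleotide from the reported values (3261 - 2964 for ddA and 3277 - 2964 for ddG); increments for ddC and ddT are not given in the text and are therefore omitted.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}

PRIMER = "TGCCGCACTC"
PRIMER_MW = 2964
# Mass added per incorporated ddN, derived from the reported product masses.
INCREMENT = {"A": 3261 - 2964, "G": 3277 - 2964}

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def next_template_base(template, primer):
    """Return the template base (5'->3' numbering) that directs extension."""
    site = reverse_complement(primer)   # template segment bound by the primer
    start = template.find(site)
    if start <= 0:
        raise ValueError("primer does not anneal with a base left to copy")
    return template[start - 1]          # base just 5' of the binding site

def extension_product(template, primer=PRIMER):
    """Return (incorporated ddN, expected MW of the elongated primer)."""
    added = COMPLEMENT[next_template_base(template, primer)]
    return added, PRIMER_MW + INCREMENT[added]

print(extension_product("TACGAUGAGTGCGGCA"))  # converted template
print(extension_product("TACGACGAGTGCGGCA"))  # unmodified control
```

The primer anneals to the ten 3'-terminal bases of the 16mer, so the first base copied is the one at the position of C*, which is why a single mass measurement reports on conversion at that site.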

The above studies show that 5 position modification of C with a photo-reactive moiety provides us with a new route for highly efficient C to U conversion, without the drawbacks of bisulfite conversion. Therefore the feasibility of CpG site-specific conversion of C to U is demonstrated and the results offer the basis and rationale for the DNA methyltransferase-aided DNA methylation profiling method described.

Example 4: Exploration of C-U conversion chemistry triggered by a 5 position reactive moiety

A library of 5 position-derivatized deoxycytidine model compounds is used to systematically screen and optimize the C-reactive functionalities that most efficiently convert C into U. The model compounds are partially listed in Fig. 8.

Species of AdoMet analogs used as DNA methyltransferase substrates are sulfonium derivatives of AdoHcy, in which photoactive groups (R) replace the methyl group in AdoMet (Fig. 8). Such AdoMet analogs containing photoactive groups (R) serve as substrates for a CpG specific DNA methyltransferase, such as M.SssI, and as a result of the enzymatic reaction, these photoactive groups (R) can be transferred to the 5 position of C of unmethylated CpG dinucleotides in DNA.

(1) The photoactive groups (R) include alkenes (Fig. 8, a, b, c, d, e) which form a cyclobutane intermediate with the 5,6 double bond of C upon photo-irradiation, and further result in C to U conversion. The photoactive groups (R) also include alkenes modified with R₁, R₂ and R₃, which facilitate the photoreaction. R₁, R₂ and R₃ are hydrogen, alkyl, aryl, amide, carboxylic acid, ester, nitro group, cyano group, aldehyde, ketone and sulfonamide. Such an alkene or R₁, R₂ and R₃ modified alkenes can be separated from the sulfur atom in AdoMet analogs by a carbon chain of various length (n = 1-8). Either alkyl chains or chemically cleavable structures (such as ester linkers) can be inserted between the above mentioned alkenes and the sulfur atom in AdoMet analogs.

(2) The photoactive groups (R) include the above mentioned alkene conjugated butadiene moiety (Fig. 8, c). Such butadienes are linked to the sulfur atom in AdoMet analogs by a carbon chain of various length.

(3) The photoactive groups (R) include maleic anhydride and maleimide analogs (Fig. 8, f, g, h, i) which are linked to the sulfur atom in AdoMet analogs by either a saturated or double bond and containing a carbon chain of various length.

(4) The photoactive groups (R) also include N,N double bond containing functionalities that can be used for photoreaction with the 5,6 double bond of C upon photo-irradiation. For example, R can be tetrazole-containing moieties (Fig. 8, j, k, l) which are linked to the sulfur atom in AdoMet analogs by either a saturated or double bond and containing a carbon chain of various length.

Example 5: Syntheses of AdoMet analogs containing photoreactive groups

Synthesis of AdoMet analogs with the desired extended side chains is carried out by regioselective S-alkylation of AdoHcy with corresponding triflates or bromides of the photoreactive moieties under mildly acidic conditions. A diastereomeric mixture of sulfonium is expected after alkylation of AdoHcy, and further RP-HPLC (reverse phase high performance liquid chromatography) purification is conducted to isolate the enzymatically active S-epimer for the subsequent transfer reaction. Examples of the synthesis route for AdoMet analogs are shown in Fig. 9. Triflates or bromides of the photoreactive moieties needed for AdoMet analog synthesis can be synthesized using commercially available starting materials as shown in Fig. 10.

Example 6: DNA methyltransferase guided transfer of C-reactive functionalities on CpG-containing DNA and subsequent site-specific C to U conversion.

AdoMet derivatives are designed based on the results of the experiments above and a possible library of AdoMet derivatives to be synthesized is identified. Synthesis of AdoMet analogs with the desired extended side chains is carried out following reported methods (Dalhoff et al., 2006a, 2006b) by regioselective S-alkylation of AdoHcy with corresponding triflates or bromides of the photoreactive moieties under mildly acidic conditions (Fig. 9). A diastereomeric mixture of sulfonium is expected after alkylation of AdoHcy, and further RP-HPLC purification is used to isolate the enzymatically active S-epimer for subsequent transfer reactions. With AdoMet analogs as substrates, a CpG site-specific DNA methyltransferase (M.SssI) is used to transfer a photo-reactive group to the 5 position of unmethylated cytosines on both synthetic DNA and genomic DNA samples (Fig. 11). MALDI-TOF MS is used to evaluate the efficiency of reactive moiety transfer onto the DNA. Further site-specific photochemical conversion of C to U is carried out using the optimized reaction conditions obtained above. Single-base extension experiments (Fig. 7) are used to study on-DNA conversion efficiency.

The ideal AdoMet analogs are identified, and the conditions for both enzymatic transfer of modifying functionality and on-DNA conversion are optimized. The photo-irradiation conditions are further optimized by screening for optimal wavelength, intensity and other conditions (temperature, time, buffer, pH, and auxiliary ingredients) to maximize the conversion yield and minimize possible side reactions on DNA.

Example 7: Combined DNA methyltransferase-aided conversion chemistry and next-generation DNA sequencing to achieve real-world DNA methylation profiling

After validation of the DNA methyltransferase-aided CpG site specific conversion of C to U analog, methylation patterns in real-world genomic DNA preparations are determined from the mammary carcinoma cell line MCF-7, for which we have very large amounts of methylation data. DNA is purified by proteinase K digestion, phenol extraction, and dialysis against 10 mM Tris-HCl, pH 7.2. DNA then is reacted with the optimal AdoMet derivative identified above and with M.SssI (New England Biolabs, Inc.). The derivatized DNA then is subjected to high throughput DNA sequencing (Fig. 11), and CpG dinucleotides in the NCBI reference sequence that appear as TpG are scored as unmethylated CpG dinucleotides in the starting DNA. The required software has been developed and validated (Edwards et al., 2010).
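The scoring rule described above (a reference CpG that reads as TpG is scored as unmethylated) can be sketched as follows. This is a hypothetical illustration only and is not the software of Edwards et al. (2010); it assumes a read that is already aligned without gaps to the same strand as the reference and has the same length. A read from the opposite strand would show the conversion as CpA rather than TpG and would need to be reverse-complemented first.

```python
def score_cpgs(reference, read):
    """Classify every reference CpG from a gapless, same-strand aligned read.

    Returns {index of the CpG cytosine: call}.
    """
    if len(reference) != len(read):
        raise ValueError("sketch assumes an aligned read of equal length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue
        observed = read[i:i + 2]
        if observed == "TG":
            calls[i] = "unmethylated"   # C was converted and reads as T
        elif observed == "CG":
            calls[i] = "methylated"     # C was protected from conversion
        else:
            calls[i] = "ambiguous"      # sequence variant or read error
    return calls

reference = "ACGTCCGATCG"   # hypothetical reference segment
read = "ACGTCTGATTG"        # hypothetical converted read
calls = score_cpgs(reference, read)
print(calls)
```

In practice the calls from many reads covering the same CpG would be pooled to estimate the fraction of molecules methylated at that site.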

After conversion of unmethylated cytosines to uracil analogs, an inert "tail" from the added reactive groups may remain at the 5 position of the cytosine. This tail extends into the major groove of the DNA helix, but it is well known that modification of this position does not interfere with incorporation of nucleotides during polymerase extension, and this position has been modified in a large number of applications (Ju et al., 2006) including polymerase-catalyzed labeling of DNA and RNA with bulky adducts such as biotin, digoxigenin, and large fluorescent moieties. Such modifications do not markedly interfere with the efficiency or specificity of dNTP incorporation. This allows coupling the well established sample preparation protocols (emulsion PCR or bridging PCR) and high-throughput DNA sequencing technologies with the enzyme-aided CpG site specific C to U conversion of genomic DNA. Therefore the workflow (Fig. 11) of this new genome-wide DNA methylation profiling method consists of: 1) enzymatic transfer of C-reactive moiety to 5 position of C; 2) photoreactions leading to C to U analog conversion; 3) sequencing sample preparation; and 4) integrated high-throughput DNA sequencing and data interpretation using any sequencing platform adapted to our enzyme-aided and photochemistry based C to U conversion. Validation of this work flow establishes the basis for further automation of the protocols.

Example 8: Detection of non-CpG methylation

Some cell types contain non-CpG methylation (Lister et al., 2009), the function of which is currently unknown. We can map ~22% of all non-CpG methylation by a simple modification of the M.SssI protocol: substitution of M.CviJI, which methylates the cytosine in GpC dinucleotides (Xu et al., 1998), for M.SssI, which is specific for CpG dinucleotides.
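The subset of non-CpG cytosines that a GpC-specific methyltransferase could address can be enumerated for any sequence. The following is a minimal sketch, assuming a hypothetical toy sequence and treating every cytosine preceded by G as a potential substrate; any additional sequence preference of the enzyme is not modeled.

```python
def non_cpg_cytosines(seq):
    """Indices of cytosines that are not followed by G."""
    seq = seq.upper()
    return [i for i, b in enumerate(seq) if b == "C" and seq[i + 1:i + 2] != "G"]

def gpc_cytosines(seq):
    """Indices of non-CpG cytosines that are preceded by G (GpC context)."""
    seq = seq.upper()
    return [i for i in non_cpg_cytosines(seq) if i > 0 and seq[i - 1] == "G"]

toy = "AGCTCCGAGCATCG"   # hypothetical sequence
non_cpg = non_cpg_cytosines(toy)
gpc = gpc_cytosines(toy)
print(len(gpc), "of", len(non_cpg), "non-CpG cytosines are in a GpC context")
```

Applied to a whole genome, the same count gives the fraction of non-CpG cytosines reachable by a GpC-specific enzyme.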

Example 9: Methylation profiling and single-molecule sequencing

Our technology also can be used in single-molecule sequencing technologies that identify bases by electronic properties as the DNA passes through nanopore chambers. AdoHcy derivatives are developed that allow accurate identification of derivatized cytosines so as to distinguish cytosine from 5-methylcytosine. This represents a simple extension of the technology that allows the extremely fast and economical mapping of genomic methylation patterns. The Methyl-seq method elaborated here provides converted DNA that can be sequenced not only by all current methods but is also perfectly suited to the new single molecule methods under development.

Example 10: Photo-conversion of C to U mediated by Visible Light Irradiation Using Ru²⁺ Complex as Photocatalyst

Among versatile chemical mechanisms which can be explored for suitable C to U conversion in DNA, photo-irradiation mediated [2+2] cycloaddition leads to a 5,6-cyclobutane intermediate between an alkene and the 5,6 position double bond. Due to the interruption of the conjugate system, further deamination at the 4 position (Haga et al., 1993) can readily occur, resulting in formation of U. Notably, this chemistry has been applied to on-DNA conversion of C to U (Matsumura et al., 2008, Fujimoto et al., 2010).

We have demonstrated that a C with its 5 position modified with an alkene-bearing chemical group can be photoconverted to a U analog by 300nm photo-irradiation using the model compound 5-allyloxymethyl-dC (5-AOMC) .

Here, in addition to a variety of 5 position modifying groups which facilitate the C to U photo-conversion, mild and efficient photo reaction conditions are applied in C to U conversion in real-world DNA by using visible light combined with the use of the photocatalyst tris(bipyridine)ruthenium(II) chloride (Ru(bpy)₃²⁺).

Irradiation of Ru(bpy)₃²⁺ with visible light (λmax = 452 nm) produces a photoexcited state Ru*(bpy)₃²⁺ which can abstract an electron from a relatively electron-rich amine base. The resulting Ru(bpy)₃⁺ complex then reduces an aryl enone to the key radical anion intermediate involved in [2+2] cycloaddition. On the other hand, Ru*(bpy)₃²⁺ can also pass an electron to an electron acceptor; the resulting Ru(bpy)₃³⁺ then oxidizes an electron-rich styrene, affording a radical cation that undergoes subsequent [2+2] cycloaddition. Thus, through either a reductive quenching cycle or an oxidative quenching cycle, Ru*(bpy)₃²⁺ is a powerful photoredox catalyst for [2+2] photocycloaddition of olefins (Ischay et al., 2008; 2010; Du et al., 2009). Its two-path photocatalysis mechanism makes Ru(bpy)₃²⁺ a versatile photocatalyst which can engage [2+2] cycloaddition of both electron-deficient and electron-rich olefins (Fig. 12). This advantage provides us with a wide range of both electron-deficient and electron-rich olefins from which we screen and optimize the suitable double bond containing species that can be used to design and synthesize AdoMet analogs. Upon attachment of photo-reactive moieties at the 5-position of CpG sites by using DNA methyltransferase and such AdoMet analogs as substrate, subsequent C to U conversion is achieved by Ru²⁺-visible light photocatalysis.

A variety of complementary double bond species modified with either electron-withdrawing groups or electron-donating groups can be used to derivatize AdoMet analogs. The distance between these modified double bonds and the 5,6 double bond of C is also taken into consideration for a space and energy favorable intramolecular cycloaddition.

Fig. 13 shows two examples of Ru(bpy)₃²⁺-visible light catalyzed photo-conversion of 5-position modified C to U in DNA. Examples of Ru(bpy)₃²⁺ complex can be Ru(bpy)₃Cl₂ and Ru(bpy)₃(PF₆)₂. The visible light photocatalysis reaction mixture contains the Ru(bpy)₃²⁺ complex, tertiary amines (for example N,N-diisopropylethylamine (i-Pr₂NEt)), quaternary ammonium cations (such as MV(PF₆)₂, methyl viologen, or MV²⁺), Mg²⁺ or Li⁺. The light source can be ordinary household bulbs or lasers (with wavelength from 400 nm to 600 nm).
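The milder character of visible light relative to the 300 nm irradiation used in the earlier examples can be expressed as photon energy, E = hc/λ. A minimal sketch using the defined SI values of the physical constants; the wavelengths are those mentioned in the text:

```python
PLANCK = 6.62607015e-34      # J s
LIGHT_SPEED = 2.99792458e8   # m / s
ELECTRON_VOLT = 1.602176634e-19  # J

def photon_energy_ev(wavelength_nm):
    """Energy of one photon in electron volts, E = h * c / lambda."""
    return PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9) / ELECTRON_VOLT

for nm in (300, 400, 452, 600):
    print(f"{nm} nm: {photon_energy_ev(nm):.2f} eV per photon")
```

A photon at 452 nm carries roughly two thirds of the energy of a photon at 300 nm, which is one way to quantify why visible light irradiation is expected to be gentler on DNA.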

Example 11: Further Testing of Photo-Conversion Reaction

An alternative method to test the conversion of a C with its 5 position modified with a photoactive group within a synthetic single stranded DNA molecule to a U analog utilizes a synthetic double-stranded DNA molecule, which takes advantage of a simple gel-based restriction endonuclease assay.

Both strands of a DNA molecule of at least 50 base pairs are synthesized as oligonucleotides containing one modified C in a CpG context. The C within the same CpG moiety in both strands is replaced with one of the 5 position modified photo-convertible analogs described earlier using standard phosphoramidite-based synthetic chemistry. The oligonucleotides are designed in such a way that, after PCR, the resulting double-stranded DNA molecule will be cleaved with one restriction enzyme in the absence of photo-conversion and a different restriction enzyme in the presence of photo-conversion. Confirmation of the photo-conversion event is via detection of the resulting restriction fragments on agarose gels, or following denaturation, the single strand fragments via MALDI-TOF mass spectrometry. Additional restriction sites allow for ease of discrimination of fragments on gels, while additional modifiable CpG sites elsewhere in the length of the synthetic DNA molecule provide further options for testing photo-conversion of multiple C analogs at various short distances from each other. An example of such an assay is shown (Fig. 14).
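The design principle of such an assay, one enzyme cutting before conversion and a different enzyme cutting after, can be checked in silico. The sketch below is purely illustrative and is not the design of Fig. 14: the enzyme pair (PmlI, site CACGTG; NlaIII, site CATG) and the shortened 16-base toy sequence are assumptions chosen for the example, and recognition sequences should be confirmed against the supplier's documentation before use. Both sites are palindromic, so searching one strand suffices.

```python
# Illustrative enzyme set (an assumption, not taken from this work).
SITES = {"PmlI": "CACGTG", "NlaIII": "CATG"}

def cutters(seq):
    """Names of the enzymes in SITES whose site occurs in seq."""
    return sorted(name for name, site in SITES.items() if site in seq)

def convert_top(seq, i):
    """PCR product from the top strand after its CpG C at index i reads as T."""
    if seq[i:i + 2] != "CG":
        raise ValueError("index does not point at a CpG cytosine")
    return seq[:i] + "T" + seq[i + 1:]

def convert_bottom(seq, i):
    """PCR product from the bottom strand, written as top-strand sequence.

    The bottom-strand CpG C pairs with the G at i + 1, so after conversion
    and PCR the top strand reads A at that position.
    """
    if seq[i:i + 2] != "CG":
        raise ValueError("index does not point at a CpG cytosine")
    return seq[:i + 1] + "A" + seq[i + 2:]

toy = "GGATTCACGTGAATCC"   # hypothetical, shortened test sequence
cpg = toy.index("CG")

print("unconverted:", cutters(toy))
print("top strand converted:", cutters(convert_top(toy, cpg)))
print("bottom strand converted:", cutters(convert_bottom(toy, cpg)))
```

With this toy design the products derived from either converted strand lose the first site and gain the second, so a switch in the cutting enzyme reports on photo-conversion.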

References

Church GM, Gilbert W. (1984) Genomic sequencing. Proc Natl Acad Sci U S A. 81, 1991-1995.

Clark SJ, Harrison J, Paul CL, Frommer M. (1994) High sensitivity mapping of methylated cytosines. Nucleic Acids Res. 22, 2990-2999.

Dalhoff C, G Lukinavicius, S Klimasauskas and E Weinhold (2006a) Direct transfer of extended groups from synthetic cofactors by DNA methyltransferases Nat Chem Biol 2:31-2.

Dalhoff C, G Lukinavicius, S Klimasauskas and E Weinhold (2006b) Synthesis of S-adenosyl-L-methionine analogs and their use for sequence-specific transalkylation of DNA by methyltransferases Nat Protoc 1, 1879-86.

Du J, Yoon TP (2009) Crossed Intermolecular [2+2] Cycloadditions of Acyclic Enones via Visible Light Photocatalysis J. Am. Chem. Soc. 131 (41), pp 14604-14605.

Eads CA, Laird PW. (2002) Combined bisulfite restriction analysis (COBRA). Methods Mol Biol 200, 71-85.

Edwards JR, O'Donnell AH, Rollins RA, Peckham HE, Lee C, Milekic MH, Chanrion B, Fu Y, Su T, Hibshoosh H, Gingrich JA, Haghighi F, Nutter R, Bestor TH (2010) Chromatin and sequence features that define the fine and gross structure of genomic methylation patterns. Genome Res. 20, 972-980.

Fujimoto K, Konishi-Hiratsuka K, Sakamoto T, Yoshimura Y (2010) Site-specific cytosine to uracil transition by using reversible DNA photo-crosslinking. ChemBioChem 11(12), 1661-1664.

Gitan R, Shi H, Chen C, Yan P, Huang T (2002) Methylation-Specific Oligonucleotide Microarray: A New Potential for High-Throughput Methylation Analysis Genome Res. 12, 158-164.

Haga N, Ogura H (1993) Photocycloaddition of cytosine and 2'-deoxycytidines to 2,3-dimethyl-2-butene Heterocycles 36(8), 1721-1724.

Ischay MA, Anzovino ME, Du J, Yoon TP (2008) Efficient Visible Light Photocatalysis of [2+2] Enone Cycloadditions J. Am. Chem. Soc. 130 (39), pp 12886-12887.

Ischay MA, Lu Z, Yoon TP (2010) [2+2] Cycloadditions by Oxidative Visible Light Photocatalysis J. Am. Chem. Soc. 132 (25), pp 8572-8574.

Kriukienė E, Labrie V, Khare T, Urbanavičiūtė G, Lapinaitė A, Koncevičius K, Li D, Wang T, Pai S, Ptak C, Gordevičius J, Wang SC, Petronis A, Klimašauskas S (2013) DNA unmethylome profiling by covalent capture of CpG sites. Nature Communications, 4, 2190.

Laird PW (2010) Principles and challenges of genome-wide DNA methylation analysis. Nature Reviews Genetics 11, 191-203.

Lukinavicius G, Lapinaite A, Urbanaviciute G, Gerasimaite R, Klimasauskas S (2012) Engineering the DNA cytosine-5 methyltransferase reaction for sequence-specific labeling of DNA Nucleic Acids Res, 40(22), 11594-602.

Matsumura T, Ogino M, Nagayoshi K, Fujimoto K (2008) Photochemical site-specific mutation of 5-methylcytosine to thymine Chemistry Letters 37, 94-95.

Robertson KD (2005) DNA methylation and human disease Nature Reviews Genetics 6, 597-610.

Rollins R, Haghighi F, Edwards J, Das R, Zhang M, Ju J, and Bestor TH (2006) Large-scale structure of genomic methylation patterns Genome Res. 16, 157-163.

Steigerwald SD, Pfeifer GP, Riggs AD. (1990) Ligation-mediated PCR improves the sensitivity of methylation analysis by restriction enzymes and detection of specific DNA strand breaks. Nucleic Acids Res. 18, 1435-1439.

Suzuki MM, Bird A (2008) DNA methylation landscapes: provocative insights from epigenomics. Nature Reviews Genetics 9, 465-476.

Trinh BN, Long TI, Laird PW. (2001) DNA methylation analysis by MethyLight technology. Methods 25, 456-462.

Waalwijk C, Flavell RA. (1978) DNA methylation at a CCGG sequence in the large intron of the rabbit beta-globin gene: tissue-specific variations. Nucleic Acids Res 5, 4631-4634.

Warnecke PM, Stirzaker C, Song J, Grunau C, Melki JR, Clark SJ. (2002) Identification and resolution of artifacts in bisulfite sequencing. Methods 27, 101-107.

Warnecke PM, Stirzaker C, Melki JR, Millar DS, Paul CL, Clark SJ. (1997) Detection and measurement of PCR bias in quantitative methylation analysis of bisulphite-treated DNA. Nucleic Acids Res. 25, 4422-4426.

Claims

What is claimed is:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H.

2. The compound of claim 1, wherein R is

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

or when R is

and n is 1, at least one of R₁ or R₂ is other than H.

3. The compound of claim 1, wherein R is

4. The compound of any one of claims 1-3, wherein R' is H or alkyl.

5. A composition of matter comprising a compound having the structure:

wherein R is

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H,

attached to a CpG methyltransferase.

6. The composition of matter of claim 5, wherein R is

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

8. The composition of matter of any one of claims 5-7, wherein R' is H or alkyl.

9. The composition of matter of any one of claims 5-8, wherein the compound is attached to the active site of the CpG methyltransferase.

10. The composition of matter of any one of claims 5-9, wherein the CpG methyltransferase is SssI methyltransferase.

11. The composition of matter of any one of claims 5-9, wherein the CpG methyltransferase is HhaI methyltransferase.

12. The composition of matter of any one of claims 5-9, wherein the CpG methyltransferase is CviJI methyltransferase.

13. A process of producing a derivative of a double-stranded DNA comprising contacting the double-stranded DNA with a CpG methyltransferase and an S-adenosylmethionine analog having the structure:

wherein R is a chemical group capable of being transferred from the S-adenosylmethionine analog by the CpG methyltransferase to a 5-carbon of a non-methylated cytosine of the double-stranded DNA, under conditions such that the chemical group covalently binds to the 5-carbon of the non-methylated cytosine of the double-stranded DNA, and thereby produces the derivative of the double-stranded DNA,

wherein the chemical group has the structure

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

and n is 1, at least one of R₁, R₂ or R₃ is other than H.

14. The process of claim 13, wherein the chemical group has the structure

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R is

or when R is

and n is 1, at least one of R₁ or R₂ is other than H.

15. The process of claim 13, wherein the chemical group has the structure

16. The process of any one of claims 13-15, wherein R' is H or alkyl.

17. The process of any one of claims 13-16, wherein the CpG methyltransferase is SssI methyltransferase.

18. The process of any one of claims 13-16, wherein the CpG methyltransferase is HhaI methyltransferase.

19. The process of any one of claims 13-16, wherein the CpG methyltransferase is CviJI methyltransferase.

20. The process of any one of claims 13-19, wherein the chemical group capable of being transferred from the S-adenosylmethionine analog by the CpG methyltransferase to the 5-carbon of the non-methylated cytosine of the double-stranded DNA permits photochemical deamination of a 4-position of the non-methylated cytosine when it is covalently bound to the 5-carbon of the non-methylated cytosine of the double-stranded DNA.

21. The process of any one of claims 13-20, wherein the non-methylated cytosine is immediately adjacent in sequence to a guanine in a single strand of the double-stranded DNA.

22. A method of determining whether a cytosine present within a double-stranded DNA sequence of known sequence is non-methylated comprising:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8;

c) sequencing the single strand so obtained in step b); and

d) comparing the sequence of the single strand determined in step c) to the sequence of a corresponding strand of the double-stranded DNA of which a derivative has not been produced,

wherein the presence of a uracil analog in the single strand of the derivative single strand instead of a cytosine at a predefined position in the corresponding strand of the double-stranded DNA of which a derivative has not been produced indicates that the cytosine at that position in the double-stranded DNA is non-methylated.
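By way of illustration only, the comparison recited in step d) may be sketched in software as follows. The sketch assumes that the derivatized single strand and the reference strand are already aligned and of equal length, and that the sequencing step reports the uracil analog as T; the function name call_unmethylated_cpgs and both of these conventions are hypothetical and form no part of the claimed method.

```python
def call_unmethylated_cpgs(reference: str, derivatized_read: str) -> list[int]:
    """Return 0-based positions of CpG cytosines in `reference` that are
    read as T in `derivatized_read`.

    Assumptions (illustrative only): both sequences are the same strand,
    pre-aligned and of equal length, and a uracil analog is reported as 'T'.
    """
    ref = reference.upper()
    read = derivatized_read.upper()
    if len(ref) != len(read):
        raise ValueError("sequences must be aligned and of equal length")

    positions = []
    for i in range(len(ref) - 1):
        # A CpG site is a cytosine immediately followed by a guanine.
        is_cpg = ref[i] == "C" and ref[i + 1] == "G"
        # C in the reference but T in the derivatized read suggests the
        # cytosine was derivatized and converted, i.e. was non-methylated.
        if is_cpg and read[i] == "T":
            positions.append(i)
    return positions


if __name__ == "__main__":
    # Reference CpG cytosines sit at positions 1, 4 and 7; the read retains
    # C at position 4, which would be consistent with a methylated site.
    print(call_unmethylated_cpgs("ACGTCGACG", "ATGTCGATG"))
```

A CpG cytosine that is still read as C is not reported, which under the same assumptions would correspond to a cytosine that was methylated and therefore not derivatized.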

23. The method of claim 22, wherein the chemical group has the structure

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

24. The method of claim 22, wherein the chemical group has the structure

25. The method of any one of claims 22-24, wherein R' is H or alkyl.

26. The method of any one of claims 22-25, wherein the CpG methyltransferase is SssI methyltransferase.

27. The method of any one of claims 22-25, wherein the CpG methyltransferase is HhaI methyltransferase.

28. The method of any one of claims 22-25, wherein the CpG methyltransferase is CviJI methyltransferase.

29. The method of any one of claims 22-28, wherein the non-methylated cytosine is immediately adjacent in sequence to a guanine in a single strand of the double-stranded DNA.

30. The method of any one of claims 22-29, wherein the chemical group capable of being transferred from the S-adenosylmethionine analog by the CpG methyltransferase to the 5 carbon of the non-methylated cytosine of the double-stranded DNA permits photochemical deamination of a 4 position of the non-methylated cytosine when it is covalently bound to the 5 carbon of the non-methylated cytosine of the double-stranded DNA.

31. The method of any one of claims 22-30, wherein in step c) the sequencing is sequencing by synthesis.

32. The method of claim 31, wherein the sequencing by synthesis comprises contacting the derivatized single strand with a DNA polymerase, a primer oligonucleotide, dATP, dCTP, dGTP, dTTP, and a dideoxynucleotide triphosphate having a detectable label attached thereto.

33. The method of claim 32, wherein the detectable label is radioactive or fluorescent.

34. The method of claim 32, wherein the detectable label is a mass tag.

35. The method of any one of claims 22-34, further comprising attaching the single strand to a solid support prior to step c).

36. A derivatized DNA molecule, wherein the derivatized DNA molecule differs from DNA by comprising a nucleotide residue which comprises a base having the following structure:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R'', CN, NO₂, C(O)R'', S(O)₂NHR'';

wherein X is O or NR'';

wherein R'' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R' is

37. The derivatized DNA molecule of claim 36, wherein R' is

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R'', CN, NO₂, C(O)R'', S(O)₂NHR'';

wherein X is O or NR'';

wherein R'' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

with the proviso that, when R' is

or when R' is

and n is 1, at least one of R₁ or R₂ is other than H.

38. The derivatized DNA molecule of claim s

molecule differs from DNA by comprising a nucleotide residue which comprises a base having the following structure:

wherein R'' is

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8,

and wherein the sugar is a sugar of the nucleotide residue.

41. The derivatized DNA molecule of claim 40, wherein R'' is

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

42. The derivatized DNA molecule of claim 40, wherein R'' is

43. The derivatized DNA molecule of any one of claims 40-42, wherein R' is H or alkyl.

44. A kit for derivatizing a double-stranded DNA molecule or for determining whether a cytosine present within a double-stranded DNA sequence of known sequence is non-methylated comprising:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8; and

b) instructions for use.

45. The kit of claim 44, wherein R is

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

47. The kit of any one of claims 44-46, wherein R' is H or alkyl.

48. The kit of any one of claims 38-40, further comprising a CpG methyltransferase.

49. The kit of claim 41, wherein the CpG methyltransferase is SssI methyltransferase.

50. The kit of claim 41, wherein the CpG methyltransferase is HhaI methyltransferase.

51. The kit of claim 41, wherein the CpG methyltransferase is CviJI methyltransferase.

52. A method of determining whether a cytosine present within a double-stranded DNA sequence of known sequence is non-methylated comprising:

wherein R₁, R₂ and R₃ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8; and

wherein modification with the chemical group R on the cytosine at a predefined position in the double-stranded DNA indicates that the cytosine at that position in the double-stranded DNA is non-methylated.

53. The method of claim 52, wherein the chemical group has the structure

wherein R₁ and R₂ are independently H, alkyl, aryl, C(O)NH₂, C(O)R', CN, NO₂, C(O)R', S(O)₂NHR';

wherein X is O or NR';

wherein R' is H, alkyl or aryl; and

wherein n is an integer from 1 to 8.

54. The method of claim 52, wherein the chemical group has the structure

55. The method of any one of claims 52-54, wherein the CpG methyltransferase is SssI methyltransferase.

56. The method of any one of claims 52-54, wherein the CpG methyltransferase is HhaI methyltransferase.

57. The method of any one of claims 52-54, wherein the CpG methyltransferase is CviJI methyltransferase.

58. The method of any one of claims 52-54, wherein the CpG methyltransferase is M.SssI methyltransferase.

59. The method of any one of claims 52-54, wherein the CpG methyltransferase is M.HhaI methyltransferase.

60. The method of any one of claims 52-54, wherein the CpG methyltransferase is M.CviJI methyltransferase.

61. The method of any one of claims 52-60, wherein determining whether a cytosine at a predefined position in the double-stranded DNA has been modified with the chemical group R comprises converting the modified cytosine to a uracil analog.

62. The method of claim 61 comprising conversion of a modified cytosine residue in the DNA derivative to a uracil analog by a photo-catalyzed reaction.

63. The method of claim 62, wherein the photo-catalyzed reaction is carried out using a Tris(bipyridine)ruthenium(II) chloride (Ru(bpy)₃²⁺) catalyst.

64. The method of claim 63, wherein the (Ru(bpy)₃²⁺) catalyst is Ru(bpy)₃Cl₂.

65. The method of claim 63, wherein the (Ru(bpy)₃²⁺) catalyst is Ru(bpy)₃(PF₆)₂.

66. The method of any one of claims 62-65, wherein the light source for the photo-catalyzed reaction is a household bulb.

67. The method of any one of claims 62-65, wherein the light source for the photo-catalyzed reaction is a laser.

68. The method of claim 67, wherein the laser has a wavelength of 400nm-600nm.