NZ793130A

NZ793130A - Bisulfite-free, base-resolution identification of cytosine modifications

Info

Publication number: NZ793130A
Application number: NZ793130A
Authority: NZ
Inventors: Yibin Liu; Chunxiao Song
Original assignee: Ludwig Institute For Cancer Research Ltd
Priority date: 2018-01-08
Filing date: 2019-01-08
Publication date: 2022-10-28

Abstract

This disclosure provides methods for bisulfite-free identification in a nucleic acid sequence of the locations of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and 5-formylcytosine.

Description

NZ 793130

BISULFITE-FREE, BASE-RESOLUTION IDENTIFICATION OF CYTOSINE MODIFICATIONS

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Nos. 62/614,798 filed January 8, 2018, 62/660,523 filed April 20, 2018, and 62/771,409 filed November 26, 2018, each of which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

This disclosure provides methods for identifying in a nucleic acid sequence the locations of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and/or 5-formylcytosine.

BACKGROUND

5-Methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are the two major epigenetic marks found in the mammalian genome. 5hmC is generated from 5mC by the ten-eleven translocation (TET) family dioxygenases. TET can further oxidize 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which exist in much lower abundance in the mammalian genome compared to 5mC and 5hmC (10-fold to 100-fold lower than that of 5hmC). Together, 5mC and 5hmC play crucial roles in a broad range of biological processes from gene regulation to normal development. Aberrant DNA methylation and hydroxymethylation have been associated with various diseases and are well-accepted hallmarks of cancer. Therefore, the determination of 5mC and 5hmC in DNA sequence is not only important for basic research, but also is valuable for clinical applications, including diagnosis and therapy.

5fC and 5caC are the two final oxidized derivatives of 5mC and can be converted to unmodified cytosine by thymine DNA glycosylase (TDG) in the base excision repair pathway. Therefore, 5fC and 5caC are two important intermediates in the active demethylation process, which plays an important role in embryonic development. 5fC and 5caC are found in these contexts and may serve as indicators of nearly complete 5mC demethylation. 5fC and 5caC may also have additional functions, such as binding specific proteins and affecting the rate and specificity of RNA polymerase II.

WO 36413

5mC is also a post-transcriptional RNA modification that has been identified in both stable and highly abundant tRNAs and rRNAs, and in mRNAs. In addition, 5mC has been detected in snRNA (small nuclear RNA), miRNA (microRNA), lncRNA (long noncoding RNA) and eRNA (enhancer RNA). However, there appear to be differences in the occurrence of 5mC in specific RNA types in different organisms. For example, 5mC appears not to be present in tRNA and mRNA from bacteria, while it has been found in tRNA and mRNA in eukaryotes and archaea.

5hmC has also been detected in RNA. For example, mRNA from Drosophila and mouse has been found to contain 5hmC. The same family of enzymes that oxidize 5mC in DNA was reported to catalyze the formation of 5hmC in mammalian total RNA. In flies, a transcriptome-wide study using methylation RNA immunoprecipitation sequencing (MeRIP-seq) with 5hmC antibodies detected the presence of 5hmC in many mRNA coding sequences, with particularly high levels in the brain. It was also reported that active translation is associated with high 5hmC levels in RNA, and flies lacking the TET enzyme responsible for 5hmC deposition in RNA have impaired brain development.

The current gold standard and most widely used method for DNA methylation and hydroxymethylation analysis is bisulfite sequencing (BS), and its derived methods such as Tet-assisted bisulfite sequencing (TAB-Seq) and oxidative bisulfite sequencing (oxBS). All of these methods employ bisulfite treatment to convert unmethylated cytosine to uracil while leaving 5mC and/or 5hmC intact. Through PCR amplification of the bisulfite-treated DNA, which reads uracil as thymine, the modification information of each cytosine can be inferred at a single base resolution (where the transition of C to T provides the location of the unmethylated cytosine). There are, however, at least two main drawbacks to bisulfite sequencing. First, bisulfite treatment is a harsh chemical reaction, which degrades more than 90% of the DNA due to depurination under the required acidic and thermal conditions. This degradation severely limits its application to low-input samples, such as clinical samples including circulating cell-free DNA and single-cell sequencing. Second, bisulfite sequencing relies on the complete conversion of unmodified cytosine to thymine. Unmodified cytosine accounts for approximately 95% of the total cytosine in the human genome. Converting all these positions to thymine severely reduces sequence complexity, leading to poor sequencing quality, low mapping rates, uneven genome coverage and increased sequencing cost.

Bisulfite sequencing methods are also susceptible to false detection of 5mC and 5hmC due to incomplete conversion of unmodified cytosine to thymine.

Bisulfite sequencing has also been used to detect cytosine methylation in RNA. Unlike other methods for detecting 5mC in RNA such as methylated-RNA-immunoprecipitation, RNA-bisulfite-sequencing (RNA-BS-seq) has the advantage of being able to determine the extent of methylation of a specific C position in RNA. RNA-BS-seq, however, suffers from the same drawbacks described above for bisulfite sequencing of DNA. In particular, the reaction conditions can cause substantial degradation of RNA.

There is a need for a method for DNA methylation and hydroxymethylation analysis that is a mild reaction that can detect the modified cytosine (5mC and 5hmC) at base-resolution quantitatively without affecting the unmodified cytosine. Likewise, there is a need for a method for RNA methylation and hydroxymethylation analysis that employs mild reaction conditions and can detect the modified cytosine quantitatively at base resolution without affecting the unmodified cytosine.

SUMMARY OF THE INVENTION

The present invention provides methods for identifying the location of one or more of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and/or 5-formylcytosine in a nucleic acid. The methods described herein provide for DNA or RNA methylation and hydroxymethylation analysis involving mild reactions that detect the modified cytosine quantitatively with base-resolution without affecting the unmodified cytosine. Provided herein is a new method for identifying 5mC and 5hmC by combining TET oxidation and reduction by borane derivatives (e.g., pyridine borane and 2-picoline borane (pic-BH3)), referred to herein as TAPS (TET Assisted Pyridine borane Sequencing) (Table 1). TAPS detects modifications directly with high sensitivity and specificity, without affecting unmodified cytosines, and can be adopted to detect other cytosine modifications. It is non-destructive, preserving RNA and DNA up to 10 kb long. Compared with bisulfite sequencing, TAPS results in higher mapping rates, more even coverage and lower sequencing costs, enabling higher quality, more comprehensive and cheaper methylome analyses. Variations of this method that do not employ the oxidation step are used to identify 5fC and/or 5caC as described herein.
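The difference between the two read-outs can be illustrated with a toy simulation. This is a minimal sketch and not part of the disclosed method: the function names, the example sequence and the set of modified positions are hypothetical, and a sequencing read is modeled as a plain string on a single strand.

```python
def bisulfite_read(seq, modified):
    """Bisulfite read-out: every unmodified C is read as T,
    while modified cytosines (5mC/5hmC) remain C."""
    return "".join(
        "T" if base == "C" and i not in modified else base
        for i, base in enumerate(seq)
    )


def taps_read(seq, modified):
    """TAPS read-out: only modified cytosines (5mC/5hmC) are read as T,
    while unmodified C is left unchanged."""
    return "".join(
        "T" if base == "C" and i in modified else base
        for i, base in enumerate(seq)
    )


# Hypothetical example: one modified cytosine at index 1.
sequence = "ACGTCCGA"
modified_positions = {1}

print(bisulfite_read(sequence, modified_positions))  # ACGTTTGA
print(taps_read(sequence, modified_positions))       # ATGTCCGA
```

In the bisulfite read-out most cytosines disappear from the read, which is the loss of sequence complexity described above; in the TAPS read-out only the modified position changes.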

In one aspect, the present invention provides a method for identifying 5-methylcytosine (5mC) in a target nucleic acid comprising the steps of:

a. providing a nucleic acid sample comprising the target nucleic acid;
b. modifying the nucleic acid comprising the steps of:
   i. adding a blocking group to the 5-hydroxymethylcytosine (5hmC) in the nucleic acid sample;
   ii. converting the 5mC in the nucleic acid sample to 5-carboxylcytosine (5caC) and/or 5-formylcytosine (5fC); and
   iii. converting the 5caC and/or 5fC to dihydrouracil (DHU) to provide a modified nucleic acid sample comprising a modified target nucleic acid; and
c. detecting the sequence of the modified target nucleic acid;

wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target nucleic acid compared to the target nucleic acid provides the location of a 5mC in the target nucleic acid.

In embodiments of the method for identifying 5mC in a target nucleic acid, the percentages of the T at each transition location provide a quantitative level of 5mC at each location in the target nucleic acid. In embodiments, the nucleic acid is DNA. In other embodiments, the nucleic acid is RNA.
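The quantitative read-out described above (the percentage of T at each transition location) can be sketched as follows. This is an illustrative simplification under stated assumptions: the reads are hypothetical, ungapped, the same length as the reference and on the same strand, and the function name `modification_levels` is not from the source.

```python
def modification_levels(reference, reads):
    """For every C in the reference, return the fraction of reads that
    show T at that position (counting only reads showing C or T)."""
    levels = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        calls = [read[pos] for read in reads if read[pos] in "CT"]
        if calls:
            levels[pos] = calls.count("T") / len(calls)
    return levels


# Hypothetical pile-up: position 1 reads as T in 3 of 4 reads.
reference = "ACGC"
reads = ["ATGC", "ACGC", "ATGC", "ATGC"]
print(modification_levels(reference, reads))  # {1: 0.75, 3: 0.0}
```

A real pipeline would work from aligned reads with base qualities and strand information; the sketch only shows how a T percentage maps to a per-site modification level.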

In another aspect, the present invention provides a method for identifying 5mC or 5hmC in a target nucleic acid comprising the steps of:

a. providing a nucleic acid sample comprising the target nucleic acid;
b. modifying the nucleic acid comprising the steps of:
   i. converting the 5mC and 5hmC in the nucleic acid sample to 5-carboxylcytosine (5caC) and/or 5fC; and
   ii. converting the 5caC and/or 5fC to DHU to provide a modified nucleic acid sample comprising a modified target nucleic acid; and
c. detecting the sequence of the modified target nucleic acid;

wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target nucleic acid compared to the target nucleic acid provides the location of either a 5mC or 5hmC in the target nucleic acid.

In embodiments of the method for identifying 5mC or 5hmC, the percentages of the T at each transition location provide a quantitative level of 5mC or 5hmC at each location in the target nucleic acid. In embodiments, the nucleic acid is DNA. In other embodiments, the nucleic acid is RNA.

In another aspect, the invention provides a method for identifying 5mC and identifying 5hmC in a target nucleic acid comprising:

a. identifying 5mC in the target nucleic acid comprising the steps of:
   i. providing a first nucleic acid sample comprising the target nucleic acid;
   ii. modifying the nucleic acid in the first sample comprising the steps of:
      1. adding a blocking group to the 5-hydroxymethylcytosine (5hmC) in the first nucleic acid sample;
      2. converting the 5mC in the first nucleic acid sample to 5caC and/or 5fC;
      3. converting the 5caC and/or 5fC to DHU to provide a modified first DNA sample comprising a modified target nucleic acid;
   iii. optionally amplifying the copy number of the modified target nucleic acid;
   iv. detecting the sequence of the modified target nucleic acid; wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target nucleic acid compared to the target nucleic acid provides the location of a 5mC in the target nucleic acid;
b. identifying 5mC or 5hmC in the target nucleic acid comprising the steps of:
   i. providing a second nucleic acid sample comprising the target nucleic acid;
   ii. modifying the nucleic acid in the second sample comprising the steps of:
      1. converting the 5mC and 5hmC in the second nucleic acid sample to 5caC and/or 5fC; and
      2. converting the 5caC and/or 5fC to DHU to provide a modified second nucleic acid sample comprising a modified target nucleic acid;
   iii. optionally amplifying the copy number of the modified target nucleic acid;
   iv. detecting the sequence of the modified target nucleic acid from the second sample; wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target nucleic acid compared to the target nucleic acid provides the location of either a 5mC or 5hmC in the target nucleic acid; and
c. comparing the results of steps (a) and (b), wherein a C to T transition present in step (b) but not in step (a) provides the location of 5hmC in the target nucleic acid.

In embodiments for identifying 5mC and identifying 5hmC in a target nucleic acid, in step (a) the percentages of the T at each transition location provide a quantitative level of 5mC in the target nucleic acid; in step (b), the percentages of the T at each transition location provide a quantitative level of 5mC or 5hmC in the target nucleic acid; and in step (c) the difference in percentages for a C to T transition identified in step (b), but not in step (a), provides the quantitative level of 5hmC at each location in the target nucleic acid. In embodiments, the nucleic acid is DNA. In other embodiments, the nucleic acid is RNA.
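The comparison in step (c) amounts to a per-site subtraction, which can be sketched as below. This is a hedged illustration only: the dictionary inputs, the function name `hmc_levels` and the clipping of negative differences to zero are assumptions made for the example and are not specified in the source.

```python
def hmc_levels(level_5mc, level_5mc_or_5hmc):
    """Estimate per-site 5hmC as (step b level) - (step a level).

    level_5mc:          {position: T fraction} from step (a), 5hmC blocked
    level_5mc_or_5hmc:  {position: T fraction} from step (b), no blocking
    Negative differences (measurement noise) are clipped to 0.0, which is
    an assumption of this sketch.
    """
    result = {}
    for pos, combined in level_5mc_or_5hmc.items():
        only_5mc = level_5mc.get(pos, 0.0)
        result[pos] = max(combined - only_5mc, 0.0)
    return result


# Hypothetical levels: site 10 carries both marks, site 20 only 5hmC.
step_a = {10: 0.6}
step_b = {10: 0.9, 20: 0.3}
print(hmc_levels(step_a, step_b))
```

Sites absent from step (a) are treated as having no 5mC signal, so their full step (b) level is attributed to 5hmC.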

In embodiments of the invention, the blocking group added to 5hmC in the nucleic acid sample is a sugar. In embodiments, the sugar is a naturally-occurring sugar or a modified sugar, for example glucose or a modified glucose. In embodiments of the invention, the blocking group is added to 5hmC by contacting the nucleic acid sample with UDP linked to a sugar, for example UDP-glucose or UDP linked to a modified glucose, in the presence of a glucosyltransferase enzyme, for example, T4 bacteriophage β-glucosyltransferase (βGT) and T4 bacteriophage α-glucosyltransferase (αGT) and derivatives and analogs thereof.

In embodiments of the invention, the step of converting the 5mC in the nucleic acid sample to 5caC and/or 5fC and the step of converting the 5mC and 5hmC in the nucleic acid sample to 5caC and/or 5fC each comprises contacting the nucleic acid sample with a ten eleven translocation (TET) enzyme. In further embodiments, the TET enzyme is one or more of human TET1, TET2, and TET3; murine Tet1, Tet2, and Tet3; Naegleria TET (NgTET); Coprinopsis cinerea (CcTET) and derivatives or analogues thereof. In embodiments, the TET enzyme is NgTET.

In another aspect, the invention provides a method for identifying 5caC or 5fC in a target nucleic acid comprising the steps of:

a. providing a nucleic acid sample comprising the target nucleic acid;
b. converting the 5caC and 5fC to DHU to provide a modified nucleic acid sample comprising a modified target nucleic acid;
c. optionally amplifying the copy number of the modified target nucleic acid; and
d. detecting the sequence of the modified target nucleic acid;

wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target nucleic acid compared to the target nucleic acid provides the location of either a 5caC or 5fC in the target nucleic acid.

In embodiments of the method for identifying 5caC or 5fC in a target nucleic acid, the percentages of the T at each transition location provide a quantitative level for 5caC or 5fC at each location in the target nucleic acid.

In another aspect, the invention provides a method for identifying 5caC in a target nucleic acid comprising the steps of:

a. providing a nucleic acid sample comprising the target nucleic acid;
b. adding a blocking group to the 5fC in the nucleic acid sample;
c. converting the 5caC to DHU to provide a modified nucleic acid sample comprising a modified target nucleic acid;
d. optionally amplifying the copy number of the modified target nucleic acid; and
e. determining the sequence of the modified target nucleic acid;

wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target nucleic acid compared to the target nucleic acid provides the location of a 5caC in the target nucleic acid.

In embodiments of the method for identifying 5caC in a target nucleic acid, the percentages of the T at each transition location provide a quantitative level for 5caC at each location in the target nucleic acid. In embodiments, the nucleic acid is DNA. In other embodiments, the nucleic acid is RNA.

In embodiments of the invention, adding a blocking group to the 5fC in the nucleic acid sample comprises contacting the nucleic acid with an aldehyde reactive compound including, for example, hydroxylamine derivatives (such as O-ethylhydroxylamine), hydrazine derivatives, and hydrazide derivatives.

In another aspect, the invention provides a method for identifying 5fC in a target nucleic acid comprising the steps of:

a. providing a nucleic acid sample comprising the target nucleic acid;
b. adding a blocking group to the 5caC in the nucleic acid sample;
c. converting the 5fC to DHU to provide a modified nucleic acid sample comprising a modified target nucleic acid;
d. optionally amplifying the copy number of the modified target nucleic acid;
e. detecting the sequence of the modified target nucleic acid;

wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target nucleic acid compared to the target nucleic acid provides the location of a 5fC in the target nucleic acid.

In embodiments of the method for identifying 5fC in a target nucleic acid, the percentages of the T at each transition location provide a quantitative level for 5fC at each location in the target nucleic acid. In embodiments, the nucleic acid (sample and target) is DNA. In other embodiments, the nucleic acid (sample and target) is RNA.

In embodiments, the step of adding a blocking group to the 5caC in the nucleic acid sample comprises contacting the nucleic acid sample with (i) a carboxylic acid derivatization reagent, including, for example, 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and (ii) an amine (such as ethylamine), hydrazine, or hydroxylamine compound.

In embodiments of the invention, the methods above further comprise the step of amplifying the copy number of the modified target nucleic acid. In embodiments, this amplification step is performed prior to the step of detecting the sequence of the modified target nucleic acid. The step of amplifying the copy number when the modified target nucleic acid is DNA may be accomplished by performing the polymerase chain reaction (PCR), primer extension, and/or cloning. When the modified target nucleic acid is RNA, the step of amplifying the copy number may be accomplished by RT-PCR using oligo(dT) primer (for mRNA), random primers, and/or gene specific primers.

In embodiments of the invention, the DNA sample comprises picogram quantities of DNA. In embodiments of the invention, the DNA sample comprises about 1 pg to about 900 pg DNA, about 1 pg to about 500 pg DNA, about 1 pg to about 100 pg DNA, about 1 pg to about 50 pg DNA, about 1 pg to about 10 pg DNA, less than about 200 pg DNA, less than about 100 pg DNA, less than about 50 pg DNA, less than about 20 pg DNA, or less than about 5 pg DNA. In other embodiments of the invention, the DNA sample comprises nanogram quantities of DNA. In embodiments of the invention, the DNA sample contains about 1 to about 500 ng of DNA, about 1 to about 200 ng of DNA, about 1 to about 100 ng of DNA, about 1 to about 50 ng of DNA, about 1 ng to about 10 ng of DNA, about 1 ng to about 5 ng of DNA, less than about 100 ng of DNA, less than about 50 ng of DNA, less than about 5 ng of DNA, or less than about 2 ng of DNA. In embodiments of the invention, the DNA sample comprises circulating cell-free DNA (cfDNA). In embodiments of the invention, the DNA sample comprises microgram quantities of DNA.

In embodiments of the invention, the step of converting the 5caC and/or 5fC to DHU comprises contacting the nucleic acid sample with a reducing agent including, for example, pyridine borane, 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride. In a preferred embodiment, the reducing agent is pic-BH3 and/or pyridine borane.

In embodiments of the invention, the step of determining the sequence of the modified target nucleic acid comprises chain termination sequencing, microarray, high-throughput sequencing, and restriction enzyme analysis.

BRIEF DESCRIPTION OF THE FIGURES

Fig. 1. Borane-containing compounds screening. Borane-containing compounds were screened for conversion of 5caC to DHU in an 11mer oligonucleotide ("oligo"), with conversion rate estimated by MALDI. 2-picoline borane (pic-borane), borane pyridine, and tert-butylamine borane could completely convert 5caC to DHU while ethylenediamine borane and dimethylamine borane gave around 30% conversion rate. No detectable products were measured (n.d.) with dicyclohexylamine borane, morpholine borane, 4-methylmorpholine borane, and trimethylamine borane. Other reducing agents such as sodium borohydride and sodium tri(acetoxy)borohydride decomposed rapidly in acidic media and led to incomplete conversion. Sodium cyanoborohydride was not used due to the potential for hydrogen cyanide formation under acidic conditions. Pic-borane and pyridine borane were chosen because of complete conversion, low toxicity and high stability.

Fig. 2A-B. Pic-borane reaction on DNA oligos. (A) MALDI characterization of 5caC-containing 11mer model DNA treated with pic-borane. Calculated mass (m/z) shown above each graph, observed mass shown to the left of the peak. (B) The conversion rates of dC and various cytosine derivatives were quantified by HPLC-MS/MS. Data shown as mean ± SD of three replicates.

Fig. 3A-B. Single nucleoside pic-borane reaction. 1H and 13C NMR results were in accordance with a previous report on 2'-deoxy-5,6-dihydrouridine (I. Aparici-Espert et al., J. Org. Chem. 81, 4031-4038 (2016)). (A) 1H NMR (MeOH-d4, 400 MHz) chart of the single nucleoside pic-borane reaction product. δ ppm: 6.28 (t, 1H, J = 7 Hz), 4.30 (m, 1H), 3.81 (m, 1H), 3.63 (m, 2H), 3.46 (m, 2H), 2.65 (t, 2H, J = 6 Hz), 2.20 (m, 1H), 2.03 (m, 1H). (B) 13C NMR (MeOH-d4, 400 MHz) chart of the single nucleoside pic-borane reaction product. δ ppm: 171.56 (CO), 153.54 (CO), 85.97 (CH), 83.86 (CH), 70.99 (CH), 61.92 (CH2), 36.04 (CH2), 35.46 (CH2), 30.49 (CH2).

Fig. 4A-B. A diagram showing (A) borane conversion of 5caC to DHU and a proposed mechanism for borane reaction of 5caC to DHU; and (B) borane conversion of 5fC to DHU and a proposed mechanism for borane reaction of 5fC to DHU.

Fig. 5A-B. (A) Diagram showing that the TAPS method converts both 5mC and 5hmC to DHU, which upon replication acts as thymine. (B) Overview of the TAPS, TAPSβ, and CAPS methods.

Fig. 6. MALDI characterization of 5fC and 5caC containing model DNA oligos treated by pic-borane with or without blocking 5fC and 5caC. 5fC and 5caC are converted to dihydrouracil (DHU) with pic-BH3. 5fC was blocked by hydroxylamine derivatives such as O-ethylhydroxylamine (EtONH2), which converts it to an oxime that resists pic-borane conversion. 5caC was blocked by ethylamine via EDC conjugation and converted to an amide, which blocks conversion by pic-borane. Calculated MS (m/z) shown above each graph, observed MS shown to the left of the peak.

Fig. 7. MALDI characterization of 5mC and 5hmC containing model DNA oligos treated by KRuO4 and pic-borane with or without blocking of 5hmC. 5hmC could be blocked by βGT with glucose and converted to 5gmC. 5mC, 5hmC and 5gmC could not be converted by pic-borane. 5hmC could be oxidized by KRuO4 to 5fC, and then converted to DHU by pic-borane. Calculated MS (m/z) shown above each graph, observed MS shown to the left of the peak.

Fig. 8A-B. Restriction enzyme digestion showed TAPS could effectively convert 5mC to T. (A) Illustration of restriction enzyme digestion assay to confirm sequence change caused by TAPS. (B) TaqαI-digestion tests to confirm the C-to-T transition caused by TAPS. TAPS was performed on a 222 bp model DNA having a TaqαI restriction site and containing 5 fully methylated CpG sites (5mC) and its unmethylated control (C). PCR-amplified 222 bp model DNA can be cleaved with TaqαI to ~160 bp and ~60 bp fragments as shown in the 5mC, C and C TAPS lanes. After TAPS on the methylated DNA, the T(mC)GA sequence is converted to TTGA and is no longer cleaved by TaqαI digestion as shown in the 5mC-TAPS lanes.
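The logic of this digestion assay can be sketched in a few lines. The sketch assumes the enzyme's recognition sequence is TCGA, consistent with the T(mC)GA site mentioned in the caption; the example sequence, the methylated position and the function names are hypothetical.

```python
RECOGNITION_SITE = "TCGA"  # assumed recognition sequence of the enzyme


def after_taps_and_pcr(seq, methylated_positions):
    """Return the amplified sequence after TAPS: each methylated C
    is read as T, all other bases are unchanged."""
    return "".join(
        "T" if base == "C" and i in methylated_positions else base
        for i, base in enumerate(seq)
    )


def is_cleavable(seq, site=RECOGNITION_SITE):
    """True if the recognition site is still present in the sequence."""
    return site in seq


model = "GGATCGATCC"                             # hypothetical fragment
methylated = after_taps_and_pcr(model, {4})      # C at index 4 is 5mC
unmethylated = after_taps_and_pcr(model, set())  # control, no 5mC

print(methylated, is_cleavable(methylated))      # GGATTGATCC False
print(unmethylated, is_cleavable(unmethylated))  # GGATCGATCC True
```

Loss of cleavage in the methylated sample and retained cleavage in the unmethylated control is the pattern the gel lanes are described as showing.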

Fig. 9A-B. TAPS on a 222 bp model DNA and mESC gDNA. (A) Sanger sequencing results for the 222 bp model DNA containing 5 fully methylated CpG sites and its unmethylated control before (5mC, C) and after TAPS (5mC TAPS, C TAPS). Only 5mC is converted to T by the TAPS method. (B) HPLC-MS/MS quantification of relative modification levels in the mESC gDNA control, after NgTET1 oxidation and after pic-borane reduction. Data shown as mean ± SD of three replicates.

Fig. 10A-D. TAPS caused no significant DNA degradation compared to bisulfite. Agarose gel images of 222 bp unmethylated DNA, 222 bp methylated DNA, and mESC gDNA (A) before and (B) after chilling in an ice bath. No detectable DNA degradation was observed after TAPS, and DNA remained double-stranded and could be visualized without chilling. Bisulfite conversion created degradation, and DNA became single-stranded and could be visualized only after being chilled on ice. (C) Agarose gel images of mESC gDNA of various fragment lengths treated with TAPS and bisulfite before (left panel) and after (right panel) cooling down on ice. DNA after TAPS remained double-stranded and could be directly visualized on the gel. Bisulfite treatment caused more damage and fragmentation to the samples, and DNA became single-stranded and could be visualized only after being chilled on ice. TAPS conversion was complete for all gDNA regardless of fragment length as shown in Fig. 15. (D) Agarose gel imaging of a 222 bp model DNA before and after TAPS (three independent repeats) showed no detectable degradation after the reaction.

Fig. 11. Comparison of amplification curves and melting curves between model DNAs before and after TAPS. qPCR assay showed minor differences in amplification curves between model DNAs before and after TAPS. The melting curve of methylated DNA (5mC) shifted to lower temperature after TAPS, indicating a possible Tm-decreasing C-to-T transition, while there was no shift for unmethylated DNA (C).

Fig. 12. Complete C-to-T transition induced after TAPS, TAPSβ and CAPS as demonstrated by Sanger sequencing. Model DNA containing single methylated and single hydroxymethylated CpG sites was prepared as described herein. TAPS conversion was done following the NgTET1 oxidation and pyridine borane reduction protocol as described herein. TAPSβ conversion was done following the 5hmC blocking, NgTET1 oxidation and pyridine borane reduction protocol. CAPS conversion was done following the 5hmC oxidation and pyridine borane reduction protocol. After conversion, 1 ng of converted DNA sample was PCR amplified by Taq DNA Polymerase and processed for Sanger sequencing. TAPS converted both 5mC and 5hmC to T. TAPSβ selectively converted 5mC whereas CAPS selectively converted 5hmC. None of the three methods caused conversion of unmodified cytosine or other bases.
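The selectivity stated in this caption can be written as a small lookup table and used to reason about which cytosine state is consistent with a set of read-outs. This is an illustrative sketch only: it covers C, 5mC and 5hmC as stated in the caption, the key "TAPS-beta" stands in for TAPSβ, and the table and function names are hypothetical.

```python
# True means the cytosine state is read as T after the given method,
# following the selectivity stated in the Fig. 12 caption.
READS_AS_T = {
    "TAPS":      {"C": False, "5mC": True,  "5hmC": True},
    "TAPS-beta": {"C": False, "5mC": True,  "5hmC": False},
    "CAPS":      {"C": False, "5mC": False, "5hmC": True},
}


def consistent_states(observations):
    """Given {method: observed base ('C' or 'T')} for one reference C,
    return the cytosine states consistent with every observation."""
    return [
        state
        for state in ("C", "5mC", "5hmC")
        if all(
            READS_AS_T[method][state] == (base == "T")
            for method, base in observations.items()
        )
    ]


print(consistent_states({"TAPS": "T"}))                    # ['5mC', '5hmC']
print(consistent_states({"TAPS": "T", "TAPS-beta": "C"}))  # ['5hmC']
print(consistent_states({"TAPS": "T", "CAPS": "C"}))       # ['5mC']
```

A single TAPS read-out cannot separate 5mC from 5hmC, which is why a second method is paired with it in the combined aspects described earlier.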

Fig. 13A-B. (A) TAPS is compatible with various DNA and RNA polymerases and induces complete C-to-T transition as shown by Sanger sequencing. Model DNA containing methylated CpG sites for the polymerase test and primer sequences are described herein. After TAPS treatment, 5mC was converted to DHU. KAPA HiFi Uracil plus polymerase, Taq polymerase, and Vent exo-polymerase would read DHU as T and therefore induce complete C-to-T transition after PCR. Alternatively, primer extension was done with biotin-labelled primer and isothermal polymerases including Klenow fragment, Bst DNA polymerase, and phi29 DNA polymerase. The newly synthesized DNA strand was separated by Dynabeads MyOne Streptavidin C1 and then amplified by PCR with Taq polymerase and processed for Sanger sequencing. T7 RNA polymerase could efficiently bypass DHU and insert adenine opposite the DHU site, as shown by RT-PCR and Sanger sequencing. (B) Certain other commercialized polymerases did not amplify DHU containing DNA efficiently. After TAPS treatment, 5mC was converted to DHU. KAPA HiFi Uracil plus polymerase and Taq polymerase would read DHU as T and therefore induce complete C-to-T transition. Low or no C-to-T transition was observed with certain other commercialized polymerases including KAPA HiFi polymerase, Pfu polymerase, Phusion polymerase and NEB Q5 polymerase (not shown).

Fig. 14. DHU does not show PCR bias compared to T and C. Model DNA containing one DHU/U/T/C modification was synthesized with the corresponding DNA oligos as described in . Standard curves for each model DNA with DHU/U/T/C modification were plotted based on qPCR reactions with 1:10 serial dilutions of the model DNA input (from 0.1 pg to 1 ng, every qPCR experiment was run in triplicate). The slope of the regression between the log concentration (ng) values and the average Ct values was calculated by the SLOPE function in Excel. PCR efficiency was calculated using the following equation: Efficiency% = (10^(-1/Slope) - 1) × 100%. Amplification factor was calculated using the following equation: Amplification factor = 10^(-1/Slope). PCR efficiency for model DNAs with DHU or T or C modification were almost the same, which demonstrated that DHU could be read through as a regular base and would not cause PCR bias.
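The two equations in this caption can be worked through in code. The sketch below uses an ordinary least-squares slope in place of the Excel SLOPE function and a made-up standard curve; the Ct values and function names are hypothetical and are not data from the source.

```python
import math


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_factor(slope):
    """Amplification factor = 10^(-1/Slope)."""
    return 10 ** (-1 / slope)


def pcr_efficiency_percent(slope):
    """Efficiency% = (10^(-1/Slope) - 1) x 100%."""
    return (amplification_factor(slope) - 1) * 100


# Hypothetical 1:10 dilution series from 0.1 pg to 1 ng (inputs in ng).
inputs_ng = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
log_conc = [math.log10(c) for c in inputs_ng]
mean_ct = [33.3, 30.0, 26.7, 23.4, 20.1]  # made-up average Ct values

slope = regression_slope(log_conc, mean_ct)
print(round(slope, 2))
print(round(amplification_factor(slope), 2))
print(round(pcr_efficiency_percent(slope), 1))
```

A slope of about -3.32 corresponds to an amplification factor of 2 (a doubling per cycle) and an efficiency of 100%, so comparing slopes across the DHU, U, T and C templates is a direct way to check for amplification bias.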

Fig. 15A-B. TAPS completely converted 5mC to T regardless of DNA fragment length. (A) Agarose gel images of TaqαI-digestion assay confirmed complete 5mC to T conversion in all samples regardless of DNA fragment lengths. A 194 bp model sequence from the lambda genome was PCR amplified after TAPS and digested with TaqαI enzyme. PCR product amplified from unconverted sample could be cleaved, whereas products amplified on TAPS treated samples stayed intact, suggesting loss of restriction site and hence complete 5mC-to-T transition. (B) The C-to-T conversion percentage was estimated by gel band quantification and shown to be 100% for all DNA fragment lengths tested.

Fig. 16. The conversion and false positive rates for different TAPS conditions. The combination of mTet1 and pyridine borane achieved the highest conversion rate of methylated C , calculated with fully CpG methylated Lambda DNA) and the lowest conversion rate of unmodified C (0.23%, calculated with 2 kb unmodified spike-in), compared to other conditions with NgTET1 or pic-borane. Shown above the bars is the conversion rate +/- SE of all tested cytosine sites.
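A summary of the "conversion rate +/- SE" kind can be computed as sketched below. The caption does not define how the SE was obtained, so the sketch assumes the standard error of the mean across sites (sample standard deviation divided by the square root of the number of sites); the per-site values and function names are hypothetical.

```python
import math
import statistics


def site_conversion_rate(t_reads, total_reads):
    """Fraction of reads showing T at one cytosine site."""
    return t_reads / total_reads


def mean_and_se(rates):
    """Mean rate across sites and its standard error
    (assumed here to be sample SD / sqrt(n))."""
    mean = statistics.mean(rates)
    se = statistics.stdev(rates) / math.sqrt(len(rates))
    return mean, se


# Hypothetical (T reads, total reads) at three methylated control sites.
counts = [(90, 100), (100, 100), (95, 100)]
rates = [site_conversion_rate(t, n) for t, n in counts]
mean, se = mean_and_se(rates)
print(f"{mean:.3f} +/- {se:.3f}")
```

Applied to sites in a fully methylated control, the mean estimates the conversion rate; applied to cytosines in an unmodified control, the same calculation estimates the false positive rate.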

Fig. 17A-B. Conversion rate on short spike-ins. (A) 120mer-1 and (B) 120mer-2 containing 5mC and 5hmC. Near complete conversion was achieved on 5mC and 5hmC sites from both strands. Actual sequences with modification status shown on top and bottom.

Fig. 18A-E. Improved sequencing quality of TAPS over Whole Genome Bisulfite Sequencing (WGBS). (A) Conversion rate of 5mC and 5hmC in TAPS-treated DNA. Left: Synthetic spike-ins (CpN) methylated or hydroxymethylated at known positions. (B) False positive rate of TAPS from unmodified 2 kb spike-in. (C) Total run time of TAPS and WGBS when processing 1 million simulated reads on one core of one Intel Xeon CPU. (D) Fraction of all sequenced read pairs (after trimming) mapped to the genome. (E) Sequencing quality scores per base for the first and second reads in all sequenced read pairs, as reported by Illumina BaseSpace. Top: TAPS. Bottom: WGBS.

Fig. 19A-B. TAPS resulted in more even coverage and fewer uncovered positions than WGBS. Comparison of depth of coverage across (A) all bases and (B) CpG sites between WGBS and TAPS, computed on both strands. For "TAPS (down-sampled)", random reads out of all mapped TAPS reads were selected so that the median coverage matched the median coverage of WGBS. Positions with coverage above 50X are shown in the last bin.
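One way such a down-sampling could be carried out is sketched below. This is an assumption-laden illustration rather than the procedure used for the figure: it assumes coverage scales linearly with the number of reads kept, and the read identifiers, coverage values, fixed seed and function names are hypothetical.

```python
import random
import statistics


def downsample_fraction(taps_coverage, wgbs_coverage):
    """Fraction of TAPS reads to keep so that the median coverage is
    expected to match WGBS (never more than 1.0)."""
    ratio = statistics.median(wgbs_coverage) / statistics.median(taps_coverage)
    return min(1.0, ratio)


def downsample_reads(reads, fraction, seed=0):
    """Randomly keep the given fraction of reads (seeded for repeatability)."""
    rng = random.Random(seed)
    keep = round(len(reads) * fraction)
    return rng.sample(reads, keep)


# Hypothetical per-position coverage and read identifiers.
taps_cov = [10, 20, 30]
wgbs_cov = [5, 10, 15]
read_ids = list(range(100))

fraction = downsample_fraction(taps_cov, wgbs_cov)
subset = downsample_reads(read_ids, fraction)
print(fraction, len(subset))  # 0.5 50
```

Matching medians in this way lets the evenness of coverage be compared between the two methods without the deeper data set appearing better simply because it has more reads.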

Fig. 20. Distribution of modification levels across all chromosomes. Average modification levels in 100 kb windows along mouse chromosomes, weighted by the coverage of CpG, and smoothed using a Gaussian weighted moving average filter with window size

Fig. 21A-E. Comparison of genome-wide methylome measurements by TAPS and WGBS. (A) Average sequencing coverage depth in all mouse CpG islands (binned into windows) and 4 kbp flanking regions (binned into 50 equally sized windows). To account for differences in sequencing depth, all mapped TAPS reads were down-sampled to match the median number of mapped WGBS reads across the genome. (B) CpG sites covered by at least three reads by TAPS alone, both TAPS and WGBS, or WGBS alone. (C) Number of CpG sites covered by at least three reads and modification level > 0.1 detected by TAPS alone, TAPS and WGBS, or WGBS alone. (D) Example of the chromosomal distribution of modification levels (in %) for TAPS and WGBS. Average fraction of modified CpGs per 100 kb windows along mouse chromosome 4, smoothed using a Gaussian-weighted moving average filter with window size 10. (E) Heatmap representing the number of CpG sites covered by at least three reads in both TAPS and WGBS, broken down by modification levels as measured by each method. To improve contrast, the first bin, containing CpGs unmodified in both methods, was excluded from the color scale and is denoted by a star.
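A Gaussian-weighted moving average of the kind mentioned in these captions can be sketched as follows. The captions give only the window size (10 in Fig. 21D); the choice of standard deviation (one fifth of the window), the truncation of the window at the chromosome ends and the function name are assumptions of this sketch.

```python
import math


def gaussian_smooth(values, window=10, sigma=None):
    """Gaussian-weighted moving average over per-window modification levels.

    Each output value is a weighted mean of the neighbours within
    window // 2 positions on either side; weights are renormalized
    at the ends of the series so its length is preserved.
    """
    if sigma is None:
        sigma = window / 5  # assumed width, not specified in the source
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        num = 0.0
        den = 0.0
        for j in range(lo, hi):
            weight = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            num += weight * values[j]
            den += weight
        smoothed.append(num / den)
    return smoothed


# Hypothetical average modification levels in consecutive 100 kb windows.
levels = [0.70, 0.72, 0.10, 0.71, 0.69, 0.73, 0.70]
print([round(v, 3) for v in gaussian_smooth(levels)])
```

Smoothing of this kind damps isolated outlier windows so that chromosome-scale trends are easier to compare between methods.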

Fig. 22. Modification levels around CpG Islands. Average modification levels in CpG islands (binned into 20 windows) and 4 kbp flanking regions (binned into 50 equally sized windows). Bins with coverage below 3 reads were ignored.

Fig. 23A-B. TAPS exhibits smaller coverage-modification bias than WGBS. All CpG sites were binned according to their coverage and the mean (circles) and the median (triangles) modification value is shown in each bin for WGBS (A) and TAPS (B). The CpG sites covered by more than 100 reads are shown in the last bin. The lines represent a linear fit through the data points.

Fig. 24A-C. Low-input gDNA and cell-free DNA TAPS prepared with dsDNA and ssDNA library preparation kits. (A) Sequencing libraries were successfully constructed with down to 1 ng of murine embryonic stem cell (mESC) genomic DNA (gDNA) with the dsDNA library kits NEBNext Ultra II or KAPA Hyperplus. The ssDNA library kit Accel-NGS Methyl-Seq was used to further lower the input DNA amount down to (B) 0.01 ng of mESC gDNA or (C) 1 ng of cell-free DNA.

Fig. 25A-B. Low-input gDNA and cell-free DNA TAPS libraries prepared with dsDNA KAPA Hyperplus library preparation kit. Sequencing libraries were successfully constructed with as little as 1 ng of (A) mESC gDNA and (B) cell-free DNA with KAPA Hyperplus kit. Cell-free DNA has a sharp length distribution around 160 bp (nucleosome size) due to plasma nuclease digestion. After library construction, it becomes ~300 bp, which is the sharp band in (B).

Fig. 26A-D. High-quality cell-free DNA TAPS. (A) Conversion rate of 5mC in TAPS-treated cfDNA. (B) False positive rate in TAPS-treated cfDNA. (C) Fraction of all sequenced read pairs that were uniquely mapped to the genome. (D) Fraction of all sequenced read pairs that were uniquely mapped to the genome and remained after removal of PCR duplication reads. CHG and CHH are non-CpG contexts.

Fig. 27. TAPS can detect genetic variants. Methylation (MOD, top row) and C-to-T SNPs (bottom row) showed distinct base distribution patterns in original top strand (OT)/original bottom strand (OB), left column, and in strands complementary to OT (CTOT) and OB (CTOB), right column.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a bisulfite-free, base-resolution method for detecting 5mC and 5hmC in a sequence, herein named TAPS. TAPS consists of mild enzymatic and chemical reactions to detect 5mC and 5hmC directly and quantitatively at base-resolution without affecting unmodified cytosine. The present invention also provides methods to detect 5fC and 5caC at base resolution without affecting unmodified cytosine.

Thus, the methods provided herein provide mapping of 5mC, 5hmC, 5fC and 5caC and overcome the disadvantages of previous methods such as bisulfite sequencing.

Table 1. Comparison of BS and related methods versus TAPS, TAPSβ and CAPS for 5mC and 5hmC sequencing.

5hmC C C T T C T

Methods for Identifying 5mC

In one aspect, the present invention provides a method for identifying 5-methylcytosine (5mC) in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. modifying the DNA comprising the steps of:
i. adding a blocking group to the 5-hydroxymethylcytosine (5hmC) in the DNA sample;
ii. converting the 5mC in the DNA sample to 5-carboxylcytosine (5caC) and/or 5-formylcytosine (5fC); and
iii. converting the 5caC and/or 5fC to DHU to provide a modified DNA sample comprising a modified target DNA;
c. detecting the sequence of the modified target DNA;
wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target DNA compared to the target DNA provides the location of a 5mC in the target DNA.

In embodiments of the method for identifying 5mC in the target DNA, the method provides a quantitative measure for the frequency of the 5mC modification at each location where the modification was identified in the target DNA. In embodiments, the percentages of the T at each transition location provide a quantitative level of 5mC at each location in the target DNA.
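The quantitative readout described above can be sketched as a simple ratio. This is a minimal illustration, assuming that the reads covering a reference cytosine have already been counted as showing either T or C; the function name and the counts are hypothetical and are not taken from the disclosure.

```python
def modification_level(t_reads: int, c_reads: int) -> float:
    """Return the fraction of informative reads that show T at a reference C."""
    if t_reads < 0 or c_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = t_reads + c_reads
    if total == 0:
        raise ValueError("no informative reads at this position")
    return t_reads / total


# Hypothetical counts: 30 reads show T and 10 reads show C at one position.
level = modification_level(30, 10)
print(f"{level:.0%}")  # prints "75%"
```

Positions with no informative reads raise an error rather than returning zero, so that uncovered sites are not mistaken for unmodified ones.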

In order to identify 5mC in a target DNA without including 5hmC, the 5hmC in the sample is blocked so that it is not subject to conversion to 5caC and/or 5fC. In the methods of the present invention, 5hmC in the sample DNA are rendered non-reactive to the subsequent steps by adding a blocking group to the 5hmC. In one embodiment, the blocking group is a sugar, including a modified sugar, for example glucose or 6-azide-glucose (6-azido-6-deoxy-D-glucose). The sugar blocking group is added to the hydroxymethyl group of 5hmC by contacting the DNA sample with uridine diphosphate (UDP)-sugar in the presence of one or more glucosyltransferase enzymes.

In embodiments, the glucosyltransferase is T4 bacteriophage β-glucosyltransferase (βGT), T4 bacteriophage α-glucosyltransferase (αGT), and derivatives and analogs thereof.

βGT is an enzyme that catalyzes a chemical reaction in which a beta-D-glucosyl (glucose) residue is transferred from UDP-glucose to a 5-hydroxymethylcytosine residue in a nucleic acid.

By stating that the blocking group is, for example, glucose, this refers to a glucose moiety (e.g., a beta-D-glucosyl residue) being added to 5hmC to yield glucosyl 5-hydroxymethylcytosine. The sugar blocking group can be any sugar or modified sugar that is a substrate of the glucosyltransferase enzyme and blocks the subsequent conversion of the 5hmC to 5caC and/or 5fC. The step of converting the 5mC in the DNA sample to 5caC and/or 5fC is then accomplished by the methods provided herein, such as by oxidation using a TET enzyme. Converting the 5caC and/or 5fC to DHU is accomplished by the methods provided herein, such as by borane reduction.

The method for identifying 5-methylcytosine (5mC) can be performed on an RNA sample to identify the location of, and provide a quantitative measure of, 5mC in a target RNA.

Methods for Identifying 5mC or 5hmC (together)

In another aspect, the present invention provides a method for identifying 5mC or 5hmC in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. modifying the DNA comprising the steps of:
i. converting the 5mC and 5hmC in the DNA sample to 5-carboxylcytosine (5caC) and/or 5fC; and
ii. converting the 5caC and/or 5fC to DHU to provide a modified DNA sample comprising a modified target DNA;
c. detecting the sequence of the modified target DNA;
wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target DNA compared to the target DNA provides the location of either a 5mC or 5hmC in the target DNA.

In embodiments of the method for identifying 5mC or 5hmC in the target DNA, the method provides a quantitative measure for the frequency of the 5mC or 5hmC modifications at each location where the modifications were identified in the target DNA. In embodiments, the percentages of the T at each transition location provide a quantitative level of 5mC or 5hmC at each location in the target DNA.

This method for identifying 5mC or 5hmC provides the location of 5mC and 5hmC, but does not distinguish between the two cytosine modifications. Rather, both 5mC and 5hmC are converted to DHU. The presence of DHU can be detected directly, or the modified DNA can be replicated by known methods where the DHU is converted to T.

The method for identifying 5mC or 5hmC can be performed on an RNA sample to identify the location of, and provide a quantitative measure of, 5mC or 5hmC in a target RNA.

Methods for Identifying 5mC and Identifying 5hmC

The present invention provides a method for identifying 5mC and identifying 5hmC in a target DNA by (i) performing the method for identifying 5mC on a first DNA sample described herein, and (ii) performing the method for identifying 5mC or 5hmC on a second DNA sample described herein. The location of 5mC is provided by (i). By comparing the results of (i) and (ii), a C to T transition detected in (ii) but not in (i) provides the location of 5hmC in the target DNA. In embodiments, the first and second DNA samples are derived from the same DNA sample. For example, the first and second samples may be separate aliquots taken from a sample comprising DNA to be analyzed.
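The comparison of (i) and (ii) can be sketched as a per-position subtraction. This is only an illustrative sketch: it assumes each method has already been reduced to a mapping from position to the fraction of T reads, and the function name, the dictionary layout, and the flooring of negative differences at zero are assumptions rather than part of the disclosure.

```python
from typing import Dict


def estimate_5hmc(levels_5mc: Dict[int, float],
                  levels_5mc_or_5hmc: Dict[int, float]) -> Dict[int, float]:
    """Per-site 5hmC estimate: level from (ii) minus level from (i), floored at 0."""
    # Only positions measured by both methods can be compared.
    shared = levels_5mc.keys() & levels_5mc_or_5hmc.keys()
    return {
        pos: max(0.0, levels_5mc_or_5hmc[pos] - levels_5mc[pos])
        for pos in sorted(shared)
    }


# Hypothetical fractions of T per position from two aliquots of one sample.
method_i = {101: 0.60, 250: 0.10}              # 5mC only (5hmC blocked)
method_ii = {101: 0.80, 250: 0.10, 300: 0.50}  # 5mC or 5hmC together
print(estimate_5hmc(method_i, method_ii))
```

Position 300 is dropped because it was measured in only one aliquot; in practice, sampling noise between aliquots would also need to be considered before calling a site.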

Because the 5mC and 5hmC (that is not blocked) are converted to 5fC and 5caC before conversion to DHU, any existing 5fC and 5caC in the DNA sample will be detected as 5mC and/or 5hmC. However, given the extremely low levels of 5fC and 5caC in genomic DNA under normal conditions, this will often be acceptable when analyzing methylation and hydroxymethylation in a DNA sample. The 5fC and 5caC signals can be eliminated by protecting the 5fC and 5caC from conversion to DHU by, for example, hydroxylamine conjugation and EDC coupling, respectively.

The above method identifies the locations and percentages of 5hmC in the target DNA through the comparison of 5mC locations and percentages with the locations and percentages of 5mC or 5hmC (together). Alternatively, the location and frequency of 5hmC modifications in a target DNA can be measured directly. Thus, in one aspect the invention provides a method for identifying 5hmC in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. modifying the DNA comprising the steps of:
i. converting the 5hmC in the DNA sample to 5fC; and
ii. converting the 5fC to DHU to provide a modified DNA sample comprising a modified target DNA;
c. detecting the sequence of the modified target DNA;
wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target DNA compared to the target DNA provides the location of a 5hmC in the target DNA.
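The way blocking, oxidation and reduction combine to determine which cytosine forms read as T can be summarized in a toy model. The sketch below encodes only the logic stated in the text (blocked bases are protected, TET oxidation carries 5mC and unblocked 5hmC to 5fC/5caC, and the borane step converts 5fC/5caC to DHU); the function name and flags are hypothetical, and real conversion is not all-or-nothing.

```python
def reads_as_t(base: str, *, tet_oxidation: bool,
               block_5hmc: bool = False,
               block_5fc: bool = False,
               block_5cac: bool = False) -> bool:
    """Toy model: does this cytosine form end up as DHU (and so read as T)?"""
    if base not in {"C", "5mC", "5hmC", "5fC", "5caC"}:
        raise ValueError(f"unknown base: {base}")
    # A blocked base is protected from all later steps.
    if ((base == "5hmC" and block_5hmc)
            or (base == "5fC" and block_5fc)
            or (base == "5caC" and block_5cac)):
        return False
    # TET oxidation carries 5mC and unblocked 5hmC through to 5fC/5caC.
    if tet_oxidation and base in {"5mC", "5hmC"}:
        base = "5caC"
    # The borane step converts only 5fC/5caC to DHU.
    return base in {"5fC", "5caC"}


for b in ["C", "5mC", "5hmC", "5fC", "5caC"]:
    together = reads_as_t(b, tet_oxidation=True)
    only_5mc = reads_as_t(b, tet_oxidation=True, block_5hmc=True)
    print(f"{b:5s} 5mC-or-5hmC method: {'T' if together else 'C'}   "
          f"5mC method: {'T' if only_5mc else 'C'}")
```

The loop makes the caveat in the paragraph above visible: pre-existing 5fC and 5caC read as T in both methods unless they are separately protected.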

In embodiments, the step of converting the 5hmC to 5fC comprises oxidizing the 5hmC to 5fC by contacting the DNA with, for example, potassium perruthenate (KRuO4) (as described in Science. 2012, 33, 7 and WO2013017853, incorporated herein by reference); or Cu(II)/TEMPO (copper(II) perchlorate and 2,2,6,6-tetramethylpiperidine-1-oxyl (TEMPO)) (as described in Chem. Commun., 2017, 53, 5756-5759 and 039002, incorporated herein by reference). The 5fC in the DNA sample is then converted to DHU by the methods disclosed herein, e.g., by the borane reaction.

The method for identifying 5mC and identifying 5hmC can be performed on an RNA sample to identify the location of, and provide a quantitative measure of, 5mC and 5hmC in a target RNA.

Methods for Identifying 5caC or 5fC

In one aspect, the invention provides a method for identifying 5caC or 5fC in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. converting the 5caC and/or 5fC to DHU to provide a modified DNA sample comprising a modified target DNA;
c. optionally amplifying the copy number of the modified target DNA;
d. detecting the sequence of the modified target DNA;
wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target DNA compared to the target DNA provides the location of either a 5caC or 5fC in the target DNA.

This method for identifying 5fC or 5caC provides the location of 5fC or 5caC, but does not distinguish between these two cytosine modifications. Rather, both 5fC and 5caC are converted to DHU, which is detected by the methods described herein.

Methods for Identifying 5caC

In another aspect, the invention provides a method for identifying 5caC in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. adding a blocking group to the 5fC in the DNA sample;
c. converting the 5caC to DHU to provide a modified DNA sample comprising a modified target DNA;
d. optionally amplifying the copy number of the modified target DNA; and
e. determining the sequence of the modified target DNA;
wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target DNA compared to the target DNA provides the location of a 5caC in the target DNA.

In embodiments of the method for identifying 5caC in the target DNA, the method provides a quantitative measure for the frequency of the 5caC modification at each location where the modification was identified in the target DNA. In embodiments, the percentages of the T at each transition location provide a quantitative level of 5caC at each location in the target DNA.

In this method, 5fC is blocked (and 5mC and 5hmC are not converted to DHU) allowing identification of 5caC in the target DNA. In embodiments of the invention, adding a blocking group to the 5fC in the DNA sample comprises contacting the DNA with an aldehyde reactive compound including, for example, hydroxylamine derivatives, hydrazine derivatives, and hydrazide derivatives. Hydroxylamine derivatives include hydroxylamine; hydroxylamine hydrochloride; hydroxylammonium acid sulfate; hydroxylamine phosphate; O-methylhydroxylamine; O-hexylhydroxylamine; O-pentylhydroxylamine; O-benzylhydroxylamine; and particularly, O-ethylhydroxylamine (EtONH2), O-alkylated or O-arylated hydroxylamine, acid or salts thereof. Hydrazine derivatives include N-alkylhydrazine, N-arylhydrazine, N-benzylhydrazine, alkylhydrazine, N,N-diarylhydrazine, N,N-dibenzylhydrazine, N,N-alkylbenzylhydrazine, N,N-arylbenzylhydrazine, and N,N-alkylarylhydrazine. Hydrazide derivatives include -esulfonylhydrazide, N-acylhydrazide, N,N-alkylacylhydrazide, N,N-benzylacylhydrazide, N,N-arylacylhydrazide, N-sulfonylhydrazide, N,N-alkylsulfonylhydrazide, N,N-benzylsulfonylhydrazide, and N,N-arylsulfonylhydrazide.

The method for identifying 5caC can be performed on an RNA sample to identify the location of, and provide a quantitative measure of, 5caC in a target RNA.

Methods for Identifying 5fC

In another aspect, the invention provides a method for identifying 5fC in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. adding a blocking group to the 5caC in the DNA sample;
c. converting the 5fC to DHU to provide a modified DNA sample comprising a modified target DNA;
d. optionally amplifying the copy number of the modified target DNA;
e. detecting the sequence of the modified target DNA;
wherein a cytosine (C) to thymine (T) transition in the sequence of the modified target DNA compared to the target DNA provides the location of a 5fC in the target DNA.

In embodiments of the method for identifying 5fC in the target DNA, the method provides a quantitative measure for the frequency of the 5fC modification at each location where the modification was identified in the target DNA. In embodiments, the percentages of the T at each transition location provide a quantitative level of 5fC at each location in the target DNA.

Adding a blocking group to the 5caC in the DNA sample can be accomplished by (i) contacting the DNA sample with a coupling agent, for example a carboxylic acid derivatization reagent like carbodiimide derivatives such as 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) or N,N'-dicyclohexylcarbodiimide (DCC) and (ii) contacting the DNA sample with an amine, hydrazine or hydroxylamine compound. Thus, for example, 5caC can be blocked by treating the DNA sample with EDC and then benzylamine, ethylamine or other amine to form an amide that blocks 5caC from conversion to DHU by, e.g., pic-BH3. Methods for EDC-catalyzed 5caC coupling are described in WO2014165770, and are incorporated herein by reference.

The method for identifying 5fC can be performed on an RNA sample to identify the location of, and provide a quantitative measure of, 5fC in a target RNA.

Nucleic Acid Sample / Target Nucleic Acid

The present invention provides methods for identifying the location of one or more of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and/or 5-formylcytosine in a target nucleic acid quantitatively with base-resolution without affecting the unmodified cytosine. In embodiments, the target nucleic acid is DNA. In other embodiments, the target nucleic acid is RNA. Likewise the nucleic acid sample that comprises the target nucleic acid may be a DNA sample or an RNA sample.

The target nucleic acid may be any nucleic acid having cytosine modifications (i.e., 5mC, 5hmC, 5fC, and/or 5caC). The target nucleic acid can be a single nucleic acid molecule in the sample, or may be the entire population of nucleic acid molecules in a sample (or a subset thereof). The target nucleic acid can be the native nucleic acid from the source (e.g., cells, tissue samples, etc.) or can be pre-converted into a high-throughput sequencing-ready form, for example by fragmentation, repair and ligation with adaptors for sequencing. Thus, target nucleic acids can comprise a plurality of nucleic acid sequences such that the methods described herein may be used to generate a library of target nucleic acid sequences that can be analyzed individually (e.g., by determining the sequence of individual targets) or in a group (e.g., by high-throughput or next generation sequencing methods).

A nucleic acid sample can be obtained from an organism from the Monera (bacteria), Protista, Fungi, Plantae, and Animalia Kingdoms. Nucleic acid samples may be obtained from a patient or subject, from an environmental sample, or from an organism of interest. In embodiments, the nucleic acid sample is extracted or derived from a cell or collection of cells, a body fluid, a tissue sample, an organ, or an organelle.

RNA Sample / Target RNA

The present invention provides methods for identifying the location of one or more of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and/or 5-formylcytosine in a target RNA quantitatively with base-resolution without affecting the unmodified cytosine. In embodiments, the RNA is one or more of mRNA (messenger RNA), tRNA (transfer RNA), rRNA (ribosomal RNA), snRNA (small nuclear RNA), miRNA (microRNA), lncRNA (long noncoding RNA) and eRNA (enhancer RNA). The target RNA can be a single RNA molecule in the sample, or may be the entire population of RNA molecules in a sample (or a subset thereof). Thus, target RNA can comprise a plurality of RNA sequences such that the methods described herein may be used to generate a library of target RNA sequences that can be analyzed individually (e.g., by determining the sequence of individual targets) or in a group (e.g., by high-throughput or next generation sequencing methods).

DNA Sample / Target DNA

The methods of the invention utilize mild enzymatic and chemical reactions that avoid the substantial degradation associated with methods like bisulfite sequencing. Thus, the methods of the present invention are useful in analysis of low-input samples, such as circulating cell-free DNA, and in single-cell analysis.

In embodiments of the invention, the DNA sample comprises picogram quantities of DNA. In embodiments of the invention, the DNA sample comprises about 1 pg to about 900 pg DNA, about 1 pg to about 500 pg DNA, about 1 pg to about 100 pg DNA, about 1 pg to about 50 pg DNA, about 1 pg to about 10 pg DNA, less than about 200 pg DNA, less than about 100 pg DNA, less than about 50 pg DNA, less than about 20 pg DNA, or less than about 5 pg DNA. In other embodiments of the invention, the DNA sample comprises nanogram quantities of DNA. The sample DNA for use in the methods of the invention can be any quantity, including DNA from a single cell or bulk DNA samples. In embodiments, the methods of the present invention can be performed on a DNA sample comprising about 1 to about 500 ng of DNA, about 1 to about 200 ng of DNA, about 1 to about 100 ng of DNA, about 1 to about 50 ng of DNA, about 1 to about 10 ng of DNA, about 2 to about 5 ng of DNA, less than about 100 ng of DNA, less than about 50 ng of DNA, less than 5 ng of DNA, or less than 2 ng of DNA. In embodiments of the invention the DNA sample comprises microgram quantities of DNA.

A DNA sample used in the methods described herein may be from any source including, for example, a body fluid, tissue sample, organ, organelle, or single cells. In embodiments, the DNA sample is circulating cell-free DNA (cell-free DNA or cfDNA), which is DNA found in the blood and is not present within a cell. cfDNA can be isolated from blood or plasma using methods known in the art. Commercial kits are available for isolation of cfDNA including, for example, the Circulating Nucleic Acid Kit (Qiagen). The DNA sample may result from an enrichment step, including, but not limited to, antibody immunoprecipitation, chromatin precipitation, restriction enzyme digestion-based enrichment, hybridization-based enrichment, or chemical labeling-based enrichment.

The target DNA may be any DNA having cytosine modifications (i.e., 5mC, 5hmC, 5fC, and/or 5caC) including, but not limited to, DNA fragments or genomic DNA purified from tissues, organs, cells and organelles. The target DNA can be a single DNA molecule in the sample, or may be the entire population of DNA molecules in a sample (or a subset thereof). The target DNA can be the native DNA from the source or pre-converted into a high-throughput sequencing-ready form, for example by fragmentation, repair and ligation with adaptors for sequencing. Thus, target DNA can comprise a plurality of DNA sequences such that the methods described herein may be used to generate a library of target DNA sequences that can be analyzed individually (e.g., by determining the sequence of individual targets) or in a group (e.g., by high-throughput or next generation sequencing methods).

Converting 5mC and 5hmC to 5caC and/or 5fC

Embodiments of the present invention, such as the TAPS method described herein, include the step of converting the 5mC and 5hmC (or just the 5mC if the 5hmC is blocked) to 5caC and/or 5fC. In embodiments of the invention, this step comprises contacting the DNA or RNA sample with a ten eleven translocation (TET) enzyme. The TET enzymes are a family of enzymes that catalyze the transfer of an oxygen molecule to the C5 methyl group on 5mC, resulting in the formation of 5-hydroxymethylcytosine (5hmC). TET further catalyzes the oxidation of 5hmC to 5fC and the oxidation of 5fC to form 5caC (see Fig. 5A).

TET enzymes useful in the methods of the invention include one or more of human TET1, TET2, and TET3; murine Tet1, Tet2, and Tet3; Naegleria TET (NgTET); Coprinopsis cinerea (CcTET) and derivatives or analogues thereof. In embodiments, the TET enzyme is NgTET. In other embodiments the TET enzyme is human TET1 (hTET1).

Converting 5caC and/or 5fC to DHU

Methods of the present invention include the step of converting the 5caC and/or 5fC in a nucleic acid sample to DHU. In embodiments of the invention, this step comprises contacting the DNA or RNA sample with a reducing agent including, for example, a borane reducing agent such as pyridine borane, 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride. In a preferred embodiment, the reducing agent is pyridine borane and/or pic-BH3.

Amplifying the copy number of modified target nucleic acid

The methods of the invention may optionally include the step of amplifying (increasing) the copy number of the modified target nucleic acid by methods known in the art. When the modified target nucleic acid is DNA, the copy number can be increased by, for example, PCR, cloning, and primer extension. The copy number of individual target DNAs can be amplified by PCR using primers specific for a particular target DNA sequence.

Alternatively, a plurality of different modified target DNA sequences can be amplified by cloning into a DNA vector by standard techniques. In embodiments of the invention, the copy number of a plurality of different modified target DNA sequences is increased by PCR to generate a library for next generation sequencing where, e.g., double-stranded adapter DNA has been previously ligated to the sample DNA (or to the modified sample DNA) and PCR is performed using primers complementary to the adapter DNA.

Detecting the Sequence of the Modified Target Nucleic Acid

In embodiments of the invention, the method comprises the step of detecting the sequence of the modified target nucleic acid. The modified target DNA or RNA contains DHU at positions where one or more of 5mC, 5hmC, 5fC, and 5caC were present in the unmodified target DNA or RNA. DHU acts as a T in DNA replication and sequencing methods. Thus, the cytosine modifications can be detected by any direct or indirect method known in the art that identifies a C to T transition. Such methods include sequencing methods such as Sanger sequencing, microarray, and next generation sequencing methods.
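Because DHU is read as T, locating the modifications reduces to comparing the sequence of the modified target with the unmodified reference. The following is a minimal sketch under simplifying assumptions: the two sequences are already aligned, equal in length, and on the same strand as the modified cytosines. The function name and example sequences are hypothetical.

```python
from typing import List


def c_to_t_positions(reference: str, read: str) -> List[int]:
    """Return 0-based positions where the reference has C and the read has T."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()))
        if ref_base == "C" and read_base == "T"
    ]


# Hypothetical aligned pair: the cytosines at positions 1 and 5 were converted.
print(c_to_t_positions("ACGTCCGA", "ATGTCTGA"))  # prints [1, 5]
```

A read derived from the complementary strand would show the same event as a G to A change, so a real pipeline would need strand-aware handling, and a C to T difference can also arise from a genetic variant rather than a modification.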

The C to T transition can also be detected by restriction enzyme analysis where the C to T transition abolishes or introduces a restriction endonuclease recognition sequence.
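Whether a given C to T transition abolishes or introduces a recognition sequence can be checked by comparing the sequence windows that overlap the converted position. This is a hedged sketch: the function name and example sequences are hypothetical, only a single exact (non-degenerate) recognition sequence is handled, and CCGG (HpaII) and TCGA (TaqI) are used purely as illustrations.

```python
def restriction_site_effect(reference: str, pos: int, site: str) -> str:
    """Classify the effect of a C-to-T change at `pos` on one recognition site."""
    ref = reference.upper()
    site = site.upper()
    if not 0 <= pos < len(ref) or ref[pos] != "C":
        raise ValueError("pos must point at a C in the reference")
    converted = ref[:pos] + "T" + ref[pos + 1:]
    k = len(site)
    # Only windows that overlap the converted base can change.
    starts = range(max(0, pos - k + 1), min(pos, len(ref) - k) + 1)
    had = any(ref[s:s + k] == site for s in starts)
    has = any(converted[s:s + k] == site for s in starts)
    if had and not has:
        return "abolished"
    if has and not had:
        return "introduced"
    return "unchanged"


print(restriction_site_effect("AACCGGTT", 3, "CCGG"))  # prints "abolished"
print(restriction_site_effect("ACCGAT", 1, "TCGA"))    # prints "introduced"
```

In the first call the CCGG site is lost when the second C becomes T; in the second call the change creates a TCGA site that was absent from the reference.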

The invention additionally provides kits for identification of 5mC and 5hmC in a target DNA. Such kits comprise reagents for identification of 5mC and 5hmC by the methods described herein. The kits may also contain the reagents for identification of 5caC and for the identification of 5fC by the methods described herein. In embodiments, the kit comprises a TET enzyme, a borane reducing agent and instructions for performing the method. In further embodiments, the TET enzyme is TET1 and the borane reducing agent is selected from one or more of the group consisting of pyridine borane, 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride. In a further embodiment, the TET1 enzyme is NgTet1 or murine Tet1 and the borane reducing agent is pyridine borane and/or pic-BH3.

In embodiments, the kit further comprises a 5hmC blocking group and a glucosyltransferase enzyme. In further embodiments, the 5hmC blocking group is uridine diphosphate sugar where the sugar is glucose or a glucose derivative, and the glucosyltransferase enzyme is T4 bacteriophage β-glucosyltransferase (βGT), T4 bacteriophage α-glucosyltransferase (αGT), and derivatives and analogs thereof.

In embodiments the kit further comprises an oxidizing agent selected from potassium perruthenate (KRuO4) and/or Cu(II)/TEMPO (copper(II) perchlorate and 2,2,6,6-tetramethylpiperidine-1-oxyl (TEMPO)).

In embodiments, the kit comprises reagents for blocking 5fC in the nucleic acid sample. In embodiments, the kit comprises an aldehyde reactive compound including, for example, hydroxylamine derivatives, hydrazine derivatives, and hydrazide derivatives as described herein. In embodiments, the kit comprises reagents for blocking 5caC as described herein.

In embodiments, the kit comprises reagents for isolating DNA or RNA. In embodiments the kit comprises reagents for isolating low-input DNA from a sample, for example cfDNA from blood, plasma, or serum.

EXAMPLES

Methods

Preparation of model DNA.

DNA oligos for MALDI and HPLC-MS/MS test. DNA oligonucleotides ("oligos") with C, 5mC and 5hmC were purchased from Integrated DNA Technologies (IDT). All the sequences and modifications can be found in Figs. 6 and 7. DNA oligo with 5fC was synthesized by the C-tailing method: DNA oligos 5'-GTCGACCGGATC-3' and 5'-TTGGATCCGGTCGACTT-3' were annealed and then incubated with 5-formyl-2'-dCTP (Trilink Biotech) and Klenow Fragment (3'→5' exo-) (New England Biolabs) in NEBuffer 2 for 2 hr at 37°C. The product was purified with Bio-Spin P-6 Gel Columns (Bio-Rad).

DNA oligo with 5caC was synthesized using Expedite 8900 Nucleic Acid Synthesis System with standard phosphoramidites and 5-Carboxy-dC-CE Phosphoramidite (Glen Research). Subsequent deprotection and purification were carried out with Glen-Pak Cartridges (Glen Research) according to the manufacturer's instructions.

Purified oligonucleotides were characterized by Voyager-DE MALDI-TOF (matrix-assisted laser desorption ionization time-of-flight) Biospectrometry Workstation.

222 bp Model DNA for conversion test. To generate 222 bp model DNA containing five CpG sites, bacteriophage lambda DNA (Thermo Fisher) was PCR amplified using Taq DNA Polymerase (New England Biolabs) and purified by AMPure XP beads (Beckman Coulter). Primer sequences are as follows: FW-5'-CCTGATGAAACAAGCATGTC-3', RV-5'-CAUTACTCACUTCCCCACUT-3'. The uracil base in the reverse strand of PCR product was removed by USER enzyme (New England Biolabs). 100 ng of purified PCR product was then methylated in 20 µL solution containing 1X NEBuffer 2, 0.64 mM S-adenosylmethionine and 20 U M.SssI CpG Methyltransferase (New England Biolabs) for 2 hr at 37°C, followed by 20 min heat inactivation at 65°C. The methylated 222 bp model DNA was purified by AMPure XP beads.

Model DNA for TAPS, TAPSβ and CAPS validation with Sanger sequencing. 34 bp DNA oligo containing single 5mC and single 5hmC site was annealed with other DNA oligos in annealing buffer containing 5 mM Tris-Cl (pH 7.5), 5 mM MgCl2, and 50 mM NaCl, and then ligated in a reaction containing 400 U T4 ligase (NEB) at 25°C for 1 hr and purified by 1.8X AMPure XP beads.

DNA: Sequence (5' to 3')
34 bp 5mC and 5hmC: CCCGA[5mC]GCATGATCTGTACTTGATCGAC[5hmC]GTGCAAC
TruSeq Universal Adapter: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter (Index 6): /5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
Uracil linker: GAUCGTTGCACGGUCGATCAAGUACAGATCATGCGUCGGGAGAUCGGAAG

The Uracil linker was removed by USER enzyme after the ligation reaction, resulting in a final product sequence (5' to 3'): AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCCGA[5mC]GCATGATCTGTACTTGATCGAC[5hmC]GTGCAACGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG. PCR primers for amplification of the model DNA were: P5: 5'-AATGATACGGCGACCACCGAG-3' and P7: 5'-CAAGCAGAAGACGGCATACGAG-3'.

Model DNA for rase test and Sanger sequencing. Model DNA for polymerase test and Sanger sequencing was prepared with the same ligation method above except different DNA oligos were used: Seuence 5'to 3' AGCAGTCTmCGATCAGCTGmCTACTGTAmCGTAGCAT AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTAC Adanter ACGACGCTCTTCCGATCT TruSeq Adapter /5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCACGC (Index 6) CAATATCTCGTATGCCGTCTTCTGCTTG /5Phos/AGGTGCGCTAAGTTCTAGATCGCCAACTGGTTGTG GCCTT Insert_2_60_bp /5Phos/CTATAGCCGGCTTGCTCTCTCTGCCTCTAGCAGCTG CTCCCTATAGTGAGTCGTATTAAC 40_bp-Linker-1 ATCTAGAACTTAGCGCACCTAGATCGGAAGAGCGTCGTG _CTGATCAAGACTGCTAAGGCCACAACCAGTTGGCG80_bp-Linker: AGAGAGCAAGCCGGCTATAGATGCTACGTACAGTAGCAG AGACGTGTGCTCTTCCGATCGTTAATACGACTCACTATAG The ﬁnal t sequence (5' to 3') was: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGA TCTAGGTGCGCTAAGTTCTAGATCGCCAACTGGTTGTGGCCTTAGCAGTCTmCGA TCAGCTG“'CTACTGTAmCGTAGCATCTATAGCCGGCTTGCTCTCTCTGCCTCTAGC AGCTGCTCCCTATAGTGAGTCGTATTAACGATCGGAAGAGCACACGTCTGAACTC CAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG. PCR primers to amplify the model DNA are the P5 and P7 primers provided above. -labelled primer ce for primer extension is biotin linked to the 5' end of the P7 primer. PCR primers for RT-PCR after T7 RNA polymerase transcription were the P5 primer and RT: 5'- TGCTAGAGGCAGAGAGAGCAAG-3 '.

Model DNA for PCR bias test. Model DNA for PCR bias test was prepared with the same on method above except different DNA oligos were used: DNA Se u uence 5' to 3 17 b o X AGCAGTCTXGATCAGCT X= DHU or U or T or C 17 bp No GCTACTGTACGTAGCAT Modiﬁcation TruSeq Universal AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTAC Ada . ter ACGACGCTCTTCCGATCT TruSeq Adapter /5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCACGC Index 6 CAATATCTCGTATGCCGTCTTCTGCTTG Insert_ 1_40_bp /AGGTGCGCTAAGTTCTAGATCGCCAACTGGTTGTG GCCTT Insert_2_60_bp /5Phos/CTATAGCCGGCTTGCTCTCTCTGCCTCTAGCAGCTG CTCCCTATAGTGAGTCGTATTAAC 40_bp-Linker- l ATCTAGAACTTAGCGCACCTAGATCGGAAGAGCGTCGTG Linker AGAGAGCAAGCCGGCTATAGATGCTACGTACAGTAGCAG CTGATCAAGACTGCTAAGGCCACAACCAGTTGGCG 42_bp-Linker-2 AGACGTGTGCTCTTCCGATCGTTAATACGACTCACTATAG Final product sequence (5' to 3'): AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGA TCTAGGTGCGCTAAGTTCTAGATCGCCAACTGGTTGTGGCCTTAGCAGTCTXGAT CAGCTGCTACTGTACGTAGCATCTATAGCCGGCTTGCTCTCTCTGCCTCTAGCAGC TGCTCCCTATAGTGAGTCGTATTAACGATCGGAAGAGCACACGTCTGAACTCCAG TCACGCCAATATCTCGTATGCCGTCTTCTGCTTG, where X= DHU or U or T or C.

PCR primers to amplify the model DNA are the P5 and P7 primers provided above.

Preparation of methylated bacteriophage lambda genomic DNA

1 µg of unmethylated bacteriophage lambda DNA (Promega) was methylated in 50 µL reaction containing 0.64 mM SAM and 0.8 U/µL M.SssI enzyme in Mg2+-free buffer (10 mM Tris-Cl pH 8.0, 50 mM NaCl, and 10 mM EDTA) for 2 hours at 37°C. Then, 0.5 µL of M.SssI enzyme and 1 µL of SAM were added and the reaction was incubated for an additional 2 hours at 37°C. Methylated DNA was subsequently purified on 1X AMPure XP beads. To assure complete methylation, the whole procedure was repeated in NEB buffer 2. DNA methylation was then validated with HpaII digestion assay. 50 ng of methylated and unmethylated DNA were digested in 10 µL reaction with 2 U of HpaII enzyme (NEB) in CutSmart buffer (NEB) for 1 h at 37°C. Digestion products were run on 1% agarose gel together with undigested lambda DNA control. Unmethylated lambda DNA was digested after the assay whereas methylated lambda DNA remained intact, confirming complete and successful CpG methylation. Sequence of lambda DNA can be found in GenBank - EMBL Accession Number: J02459.

Preparation of 2 kb unmodified spike-in controls

2 kb spike-in controls (2kb-1, 2, 3) were PCR amplified from pNIC28-Bsa4 plasmid (Addgene, cat. no. 26103) in the reaction containing 1 ng DNA template, 0.5 µM primers, 1 U Phusion High-Fidelity DNA Polymerase (Thermo Fisher). PCR primer sequences are listed in Table 2.

Table 2. Sequences of PCR primers for spike-ins.

Primer name Seguence 15' to 3'! 2kb-3_Forward CACAGATGTCTGCCTGTTCA 2kb-3 Reverse AGGGTGGTGAATGTGAAACC PCR product was d on Zymo-Spin column. 2 kb unmodiﬁed l sequence (5' to 3'): TGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGAGTTTCTCCAGAAGCGTT AATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTTGGT CACTGATGCCTCCGTGTAAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATG AAACGAGAGAGGATGCTCACGATACGGGTTACTGATGATGAACATGCCCGGTTA CTGGAACGTTGTGAGGGTAAACAACTGGCGGTATGGATGCGGCGGGACCAGAGA AAAATCACTCAGGGTCAATGCCAGCGCTTCGTTAATACAGATGTAGGTGTTCCAC AGGGTAGCCAGCAGCATCCTGCGATGCAGATCCGGAACATAATGGTGCAGGGCG CTGACTTCCGCGTTTCCAGACTTTACGAAACACGGAAACCGAAGACCATTCATGT TGTTGCTCAGGTCGCAGACGTTTTGCAGCAGCAGTCGCTTCACGTTCGCTCGCGT ATCGGTGATTCATTCTGCTAACCAGTAAGGCAACCCCGCCAGCCTAGCCGGGTCC TCAACGACAGGAGCACGATCATGCGCACCCGTGGGGCCGCCATGCCGGCGATAA TGGCCTGCTTCTCGCCGAAACGTTTGGTGGCGGGACCAGTGACGAAGGCTTGAGC GAGGGCGTGCAAGATTCCGAATACCGCAAGCGACAGGCCGATCATCGTCGCGCT CCAGCGAAAGCGGTCCTCGCCGAAAATGACCCAGAGCGCTGCCGGCACCTGTCC TACGAGTTGCATGATAAAGAAGACAGTCATAAGTGCGGCGACGATAGTCATGCC CCGCGCCCACCGGAAGGAGCTGACTGGGTTGAAGGCTCTCAAGGGCATCGGTCG AGATCCCGGTGCCTAATGAGTGAGCTAACTTACATTAATTGCGTTGCGCTCACTG CCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAAC GCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCCAGGGTGGTTTTTCTTTTCACCA GTGAGACGGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGAGTTGCA GCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGT TAACGGCGGGATATAACATGAGCTGTCTTCGGTATCGTCGTATCCCACTACCGAG ATATCCGCACCAACGCGCAGCCCGGACTCGGTAATGGCGCGCATTGCGCCCAGC GCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCTCATTCAGCA TTTGCATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCCGTTCCGCT ATCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGAC GCGCCGAGACAGAACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCA ATGCGACCAGATGCTCCACGCCCAGTCGCGTACCGTCTTCATGGGAGAAAATAAT ACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAAATAACGCCGGAACATTAGT GCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGTTAATGATC AGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGA CGCCGCTTCGTTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCG 
AGATTTAATCGCCGCGACAATTTGCGACGGCGCGTGCAGGGCCAGACTGGAGGT GGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTTGTGCCACGCGGTTG GGAATGTAATTCAGCTCCGCCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGA AACGTGGCTGGCCTGGTTCACCACGCGGGAAACGGTCTGATAAGAGACACCGGC ATACTCTGCGACATCGTATAACGTTACTGGTTTCACATTCACCACCCT

Preparation of 120mer spike-in controls
120mer spike-in controls were produced by primer extension. Oligo sequences and primers are listed in Table 3.

Table 3. Sequences of DNA oligos and primers used for preparation of 120mer control spike-ins.

Spike-in control / Template sequence (5' to 3') / Primer for extension (5' to 3')
ATACTCATCATTAAACTTCGCCCTTACCTAC CACTTCGTGTATGTAGATAGGTAGTATACA 120mer_1 ’éﬁgggéggﬂgéﬁc ATCGAAATGAGTACGTAGATAGTA CACTTCG GAAAGTAAGATGGAGGTGAGAGTGAGAGT TGATACTGGTCCCGAGSthCTGA GATGGAGATTCATTC AGTTAGGCCSthGGGATGACTGASthAG TCTCGCCA 120mer-2 TCTTCCGAGACCGACGACACAGGTCTCCCT TAATACGACTCACTA AGTCGTATTATGGCGAGAGAATGA ATCTCCATC

Briefly, for the 120mer-1 spike-in, 3 µM oligo was annealed with 10 µM primer in annealing buffer containing 5 mM Tris-Cl (pH 7.5), 5 mM MgCl2, and 50 mM NaCl. For the 120mer-2 spike-in, 5 µM oligo was annealed with 7.5 µM primer. Primer extension was performed in NEB buffer 2 with 0.4 µM dNTPs (120mer-1: dATP/dGTP/dTTP/dhmCTP; 120mer-2: dATP/dGTP/dTTP/dCTP) and 5 U of Klenow Polymerase (New England Biolabs) for 1 hour at 37°C. After the reaction, spike-in controls were purified on Zymo-Spin columns (Zymo Research). The 120mer spike-in controls were then methylated in a 50 µL reaction containing 0.64 mM SAM and 0.8 U/µL M.SssI enzyme in NEB buffer 2 for 2 hours at 37°C and purified with Zymo-Spin columns. All spike-in sequences used can be downloaded from https://figshare.com/s/80c3ab713c261262494b.

Preparation of synthetic spike-in with N5mCNN and N5hmCNN
A synthetic oligo with N5mCNN and N5hmCNN sequences was produced by the annealing and extension method. Oligo sequences are listed in Table 4, below.

Table 4.

Oligo: Template sequence (5' to 3')
N5mCNN: GCAGAAGACAGGAAGGATGAAACACTCAGGCGCACGCTGGCATNmCNNGACAAACCACAAGAACAGGCTAGTGAGAATGAAGGGA
N5hmCNN: CCAACTCTGAAACCCACCAACGCCAACATCCACCACACAACCCAAGATNhmCNNGACCATCTTACAAACATATCCCTTCATTCTCACTAGCC

Briefly, 10 µM N5mCNN and N5hmCNN oligos (IDT) were annealed together in annealing buffer containing 5 mM Tris-Cl (pH 7.5), 5 mM MgCl2, and 50 mM NaCl.

Extension was performed in NEB buffer 2 with 0.4 mM dNTPs (dATP/dGTP/dTTP/dCTP) and 5 U of Klenow Polymerase (NEB) for 1 hour at 37°C. After the reaction, the spike-in control was purified on a Zymo-Spin column (Zymo Research). Synthetic spike-in with N5mCNN and N5hmCNN (5' to 3'): GAAGATGCAGAAGACAGGAAGGATGAAACACTCAGGCGCACGCTGGCATNmCNNGACAAACCACAAGAACAGGCTAGTGAGAATGAAGGGATATGTTTGTAAGATGGTCNNGNATCTTGGGTTGTGTGGTGGATGTTGGCGTTGGTGGGTTTCAGAGTTGG.

Complementary strand (5' to 3'): CCAACTCTGAAACCCACCAACGCCAACATCCACCACACAACCCAAGATNhmCNNGACCATCTTACAAACATATCCCTTCATTCTCACTAGCCTGTTCTTGTGGTTTGTCNNGNATGCCAGCGTGCGCCTGAGTGTTTCATCCTTCCTGTCTTCTGCATCTTC.
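The two strands of the synthetic spike-in should be reverse complements of each other once the modification labels are set aside. The following is a minimal Python sketch of that consistency check, not part of the described method; the helper name `reverse_complement` is hypothetical, and the modified bases (mC, hmC) are written as plain C for the purpose of the comparison.

```python
# Complement table including the degenerate base N (N pairs with N).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string made of A/C/G/T/N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Fragment around the degenerate site of the N5hmCNN template,
# with hmC written as C (an assumption made only for this check).
bottom_fragment = "CCCAAGATNCNNGACC"
top_fragment = reverse_complement(bottom_fragment)
print(top_fragment)
```

The printed fragment can be compared by eye with the corresponding stretch of the extended top strand.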

DNA digestion and HPLC-MS/MS analysis
DNA samples were digested with 2 U of Nuclease P1 (Sigma-Aldrich) and 10 nM deaminase inhibitor erythro-9-amino-β-hexyl-α-methyl-9H-purine-9-ethanol hydrochloride (Sigma-Aldrich). After overnight incubation at 37°C, the samples were further treated with 6 U of alkaline phosphatase (Sigma-Aldrich) and 0.5 U of phosphodiesterase I (Sigma-Aldrich) for 3 hours at 37°C. The digested DNA solution was filtered with Amicon Ultra-0.5 mL 10 K centrifugal filters (Merck Millipore) to remove the proteins, and subjected to HPLC-MS/MS analysis.

The HPLC-MS/MS analysis was carried out with a 1290 Infinity LC System (Agilent) coupled with a 6495B Triple Quadrupole Mass Spectrometer (Agilent). A ZORBAX Eclipse Plus C18 column (2.1 x 150 mm, 1.8-Micron, Agilent) was used. The column temperature was maintained at 40°C, and the solvent system was water containing 10 mM ammonium acetate (pH 6.0, solvent A) and acetonitrile (60/40, v/v, solvent B) with a 0.4 mL/min flow rate. The gradient was: 0-5 min, 0% solvent B; 5-8 min, 0-5.63% solvent B; 8-9 min, 5.63% solvent B; 9-16 min, 5.63-13.66% solvent B; 16-17 min, 13.66-100% solvent B; 17-21 min, 100% solvent B; 21-24.3 min, 100-0% solvent B; 24.3-25 min, 0% solvent B. The dynamic multiple reaction monitoring mode (dMRM) of the MS was used for quantification. The source-dependent parameters were as follows: gas temperature 230°C, gas flow 14 L/min, nebulizer 40 psi, sheath gas temperature 400°C, sheath gas flow 11 L/min, capillary voltage 1500 V in the positive ion mode, nozzle voltage 0 V, high pressure RF 110 V and low pressure RF 80 V, both in the positive ion mode. The fragmentor voltage was 380 V for all compounds, while other compound-dependent parameters were as summarized in Table 5.
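The gradient program above can be written as a short list of breakpoints. The sketch below, in Python, looks up the solvent B percentage at a given time; it assumes the ramps between breakpoints are linear, which the text does not state explicitly, and the names `GRADIENT` and `percent_b` are illustrative.

```python
# (time in min, % solvent B) breakpoints transcribed from the gradient program.
GRADIENT = [(0.0, 0.0), (5.0, 0.0), (8.0, 5.63), (9.0, 5.63), (16.0, 13.66),
            (17.0, 100.0), (21.0, 100.0), (24.3, 0.0), (25.0, 0.0)]


def percent_b(t_min: float) -> float:
    """Linearly interpolate % solvent B at time t_min (minutes)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-25 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the full program")
```

For instance, `percent_b(6.5)` falls halfway along the 5-8 min ramp.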

Table 5. Compound-dependent HPLC-MS/MS parameters used for nucleoside quantification. RT: retention time; CE: collision energy; CAE: cell accelerator voltage. All the nucleosides were analyzed in the positive mode.

Compound    Precursor ion (m/z)    Product ion (m/z)    RT (min)    Delta RT (min)    CE (V)    CAE (V)
dA+H        252    136    13.78    2      10    4
dT+H        243    127    11.07    2      10    4
dT+Na       265    149    11.07    2      10    4
dG+H        268    152    9.64     2      10    4
dC+H        228    112    3.71     1.5    10    4
dC+Na       250    134    3.71     1.5    10    4
mdC+H       242    126    9.05     1.5    10    4
mdC+Na      264    148    9.05     1.5    10    4
hmdC+H      258    142    4.34     2      12    4
hmdC+Na     280    164    4.34     2      12    4
fdC+H       256    140    10.69    2      8     4
fdC+Na      278    162    10.69    2      8     4
cadC+H      272    156    1.75     3      12    4
cadC+Na     294    178    1.75     3      12    4
DHU+H       231    115    3.45     3      10    4
DHU+Na      253    137    3.45     3      10    4

Expression and purification of NgTET1
pRSET-A plasmid encoding His-tagged NgTET1 protein (GG739552.1) was designed and purchased from Invitrogen. Protein was expressed in E. coli BL21 (DE3) bacteria and purified as previously described with some modifications (J. E. Pais et al., Biochemical characterization of a Naegleria TET-like oxygenase and its application in single molecule sequencing of 5-methylcytosine. Proc. Natl. Acad. Sci. USA. 112, 4316-4321 (2015), incorporated herein by reference). Briefly, for protein expression, bacteria from an overnight small-scale culture were grown in LB medium at 37°C and 200 rpm until OD600 was between 0.7 and 0.8. Cultures were then cooled down to room temperature and target protein expression was induced with 0.2 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG).

Cells were maintained for an additional 18 hours at 18°C and 180 rpm. Subsequently, cells were harvested and re-suspended in buffer containing 20 mM HEPES (pH 7.5), 500 mM NaCl, 1 mM DTT, 20 mM imidazole, 1 µg/mL leupeptin, 1 µg/mL pepstatin A and 1 mM PMSF. Cells were broken with an EmulsiFlex-C5 high-pressure homogenizer, and the lysate was clarified by centrifugation for 1 hour at 30,000 x g and 4°C. Collected supernatant was loaded on Ni-NTA resin and NgTET1 protein was eluted with buffer containing 20 mM HEPES (pH 7.5), 500 mM imidazole, 2 M NaCl and 1 mM DTT. Collected fractions were then purified on HiLoad 16/60 Superdex 75 (20 mM HEPES pH 7.5, 2 M NaCl, 1 mM DTT). Fractions containing NgTET1 were then collected, buffer-exchanged into buffer containing 20 mM HEPES (pH 7.0), 10 mM NaCl and 1 mM DTT, and loaded on a HiTrap HP SP column. Pure protein was eluted with a salt gradient, collected and buffer-exchanged into the final buffer containing 20 mM Tris-Cl (pH 8.0), 150 mM NaCl and 1 mM DTT. Protein was then concentrated up to 130 µM and mixed with glycerol (30% v/v), and aliquots were stored at -80°C.

Expression and purification of mTET1CD
The mTET1CD catalytic domain (NM_001253857.2, 4371-6392) with an N-terminal Flag-tag was cloned into pcDNA3-Flag between the KpnI and BamHI restriction sites. For protein expression, 1 mg of plasmid was transfected into 1 L of Expi293F (Gibco) cell culture at a density of 1 x 10^6 cells/mL and cells were grown for 48 h at 37°C, 170 rpm and 5% CO2.

Subsequently, cells were harvested by centrifugation, re-suspended in lysis buffer containing 50 mM Tris-Cl (pH 7.5), 500 mM NaCl, 1X cOmplete Protease Inhibitor Cocktail (Sigma), 1 mM PMSF and 1% Triton X-100, and incubated on ice for 20 min. The cell lysate was then clarified by centrifugation for 30 min at 30,000 x g and 4°C. Collected supernatant was purified on ANTI-FLAG M2 Affinity Gel (Sigma) and pure protein was eluted with buffer containing 20 mM HEPES (pH 8.0), 150 mM NaCl, 0.1 mg/mL 3X Flag peptide (Sigma), 1X cOmplete Protease Inhibitor Cocktail (Sigma) and 1 mM PMSF. Collected fractions were concentrated and buffer-exchanged into the final buffer containing 20 mM HEPES (pH 8.0), 150 mM NaCl and 1 mM DTT. Concentrated protein was mixed with glycerol (30% v/v) and frozen in liquid nitrogen, and aliquots were stored at -80°C. Activity and quality of recombinant mTET1CD were checked by MALDI mass spectrometry analysis. Based on this assay, recombinant mTET1CD is fully active and able to catalyze oxidation of 5mC to 5caC.

No significant digestion of the tested model oligo was detected by MALDI, confirming that the protein is free from nucleases.

TET Oxidation
NgTET1 oxidation. For TET oxidation of the 222 bp model DNA oligos, 100 ng of 222 bp DNA was incubated in 20 µL of solution containing 50 mM MOPS buffer (pH 6.9), 100 mM ammonium iron (II) sulfate, 1 mM α-ketoglutarate, 2 mM ascorbic acid, 1 mM dithiothreitol (DTT), 50 mM NaCl, and 5 µM NgTET1 for 1 hour at 37°C. After that, 0.4 U of Proteinase K (New England Biolabs) was added to the reaction mixture and incubated for 30 min at 37°C. The product was purified by Zymo-Spin column (Zymo Research) following the manufacturer's instructions.

For NgTET1 oxidation of genomic DNA, 500 ng of genomic DNA was incubated in 50 µL of solution containing 50 mM MOPS buffer (pH 6.9), 100 mM ammonium iron (II) sulfate, 1 mM α-ketoglutarate, 2 mM ascorbic acid, 1 mM dithiothreitol, 50 mM NaCl, and 5 µM NgTET1 for 1 hour at 37°C. After that, 4 U of Proteinase K (New England Biolabs) were added to the reaction mixture and incubated for 30 min at 37°C. The product was cleaned up on 1.8X Ampure beads following the manufacturer's instructions.

mTET1 oxidation. 100 ng of genomic DNA was incubated in a 50 µL reaction containing 50 mM HEPES buffer (pH 8.0), 100 µM ammonium iron (II) sulfate, 1 mM α-ketoglutarate, 2 mM ascorbic acid, 1 mM dithiothreitol, 100 mM NaCl, 1.2 mM ATP and 4 µM mTET1CD for 80 min at 37°C. After that, 0.8 U of Proteinase K (New England Biolabs) were added to the reaction mixture and incubated for 1 hour at 50°C. The product was cleaned up on a Bio-Spin P-30 Gel Column (Bio-Rad) and 1.8X Ampure XP beads following the manufacturer's instructions.

Borane Reduction
Pic-BH3 reduction. 25 µL of 5 M α-picoline-borane (pic-BH3, Sigma-Aldrich) in MeOH and 5 µL of 3 M sodium acetate solution (pH 5.2, Thermo Fisher) were added to 20 µL of DNA sample and incubated at 60°C for 1 h. The product was purified by Zymo-Spin column (Zymo Research) following the manufacturer's instructions for the 222 bp DNA, or by Micro Bio-Spin 6 Columns (Bio-Rad) following the manufacturer's instructions for the oligos.

Alternatively, 100 mg of 2-picoline-borane (pic-borane, Sigma-Aldrich) was dissolved in 187 µL of DMSO to give an approximately 3.26 M solution. For each reaction, 25 µL of pic-borane solution and 5 µL of 3 M sodium acetate solution (pH 5.2, Thermo Fisher) were added to 20 µL of DNA sample and incubated for 3 hours at 70°C. The product was purified by Zymo-Spin column for genomic DNA or by Micro Bio-Spin 6 Columns (Bio-Rad) for DNA oligos following the manufacturer's instructions.
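The final reagent concentrations in the 50 µL pic-borane reaction follow from simple dilution arithmetic (C1 x V1 = C2 x V2). The short Python sketch below works this out for the stated volumes; it assumes the volumes are additive, and the function name `final_conc` is illustrative.

```python
def final_conc(stock_molar: float, stock_ul: float, total_ul: float) -> float:
    """Concentration (M) after diluting stock_ul of stock into total_ul."""
    return stock_molar * stock_ul / total_ul


# Volumes from the protocol: 25 uL pic-borane + 5 uL NaOAc + 20 uL DNA.
total_ul = 25 + 5 + 20
pic_borane_m = final_conc(3.26, 25, total_ul)  # pic-borane stock at ~3.26 M
naoac_m = final_conc(3.0, 5, total_ul)         # sodium acetate stock at 3 M
print(f"pic-borane: {pic_borane_m:.2f} M, sodium acetate: {naoac_m:.2f} M")
```

Under these assumptions the reaction runs at roughly 1.6 M pic-borane and 0.3 M sodium acetate.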

Pyridine borane reduction. 50-100 ng of oxidised DNA in 35 µL of water was reduced in a 50 µL reaction containing 600 mM sodium acetate solution (pH 4.3) and 1 M pyridine borane for 16 hours at 37°C and 850 rpm in an Eppendorf ThermoMixer. The product was purified by Zymo-Spin column.

Single nucleoside pic-borane reaction. 500 µL of 3.26 M 2-picoline-borane (pic-borane, Sigma-Aldrich) in MeOH and 500 µL of 3 M sodium acetate solution (pH 5.2, Thermo Fisher) were added to 10 mg of 2'-deoxycytidine-5-carboxylic acid sodium salt (Berry & Associates). The mixture was stirred for 1 hour at 60°C. The product was purified by HPLC to give the pure compound as a white foam. High resolution MS (Q-TOF) m/z [M + Na]+ calculated for C9H14N2O5Na: 253.0800; found: 253.0789.

5hmC blocking
5hmC blocking was performed in 20 µL of solution containing 50 mM HEPES buffer (pH 8), 25 mM MgCl2, 200 µM uridine diphosphoglucose (UDP-Glc, New England Biolabs), 10 U βGT (Thermo Fisher), and 10 µM 5hmC DNA oligo for 1 hour at 37°C. The product was purified by Micro Bio-Spin 6 Columns (Bio-Rad) following the manufacturer's instructions.

5fC blocking
5fC blocking was performed in 100 mM MES buffer (pH 5.0), 10 mM O-ethylhydroxylamine (Sigma-Aldrich), and 10 µM 5fC DNA oligo for 2 hours at 37°C. The product was purified by Micro Bio-Spin 6 Columns (Bio-Rad) following the manufacturer's instructions.

5caC blocking
5caC blocking was performed in 75 mM MES buffer (pH 5.0), 20 mM N-hydroxysuccinimide (NHS, Sigma-Aldrich), 20 mM 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDC, Fluorochem), and 10 µM 5caC DNA oligo at 37°C for 0.5 h. The buffer was then exchanged to 100 mM sodium phosphate (pH 7.5), 150 mM NaCl using Micro Bio-Spin 6 Columns (Bio-Rad) following the manufacturer's instructions. 10 mM ethylamine (Sigma-Aldrich) was added to the oligo and incubated for 1 hour at 37°C. The product was purified by Micro Bio-Spin 6 Columns (Bio-Rad) following the manufacturer's instructions.

5hmC oxidation
46 µL of 5hmC DNA oligo was denatured with 2.5 µL of 1 M NaOH for 30 min at 37°C in a shaking incubator, then oxidized with 1.5 µL of solution containing 50 mM NaOH and 15 mM potassium perruthenate (KRuO4, Sigma-Aldrich) for 1 hour on ice. The product was purified by Micro Bio-Spin 6 Columns following the manufacturer's instructions.

Validation of TAPS conversion with TaqαI assay
5mC conversion after TAPS was tested by PCR amplification of a target region that contains a TaqαI restriction site (TCGA), followed by TaqαI digestion. For example, 5mC conversion in TAPS libraries can be tested with a 194 bp amplicon containing a single TaqαI restriction site that is amplified from the CpG-methylated lambda DNA spike-in control. The PCR product amplified from the 194 bp amplicon is digested with TaqαI restriction enzyme and the digestion product is checked on a 2% agarose gel. PCR product amplified from unconverted control DNA is digested by TaqαI and shows two bands on the gel. In a TAPS-converted sample the restriction site is lost due to the C-to-T transition, so the 194 bp amplicon remains intact. The overall conversion level can be assessed by quantifying the digested and undigested gel bands, and for successful TAPS samples should be higher than [...].

Briefly, the converted DNA sample was PCR amplified by Taq DNA Polymerase (New England Biolabs) with the corresponding primers. The PCR product was incubated with 4 units of TaqαI restriction enzyme (New England Biolabs) in 1X CutSmart buffer (New England Biolabs) for 30 min at 65°C and checked by 2% agarose gel electrophoresis.
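Because a converted amplicon loses its restriction site and stays uncut, the conversion level can be estimated from the relative band intensities. The Python sketch below shows one way to do that calculation; it assumes the summed intensity of the two digested bands and the intensity of the intact band are proportional to DNA mass, and the function name and example intensities are hypothetical.

```python
def conversion_percent(undigested: float, digested: float) -> float:
    """Estimate % conversion from gel band intensities.

    undigested: intensity of the intact amplicon band (site lost, converted).
    digested:   summed intensity of the two cleavage-product bands.
    """
    total = undigested + digested
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * undigested / total


# Hypothetical densitometry values in arbitrary units.
print(conversion_percent(undigested=97.0, digested=3.0))
```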

Quantitative polymerase chain reaction (qPCR)
For comparison of amplification curves and melting curves between model DNAs before and after TAPS (Fig. 11), 1 ng of DNA sample was added to 19 µL of PCR master mix containing 1X LightCycler 480 High Resolution Melting Master Mix (Roche Diagnostics Corporation), 250 nM of primers FW-CCTGATGAAACAAGCATGTC and RV-CATTACTCACTTCCCCACTT and 3 mM of MgSO4. For PCR amplification, an initial denaturation step was performed for 10 min at 95°C, followed by 40 cycles of 5 sec denaturation at 95°C, 5 sec annealing at a customized annealing temperature and 5 sec elongation at 72°C. The final step included 1 min at 95°C, 1 min at 70°C and a melting curve ([...]°C step increments, 5 sec hold before each acquisition) from 65°C to 95°C.

For other assays, qPCR was performed by adding the required amount of DNA sample to 19 µL of PCR master mix containing 1X Fast SYBR Green Master Mix (Thermo Fisher) and 200 nM of forward and reverse primers. For PCR amplification, an initial denaturation step was performed for 20 sec at 95°C, followed by 40 cycles of 3 sec denaturation at 95°C and 20 sec annealing and elongation at 60°C.

Validation of CmCGG methylation level in DNA with HpaII-qPCR assay. 1 µg of mESC gDNA was incubated with 50 units of HpaII (NEB, 50 units/µL) and 1X CutSmart buffer in a 50 µL reaction for 16 hours at 37°C. No HpaII was added for the control reaction. 1 µL of Proteinase K was added to the reaction and incubated at 40°C for 30 minutes, followed by inactivation of Proteinase K for 10 minutes at 95°C. The Ct value of the HpaII-digested sample or control sample was measured by qPCR assay as above with the corresponding primer sets for specific CCGG positions (listed in Table 9).
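HpaII cuts only unmethylated CCGG sites, so template that survives digestion reflects the methylated fraction. The text does not state how Ct values were converted to a methylation level; the Python sketch below uses the common delta-Ct approach as an assumption, with an assumed amplification efficiency of 100% (a doubling per cycle). The function name and the example Ct values are hypothetical.

```python
def methylated_fraction(ct_digested: float, ct_control: float) -> float:
    """Estimate the methylated fraction at one CCGG site from qPCR Ct values.

    Assumes perfect doubling per cycle, so the surviving template fraction
    is 2 ** -(Ct_digested - Ct_control). The result is capped at 1.0 to
    absorb small negative delta-Ct values caused by measurement noise.
    """
    delta_ct = ct_digested - ct_control
    return min(1.0, 2.0 ** (-delta_ct))


# Hypothetical example: the digested sample crosses threshold one cycle later.
print(methylated_fraction(ct_digested=25.0, ct_control=24.0))
```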

Sanger sequencing
The PCR product was purified by Exonuclease I and Shrimp Alkaline Phosphatase (New England Biolabs) or Zymo-Spin column and processed for Sanger sequencing.

DNA damage test on fragments with different lengths. mESC genomic DNA was spiked with 0.5% of CpG-methylated lambda DNA and either left unfragmented or sonicated with a Covaris M220 instrument and size-selected to 500 bp-1 kb or 1 kb-3 kb on Ampure XP beads. 200 ng of DNA was single-oxidised with mTET1CD and reduced with pyridine borane complex as described above, or converted with EpiTect Bisulfite Kit (Qiagen) according to the manufacturer's protocol. 10 ng of DNA before and after TAPS and bisulfite conversion was run on a 1% agarose gel. To visualize bisulfite-converted samples, the gel was cooled down for 10 min in an ice bath. 5mC conversion in TAPS samples was tested by the TaqαI digestion assay as described above.

mESCs culture and isolation of genomic DNA
Mouse ESCs (mESCs) E14 were cultured on gelatin-coated plates in Dulbecco's Modified Eagle Medium (DMEM) (Invitrogen) supplemented with 15% FBS (Gibco), 2 mM L-glutamine (Gibco), 1% non-essential amino acids (Gibco), 1% penicillin/streptomycin (Gibco), 0.1 mM β-mercaptoethanol (Sigma), 1000 units/mL LIF (Millipore), 1 µM PD0325901 (Stemgent), and 3 µM CHIR99021 (Stemgent). Cultures were maintained at 37°C and 5% CO2 and passaged every 2 days.

For isolation of genomic DNA, cells were harvested by centrifugation for 5 min at 1000 x g and room temperature. DNA was extracted with the Quick-DNA Plus kit (Zymo Research) according to the manufacturer's protocol.

Preparation of mESC gDNA for TAPS and WGBS.

For whole-genome bisulfite sequencing (WGBS), mESC gDNA was spiked with 0.5% of unmethylated lambda DNA. For whole-genome TAPS, mESC gDNA was spiked with 0.5% of methylated lambda DNA and 0.025% of unmodified 2 kb spike-in control. DNA samples were fragmented with a Covaris M220 instrument and size-selected to 200-400 bp on Ampure XP beads. DNA for TAPS was additionally spiked with 0.25% of N5mCNN and N5hmCNN control oligo after size-selection with Ampure XP beads.

Whole Genome Bisulfite Sequencing
For Whole Genome Bisulfite Sequencing (WGBS), 200 ng of fragmented mESC gDNA spiked with 0.5% of unmethylated bacteriophage lambda DNA was used. End-repair and A-tailing reaction and ligation of methylated adapter (NextFlex) were performed with the KAPA HyperPlus kit (Kapa Biosystems) according to the manufacturer's protocol. Subsequently, DNA underwent bisulfite conversion with EpiTect Bisulfite Kit (Qiagen) according to Illumina's protocol. The final library was amplified with KAPA HiFi Uracil Plus Polymerase (Kapa Biosystems) for 6 cycles and cleaned up on 1X Ampure beads. The WGBS sequencing library was paired-end 80 bp sequenced on a NextSeq 500 sequencer (Illumina) using a NextSeq High Output kit with 15% PhiX control library spike-in.

Whole-genome TAPS
For whole-genome TAPS, 100 ng of fragmented mESC gDNA spiked with 0.5% of methylated lambda DNA and 0.025% of unmodified 2 kb spike-in control was used. End-repair and A-tailing reaction and ligation of Illumina Multiplexing adapters were performed with the KAPA HyperPlus kit according to the manufacturer's protocol. Ligated DNA was oxidized with mTET1CD twice and then reduced with pyridine borane according to the protocols described above. The final sequencing library was amplified with KAPA HiFi Uracil Plus Polymerase for 5 cycles and cleaned up on 1X Ampure beads. The whole-genome TAPS sequencing library was paired-end 80 bp sequenced on a NextSeq 500 sequencer (Illumina) using one NextSeq High Output kit with 1% PhiX control library spike-in.

Low-input whole-genome TAPS with dsDNA library preparation kits
mESC gDNA prepared as described above for whole-genome TAPS was used for low-input whole-genome TAPS. Briefly, samples containing 100 ng, 10 ng, and 1 ng of mESC gDNA were oxidized with NgTET1 once according to the protocol described above. End-repair and A-tailing reaction and ligation were performed with NEBNext Ultra II (New England Biolabs) or KAPA HyperPlus kit according to the manufacturer's protocol. Subsequently, DNA underwent the pic-borane reaction as described above. Converted libraries were amplified with KAPA HiFi Uracil Plus Polymerase and cleaned up on 1X Ampure beads.

Low-input whole-genome TAPS with ssDNA library preparation kit
mESC gDNA prepared as described above for whole-genome TAPS was used for low-input whole-genome TAPS. Briefly, samples containing 100 ng, 10 ng, 1 ng, 100 pg, and 10 pg of mESC gDNA were oxidized with NgTET1 once and reduced with pic-borane as described above. Sequencing libraries were prepared with the Accel-NGS Methyl-Seq DNA Library Kit (Swift Biosciences) according to the manufacturer's protocol. Final libraries were amplified with KAPA HiFi Uracil Plus Polymerase for 6 cycles (100 ng), 9 cycles (10 ng), 13 cycles (1 ng), 16 cycles (100 pg), and 21 cycles (10 pg) and cleaned up on 0.85X Ampure beads.
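The cycle numbers reported for the ssDNA kit roughly track the rule of thumb that each 10-fold drop in input needs about log2(10), or 3.3, extra PCR cycles to reach the same yield. The Python sketch below compares that idealized expectation with the reported values; the doubling-per-cycle model and the function name `expected_cycles` are assumptions for illustration, not the rationale given in the text.

```python
import math

# Reported cycle numbers for the ssDNA kit, keyed by input in picograms.
REPORTED = {100_000: 6, 10_000: 9, 1_000: 13, 100: 16, 10: 21}


def expected_cycles(input_pg: float, ref_pg: float = 100_000,
                    ref_cycles: float = 6) -> float:
    """Cycles needed to match the reference yield, assuming perfect doubling."""
    return ref_cycles + math.log2(ref_pg / input_pg)


for input_pg, cycles in REPORTED.items():
    print(f"{input_pg:>7} pg: reported {cycles}, idealized "
          f"{expected_cycles(input_pg):.1f}")
```

The lowest input uses more cycles than the idealized estimate, which would be consistent with lower recovery at very small inputs, although the text does not comment on this.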

In other experiments, mESC gDNA prepared as described above for whole-genome TAPS was used for low-input whole-genome TAPS. Briefly, samples containing 100 ng, 10 ng, and 1 ng of mESC gDNA were used for End-repair and A-tailing reaction and ligated to Illumina Multiplexing adaptors with the KAPA HyperPlus kit according to the manufacturer's protocol. Ligated samples were then oxidized with mTET1CD once and then reduced with pyridine borane according to the protocols described above. Converted libraries were amplified with KAPA HiFi Uracil Plus Polymerase for 5 cycles (100 ng), 8 cycles (10 ng), and 13 cycles (1 ng) and cleaned up on 1X Ampure XP beads.

Cell-free DNA TAPS
Cell-free DNA TAPS samples were prepared from 10 ng and 1 ng of cell-free DNA sample. Briefly, samples were oxidized with NgTET1 once and reduced with pic-borane as described above. Sequencing libraries were prepared with the Accel-NGS Methyl-Seq DNA Library Kit (Swift Biosciences) according to the manufacturer's protocol. Final libraries were amplified with KAPA HiFi Uracil Plus Polymerase for 9 cycles (10 ng) and 13 cycles (1 ng) and cleaned up on 0.85X Ampure beads.

In other experiments, cell-free DNA TAPS samples were prepared from 10 ng and 1 ng of cell-free DNA sample as described above for whole-genome TAPS. Briefly, cell-free DNA samples were used for End-repair and A-tailing reaction and ligated to Illumina Multiplexing adaptors with the KAPA HyperPlus kit according to the manufacturer's protocol. Ligated samples were then oxidized with mTET1CD once and then reduced with pyridine borane according to the protocols described above. Converted libraries were amplified with KAPA HiFi Uracil Plus Polymerase for 7 cycles (10 ng) and 13 cycles (1 ng) and cleaned up on 1X Ampure XP beads.

WGBS data processing
Paired-end reads were downloaded as FASTQ from Illumina BaseSpace and subsequently quality-trimmed with Trim Galore! v0.4.4 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Read pairs where at least one read was shorter than 35 bp after trimming were removed. Trimmed reads were mapped to a genome combining the mm9 version of the mouse genome, lambda phage and PhiX (sequence from Illumina iGENOMES) using Bismark v0.19 with the --no_overlap option (F. Krueger, S. R. Andrews, Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571-1572 (2011), incorporated herein by reference). The 'three-C' filter was used to remove reads with excessive non-conversion rates. PCR duplicates were called using Picard v1.119 (http://broadinstitute.github.io/picard/) MarkDuplicates. Regions known to be prone to mapping artefacts were downloaded (https://sites.google.com/site/anshulkundaje/projects/blacklists) and excluded from further analysis (ENCODE Project Consortium, An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012), incorporated herein by reference).

TAPS data pre-processing
Paired-end reads were downloaded from Illumina BaseSpace and subsequently quality-trimmed with Trim Galore! v0.4.4. Read pairs where at least one read was shorter than 35 bp after trimming were removed. Trimmed reads were mapped to a genome combining spike-in sequences, lambda phage and the mm9 version of the mouse genome using BWA mem 15 (H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009), incorporated herein by reference) with default parameters. Regions known to be prone to mapping artefacts were downloaded (https://sites.google.com/site/anshulkundaje/projects/blacklists) and excluded from further analysis (ENCODE Project Consortium, Nature 489, 57-74 (2012)).

Detection of converted bases in TAPS
Aligned reads were split into original top (OT) and original bottom (OB) strands using a custom python3 script (MF-filter.py). PCR duplicates were then removed with Picard MarkDuplicates on OT and OB separately. Overlapping segments in read pairs were removed using BamUtil clipOverlap (https://github.com/statgen/bamUtil) on the deduplicated, mapped OT and OB reads separately. Modified bases were then detected using samtools mpileup and a custom python3 script (MF-caller_MOD.py).
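In TAPS a modified cytosine is read as T, so on original-top reads it appears as a C-to-T change and on original-bottom reads as a G-to-A change relative to the reference. The Python sketch below shows how a per-position modification level could be derived from pileup base counts under that logic. It is not the MF-caller_MOD.py script, whose contents are not given in the text; the function name and the input format (a dictionary of base counts) are hypothetical.

```python
from typing import Dict, Optional


def modification_level(ref_base: str, strand: str,
                       counts: Dict[str, int]) -> Optional[float]:
    """Fraction of reads showing TAPS conversion at one reference position.

    ref_base: reference base at the position ("C" for OT, "G" for OB).
    strand:   "OT" (original top) or "OB" (original bottom).
    counts:   read-base counts at the position, e.g. {"C": 2, "T": 8}.
    Returns None when the position is not informative or has no coverage.
    """
    if strand == "OT" and ref_base == "C":
        converted, unconverted = counts.get("T", 0), counts.get("C", 0)
    elif strand == "OB" and ref_base == "G":
        converted, unconverted = counts.get("A", 0), counts.get("G", 0)
    else:
        return None
    total = converted + unconverted
    return converted / total if total else None


print(modification_level("C", "OT", {"T": 8, "C": 2}))
```

Bases other than the converted and unconverted pair are ignored here, which treats them as sequencing errors; a production caller would likely also filter on base and mapping quality.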

Sequencing quality analysis of TAPS and WGBS
Quality score statistics per nucleotide type were extracted from the original FASTQ files as downloaded from Illumina BaseSpace with a python3 script (MF-phredder.py).

Coverage analysis of TAPS and WGBS
Per-base genome coverage files were generated with Bedtools v2.25 genomecov (A. R. Quinlan, I. M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842 (2010), incorporated herein by reference). To compare the relative coverage distributions between TAPS and WGBS, TAPS reads were subsampled to the corresponding coverage median in WGBS using the -s option of samtools view. In the analyses comparing coverage in WGBS and subsampled TAPS, clipOverlap was used on both TAPS and WGBS bam files.

Analysis of cytosine modifications measured by TAPS and WGBS
The fraction of modified reads per base was calculated from the Bismark output and the output of MF-caller_MOD.py, respectively. Intersections were performed using Bedtools intersect, and statistical analyses and figures were generated in R and Matlab. Genomic regions were visualized using IGV v2.4.6 (J. T. Robinson et al., Integrative genomics viewer. Nat. Biotechnol. 29, 24-26 (2011), incorporated herein by reference). To plot the coverage and modification levels around CGIs, all CGI coordinates for mm9 were downloaded from the UCSC genome browser, binned into 20 windows, and extended by up to 50 windows of size 80 bp on both sides (as long as they did not reach half the distance to the next CGI). Average modification levels (in CpGs) and coverage (in all bases, both strands) in each bin were computed using Bedtools map. The values for each bin were again averaged and subsequently plotted in Matlab.
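The CGI binning scheme can be made concrete with a small sketch. The Python function below splits one CGI into 20 body windows and adds up to 50 flanking 80 bp windows per side, stopping before half the distance to the neighbouring CGI. It is an illustration under stated assumptions (0-based half-open coordinates, integer window edges, and the hypothetical name `cgi_windows`), not the code used to produce the figures.

```python
def cgi_windows(start, end, prev_end=None, next_start=None,
                n_bins=20, flank_bp=80, max_flank=50):
    """Return (upstream, body, downstream) window lists for one CGI.

    Coordinates are 0-based, half-open. Flanking windows stop before they
    would cross half the distance to the neighbouring CGI.
    """
    # Split the CGI body into n_bins windows of (nearly) equal width.
    edges = [start + (end - start) * i // n_bins for i in range(n_bins + 1)]
    body = list(zip(edges[:-1], edges[1:]))

    def n_flanks(gap):
        # gap: distance to the neighbouring CGI, or None if there is none.
        if gap is None:
            return max_flank
        return min(max_flank, (gap // 2) // flank_bp)

    n_up = n_flanks(None if prev_end is None else start - prev_end)
    n_up = min(n_up, start // flank_bp)  # do not run past coordinate 0
    n_down = n_flanks(None if next_start is None else next_start - end)

    upstream = [(start - (i + 1) * flank_bp, start - i * flank_bp)
                for i in reversed(range(n_up))]
    downstream = [(end + i * flank_bp, end + (i + 1) * flank_bp)
                  for i in range(n_down)]
    return upstream, body, downstream


up, body, down = cgi_windows(10_000, 11_000, prev_end=9_000, next_start=20_000)
print(len(up), len(body), len(down))
```

The resulting windows could be written to a BED file and passed to Bedtools map to average modification levels and coverage per window.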

Data processing time simulation
Synthetic paired-end sequencing reads were simulated using ART based on the lambda phage genome (with parameters -p -ss NS50 --errfree --minQ 15 -k 0 -nf 0 -l 75 -c 0 -m 240 -s 0 -ir 0 -ir2 0 -dr 0 -dr2 0 -sam -rs 10). 50% of all CpG positions were subsequently marked as modified and two libraries were generated, either as TAPS (convert modified bases) or as WGBS (convert unmodified bases), using a custom python3 script. The reads were then processed following the pipeline used for each of the methods in the paper. Processing time was measured with the Linux command time. All steps of the analysis were performed in single-threaded mode on one Intel Xeon CPU with 250 GB of memory.
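The difference between the two simulated libraries comes down to which cytosines are rewritten as T. The Python sketch below illustrates that step on a top-strand sequence; it is not the custom script referred to in the text, and the function names, the fixed random seed and the toy sequence are assumptions made for illustration.

```python
import random


def mark_modified_cpgs(seq: str, fraction: float = 0.5, seed: int = 0) -> set:
    """Randomly choose a fraction of CpG cytosine positions as modified."""
    cpgs = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    rng = random.Random(seed)
    return set(rng.sample(cpgs, round(len(cpgs) * fraction)))


def convert(seq: str, modified: set, mode: str) -> str:
    """Apply in-silico conversion to a top-strand sequence.

    mode="TAPS": modified C is read as T, unmodified C is unchanged.
    mode="WGBS": unmodified C is read as T, modified C is unchanged.
    """
    if mode not in ("TAPS", "WGBS"):
        raise ValueError("mode must be 'TAPS' or 'WGBS'")
    out = []
    for i, base in enumerate(seq):
        is_modified = i in modified
        if base == "C" and ((mode == "TAPS") == is_modified):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


toy = "ACGTCGCC"
print(convert(toy, {1}, "TAPS"), convert(toy, {1}, "WGBS"))
```

The toy example shows why TAPS reads stay close to the reference while WGBS reads lose most of their cytosines, which is the property the processing-time comparison probes.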

Results and Discussion
It was discovered that pic-BH3 can readily convert 5fC and 5caC to DHU by a previously unknown reductive decarboxylation/deamination reaction (Fig. 4). The reaction was shown to be quantitative both in single nucleoside and in oligonucleotides using MALDI (Figs. 2-3, and 6-7).

An 11mer 5caC-containing DNA oligo was used as a model to screen chemicals that could react with 5caC, as monitored by matrix-assisted laser desorption/ionization mass spectroscopy (MALDI). Certain borane-containing compounds were found to efficiently react with the 5caC oligo, resulting in a molecular weight reduction of 41 Da (Figs. 1 and 2).
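The 41 Da shift is what one would expect if the 5caC base is replaced by dihydrouracil. The Python sketch below checks this with monoisotopic masses; the neutral nucleoside formulas used (C10H13N3O6 for 5-carboxyl-2'-deoxycytidine and C9H14N2O5 for 5,6-dihydro-2'-deoxyuridine) are assumptions chosen to be consistent with the precursor ions in Table 5, and the helper name `mass` is illustrative.

```python
# Monoisotopic atomic masses (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def mass(formula: dict) -> float:
    """Monoisotopic mass of a molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in formula.items())


# Assumed neutral nucleoside formulas.
CADC = {"C": 10, "H": 13, "N": 3, "O": 6}  # 5-carboxyl-2'-deoxycytidine
DHU = {"C": 9, "H": 14, "N": 2, "O": 5}    # 5,6-dihydro-2'-deoxyuridine

delta = mass(CADC) - mass(DHU)
print(f"cadC {mass(CADC):.4f} Da, DHU {mass(DHU):.4f} Da, loss {delta:.2f} Da")
```

Since only the base changes, the same difference applies to the oligo as a whole.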

Pyridine borane and its derivative 2-picoline borane (pic-borane) were selected for further study as they are commercially available and environmentally benign reducing agents.

The reaction was repeated on a single 5caC nucleoside, which confirmed that pyridine borane and pic-borane convert 5caC to dihydrouracil (DHU) (Figs. 3, 4B). Interestingly, pyridine borane and pic-borane were found to also convert 5fC to DHU through an apparent reductive decarboxylation/deamination mechanism (Figs. 4C and 6). The detailed mechanism of both reactions remains to be defined. Quantitative analysis of the borane reaction on the DNA oligo by HPLC-MS/MS confirms that pic-borane converts 5caC and 5fC to DHU with around 98% efficiency and has no activity against unmethylated cytosine, 5mC or 5hmC (Fig. 2B).

As a uracil derivative, DHU can be recognized by both DNA and RNA polymerases as thymine. Therefore, borane reduction can be used to induce both 5caC-to-T and 5fC-to-T transitions, and can be used for base-resolution sequencing of 5fC and 5caC, which we termed Pyridine borane Sequencing ("PS") (Table 6). The borane reduction of 5fC and 5caC to T can be blocked through hydroxylamine conjugation (C. X. Song et al., Genome-wide profiling of 5-formylcytosine reveals its roles in epigenetic priming. Cell 153, 678-691 (2013), incorporated herein by reference) and EDC coupling (X. Lu et al., Chemical modification-assisted bisulfite sequencing (CAB-Seq) for 5-carboxylcytosine detection in DNA. J. Am. Chem. Soc. 135, 9315-9317 (2013), incorporated herein by reference), respectively (Fig. 6). This blocking allows PS to be used to sequence 5fC or 5caC specifically (Table 6).

Table 6. Comparison of BS and related methods versus PS for 5fC and 5caC sequencing.

Base BS fCAB-Seq Seq fC-CET PS PS with 5fC PS with 5caC /redBS-Seq blockin blockin C C C C 5mC C C C C C Sth C C C C C 5fC T C T 5caC T T T C

Furthermore, TET enzymes can be used to oxidize 5mC and 5hmC to 5caC, which is then subjected to borane reduction in a process herein called TET-Assisted Pyridine borane Sequencing ("TAPS") (Fig. 5A-B, Table 1). TAPS can induce a C-to-T transition of 5mC and 5hmC, and therefore can be used for base-resolution detection of 5mC and 5hmC.

In addition, β-glucosyltransferase (βGT) can label 5hmC with glucose and thereby protect it from TET oxidation (M. Yu et al., Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 149, 1368-1380 (2012)) and borane reduction (Fig. 7), enabling the selective sequencing of only 5mC, in a process referred to herein as TAPSβ (Fig. 5B, Table 1). 5hmC sites can then be deduced by subtraction of TAPSβ from TAPS measurements. Alternatively, potassium perruthenate (KRuO4), a reagent previously used in oxidative bisulfite sequencing (oxBS) (M. J. Booth et al., Quantitative Sequencing of 5-Methylcytosine and 5-Hydroxymethylcytosine at Single-Base Resolution. Science 336, 934-937 (2012)), can be used to replace TET as a chemical oxidant to specifically oxidize 5hmC to 5fC (Fig. 7). This approach, referred to herein as Chemical-Assisted Pyridine borane Sequencing (“CAPS”), can be used to sequence 5hmC specifically (Fig. 5B, Table 1). Therefore, TAPS and related methods can in principle offer a comprehensive suite to sequence all four cytosine epigenetic modifications (Fig. 5B, Table 1, Table 6).

TAPS alone will detect the existing 5fC and 5caC in the genome as well. However, given the extremely low levels of 5fC and 5caC in genomic DNA under normal conditions, this will be acceptable. If, under certain conditions, one would like to eliminate the 5fC and 5caC signals completely, this can be readily accomplished by protecting the 5fC and 5caC by hydroxylamine conjugation and EDC coupling, respectively, thereby preventing conversion to DHU.
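The read-out logic of PS, TAPS, TAPSβ and CAPS described above can be summarized as a small lookup. The following is a minimal illustrative sketch, not part of the disclosed methods: the names `READS_AS_T`, `readout` and `estimate_5hmc` are hypothetical, and it assumes that endogenous 5fC and 5caC are still converted in TAPSβ and CAPS unless they are blocked.

```python
# Illustrative lookup, not part of the disclosed methods.
# Cytosine forms read as T after each workflow; everything else reads as C.
READS_AS_T = {
    "PS": {"5fC", "5caC"},
    "TAPS": {"5mC", "5hmC", "5fC", "5caC"},
    # Assumption: endogenous 5fC/5caC still convert in TAPS-beta and CAPS.
    "TAPS-beta": {"5mC", "5fC", "5caC"},   # 5hmC is glucosylated and protected
    "CAPS": {"5hmC", "5fC", "5caC"},       # KRuO4 oxidizes 5hmC to 5fC
}


def readout(method, base, blocked=frozenset()):
    """Return the base call ('T' or 'C') for one cytosine form.

    `blocked` holds forms in the starting DNA that were chemically protected
    (for example 5fC by hydroxylamine, 5caC by EDC coupling).
    """
    if base in READS_AS_T[method] and base not in blocked:
        return "T"
    return "C"


def estimate_5hmc(taps_level, taps_beta_level):
    """Estimate the 5hmC level at a site as TAPS minus TAPS-beta, floored at 0."""
    return max(0.0, taps_level - taps_beta_level)
```

For example, `readout("TAPS-beta", "5hmC")` gives "C" while `readout("TAPS", "5hmC")` gives "T", which is why the difference between the two measurements at a site can be attributed to 5hmC.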

The performance of TAPS was evaluated in comparison with bisulfite sequencing, the current standard and most widely used method for base-resolution mapping of 5mC and 5hmC. Naegleria TET-like oxygenase (NgTET1) and mouse Tet1 (mTet1) were used because both can efficiently oxidize 5mC to 5caC in vitro. To confirm the 5mC-to-T transition, TAPS was applied to model DNA containing fully methylated CpG sites and showed that it can effectively convert 5mC to T, as demonstrated by restriction enzyme digestion (Fig. 8A-B) and Sanger sequencing (Fig. 9A). TAPSβ and CAPS were also validated by Sanger sequencing (Fig. 12).

TAPS was also applied to genomic DNA (gDNA) from mouse embryonic stem cells (mESCs). HPLC-MS/MS quantification showed that, as expected, 5mC accounts for 98.5% of cytosine modifications in the mESC gDNA; the remainder is composed of 5hmC (1.5%) and trace amounts of 5fC and 5caC, and no DHU (Fig. 9B). After TET oxidation, about 96% of cytosine modifications were oxidized to 5caC and 3% were oxidized to 5fC (Fig. 9B). After borane reduction, over 99% of the cytosine modifications were converted into DHU (Fig. 9B). These results demonstrate that both TET oxidation and borane reduction work efficiently on genomic DNA.

Both TET oxidation and borane reduction are mild reactions, with no notable DNA degradation compared to bisulfite (Fig. 10A-D), and thereby provide high DNA recovery. Another notable advantage over bisulfite sequencing is that TAPS is non-destructive and can preserve DNA up to 10 kb long (Fig. 10C). Moreover, DNA remains double stranded after TAPS (Fig. 10A-C), and the conversion is independent of the DNA length (Fig. 15A-B). In addition, because DHU is close to a natural base, it is compatible with various DNA polymerases and isothermal DNA or RNA polymerases (Figs. 13A-B) and does not show a bias compared to T/C during PCR (Fig. 14).

Whole genome sequencing was performed on two samples of mESC gDNA, one converted using TAPS and the other using standard whole-genome bisulfite sequencing (WGBS) for comparison.

To assess the accuracy of TAPS, spike-ins of different lengths were added that were either fully unmodified or in vitro methylated using CpG Methyltransferase (M.SssI) or GpC Methyltransferase (M.CviPI) (using the above methods). For short spike-ins (120mer-1 and 120mer-2) containing 5mC and 5hmC, near complete conversion was observed for both modifications on both strands in both CpG and non-CpG contexts (Fig. 17A-B). 100 ng gDNA was used for TAPS, compared to 200 ng gDNA for WGBS. To assess the accuracy of TAPS, we added three different types of spike-in controls. Lambda DNA where all CpGs were fully methylated was used to estimate the false negative rate (non-conversion rate of 5mC); a 2 kb unmodified amplicon was used to estimate the false positive rate (conversion rate of unmodified C); synthetic oligo spike-ins containing both a methylated and hydroxymethylated C surrounded by any other base (N5mCNN and N5hmCNN, respectively) were used to compare the conversion rate on 5mC and 5hmC in different sequence contexts. The combination of mTet1 and pyridine borane achieved the highest 5mC conversion rate (96.5% and 97.3% in lambda and synthetic spike-ins, respectively) and the lowest conversion rate of unmodified C (0.23%) (Fig. 18A-B and Fig. 16). A false negative rate between 2.7% and 3.5%, with a false positive rate of only 0.23%, is comparable to bisulfite sequencing: a recent study showed 9 commercial bisulfite kits had average false negative and false positive rates of 1.7% and 0.6%, respectively (Holmes, E.E. et al. Performance evaluation of kits for bisulfite-conversion of DNA from tissues, cell lines, FFPE tissues, aspirates, lavages, effusions, plasma, serum, and urine. PLoS One 9, e93933 (2014)).
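The relation between the spike-in conversion rates and the quoted error rates can be made explicit with a short calculation. This is a minimal sketch with hypothetical function names: the false negative rate is taken as the non-conversion rate of the fully methylated control, and the false positive rate is the conversion rate of the unmodified control read directly.

```python
def conversion_rate(t_reads, c_reads):
    """Percentage of reads showing T at cytosine positions of a spike-in control."""
    total = t_reads + c_reads
    if total == 0:
        raise ValueError("no informative reads")
    return 100.0 * t_reads / total


def false_negative_rate(conversion_rate_pct):
    """Non-conversion rate (%) of a fully methylated control."""
    return round(100.0 - conversion_rate_pct, 2)


# Conversion rates reported for the methylated lambda and synthetic spike-ins.
print(false_negative_rate(96.5))  # 3.5
print(false_negative_rate(97.3))  # 2.7
```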

The synthetic spike-ins suggest that TAPS works well on both 5mC and 5hmC, and that TAPS performs only slightly worse in non-CpG contexts. The conversion for 5hmC is 8.2% lower than 5mC, and the conversion for non-CpG contexts is 11.4% lower than for CpG contexts (Fig. 18A).

WGBS data requires special software both for the alignment and modification-calling steps. In contrast, our processing pipeline uses a standard genomic aligner (bwa), followed by a custom modification-calling tool that we call “asTair”. When processing simulated WGBS and TAPS reads (derived from the same semi-methylated source sequence), TAPS/asTair was more than 3x faster than WGBS/Bismark (Fig. 18C).
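Because TAPS leaves unmodified cytosines intact, modification calling after standard alignment reduces to counting T versus C at reference cytosines. The sketch below illustrates that idea only; it is not asTair, and the function name, the ungapped forward-strand read model and the `min_depth` parameter are assumptions made for illustration.

```python
from collections import Counter


def call_modification_levels(reference, reads, min_depth=1):
    """Estimate per-CpG modification levels from TAPS-style reads.

    reference: forward-strand reference sequence.
    reads: iterable of (start, sequence) ungapped forward-strand alignments.
    Returns {position of the CpG cytosine: T / (C + T)}.
    """
    # One counter per reference CpG cytosine.
    counts = {
        i: Counter()
        for i in range(len(reference) - 1)
        if reference[i:i + 2] == "CG"
    }
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in counts and base in "CT":
                counts[pos][base] += 1
    levels = {}
    for pos, c in counts.items():
        depth = c["C"] + c["T"]
        if depth >= min_depth:
            # In TAPS a T at a reference C marks a modified cytosine.
            levels[pos] = c["T"] / depth
    return levels
```

Note the inversion relative to bisulfite data: here the converted base (T) reports the modified cytosine, whereas in WGBS the unconverted base (C) does.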

Due to the conversion of nearly all cytosine to thymine, WGBS libraries feature an extremely skewed nucleotide composition which can negatively affect Illumina sequencing.

Consequently, WGBS reads showed substantially lower sequencing quality scores at cytosine/guanine base pairs compared to TAPS (Fig. 18E). To compensate for the nucleotide composition bias, at least 10 to 20% PhiX DNA (a base-balanced control library) is commonly added to WGBS libraries (see, e.g., Illumina’s Whole-Genome Bisulfite Sequencing on the HiSeq 3000/HiSeq 4000 Systems). Accordingly, we supplemented the WGBS library with 15% PhiX. This, in combination with the reduced information content of BS-converted reads, and DNA degradation as a result of bisulfite treatment, resulted in significantly lower mapping rates for WGBS compared to TAPS (Fig. 18D and Table 7).

Table 7. Mapping and sequencing quality statistics for WGBS and TAPS.

Measure | WGBS | TAPS
Total raw reads | 376062375 | 455548210
Trimmed reads | 367860813 | 453028186
Mapped reads (mm9+spike-ins+PhiX) | 251940139 | 451077132
PCR deduplicated reads | 232303596 | 851
Mapping rate (mapped reads/trimmed reads) | 68.49% | 99.57%
Unique mapping rate (unique reads [MAPQ>0 for TAPS]/trimmed reads) | 68.49% | 88.08%
Unique PCR deduplicated mapping rate (unique PCR deduplicated reads [MAPQ>0 for TAPS]/trimmed reads) | 63.15% | 81.31%

Therefore, for the same sequencing cost (one NextSeq High Output run), the average depth of TAPS exceeded that of WGBS (21X and 13.1X, respectively; Table 8). Furthermore, TAPS resulted in fewer uncovered regions, and overall showed a more even coverage distribution, even after down-sampling to the same sequencing depth as WGBS (inter-quartile range: 9 and 11, respectively, Fig. 19A and Table 8).
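The mapping rates in Table 7 follow directly from the read counts. As a quick consistency check, the sketch below recomputes the rates whose numerator and denominator are both given in the table; the helper name `pct` is hypothetical.

```python
def pct(numerator, denominator):
    """Ratio expressed as a percentage, rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


# Read counts taken from Table 7.
WGBS_TRIMMED, WGBS_MAPPED, WGBS_DEDUP = 367860813, 251940139, 232303596
TAPS_TRIMMED, TAPS_MAPPED = 453028186, 451077132

print(pct(WGBS_MAPPED, WGBS_TRIMMED))  # 68.49
print(pct(TAPS_MAPPED, TAPS_TRIMMED))  # 99.57
print(pct(WGBS_DEDUP, WGBS_TRIMMED))   # 63.15
```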

Table 8. Coverage statistics for TAPS, WGBS and TAPS down-sampled to have approximately the same mean coverage as WGBS. Here, coverage was computed for both strands at all positions in the genome.

Measure | WGBS | TAPS with down-sampling | TAPS without down-sampling
Mean | 13.078 | 12.411 | 21.001
Variance | 1988.242 | 482.242 | 1371.912
median | 13 | 13 | 22
qtl25 | 7 | 8 | 15
qtl75 | 18 | 17 | 28
iqr | 11 | 9 | 13
maximum | 116084 | 37329 | 63526

For example, CpG Islands (CGIs) in particular were generally better covered by TAPS, even when controlling for differences in sequencing depth between WGBS and TAPS (Fig. 21A), while both showed equivalent demethylation inside CGIs (Fig. 22). Moreover, WGBS showed a slight bias of decreased modification levels in highly covered CpG sites (Fig. 23A), while our results suggest that TAPS exhibits very little of the modification-coverage bias (Fig. 23B). These results demonstrate that TAPS dramatically improved sequencing quality compared to WGBS, while effectively halving the sequencing cost.

The higher and more even genome coverage of TAPS resulted in a larger number of CpG sites covered by at least three reads. With TAPS, 88.3% of all 43,205,316 CpG sites in the mouse genome were covered at this level, compared to only 77.5% with WGBS (Fig. 21B and 19B). TAPS and WGBS resulted in highly correlated methylation measurements across chromosomal regions (Fig. 21D and Fig. 20). On a per-nucleotide basis, 32,755,271 CpG positions were covered by at least three reads in both methods (Fig. 21B). Within these sites, we defined “modified CpGs” as all CpG positions with a modification level of at least % (L. Wen et al., Whole-genome analysis of 5-hydroxymethylcytosine and 5-methylcytosine at base resolution in the human brain. Genome Biology 15, R49 (2014)).

Using this threshold, 95.8% of CpGs showed matching modification states between TAPS and WGBS. 98.5% of all CpGs that were covered by at least three reads and found modified in WGBS were recalled as modified by TAPS, indicating good agreement between WGBS and TAPS (Fig. 21C). When comparing modification levels for each CpG covered by at least three reads in both WGBS and TAPS, good correlation between TAPS and WGBS was observed (Pearson r = 0.63, p < 2e-16, Fig. 21E). Notably, TAPS identified a subset of highly modified CpG positions which were missed by WGBS (Fig. 21E, bottom right corner).
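The two agreement figures above (matching modification states, and recall of WGBS-modified sites by TAPS) can be computed from per-site modification levels as sketched below. This is an illustrative sketch only: the function name and data layout are hypothetical, the inputs are assumed to be already filtered for the minimum read coverage, and the threshold is passed in by the caller.

```python
def compare_calls(taps, wgbs, threshold):
    """Compare per-CpG calls between two methods.

    taps, wgbs: dicts mapping a CpG position to its modification level (0..1).
    threshold: level at or above which a CpG counts as modified.
    """
    shared = taps.keys() & wgbs.keys()
    if not shared:
        raise ValueError("no CpG positions shared by both methods")
    matching = sum(
        (taps[s] >= threshold) == (wgbs[s] >= threshold) for s in shared
    )
    wgbs_modified = [s for s in shared if wgbs[s] >= threshold]
    recalled = sum(taps[s] >= threshold for s in wgbs_modified)
    return {
        "n_shared": len(shared),
        # Fraction of shared sites with the same modified/unmodified state.
        "concordance": matching / len(shared),
        # Fraction of WGBS-modified sites also called modified by TAPS.
        "recall": recalled / len(wgbs_modified) if wgbs_modified else float("nan"),
    }
```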

We further validated 7 of these CpGs, using an orthogonal restriction digestion and real-time PCR assay, and confirmed all of them are fully methylated and/or hydroxymethylated (Table 9).

Table 9. Comparison of CmCGG methylation level in mESC gDNA quantified by TAPS, WGBS and HpaII-qPCR assay. Coverage and methylation level (mC%) by TAPS and WGBS were computed per strand. The Ct value for the HpaII digested sample (Ct_HpaII) or control sample (Ct_Ctrl) in the HpaII-qPCR assay was the average of triplicates. mC% is calculated using the following equation: mC% = 2^(Ct_Ctrl - Ct_HpaII) x 100%.

Position of CmCGG | Ct_HpaII | Ct_Ctrl | mC% (HpaII-qPCR assay) | Forward and reverse primer
chr6:135868201 | 29.628 | 29.642 | 101.0% | GCTGCAGATTGGAGCC AAAG / TTGATGGTGATGGTGG
chr3:31339449 | 22.162 | 22.111 | 96.5% | TCAGTGCTCATGGACTC ATACT / TGGGAGCAAA GTTGTTG
chr4:128271030 | 31.304 | 31.279 | 98.3% | CCCACTAGACATGCTCT GCC / CAAAATGTTGCTTGCCT
chr1:58635199 | 22.008 | 22.026 | 101.3% | TCCCTGAGCCCTGATCT AGT / AATACTGGCTGACCGG
chr14:36331351 | 21.228 | 21.053 | | ACACCACAGCAGAAGA GAGC / TGTTGCACAG
chr19:42893499 | 22.515 | 22.558 | 103.0% | GCTGAGCTGTATCCTTG AGGT / GGGTATTCCA
chr3:113611193 | 22.439 | 22.545 | 107.6% | GTGGATCTTCAGTGGTG GCA / ATGCTCCCTCATCCTTT
chr19:9043049 (negative CCGG site) | 27.11 | 21.409 | 1.9% | AGCCTCTGAACTTGACT GCC / GCCTGGAACTCCTGAC
chr15:39335961 (positive CCGG site) | | | 106.1% | GGTCCTTGATCCACCCA GAC / ACATGGTGCTGGTCTA

Together, these results indicate that TAPS can directly replace WGBS, and in fact provides a more comprehensive view of the methylome than WGBS. Finally, TAPS was tested with low input DNA and was shown to work with as little as 1 ng gDNA and in some instances down to 10 pg of gDNA, close to single-cell level. TAPS also works effectively with down to 1 ng of circulating cell-free DNA. These results demonstrate the potential of TAPS for low input DNA and clinical applications (Fig. 24A-C, Fig. 25A-B).
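The HpaII-qPCR relation given with Table 9 can be checked numerically. The following minimal sketch (the function name `mc_percent` is hypothetical) applies the equation to Ct pairs from the table; a fully methylated CCGG site resists HpaII digestion, so the digested and control samples give nearly equal Ct values and mC% is close to 100%.

```python
def mc_percent(ct_hpaii, ct_ctrl):
    """mC% = 2^(Ct_Ctrl - Ct_HpaII) x 100%, from averaged triplicate Ct values."""
    return 2 ** (ct_ctrl - ct_hpaii) * 100.0


# Ct pairs taken from Table 9 (HpaII digested sample, control sample).
print(round(mc_percent(29.628, 29.642), 1))  # 101.0 (chr6 site)
print(round(mc_percent(22.162, 22.111), 1))  # 96.5  (chr3 site)
print(round(mc_percent(27.11, 21.409), 1))   # 1.9   (negative CCGG site)
```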

TAPS was tested on three circulating cell-free DNA (cfDNA) samples, from one healthy sample, one Barrett’s oesophagus (Barrett’s) sample, and one pancreatic cancer sample, that were obtained from 1-2 ml of plasma. The standard TAPS protocol was followed and each sample was sequenced to ~10X coverage. Analysis of the cfDNA TAPS results showed that TAPS provided the same high-quality methylome sequencing from low-input cfDNA as from bulk genomic DNA, including a high 5mC conversion rate (Fig. 26A), low false positive rate (conversion of unmodified cytosine, Fig. 26B), high mapping rate (Fig. 26C), and low PCR duplication rate (Fig. 26D). These results demonstrate the power of TAPS for disease diagnostics from cfDNA.

TAPS can also differentiate methylation from C-to-T genetic variants or single nucleotide polymorphisms (SNPs), and therefore could detect genetic variants. Methylations and C-to-T SNPs result in different patterns in TAPS: methylations result in T/G reads in the original top strand (OT)/original bottom strand (OB) and A/C reads in the strands complementary to OT (CTOT) and OB (CTOB), whereas C-to-T SNPs result in T/A reads in OT/OB and in CTOB/CTOT (Fig. 27). This further increases the utility of TAPS in providing both methylation information and genetic variants, and therefore mutations, in one experiment and sequencing run. This ability of the TAPS method disclosed herein provides integration of genomic analysis with epigenetic analysis, and a substantial reduction of sequencing cost by eliminating the need to perform standard whole genome sequencing (WGS).
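The strand patterns described above can be turned into a simple decision rule. The sketch below is illustrative only and is not the disclosed analysis pipeline: the names are hypothetical, it looks at a single reference cytosine, and it takes the majority base seen in OT reads at the cytosine and in OB reads at the paired guanine, which ignores partially modified or heterozygous sites.

```python
from collections import Counter

# (majority base in OT reads, majority base in OB reads) -> interpretation
PATTERNS = {
    ("T", "G"): "modified cytosine",    # only the top strand is converted
    ("T", "A"): "C-to-T variant",       # both strands carry the change
    ("C", "G"): "unmodified cytosine",
}


def majority(bases):
    """Most frequent base in a non-empty sequence of base calls."""
    return Counter(bases).most_common(1)[0][0]


def classify_site(ot_bases, ob_bases):
    """Classify one reference C from base calls on OT and OB reads."""
    if not ot_bases or not ob_bases:
        return "insufficient coverage"
    return PATTERNS.get((majority(ot_bases), majority(ob_bases)), "ambiguous")
```

A practical caller would work with base fractions and coverage thresholds rather than a plain majority vote; the point here is only that the bottom-strand reads separate the two cases.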

In summary, we have developed a series of PS-derived bisulfite-free, base-resolution sequencing methods for cytosine epigenetic modifications and demonstrated the utility of TAPS for whole-methylome sequencing. By using mild enzymatic and chemical reactions to detect 5mC and 5hmC directly at base-resolution with high sensitivity and specificity without affecting unmodified cytosines, TAPS outperforms bisulfite sequencing in providing a high quality and more complete methylome at half the sequencing cost. As such, TAPS could replace bisulfite sequencing as the new standard in DNA methylcytosine and hydroxymethylcytosine analysis. Rather than introducing a bulky modification on cytosines as in the bisulfite-free 5fC sequencing method reported recently (B. Xia et al., Bisulfite-free, base-resolution analysis of 5-formylcytosine at the genome scale. Nat. Methods 12, 1047-1050 (2015); C. Zhu et al., Single-Cell 5-Formylcytosine Landscapes of Mammalian Early Embryos and ESCs at Single-Base Resolution. Cell Stem Cell 20, 720-731 (2017)), TAPS converts modified cytosine into DHU, a near natural base, which can be “read” as T by common polymerases and is potentially compatible with PCR-free DNA sequencing. TAPS is compatible with a variety of downstream analyses, including but not limited to, pyrosequencing, methylation-sensitive PCR, restriction digestion, MALDI mass spectrometry, microarray and whole-genome sequencing. Since TAPS can preserve long DNA, it can be extremely valuable when combined with long read sequencing technologies, such as SMRT sequencing and nanopore sequencing, to investigate certain difficult to map regions. It is also possible to combine pull-down methods with TAPS to further reduce the sequencing cost and add base-resolution information to the low-resolution affinity-based maps. Herein, it was demonstrated that TAPS could directly replace WGBS in routine use while reducing cost, complexity and time required for analysis. This could lead to wider adoption of epigenetic analyses in academic research and clinical diagnostics.

Claims

Claims:

1. A method for converting 5-carboxylcytosine (5caC) and/or 5-formylcytosine (5fC) to dihydrouracil (DHU) comprising contacting a nucleic acid sample comprising 5caC and/or 5fC with a borane reducing agent.

2. The method of claim 1, wherein the borane reducing agent comprises an agent selected from the group consisting of 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride.

3. The method of claim 1, wherein the borane reducing agent comprises borane.

4. The method of claim 1, wherein the borane reducing agent comprises sodium borohydride.

5. The method of claim 1, wherein the borane reducing agent comprises sodium cyanoborohydride.

6. The method of claim 1, wherein the borane reducing agent comprises sodium triacetoxyborohydride.

7. The method of claim 1, wherein the borane reducing agent comprises 2-picoline borane.

8. The method of claim 1 comprising contacting the nucleic acid sample with an oxidizing agent prior to contacting with a borane reducing agent.

9. The method of claim 8, wherein the oxidizing agent is a ten-eleven translocation (TET) enzyme.

10. The method of claim 9, wherein the TET enzyme comprises human TET1, human TET2, human TET3, murine TET1, murine TET2, murine TET3, Naegleria TET (NgTET), Coprinopsis cinerea (CcTET), or derivatives or analogues thereof.

11. The method of claim 9, wherein the oxidizing agent comprises a chemical oxidizing agent.

12. The method of claim 11, wherein the chemical oxidizing agent comprises potassium perruthenate (KRuO4) or Cu(II)/TEMPO.

13. The method of claim 8, further comprising adding a blocking group to one or more modified cytosines in the nucleic acid sample.

14. The method of claim 13, wherein the blocking group is added prior to contacting with the oxidizing agent.

15. The method of claim 14, wherein the one or more modified cytosines comprises 5hmC.

16. The method of claim 12, wherein the blocking group comprises a sugar or a uridine diphosphate (UDP)-linked sugar.

17. The method of claim 13, wherein the blocking group is added after contacting with the oxidizing agent and prior to contacting with the borane reducing agent.

18. The method of claim 17, wherein the one or more modified cytosines comprises 5caC or 5fC.

19. The method of claim 18, wherein the blocking group comprises an aldehyde reactive compound.

20. The method of claim 19, wherein the aldehyde reactive compound comprises a hydroxylamine derivative, a hydrazine derivative, or a hydrazide derivative.

21. The method of claim 20, wherein adding the blocking group comprises contacting the nucleic acid sample with (i) a coupling agent and (ii) an amine, hydrazine, or hydroxylamine compound.

22. The method of claim 1, further comprising sequencing the nucleic acid sample after contacting with the borane reducing agent to identify converted cytosine bases.

23. A method for converting a modified cytosine to dihydrouracil (DHU) comprising: i) contacting a nucleic acid sample comprising a modified cytosine with an oxidizing agent to produce 5-carboxylcytosine (5caC) and/or 5-formylcytosine (5fC); and ii) contacting the nucleic acid sample comprising 5caC and/or 5fC with a borane reducing agent.

24. The method of claim 23, wherein the oxidizing agent is a ten-eleven translocation (TET) enzyme.

25. The method of claim 24, wherein the TET enzyme comprises human TET1, human TET2, human TET3, murine TET1, murine TET2, murine TET3, Naegleria TET (NgTET), Coprinopsis cinerea (CcTET), or derivatives or analogues thereof.

26. The method of claim 23, wherein the oxidizing agent comprises a chemical oxidizing agent.

27. The method of claim 26, wherein the oxidizing agent comprises potassium perruthenate (KRuO4) or Cu(II)/TEMPO.

28. The method of claim 23, wherein the borane reducing agent comprises an agent selected from the group consisting of 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride.

29. The method of claim 28, wherein the borane reducing agent comprises borane.

30. The method of claim 28, wherein the borane reducing agent comprises sodium borohydride.

31. The method of claim 28, wherein the borane reducing agent comprises sodium cyanoborohydride.

32. The method of claim 28, wherein the borane reducing agent comprises sodium triacetoxyborohydride.

33. The method of claim 28, wherein the borane reducing agent comprises 2-picoline borane.

34. The method of claim 23, further comprising adding a blocking group to one or more of the modified cytosines in the nucleic acid sample.

35. The method of claim 34, wherein the blocking group is added prior to contacting with the oxidizing agent.

36. The method of claim 35, wherein the one or more modified cytosines comprises 5hmC.

37. The method of claim 36, wherein the blocking group comprises a sugar or a uridine diphosphate (UDP)-linked sugar.

38. The method of claim 34, wherein the blocking group is added after contacting with the oxidizing agent and prior to contacting with the borane reducing agent.

39. The method of claim 38, wherein the one or more modified cytosines comprises 5caC or 5fC.

40. The method of claim 39, wherein the blocking group comprises an aldehyde reactive compound.

41. The method of claim 40, wherein the aldehyde reactive compound comprises a hydroxylamine derivative, a hydrazine derivative, or a hydrazide derivative.

42. The method of claim 41, wherein adding the blocking group comprises contacting the nucleic acid sample with (i) a coupling agent and (ii) an amine, hydrazine, or hydroxylamine compound.

43. The method of claim 23, further comprising sequencing the nucleic acid sample after contacting with the borane reducing agent to identify converted cytosine bases.

44. The method of claim 23, wherein the nucleic acid sample comprises DNA or RNA.