WO2014165770A1 - Base-resolution sequencing of 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) - Google Patents

Base-resolution sequencing of 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC)

Info

Publication number
WO2014165770A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
5cac
acid molecule
bisulfite
compound
Prior art date
Application number
PCT/US2014/032997
Other languages
English (en)
Inventor
Chuan He
Chunxiao Song
Xingyu LU
Original Assignee
The University Of Chicago
Priority date
Filing date
Publication date
Application filed by The University Of Chicago
Publication of WO2014165770A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6858: Allele-specific amplification

Definitions

  • the present invention relates generally to the field of molecular biology.
  • compositions for detecting, evaluating, identifying, quantifying, sequencing, and/or mapping 5-formyl-modified cytosine and 5-carboxyl-modified cytosine bases within a nucleic acid molecule.
  • 5-methylcytosine has a profound influence on mammalian development and various human diseases (Klose and Bird, 2006). However, one of the most fundamental areas of interest, the active demethylation of 5mC in mammalian cells, has only recently been unveiled (Bhutani et al., 2011). Recently, 5mC was discovered to be further oxidized to 5-hydroxymethylcytosine (5hmC) by the TET family dioxygenases in mammalian cells (Kriaucionis and Heintz, 2009; Tahiliani et al, 2009).
  • TET family dioxygenases can further oxidize 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) in a stepwise manner (He et al., 2011; Ito et al., 2011; Pfaffeneder et al., 2011).
  • the later oxidation products 5fC and 5caC are recognized and excised by mammalian DNA glycosylase, TDG, and subsequently converted to cytosine through base excision repair (BER) (Cortazar et al., 2011; Cortellino et al., 2011; He et al., 2011; Maiti and Drohat, 2011; Zhang et al., 2012), resulting in an active DNA demethylation pathway in mammals.
  • Genomic profiling of 5hmC has revealed its association with genes and gene regulatory elements in particular, where 5hmC is most abundant and 5mC is depleted (Ficz et al, 2011; Khare et al, 2012; Pastor et al, 2011; Song et al, 2011; Stadler et al, 2011; Stroud et al, 2011; Szulwach et al, 2011; Williams et al, 2011; Wu et al, 2011a; Xu et al, 2011; Yu et al., 2012).
  • 5fC and 5caC behave similarly to cytosine in bisulfite sequencing-based methods (Booth et al, 2012; Yu et al., 2012), and their low abundance in mammalian genomic DNA (only ppm of total cytosines in mESC (Ito et al., 2011)) has made it challenging to effectively apply antibody-based immunoprecipitation, which typically works well with dense modifications (Pastor et al., 2011).
  • Methods may include one or more of the following steps: a) modifying, particularly chemically modifying, 5caC or 5fC in the nucleic acid molecule to protect 5caC or 5fC from a bisulfite-mediated deamination, and b) subjecting the nucleic acid molecule comprising the modified 5caC or modified 5fC to bisulfite treatment.
  • both 5caC and 5fC may be chemically modified in the nucleic acid molecule, sequentially or separately.
  • bisulfite treatment refers to any treatment with a bisulfite ion or bisulfite salt, such as sodium bisulfite, for example under conditions suitable to convert cytosine to uracil.
  • bisulfite sequencing refers to determining the methylation pattern of a nucleic acid through the use of bisulfite treatment.
  • Bisulfite-mediated deamination refers to a reaction in which cytosine in a nucleic acid molecule is deaminated to uracil with a bisulfite reagent under conditions suitable to convert cytosine to uracil.
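The readout logic implied by these definitions can be sketched in a few lines of Python. This is only an illustration: the table and function names are hypothetical, and the entries for 5mC and 5hmC (both read as C) reflect standard bisulfite-sequencing background rather than anything stated in this passage. The passage itself only says that 5fC and 5caC behave like cytosine unless protected.

```python
# Illustrative readout of cytosine variants after bisulfite treatment and
# amplification. All names here are hypothetical, chosen for this sketch.
BISULFITE_READOUT = {
    "C": "T",     # deaminated to uracil, amplified as T
    "5mC": "C",   # resists deamination (background assumption)
    "5hmC": "C",  # also reads as C (background assumption)
    "5fC": "T",   # behaves like unmodified cytosine
    "5caC": "T",  # behaves like unmodified cytosine
}


def bisulfite_read(bases, protected=frozenset()):
    """Return the sequenced letter for each cytosine variant.

    `protected` names variants that were chemically modified before the
    bisulfite step (for example {"5fC"}), so they read as C instead of T.
    """
    return ["C" if b in protected else BISULFITE_READOUT[b] for b in bases]


sites = ["C", "5mC", "5hmC", "5fC", "5caC"]
print(bisulfite_read(sites))                     # ['T', 'C', 'C', 'T', 'T']
print(bisulfite_read(sites, protected={"5fC"}))  # ['T', 'C', 'C', 'C', 'T']
```

In this simplified model, protecting 5fC or 5caC changes the readout at that site from T to C, which is the signal the methods rely on.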
  • 5fC is modified in a nucleic acid molecule to protect 5fC from bisulfite-mediated deamination.
  • the 5fC is chemically modified to an oxime or hydrazone.
  • an “oxime” may refer to a chemical compound belonging to the imines, with the general formula R1R2C=NOH, where R1 may be an organic side chain and R2 may be hydrogen, forming an aldoxime, or another organic group, forming a ketoxime.
  • hydrazine, also called diazane, may refer to an inorganic compound with the formula N2H4.
  • the modification agent or compound may comprise a hydroxylamine group, such as hydroxylamine; hydroxylamine hydrochloride; hydroxylammonium acid sulfate; hydroxylamine phosphate; O-methylhydroxylamine; O-hexylhydroxylamine; O-pentylhydroxylamine; O-benzylhydroxylamine; and particularly, O-ethylhydroxylamine (EtONH2), O-alkylated or O-arylated hydroxylamine, acid or salts thereof.
  • the modification compound may be hydrazine or hydrazide.
  • a “hydrazine group” may refer to the divalent group -NR1R2-NH2, wherein R1 and R2 may be hydrogen, alkyl, aryl, or benzyl.
  • “hydrazine groups” include, but are not limited to, hydrazines, semicarbazides, carbazides, thiosemicarbazides, thiocarbazides, hydrazine carboxylates and carbonic acid hydrazines.
  • hydrazines used herein include N-alkylhydrazine, N-arylhydrazine, N-benzylhydrazine, N,N-dialkylhydrazine, N,N-diarylhydrazine, N,N-dibenzylhydrazine, N,N-alkylbenzylhydrazine, N,N-arylbenzylhydrazine, and N,N-alkylarylhydrazine.
  • a “hydrazide group” may refer to a common functional group characterized by a nitrogen to nitrogen covalent bond with four substituents with at least one of them being an acyl or sulfonyl group.
  • sulfonylhydrazides such as p-toluenesulfonylhydrazide, which are useful reagents in organic chemistry, such as in the Shapiro reaction.
  • This reagent can be prepared by reaction of tosyl chloride with hydrazine.
  • hydrazides used herein include p-toluenesulfonylhydrazide, N-acylhydrazide, N,N-alkylacylhydrazide, N,N-benzylacylhydrazide, N,N-arylacylhydrazide, N-sulfonylhydrazide, N,N-alkylsulfonylhydrazide, N,N-benzylsulfonylhydrazide, and N,N-arylsulfonylhydrazide.
  • the pH for the 5fC modification may be a pH of about, or at most about, 1, 1.5, 2,
  • 5fC may be attached to a compound comprising an amine group for reverse labeling.
  • the pH for the attachment reaction may be a pH of about, at least about, or at most about 7, 8, 9, 10, 11, 12, 13, 14 or any value or range derivable therein.
  • the pH for the detachment reaction to remove the amine group from 5fC may be a pH of about, at least about, or at most about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or any value or range derivable therein.
  • 5caC is modified in a nucleic acid molecule to protect 5caC from bisulfite-mediated deamination.
  • the 5caC is chemically modified to an amide.
  • 5caC is modified with a compound comprising an amine group, such as benzylamine, alkylamine, xyleneamine, cycloalkylamine, or hydroxylamine.
  • the pH for the 5caC modification may be a pH of at most or about 4, 5, 6, 7, 8, 9, 10, 11, 12 or any value or range derivable therein.
  • 5caC may be attached to a compound comprising an amine group or thiol group by incubating the nucleic acid molecule with a coupling agent like carbodiimide derivatives such as 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC).
  • Methods related to isolation or enrichment of target nucleic acids may also be provided.
  • the methods may involve isolating the target nucleic acid before or after modifying 5fC and/or 5caC.
  • the methods may also involve enriching the target nucleic acid before or after modifying 5fC and/or 5caC.
  • the target nucleic acid may be 5fC-containing nucleic acids for 5fC protection and detection, or 5caC-containing nucleic acids for 5caC protection and detection, or both.
  • the degree of enrichment may be at least about, at most about, or about 5x, 10x, 20x, 30x, 40x, 50x, 60x, 70x, 80x, 90x, 100x, 110x, 120x, 130x, 140x, 150x, 160x, 170x, 180x, 190x, 200x, 210x, 220x, 230x, 240x, 250x, 260x, 270x, 280x, 290x, 300x, 310x, 320x, 330x, 340x, 350x, 360x, 370x, 380x, 390x, 400x, 410x, 420x, 430x, 440x, 450x, 460x, 470x, 480x, 490x, 500x, 600x, 700x, 800x, 900x, 1000x, 1100x, 1200x, 1300x, 1400x, 1500x, 1600x, 1700x, 1800x, 1900x, 2000
  • the enrichment may result in a population of nucleic acids comprising at least about, at most about, or about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100% target nucleic acids (e.g., any nucleic acids having a 5fC or 5ca
  • Certain methods may further involve attaching a detectable label or compound to 5caC or 5fC.
  • the label may be a fluorescent, radioactive, enzymatic, electrochemical, or colorimetric label, and in particular aspects, a biotin label.
  • the detectable label or compound is used to enrich the nucleic acid molecule.
  • methods may also involve detaching the detectable label or compound from 5fC or 5caC.
  • one or more labels described herein may be excluded.
  • 5fC may be incubated with a compound comprising a detectable compound or label.
  • 5fC may form an imine with the compound.
  • the compound may comprise an amine group attached to a detectable label or compound.
  • 5fC may be attached to a detectable label or compound.
  • the detectable label or compound attached to 5fC comprises an amine group.
  • the amine group may be a diamine group or a cyclic diamine group (i.e., cycloalkyldiamine). In particular embodiments, the group may or may not be alkyldiamine. In some embodiments, one or more detectable compounds or labels described herein may be excluded.
  • the amine-containing label or compound may be detached from 5fC by decreasing pH, removing diamine, or being replaced by hydroxylamine, hydrazine, or hydrazide.
  • the labeling of 5fC may be reversible.
  • the labeling is not reversible.
  • the pH may be decreased to a pH of about or below about 3, 4, 5, 6, 7 or any value or range derivable therein for the detachment of the detectable label or compound from 5fC.
  • 5caC may be attached to a detectable label or compound.
  • the detectable label or compound attached to 5caC comprises a thiol group, such as a xylene-based thiol group (i.e., phenylmethane thiol group) or a linear thiol group (i.e., an alkyl thiol group).
  • the thiol-containing label or compound may be detached from 5caC by increasing pH to a pH of about, at least about, or at most about 7, 8, 9, 10, 11, 12, 13, 14 or any value or range derivable therein, or adding a base, such as sodium hydroxide or ammonium hydroxide or any base known in the art.
  • nucleic acid enrichment methods may involve the use of antibodies, including, but not limited to, nucleic acid-specific antibodies that specifically bind to certain nucleic acid residues (e.g., 5fC or 5caC).
  • anti-5fC antibodies or anti-5caC antibodies may be used for enriching 5fC-containing or 5caC-containing nucleic acid molecules. They may also be used more generally to identify such molecules, as well as to segregate or isolate such molecules.
  • antibodies can be used to quantify or qualify the nucleic acid that is specifically identified by any of the antibodies.
  • enriched nucleic acids may be subject to further treatment, manipulation, or sequencing, such as CAB-seq (chemical assisted bisulfite sequencing method).
  • nucleic acids such as genomic nucleic acids may be denatured.
  • genomic nucleic acids may be subject to antibody-based enrichment methods, such as the use of anti-5fC antibodies.
  • the nucleic acids enriched for 5fC may be subject to bisulfite-based sequencing.
  • a subpopulation of 5fC-enriched nucleic acids may be subject to 5fC-specific modification or labeling as described herein followed by bisulfite-based sequencing, or particularly, CAB-seq.
  • another subpopulation of 5fC-enriched nucleic acids may be subject to regular bisulfite treatment, for example, no specific modification or labeling of 5fC.
  • nucleic acids may be subject to antibody-based enrichment methods for other bases, such as the use of anti-5caC antibodies.
  • the nucleic acids enriched for 5caC may be subject to bisulfite-based sequencing.
  • a subpopulation of 5caC-enriched nucleic acids may be subject to 5caC-specific modification or labeling as described herein followed by bisulfite-based sequencing, or particularly, CAB-seq.
  • another subpopulation of 5caC-enriched nucleic acids may be subject to regular bisulfite treatment, for example, no specific modification or labeling of 5caC.
  • methods may be provided for sequencing a nucleic acid molecule comprising 5fCs and/or 5caCs, comprising isolating a nucleic acid molecule comprising 5fCs and/or 5caCs with anti-5fC and/or anti-5caC antibodies.
  • the methods may further comprise modifying 5fCs and/or 5caCs in the nucleic acid molecule to protect the 5fCs and/or 5caCs from bisulfite-mediated deamination; and/or then subjecting the modified nucleic acid molecule to bisulfite sequencing.
  • Methods and compositions involve detecting, characterizing, and/or distinguishing between 5fC and 5caC after protecting the 5fC and/or 5caC from bisulfite treatment.
  • Methods and compositions also involve detecting, characterizing, and/or distinguishing 5fC and/or 5caC from other types of modified cytosine, such as 5hmC, after protecting the 5fC and/or 5caC from bisulfite treatment.
  • Methods may involve identifying 5fC in the nucleic acids by comparing modified nucleic acids with unmodified nucleic acids or to nucleic acids whose modification state is already known. Detection of the modification can involve a wide variety of recombinant nucleic acid techniques.
  • a modified nucleic acid molecule is incubated with polymerase, at least one primer, and one or more nucleotides under conditions to allow polymerization of the modified nucleic acid.
  • methods may involve sequencing a modified nucleic acid molecule.
  • a modified nucleic acid is used in a primer extension assay. Such modifications and labels are synthetic with respect to genomic DNA.
  • methods may also involve sequencing the bisulfite-treated nucleic acid molecules modified as disclosed herein or the bisulfite-treated control nucleic acids, which have not been modified to protect 5fC and/or 5caC from bisulfite treatment.
  • the methods may further comprise converting 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) in the nucleic acid molecule to 5-carboxylcytosine (5caC) after the modification step.
  • conversion to 5caC may comprise incubating the nucleic acid molecule with methylcytosine dioxygenase (TET). Due to the conversion to 5caC, particular sites of 5fC and/or 5caC in the modified nucleic acid may be identified because of their distinctive readout in the sequencing reaction after the bisulfite treatments.
  • 5hmC in the nucleic acid may not be converted to 5-carboxylcytosine (5caC) before bisulfite treatment.
  • the detection of modified cytosine may depend on sequencing of a control nucleic acid.
  • methods comprising sequencing the bisulfite-treated nucleic acid molecule and a bisulfite-treated control nucleic acid to obtain both sequences. The methods may further comprise comparing the sequence of the bisulfite-treated nucleic acid molecule comprising the modified 5fC and/or 5caC with the sequence of a bisulfite-treated control nucleic acid to detect the modified 5fC and/or 5caC.
  • control nucleic acid sequence may refer to the sequence from a nucleic acid lacking the modification step as disclosed herein.
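The comparison between a protected sample and a bisulfite-treated control can be illustrated with a small Python sketch. It assumes three pre-aligned strings of equal length (reference, modified sample, control); the function name and the toy sequences are hypothetical and are not taken from the patent.

```python
def call_protected_sites(reference, modified_read, control_read):
    """Return 0-based positions where a reference C reads as C in the
    chemically modified sample but as T in the unmodified control."""
    if not (len(reference) == len(modified_read) == len(control_read)):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (ref, mod, ctl) in enumerate(
            zip(reference, modified_read, control_read)
        )
        if ref == "C" and mod == "C" and ctl == "T"
    ]


reference = "ACGTCCGA"
modified_read = "ATGTCCGA"  # protection step followed by bisulfite treatment
control_read = "ATGTTCGA"   # bisulfite treatment only
print(call_protected_sites(reference, modified_read, control_read))  # [4]
```

Position 5 reads as C in both samples, which in this toy example would be consistent with 5mC or 5hmC rather than a protected 5fC or 5caC, so it is not reported.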
  • the control nucleic acid comprises 5fC and/or 5caC that has been attached to a detectable label or compound and has been later detached therefrom.
  • the modification of 5hmC and 5mC in the target nucleic acid described herein may be performed prior to the bisulfite treatment. Methods may further involve one or more of the following steps that are subsequent to the conversion of 5mC and 5hmC to 5caC: treating the nucleic acid with bisulfite; amplifying the bisulfite-treated nucleic acid; and sequencing the bisulfite-treated nucleic acid.
  • Methods and compositions may involve a control nucleic acid.
  • the control may be used to evaluate whether modification or other enzymatic or chemical reactions are occurring.
  • the control may be used to compare modification states.
  • the control may be a negative control or it may be a positive control. It may be a control that was not incubated with one or more reagents in the modification reaction.
  • a control nucleic acid may have a modification state (based on qualitative and/or quantitative information related to modification at 5hmCs, or the absence thereof), which is used for comparing to a nucleic acid being evaluated.
  • the control nucleic acid used in sequencing may be a nucleic acid that is identical to the test nucleic acid in sequence and epigenetic state but differs in at least one aspect, such as being unmodified on one of its functional groups.
  • control nucleic acid provides the basis for a control nucleic acid.
  • the control nucleic acid is from a normal sample with respect to a particular attribute, such as a disease or condition, or other phenotype.
  • the control sample is from a different patient population, a different cell type or organ type, a different disease state, a different phase or severity of a disease state, a different prognosis, a different developmental stage, before and after a treatment or exposure to an agent or substance, etc.
  • Particular embodiments involve a method of sequencing a nucleic acid molecule comprising 5-formylcytosine (5fC), comprising one or more of the following steps: a) chemically modifying 5fC in the nucleic acid molecule to an oxime or hydrazone; and b) subjecting the modified nucleic acid molecule to bisulfite sequencing.
  • the method comprises converting 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) in the nucleic acid molecule to 5caC between steps a) modification and b) bisulfite sequencing for direct detection of modified 5fC.
  • methods may involve comparing the sequence of the modified nucleic acid to a sequence of a control nucleic acid.
  • methods of sequencing a nucleic acid molecule comprising 5-formylcytosine (5fC) comprising one or more of the following steps: a) chemically modifying 5-formylcytosine (5fC) in the nucleic acid molecule to an oxime or hydrazone; b) converting 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) in the nucleic acid molecule to 5caC; and c) subjecting the converted nucleic acid molecule to bisulfite sequencing.
  • methods may involve sequencing a nucleic acid molecule comprising 5-formylcytosine (5fC), comprising one or more of the following steps: a) chemically modifying 5fC in the nucleic acid molecule to an oxime or hydrazone; b) subjecting the modified nucleic acid molecule and a control nucleic acid to bisulfite sequencing; and c) comparing the sequence of the modified nucleic acid to the sequence of a control nucleic acid.
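The variant that oxidizes 5mC and 5hmC to 5caC before bisulfite treatment can be sketched as a three-step pipeline. This is a simplified model under stated assumptions: each reaction goes to completion, the protected base (marked here with a trailing `*`) is unaffected by the oxidation step, and 5mC and 5hmC read as C if left unoxidized. All function names are hypothetical.

```python
def protect(bases, target):
    """Chemical modification step: mark every `target` base as protected."""
    return [b + "*" if b == target else b for b in bases]


def tet_oxidize(bases):
    """Idealized TET step: 5mC and 5hmC are fully converted to 5caC."""
    return ["5caC" if b in ("5mC", "5hmC") else b for b in bases]


def bisulfite(bases):
    """Protected bases, 5mC and 5hmC read as C; everything else reads as T."""
    return [
        "C" if b.endswith("*") or b in ("5mC", "5hmC") else "T" for b in bases
    ]


sites = ["C", "5mC", "5hmC", "5fC", "5caC"]
reads = bisulfite(tet_oxidize(protect(sites, "5fC")))
print(reads)  # ['T', 'T', 'T', 'C', 'T']
```

Under these assumptions the protected 5fC is the only site that still reads as C, which is why no separate control is needed for direct detection in this variant.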
  • methods of sequencing a nucleic acid molecule comprising 5-formylcytosine (5fC) may comprise one or more of the steps: a) obtaining a nucleic acid molecule comprising 5fC that has been attached to a detectable label or compound and has been detached therefrom; b) chemically modifying the 5fC in the nucleic acid molecule to protect the 5fC from bisulfite-mediated deamination; and c) subjecting the nucleic acid molecule to bisulfite sequencing.
  • obtaining a nucleic acid molecule comprising 5fC that has been attached to a detectable label or compound and has been detached therefrom is a reversible labeling and enrichment step, which may comprise one or more of the following substeps: a) attaching a detectable label or compound to the 5fC in the nucleic acid molecule; b) isolating the nucleic acid molecule based on the detectable label or compound; and c) detaching the 5fC from the detectable label or compound.
  • Particular embodiments involve a method of sequencing a nucleic acid molecule comprising 5-carboxylcytosine (5caC), comprising one or more of the following steps: a) chemically modifying 5caC in the nucleic acid molecule to protect 5caC from bisulfite-mediated deamination; and b) subjecting the modified nucleic acid molecule to bisulfite sequencing.
  • the method comprises converting 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) in the nucleic acid molecule to 5caC between steps a) modification and b) bisulfite sequencing for direct detection of modified 5caC, which will exhibit different behavior or give different readout from unmodified 5caC in sequencing, particularly bisulfite sequencing.
  • methods may involve comparing the sequence of the modified nucleic acid to a sequence of a control nucleic acid.
  • methods of sequencing a nucleic acid molecule comprising 5caC comprising one or more of the following steps: a) chemically modifying 5-carboxylcytosine (5caC) in the nucleic acid molecule; b) converting 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) in the nucleic acid molecule to 5caC; and c) subjecting the converted nucleic acid molecule to bisulfite sequencing.
  • methods may involve sequencing a nucleic acid molecule comprising 5-carboxylcytosine (5caC), comprising one or more of the following steps: a) chemically modifying 5caC in the nucleic acid molecule; b) subjecting the modified nucleic acid molecule and a control nucleic acid to bisulfite sequencing; and c) comparing the sequence of the modified nucleic acid to the sequence of a control nucleic acid.
  • methods of sequencing a nucleic acid molecule comprising 5-carboxylcytosine (5caC) may comprise one or more of the steps: a) obtaining a nucleic acid molecule comprising 5caC that has been attached to a detectable label or compound and has been detached therefrom; b) chemically modifying the 5caC in the nucleic acid molecule to protect the 5caC from bisulfite-mediated deamination; and c) subjecting the nucleic acid molecule to bisulfite sequencing.
  • obtaining a nucleic acid molecule comprising 5caC that has been attached to a detectable label or compound and has been detached therefrom is a reversible labeling and enrichment step, which may comprise one or more of the following substeps: a) attaching a detectable label or compound to the 5caC in the nucleic acid molecule; b) isolating the nucleic acid molecule based on the detectable label or compound; and c) detaching the 5caC from the detectable label or compound.
  • a method of detecting 5-carboxylcytosine (5caC) in a nucleic acid molecule comprising one or more of the steps: a) attaching a first detectable label or compound comprising an amine group to 5caC in the nucleic acid molecule and a second detectable label or compound comprising a thiol group to 5caC in a control nucleic acid molecule; b) enriching the nucleic acid molecule having 5caC based on the first detectable label or compound; c) enriching the control nucleic acid molecule having 5caC based on the second detectable label or compound; d) detaching the second detectable label or compound from the control nucleic acid; e) subjecting both nucleic acid molecules to bisulfite sequencing to obtain two sequences; and f) detecting 5caC by comparing the two sequences. It is contemplated that 1, 2, 3, 4, 5, or all six steps may be employed.
  • methods comprising incubating the nucleic acid molecule or the control nucleic acid molecule in carbodiimide derivatives such as EDC as coupling agents to facilitate the attachment of detectable labels or compounds.
  • nucleic acid molecules may be DNA, RNA, or a combination of both. In some embodiments it is contemplated that RNA or DNA may be excluded or that RNA or DNA may be isolated such that there are only contaminating amounts of other nucleic acids. Nucleic acids may be recombinant, genomic, or synthesized. In further aspects, there may be provided samples that comprise isolated nucleic acid molecules. For example, the sample may be in solution. In other aspects, the nucleic acid molecule may be attached to a solid support, such as an array. In certain aspects, the bisulfite treatment is done in situ or in vitro.
  • methods involve nucleic acid molecules that are isolated and/or purified.
  • the nucleic acid may be isolated from a cell or biological sample in some embodiments.
  • Certain embodiments involve isolating nucleic acids from a eukaryotic, mammalian, or human cell. In some cases, they are isolated from non-nucleic acids.
  • Methods and compositions may involve a purified nucleic acid, modification reagent, labels, and/or coupling agents. Such protocols are known to those of skill in the art.
  • purification may result in a molecule that is about or at least about 70, 75, 80, 85, 90, 95, 96, 97, 98, 99, 99.1, 99.2, 99.3, 99.4, 99.5, 99.6, 99.7, 99.8, 99.9% or more pure, or any range derivable therein, relative to any contaminating components (w/w or w/v).
  • steps including, but not limited to, obtaining information (qualitative and/or quantitative) about one or more 5mCs and/or 5hmCs and/or 5fCs and/or 5caCs in a nucleic acid sample; ordering an assay to determine, identify, and/or map 5mCs and/or 5hmCs and/or 5fCs and/or 5caCs in a nucleic acid sample; reporting information (qualitative and/or quantitative) about one or more 5mCs and/or 5hmCs and/or 5fCs and/or 5caCs in a nucleic acid sample; comparing that information to information about 5mCs and/or 5hmCs and/or 5fCs and/or 5caCs in a control or comparative sample.
  • the terms “determine,” “analyze,” “assay,” and “evaluate” in the context of a sample refer to transformation of that sample to gather qualitative and/or quantitative data about the sample.
  • the term “map” means to identify the location within a nucleic acid sequence of the particular nucleotide.
  • the nucleic acid molecule is eukaryotic; in some cases, the nucleic acid is mammalian, which may be human. This means the nucleic acid molecule is isolated from a human cell and/or has a sequence that identifies it as human. In particular embodiments, it is contemplated that the nucleic acid molecule is not a prokaryotic nucleic acid, such as a bacterial nucleic acid molecule. In additional embodiments, isolated nucleic acid molecules are on an array. In particular cases, the array is a microarray.
  • a nucleic acid is isolated by any technique known to those of skill in the art, including, but not limited to, using a gel, column, matrix or filter to isolate the nucleic acids.
  • the gel is a polyacrylamide or agarose gel.
  • kits which may be in a suitable container, that can be used to achieve the described methods.
  • there are kits comprising one or more modification agents (e.g., chemicals) and one or more coupling agents.
  • the molecules may have or involve different types of modifications.
  • a kit may include one or more buffers, such as buffers for nucleic acids or for reactions involving nucleic acids.
  • an enzyme is a polymerase.
  • Kits may also include nucleotides for use with the polymerase.
  • a restriction enzyme is included in addition to or instead of a polymerase.
  • the kit may comprise nucleic acid molecules containing one or more 5fCs and/or 5caCs.
  • kits comprising a coupling or modifying agent such as a compound comprising a hydroxylamine group, a hydrazine group or a hydrazide group, as well as a carbodiimide derivative, a modifying agent such as a compound having an amine group or a thiol group, and a detectable label such as a biotin group.
  • the kit comprises a nucleic acid having one or more 5caCs; a carbodiimide derivative; a compound having an amine group or a thiol group; and a detectable label.
  • nucleic acid molecules that have been modified at the nucleotides that are 5fC and/or 5caC.
  • FIGS. 2A-2H Base-resolution detection of 5fC.
  • (A) MALDI-TOF of 5fC-containing 9mer DNA with EtONH2 treatment. Reactions were performed in duplex DNA with the complementary strand; however, MS monitored the single-stranded 9mer DNA containing the modification. Calculated MS shown in black, observed MS shown in red.
  • (B) Sanger sequencing data comparing 5fC-containing 76mer DNA in bisulfite sequencing (BS) before and after EtONH2 or NaBH4 treatment. The oligos were treated with the bisulfite thermal cycle program either once or twice. The 5fC alone sample can achieve a relatively high level of deamination after two rounds of bisulfite treatments.
  • Sequencing depth 865 ± 254.
  • (H) Sanger sequencing data of the 76mer DNA showing that hydroxylamine does not alter the behavior of cytosine in bisulfite sequencing.
  • FIG. 3 Scheme for the Chemical modification-Assisted Bisulfite Sequencing
  • FIGS. 4A-4B Chemical labeling of 5fC (A) and 5caC (B) as confirmed by MALDI.
  • FIG. 6 EDC-based chemical selective labeling of 5caC
  • FIG. 7 EDC-catalyzed 5caC coupling with different amines.
  • FIGS. 8A-8B MALDI-TOF of 5caC labeling with benzylamine-S-S-biotin on the left and benzyl thiol biotin on the right showing both amine and thiol labeling.
  • FIG. 9 Sanger sequencing of bisulfite-treated, amine-labeled 76mer 5caC-containing oligo.
  • FIG. 10 Selective pull-down of 5caC from genomic DNA and sequencing
  • FIG. 11 Pull-down of the 76mer 5caC-containing oligo after protection with amine and thiol, respectively.
  • the amine-based protection is resistant to bisulfite while the thiol-based one can be reversed with 0.5 M NaOH, 60°C for 3 hours. Colony-picking data are shown for both cases.
  • FIG. 12 Strategy for reversible labeling of 5fC.cis-l,2-Diaminocyclohexane can react with 5fC under basic pH.
  • the resulting product is stable at neutral pH but can be reversed under acidic pH.
  • FIG. 13 MALDI of model reactions showing that 5fC in a 9mer DNA can react efficiently with cis-1,2-diaminocyclohexane and the product is stable at neutral pH.
  • FIG. 14 Strategy for reversible 5fC chemical labeling for enrichment and base-resolution sequencing.
  • 5fC in the genomic DNA can be labeled with diamine-biotin under basic pH, enriched by streptavidin beads at neutral pH and reversed at acidic pH.
  • the enriched 5fC DNA is then subjected to fCAB-Seq for genome-wide or loci-specific, base-resolution detection of 5fC.
  • FIGS. 15A-15B Strategies for 5caC chemical labeling for enrichment and base-resolution sequencing.
  • FIG. 15A 5caC-containing DNA can be labeled with amine-biotin and thiol-biotin in parallel using EDC-based chemistry. After pull-down, thiol-labeled DNA can be reversed back to 5caC. The two samples can then be subjected to bisulfite-seq; the different behaviors of amine-protected 5caC and 5caC give base-resolution information for 5caC sites.
  • FIG. 15B 5caC DNA can be pulled down by thiol-based labeling and reversed back to 5caC. Half of the pulled-down 5caC DNA is protected by amine-based labeling. During bisulfite-seq, the comparison of protected 5caC and 5caC from one pull-down gives single-base resolution information for 5caC.
  • FIG. 16 HPLC yield of reversible EDC based thiol labeling and EDC based amine protection.
  • FIG. 17 MALDI of model reactions showing that 5caC in a 9mer DNA can react efficiently with thiol-PEG2-biotin. The thiol-labeled DNA can be reversed back to 5caC and protected by amine.
  • FIGS. 18A-18B - A strategy for single-base resolution sequencing of 5fC and
  • Genomic DNA can be denatured and enriched for 5fC (A) with anti-5fC antibody or 5caC (B) with anti-5caC antibody.
  • the enriched fragments are split into two halves. One half is subjected to CAB-seq (chemical labeling followed by bisulfite treatment) and the other half is subjected to regular bisulfite treatment. Subtraction gives 5fC (A) or 5caC (B) with single-base resolution information.
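The subtraction step described above can be sketched per cytosine site as the difference between the fraction of reads called C in the CAB-seq half and in the regular bisulfite half. This is only an illustrative sketch: the function names, the read counts, and the choice to floor negative differences at zero are assumptions, not part of the described method, and no statistical testing is attempted.

```python
def c_fraction(c_reads: int, t_reads: int) -> float:
    """Fraction of reads at one cytosine site that were called as C."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_reads / total


def modified_fraction(cab_counts, bs_counts) -> float:
    """Estimate the 5fC (or 5caC) level at one site.

    cab_counts and bs_counts are (C reads, T reads) tuples from the
    CAB-seq half and the regular bisulfite half, respectively.
    Negative differences are floored at zero (an assumption made here).
    """
    diff = c_fraction(*cab_counts) - c_fraction(*bs_counts)
    return max(0.0, diff)


# Hypothetical counts: 60 C / 40 T after CAB-seq, 35 C / 65 T after regular bisulfite.
estimate = modified_fraction((60, 40), (35, 65))
print(f"estimated modified fraction: {estimate:.2f}")
```

With the hypothetical counts above, the estimate is the gap between a 60% and a 35% C fraction.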
  • FIGS. 19A-19D Validation and characterization for anti-5fC and anti-5caC antibodies and DNA Immunoprecipitation CAB-Seq strategy.
  • A 5fC- and 5caC-containing 76-mer DNA oligonucleotides can be specifically recognized by the anti-5fC and anti-5caC antibodies, respectively.
  • B DIP-qPCR assays demonstrated specific enrichment of 5fC- and 5caC-containing DNA fragments using these antibodies.
  • C Sanger sequencing of bisulfite-treated DNA showing high efficiency of protected 5fC and protected 5caC from deamination following immunoprecipitation and enrichment.
  • D Detection limits for 5fC and 5caC of CAB-Seq can be significantly improved after immunoprecipitation with corresponding antibodies.
  • Certain embodiments are directed to methods and compositions for modifying, detecting, and/or evaluating 5fC and/or 5caC in nucleic acids.
  • 5fC and/or 5caC is protected from a bisulfite-mediated deamination.
  • 5fC and/or 5caC is attached to a labeled moiety which may be detached later.
  • detectable groups biotin, fluorescent tag, radioactive groups, etc.
  • Certain embodiments may also involve methods and compositions for sequencing or profiling nucleic acids comprising the modified or labeled 5fC and/or 5caC.
  • the term “enriching a target sequence” refers to increasing the amount of a target sequence and increasing the ratio of target sequence relative to the corresponding reference sequence in a sample. For example, where the ratio of a target sequence to a reference sequence is initially 5% to 95% in a sample, the target sequence may be amplified in an amplification reaction or in a label-based selection so as to produce a ratio of 70% target sequence to 30% reference sequence. Thus, the proportion of the target sequence in the sample is enriched 14-fold (from 5% to 70%).
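The arithmetic in the 5%-to-70% example can be checked with a short calculation. The sketch below is illustrative only (the function names are hypothetical): it shows that the 14-fold figure corresponds to the change in the target's share of the sample, while the change in the target-to-reference ratio is a different, larger number.

```python
def fold_change_in_fraction(before: float, after: float) -> float:
    """Fold change in the target's share of the sample (e.g. 0.05 -> 0.70)."""
    return after / before


def fold_change_in_ratio(before: float, after: float) -> float:
    """Fold change in the target:reference ratio.

    Assumes the sample contains only target and reference sequence,
    so the reference share is 1 - target share.
    """
    ratio_before = before / (1.0 - before)
    ratio_after = after / (1.0 - after)
    return ratio_after / ratio_before


print(round(fold_change_in_fraction(0.05, 0.70), 2))  # share of sample: 14.0
print(round(fold_change_in_ratio(0.05, 0.70), 2))     # target:reference ratio: 44.33
```

Which of the two numbers is reported as "fold enrichment" is a convention; stating the convention alongside the figure avoids ambiguity.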
  • the degree of enrichment may be at least, at most, or about 2X, 3X, 4X, 5X, 6X, 7X, 8X, 9X, 10X, 15X, 20X, 25X, 30X, 35X, 40X, 45X, 50X, 60X, 70X, 80X, 90X, 100X, 150X, 200X, 1000X, 2000X, or any number or range derivable therein.
  • target sequence refers to a nucleic acid that is less prevalent in a nucleic acid sample than a corresponding reference sequence. For example, the target sequence makes up less than 50% of the total amount of reference sequence and target sequence in a sample.
  • the target sequence may be a mutant allele or a methylated allele.
  • the target sequence may be a sequence comprising one or more of the cytosine variants, such as 5fC, 5caC, 5hmC, or 5mC, while the sequence that does not comprise the corresponding variant is the reference sequence.
  • a "target strand" refers to a single nucleic acid strand of a target sequence.
  • the target sequence may be at least 50% homologous to the corresponding reference sequence, but must differ by at least one nucleotide from the reference sequence.
  • Target sequences may be amplifiable via PCR with the same pair of primers as those used for the reference sequence.
  • the term "reference sequence" or "reference nucleic acid" refers to a nucleic acid that may be more prevalent in a nucleic acid sample than a corresponding target sequence.
  • the reference sequence may make up over 50% of the total reference sequence and target sequence in a sample.
  • the reference sequence may be expressed at the RNA and/or DNA level 10X, 15X, 20X, 25X, 30X, 35X, 40X, 45X, 50X, 60X, 70X, 80X, 90X, 100X, 150X, 200X or more than the target sequence.
  • a “reference strand” refers to a single nucleic acid strand of a reference sequence.
  • control nucleic acid refers to a nucleic acid that may be used as a control in a nucleic acid assay.
  • the control nucleic acid may be a nucleic acid in its unmodified state as compared to a modified nucleic acid under the same conditions.
  • the control nucleic acid may be a control nucleic acid that may be enriched along with a test nucleic acid using any of the methods described herein.
  • 5-Formylcytosine is one of the DNA variants that is produced when Tet enzymes act on 5-hydroxymethylcytosine. Further oxidation of 5-formylcytosine by the Tet enzyme results in conversion to 5-carboxylcytosine. It is believed that the oxidation of 5-methylcytosine through the various DNA methylation variants represents a mechanism of DNA demethylation, and that this demethylation pathway has a function during development and germ cell programming. 5-Formylcytosine is present in mouse embryonic stem (ES) cells and major mouse organs. This DNA modification also appears in the paternal pronucleus post-fertilization, concomitant with the disappearance of 5-methylcytosine, suggesting its involvement in the DNA demethylation process.
  • 5-Carboxylcytosine has been identified as one of the DNA methylation variants that is produced when Tet enzymes oxidize 5-hydroxymethylcytosine and, subsequently, 5-formylcytosine. It is believed that the oxidation of 5-methylcytosine through to 5-carboxylcytosine represents a mechanism of DNA demethylation, and that this demethylation pathway has a function during development and germ cell programming. It has been suggested that 5caC is excised from genomic DNA by thymine DNA glycosylase (TDG), which returns the cytosine residue back to its unmodified state.
  • 5-Carboxylcytosine has been identified in mouse embryonic stem (ES) cells. This DNA modification appears in the paternal pronucleus post-fertilization, concomitant with the disappearance of 5-methylcytosine, further lending support that this variant is part of a DNA demethylation pathway.
  • 5-Methylcytosine is the DNA modification that results from the transfer of a methyl group from S-adenosyl methionine (also known as AdoMet or SAM) to the carbon 5 position of a cytosine residue. This transfer is catalyzed by DNA methyltransferase enzymes (DNMTs). 5-Methylcytosine is the most common and widely studied form of DNA methylation. It usually occurs within CpG dinucleotide motifs, although non-CpG methylation has been identified in embryonic stem cells.
  • 5-Hydroxymethylcytosine is a DNA methylation modification that occurs as a result of enzymatic oxidation of 5-methylcytosine (5mC) by the Tet family of iron-dependent dioxygenases.
  • 5-Hydroxymethylcytosine can be found in elevated amounts in certain mammalian tissues, such as mouse Purkinje cells and granule neurons.
  • 5hmC may be produced by the addition of formaldehyde to DNA cytosines by DNMT proteins.
  • DNA epigenetic modifications on cytosine play key roles in biological functions and various diseases.
  • The most common technique for studying cytosine methylation is bisulfite treatment-based sequencing. This technique has major drawbacks: it cannot differentiate 5fC and 5caC, and it requires harsh conditions. Readily available and robust technologies for clinical diagnosis of 5fC and 5caC are very limited.
  • Some embodiments of the inventors present a method for identifying 5fC and/or 5caC or distinguishing 5fC from 5caC or other states of cytosine in a nucleic acid and specific site detection of 5fC and/or 5caC for clinical or other applications in an economic and highly efficient way.
  • the methods may involve modification of nucleic acids for protection of 5fC and/or 5caC from bisulfite-mediated deamination or for enrichment of nucleic acids containing 5fC and/or 5caC.
  • the step of enriching a sample or nucleic acids for sequences comprising CpG islands or specific cytosine variants, such as 5fC or 5caC can be done in different ways.
  • the nucleic acids containing specific cytosine variants may be selected, isolated, or enriched by labels attached thereto.
  • the nucleic acids may be attached to one or more detectable labels or compounds.
  • the label may be fluorescent, radioactive, enzymatic, electrochemical, or colorimetric.
  • the label may be an amine-containing compound coupled to biotin.
  • the labels or compounds may be removed from the nucleic acids after selection, isolation or enrichment.
  • the attachment to a detectable label or compound may be reversed by changing the condition, such as by increasing pH or decreasing pH, adding a base or acid, or replacing with another functional group.
  • Optional enrichment methods may involve the use of antibodies, such as antibodies that specifically bind 5fC-containing or 5caC-containing nucleic acids (e.g., as described in Inoue et al., 2011, incorporated herein by reference). Any form of antibodies known in the art can be used in embodiments described herein.
  • DNA immunoprecipitation coupled chemical-modification assisted bisulfite sequencing can be used to generate genome-wide, single-base resolution maps for 5fC and 5caC, wherein antibody-based immunoprecipitation is used prior to the downstream CAB-Seq.
  • the enrichment methods may be combined with one or more enrichment methods known in the art.
  • An additional technique for enrichment is immunoprecipitation of methylated, hydroxymethylated, 5fC-containing, and 5caC-containing DNA using specific antibodies, such as a methyl-cytosine specific antibody (Weber et al., 2005).
  • an enrichment step can comprise digesting the sample with one or more restriction enzymes which more frequently cut regions of DNA comprising no CpG islands and less frequently cut regions comprising CpG islands, and isolating DNA fragments with a specific size range.
  • CpG island refers to regions of DNA with a high G/C content and a high frequency of CpG dinucleotides relative to the whole genome of an organism of interest. Also used interchangeably in the art is the term “CG island.”
  • the "p" in "CpG island" refers to the phosphodiester bond between the cytosine and guanine nucleotides.
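The two properties named in the CpG island definition, G/C content and CpG dinucleotide frequency, can be computed directly from a sequence. The sketch below is a minimal illustration; the observed/expected CpG ratio is a commonly used summary metric that is assumed here rather than taken from the text, and no classification thresholds are applied because the text gives none.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


window = "ACGTCGGA"  # hypothetical 8-bp window
print(gc_content(window))             # 0.625
print(round(cpg_obs_exp(window), 3))  # 2.667
```

In practice these metrics would be evaluated over sliding windows and compared against the genome-wide background of the organism of interest.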
  • 5fC may be labeled with or attached to a label or a compound, wherein the label or compound may comprise an amine group.
  • the amine group on 5fC is formed when a compound having an amine group reacts with 5fC.
  • R = alkyl, aminoalkyl, aminocyclicalkyl, benzyl, alkylamine, cycloalkyl, or cycloalkylamine.
  • the amine group may be alkylamine, arylamine, benzylamine or cycloalkylamine (including cycloalkyldiamine).
  • the amine group may be alkyldiamine (such as Entry 3, 4, 5, 6 in Table 2) or more particularly cycloalkyldiamine (such as Entry 5 and 6 in Table 2).
  • 5caC may be labeled with or attached to a label or a compound, wherein the label or compound may comprise a thioester group.
  • the thioester group on 5caC may be formed upon reaction of a carboxylic acid of 5caC with a compound having a thiol group.
  • the compound having a thiol group may contain a carbon-bonded sulfhydryl (-C-SH or R-SH) group (where R represents an alkane, alkene, or other carbon-containing group of atoms).
  • the -SH functional group may be referred to as either a thiol group or a sulfhydryl group.
  • Non-limiting examples of thiol-containing compounds may include methanethiol - CH3SH [m-mercaptan], ethanethiol - C2H5SH [e-mercaptan], 1-Propanethiol
  • the thiol-containing compound may have a generic formula of SH-R.
  • Linking/coupling agents and/or mechanisms known to those of skill in the art can be used to combine components or agents in certain embodiments, such as, for example, antibody-antigen interaction, avidin-biotin linkages, amide linkages, ester linkages, thioester linkages, ether linkages, thioether linkages, phosphoester linkages, phosphoramide linkages, anhydride linkages, disulfide linkages, ionic and hydrophobic interactions, bispecific antibodies and antibody fragments, or combinations thereof.
  • the thiol group may be an alkyl thiol, a xylene-based thiol (or a phenylmethane thiol group, e.g., 2a in Table 3) or a linear thiol (or an alkyl thiol group, e.g., 2b in Table 3).
  • methods involve protection of particular cytosine variants from bisulfite treatment, especially bisulfite-mediated deamination.
  • the protection may include chemical modification of these variants so the readout in bisulfite sequencing of the modified sequence may be different from the unmodified control nucleic acid.
  • Certain embodiments are directed to methods and compositions for modifying nucleic acids containing 5fC or modifying, detecting, and/or evaluating 5fC in nucleic acids.
  • a target nucleic acid is modified to protect 5fC from a bisulfite-mediated deamination.
  • a functional group e.g., a hydroxyamine group
  • a functional group may be incorporated into or attached to a nucleic acid using methods described herein. This incorporation or attachment of a functional group allows further labeling or tagging of cytosine residues with biotin or other tags.
  • the labeling or tagging of 5fC can use, for example, click chemistry or other functional/coupling groups known to those skilled in the art.
  • the labeled or tagged nucleic acid fragments containing 5fC can be enriched, isolated, detected and/or evaluated.
  • Hydroxylamine groups that may be used in certain aspects include those having the general formula or having a functional group having the general formula of:
  • R1 and R2 are hydrogen
  • R 3 is selected from the group consisting of hydrogen, lower alkyl, and aryl; and water soluble salts of these hydroxylamines.
  • the lower alkyl group may generally have from 1 to 8 carbon atoms and the aryl group may be, for example, phenyl, benzyl, and tolyl.
  • Non-limiting examples of suitable hydroxylamine-containing compounds include hydroxylamine; hydroxylamine hydrochloride; hydroxylammonium acid sulfate; hydroxylamine phosphate; O-methylhydroxylamine; O-hexylhydroxylamine; O-pentylhydroxylamine; O-benzylhydroxylamine; and particularly, O-ethylhydroxylamine (EtONH2). Any O-alkylated or O-arylated hydroxylamine may also be used.
  • Also suitable for use in certain aspects are compounds, which upon being added to the aqueous system, yield hydroxylamines.
  • the compound containing a hydroxylamine group may also include substituted derivatives of hydroxylamine. If the hydroxyl hydrogen is substituted, this is called an O-hydroxylamine. Similarly to ordinary amines, one can distinguish primary, secondary and tertiary hydroxylamines, the latter two referring to compounds where two or three hydrogens are substituted, respectively.
  • a "hydrazine group" may refer to the divalent group -NR1R2-NH2, wherein R1 and R2 may be alkyl, aryl, or benzyl.
  • hydrazine groups include, but are not limited to, hydrazines, hydrazides, semicarbazides, carbazides, thiosemicarbazides, thiocarbazides, hydrazine carboxylates and carbonic acid hydrazines.
  • hydrazines used herein include N-alkylhydrazine, N-arylhydrazine, N-benzylhydrazine, N,N-dialkylhydrazine, N,N-diarylhydrazine, N,N-dibenzylhydrazine, N,N-alkylbenzylhydrazine, N,N-arylbenzylhydrazine, and N,N-alkylarylhydrazine.
  • a “hydrazide group” may refer to a common functional group characterized by a nitrogen to nitrogen covalent bond with four substituents with at least one of them being an acyl group.
  • Important members of this class are sulfonylhydrazides such as p-toluenesulfonylhydrazide, which are useful reagents in organic chemistry, such as in the Shapiro reaction.
  • This reagent can be prepared by reaction of tosyl chloride with hydrazine.
  • hydrazides used herein include p-toluenesulfonylhydrazide, N-acylhydrazide, N,N-alkylacylhydrazide, N,N-benzylacylhydrazide, N,N-arylacylhydrazide, N-sulfonylhydrazide, N,N-alkylsulfonylhydrazide, N,N-benzylsulfonylhydrazide, and N,N-arylsulfonylhydrazide.
  • Certain embodiments are directed to methods and compositions for modifying nucleic acids containing 5caC.
  • a target nucleic acid may be modified to protect 5caC from a bisulfite-mediated deamination.
  • the nucleic acid may be transformed into an amide by reaction with an amine-containing compound or a compound comprising an amine group.
  • the amine group may be alkylamine, cycloalkylamine, benzylamine, xyleneamine, hydroxylamine or any amine groups listed in FIG. 7.
  • the amine-containing compound may be alkylamine (e.g., 1a, 1c, and 1d in FIG. 7), cycloalkylamine, or benzylamine.
  • the amine group may be attached to a detectable label or compound, such as a biotin.
  • Amine groups are functional groups that contain a basic nitrogen atom with a lone pair. Amines are derivatives of ammonia, wherein one or more hydrogen atoms have been replaced by a substituent such as an alkyl or aryl group.
  • the compound comprising an amine group may be an aliphatic amine or an aromatic amine, a primary amine, a secondary amine, a tertiary amine, or a cyclic amine.
  • An aliphatic amine has no aromatic ring attached directly to the nitrogen atom.
  • Aromatic amines have the nitrogen atom connected to an aromatic ring as in the various anilines.
  • the aromatic ring decreases the alkalinity of the amine, depending on its substituents.
  • the presence of an amine group strongly increases the reactivity of the aromatic ring, due to an electron-donating effect.
  • Amines may also be organized into four subcategories:
  • Primary amines - Primary amines arise when one of three hydrogen atoms in ammonia is replaced by an alkyl or aromatic group.
  • Important primary alkyl amines include methylamine, ethanolamine (2-aminoethanol), and the buffering agent tris, while primary aromatic amines include aniline.
  • Secondary amines - Secondary amines have two substituents (alkyl, aryl or both) bound to N together with one hydrogen. Important representatives include dimethylamine and methylethanolamine, while an example of an aromatic amine would be diphenylamine.
  • Cyclic amines - Cyclic amines are either secondary or tertiary amines.
  • examples of cyclic amines include the three-membered ring aziridine and the six-membered ring piperidine.
  • N-methylpiperidine and N-phenylpiperidine are examples of cyclic tertiary amines.
  • R1 and R2 are the same or different and can be alkyl or aryl.
  • the carbodiimide derivative may be 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) or N,N'-dicyclohexylcarbodiimide (DCC).
  • 5mC and/or 5hmC in a nucleic acid may be subjected to modification, such as oxidation to 5caC.
  • oxidation of 5mC and/or 5hmC to 5caC can be accomplished by contacting a nucleic acid with a methylcytosine dioxygenase (e.g., TET1, TET2 and TET3) or an enzyme having similar activity or the catalytic domain of a methylcytosine dioxygenase; or chemical modification.
  • the nucleic acid may be an isolated nucleic acid, a nucleic acid in a sample, a nucleic acid that has been modified by methods described above (e.g., modification of 5fC and/or 5caC), or a nucleic acid that has not been modified.
  • methods may involve treating the nucleic acid with bisulfite under conditions that will allow sequencing of the nucleic acid and/or amplifying the bisulfite-treated nucleic acid.
  • the amplified nucleic acid or modified nucleic acid may be sequenced or analyzed. For example, in bisulfite sequencing, oxidized 5mC and 5hmC may be read as T, thus the positions of protected 5fC or 5caC may be determined because protected 5fC or 5caC may be read as C.
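The readout logic can be illustrated with a toy model of bisulfite conversion. This sketch assumes idealized, complete conversion and models standard bisulfite sequencing, in which unmodified C, 5fC, and 5caC are read as T while 5mC and 5hmC are read as C; a chemically protected variant is read as C instead. The token names, the template, and the helper functions are hypothetical and serve only to show how comparing the two readouts locates the protected positions.

```python
# Readout of each cytosine form after standard bisulfite treatment
# (idealized: complete conversion, no sequencing error).
STANDARD_BS = {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"}


def bisulfite_readout(bases, protected=()):
    """Return the sequence read after bisulfite treatment.

    bases: list of tokens such as "A", "C", "5fC", "5caC".
    protected: cytosine forms that were chemically protected from
    deamination before treatment; these are read as C.
    """
    out = []
    for base in bases:
        if base in protected:
            out.append("C")
        else:
            out.append(STANDARD_BS.get(base, base))
    return "".join(out)


def protected_positions(bases, protected):
    """Positions read as T without protection but as C with protection."""
    plain = bisulfite_readout(bases)
    cab = bisulfite_readout(bases, protected)
    return [i for i, (p, c) in enumerate(zip(plain, cab)) if p == "T" and c == "C"]


template = ["A", "C", "G", "5mC", "G", "5fC", "G", "5caC", "G", "T"]
print(bisulfite_readout(template))                     # ATGCGTGTGT
print(bisulfite_readout(template, protected={"5fC"}))  # ATGCGCGTGT
print(protected_positions(template, {"5fC"}))          # [5]
```

Real data are read-count distributions rather than single calls, so the comparison is made on C fractions per site rather than on one idealized string.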
  • TET1, TET2, or TET3 are human or mouse proteins.
  • Human TET1 has accession number NM_030625.2; human TET2 has accession number NM_001127208.2, alternatively, NM_017628.4; and human TET3 has accession number NM_144993.1.
  • Mouse TET1 has accession number NM_027384.1; mouse TET2 has accession number NM_001040400.2; and mouse TET3 has accession number NM_183138.2.
  • 5-methylcytosine (5mC) in DNA has an important function in gene expression, genomic imprinting, and suppression of transposable elements.
  • 5mC can be converted to 5-hydroxymethylcytosine (5hmC) by the Tet (ten eleven translocation) proteins.
  • the Tet proteins can also convert 5mC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) in an enzymatic activity-dependent manner (Ito et al., 2011, incorporated by reference).
  • the enrichment methods or protection methods may involve the use of labels.
  • the labels may be attached to the marker for selection or enrichment or the modification moiety that may modify cytosine variants.
  • the label can be any label that is detected, or is capable of being detected. Examples of suitable labels include, e.g., chromogenic label, a radiolabel, a fluorescent label, and a biotinylated label.
  • the label can be, e.g., fluorescent label, biotin label, radiolabel and the like.
  • the label is a chromogenic label.
  • the term "chromogenic label" includes all agents that have a distinct color or otherwise detectable marker.
  • markers used include fluorescent groups, biotin tags, enzymes (that may be used in a reaction that results in the formation of a colored product), magnetic and isotopic markers, and so on.
  • This list of detectable markers is for illustrative purposes only, and is in no way intended to be limiting or exhaustive.
  • the label may be attached to the nucleic acid or specific cytosine residues
  • the labels may be attached to modifying agents such as compounds comprising a thiol group, an amine group, a hydroxylamine group, or any functional groups using methods known in the art.
  • Labels include any detectable group attached to a modification moiety, such as an amine or hydroxylamine group, or detection agent that does not interfere with its function.
  • Further labels that may be used include fluorescent labels, such as Fluorescein, Texas Red, Lucifer Yellow, Rhodamine, Nile-red, tetramethyl-rhodamine-5-isothiocyanate, 1,6-diphenyl-1,3,5-hexatriene, cis-Parinaric acid, Phycoerythrin, Allophycocyanin, 4',6-diamidino-2-phenylindole (DAPI), Hoechst 33258, 2-aminobenzamide, and the like.
  • Further labels include electron dense metals, such as gold, ligands, haptens, such as biotin, radioactive labels.
  • a fluorophore contains or is a functional group that will absorb energy of a specific wavelength and re-emit energy at a different (but equally specific) wavelength. The amount and wavelength of the emitted energy depend on both the fluorophore and the chemical environment of the fluorophore.
  • Fluorophores can be attached to protein using functional groups and/or linkers, such as amino groups (active ester, carboxylate, isothiocyanate, hydrazine); carboxyl groups (carbodiimide); thiol (maleimide, acetyl bromide); azide (via click chemistry); or non-specifically (glutaraldehyde).
  • Fluorophores can be proteins, quantum dots (fluorescent semiconductor nanoparticles), or small molecules. Common dye families include, but are not limited to Xanthene derivatives: fluorescein, rhodamine, Oregon green, eosin, Texas red etc; Cyanine derivatives: cyanine, indocarbocyanine, oxacarbocyanine, thiacarbocyanine and merocyanine; Naphthalene derivatives (dansyl and prodan derivatives); Coumarin derivatives; oxadiazole derivatives: pyridyloxazole, nitrobenzoxadiazole and benzoxadiazole; Pyrene derivatives: cascade blue etc.; BODIPY (Invitrogen); Oxazine derivatives: Nile red, Nile blue, cresyl violet, oxazine 170 etc.; Acridine derivatives: proflavin, acridine orange, acridine yellow etc.; Arylmethine
  • fluorophores include: Hydroxycoumarin; Aminocoumarin;
  • Alexa Fluor dyes may include: Alexa Fluor 350, Alexa Fluor 405, Alexa Fluor 430, Alexa Fluor 488, Alexa Fluor 500, Alexa Fluor 514, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 610, Alexa Fluor 633, Alexa Fluor 647, Alexa Fluor 660, Alexa Fluor 680, Alexa Fluor 700, Alexa Fluor 750, and Alexa Fluor 790.
  • Cy Dyes may include Cy2, Cy3, Cy3B, Cy3.5, Cy5, Cy5.5 and Cy7.
  • Nucleic acid probes may include Hoechst 33342, DAPI, Hoechst 33258, SYTOX Blue, Chromomycin A3, Mithramycin, YOYO-1, Ethidium Bromide, Acridine Orange, SYTOX Green, TOTO-1, TO-PRO-1, TO-PRO: Cyanine Monomer, Thiazole Orange, Propidium Iodide (PI), LDS 751, 7-AAD, SYTOX Orange, TOTO-3, TO-PRO-3, and DRAQ5.
  • Additional dyes may include Indo-1, Fluo-3, DCFH, DHR, or SNARF.
  • Fluorescent proteins may include Y66H, Y66F, EBFP, EBFP2, Azurite, GFPuv, T-Sapphire, Cerulean, mCFP, ECFP, CyPet, Y66W, mKeima-Red, TagCFP, AmCyan1, mTFP1, S65A, Midoriishi Cyan, Wild Type GFP, S65C, TurboGFP, TagGFP, S65L, Emerald, S65T (Invitrogen), EGFP (Clontech), Azami Green (MBL), ZsGreen1 (Clontech), TagYFP (Evrogen), EYFP (Clontech), Topaz, Venus, mCitrine, YPet, TurboYFP, ZsYellow1 (Clontech), Kusabira Orange (MBL), mOrange, mKO, TurboRFP (Evrogen), tdTomato, TagRFP (Evrogen),
  • Nucleic acid analysis and evaluation includes various methods of amplifying, fragmenting, and/or hybridizing nucleic acids that have or have not been modified.
  • Methodologies are available for large scale sequence analysis.
  • the methods described exploit these genomic analysis methodologies and adapt them for uses incorporating the methodologies described herein.
  • the methods can be used to perform high resolution hydroxymethylation analysis on several thousand CpGs in genomic DNA. Therefore, methods may be directed to analysis of the hydroxymethylation status of a genomic DNA sample, comprising one or more of the steps:
  • the present methods allow for analyzing the hydroxymethylation status of all regions of a complete genome, where changes in hydroxymethylation status are expected to have an influence on gene expression. Due to the combination of bisulfite treatment, amplification and high throughput sequencing, it is possible to analyze the hydroxymethylation status of at least 1000 and preferably 5000 CpG islands in parallel.
  • DNA may be isolated from an organism of interest, including, but not limited to eukaryotic organisms and prokaryotic organisms, preferably mammalian organisms, such as humans.
  • Restriction enzymes may be selected by a person skilled in the art using conventional bioinformatics approaches.
  • the selection of appropriate enzymes may have a substantial influence on the average size of fragments that ultimately will be generated and sequenced.
  • the selection of appropriate enzymes may be designed in such a way that it promotes enrichment of a certain fragment length. Thus, the selection may be adjusted to the kind of sequencing method which is finally applied. For most sequencing methods, a fragment length between 100 and 1000 bp has been proven to be efficient. Therefore, in one embodiment, said fragment size range is from 100, 200 or 300 base pairs to 400, 500, 600, 700, 800, 900, or 1000 base pairs (bp), including all ranges and values there between.
  • the human genome reference sequence (NCBI Build 36.1 from March 2006; assembled parts of chromosomes only) has a length of 3,142,044,949 bp and contains 26,567 annotated CpG islands (CpGs) for a total length of 21,073,737 bp (0.67%).
  • a DNA sequence read hits a CpG if the read overlaps with the CpG by at least 50 bp.
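The 50-bp overlap rule and the quoted genome fraction can be expressed in a few lines. This is a minimal sketch: the half-open coordinate convention, the function names, and the example coordinates are assumptions made for illustration, while the genome and CpG island lengths are the figures quoted above.

```python
def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of overlapping bases between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def read_hits_island(read, island, min_overlap: int = 50) -> bool:
    """True if the read overlaps the CpG island by at least min_overlap bp."""
    return overlap_bp(*read, *island) >= min_overlap


# Hypothetical coordinates on one chromosome.
island = (300, 900)
print(read_hits_island((100, 350), island))  # 50 bp overlap -> True
print(read_hits_island((100, 349), island))  # 49 bp overlap -> False

# Share of the reference sequence covered by annotated CpG islands.
genome_bp = 3_142_044_949
island_bp = 21_073_737
print(f"{100 * island_bp / genome_bp:.2f}%")  # 0.67%
```

For genome-scale data, the same test would normally be run against a sorted or indexed set of island intervals rather than one pair at a time.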
  • the following enzymes or their isoschizomers can be used in certain aspects: MseI (TTAA), Tsp509 (AATT), AluI (AGCT), NlaIII (CATG), BfaI (CTAG), HpyCH4 (TGCA), DpuI (GATC), MboII (GAAGA), MlyI (GAGTC), BccI (CCATC).
  • Isoschizomers are pairs of restriction enzymes that are specific to the same recognition sequence and cut at the same location.
  • Embodiments include a CpG island enriched library produced from genomic DNA by digestion with several restriction enzymes that preferably cut within non-CpG island regions.
  • the restriction enzymes are selected in such a way that digestion can result in fragments with a size range between 300, 400, 500, 600 to 500, 600, 800, 900 bp or greater, including all ranges and values there between.
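The effect of an enzyme combination on fragment sizes can be explored with a simple in-silico digest. The sketch below uses a few of the recognition sequences listed above and, as a simplifying assumption, cuts at the first base of each recognition site; real enzymes cut at enzyme-specific offsets, and all function names and the example sequence are hypothetical.

```python
import re

# Recognition sequences taken from the list above (subset for illustration).
SITES = {"MseI": "TTAA", "Tsp509": "AATT", "AluI": "AGCT", "NlaIII": "CATG", "BfaI": "CTAG"}


def cut_positions(seq: str, sites) -> list:
    """Sorted start positions of every recognition site (overlaps included)."""
    cuts = set()
    for site in sites:
        # Lookahead so that overlapping occurrences are all reported.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start())
    return sorted(cuts)


def digest(seq: str, sites) -> list:
    """Split seq at each site start (simplified cut model, see lead-in)."""
    inner = [p for p in cut_positions(seq, sites) if 0 < p < len(seq)]
    bounds = [0] + inner + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_bp: int, max_bp: int) -> list:
    """Keep fragments whose length lies in [min_bp, max_bp]."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]


seq = "GGGGTTAACCCCCCAGCTGG"  # toy sequence, far shorter than real fragments
frags = digest(seq, SITES.values())
print(frags)                      # ['GGGG', 'TTAACCCCCC', 'AGCTGG']
print(size_select(frags, 5, 10))  # ['TTAACCCCCC', 'AGCTGG']
```

Running such a digest over a reference genome with different enzyme subsets gives a rough view of which combinations place CpG-rich regions in the desired size window.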
  • the library fragments may be ligated to adaptors. Subsequently, a conventional bisulfite treatment is performed according to methods that are well known in the art.
  • the 454 Genome Sequencer System supports the sequencing of samples from a wide variety of starting materials including, but not limited to, eukaryotic or bacterial genomic DNA. Genomic DNAs are fractionated into small, 100- to 1000-bp fragments with an appropriate specific combination of restriction enzymes which enriches for CpG island containing fragments.
  • the restriction enzymes used for a method according to certain aspects are selected from a group consisting of MseI, Tsp509, AluI, NlaIII, BfaI, HpyCH4, DpuI, MboII, MlyI, and BccI, or any isoschizomer of any of the enzymes mentioned. Preferably, 4-5 different enzymes are selected.
  • the fragments Prior to ligation of the adaptors, the fragments can be completely double stranded without any single stranded overhang.
  • a fragment polishing reaction is performed using e.g. E. coli T4 DNA polymerase. In one embodiment, the polishing reaction is performed in the presence of hydroxymethyl-dCTP instead of dCTP. In another embodiment, the fragment polishing reaction is performed in the presence of a DNA polymerase which lacks proofreading activity, such as a DNA polymerase (Roche Applied Science Cat. No: 11 480 014 001).
  • the two different double stranded adaptors A and B are ligated to the ends of the fragments. Some or all of the C-residues of adaptors A and B can be hydroxymethyl-C residues.
  • the fragments containing at least one B adaptor are immobilized on a streptavidin coated solid support and a nick repair-fill-in synthesis is performed using a strand displacement enzyme such as Bst Polymerase (New England Biolabs).
  • Bst Polymerase New England Biolabs
  • said reaction is performed in the presence of hydroxymethyl-dCTP instead of dCTP.
  • the sample is amplified by means of performing a conventional PCR using amplification primers with sequences corresponding to the A and B adaptor sequences.
  • the bisulfite treated and optionally purified and/or amplified single-stranded DNA library is immobilized onto specifically designed DNA Capture Beads. Each bead carries a unique single-stranded DNA library fragment.
  • a library fragment can be amplified within its own microreactor comprised of a water-in-oil emulsion, excluding competing or contaminating sequences. Amplification of the entire fragment collection can be done in parallel; for each fragment, this results in a copy number of several million clonally amplified copies of the unique fragment per bead. After PCR amplification within the emulsion, the emulsion is broken while the amplified fragments remain bound to their specific beads.
  • Chromatin Immunoprecipitation is a method used to determine the location of DNA binding sites on the genome for a particular protein of interest, the target sequence. This technique gives a picture of the protein-DNA interactions that occur inside the nucleus of living cells or tissues. Histone methylation-specific antibodies (for example, antibodies that recognize H3K4me1 regions) may be used to isolate or enrich nucleic acids attached to such histones before or after the modification of cytosine variants.
  • DNA-binding proteins including transcription factors and histones
  • the crosslinking is often accomplished by applying formaldehyde to the cells (or tissue), although it is sometimes advantageous to use a more defined and consistent crosslinker such as DTBP.
  • the cells are lysed and the DNA is broken into pieces 0.2-1 kb in length by sonication. At this point the immunoprecipitation is performed resulting in the purification of protein-DNA complexes.
  • the purified protein-DNA complexes are then heated to reverse the formaldehyde cross-linking of the protein and DNA complexes, allowing the DNA to be separated from the proteins.
  • the identity and quantity of the DNA fragments isolated can then be determined by PCR.
  • the limitation of performing PCR on the isolated fragments is that one must have an idea which genomic region is being targeted in order to generate the correct PCR target primers. This limitation is very easily circumvented simply by cloning the isolated genomic DNA into a plasmid vector and then using primers that are specific to the cloning region of that vector.
  • a DNA microarray can be used (ChIP-on-chip or ChIP-chip) allowing for the characterization of the cistrome.
  • ChIP-Sequencing has recently emerged as a new technology that can localize protein binding sites in a high-throughput, cost-effective fashion.
  • Microarray methods can be used in conjunction with the methods described herein for simultaneous testing of numerous genetic alterations of the human genome.
  • the subject matter described herein can also be used in various fields to greatly improve the accuracy and reliability of nucleic acid analyses, chromosome mapping, and genetic testing.
  • Selected chromosomal target elements can be included on the array and evaluated for 5fC and/or 5caC content in conjunction with hybridization to a nucleic acid array.
  • array such as a microarray used for comparative genomic hybridization (CGH)
  • 5fC and/or 5caC in genomic DNA fragments are specifically labeled using, for example, radio-labels, fluorescent labels or amplifiable signals. These labeled target DNA fragments may be then screened by hybridization using microarrays.
  • Methods may involve attaching a fluorescent tag to the 5fC and/or 5caC; and/or hybridizing to a probe containing a nucleotide labeled with a fluorescent tag that functions as a FRET partner to the first. If the labeled base in the probe is juxtaposed with the labeled 5fC or 5caC, a FRET signal will be observed.
  • this method involves using AC impedance as a measurement for the presence of 5fC and/or 5caC. Briefly, a nucleic acid probe specific for the sequence to be analyzed is immobilized on a gold electrode. The DNA fragment to be analyzed is added and allowed to hybridize to the probe. Excess non-hybridized, single-stranded DNA is digested using nucleases. A label such as biotin may be covalently linked to the 5fC and/or 5caC using the methods of certain aspects either before or after hybridization.
  • Biotin-HRP is bound to the biotinylated DNA sequence, then 4-chloronaphthol may be added. If the HRP molecule is bound to the hybridized target DNA near the gold electrode, the HRP oxidizes the 4-chloronaphthol to a hydrophobic product that adsorbs to the electrode surface. This results in a higher AC impedance if 5fC and/or 5caC is present in the target DNA compared to a control sequence lacking 5fC and/or 5caC.
  • chromosomal DNA is prepared using standard karyotyping techniques known in the art.
  • the 5hmC and/or 5caC in the chromosomal DNA may be labeled with a detectable moiety (fluorophore, radio-label, amplifiable signal) and imaged in the context of the intact chromosomes.
  • genomic DNA may be contacted with anti-5fC and/or anti-5caC antibody to immunoprecipitate 5fC- and/or 5caC-containing DNA fragments.
  • the immunoprecipitated DNA may be separated and subjected to either traditional bisulfite treatment or chemical-modification assisted bisulfite treatment (CAB).
  • bisulfite-treated DNA may be tagged with appropriate adapters and amplified for sequencing.
  • an antibody or a fragment thereof that specifically binds to at least a portion of nucleic acids or proteins is contemplated. These antibodies can be used for enrichment of certain nucleic acids to prepare for any sequencing methods, particularly epigenetic sequencing such as the bisulfite-based CAB-seq.
  • the antibody is a monoclonal antibody or a polyclonal antibody. In some embodiments, the antibody is selected from the group consisting of a chimeric antibody, an affinity matured antibody, a humanized antibody, and a human antibody. In some embodiments, the antibody is an antibody fragment. In some embodiments, the antibody is a Fab, Fab', Fab'-SH, F(ab')2, or scFv. In certain embodiments, the antibody is recombinant or synthetic.
  • suitable binding fragments include, without limitation: (i) the Fab fragment, consisting of VL, VH, CL and CH1 domains; (ii) the "Fd" fragment consisting of the VH and CH1 domains; (iii) the "Fv" fragment consisting of the VL and VH domains of a single antibody; (iv) the "dAb" fragment, which consists of a VH domain; (v) isolated CDR regions; (vi) F(ab')2 fragments, a bivalent fragment comprising two linked Fab fragments; (vii) single chain Fv molecules ("scFv"), wherein a VH domain and a VL domain are linked by a peptide linker which allows the two domains to associate to form a binding domain; (viii) bi-specific single chain Fv dimers (see U.S.
  • Fv is a minimum antibody fragment which contains a complete antigen-binding site.
  • a two-chain Fv species consists of a dimer of one heavy- and one light-chain variable domain in tight, non-covalent association.
  • scFv single-chain Fv
  • one heavy- and one light-chain variable domain can be covalently linked by a flexible peptide linker such that the light and heavy chains can associate in a "dimeric" structure analogous to that in a two-chain Fv species.
  • variable domain interacts to define an antigen-binding site on the surface of the VH-VL dimer.
  • the six HVRs confer antigen-binding specificity to the antibody.
  • a single variable domain or half of an Fv comprising only three HVRs specific for an antigen has the ability to recognize and bind antigen, although at a lower affinity than the entire binding site.
  • the Fab fragment contains the heavy- and light-chain variable domains and also contains the constant domain of the light chain and the first constant domain (CH1) of the heavy chain.
  • Fab' fragments differ from Fab fragments by the addition of a few residues at the carboxy terminus of the heavy chain CH1 domain including one or more cysteines from the antibody hinge region.
  • Fab'-SH is the designation herein for Fab' in which the cysteine residue(s) of the constant domains bear a free thiol group.
  • F(ab')2 antibody fragments originally were produced as pairs of Fab' fragments which have hinge cysteines between them. Other chemical couplings of antibody fragments are also known.
  • Single-chain Fv or “scFv” antibody fragments comprise the VH and VL domains of antibody, wherein these domains are present in a single polypeptide chain.
  • the scFv polypeptide further comprises a polypeptide linker between the VH and VL domains which enables the scFv to form the desired structure for antigen binding.
  • For a review of scFv, see, e.g., Pluckthun, in The Pharmacology of Monoclonal Antibodies, vol. 113, Rosenburg and Moore eds., (Springer-Verlag, New York, 1994), pp. 269-315.
  • the term "monoclonal antibody” as used herein refers to an antibody obtained from a population of substantially homogeneous antibodies, i.e., the individual antibodies comprising the population are identical except for possible mutations, e.g., naturally occurring mutations, that may be present in minor amounts. Thus, the modifier “monoclonal” indicates the character of the antibody as not being a mixture of discrete antibodies.
  • such a monoclonal antibody typically includes an antibody comprising a polypeptide sequence that binds a target, wherein the target-binding polypeptide sequence was obtained by a process that includes the selection of a single target binding polypeptide sequence from a plurality of polypeptide sequences.
  • the selection process can be the selection of a unique clone from a plurality of clones, such as a pool of hybridoma clones, phage clones, or recombinant DNA clones.
  • a selected target binding sequence can be further altered, for example, to improve affinity for the target, to improve its production in cell culture, to reduce its immunogenicity in vivo, to create a multispecific antibody, etc., and that an antibody comprising the altered target binding sequence is also a monoclonal antibody.
  • each monoclonal antibody of a monoclonal antibody preparation is directed against a single determinant on an antigen.
  • monoclonal antibody preparations can be advantageous in that they are typically uncontaminated by other immunoglobulins.
  • the modifier "monoclonal" indicates the character of the antibody as being obtained from a substantially homogeneous population of antibodies, and is not to be construed as requiring production of the antibody by any particular method.
  • the monoclonal antibodies to be used in accordance with certain aspects may be made by a variety of techniques, including, for example, the hybridoma method (e.g., Kohler and Milstein, Nature, 256:495-97 (1975); Hongo et al, Hybridoma, 14 (3): 253-260 (1995), Harlow et al., Antibodies: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, 2nd ed. 1988); Hammerling et al., in: Monoclonal Antibodies and T-Cell Hybridomas 563-681 (Elsevier, N.Y., 1981)), recombinant DNA methods (see, e.g., U.S. Pat. No.
  • phage-display technologies see, e.g., Clackson et al, Nature, 352: 624-628 (1991); Marks et al, J. Mol. Biol. 222: 581-597 (1992); Sidhu et al, J. Mol. Biol. 338(2): 299-310 (2004); Lee et al, J. Mol. Biol. 340(5): 1073-1093 (2004); Fellouse, PNAS USA 101(34): 12467-12472 (2004); and Lee et al, J. Immunol. Methods 284(1-2): 119-132 (2004).
  • polyclonal indicates the character of the antibody as being obtained from a source of a nonhomogeneous population of antibodies.
  • a polyclonal antibody comprises more than one antibody, such as 1,2,3,4,5,6,7,8,9 or 10 antibodies.
  • Animals may be inoculated with a target antigen, such as 5fC or 5caC, in order to produce antibodies specific for 5fC or 5caC. Frequently an antigen is bound or conjugated to another molecule to enhance the immune response.
  • a conjugate is any peptide, polypeptide, protein or non-proteinaceous substance bound to an antigen that is used to elicit an immune response in an animal.
  • Antibodies produced in an animal in response to antigen inoculation comprise a variety of non-identical molecules (polyclonal antibodies) made from a variety of individual antibody producing B lymphocytes.
  • polyclonal or monoclonal antibodies, binding fragments and binding domains and CDRs may be created that are specific for an antigen, one or more of its respective epitopes, or conjugates of any of the foregoing, whether such antigens or epitopes are isolated from natural sources or are synthetic derivatives or variants of the natural compounds.
  • Antibodies may be produced from any animal source, including birds and mammals.
  • the antibodies are ovine, murine (e.g., mouse and rat), rabbit, goat, guinea pig, camel, horse, or chicken.
  • Methods for producing polyclonal antibodies in various animal species, as well as for producing monoclonal antibodies of various types, including humanized, chimeric, and fully human, are well known in the art and highly predictable.
  • Prokaryotic or eukaryotic cells can be used as expression hosts. Expression in eukaryotic host cells is preferred because such cells are more likely than prokaryotic cells to assemble and secrete a properly folded and immunologically active antibody. However, any antibody produced that is inactive due to improper folding may be renaturable according to well-known methods (Kim and Baldwin, 1982). It is possible that the host cells will produce portions of intact antibodies, such as light chain dimers or heavy chain dimers, which also are antibody homologs.
  • a host cell is transformed with DNA encoding either the light chain or the heavy chain (but not both) of an antibody homolog.
  • Recombinant DNA technology may also be used to remove some or all of the DNA encoding either or both of the light and heavy chains that is not necessary for binding to the antigen.
  • the molecules expressed from such truncated DNA molecules are antibody homologs.
  • kits for modifying cytosine bases of nucleic acids and/or subjecting such modified nucleic acids to further analysis can include one or more of a modification agent(s), a labeling reagent for detecting or modifying a 5fC and/or a 5caC, and, if desired, a substrate that contains or is capable of attaching to one or more modified 5fC and/or 5caC.
  • the substrate can be, e.g., a microsphere, antibody, or other binding agent.
  • Each kit may include a 5fC and/or a 5caC modifying agent or agents, e.g., a compound comprising a hydroxylamine group or an amine group for 5fC or a compound comprising an amine group or a thiol group for 5caC or any other modification moiety, etc.
  • One or more reagent is preferably supplied in a solid form or liquid buffer that is suitable for inventory storage, and later for addition into the reaction medium when the method of using the reagent is performed.
  • Each kit may comprise a test nucleic acid and/or a control nucleic acid.
  • kits may optionally provide additional components that are useful in the procedure. These optional components include buffers, capture reagents, developing reagents, labels, reacting surfaces, means for detection, control samples, instructions, and interpretive information.
  • the optional components may also include nucleic-acid specific antibodies for enrichment of certain nucleic acids, such as anti-5fC or anti-5caC antibodies or fragments thereof. These optional components may be separately packaged.
  • Each kit may also include additional components that are useful for amplifying the nucleic acid, or sequencing the nucleic acid, or other applications as described herein.
  • the kit may optionally provide additional components that are useful in the procedure. These optional components include buffers, capture reagents, developing reagents, labels, reacting surfaces, means for detection, control samples, instructions, and interpretive information.
  • the kit may optionally include a detectable label or compound, coupling agents, and, if desired, reagents for detecting the nucleic acid comprising the label or compound.
  • Tdg knockout mESC The TDG ES cell lines are derived from the inner cell mass of Td ⁇ , Td ⁇ v' , and Tdg 1' embryos, which were generated from transferring a nucleus of or Tdg '1' iPS cells into an enucleated oocyte.
  • TDG-null iPS cell lines To establish TDG-null iPS cell lines, a genomic region encompassing exons 3-7 coding for amino acids 67-275 was flanked by LoxP sites and mice with an J3 ⁇ 4g ⁇ /+ genotype were obtained. Insertions of LoxP sites have no effect on the expression of the Tdg gene.
  • mice were intercrossed, and embryonic fibroblasts with double floxed Tdg were isolated from E13.5 embryos on a mixed C57BL/6-ICR genetic background and were induced to reprogram with the standard factors Oct4, Sox2, and Klf4. Tdg was then inactivated by Cre-mediated deletion to get Td ⁇ 1' and Tdg 1' iPS cells.
  • To derive TDG KO ES cells, nuclear transfer was performed with the TDG-null iPS cells and enucleated MII oocytes. Reconstructed embryos were cultured in KSOM medium and allowed to develop to blastocyst stage for derivation of ES cell lines, according to the standard protocol of ES cell establishment.
  • mESCs were cultured as previously described (Yu et al, 2012).
  • mES cell culture and differentiation. mESCs were cultured in feeder-free gelatin-coated plates in Dulbecco's Modified Eagle Medium (DMEM) (Invitrogen Cat. No. 11995) supplemented with 15% FBS (GIBCO), 2 mM L-glutamine (GIBCO), 0.1 mM 2-mercaptoethanol (Sigma), 1× nonessential amino acids (GIBCO), 1,000 units/ml LIF (Millipore Cat. No. ESG1107), 1× pen/strep (GIBCO), 3 µM CHIR99021 (Stemgent), and 1 µM PD0325901 (Stemgent).
  • mESC genomic DNA was prepared as previously described (Song et al, 2011).
  • Oligonucleotides containing 5hmC, 5fC, and 5caC were prepared by using Applied Biosystems 392 DNA synthesizer with phosphoramidites from Glen Research.
  • 5hmC, 5fC, and 5caC deoxynucleoside standards were prepared as previously described (Dai and He, 2011; Dai et al, 2011; Globisch et al, 2010).
  • β-glucosyltransferase (βGT) was prepared as previously described (Song et al, 2011).
  • Lambda dCTP, dmCTP, dhmCTP, dfCTP, and dcaCTP amplicon enrichment test: lambda genomic DNA was PCR amplified by HotStarTaq DNA Polymerase (Qiagen) in non-overlapping 2 kb amplicons and purified by gel electrophoresis, with a cocktail of dATP/dGTP/dTTP and one of the following: dCTP at genomic positions 40-42 kb, dmCTP at genomic positions 0-10 kb, 10% dhmCTP (Bioline) and 90% dCTP at genomic positions 42-44 kb, 10% dfCTP (Trilink) and 90% dCTP at genomic positions 44-46 kb, and 10% dcaCTP (Trilink) and 90% dCTP at genomic positions 46-48 kb.
  • 0.2 ng of each spike-in DNA was added into 100 µg mESC genomic DNA and sonicated to 200-600 bp. 45 µg was taken for fC-Seal and for the corresponding methanol control, respectively, as described above. 10 µg was taken for hMe-Seal as described above.
  • qPCR validation was run in triplicate 20 µL reactions, each with 1× PowerSYBR Green PCR Master Mix (ABI), 0.5 µM forward and reverse primers, 1 ng template, and water. Reactions were run on an ABI 7500 Fast Instrument using the standard cycling conditions. Primer sequences are as follows: 40-42 kb (dCTP) FW-CGGGAATGGCTTTGTGGTAA (SEQ ID NO.
  • RV-AATTCGCCTACACGCATCCT (SEQ ID NO. 2), 0-10 kb (dmCTP) FW-AGTGGAGCAAGCGTGACAAGT (SEQ ID NO. 3), RV-CAGCGCGTAGGCTTCGA (SEQ ID NO. 4), 42-44 kb (dhmCTP) FW-TGAATGCCGGGAATGGTTT (SEQ ID NO. 5), RV-TGGAGAGCACCACCACTGATT (SEQ ID NO. 6), 44-46 kb (dfCTP) FW-CCGATTCCGCCTAGTTGGT (SEQ ID NO. 7), RV-TGCCTGCGATGGTTGGA (SEQ ID NO. 8), 46-48 kb (dcaCTP) FW-CTGCGCCGCCACAAA (SEQ ID NO. 9), RV-CTGGAATTGGGCAGAAGAAAAC (SEQ ID NO. 10).
  • the nucleosides were separated by reverse phase ultra-performance liquid chromatography on a C18 column, with online mass spectrometry detection using Agilent 6410 QQQ triple-quadrupole LC mass spectrometer set to multiple reaction monitoring (MRM) in positive electrospray ionization mode.
  • the nucleosides were quantified using the nucleoside to base ion mass transitions of 258 to 142 (5hmC), 256 to 140 (5fC), and 228 to 112 (C). Quantification and detection limits were determined by comparison with the standard curves obtained from nucleoside standards running at the same volume and time.
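The standard-curve quantification described above can be illustrated with a short sketch. The mass transitions are those stated in the text; the least-squares fit, the function names, and the calibration numbers are assumptions made for illustration and do not reproduce the instrument software.

```python
# Transitions (nucleoside precursor ion -> base product ion) as given in the text.
MRM_TRANSITIONS = {"5hmC": (258, 142), "5fC": (256, 140), "C": (228, 112)}

def fit_line(amounts, areas):
    """Ordinary least-squares fit: area = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, areas))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantify(peak_area, slope, intercept):
    """Invert the standard curve to estimate the amount behind a peak area."""
    return (peak_area - intercept) / slope

if __name__ == "__main__":
    # Hypothetical calibration points for one nucleoside standard.
    slope, intercept = fit_line([1, 2, 3, 4], [10, 20, 30, 40])
    print(quantify(25, slope, intercept))
```

A separate curve would be fitted for each nucleoside, since ionization efficiency differs between transitions.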
  • Libraries were sequenced using the Illumina HiScan platform. Cluster generation was performed with Illumina TruSeq cluster kit v2-cBot-HS. Single-read 51-bp sequencing was completed with Illumina TruSeq SBS kit v3-HS. A dedicated PhiX control lane, as well as 1% PhiX spike in all other lanes, were used for automated matrix and phasing calculations. Image analysis and base calling were performed with the standard Illumina pipeline. Libraries were prepared and sequenced from two biological replicates per genotype from 5fC- and 5hmC-enriched DNA. In parallel, genotype matched non-enriched input genomic DNA libraries were generated and sequenced.
  • mTET1 sites were defined as previously described (Yu et al, 2012) by merging mTET1 enrichment profiles derived from using 3 distinct mTET1 antibodies (Williams et al, 2011; Wu et al, 2011).
  • Raw mTET1 ChIP-Seq sequence reads from both studies (SRA accessions: SRR070927, SRR070925, SRR096330, SRR096331) were aligned to NCBIvl/mm9 and monoclonal reads from each were combined into a single set.
  • Peaks were identified against the combined set of IgG control monoclonal reads (SRA accessions: SRR070931, SRR096334, SRR096335), as well as monoclonal reads from mESC input genomic DNA using a standard MACS analysis (Zhang et al, 2008).
  • CTCF and enhancer regions (H3K4me1 regions without H3K4me3)
  • DNaseI Hypersensitivity sites were downloaded from UCSC goldenPath ENCODE mouse datasets (Kent et al, 2002; Myers et al, 2011).
  • ENCODE data sources and accessions are as follows: CTCF (LICR, Ren Lab, wgEncodeEM001703), H3K4me1 (LICR, Ren Lab, wgEncodeEM001681, NCBI GEO Accession GSM769009), H3K4me3 (LICR, Ren Lab, wgEncodeEM001682, NCBI GEO Accession GSM769009), H3K27ac (LICR, Ren Lab, wgEncodeEM240097), DNaseI Hypersensitivity/DNase-Seq (UW, Stamatoyannopoulos Lab, wgEncodeEM001728).
  • TSSs, exons, intragenic, and intergenic regions were derived from the UCSC RefSeq transcript tables associated with NCBIvl/mm9.
  • mESC UMRs, LMRs, and FMRs were obtained from (Stadler et al., 2011).
  • mESC enhancers and enhancers predicted as linked to promoters were downloaded from http://chromosome.sdsc.edu/mouse/download.html as part of (Shen et al, 2012). Enhancers from the list of enhancer promoter pairs were treated as 1 kb intervals for analyses.
  • RNA-Seq expression libraries were generated from duplicate samples per genotype using the Illumina TruSeq RNA Sample Preparation Kit v2. Libraries were sequenced using the Illumina HiScan platform. Cluster generation was performed with Illumina TruSeq cluster kit v2-cBot-HS. Single-read 51-bp sequencing was completed with Illumina TruSeq SBS kit v3-HS. A dedicated PhiX control lane, as well as 1% PhiX spike in all other lanes, were used for automated matrix and phasing calculations. Image analysis and base calling were performed with the standard Illumina pipeline.
  • RNA-Seq reads were aligned using tophat-1.4.1 (Trapnell et al., 2009) and RPKM expression values were extracted using cufflinks-1.3.0 (Trapnell et al, 2010) using RefSeq gene models.
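For orientation, the RPKM unit mentioned above (reads per kilobase of transcript per million mapped reads) can be written as a one-line calculation. This sketch shows only the textbook definition; it is not the abundance-estimation procedure that cufflinks performs internally, and the function name and example numbers are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

if __name__ == "__main__":
    # Hypothetical gene: 500 reads on a 2 kb transcript in a 10-million-read library.
    print(rpkm(500, 2000, 10_000_000))
```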
  • Chromatin Immunoprecipitation. ChIP-seq experiments were performed following the protocol from the laboratory of Richard M. Myers (http://myers.hudsonalpha.org/documents/Myers%20Lab%20ChIP-seq%20Protocol%20v041610.pdf). Briefly, cells were cross-linked with 1% formaldehyde at 25 °C for 10 min and sonicated to generate chromatin fragments of 100-500 bp. Chromatin fragments from 2x 10 cells were immunoprecipitated using 5 µg of the p300 antibody (Santa Cruz, C-20, sc-585) or 5 µg H3K4me1 antibody (Abcam ab8895). ChIP-seq library construction and Illumina sequencing were performed as described above.
  • Hydroxylamine protection of 5fC for bisulfite sequencing was performed in 100 mM MES buffer (pH 5.0), 10 mM O-ethylhydroxylamine (Aldrich, 274992), and 100 ng/µL 76mer double-stranded synthetic DNA or sonicated genomic DNA (average 400 bp), or ChIP'd DNA for 2 h at 37 °C.
  • the DNA substrates were purified with the Qiagen nucleotide removal kit and subjected to sodium bisulfite treatment using EpiTect Bisulfite Kits (Qiagen) following the manufacturer's instructions, except that the bisulfite thermal cycle program was run either once or twice or high-throughput bisulfite amplicon sequencing.
  • Although hydroxylamine-based 5fC labeling could give rise to side products through reacting with abasic sites and possibly even normal cytosines (Munzel et al., 2010), which may complicate the pull-down approach (Raiber et al., 2012), it is shown that hydroxylamine-protected 5fC inhibits bisulfite-catalyzed deamination.
  • Products from the bisulfite treatment were PCR amplified by HotStarTaq DNA Polymerase (Qiagen) using the following primers: Forward: 5'- CCCTTTTATTATTTTAATTAATATTATATT-3' (SEQ ID NO. 12) Reverse: 5'-
  • PCR conditions consisted of an initial denaturation step of 95 °C for 16 min, followed by 48 cycles of 94 °C for 30 s, 45 °C for 30 s and 72 °C for 1 min, and a final extension at 72 °C for 7 min.
  • PCR products were purified using Qiagen PCR purification kit and subjected to Sanger sequencing with the following primer: 5'-CTCCGACATTATCACT-3' (SEQ ID NO. 14).
  • 5fC-enriched DNA, 5hmC-enriched DNA, non-NaBH4 control DNA, and non-enriched sonicated input genomic DNA were end-repaired, adenylated, and ligated to Illumina Genomic DNA Adapters (Genomic DNA adapter oligo mix) according to standard Illumina protocols for ChIP-Seq library construction. Libraries were sequenced using the Illumina HiScanSQ platform. Cluster generation was performed with Illumina TruSeq cluster kit v2-cBot-HS. Single-read 51-bp sequencing was completed with Illumina TruSeq SBS kit v3-HS.
  • RNA-Seq. RNA-Seq libraries were generated from duplicate samples per genotype using the Illumina TruSeq RNA Sample Preparation Kit v2. Libraries were sequenced using the Illumina HiScanSQ platform according to standard Illumina protocols. RNA-Seq reads were aligned using tophat-1.4.1 (Trapnell et al., 2009) and RPKM expression values were extracted using cufflinks-1.3.0 (Trapnell et al, 2010) using RefSeq gene models.
  • High-throughput bisulfite amplicon sequencing for fCAB-Seq. For 76mer model DNA, dsDNA was end-repaired, adenylated, and ligated to methylated adapters as described in detail below.
  • Bisulfite-treated DNA was PCR amplified for 7-10 cycles with PfuTurbo Cx Hotstart DNA polymerase as follows in a 50 µL reaction: 1X PfuTurbo Cx Reaction buffer, 0.25 mM dNTP Mix, 5 µL Illumina TruSeq PCR Primer Cocktail, 2.5 U PfuTurbo Cx Hotstart polymerase; 95 °C 5 min, 98 °C 30 sec, 7-10 cycles of 98 °C 10 sec, 65 °C 30 sec, 72 °C 30 sec, followed by 5 min at 72 °C.
  • PCR-amplified libraries were purified on a Qiagen MinElute column, quantified by qPCR (KAPABiosystems library quant kit for Illumina libraries), and diluted to 8 pM. Multiplex sequencing was performed on the Illumina MiSeq platform using 50 cycle Illumina MiSeq Reagent Kits. Libraries derived from bisulfite-treated 76mer dsDNA were mixed at 60:40 v/v with a generic genomic DNA library (8 pM) to ensure proper A/C and T/G channel balance during sequencing.
  • Genomic DNA was used directly in bisulfite treatment for non-hydroxylamine controls or protected by hydroxylamine and bisulfite-treated as described above.
  • Locus-specific primers listed in Table 1 were used at 0.2 µM in 50 µL reactions as described above for the 76mer adapter ligated, bisulfite treated DNA, except that the primer annealing temperature was 55 °C, and 40 amplification cycles were run.
  • PCR amplicons were quantified by Quant-it pico green assay, normalized by weight, and pooled for each genotype and treatment.
  • Adapter ligated amplicons were PCR amplified for 18 cycles, purified on a Qiagen MinElute Column, and quantified by qPCR (KAPABiosystems library quant kit for Illumina libraries). Amplicon libraries were diluted to 8 pM, pooled at equal volume and sequenced on an Illumina MiSeq platform (2 × 101-cycle paired-end reads, with index read) using 300 cycle MiSeq Reagent Kits. 8 pM amplicon pools were mixed 60:40 v/v with a generic genomic DNA library (8 pM) to ensure proper A/C and T/G channel balance during sequencing.
  • Paired-end reads were first pre-processed to remove adapter sequences, as well as low quality sequence on both the 3' and 5' ends using Trimmomatic 0.20 (Lohse et al, 2012), with the following parameters: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36. Pre-processed reads were then aligned to both C to T and G to A converted chromosomes that were computationally derived from NCBI mm9 genomic sequence using Bowtie 0.12.9 (-m 1 -l 30 -n 0 -e 90 -X 550).
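The computational derivation of C to T and G to A converted chromosomes mentioned above amounts to two string substitutions per sequence. The sketch below is a minimal illustration under that reading; the function names and the dictionary layout are hypothetical, and reading FASTA files or building the aligner index is left out.

```python
def c_to_t(seq):
    """Fully bisulfite-converted forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")

def g_to_a(seq):
    """Conversion as seen from the opposite strand: every G becomes A."""
    return seq.upper().replace("G", "A")

def convert_reference(chromosomes):
    """Build both converted references, keyed by (name, conversion)."""
    converted = {}
    for name, seq in chromosomes.items():
        converted[(name, "C2T")] = c_to_t(seq)
        converted[(name, "G2A")] = g_to_a(seq)
    return converted

if __name__ == "__main__":
    print(convert_reference({"chrToy": "ACGTcg"}))
```

Aligning converted reads against both references lets reads from either original strand map without mismatches at unprotected cytosines.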
  • the genomic 5fC content was estimated to be ~6% of 5hmC in Tdgfl/fl mESCs and ~15% of 5hmC in Tdg-/- mESCs. Therefore, the identification of extremely low abundance 5fC at single-base resolution requires sufficiently high sequencing depth to detect abundances well below 20%, or the average reported abundance of 5hmC (Yu et al., 2012).
  • Because hydroxylamine treatment serves to protect 5fC from deamination, a higher number of cytosine reads is expected in the hydroxylamine treatment than in the conventional bisulfite treatment.
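One way to check whether a site shows more cytosine reads after hydroxylamine protection than after conventional bisulfite treatment is a one-sided exact test on the C/T read counts. The text does not name the statistical test that was used, so the choice of Fisher's exact test here is an assumption, as are the function name and the example counts.

```python
from math import comb

def fisher_one_sided(c_cab, t_cab, c_bs, t_bs):
    """P(at least c_cab C reads in the protected sample) under the
    hypergeometric null that both treatments share one C fraction."""
    n_cab = c_cab + t_cab
    total_c = c_cab + c_bs
    total = n_cab + c_bs + t_bs
    upper = min(total_c, n_cab)
    tail = sum(
        comb(total_c, k) * comb(total - total_c, n_cab - k)
        for k in range(c_cab, upper + 1)
    )
    return tail / comb(total, n_cab)

if __name__ == "__main__":
    # Hypothetical site at roughly 1,000X coverage in each treatment.
    print(fisher_one_sided(60, 940, 20, 980))
```

Exact integer arithmetic keeps the tail sum stable at the coverage depths discussed here, at the cost of speed on very deep sites.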
  • H3K4me1-ChIP-Methyl-Seq and H3K4me1-ChIP-fCAB-Seq. H3K4me1 ChIP'd DNA was obtained as described above for p300, using Abcam ab8895 H3K4me1 antibody. ChIP'd DNA was subjected to fCAB treatment and subsequently bisulfite converted in parallel with untreated ChIP'd DNA as described above for amplicon sequencing. Bisulfite treated DNA was end-repaired, adenylated, and ligated to methylated Illumina TruSeq Adapters as previously described (Yu et al, 2012).
  • fCAB-Seq was applied to Tdgfl/fl and Tdg-/- mESC genomic DNA, and the bisulfite amplicons were subjected to high-throughput sequencing in order to achieve sequencing depths sufficient to distinguish low abundance hydroxylamine-protected 5fC from the conventional bisulfite signals (~1,000X or higher coverage).
  • the presence and accumulation of specific 5fC sites in genomic DNA within five 5fC-marked endogenous loci (from fC-Seal) displaying TDG-dependent accumulation of 5fC (FIGS. 1B-1C, and FIGS. 2E-G, p < 0.005) was validated. Also confirmed was that hydroxylamine does not alter the behavior of cytosine in bisulfite sequencing (FIG. 2H).
  • a ChIP-fCAB-Seq approach was next employed by capturing H3K4me1-bound DNA via chromatin immunoprecipitation, in which 5fC-marked regions defined by fC-Seal are enriched, and then subjecting the captured DNA to either conventional bisulfite (Brinkman et al, 2012; Statham et al, 2012) (H3K4me1-ChIP-Methyl-Seq) or fCAB (H3K4me1-ChIP-fCAB-Seq) treatments, followed by sequencing (FIG. 1D).
  • the percentage of cytosine bases protected from deamination in each treatment within poised and active enhancers predicted to be linked to promoters was then quantified, as such enhancers display the most significant enrichment of 5fC-marked regions in normal mESCs.
  • in Tdg fl/fl mESCs, although more variability was observed in H3K4me1-ChIP-Methyl-Seq and H3K4me1-ChIP-fCAB-Seq signals, the increases in H3K4me1-ChIP-fCAB-Seq relative to H3K4me1-ChIP-Methyl-Seq were reduced in comparison to those in Tdg-/- mESCs.
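As an illustration only, the comparison of cytosine reads between the hydroxylamine-assisted (fCAB) and conventional bisulfite treatments at a single site can be sketched in Python. The function names, the example read counts, and the use of a one-sided Fisher's exact test are assumptions made for this sketch; the source does not state which statistical test produced the reported p-values.

```python
from math import comb


def estimate_5fc_fraction(c_fcab, n_fcab, c_bs, n_bs):
    """Estimate 5fC abundance at one site as the excess C fraction in fCAB reads.

    c_* are counts of reads showing C at the site; n_* are total reads (depth).
    Negative differences are clipped to zero.
    """
    return max(0.0, c_fcab / n_fcab - c_bs / n_bs)


def fisher_one_sided_p(c_fcab, n_fcab, c_bs, n_bs):
    """One-sided Fisher's exact test (hypothetical choice of test).

    Returns the probability, under the null of equal C fractions, of seeing
    at least c_fcab C reads in the fCAB sample (hypergeometric upper tail).
    """
    total_c = c_fcab + c_bs          # all C reads across both treatments
    total = n_fcab + n_bs            # all reads across both treatments
    upper = min(total_c, n_fcab)     # largest possible C count in fCAB
    tail = sum(
        comb(total_c, k) * comb(total - total_c, n_fcab - k)
        for k in range(c_fcab, upper + 1)
    )
    return tail / comb(total, n_fcab)


# Made-up example counts at roughly 1,000X coverage per treatment.
print(estimate_5fc_fraction(150, 1000, 100, 1000))
print(fisher_one_sided_p(150, 1000, 100, 1000))
```

At low depth the same difference in C fraction would not be distinguishable from sampling noise, which is one way to see why high coverage is needed for low-abundance 5fC.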
  • CAB-Seq Chemical modification-Assisted Bisulfite Sequencing
  • Some embodiments include: 1.) Upon chemical reaction of 5fC with ethylhydroxylamine or any other hydroxylamine, the protected 5fC cannot be effectively deaminated under standard bisulfite conditions. 2.) Similarly, upon reaction of 5caC with an amine group through 1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride (EDC)-mediated coupling to form an amide, the protected 5caC cannot be effectively deaminated under standard bisulfite conditions. Since unprotected 5fC and 5caC both undergo effective deamination under standard bisulfite conversion, single-base resolution sequencing before and after chemical protection allows base-resolution sequencing of each of these two bases. The strategy is shown in FIG. 3 and is described below.
  • Genomic DNA sample is divided into two identical portions.
  • the chemically modified genomic DNA with 5fC converted to 5-Et-fC can be subjected to TET-mediated oxidation.
  • 5mC and 5hmC are oxidized to 5caC which reads as T under bisulfite sequencing.
  • the only base that remains as C is 5-Et-fC. In this way, 5fC can be directly read in sequencing.
  • Genomic DNA sample is divided into two identical portions.
  • CAB-Seq of 5caC reads 5mC+5hmC+5caC as C.
  • Conventional bisulfite sequencing reads 5mC+5hmC as C.
  • CAB-Seq reads 5mC+5hmC+5caC as C.
  • a subtraction of the two sequencing results obtained from the same sample provides base-resolution identification of 5caC, with its relative abundance at each modification site.
  • the chemically modified genomic DNA with 5caC converted to an amide-modified 5caC can be subjected to TET-mediated oxidation.
  • 5mC and 5hmC are oxidized to 5caC which reads as T under bisulfite sequencing.
  • Unprotected 5fC reads as T under bisulfite sequencing as well.
  • the only base that remains as C is the amide-modified 5caC. In this way, 5caC can be directly read in sequencing.
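The read-out logic of the strategies above can be summarized in a small lookup table. This is a minimal sketch: the treatment labels (`bisulfite`, `fCAB`, `caCAB`, `TET+fCAB`, `TET+caCAB`) and the function names are hypothetical, the entries assume complete conversion and complete protection, and the `fCAB` row is inferred by analogy with the stated 5caC read-out rather than quoted from the text.

```python
# Base read after each treatment followed by bisulfite sequencing.
# "C" means the base is still read as cytosine; "T" means it is deaminated.
READOUT = {
    "bisulfite": {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"},
    "fCAB":      {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "C", "5caC": "T"},
    "caCAB":     {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "C"},
    # Protection first, then TET oxidation of 5mC/5hmC to unprotected 5caC.
    "TET+fCAB":  {"C": "T", "5mC": "T", "5hmC": "T", "5fC": "C", "5caC": "T"},
    "TET+caCAB": {"C": "T", "5mC": "T", "5hmC": "T", "5fC": "T", "5caC": "C"},
}


def read_as_c(treatment):
    """Bases that are still read as C after the given treatment."""
    return sorted(b for b, call in READOUT[treatment].items() if call == "C")


def resolved_by_subtraction(treated, reference):
    """Bases read as C in `treated` but as T in `reference`.

    These are the bases identified by subtracting the two sequencing results.
    """
    return sorted(
        b for b in READOUT[treated]
        if READOUT[treated][b] == "C" and READOUT[reference][b] == "T"
    )


print(resolved_by_subtraction("caCAB", "bisulfite"))
print(read_as_c("TET+caCAB"))
```

The subtraction variants need two libraries from the same sample, whereas the TET-assisted variants give a direct read-out from a single library because only the protected base remains as C.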
  • 5-Methylcytosine (5mC), the well-known DNA epigenetic modification, plays crucial roles in biological functions and various diseases, and this fifth base has been studied extensively. Recently, it was discovered that 5mC can be oxidized to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) by the mammalian methylcytosine dioxygenases TET1, TET2, and TET3. Chemically, 5caC may be the last key intermediate involved in the demethylation process. No technology has been developed for sequencing 5caC. Therefore, a new strategy is presented to selectively label and sequence genome-wide 5caC for clinical or other applications in an economical, efficient, and robust way.
  • 5caC can be selectively labelled with an amine or thiol group using EDC chemistry (FIG. 6).
  • the amine-based amide product is stable, while the thiol-based product (a thiol ester) can be reversed under basic conditions.
  • the method was optimized to gain high labeling yield.
  • the xylene structure provides high coupling efficiency and the functional group for further biotinylation (FIG. 7). For instance, the use of structure Id with an azide attached to the labeling group ((4-aminomethyl)benzylazide) allows attachment of an azide to the labeled 5caC.
  • a biotin group can be installed for selective detection and pull-down for profiling or single-base resolution sequencing of 5caC using CAB-Seq.
  • This is demonstrated in a model system using synthetic 5caC-containing DNA. The labeling process is confirmed by MALDI-TOF mass spectrometry (FIG. 8).
  • the amine-labeled 5caC is found to be protected from bisulfite-mediated deamination under standard bisulfite treatment and still reads as C in Sanger sequencing (FIG. 9). During traditional bisulfite treatment, unprotected 5caC undergoes deamination and reads as T in subsequent sequencing. Therefore, Chemical modification-Assisted Bisulfite Sequencing (CAB-Seq) of 5caC enables sequencing of 5caC at single-base resolution.
  • biotin tagged 5caC containing oligo can be enriched from the genomic
  • the 5caC peaks in genomic DNA can be profiled (FIG. 10). Besides direct profiling, processing the amine and thiol pull-down in parallel can also be done.
  • the thiol pull-down DNA can be reverted to 5caC under basic conditions (0.5 M NaOH, 60°C, 3 hours). After bisulfite treatment, the 5caC in the amine pull-down will read as C, while it will read as T in the thiol-based pull-down after deprotection. By comparing the two pull-down samples, the 5caC sites can be read at base resolution in an enriched manner, which would be much more cost-effective than whole-genome bisulfite sequencing.
  • the aldehyde group in 5fC can react with amine to form imine, which is known as a Schiff base; however, the equilibrium generally favors the dissociation to the aldehyde and amine direction (Table 2, entry 1-2).
  • Imine is generally an unstable compound which hydrolyzes rapidly back into aldehyde and amine in neutral or acidic pH (Table 2, entry 2).
  • Various amine-containing compounds and different reaction conditions with 5fC were tried and it was found that diamine reacted with 5fC much more efficiently than monoamine, especially diamine connected with a two-carbon linker (compare entry 3 to entry 4 in Table 2), likely due to the formation of the cyclic aminal (Table 2, entry 3-4).
  • Table 2 column headings: Entry; Amine; Conversion rate immediately; Conversion rate 3 h after (table body not recovered).
  • the 5caC containing DNA can be labeled and pulled down in parallel by amine and thiol, respectively (FIG. 15A).
  • the thiol ester is cleaved but the amide protection remains.
  • Bisulfite sequencing is applied to both samples (caCAB-Seq). Comparing the results from these two enriched samples (the thiol-based sample converts to U/T, while the amine-protected caCAB-Seq sample remains C) provides single-base resolution detection of 5caC sites.
  • Table 3. Reaction efficiency of xylene-based and linear thiols with 5caC catalyzed by EDC.
  • 5caC-containing DNA can be pulled down using only thiol-based labeling (FIG. 15B).
  • the thiol ester bond can be cleaved with NaOH treatment (FIGS. 16 and 17).
  • Half of the enriched sample is protected with amine to form the amide (FIGS. 16 and 17), while the other half remains as 5caC. Both samples are then subjected to bisulfite sequencing, and the results are compared to detect the 5caC sites.
  • DNA and 1 µl (anti-5fC) or 0.5 µl (anti-5caC) of antiserum was used, and 1 µl of rabbit IgG was used to immunoprecipitate 5fC- or 5caC-containing DNA fragments.
  • the immunoprecipitated DNA was separated and subjected to either traditional bisulfite treatment or chemical-modification assisted bisulfite treatment (CAB).
  • 50 ng of bisulfite-treated DNA was tagged with Illumina-compatible adapters and amplified to an appropriate library concentration using the EpiGnome Methyl-Seq kit (Epicentre-Illumina). Libraries were checked for quality and quantified using an Agilent 2100 Bioanalyzer DNA 1000 Chip.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates generally to the field of molecular biology. More specifically, the invention relates to methods and compositions for detecting, evaluating, and/or mapping different forms of cytosine bases within a nucleic acid molecule.
PCT/US2014/032997 2013-04-05 2014-04-04 Base-resolution sequencing of 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) WO2014165770A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361809103P 2013-04-05 2013-04-05
US61/809,103 2013-04-05

Publications (1)

Publication Number Publication Date
WO2014165770A1 true WO2014165770A1 (fr) 2014-10-09

Family

ID=51659233

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/032997 WO2014165770A1 (fr) 2013-04-05 2014-04-04 Base-resolution sequencing of 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC)

Country Status (1)

Country Link
WO (1) WO2014165770A1 (fr)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018036537A1 (fr) * 2016-08-24 2018-03-01 中国科学院上海生命科学研究院 Novel modification of 5-methylcytosine catalyzed by Cmd1 enzyme and application thereof
CN111971386A (zh) * 2018-01-08 2020-11-20 路德维格癌症研究院 Bisulfite-free, base-resolution identification of cytosine modifications
WO2021005537A1 (fr) 2019-07-08 2021-01-14 The Chancellor, Masters And Scholars Of The University Of Oxford Bisulfite-free whole genome methylation analysis
CN112858419A (zh) * 2021-02-26 2021-05-28 山东农业大学 Method for detecting 5-hydroxymethylcytosine by constructing a photoelectrochemical sensor based on perovskite and black zirconium dioxide
WO2021161192A1 (fr) 2020-02-11 2021-08-19 The Chancellor, Masters And Scholars Of The University Of Oxford Targeted long-read nucleic acid sequencing for the determination of cytosine modifications
US11130991B2 (en) 2017-03-08 2021-09-28 The University Of Chicago Method for highly sensitive DNA methylation analysis
WO2022053872A1 (fr) 2020-09-14 2022-03-17 The Chancellor, Masters And Scholars Of The University Of Oxford Analysis of cytosine modifications
US11530441B2 (en) 2018-07-27 2022-12-20 The University Of Chicago Methods for the amplification of bisulfite-treated DNA
CN116287166A (zh) * 2023-04-19 2023-06-23 纳昂达(南京)生物科技有限公司 Methylation sequencing adapter and application thereof
WO2024015800A3 (fr) * 2022-07-11 2024-04-04 The University Of Chicago Methods and compositions for modification and detection of 5-methylcytosine
US11999998B2 (en) 2016-08-24 2024-06-04 Center For Excellence In Molecular Cell Science, Chinese Academy Of Sciences Modification of 5-methylcytosine catalyzed by Cmd1 enzyme and application thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012138973A2 (fr) * 2011-04-06 2012-10-11 The University Of Chicago Composition and methods related to modification of 5-methylcytosine (5-mC)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012138973A2 (fr) * 2011-04-06 2012-10-11 The University Of Chicago Composition and methods related to modification of 5-methylcytosine (5-mC)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HU ET AL.: "Selective chemical labelling of 5-formylcytosine in DNA by fluorescent dyes.", CHEMISTRY, vol. 19, no. 19, 19 March 2013 (2013-03-19), pages 5836 - 5840 *
ITO ET AL.: "Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine.", SCIENCE, vol. 333, no. 6047, 2 September 2011 (2011-09-02), pages 1300 - 1303 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11999998B2 (en) 2016-08-24 2024-06-04 Center For Excellence In Molecular Cell Science, Chinese Academy Of Sciences Modification of 5-methylcytosine catalyzed by Cmd1 enzyme and application thereof
WO2018036537A1 (fr) * 2016-08-24 2018-03-01 中国科学院上海生命科学研究院 Novel modification of 5-methylcytosine catalyzed by Cmd1 enzyme and application thereof
US11130991B2 (en) 2017-03-08 2021-09-28 The University Of Chicago Method for highly sensitive DNA methylation analysis
US11306355B2 (en) 2018-01-08 2022-04-19 Ludwig Institute For Cancer Research Ltd Bisulfite-free, base-resolution identification of cytosine modifications
CN111971386A (zh) * 2018-01-08 2020-11-20 路德维格癌症研究院 Bisulfite-free, base-resolution identification of cytosine modifications
US11987843B2 (en) 2018-01-08 2024-05-21 Ludwig Institute For Cancer Research, Ltd Bisulfite-free, base-resolution identification of cytosine modifications
US11959136B2 (en) 2018-01-08 2024-04-16 Ludwig Institute For Cancer Research, Ltd Bisulfite-free, base-resolution identification of cytosine modifications
US11905555B2 (en) 2018-07-27 2024-02-20 The University Of Chicago Methods for the amplification of bisulfite-treated DNA
US11530441B2 (en) 2018-07-27 2022-12-20 The University Of Chicago Methods for the amplification of bisulfite-treated DNA
EP4306652A2 (fr) 2019-07-08 2024-01-17 Ludwig Institute for Cancer Research Ltd. Bisulfite-free whole genome methylation analysis
WO2021005537A1 (fr) 2019-07-08 2021-01-14 The Chancellor, Masters And Scholars Of The University Of Oxford Bisulfite-free whole genome methylation analysis
WO2021161192A1 (fr) 2020-02-11 2021-08-19 The Chancellor, Masters And Scholars Of The University Of Oxford Targeted long-read nucleic acid sequencing for the determination of cytosine modifications
WO2022053872A1 (fr) 2020-09-14 2022-03-17 The Chancellor, Masters And Scholars Of The University Of Oxford Analysis of cytosine modifications
CN112858419B (zh) * 2021-02-26 2021-11-23 山东农业大学 Method for detecting 5-hydroxymethylcytosine by constructing a photoelectrochemical sensor
CN112858419A (zh) * 2021-02-26 2021-05-28 山东农业大学 Method for detecting 5-hydroxymethylcytosine by constructing a photoelectrochemical sensor based on perovskite and black zirconium dioxide
WO2024015800A3 (fr) * 2022-07-11 2024-04-04 The University Of Chicago Methods and compositions for modification and detection of 5-methylcytosine
CN116287166A (zh) * 2023-04-19 2023-06-23 纳昂达(南京)生物科技有限公司 Methylation sequencing adapter and application thereof

Similar Documents

Publication Publication Date Title
WO2014165770A1 (fr) Base-resolution sequencing of 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC)
US9611510B2 (en) Composition and methods related to modification of 5-methylcytosine (5-mC)
US20200102616A1 (en) COMPOSITION AND METHODS RELATED TO MODIFICATION OF 5 HYDROXYMETHYLCYTOSINE (5-hmC)
CN105793434B (zh) DNA sequencing and epigenome analysis
US11274335B2 (en) Methods for the epigenetic analysis of DNA, particularly cell-free DNA
CA3081441A1 (fr) Kits for analysis using nucleic acid encoding and/or label
US20160060687A1 (en) Methods and compositions to identify, quantify, and characterize target analytes and binding moieties
US11608518B2 (en) Methods for analyzing nucleic acids
CA3096668A1 (fr) Compositions and methods for cancer or neoplasia assessment and treatment
JP2023508795A (ja) Methods and kits for the enrichment and detection of DNA and RNA modifications and functional motifs
US10655162B1 (en) Identification of biomolecular interactions
US20220162676A1 (en) Methods and Kits for Detection of N-4-acetyldeoxycytidine in DNA
WO2015009844A2 (fr) Mirror bisulfite analysis
JP5303981B2 (ja) DNA methylation measurement method
US11905555B2 (en) Methods for the amplification of bisulfite-treated DNA
US20210380967A1 (en) Methods of Identifying Adenosine-to-Inosine Edited RNA
WO2023242075A1 (fr) Detection of epigenetic cytosine modification
Hardisty Methods to Probe the Function of Modified Bases in DNA

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14778394

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14778394

Country of ref document: EP

Kind code of ref document: A1