NZ793130A - Bisulfite-free, base-resolution identification of cytosine modifications
- Publication number
- NZ793130A
- Authority
- NZ
- New Zealand
- Prior art keywords
- dna
- borane
- nucleic acid
- taps
- reducing agent
Abstract
This disclosure provides methods for bisulfite-free identification in a nucleic acid sequence of the locations of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and 5-formylcytosine.
Description
BISULFITE-FREE, BASE-RESOLUTION IDENTIFICATION
OF CYTOSINE MODIFICATIONS
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application Nos.
62/614,798 filed January 8, 2018, 62/660,523 filed April 20, 2018, and 62/771,409 filed
November 26, 2018, each of which is incorporated herein by reference in its entirety.
FIELD OF THE INVENTION
This disclosure provides methods for identifying in a nucleic acid sequence the
locations of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and/or
5-formylcytosine.
BACKGROUND
5-Methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are the two major
epigenetic marks found in the mammalian genome. 5hmC is generated from 5mC by the
ten-eleven translocation (TET) family dioxygenases. TET can further oxidize 5hmC to
5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which exist in much lower abundance
in the mammalian genome compared to 5mC and 5hmC (10-fold to 100-fold lower than that
of 5hmC). Together, 5mC and 5hmC play crucial roles in a broad range of biological
processes from gene regulation to normal development. Aberrant DNA methylation and
hydroxymethylation have been associated with various diseases and are well-accepted
hallmarks of cancer. Therefore, the determination of 5mC and 5hmC in a DNA sequence is not
only important for basic research, but also is valuable for clinical applications, including
diagnosis and therapy.
5fC and 5caC are the two final oxidized derivatives of 5mC and can be converted
to unmodified cytosine by thymine DNA glycosylase (TDG) in the base excision repair
pathway. Therefore, 5fC and 5caC are two key intermediates in the active
demethylation process, which plays an important role in embryonic development. 5fC and 5caC
are found in these contexts and may serve as indicators of nearly complete 5mC
demethylation. 5fC and 5caC may also serve additional functions, such as binding specific
proteins and affecting the rate and specificity of RNA polymerase II.
WO 36413
5mC is also a post-transcriptional RNA modification that has been identified in
both stable and highly abundant tRNAs and rRNAs, and in mRNAs. In addition, 5mC has
been detected in snRNA (small nuclear RNA), miRNA (microRNA), lncRNA (long
noncoding RNA) and eRNA (enhancer RNA). However, there appear to be differences in
the occurrence of 5mC in specific RNA types in different organisms. For example, 5mC
appears not to be present in tRNA and mRNA from bacteria, while it has been found in tRNA
and mRNA in eukaryotes and archaea.
5hmC has also been detected in RNA. For example, mRNA from Drosophila and
mouse has been found to contain 5hmC. The same family of enzymes that oxidize 5mC in
DNA was reported to catalyze the formation of 5hmC in mammalian total RNA. In flies, a
transcriptome-wide study using methylation RNA immunoprecipitation sequencing
(MeRIP-seq) with 5hmC antibodies detected the presence of 5hmC in many mRNA coding
sequences, with particularly high levels in the brain. It was also reported that active
translation is associated with high 5hmC levels in RNA, and flies lacking the TET enzyme
responsible for 5hmC deposition in RNA have impaired brain development.
The current gold standard and most widely used method for DNA methylation and
hydroxymethylation analysis is bisulfite sequencing (BS), and its derived methods such as
Tet-assisted bisulfite sequencing (TAB-Seq) and oxidative bisulfite sequencing (oxBS). All
of these methods employ bisulfite treatment to convert unmethylated cytosine to uracil while
leaving 5mC and/or 5hmC intact. Through PCR amplification of the bisulfite-treated DNA,
which reads uracil as thymine, the modification information of each cytosine can be inferred
at single-base resolution (where the transition of C to T provides the location of the
unmethylated cytosine). There are, however, at least two main drawbacks to bisulfite
sequencing. First, bisulfite treatment is a harsh chemical reaction, which degrades more than
90% of the DNA due to depurination under the required acidic and thermal conditions. This
degradation severely limits its application to low-input samples, such as clinical samples
including circulating cell-free DNA and single-cell sequencing. Second, bisulfite sequencing
relies on the complete conversion of unmodified cytosine to thymine. Unmodified cytosine
accounts for approximately 95% of the total cytosine in the human genome. Converting all
these positions to thymine severely reduces sequence complexity, leading to poor sequencing
quality, low mapping rates, uneven genome coverage and increased sequencing cost.
Bisulfite sequencing methods are also susceptible to false detection of 5mC and 5hmC due to
incomplete conversion of unmodified cytosine to thymine.
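The loss of sequence complexity described above can be illustrated with a minimal in-silico sketch. It models only the read-out after amplification (which cytosine positions are read as T), not the chemistry. The function names, the toy sequence and the set of modified positions are hypothetical and are introduced purely for illustration.

```python
def bisulfite_readout(seq, modified):
    """Bisulfite-style read-out: unmodified C reads as T, modified C stays C.

    `modified` is a set of 0-based indices carrying 5mC/5hmC (hypothetical input).
    """
    return "".join(
        "T" if base == "C" and i not in modified else base
        for i, base in enumerate(seq)
    )


def direct_readout(seq, modified):
    """Direct read-out: only modified C reads as T, unmodified C stays C."""
    return "".join(
        "T" if base == "C" and i in modified else base
        for i, base in enumerate(seq)
    )


seq = "ACGTCCGATCGA"   # hypothetical toy sequence
modified = {1}         # hypothetical modified cytosine at index 1

bs = bisulfite_readout(seq, modified)
direct = direct_readout(seq, modified)

# Compare how many C bases survive in each read-out.
print(bs, "C count:", bs.count("C"))
print(direct, "C count:", direct.count("C"))
```

In the bisulfite-style read-out almost every C is lost, which is the source of the reduced complexity; in the direct read-out only the modified position changes.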
Bisulfite sequencing has also been used to detect cytosine methylation in RNA.
Unlike other methods for detecting 5mC in RNA such as methylated-RNA-immunoprecipitation,
RNA-bisulfite-sequencing (RNA-BS-seq) has the advantage of being
able to determine the extent of methylation of a specific C position in RNA. RNA-BS-seq,
however, suffers from the same drawbacks described above for bisulfite sequencing of DNA.
In particular, the reaction conditions can cause substantial degradation of RNA.
There is a need for a method for DNA methylation and hydroxymethylation
analysis that uses a mild reaction and can detect the modified cytosine (5mC and 5hmC) at
base resolution quantitatively without affecting the unmodified cytosine. Likewise, there is a
need for a method for RNA methylation and hydroxymethylation analysis that employs mild
reaction conditions and can detect the modified cytosine quantitatively at base resolution
without affecting the unmodified cytosine.
SUMMARY OF THE INVENTION
The present invention provides methods for identifying the location of one or more
of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and/or 5-formylcytosine
in a nucleic acid. The methods described herein provide for DNA or RNA methylation and
hydroxymethylation analysis involving mild reactions that detect the modified cytosine
quantitatively with base resolution without affecting the unmodified cytosine. Provided
herein is a new method for identifying 5mC and 5hmC by combining TET oxidation and
reduction by borane derivatives (e.g., pyridine borane and 2-picoline borane (pic-BH3)),
referred to herein as TAPS (TET Assisted Pyridine borane Sequencing) (Table 1). TAPS
detects modifications directly with high sensitivity and specificity, without affecting
unmodified cytosines, and can be adopted to detect other cytosine modifications. It is
non-destructive, preserving RNA and DNA up to 10 kb long. Compared with bisulfite
sequencing, TAPS results in higher mapping rates, more even coverage and lower sequencing
costs, enabling higher quality, more comprehensive and cheaper methylome analyses.
Variations of this method that do not employ the oxidation step are used to identify 5fC
and/or 5caC as described herein.
In one aspect, the present invention provides a method for identifying
5-methylcytosine (5mC) in a target nucleic acid comprising the steps of:
a. providing a nucleic acid sample comprising the target nucleic acid;
b. modifying the nucleic acid comprising the steps of:
i. adding a blocking group to the 5-hydroxymethylcytosine (5hmC) in the
nucleic acid sample;
ii. converting the 5mC in the nucleic acid sample to 5-carboxylcytosine (5caC)
and/or 5-formylcytosine (5fC); and
iii. converting the 5caC and/or 5fC to dihydrouracil (DHU) to provide a
modified nucleic acid sample comprising a modified target nucleic acid; and
c. detecting the sequence of the modified target nucleic acid; wherein a cytosine (C)
to thymine (T) transition in the sequence of the modified target nucleic acid
compared to the target nucleic acid provides the location of a 5mC in the target
nucleic acid.
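The detection step (c) amounts to comparing the sequence read from the modified nucleic acid against the unmodified target sequence and reporting every position where a C has become a T. The following is a minimal sketch under simplifying assumptions: both sequences are already aligned, gap-free and of equal length, and the function name `call_c_to_t` is hypothetical.

```python
def call_c_to_t(reference, converted):
    """Return 0-based positions where the reference has C and the converted read has T.

    Assumes the two sequences are aligned without gaps and have equal length.
    """
    if len(reference) != len(converted):
        raise ValueError("sequences must have the same length")
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and read_base == "T"
    ]


# Hypothetical toy example: one C-to-T transition at index 1.
positions = call_c_to_t("ACGTCCGATCGA", "ATGTCCGATCGA")
print(positions)
```

Real sequencing data would require alignment, strand handling and error filtering, none of which is modelled here.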
In embodiments of the method for identifying 5mC in a target nucleic acid, the
percentages of the T at each transition location provide a quantitative level of 5mC at each
location in the target nucleic acid. In embodiments, the nucleic acid is DNA. In other
embodiments, the nucleic acid is RNA.
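The quantitative read-out described above, the percentage of T at each reference cytosine, can be sketched as follows. This is an illustrative sketch only: the function name `t_percentages` and the toy reads are hypothetical, and the reads are assumed to be aligned, gap-free and of the same length as the reference.

```python
def t_percentages(reference, reads):
    """Percentage of reads showing T at each reference C position.

    Only C and T calls are counted; other bases at a C position are
    treated as sequencing noise and ignored (an assumption of this sketch).
    """
    levels = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        calls = [read[i] for read in reads if read[i] in ("C", "T")]
        if calls:
            levels[i] = 100.0 * calls.count("T") / len(calls)
    return levels


reference = "ACGC"                         # hypothetical reference
reads = ["ATGC", "ACGC", "ATGC", "ATGC"]   # hypothetical converted reads
print(t_percentages(reference, reads))
```

Here three of four reads show T at index 1, so the sketch reports a 75% modification level at that position and 0% at the unconverted cytosine.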
In another aspect, the present invention provides a method for identifying 5mC or
5hmC in a target nucleic acid comprising the steps of:
a. providing a nucleic acid sample comprising the target nucleic acid;
b. modifying the nucleic acid comprising the steps of:
i. converting the 5mC and 5hmC in the nucleic acid sample to
5-carboxylcytosine (5caC) and/or 5fC; and
ii. converting the 5caC and/or 5fC to DHU to provide a modified nucleic acid
sample comprising a modified target nucleic acid; and
c. detecting the sequence of the modified target nucleic acid; wherein a cytosine (C)
to thymine (T) transition in the sequence of the modified target nucleic acid
compared to the target nucleic acid provides the location of either a 5mC or 5hmC
in the target nucleic acid.
In embodiments of the method for identifying 5mC or 5hmC, the percentages of
the T at each transition location provide a quantitative level of 5mC or 5hmC at each location
in the target nucleic acid. In embodiments, the nucleic acid is DNA. In other embodiments,
the nucleic acid is RNA.
In another aspect, the invention provides a method for identifying 5mC and
identifying 5hmC in a target nucleic acid comprising:
a. identifying 5mC in the target nucleic acid comprising the steps of:
i. providing a first nucleic acid sample comprising the target nucleic acid;
ii. modifying the nucleic acid in the first sample comprising the steps of:
1. adding a blocking group to the 5-hydroxymethylcytosine (5hmC) in
the first nucleic acid sample;
2. converting the 5mC in the first nucleic acid sample to 5caC and/or 5fC; and
3. converting the 5caC and/or 5fC to DHU to provide a modified first
DNA sample comprising a modified target nucleic acid;
iii. optionally amplifying the copy number of the modified target nucleic acid;
iv. detecting the sequence of the modified target nucleic acid; wherein a
cytosine (C) to thymine (T) transition in the sequence of the modified target
nucleic acid compared to the target nucleic acid provides the location of a
5mC in the target nucleic acid;
b. identifying 5mC or 5hmC in the target nucleic acid comprising the steps of:
i. providing a second nucleic acid sample comprising the target nucleic acid;
ii. modifying the nucleic acid in the second sample comprising the steps of:
1. converting the 5mC and 5hmC in the second nucleic acid sample to
5caC and/or 5fC; and
2. converting the 5caC and/or 5fC to DHU to provide a modified second
nucleic acid sample comprising a modified target nucleic acid;
iii. optionally amplifying the copy number of the modified target nucleic acid;
iv. detecting the sequence of the modified target nucleic acid from the second
sample; wherein a cytosine (C) to thymine (T) transition in the sequence of
the modified target nucleic acid compared to the target nucleic acid provides
the location of either a 5mC or 5hmC in the target nucleic acid; and
c. comparing the results of steps (a) and (b), wherein a C to T transition present in
step (b) but not in step (a) provides the location of 5hmC in the target nucleic acid.
In embodiments for identifying 5mC and identifying 5hmC in a target nucleic acid,
in step (a) the percentages of the T at each transition location provide a quantitative level of
5mC in the target nucleic acid; in step (b), the percentages of the T at each transition location
provide a quantitative level of 5mC or 5hmC in the target nucleic acid; and in step (c) the
differences in percentages for a C to T transition identified in step (b), but not in step (a),
provide the quantitative level of a 5hmC at each location in the target nucleic acid. In
embodiments, the nucleic acid is DNA. In other embodiments, the nucleic acid is RNA.
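The comparison in step (c) can be sketched as a per-position subtraction: the level measured in step (b) (5mC plus 5hmC) minus the level measured in step (a) (5mC only). The function name `hmc_levels` and the example percentages are hypothetical. Clamping negative differences to zero is an assumption of this sketch for handling measurement noise; it is not stated in the disclosure.

```python
def hmc_levels(step_a, step_b):
    """Estimate per-position 5hmC level (%) as step (b) minus step (a).

    step_a: {position: % T} from the sample with 5hmC blocked (5mC only).
    step_b: {position: % T} from the sample without blocking (5mC or 5hmC).
    Positions missing from step_a are treated as 0% (assumption).
    """
    levels = {}
    for pos, total in step_b.items():
        mc_only = step_a.get(pos, 0.0)
        # Clamp at zero so noise cannot yield a negative level (assumption).
        levels[pos] = max(total - mc_only, 0.0)
    return levels


step_a = {10: 60.0, 25: 0.0}             # hypothetical 5mC levels
step_b = {10: 80.0, 25: 30.0, 40: 5.0}   # hypothetical 5mC + 5hmC levels
print(hmc_levels(step_a, step_b))
```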
In embodiments of the invention, the blocking group added to 5hmC in the nucleic
acid sample is a sugar. In embodiments, the sugar is a naturally-occurring sugar or a
modified sugar, for example glucose or a modified glucose. In embodiments of the
invention, the blocking group is added to 5hmC by contacting the nucleic acid sample with
UDP linked to a sugar, for example UDP-glucose or UDP linked to a modified glucose in the
presence of a glucosyltransferase enzyme, for example, T4 bacteriophage
β-glucosyltransferase (βGT) and T4 bacteriophage α-glucosyltransferase (αGT) and derivatives
and analogs thereof.
In embodiments of the invention, the step of converting the 5mC in the nucleic
acid sample to 5caC and/or 5fC and the step of converting the 5mC and 5hmC in the nucleic
acid sample to 5caC and/or 5fC each comprises contacting the nucleic acid sample with a ten
eleven translocation (TET) enzyme. In further embodiments, the TET enzyme is one or more
of human TET1, TET2, and TET3; murine Tet1, Tet2, and Tet3; Naegleria TET (NgTET);
Coprinopsis cinerea (CcTET) and derivatives or analogues thereof. In embodiments, the
TET enzyme is NgTET.
In another aspect, the invention provides a method for identifying 5caC or 5fC in a
target nucleic acid comprising the steps of:
a. providing a nucleic acid sample comprising the target nucleic acid;
b. converting the 5caC and 5fC to DHU to provide a modified nucleic acid sample
comprising a modified target nucleic acid;
c. optionally amplifying the copy number of the modified target nucleic acid; and
d. detecting the sequence of the modified target nucleic acid; wherein a cytosine (C)
to thymine (T) transition in the sequence of the modified target nucleic acid
compared to the target nucleic acid provides the location of either a 5caC or 5fC
in the target nucleic acid.
In embodiments of the method for identifying 5caC or 5fC in a target nucleic acid,
the percentages of the T at each transition location provide a quantitative level for 5caC or
5fC at each location in the target nucleic acid.
In another aspect, the invention provides a method for identifying 5caC in a target
nucleic acid comprising the steps of:
a. providing a nucleic acid sample comprising the target nucleic acid;
b. adding a blocking group to the 5fC in the nucleic acid sample;
c. converting the 5caC to DHU to provide a modified nucleic acid sample
comprising a modified target nucleic acid;
d. optionally amplifying the copy number of the modified target nucleic acid; and
e. determining the sequence of the modified target nucleic acid; wherein a cytosine
(C) to thymine (T) transition in the sequence of the modified target nucleic acid
compared to the target nucleic acid provides the location of a 5caC in the target
nucleic acid.
In embodiments of the method for identifying 5caC in a target nucleic acid, the
percentages of the T at each transition location provide a quantitative level for 5caC at each
location in the target nucleic acid. In embodiments, the nucleic acid is DNA. In other
embodiments, the nucleic acid is RNA.
In embodiments of the invention, adding a blocking group to the 5fC in the nucleic
acid sample comprises contacting the nucleic acid with an aldehyde reactive compound
including, for example, hydroxylamine derivatives (such as O-ethylhydroxylamine),
hydrazine derivatives, and hydrazide derivatives.
In another aspect, the invention provides a method for identifying 5fC in a target
nucleic acid comprising the steps of:
a. providing a nucleic acid sample comprising the target nucleic acid;
b. adding a blocking group to the 5caC in the nucleic acid sample;
c. converting the 5fC to DHU to provide a modified nucleic acid sample comprising
a modified target nucleic acid;
d. optionally amplifying the copy number of the modified target nucleic acid; and
e. detecting the sequence of the modified target nucleic acid; wherein a cytosine (C)
to thymine (T) transition in the sequence of the modified target nucleic acid
compared to the target nucleic acid provides the location of a 5fC in the target
nucleic acid.
In embodiments of the method for identifying 5fC in a target nucleic acid, the
percentages of the T at each transition location provide a quantitative level for 5fC at each
location in the target nucleic acid. In embodiments, the nucleic acid (sample and target) is
DNA. In other embodiments, the nucleic acid (sample and target) is RNA.
In embodiments, the step of adding a blocking group to the 5caC in the nucleic
acid sample comprises contacting the nucleic acid sample with (i) a carboxylic acid
derivatization reagent, including, for example,
1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and (ii) an amine (such as ethylamine),
hydrazine, or hydroxylamine compound.
In embodiments of the invention, the methods above further comprise the step of
amplifying the copy number of the modified target nucleic acid. In embodiments, this
amplification step is performed prior to the step of detecting the sequence of the modified
target nucleic acid. The step of amplifying the copy number when the modified target nucleic
acid is DNA may be accomplished by performing the polymerase chain reaction (PCR),
primer extension, and/or cloning. When the modified target nucleic acid is RNA, the step of
amplifying the copy number may be accomplished by RT-PCR using oligo(dT) primer (for
mRNA), random primers, and/or gene specific primers.
In embodiments of the invention, the DNA sample comprises picogram quantities
of DNA. In embodiments of the invention, the DNA sample comprises about 1 pg to about
900 pg DNA, about 1 pg to about 500 pg DNA, about 1 pg to about 100 pg DNA, about 1 pg
to about 50 pg DNA, about 1 pg to about 10 pg DNA, less than about 200 pg DNA, less than
about 100 pg DNA, less than about 50 pg DNA, less than about 20 pg DNA, or less than
about 5 pg DNA. In other embodiments of the invention, the DNA sample comprises nanogram
quantities of DNA. In embodiments of the invention, the DNA sample contains about 1 ng to
about 500 ng of DNA, about 1 ng to about 200 ng of DNA, about 1 ng to about 100 ng of DNA,
about 1 ng to about 50 ng of DNA, about 1 ng to about 10 ng of DNA, about 1 ng to about 5 ng
of DNA, less than about 100 ng of DNA, less than about 50 ng of DNA, less than about 5 ng
of DNA, or less than about 2 ng of DNA. In embodiments of the invention, the DNA sample
comprises circulating cell-free DNA (cfDNA). In embodiments of the invention, the DNA
sample comprises microgram quantities of DNA.
In embodiments of the invention, the step of converting the 5caC and/or 5fC to
DHU comprises contacting the nucleic acid sample with a reducing agent including, for
example, pyridine borane, 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium
cyanoborohydride, and sodium triacetoxyborohydride. In a preferred embodiment, the
reducing agent is pic-BH3 and/or pyridine borane.
In embodiments of the invention, the step of determining the sequence of the
modified target nucleic acid comprises chain termination sequencing, microarray,
high-throughput sequencing, or restriction enzyme analysis.
BRIEF DESCRIPTION OF THE FIGURES
Fig. 1. Borane-containing compounds screening. Borane-containing
compounds were screened for conversion of 5caC to DHU in an 11mer oligonucleotide
(“oligo”), with conversion rate estimated by MALDI. 2-picoline borane (pic-borane), borane
pyridine, and tert-butylamine borane could completely convert 5caC to DHU while
ethylenediamine borane and dimethylamine borane gave around 30% conversion rate. No
detectable products were measured (n.d.) with dicyclohexylamine borane, morpholine borane,
4-methylmorpholine borane, and trimethylamine borane. Other reducing agents such as sodium
borohydride and sodium tri(acetoxy)borohydride decomposed rapidly in acidic media and
led to incomplete conversion. Sodium cyanoborohydride was not used due to the potential for
hydrogen cyanide formation under acidic conditions. Pic-borane and pyridine borane were
chosen because of complete conversion, low toxicity and high stability.
Fig. 2A-B. Pic-borane reaction on DNA oligos. (A) MALDI characterization of
5caC-containing 11mer model DNA treated with pic-borane. Calculated mass (m/z) shown
above each graph, observed mass shown to the left of the peak. (B) The conversion rates of
dC and various cytosine derivatives were quantified by HPLC-MS/MS. Data shown as mean
± SD of three replicates.
Fig. 3A-B. Single nucleoside pic-borane reaction. 1H and 13C NMR results
were in accordance with a previous report on 2'-deoxy-5,6-dihydrouridine (I. Aparici-Espert et
al., J. Org. Chem. 81, 4031-4038 (2016)). (A) 1H NMR (MeOH-d4, 400 MHz) chart of the
single nucleoside pic-borane reaction product. δ ppm: 6.28 (t, 1H, J = 7 Hz), 4.30 (m, 1H),
3.81 (m, 1H), 3.63 (m, 2H), 3.46 (m, 2H), 2.65 (t, 2H, J = 6 Hz), 2.20 (m, 1H), 2.03 (m, 1H).
(B) 13C NMR (MeOH-d4, 400 MHz) chart of the single nucleoside pic-borane reaction
product. δ ppm: 171.56 (CO), 153.54 (CO), 85.97 (CH), 83.86 (CH), 70.99 (CH), 61.92
(CH2), 36.04 (CH2), 35.46 (CH2), 30.49 (CH2).
Fig. 4A-B. A diagram showing (A) borane conversion of 5caC to DHU and a
proposed mechanism for borane reaction of 5caC to DHU; and (B) borane conversion of 5fC
to DHU and a proposed mechanism for borane reaction of 5fC to DHU.
Fig. 5A-B. (A) Diagram showing that the TAPS method converts both 5mC and
5hmC to DHU, which upon replication acts as thymine. (B) Overview of the TAPS, TAPSβ,
and CAPS methods.
Fig. 6. MALDI characterization of 5fC and 5caC containing model DNA
oligos treated by pic-borane with or without blocking 5fC and 5caC. 5fC and 5caC are
converted to dihydrouracil (DHU) with pic-BH3. 5fC was blocked by hydroxylamine
derivatives such as O-ethylhydroxylamine (EtONH2), which would become an oxime and resist
pic-borane conversion. 5caC was blocked by ethylamine via EDC conjugation and converted
to an amide, which blocks conversion by pic-borane. Calculated MS (m/z) shown above each
graph, observed MS shown to the left of the peak.
Fig. 7. MALDI characterization of 5mC and 5hmC containing model DNA
oligos treated by KRuO4 and pic-borane with or without blocking of 5hmC. 5hmC could
be blocked by βGT with glucose and converted to 5gmC. 5mC, 5hmC and 5gmC could not
be converted by pic-borane. 5hmC could be oxidized by KRuO4 to 5fC, and then converted
to DHU by pic-borane. Calculated MS (m/z) shown above each graph, observed MS shown
to the left of the peak.
Fig. 8A-B. Restriction enzyme digestion showed TAPS could effectively
convert 5mC to T. (A) Illustration of restriction enzyme digestion assay to confirm
sequence change caused by TAPS. (B) TaqαI-digestion tests to confirm the C-to-T transition
caused by TAPS. TAPS was performed on a 222 bp model DNA having a TaqαI restriction
site and containing 5 fully methylated CpG sites (5mC) and its unmethylated control (C).
PCR-amplified 222 bp model DNA can be cleaved with TaqαI to ~160 bp and ~60 bp
fragments as shown in the 5mC, C and C TAPS. After TAPS on the methylated DNA, the
T(mC)GA sequence is converted to TTGA and is no longer cleaved by TaqαI digestion as
shown in the 5mC-TAPS lanes.
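The logic of the digestion assay can be sketched in a few lines: a recognition site TCGA is present before conversion and is destroyed when the methylated C within it is read as T (TTGA). The sequences below are hypothetical toy fragments, not the 222 bp model DNA, and the function name `find_sites` is introduced only for this illustration.

```python
SITE = "TCGA"  # recognition sequence implied by the T(mC)GA -> TTGA conversion


def find_sites(seq, site=SITE):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    return [
        i
        for i in range(len(seq) - len(site) + 1)
        if seq[i:i + len(site)] == site
    ]


before = "GGATCGATCC"   # hypothetical amplicon from unconverted DNA
after = "GGATTGATCC"    # same amplicon with the methylated C read as T

print(find_sites(before))  # site present: the amplicon would be cleaved
print(find_sites(after))   # site lost: the amplicon would stay intact
```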
Fig. 9A-B. TAPS on a 222 bp model DNA and mESC gDNA. (A) Sanger
sequencing results for the 222 bp model DNA containing 5 fully methylated CpG sites and its
unmethylated control before (5mC, C) and after TAPS (5mC TAPS, C TAPS). Only 5mC is
converted to T by the TAPS method. (B) HPLC-MS/MS quantification of relative
modification levels in the mESC gDNA control, after NgTET1 oxidation and after
pic-borane reduction. Data shown as mean ± SD of three replicates.
Fig. 10A-D. TAPS caused no significant DNA degradation compared to
bisulfite. Agarose gel images of 222 bp unmethylated DNA, 222 bp methylated DNA, and
mESC gDNA (A) before and (B) after chilling in an ice bath. No detectable DNA
degradation was observed after TAPS and DNA remained double-stranded and could be
visualized without chilling. Bisulfite conversion created degradation and DNA became
single-stranded and could be visualized only after chilling on ice. (C) Agarose gel images of
mESC gDNA of various fragment lengths treated with TAPS and bisulfite before (left panel)
and after (right panel) cooling down on ice. DNA after TAPS remained double-stranded and
could be directly visualized on the gel. Bisulfite treatment caused more damage and
fragmentation to the samples and DNA became single-stranded and could be visualized only
after chilling on ice. TAPS conversion was complete for all gDNA regardless of fragment
length as shown in Fig. 15. (D) Agarose gel imaging of a 222 bp model DNA before and
after TAPS (three independent repeats) showed no detectable degradation after the reaction.
Fig. 11. Comparison of amplification curves and melting curves between
model DNAs before and after TAPS. qPCR assay showed minor differences in the
amplification curves of model DNAs before and after TAPS. The melting curve of methylated DNA
(5mC) shifted to a lower temperature after TAPS, indicating a possible Tm-decreasing C-to-T
transition, while there was no shift for unmethylated DNA (C).
Fig. 12. Complete C-to-T transition induced after TAPS, TAPSβ and CAPS
as demonstrated by Sanger sequencing. Model DNA containing single methylated and
single hydroxymethylated CpG sites was prepared as described herein. TAPS conversion
was done following NgTET1 oxidation and pyridine borane reduction protocol as described
herein. TAPSβ conversion was done following 5hmC blocking, NgTET1 oxidation and
pyridine borane reduction protocol. CAPS conversion was done following 5hmC oxidation
and pyridine borane reduction protocol. After conversion, 1 ng of converted DNA sample
was PCR amplified by Taq DNA Polymerase and processed for Sanger sequencing. TAPS
converted both 5mC and 5hmC to T. TAPSβ selectively converted 5mC whereas CAPS
selectively converted 5hmC. None of the three methods caused conversion on unmodified
cytosine and other bases.
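The selectivity reported in this figure can be summarised as a small lookup table. The sketch below encodes only what the caption states for unmodified C, 5mC and 5hmC; the dictionary name `CONVERTED_TO_T` and the helper `reads_as_t` are hypothetical, and other modifications are deliberately left out.

```python
# Which cytosine forms each method reads as T, per the figure caption.
CONVERTED_TO_T = {
    "TAPS": {"5mC", "5hmC"},   # converts both 5mC and 5hmC
    "TAPSβ": {"5mC"},          # 5hmC is blocked, so only 5mC converts
    "CAPS": {"5hmC"},          # only 5hmC is oxidized and converted
}


def reads_as_t(method, base):
    """Return True if `base` is read as T after the given method."""
    return base in CONVERTED_TO_T[method]


for method in CONVERTED_TO_T:
    row = {base: reads_as_t(method, base) for base in ("C", "5mC", "5hmC")}
    print(method, row)
```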
Fig. 13A-B. (A) TAPS is compatible with various DNA and RNA polymerases and
induces complete C-to-T transition as shown by Sanger sequencing. Model DNA containing
methylated CpG sites for the polymerase test and primer sequences are described herein.
After TAPS treatment, 5mC was converted to DHU. KAPA HiFi Uracil plus polymerase, Taq
polymerase, and Vent exo-polymerase would read DHU as T and therefore induce complete
C-to-T transition after PCR. Alternatively, primer extension was done with biotin-labelled
primer and isothermal polymerases including Klenow fragment, Bst DNA polymerase, and
phi29 DNA polymerase. The newly synthesized DNA strand was separated by Dynabeads
MyOne Streptavidin C1 and then amplified by PCR with Taq polymerase and processed for
Sanger sequencing. T7 RNA polymerase could efficiently bypass DHU and insert adenine
opposite the DHU site, as shown by RT-PCR and Sanger sequencing. (B) Certain other
commercialized polymerases did not amplify DHU containing DNA efficiently. After TAPS
treatment, 5mC was converted to DHU. KAPA HiFi Uracil plus polymerase and Taq
polymerase would read DHU as T and therefore induce complete C-to-T transition. Low or
no C-to-T transition was observed with certain other commercialized polymerases including
KAPA HiFi polymerase, Pfu polymerase, Phusion polymerase and NEB Q5 polymerase (not
shown).
Fig. 14. DHU does not show PCR bias compared to T and C. Model DNA
containing one DHU/U/T/C modification was synthesized with the corresponding DNA
oligos as described herein. Standard curves for each model DNA with DHU/U/T/C
modification were plotted based on qPCR reactions with 1:10 serial dilutions of the model
DNA input (from 0.1 pg to 1 ng; every qPCR experiment was run in triplicate). The slope of
the regression between the log concentration (ng) values and the average Ct values was
calculated by the SLOPE function in Excel. PCR efficiency was calculated using the following
equation: Efficiency% = (10^(-1/Slope) - 1) * 100%. Amplification factor was calculated using
the following equation: Amplification factor = 10^(-1/Slope). PCR efficiency for model
DNAs with DHU or T or C modification was almost the same, which demonstrated that
DHU could be read through as a regular base and would not cause PCR bias.
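The efficiency calculation in the Fig. 14 caption can be sketched in Python as follows. This is a minimal illustration only: the function name and the Ct values are hypothetical, and an ordinary least-squares slope is assumed to stand in for the Excel SLOPE function named in the caption.

```python
import math

def pcr_efficiency(log10_input_ng, mean_ct):
    """Return (slope, efficiency_percent, amplification_factor).

    slope is the least-squares slope of mean Ct against log10(input in ng),
    which is assumed here to match what Excel's SLOPE function returns.
    """
    n = len(log10_input_ng)
    mean_x = sum(log10_input_ng) / n
    mean_y = sum(mean_ct) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_input_ng, mean_ct))
    sxx = sum((x - mean_x) ** 2 for x in log10_input_ng)
    slope = sxy / sxx
    # Amplification factor = 10^(-1/Slope); Efficiency% = (factor - 1) * 100
    amplification_factor = 10 ** (-1.0 / slope)
    efficiency_percent = (amplification_factor - 1.0) * 100.0
    return slope, efficiency_percent, amplification_factor

# Hypothetical standard curve: 1:10 serial dilutions from 1 ng down to 0.1 pg.
inputs_ng = [1.0, 0.1, 0.01, 0.001, 0.0001]
log_inputs = [math.log10(v) for v in inputs_ng]
# Made-up average Ct values spaced by log2(10) cycles per dilution step,
# which corresponds to perfect doubling in every cycle.
cts = [15.0 + math.log2(10) * i for i in range(5)]

slope, efficiency, factor = pcr_efficiency(log_inputs, cts)
```

With these made-up values the amplification factor is 2 and the efficiency is 100%, which is the ideal case against which the DHU, U, T and C templates would be compared.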
Fig. 15A-B. TAPS completely converted 5mC to T regardless of DNA
fragment length. (A) Agarose gel images of the TaqαI-digestion assay confirmed complete
5mC-to-T conversion in all samples regardless of DNA fragment length. A 194 bp model
sequence from the lambda genome was PCR amplified after TAPS and digested with TaqαI
enzyme. PCR product amplified from the unconverted sample could be cleaved, whereas
products amplified from TAPS-treated samples stayed intact, suggesting loss of the restriction site
and hence complete 5mC-to-T transition. (B) The C-to-T conversion percentage was
estimated by gel band quantification and was 100% for all DNA fragment lengths tested.
Fig. 16. The conversion and false positive rates for different TAPS conditions. The
combination of mTet1 and pyridine borane achieved the highest conversion rate of
methylated C (calculated with fully CpG-methylated Lambda DNA) and the lowest
conversion rate of unmodified C (0.23%, calculated with 2 kb unmodified spike-in),
compared to other conditions with NgTET1 or pic-borane. The conversion rate +/- SE of
all tested cytosine sites is shown above the bars.
Fig. 17A-B. Conversion rate on short spike-ins. (A) 120mer-1 and (B) 120mer-
2 containing 5mC and 5hmC. Near-complete conversion was achieved on 5mC and 5hmC
sites from both strands. Actual sequences with modification status are shown on top and bottom.
Fig. 18A-E. Improved sequencing quality of TAPS over Whole Genome
Bisulfite Sequencing (WGBS). (A) Conversion rate of 5mC and 5hmC in TAPS-treated
DNA. Left: Synthetic spike-ins (CpN) methylated or hydroxymethylated at known positions.
(B) False positive rate of TAPS from unmodified 2 kb spike-in. (C) Total run time of TAPS
and WGBS when processing 1 million simulated reads on one core of one Intel Xeon CPU.
(D) Fraction of all sequenced read pairs (after trimming) mapped to the genome. (E)
Sequencing quality scores per base for the first and second reads in all sequenced read pairs,
as reported by Illumina BaseSpace. Top: TAPS. Bottom: WGBS.
Fig. 19A-B. TAPS resulted in more even coverage and fewer uncovered
positions than WGBS. Comparison of depth of coverage across (A) all bases and (B) CpG
sites between WGBS and TAPS, computed on both strands. For "TAPS (down-sampled)",
random reads out of all mapped TAPS reads were selected so that the median coverage
matched the median coverage of WGBS. Positions with coverage above 50X are shown in
the last bin.
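The down-sampling described in the Fig. 19 caption could be approximated as in the Python sketch below. It is an illustration under stated assumptions: reads are kept independently with a probability equal to the ratio of the two median coverages, which is one simple way to select random reads so that the medians roughly match. The function names, the seed, and the example depths are hypothetical.

```python
import random
import statistics

def downsample_fraction(taps_depths, wgbs_depths):
    """Fraction of TAPS reads to keep so median coverage roughly matches WGBS."""
    fraction = statistics.median(wgbs_depths) / statistics.median(taps_depths)
    return min(1.0, fraction)  # never up-sample

def downsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return [read for read in reads if rng.random() < fraction]

# Hypothetical per-position depths for the two libraries.
taps_depths = [20, 22, 18, 20, 21]
wgbs_depths = [10, 9, 11, 10, 10]
fraction = downsample_fraction(taps_depths, wgbs_depths)
kept = downsample_reads(list(range(1000)), fraction)
```

Because coverage scales approximately linearly with the number of reads, keeping this fraction of reads brings the expected median depth close to the target; an exact match would require recomputing coverage after selection.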
Fig. 20. Distribution of modification levels across all chromosomes. Average
modification levels in 100 kb windows along mouse chromosomes, weighted by the coverage
of CpG, and smoothed using a Gaussian weighted moving average filter with window size
Fig. 21A-E. Comparison of genome-wide methylome measurements by TAPS
and WGBS. (A) Average sequencing coverage depth in all mouse CpG islands (binned into
windows) and 4 kbp flanking regions (binned into 50 equally sized windows). To account
for differences in sequencing depth, all mapped TAPS reads were down-sampled to match the
median number of mapped WGBS reads across the genome. (B) CpG sites covered by at
least three reads by TAPS alone, both TAPS and WGBS, or WGBS alone. (C) Number of
CpG sites covered by at least three reads and modification level > 0.1 detected by TAPS
alone, TAPS and WGBS, or WGBS alone. (D) Example of the chromosomal distribution of
modification levels (in %) for TAPS and WGBS. Average fraction of modified CpGs per
100 kb window along mouse chromosome 4, smoothed using a Gaussian-weighted moving
average filter with window size 10. (E) Heatmap representing the number of CpG sites
covered by at least three reads in both TAPS and WGBS, broken down by modification levels
as measured by each method. To improve contrast, the first bin, containing CpGs unmodified
in both methods, was excluded from the color scale and is denoted by a star.
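The Gaussian-weighted moving average mentioned for panel (D) can be illustrated with the Python sketch below. Only the window size of 10 comes from the caption. The standard deviation of the Gaussian, the treatment of the window as 5 bins on either side of the centre, and the renormalisation at the chromosome ends are assumptions made for this example.

```python
import math

def gaussian_smooth(values, window=10, sigma=None):
    """Gaussian-weighted moving average over per-window modification levels."""
    if sigma is None:
        sigma = window / 4.0  # assumed; the caption does not state sigma
    half = window // 2        # bins taken on each side of the centre bin
    smoothed = []
    for i in range(len(values)):
        weighted_sum = 0.0
        weight_total = 0.0
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        for j in range(lo, hi):
            weight = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            weighted_sum += weight * values[j]
            weight_total += weight
        # Dividing by the summed weights keeps edge bins unbiased.
        smoothed.append(weighted_sum / weight_total)
    return smoothed

# Hypothetical per-100 kb modification levels with one outlying window.
levels = [0.7] * 10 + [0.1] + [0.7] * 10
smoothed_levels = gaussian_smooth(levels)
```

A flat profile is returned unchanged, while an isolated outlying window is pulled toward its neighbours, which is the behaviour wanted for a chromosome-wide display.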
Fig. 22. Modification levels around CpG Islands. Average modification levels
in CpG islands (binned into 20 windows) and 4 kbp flanking regions (binned into 50 equally
sized windows). Bins with coverage below 3 reads were ignored.
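The binning scheme in the Fig. 22 caption (20 windows per CpG island, 50 equally sized windows per 4 kbp flank) can be sketched in Python as below. The half-open coordinate convention, the function name, and the numbering of bins from the upstream flank through the island to the downstream flank are assumptions for illustration.

```python
def bin_position(pos, island_start, island_end,
                 flank=4000, island_bins=20, flank_bins=50):
    """Map a genomic position to a bin index, or None if outside the region.

    Coordinates are assumed half-open: the island is [island_start, island_end).
    Bins 0-49 cover the upstream flank, 50-69 the island, 70-119 the
    downstream flank (with the default settings).
    """
    if island_start - flank <= pos < island_start:
        return (pos - (island_start - flank)) * flank_bins // flank
    if island_start <= pos < island_end:
        island_len = island_end - island_start
        return flank_bins + (pos - island_start) * island_bins // island_len
    if island_end <= pos < island_end + flank:
        return flank_bins + island_bins + (pos - island_end) * flank_bins // flank
    return None

# Hypothetical island spanning positions 10000-11000.
example_bins = [bin_position(p, 10000, 11000)
                for p in (6000, 9999, 10000, 10999, 11000, 14999, 15000)]
```

Scaling each island to a fixed number of windows lets islands of different lengths be averaged together, while the flanks use fixed-width windows because they all span 4 kbp.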
Fig. 23A-B. TAPS exhibits smaller coverage-modification bias than WGBS. All
CpG sites were binned according to their coverage and the mean (circles) and the median
(triangles) modification value is shown in each bin for WGBS (A) and TAPS (B). The CpG
sites covered by more than 100 reads are shown in the last bin. The lines represent a linear fit
through the data points.
Fig. 24A-C. Low-input gDNA and cell-free DNA TAPS prepared with dsDNA
and ssDNA library preparation kits. (A) Sequencing libraries were successfully
constructed with down to 1 ng of murine embryonic stem cell (mESC) genomic DNA
(gDNA) with dsDNA library kits NEBNext Ultra II or KAPA Hyperplus kits. ssDNA library
kit Accel-NGS Methyl-Seq kit was used to further lower the input DNA amount down to (B)
0.01 ng of mESC gDNA or (C) 1 ng of cell-free DNA.
Fig. 25A-B. Low-input gDNA and cell-free DNA TAPS libraries prepared
with dsDNA KAPA Hyperplus library preparation kit. Sequencing libraries were
successfully constructed with as little as 1 ng of (A) mESC gDNA and (B) cell-free DNA
with KAPA Hyperplus kit. Cell-free DNA has a sharp length distribution around 160 bp
(nucleosome size) due to plasma nuclease digestion. After library construction, it becomes
~300 bp, which is the sharp band in (B).
Fig. 26A-D. High-quality cell-free DNA TAPS. (A) Conversion rate of 5mC in
TAPS-treated cfDNA. (B) False positive rate in TAPS-treated cfDNA. (C) Fraction of all
sequenced read pairs that were uniquely mapped to the genome. (D) Fraction of all
sequenced read pairs that were uniquely mapped to the genome after removal of PCR
duplicate reads. CHG and CHH are non-CpG contexts.
Fig. 27. TAPS can detect genetic variants. Methylation (MOD, top row) and C-
to-T SNPs (bottom row) showed distinct base distribution patterns in original top strand
(OT)/original bottom strand (OB), left column, and in strands complementary to OT (CTOT)
and OB (CTOB), right column.
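One way to read the distinction drawn in Fig. 27 is sketched below in Python. It assumes (this is not stated in the caption) that T calls at a reference C have been tallied separately for reads derived from the strand that carries the cytosine and for reads derived from the opposite strand, both reported in forward-reference coordinates. Under that assumption a cytosine modification produces T only on the strand carrying the modified base, whereas a C-to-T SNP produces T on both. The function name and the 0.2 threshold are hypothetical.

```python
def classify_c_to_t(same_strand_t, same_strand_total,
                    opposite_strand_t, opposite_strand_total,
                    threshold=0.2):
    """Classify a reference C position from per-strand T counts.

    same_strand_*     : reads derived from the strand carrying the cytosine
    opposite_strand_* : reads derived from the complementary strand
    threshold         : hypothetical minimum T fraction to call a signal
    """
    if same_strand_total == 0 or opposite_strand_total == 0:
        return "undetermined"  # evidence from both strands is required
    same_fraction = same_strand_t / same_strand_total
    opposite_fraction = opposite_strand_t / opposite_strand_total
    if same_fraction >= threshold and opposite_fraction >= threshold:
        return "C-to-T SNP"      # T observed on both strands
    if same_fraction >= threshold:
        return "modification"    # T observed only on the converted strand
    return "unmodified"

calls = [
    classify_c_to_t(18, 20, 0, 20),   # converted on one strand only
    classify_c_to_t(19, 20, 20, 20),  # T on both strands
    classify_c_to_t(0, 20, 0, 20),    # no T signal
]
```

A production variant caller would model sequencing error and heterozygosity explicitly; the fixed threshold here only illustrates the strand logic.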
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides a bisulfite-free, base-resolution method for
detecting 5mC and 5hmC in a sequence, herein named TAPS. TAPS consists of mild
enzymatic and chemical reactions to detect 5mC and 5hmC directly and quantitatively at
base-resolution without affecting unmodified cytosine. The present invention also provides
methods to detect 5fC and 5caC at base resolution without affecting unmodified cytosine.
Thus, the methods provided herein provide mapping of 5mC, 5hmC, 5fC and 5caC and
overcome the disadvantages of previous methods such as bisulfite sequencing.
Table 1. Comparison of BS and related methods versus TAPS, TAPSβ and CAPS for 5mC
and 5hmC sequencing.
5hmC C C T T C T
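The read-outs of the three bisulfite-free methods can be summarised as a small lookup, sketched here in Python. The mapping follows the behaviour described for Fig. 12 (TAPS converts both 5mC and 5hmC to T, TAPSβ converts only 5mC, CAPS converts only 5hmC, and unmodified C is untouched). The dictionary layout, the ASCII key "TAPSbeta", and the function name are illustrative assumptions, and 5fC and 5caC are deliberately left out.

```python
# Base read after conversion and PCR, per method and original cytosine state.
READOUT = {
    "TAPS":     {"C": "C", "5mC": "T", "5hmC": "T"},
    "TAPSbeta": {"C": "C", "5mC": "T", "5hmC": "C"},
    "CAPS":     {"C": "C", "5mC": "C", "5hmC": "T"},
}

def infer_state(taps_base, taps_beta_base, caps_base):
    """Return the cytosine state consistent with all three read-outs, or None."""
    observed = {"TAPS": taps_base, "TAPSbeta": taps_beta_base, "CAPS": caps_base}
    for state in ("C", "5mC", "5hmC"):
        if all(READOUT[method][state] == base
               for method, base in observed.items()):
            return state
    return None  # combination not explained by a single state

states = [
    infer_state("C", "C", "C"),
    infer_state("T", "T", "C"),
    infer_state("T", "C", "T"),
]
```

In practice each site carries a mixture of states across molecules, so the comparison is made on conversion percentages rather than on single base calls; the lookup only captures the qualitative logic.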
Methods for Identifying 5mC
In one aspect, the present invention provides a method for identifying 5-
methylcytosine (5mC) in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. modifying the DNA comprising the steps of:
i. adding a blocking group to the 5-hydroxymethylcytosine (5hmC) in
the DNA sample;
ii. converting the 5mC in the DNA sample to 5-carboxylcytosine (5caC)
and/or 5-formylcytosine (5fC); and
iii. converting the 5caC and/or 5fC to DHU to provide a modified DNA
sample comprising a modified target DNA;
c. detecting the sequence of the modified target DNA; wherein a cytosine (C) to
thymine (T) transition in the sequence of the modified target DNA compared
to the target DNA provides the location of a 5mC in the target DNA.
In embodiments of the method for identifying 5mC in the target DNA, the method
provides a quantitative measure for the frequency of the 5mC modification at each location
where the modification was identified in the target DNA. In embodiments, the percentages of
the T at each transition location provide a quantitative level of 5mC at each location in the
target DNA.
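The quantitative read-out described above, in which the percentage of T at a transition location gives the modification level, can be sketched in Python as follows. The function name, the position labels, and the read counts are hypothetical, and only reads calling C or T at the site are assumed to be counted.

```python
def modification_level(t_count, c_count):
    """Fraction of reads showing T at a reference C, or None without coverage."""
    total = t_count + c_count
    if total == 0:
        return None
    return t_count / total

# Hypothetical (T reads, C reads) tallies at three cytosine positions.
site_counts = {
    "chr1:1000": (27, 3),
    "chr1:1042": (0, 25),
    "chr1:1100": (0, 0),
}
site_levels = {site: modification_level(t, c)
               for site, (t, c) in site_counts.items()}
```

Here the first site would be reported as 90% modified, the second as unmodified, and the third as having no usable coverage.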
In order to identify 5mC in a target DNA without including 5hmC, the 5hmC in the
sample is blocked so that it is not subject to conversion to 5caC and/or 5fC. In the methods
of the present invention, 5hmC in the sample DNA is rendered non-reactive to the
subsequent steps by adding a blocking group to the 5hmC. In one embodiment, the blocking
group is a sugar, including a modified sugar, for example glucose or 6-azide-glucose (6-
azido-6-deoxy-D-glucose). The sugar blocking group is added to the hydroxymethyl group
of 5hmC by contacting the DNA sample with uridine diphosphate (UDP)-sugar in the
presence of one or more glucosyltransferase enzymes.
In embodiments, the glucosyltransferase is T4 bacteriophage β-glucosyltransferase
(βGT), T4 bacteriophage α-glucosyltransferase (αGT), or a derivative or analog thereof.
βGT is an enzyme that catalyzes a chemical reaction in which a beta-D-glucosyl (glucose)
residue is transferred from UDP-glucose to a 5-hydroxymethylcytosine residue in a nucleic
acid.
By stating that the blocking group is, for example, glucose, this refers to a glucose
moiety (e.g., a beta-D-glucosyl residue) being added to 5hmC to yield glucosyl 5-
hydroxymethylcytosine. The sugar blocking group can be any sugar or modified sugar that
is a substrate of the glucosyltransferase enzyme and blocks the subsequent conversion of the
5hmC to 5caC and/or 5fC. The step of converting the 5mC in the DNA sample to 5caC
and/or 5fC is then accomplished by the methods provided herein, such as by oxidation using
a TET enzyme. Converting the 5caC and/or 5fC to DHU is accomplished by the
methods provided herein, such as by borane reduction.
The method for identifying 5-methylcytosine (5mC) can be performed on an RNA
sample to identify the location of, and provide a quantitative measure of, 5mC in a target
RNA.
Methods for Identifying 5mC or 5hmC (together)
In another aspect, the present invention provides a method for identifying 5mC or
5hmC in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. modifying the DNA comprising the steps of:
i. converting the 5mC and 5hmC in the DNA sample to 5-
carboxylcytosine (5caC) and/or 5fC; and
ii. converting the 5caC and/or 5fC to DHU to provide a modified DNA
sample comprising a modified target DNA;
c. detecting the sequence of the modified target DNA; wherein a cytosine (C) to
thymine (T) transition in the sequence of the modified target DNA compared
to the target DNA provides the location of either a 5mC or 5hmC in the target
DNA.
In embodiments of the method for identifying 5mC or 5hmC in the target DNA,
the method provides a quantitative measure for the frequency of the 5mC or 5hmC
modifications at each location where the modifications were identified in the target DNA. In
embodiments, the percentages of the T at each transition location provide a quantitative level
of 5mC or 5hmC at each location in the target DNA.
This method for identifying 5mC or 5hmC provides the location of 5mC and
5hmC, but does not distinguish between the two cytosine modifications. Rather, both 5mC
and 5hmC are converted to DHU. The presence of DHU can be detected directly, or the
modified DNA can be replicated by known methods where the DHU is converted to T.
The method for identifying 5mC or 5hmC can be performed on an RNA sample to
identify the location of, and provide a quantitative measure of, 5mC or 5hmC in a target
RNA.
Methods for Identifying 5mC and Identifying 5hmC
The present invention provides a method for identifying 5mC and identifying
5hmC in a target DNA by (i) performing the method for identifying 5mC on a first DNA
sample described herein, and (ii) performing the method for identifying 5mC or 5hmC on a
second DNA sample described herein. The location of 5mC is provided by (i). By
comparing the results of (i) and (ii), a C to T transition detected in (ii) but not in (i)
provides the location of 5hmC in the target DNA. In embodiments, the first and second DNA
samples are derived from the same DNA sample. For example, the first and second samples
may be separate aliquots taken from a sample comprising DNA to be analyzed.
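The comparison of results (i) and (ii) can be sketched in Python as a per-site subtraction. This is an illustration under assumptions: each input maps a position to the fraction of T reads at that position, only positions measured in both samples are compared, and negative differences arising from sampling noise are clamped to zero. The function name and the example values are hypothetical.

```python
def estimate_5hmc(levels_5mc_only, levels_5mc_or_5hmc):
    """Per-site 5hmC level as result (ii) minus result (i).

    levels_5mc_only    : {position: T fraction} from method (i), 5hmC blocked
    levels_5mc_or_5hmc : {position: T fraction} from method (ii), no blocking
    """
    estimates = {}
    for position, combined in levels_5mc_or_5hmc.items():
        if position not in levels_5mc_only:
            continue  # site not measured in both aliquots
        difference = combined - levels_5mc_only[position]
        estimates[position] = max(0.0, difference)  # clamp noise below zero
    return estimates

# Hypothetical T fractions from two aliquots of the same DNA sample.
result_i = {"chr1:1000": 0.75, "chr1:1042": 0.5, "chr1:1100": 0.25}
result_ii = {"chr1:1000": 0.875, "chr1:1042": 0.5, "chr1:1100": 0.125}
hmc = estimate_5hmc(result_i, result_ii)
```

Because the two measurements come from separate aliquots, the difference at low-coverage sites is noisy; a statistical test on the underlying read counts would be more robust than the plain subtraction shown here.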
Because the 5mC and 5hmC (that is not blocked) are converted to 5fC and 5caC
before conversion to DHU, any existing 5fC and 5caC in the DNA sample will be detected as
5mC and/or 5hmC. However, given the extremely low levels of 5fC and 5caC in genomic
DNA under normal conditions, this will often be acceptable when analyzing methylation and
hydroxymethylation in a DNA sample. The 5fC and 5caC signals can be eliminated by
protecting the 5fC and 5caC from conversion to DHU by, for example, hydroxylamine
conjugation and EDC coupling, respectively.
The above method identifies the locations and percentages of 5hmC in the target
DNA through the comparison of 5mC locations and percentages with the locations and
percentages of 5mC or 5hmC (together). Alternatively, the location and frequency of 5hmC
modifications in a target DNA can be measured directly. Thus, in one aspect the invention
provides a method for identifying 5hmC in a target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. modifying the DNA comprising the steps of:
i. converting the 5hmC in the DNA sample to 5fC; and
ii. converting the 5fC to DHU to provide a modified DNA sample
comprising a modified target DNA;
c. detecting the sequence of the modified target DNA; wherein a cytosine (C) to
thymine (T) transition in the sequence of the modified target DNA compared
to the target DNA provides the location of a 5hmC in the target DNA.
In embodiments, the step of converting the 5hmC to 5fC comprises oxidizing the
5hmC to 5fC by contacting the DNA with, for example, potassium perruthenate (KRuO4) (as
described in Science. 2012, 33, 7 and WO2013017853, incorporated herein by
reference); or Cu(II)/TEMPO (copper(II) perchlorate and 2,2,6,6-tetramethylpiperidine-1-
oxyl (TEMPO)) (as described in Chem. Commun., 2017, 53, 5756-5759 and 039002,
incorporated herein by reference). The 5fC in the DNA sample is then converted to DHU by
the methods disclosed herein, e.g., by the borane reaction.
The method for identifying 5mC and identifying 5hmC can be performed on an
RNA sample to identify the location of, and provide a quantitative measure of, 5mC and
5hmC in a target RNA.
Methods for Identifying 5caC or 5fC
In one aspect, the invention provides a method for identifying 5caC or 5fC in a
target DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. converting the 5caC and/or 5fC to DHU to provide a modified DNA sample
comprising a modified target DNA;
c. optionally amplifying the copy number of the modified target DNA;
d. detecting the sequence of the modified target DNA; wherein a cytosine (C) to
thymine (T) transition in the sequence of the modified target DNA compared to
the target DNA provides the location of either a 5caC or 5fC in the target DNA.
This method for identifying 5fC or 5caC provides the location of 5fC or 5caC, but
does not distinguish between these two cytosine modifications. Rather, both 5fC and 5caC
are converted to DHU, which is detected by the methods described herein.
Methods for Identifying 5caC
In another aspect, the invention provides a method for identifying 5caC in a target
DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. adding a blocking group to the 5fC in the DNA sample;
c. converting the 5caC to DHU to provide a modified DNA sample comprising a
modified target DNA;
d. optionally amplifying the copy number of the modified target DNA; and
e. determining the sequence of the modified target DNA; wherein a cytosine (C) to
thymine (T) transition in the sequence of the modified target DNA compared to
the target DNA provides the location of a 5caC in the target DNA.
In embodiments of the method for identifying 5caC in the target DNA, the method
provides a quantitative measure for the frequency of the 5caC modification at each location
where the modification was identified in the target DNA. In embodiments, the percentages of
the T at each transition location provide a quantitative level of 5caC at each location in the
target DNA.
In this method, 5fC is blocked (and 5mC and 5hmC are not converted to DHU),
allowing identification of 5caC in the target DNA. In embodiments of the invention, adding
a blocking group to the 5fC in the DNA sample comprises contacting the DNA with an
aldehyde reactive compound including, for example, hydroxylamine derivatives, hydrazine
derivatives, and hydrazide derivatives. Hydroxylamine derivatives include ashydroxylamine;
hydroxylamine hydrochloride; hydroxylammonium acid sulfate; hydroxylamine phosphate;
O-methylhydroxylamine; O-hexylhydroxylamine; O-pentylhydroxylamine; O-
benzylhydroxylamine; and particularly, O-ethylhydroxylamine (EtONH2), O-alkylated or O-
arylated hydroxylamine, acid or salts thereof. Hydrazine derivatives include N-
alkylhydrazine, N-arylhydrazine, N-benzylhydrazine, N,N-dialkylhydrazine, N,N-
diarylhydrazine, N,N-dibenzylhydrazine, N,N-alkylbenzylhydrazine, N,N-
arylbenzylhydrazine, and N,N-alkylarylhydrazine. Hydrazide derivatives include p-
toluenesulfonylhydrazide, N-acylhydrazide, N,N-alkylacylhydrazide, N,N-
benzylacylhydrazide, N,N-arylacylhydrazide, N-sulfonylhydrazide, N,N-
alkylsulfonylhydrazide, N,N-benzylsulfonylhydrazide, and N,N-arylsulfonylhydrazide.
The method for identifying 5caC can be performed on an RNA sample to identify
the location of, and provide a quantitative measure of, 5caC in a target RNA.
Methods for Identifying 5fC
In another aspect, the invention provides a method for identifying 5fC in a target
DNA comprising the steps of:
a. providing a DNA sample comprising the target DNA;
b. adding a blocking group to the 5caC in the DNA sample;
c. converting the 5fC to DHU to provide a modified DNA sample comprising a
modified target DNA;
d. optionally amplifying the copy number of the modified target DNA;
e. detecting the sequence of the modified target DNA; wherein a cytosine (C) to
thymine (T) transition in the sequence of the modified target DNA compared to
the target DNA provides the location of a 5fC in the target DNA.
In embodiments of the method for identifying 5fC in the target DNA, the method
provides a quantitative measure for the frequency of the 5fC modification at each location
where the modification was identified in the target DNA. In embodiments, the percentages of
the T at each transition location provide a quantitative level of 5fC at each location in the
target DNA.
Adding a blocking group to the 5caC in the DNA sample can be accomplished by
(i) contacting the DNA sample with a coupling agent, for example a carboxylic acid
derivatization reagent like carbodiimide derivatives such as 1-ethyl-3-(3-
dimethylaminopropyl)carbodiimide (EDC) or N,N'-dicyclohexylcarbodiimide (DCC) and (ii)
contacting the DNA sample with an amine, hydrazine or hydroxylamine compound. Thus,
for example, 5caC can be blocked by treating the DNA sample with EDC and then
benzylamine, ethylamine or other amine to form an amide that blocks 5caC from conversion
to DHU by, e.g., pic-BH3. Methods for EDC-catalyzed 5caC coupling are described in
WO2014165770, and are incorporated herein by reference.
The method for identifying 5fC can be performed on an RNA sample to identify
the location of, and provide a quantitative measure of, 5fC in a target RNA.
Nucleic Acid Sample / Target Nucleic Acid
The present invention provides methods for identifying the location of one or more
of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and/or 5-formylcytosine
in a target nucleic acid quantitatively with base-resolution without affecting the unmodified
cytosine. In embodiments, the target nucleic acid is DNA. In other embodiments, the target
nucleic acid is RNA. Likewise the nucleic acid sample that comprises the target nucleic acid
may be a DNA sample or an RNA sample.
The target nucleic acid may be any nucleic acid having cytosine modifications (i.e.,
5mC, 5hmC, 5fC, and/or 5caC). The target nucleic acid can be a single nucleic acid molecule
in the sample, or may be the entire population of nucleic acid molecules in a sample (or a
subset thereof). The target nucleic acid can be the native nucleic acid from the source (e.g.,
cells, tissue samples, etc.) or can be pre-converted into a high-throughput sequencing-ready
form, for example by fragmentation, repair and ligation with adaptors for sequencing. Thus,
target nucleic acids can comprise a plurality of nucleic acid sequences such that the methods
described herein may be used to generate a library of target nucleic acid sequences that can
be analyzed individually (e.g., by determining the sequence of individual targets) or in a
group (e.g., by high-throughput or next generation sequencing methods).
A nucleic acid sample can be obtained from an organism from the Monera
(bacteria), Protista, Fungi, Plantae, and Animalia Kingdoms. Nucleic acid samples may be
obtained from a patient or subject, from an environmental sample, or from an
organism of interest. In embodiments, the nucleic acid sample is extracted or derived from a
cell or collection of cells, a body fluid, a tissue sample, an organ, or an organelle.
RNA Sample / Target RNA
The present invention provides methods for identifying the location of one or more
of 5-methylcytosine, 5-hydroxymethylcytosine, 5-carboxylcytosine and/or 5-formylcytosine
in a target RNA quantitatively with base-resolution without affecting the unmodified
cytosine. In embodiments, the RNA is one or more of mRNA (messenger RNA), tRNA
(transfer RNA), rRNA (ribosomal RNA), snRNA (small nuclear RNA), miRNA
(microRNA), lncRNA (long noncoding RNA) and eRNA (enhancer RNA). The target RNA
can be a single RNA molecule in the sample, or may be the entire population of RNA
molecules in a sample (or a subset thereof). Thus, target RNA can comprise a plurality of
RNA sequences such that the methods described herein may be used to generate a library of
target RNA sequences that can be analyzed individually (e.g., by determining the sequence of
individual targets) or in a group (e.g., by high-throughput or next generation sequencing
methods).
DNA Sample / Target DNA
The methods of the invention utilize mild enzymatic and chemical reactions that
avoid the substantial degradation associated with methods like bisulfite sequencing. Thus,
the methods of the present invention are useful in analysis of low-input samples, such as
circulating cell-free DNA, and in single-cell analysis.
In embodiments of the invention, the DNA sample comprises picogram quantities
of DNA. In embodiments of the invention, the DNA sample comprises about 1 pg to about
900 pg DNA, about 1 pg to about 500 pg DNA, about 1 pg to about 100 pg DNA, about 1 pg
to about 50 pg DNA, about 1 pg to about 10 pg DNA, less than about 200 pg DNA, less than about
100 pg DNA, less than about 50 pg DNA, less than about 20 pg DNA, or less than about 5
pg DNA. In other embodiments of the invention, the DNA sample comprises nanogram
quantities of DNA. The sample DNA for use in the methods of the invention can be any
quantity, including DNA from a single cell or bulk DNA samples. In embodiments, the
methods of the present invention can be performed on a DNA sample comprising about 1 to
about 500 ng of DNA, about 1 to about 200 ng of DNA, about 1 to about 100 ng of DNA,
about 1 to about 50 ng of DNA, about 1 to about 10 ng of DNA, about 2 to about 5 ng of
DNA, less than about 100 ng of DNA, less than about 50 ng of DNA, less than 5 ng of DNA, or less
than 2 ng of DNA. In embodiments of the invention the DNA sample comprises microgram
quantities of DNA.
A DNA sample used in the methods described herein may be from any source
including, for example, a body fluid, tissue sample, organ, organelle, or single cells. In
embodiments, the DNA sample is circulating cell-free DNA (cell-free DNA or cfDNA),
which is DNA found in the blood and is not present within a cell. cfDNA can be isolated
from blood or plasma using methods known in the art. Commercial kits are available for
isolation of cfDNA including, for example, the Circulating Nucleic Acid Kit (Qiagen). The
DNA sample may result from an enrichment step, including, but not limited to, antibody
immunoprecipitation, chromatin precipitation, restriction enzyme digestion-based
enrichment, hybridization-based enrichment, or chemical labeling-based enrichment.
The target DNA may be any DNA having cytosine modifications (i.e., 5mC,
5hmC, 5fC, and/or 5caC) including, but not limited to, DNA fragments or genomic DNA
purified from tissues, organs, cells and organelles. The target DNA can be a single DNA
molecule in the sample, or may be the entire population of DNA molecules in a sample (or a
subset thereof). The target DNA can be the native DNA from the source or pre-converted
into a high-throughput sequencing-ready form, for example by fragmentation, repair and
ligation with adaptors for sequencing. Thus, target DNA can comprise a plurality of DNA
sequences such that the methods described herein may be used to generate a library of target
DNA sequences that can be analyzed individually (e.g., by determining the sequence of
individual targets) or in a group (e.g., by high-throughput or next generation sequencing
methods).
Converting 5mC and 5hmC to 5caC and/or 5fC
Embodiments of the present invention, such as the TAPS method described herein,
include the step of converting the 5mC and 5hmC (or just the 5mC if the 5hmC is blocked) to
5caC and/or 5fC. In embodiments of the invention, this step comprises contacting the DNA
or RNA sample with a ten eleven translocation (TET) enzyme. The TET enzymes are a
family of enzymes that catalyze the transfer of an oxygen molecule to the C5 methyl group
on 5mC, resulting in the formation of 5-hydroxymethylcytosine (5hmC). TET further
catalyzes the oxidation of 5hmC to 5fC and the oxidation of 5fC to form 5caC (see Fig. 5A).
TET enzymes useful in the methods of the invention include one or more of human TET1,
TET2, and TET3; murine Tet1, Tet2, and Tet3; Naegleria TET (NgTET); Coprinopsis
cinerea TET (CcTET); and derivatives or analogues thereof. In embodiments, the TET enzyme is
NgTET. In other embodiments the TET enzyme is human TET1 (hTET1).
Converting 5caC and/or 5fC to DHU
Methods of the present invention include the step of converting the 5caC and/or
5fC in a nucleic acid sample to DHU. In embodiments of the invention, this step comprises
contacting the DNA or RNA sample with a reducing agent including, for example, a borane
reducing agent such as pyridine borane, 2-picoline borane (pic-BH3), borane, sodium
borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride. In a preferred
embodiment, the reducing agent is pyridine borane and/or pic-BH3.
Amplifying the copy number of modified target nucleic acid
The methods of the invention may optionally include the step of amplifying
(increasing) the copy number of the modified target nucleic acid by methods known in the
art. When the modified target nucleic acid is DNA, the copy number can be increased by, for
example, PCR, cloning, and primer extension. The copy number of individual target DNAs
can be amplified by PCR using primers specific for a particular target DNA sequence.
Alternatively, a plurality of different modified target DNA sequences can be amplified by
cloning into a DNA vector by standard techniques. In embodiments of the invention, the
copy number of a plurality of different modified target DNA sequences is increased by PCR
to generate a library for next generation sequencing where, e.g., double-stranded adapter
DNA has been previously ligated to the sample DNA (or to the modified sample DNA) and
PCR is performed using primers complementary to the adapter DNA.
Detecting the Sequence of the Modified Target Nucleic Acid
In embodiments of the invention, the method comprises the step of detecting the
sequence of the modified target nucleic acid. The modified target DNA or RNA contains
DHU at positions where one or more of 5mC, 5hmC, 5fC, and 5caC were present in the
unmodified target DNA or RNA. DHU acts as a T in DNA replication and sequencing
methods. Thus, the cytosine modifications can be detected by any direct or indirect method
known in the art that identifies a C to T transition. Such methods include sequencing
methods such as Sanger sequencing, microarray, and next generation sequencing methods.
The C to T transition can also be detected by restriction enzyme analysis where the C to T
transition abolishes or introduces a restriction endonuclease recognition sequence.
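Detection by loss of a restriction site can be illustrated in silico with the Python sketch below. It assumes the reference and converted sequences are the same length and already aligned, and it uses TCGA, the recognition sequence of TaqαI (the enzyme of the Fig. 15 assay), as the default site. The function name and the example sequences are hypothetical.

```python
def abolished_sites(reference, converted, site="TCGA"):
    """Return start positions where `site` occurs in the reference sequence
    but is no longer intact at the same position after conversion."""
    if len(reference) != len(converted):
        raise ValueError("sequences must be aligned and of equal length")
    positions = []
    start = reference.find(site)
    while start != -1:
        if converted[start:start + len(site)] != site:
            positions.append(start)
        start = reference.find(site, start + 1)
    return positions

# Hypothetical amplicon with two TCGA sites; only the first one carries a
# modified cytosine and is therefore read as TTGA after conversion and PCR.
reference_seq = "AATCGAATGGTCGAAC"
converted_seq = "AATTGAATGGTCGAAC"
lost = abolished_sites(reference_seq, converted_seq)
```

A site reported as lost would resist digestion, so the corresponding PCR product stays intact on a gel, while products from unconverted DNA are cleaved.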
The invention additionally provides kits for identification of 5mC and 5hmC in a
target DNA. Such kits comprise reagents for identification of 5mC and 5hmC by the
methods described herein. The kits may also contain the reagents for identification of 5caC
and for the identification of 5fC by the methods described herein. In embodiments, the kit
comprises a TET enzyme, a borane reducing agent and instructions for performing the
method. In further embodiments, the TET enzyme is TET1 and the borane reducing agent is
selected from one or more of the group consisting of pyridine borane, 2-picoline borane (pic-
BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium
triacetoxyborohydride. In a further embodiment, the TET1 enzyme is NgTET1 or murine Tet1
and the borane reducing agent is pyridine borane and/or pic-BH3.
In embodiments, the kit further comprises a 5hmC blocking group and a
glucosyltransferase enzyme. In further embodiments, the 5hmC blocking group is uridine
diphosphate (UDP)-sugar where the sugar is glucose or a glucose derivative, and the
glucosyltransferase enzyme is T4 bacteriophage β-glucosyltransferase (βGT), T4
bacteriophage α-glucosyltransferase (αGT), or a derivative or analog thereof.
In embodiments the kit further comprises an oxidizing agent selected from
potassium perruthenate (KRuO4) and/or Cu(II)/TEMPO (copper(II) perchlorate and 2,2,6,6-
tetramethylpiperidine-1-oxyl (TEMPO)).
In embodiments, the kit comprises reagents for blocking 5fC in the nucleic acid
sample. In embodiments, the kit comprises an aldehyde reactive compound including, for
example, hydroxylamine derivatives, hydrazine derivatives, and hydrazide derivatives as
described herein. In embodiments, the kit comprises reagents for blocking 5caC as described
herein.
In embodiments, the kit comprises reagents for isolating DNA or RNA. In
embodiments the kit comprises reagents for isolating low-input DNA from a sample, for
example cfDNA from blood, plasma, or serum.
EXAMPLES
Methods
Preparation of model DNA.
DNA oligos for MALDI and HPLC-MS/MS test. DNA oligonucleotides
(“oligos”) with C, 5mC and 5hmC were purchased from Integrated DNA Technologies
(IDT). All the sequences and modifications can be found in Figs. 6 and 7. DNA oligo with
5fC was synthesized by the C-tailing method: DNA oligos 5'-GTCGACCGGATC-3' and 5'-
TTGGATCCGGTCGACTT-3' were annealed and then incubated with 5-formyl-2'-dCTP
(Trilink Biotech) and Klenow Fragment 3'→5' exo- (New England Biolabs) in NEBuffer 2
for 2 hr at 37°C. The product was purified with Bio-Spin P-6 Gel Columns (Bio-Rad).
DNA oligo with 5caC was synthesized using Expedite 8900 Nucleic Acid
Synthesis System with standard phosphoramidites and 5-Carboxy-dC-CE
Phosphoramidite (Glen Research). Subsequent deprotection and purification were carried out
with Glen-Pak Cartridges (Glen Research) according to the manufacturer’s instructions.
Purified oligonucleotides were characterized by Voyager-DE MALDI-TOF (matrix-assisted
laser desorption ionization time-of-flight) Biospectrometry Workstation.
222 bp Model DNA for conversion test. To generate 222 bp model DNA
containing five CpG sites, bacteriophage lambda DNA (Thermo Fisher) was PCR amplified
using Taq DNA Polymerase (New England Biolabs) and purified by AMPure XP beads
(Beckman Coulter). Primer sequences are as follows: FW-5'-
CCTGATGAAACAAGCATGTC-3', RV-5'-CAUTACTCACUTCCCCACUT-3'. The uracil
base in the reverse strand of PCR product was removed by USER enzyme (New England
Biolabs). 100 ng of purified PCR product was then methylated in 20 µL solution containing
1X NEBuffer 2, 0.64 mM S-adenosylmethionine and 20 U M.SssI CpG Methyltransferase
(New England Biolabs) for 2 hr at 37°C, followed by 20 min heat inactivation at 65°C. The
methylated 222 bp model DNA was purified by AMPure XP beads.
Model DNA for TAPS, TAPSβ and CAPS validation with Sanger sequencing.
34 bp DNA oligo containing single 5mC and single 5hmC site was annealed with other DNA
oligos in annealing buffer containing 5 mM Tris-Cl (pH 7.5), 5 mM MgCl2, and 50 mM
NaCl, and then ligated in a reaction containing 400 U T4 ligase (NEB) at 25°C for 1 hr and
purified by 1.8X AMPure XP beads.
DNA                        Sequence (5' to 3')
34 bp mC and hmC           CCCGAmCGCATGATCTGTACTTGATCGAChmCGTGCAAC
TruSeq Universal Adapter   AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter (Index 6)   /5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
Uracil linker              GAUCGTTGCACGGUCGATCAAGUACAGATCATGCGUCGGGAGAUCGGAAG
The Uracil linker was removed by USER enzyme after ligation reaction resulting
in a final product sequence (5' to 3'):
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGA
TCTCCCGAmCGCATGATCTGTACTTGATCGAChmCGTGCAACGATCGGAAGAGCA
CACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG. PCR
primers for amplification of the model DNA were: P5: 5'-
AATGATACGGCGACCACCGAG-3' and P7: 5'-CAAGCAGAAGACGGCATACGAG-3'.
Model DNA for polymerase test and Sanger sequencing. Model DNA for
polymerase test and Sanger sequencing was prepared with the same ligation method above
except different DNA oligos were used:
DNA                        Sequence (5' to 3')
5mC model oligo            AGCAGTCTmCGATCAGCTGmCTACTGTAmCGTAGCAT
TruSeq Universal Adapter   AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter (Index 6)   /5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
Insert_1_40_bp             /5Phos/AGGTGCGCTAAGTTCTAGATCGCCAACTGGTTGTGGCCTT
Insert_2_60_bp             /5Phos/CTATAGCCGGCTTGCTCTCTCTGCCTCTAGCAGCTGCTCCCTATAGTGAGTCGTATTAAC
40_bp-Linker-1             ATCTAGAACTTAGCGCACCTAGATCGGAAGAGCGTCGTG
80_bp-Linker               AGAGAGCAAGCCGGCTATAGATGCTACGTACAGTAGCAGCTGATCAAGACTGCTAAGGCCACAACCAGTTGGCG
42_bp-Linker-2             AGACGTGTGCTCTTCCGATCGTTAATACGACTCACTATAG
The final product sequence (5' to 3') was:
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGA
TCTAGGTGCGCTAAGTTCTAGATCGCCAACTGGTTGTGGCCTTAGCAGTCTmCGA
TCAGCTGmCTACTGTAmCGTAGCATCTATAGCCGGCTTGCTCTCTCTGCCTCTAGC
AGCTGCTCCCTATAGTGAGTCGTATTAACGATCGGAAGAGCACACGTCTGAACTC
CAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG. PCR primers to amplify the
model DNA are the P5 and P7 primers provided above. The biotin-labelled primer sequence for
primer extension is biotin linked to the 5' end of the P7 primer. PCR primers for RT-PCR
after T7 RNA polymerase transcription were the P5 primer and RT: 5'-
TGCTAGAGGCAGAGAGAGCAAG-3'.
Model DNA for PCR bias test. Model DNA for PCR bias test was prepared with
the same ligation method above except different DNA oligos were used:
DNA                        Sequence (5' to 3')
17 bp X                    AGCAGTCTXGATCAGCT (X = DHU or U or T or C)
17 bp No Modification      GCTACTGTACGTAGCAT
TruSeq Universal Adapter   AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter (Index 6)   /5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
Insert_1_40_bp             /5Phos/AGGTGCGCTAAGTTCTAGATCGCCAACTGGTTGTGGCCTT
Insert_2_60_bp             /5Phos/CTATAGCCGGCTTGCTCTCTCTGCCTCTAGCAGCTGCTCCCTATAGTGAGTCGTATTAAC
40_bp-Linker-1             ATCTAGAACTTAGCGCACCTAGATCGGAAGAGCGTCGTG
80_bp-Linker               AGAGAGCAAGCCGGCTATAGATGCTACGTACAGTAGCAGCTGATCAAGACTGCTAAGGCCACAACCAGTTGGCG
42_bp-Linker-2             AGACGTGTGCTCTTCCGATCGTTAATACGACTCACTATAG
Final product sequence (5' to 3'):
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGA
TCTAGGTGCGCTAAGTTCTAGATCGCCAACTGGTTGTGGCCTTAGCAGTCTXGAT
CAGCTGCTACTGTACGTAGCATCTATAGCCGGCTTGCTCTCTCTGCCTCTAGCAGC
TGCTCCCTATAGTGAGTCGTATTAACGATCGGAAGAGCACACGTCTGAACTCCAG
TCACGCCAATATCTCGTATGCCGTCTTCTGCTTG, where X= DHU or U or T or C.
PCR primers to amplify the model DNA are the P5 and P7 primers provided above.
Preparation of methylated bacteriophage lambda genomic DNA
1 µg of unmethylated bacteriophage lambda DNA (Promega) was methylated in 50
µL reaction containing 0.64 mM SAM and 0.8 U/µL M.SssI enzyme in Mg2+-free buffer (10
mM Tris-Cl pH 8.0, 50 mM NaCl, and 10 mM EDTA) for 2 hours at 37°C. Then, 0.5 µL of
M.SssI enzyme and 1 µL of SAM were added and the reaction was incubated for additional 2
hours at 37°C. Methylated DNA was subsequently purified on 1X Ampure XP beads. To
assure complete methylation, the whole procedure was repeated in NEB buffer 2. DNA
methylation was then validated with HpaII digestion assay. 50 ng of methylated and
unmethylated DNA were digested in 10 µL reaction with 2 U of HpaII enzyme (NEB) in
CutSmart buffer (NEB) for 1 h at 37°C. Digestion products were run on 1% agarose gel
together with undigested lambda DNA control. Unmethylated lambda DNA was digested
after the assay whereas methylated lambda DNA remained intact confirming complete and
successful CpG methylation. Sequence of lambda DNA can be found in GenBank - EMBL
Accession Number: J02459.
Preparation of 2 kb unmodified spike-in controls
2 kb spike-in controls (2kb-1, 2, 3) were PCR amplified from pNIC28-Bsa4
plasmid (Addgene, cat. no. 26103) in the reaction containing 1 ng DNA template, 0.5 µM
primers, 1 U Phusion High-Fidelity DNA Polymerase (Thermo Fisher). PCR primer
sequences are listed in Table 2.
Table 2. Sequences of PCR primers for spike-ins.
Primer name      Sequence (5' to 3')
2kb-3_Forward    CACAGATGTCTGCCTGTTCA
2kb-3_Reverse    AGGGTGGTGAATGTGAAACC
PCR product was purified on Zymo-Spin column. 2 kb unmodified control
sequence (5' to 3'):
TGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGAGTTTCTCCAGAAGCGTT
AATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTTGGT
CACTGATGCCTCCGTGTAAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATG
AAACGAGAGAGGATGCTCACGATACGGGTTACTGATGATGAACATGCCCGGTTA
CTGGAACGTTGTGAGGGTAAACAACTGGCGGTATGGATGCGGCGGGACCAGAGA
AAAATCACTCAGGGTCAATGCCAGCGCTTCGTTAATACAGATGTAGGTGTTCCAC
AGGGTAGCCAGCAGCATCCTGCGATGCAGATCCGGAACATAATGGTGCAGGGCG
CTGACTTCCGCGTTTCCAGACTTTACGAAACACGGAAACCGAAGACCATTCATGT
TGTTGCTCAGGTCGCAGACGTTTTGCAGCAGCAGTCGCTTCACGTTCGCTCGCGT
ATCGGTGATTCATTCTGCTAACCAGTAAGGCAACCCCGCCAGCCTAGCCGGGTCC
TCAACGACAGGAGCACGATCATGCGCACCCGTGGGGCCGCCATGCCGGCGATAA
TGGCCTGCTTCTCGCCGAAACGTTTGGTGGCGGGACCAGTGACGAAGGCTTGAGC
GAGGGCGTGCAAGATTCCGAATACCGCAAGCGACAGGCCGATCATCGTCGCGCT
CCAGCGAAAGCGGTCCTCGCCGAAAATGACCCAGAGCGCTGCCGGCACCTGTCC
TACGAGTTGCATGATAAAGAAGACAGTCATAAGTGCGGCGACGATAGTCATGCC
CCGCGCCCACCGGAAGGAGCTGACTGGGTTGAAGGCTCTCAAGGGCATCGGTCG
AGATCCCGGTGCCTAATGAGTGAGCTAACTTACATTAATTGCGTTGCGCTCACTG
CCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAAC
GCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCCAGGGTGGTTTTTCTTTTCACCA
GTGAGACGGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGAGTTGCA
GCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGT
TAACGGCGGGATATAACATGAGCTGTCTTCGGTATCGTCGTATCCCACTACCGAG
ATATCCGCACCAACGCGCAGCCCGGACTCGGTAATGGCGCGCATTGCGCCCAGC
GCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCTCATTCAGCA
TTTGCATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCCGTTCCGCT
ATCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGAC
GCGCCGAGACAGAACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCA
ATGCGACCAGATGCTCCACGCCCAGTCGCGTACCGTCTTCATGGGAGAAAATAAT
ACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAAATAACGCCGGAACATTAGT
GCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGTTAATGATC
AGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGA
CGCCGCTTCGTTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCG
AGATTTAATCGCCGCGACAATTTGCGACGGCGCGTGCAGGGCCAGACTGGAGGT
GGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTTGTGCCACGCGGTTG
GGAATGTAATTCAGCTCCGCCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGA
AACGTGGCTGGCCTGGTTCACCACGCGGGAAACGGTCTGATAAGAGACACCGGC
ATACTCTGCGACATCGTATAACGTTACTGGTTTCACATTCACCACCCT
Preparation of 120mer spike-in controls
120mer spike-in controls were produced by primer extension. Oligo sequences
and primers are listed in Table 3.
Table 3. Sequences of DNA oligos and primers used for preparation of 120mer control
spike-ins.
Spike-in control    Template sequence (5' to 3')    Primer for extension (5' to 3')
ATACTCATCATTAAACTTCGCCCTTACCTAC
CACTTCGTGTATGTAGATAGGTAGTATACA
120mer_1 ’éfigggéggflgéfic
ATCGAAATGAGTACGTAGATAGTA
CACTTCG
GAAAGTAAGATGGAGGTGAGAGTGAGAGT
TGATACTGGTCCCGAGSthCTGA
GATGGAGATTCATTC
AGTTAGGCCSthGGGATGACTGASthAG
TCTCGCCA
120mer-2 TCTTCCGAGACCGACGACACAGGTCTCCCT
TAATACGACTCACTA
AGTCGTATTATGGCGAGAGAATGA
ATCTCCATC
Briefly, for 120mer-1 spike-in, 3 µM oligo was annealed with 10 µM primer in the annealing
buffer containing 5 mM Tris-Cl (pH 7.5), 5 mM MgCl2, and 50 mM NaCl. For 120mer-2
spike-in, 5 µM oligo was annealed with 7.5 µM primer. Primer extension was performed in
the NEB buffer 2 with 0.4 µM dNTPs (120mer-1: dATP/dGTP/dTTP/dhmCTP, 120mer-2:
dATP/dGTP/dTTP/dCTP) and 5 U of Klenow Polymerase (New England Biolabs) for 1 hour
at 37°C. After reaction spike-in controls were purified on Zymo-Spin columns (Zymo
Research). The 120mer spike-in controls were then methylated in 50 µL reaction containing
0.64 mM SAM and 0.8 U/µL M.SssI enzyme in NEB buffer 2 for 2 hours at 37°C and purified
with Zymo-Spin columns. All spike-in sequences used can be downloaded from
https://figshare.com/s/80c3ab713c261262494b.
Preparation of synthetic spike-in with N5mCNN and N5hmCNN
Synthetic oligo with N5mCNN and N5hmCNN sequences was produced by
annealing and extension method. Oligo sequences are listed in Table 4, below.
Table 4.
Oligo      Template sequence (5' to 3')
N5mCNN     GAAGATGCAGAAGACAGGAAGGATGAAACACTCAGGCGCACGCTGGCATNmCNNGACAAACCACAAGAACAGGCTAGTGAGAATGAAGGGA
N5hmCNN    CCAACTCTGAAACCCACCAACGCCAACATCCACCACACAACCCAAGATNhmCNNGACCATCTTACAAACATATCCCTTCATTCTCACTAGCC
Briefly, 10 µM N5mCNN and N5hmCNN oligos (IDT) were annealed together in
the annealing buffer containing 5 mM Tris-Cl (pH 7.5), 5 mM MgCl2, and 50 mM NaCl.
Extension was performed in the NEB buffer 2 with 0.4 mM dNTPs
(dATP/dGTP/dTTP/dCTP) and 5 U of Klenow Polymerase (NEB) for 1 hour at 37°C. After
the reaction, spike-in control was purified on Zymo-Spin column (Zymo Research). Synthetic
spike-in with N5mCNN and N5hmCNN (5' to 3'):
GAAGATGCAGAAGACAGGAAGGATGAAACACTCAGGCGCACGCTGGCATNmCN
NGACAAACCACAAGAACAGGCTAGTGAGAATGAAGGGATATGTTTGTAAGATGG
TCNNGNATCTTGGGTTGTGTGGTGGATGTTGGCGTTGGTGGGTTTCAGAGTTGG.
Complementary strand (5' to 3'):
CCAACTCTGAAACCCACCAACGCCAACATCCACCACACAACCCAAGATNhmCNN
GACCATCTTACAAACATATCCCTTCATTCTCACTAGCCTGTTCTTGTGGTTTGTCN
NGNATGCCAGCGTGCGCCTGAGTGTTTCATCCTTCCTGTCTTCTGCATCTTC.
DNA digestion and HPLC-MS/MS analysis
DNA samples were digested with 2 U of Nuclease P1 (Sigma-Aldrich) and 10 nM
deaminase inhibitor erythro-9-amino-β-hexyl-α-methyl-9H-purine-9-ethanol hydrochloride
(Sigma-Aldrich). After overnight incubation at 37°C, the samples were further treated with 6
U of alkaline phosphatase (Sigma-Aldrich) and 0.5 U of phosphodiesterase I (Sigma-Aldrich)
for 3 hours at 37°C. The digested DNA solution was filtered with Amicon Ultra-0.5 mL 10
K centrifugal filters (Merck Millipore) to remove the proteins, and subjected to HPLC-
MS/MS analysis.
The HPLC-MS/MS analysis was carried out with 1290 Infinity LC Systems
(Agilent) coupled with a 6495B Triple Quadrupole Mass Spectrometer (Agilent). A
ZORBAX Eclipse Plus C18 column (2.1 X 150 mm, 1.8-Micron, Agilent) was used. The
column temperature was maintained at 40°C, and the solvent system was water containing 10
mM ammonium acetate (pH 6.0, solvent A) and water-acetonitrile (60/40, v/v, solvent B)
with 0.4 mL/min flow rate. The gradient was: 0-5 min, 0% solvent B; 5-8 min, 0-5.63%
solvent B; 8-9 min, 5.63% solvent B; 9-16 min, 5.63-13.66% solvent B; 16-17 min, 13.66-
100% solvent B; 17-21 min, 100% solvent B; 21-24.3 min, 100-0% solvent B; 24.3-25 min,
0% solvent B. The dynamic multiple reaction monitoring mode (dMRM) of the MS was used
for quantification. The source-dependent parameters were as follows: gas temperature
230°C, gas flow 14 L/min, nebulizer 40 psi, sheath gas temperature 400°C, sheath gas flow
11 L/min, capillary voltage 1500 V in the positive ion mode, nozzle voltage 0 V, high
pressure RF 110 V and low pressure RF 80 V, both in the positive ion mode. The fragmentor
voltage was 380 V for all compounds, while other compound-dependent parameters were as
summarized in Table 5.
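The gradient program above can be written down as a small lookup, which makes it easy to check the solvent B percentage at any time point. This is only an illustrative sketch: it assumes linear ramps within each segment (the text gives only the segment endpoints), and the function name `percent_b` is hypothetical.

```python
# Gradient segments transcribed from the text:
# (start time in min, end time in min, %B at start, %B at end).
GRADIENT = [
    (0.0, 5.0, 0.0, 0.0),
    (5.0, 8.0, 0.0, 5.63),
    (8.0, 9.0, 5.63, 5.63),
    (9.0, 16.0, 5.63, 13.66),
    (16.0, 17.0, 13.66, 100.0),
    (17.0, 21.0, 100.0, 100.0),
    (21.0, 24.3, 100.0, 0.0),
    (24.3, 25.0, 0.0, 0.0),
]


def percent_b(t_min):
    """Return % solvent B at time t_min (assumption: linear ramp per segment)."""
    for start, end, b_start, b_end in GRADIENT:
        if start <= t_min <= end:
            fraction = (t_min - start) / (end - start)
            return b_start + fraction * (b_end - b_start)
    raise ValueError("time is outside the 0-25 min run")


if __name__ == "__main__":
    for t in (0.0, 6.5, 12.5, 19.0, 25.0):
        print(f"{t:5.1f} min: {percent_b(t):6.2f} % B")
```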
Table 5. Compound-dependent HPLC-MS/MS parameters used for nucleosides
quantification. RT: retention time, CE: collision energy; CAE: cell accelerator voltage. All
the nucleosides were analyzed in the positive mode.
Compound   Precursor Ion (m/z)   Product Ion (m/z)   RT (min)   Delta RT (min)   CE (V)   CAE (V)
dA+H       252                   136                 13.78      2                10       4
dT+H       243                   127                 11.07      2                10       4
dT+Na      265                   149                 11.07      2                10       4
dG+H       268                   152                 9.64       2                10       4
dC+H       228                   112                 3.71       1.5              10       4
dC+Na      250                   134                 3.71       1.5              10       4
mdC+H      242                   126                 9.05       1.5              10       4
mdC+Na     264                   148                 9.05       1.5              10       4
hmdC+H     258                   142                 4.34       2                12       4
hmdC+Na    280                   164                 4.34       2                12       4
fdC+H      256                   140                 10.69      2                8        4
fdC+Na     278                   162                 10.69      2                8        4
cadC+H     272                   156                 1.75       3                12       4
cadC+Na    294                   178                 1.75       3                12       4
DHU+H      231                   115                 3.45       3                10       4
DHU+Na     253                   137                 3.45       3                10       4
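As a consistency check on the transitions in Table 5, every precursor/product pair differs by the same nominal mass of 116, which matches loss of the 2'-deoxyribose moiety from a nucleoside ion. The sketch below simply recomputes that difference from the tabulated m/z values; the helper name `neutral_losses` is hypothetical and the check is an illustration, not part of the described method.

```python
# (precursor m/z, product m/z) pairs transcribed from Table 5, in table order.
TRANSITIONS = [
    (252, 136), (243, 127), (265, 149), (268, 152),
    (228, 112), (250, 134), (242, 126), (264, 148),
    (258, 142), (280, 164), (256, 140), (278, 162),
    (272, 156), (294, 178), (231, 115), (253, 137),
]


def neutral_losses(transitions):
    """Return the sorted set of nominal neutral losses (precursor minus product)."""
    return sorted({precursor - product for precursor, product in transitions})


if __name__ == "__main__":
    # A single value here means all transitions share one neutral loss.
    print(neutral_losses(TRANSITIONS))
```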
Expression and purification of NgTET1
pRSET-A plasmid encoding His-tagged NgTET1 protein (GG739552.1) was
designed and purchased from Invitrogen. Protein was expressed in E. coli BL21 (DE3)
bacteria and purified as previously described with some modifications (J. E. Pais et al.,
Biochemical characterization of a Naegleria TET-like oxygenase and its application in single
molecule sequencing of 5-methylcytosine. Proc. Natl. Acad. Sci. USA. 112, 4316-4321
(2015), incorporated herein by reference). Briefly, for protein expression bacteria from
overnight small-scale culture were grown in LB medium at 37°C and 200 rpm until OD600
was between 0.7-0.8. Then cultures were cooled down to room temperature and target
protein expression was induced with 0.2 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG).
Cells were maintained for additional 18 hours at 18°C and 180 rpm. Subsequently, cells were
harvested and re-suspended in the buffer containing 20 mM HEPES (pH 7.5), 500 mM NaCl,
1 mM DTT, 20 mM imidazole, 1 µg/mL leupeptin, 1 µg/mL pepstatin A and 1 mM PMSF.
Cells were broken with EmulsiFlex-C5 high-pressure homogenizer, and lysate was clarified
by centrifugation for 1 hour at 30,000 x g and 4°C. Collected supernatant was loaded on Ni-
NTA resins and NgTET1 protein was eluted with buffer containing 20 mM HEPES (pH 7.5),
500 mM imidazole, 2 M NaCl, 1 mM DTT. Collected fractions were then purified on
HiLoad 16/60 Superdex 75 (20 mM HEPES pH 7.5, 2 M NaCl, 1 mM DTT). Fractions containing
NgTET1 were then collected, buffer exchanged to the buffer containing 20 mM HEPES (pH
7.0), 10 mM NaCl, 1 mM DTT, and loaded on HiTrap HP SP column. Pure protein was
eluted with the salt gradient, collected and buffer-exchanged to the final buffer containing 20
mM Tris-Cl (pH 8.0), 150 mM NaCl and 1 mM DTT. Protein was then concentrated up to
130 µM, mixed with glycerol (30% v/v) and aliquots were stored at -80°C.
Expression and purification of mTET1CD
mTET1CD catalytic domain (NM_001253857.2, 4371-6392) with N-terminal
Flag-tag was cloned into pcDNA3-Flag between KpnI and BamHI restriction sites. For
protein expression, 1 mg plasmid was transfected into 1 L of Expi293F (Gibco) cell culture at
density 1 × 10^6 cells/mL and cells were grown for 48 h at 37°C, 170 rpm and 5% CO2.
Subsequently, cells were harvested by centrifugation, re-suspended in the lysis buffer
containing 50 mM Tris-Cl pH = 7.5, 500 mM NaCl, 1X cOmplete Protease Inhibitor Cocktail
(Sigma), 1 mM PMSF, 1% Triton X-100 and incubated on ice for 20 min. Cell lysate was
then clarified by centrifugation for 30 min at 30000 x g and 4°C. Collected supernatant was
purified on ANTI-FLAG M2 Affinity Gel (Sigma) and pure protein was eluted with buffer
containing 20 mM HEPES pH = 8.0, 150 mM NaCl, 0.1 mg/mL 3X Flag peptide (Sigma), 1X
cOmplete Protease Inhibitor Cocktail (Sigma), 1 mM PMSF. Collected fractions were
concentrated and buffer-exchanged to the final buffer containing 20 mM HEPES pH = 8.0,
150 mM NaCl and 1 mM DTT. Concentrated protein was mixed with glycerol (30% v/v),
frozen in liquid nitrogen and aliquots were stored at -80°C. Activity and quality of
recombinant mTET1CD was checked by MALDI Mass Spectrometry analysis. Based on this
assay, recombinant mTET1CD is fully active and able to catalyze oxidation of 5mC to 5caC.
No significant digestion of the tested model oligo was detected by MALDI, confirming that
the protein is free from nucleases.
TET Oxidation
NgTET1 Oxidation. For TET oxidation of the 222 bp model DNA oligos, 100 ng
of 222 bp DNA was incubated in 20 µL solution containing 50 mM MOPS buffer (pH 6.9),
100 mM ammonium iron (II) sulfate, 1 mM α-ketoglutarate, 2 mM ascorbic acid, 1 mM
dithiothreitol (DTT), 50 mM NaCl, and 5 µM NgTET1 for 1 hr at 37°C. After that, 0.4 U of
Proteinase K (New England Biolabs) was added to the reaction mixture and incubated for 30
min at 37°C. The product was purified by Zymo-Spin column (Zymo Research) following
manufacturer’s instruction.
For NgTET1 oxidation of genomic DNA, 500 ng of genomic DNA were incubated
in 50 µL solution containing 50 mM MOPS buffer (pH 6.9), 100 mM ammonium iron (II)
sulfate, 1 mM α-ketoglutarate, 2 mM ascorbic acid, 1 mM dithiothreitol, 50 mM NaCl, and 5
µM NgTET1 for 1 hour at 37°C. After that, 4 U of Proteinase K (New England Biolabs)
were added to the reaction mixture and incubated for 30 min at 37°C. The product was
cleaned-up on 1.8X Ampure beads following the manufacturer’s instruction.
mTET1 Oxidation. 100 ng of genomic DNA was incubated in 50 µL reaction
containing 50 mM HEPES buffer (pH 8.0), 100 µM ammonium iron (II) sulfate, 1 mM α-
ketoglutarate, 2 mM ascorbic acid, 1 mM dithiothreitol, 100 mM NaCl, 1.2 mM ATP and 4
µM mTET1CD for 80 min at 37°C. After that, 0.8 U of Proteinase K (New England Biolabs)
were added to the reaction mixture and incubated for 1 hour at 50°C. The product was
cleaned-up on Bio-Spin P-30 Gel Column (Bio-Rad) and 1.8X Ampure XP beads following
the manufacturer’s instruction.
Borane Reduction
Pic-BH3 reduction. 25 µL of 5 M α-picoline-borane (pic-BH3, Sigma-Aldrich) in
MeOH and 5 µL of 3 M sodium acetate solution (pH 5.2, Thermo Fisher) was added into 20
µL DNA sample and incubated at 60°C for 1 h. The product was purified by Zymo-Spin
column (Zymo Research) following manufacturer’s instructions for the 222 bp or by Micro
Bio-Spin 6 Columns (Bio-Rad) following manufacturer’s instruction for the oligos.
Alternatively, 100 mg of 2-picoline-borane (pic-borane, Sigma-Aldrich) was
dissolved in 187 µL of DMSO to give around 3.26 M solution. For each reaction, 25 µL of
pic-borane solution and 5 µL of 3 M sodium acetate solution (pH 5.2, Thermo Fisher) were
added into 20 µL of DNA sample and incubated for 3 hours at 70°C. The product was
purified by Zymo-Spin column for genomic DNA or by Micro Bio-Spin 6 Columns (Bio-
Rad) for DNA oligos following the manufacturer’s instructions.
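For orientation, the final reagent concentrations in the pic-borane reaction in DMSO follow from the stated volumes by simple dilution (C1·V1 = C2·V2). The sketch below is a worked calculation only; it assumes the three volumes are additive (25 + 5 + 20 = 50 µL), and the function name is hypothetical.

```python
def final_concentration(stock_molar, stock_volume_ul, total_volume_ul):
    """Dilution: final concentration (M) = stock concentration * V_stock / V_total."""
    return stock_molar * stock_volume_ul / total_volume_ul


# Volumes from the text; additivity of volumes is an assumption.
TOTAL_UL = 25 + 5 + 20

pic_borane_final = final_concentration(3.26, 25, TOTAL_UL)     # about 1.63 M
sodium_acetate_final = final_concentration(3.0, 5, TOTAL_UL)   # about 0.3 M

if __name__ == "__main__":
    print(f"pic-borane:     {pic_borane_final:.2f} M")
    print(f"sodium acetate: {sodium_acetate_final:.2f} M")
```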
Pyridine borane reduction. 50-100 ng of oxidised DNA in 35 µL of water were
reduced in 50 µL reaction containing 600 mM sodium acetate solution (pH = 4.3) and 1 M
pyridine borane for 16 hours at 37°C and 850 rpm in Eppendorf ThermoMixer. The product
was purified by Zymo-Spin column.
Single nucleoside pic-borane reaction. 500 µL of 3.26 M 2-picoline-borane (pic-
borane, Sigma-Aldrich) in MeOH and 500 µL of 3 M sodium acetate solution (pH 5.2,
Thermo Fisher) were added into 10 mg of 2'-deoxycytidine-5-carboxylic acid sodium salt
(Berry&Associates). The mixture was stirred for 1 hour at 60°C. The product was purified
by HPLC to give pure compound as white foam. High resolution MS (Q-TOF) m/z [M +
Na]+ calculated for C9H14N2O5Na: 253.0800; found: 253.0789.
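The calculated value of 253.0800 can be reproduced by summing monoisotopic atomic masses. The sketch below assumes the sodium adduct has the formula C9H14N2O5Na (the formula is partly illegible in the source text; this is the composition of sodiated dihydrouracil deoxynucleoside) and sums neutral-atom masses without subtracting the electron mass, which is what reproduces the stated four-decimal figure. The mass error of the found value is then given in ppm.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Na": 22.98976928,
}


def monoisotopic_mass(composition):
    """Sum monoisotopic masses for a {element: count} composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def ppm_error(found, calculated):
    """Mass error of the measured value relative to the calculated one, in ppm."""
    return (found - calculated) / calculated * 1e6


# Assumed adduct composition (see lead-in): C9H14N2O5Na.
ADDUCT = {"C": 9, "H": 14, "N": 2, "O": 5, "Na": 1}

if __name__ == "__main__":
    calculated = monoisotopic_mass(ADDUCT)
    print(f"calculated: {calculated:.4f}")
    print(f"error of found value: {ppm_error(253.0789, 253.0800):.2f} ppm")
```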
5hmC blocking
5hmC blocking was performed in 20 µL solution containing 50 mM HEPES buffer
(pH 8), 25 mM MgCl2, 200 µM uridine diphosphoglucose (UDP-Glc, New England Biolabs),
and 10 U βGT (Thermo Fisher), and 10 µM 5hmC DNA oligo for 1 hr at 37°C. The product
was purified by Micro Bio-Spin 6 Columns (Bio-Rad) following manufacturer’s instruction.
5fC blocking
5fC blocking was performed in 100 mM MES buffer (pH 5.0), 10 mM O-
ethylhydroxylamine (Sigma-Aldrich), and 10 µM 5fC DNA oligo for 2 hours at 37°C. The
product was purified by Micro Bio-Spin 6 Columns (Bio-Rad) following manufacturer’s
instruction.
5caC blocking
5caC blocking was performed in 75 mM MES buffer (pH 5.0), 20 mM N-
hydroxysuccinimide (NHS, Sigma-Aldrich), 20 mM 1-(3-dimethylaminopropyl)-3-
ethylcarbodiimide hydrochloride (EDC, Fluorochem), and 10 µM 5caC DNA oligo at 37°C
for 0.5 h. The buffer was then exchanged to 100 mM sodium phosphate (pH 7.5), 150 mM
NaCl using Micro Bio-Spin 6 Columns (Bio-Rad) following manufacturer’s instructions. 10
mM ethylamine (Sigma-Aldrich) was added to the oligo and incubated for 1 hour at 37°C.
The product was purified by Micro Bio-Spin 6 Columns (Bio-Rad) following manufacturer’s
instructions.
5hmC oxidation
46 µL of 5hmC DNA oligo was denatured with 2.5 µL of 1 M NaOH for 30 min at
37°C in a shaking incubator, then oxidized with 1.5 µL of solution containing 50 mM NaOH
and 15 mM potassium perruthenate (KRuO4, Sigma-Aldrich) for 1 hour on ice. The product
was purified by Micro Bio-Spin 6 Columns following manufacturer’s instructions.
Validation of TAPS conversion with TaqαI assay
5mC conversion after TAPS was tested by PCR amplification of a target region
which contains TaqαI restriction site (TCGA) and subsequent TaqαI digestion. For example,
5mC conversion in our TAPS libraries can be tested based on 194 bp amplicon containing
single TaqαI restriction site that is amplified from CpG methylated lambda DNA spike-in
control. PCR product amplified from the 194 bp amplicon is digested with TaqαI restriction
enzyme and digestion product is checked on 2% agarose gel. PCR product amplified on
unconverted control DNA is digested by TaqαI and shows two bands on the gel. In TAPS-
converted sample restriction site is lost due to C-to-T transition, so the 194 bp amplicon
would remain intact. Overall conversion level can be assessed based on digested and
undigested gel bands quantification and for successful TAPS samples should be higher than
Briefly, the converted DNA sample was PCR amplified by Taq DNA Polymerase
(New England Biolabs) with corresponding primers. The PCR product was incubated with 4
units of TaqαI restriction enzyme (New England Biolabs) in 1X CutSmart buffer (New
England Biolabs) for 30 min at 65°C and checked by 2% agarose gel electrophoresis.
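The band-quantification step described above amounts to a ratio: the undigested band represents amplicons that lost the TCGA site (converted), and the digested bands represent amplicons that kept it. The following is a minimal sketch of that ratio, assuming background-subtracted band intensities that are proportional to the number of molecules; the function name and the example intensities are hypothetical.

```python
def taps_conversion(undigested_intensity, digested_intensities):
    """Fraction of amplicon that resisted digestion, i.e. lost the TCGA site.

    undigested_intensity: intensity of the intact 194 bp band.
    digested_intensities: intensities of the cleavage-product bands.
    Assumption: intensities are background-subtracted and comparable.
    """
    digested = sum(digested_intensities)
    total = undigested_intensity + digested
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return undigested_intensity / total


if __name__ == "__main__":
    # Hypothetical densitometry values, for illustration only.
    print(f"{taps_conversion(960.0, [25.0, 15.0]):.1%}")
```

Because stain signal usually scales with fragment length, intensities may need length normalisation before the ratio is taken; that correction is left out of this sketch.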
Quantitative polymerase chain reaction (qPCR)
For comparison of amplification curves and melting curves between model DNAs
before and after TAPS (Fig. 11), 1 ng of DNA sample was added into 19 µL of PCR master
mix containing 1X LightCycler 480 High Resolution Melting Master Mix (Roche Diagnostics
Corporation), 250 nM of primers FW-CCTGATGAAACAAGCATGTC and RV-
CATTACTCACTTCCCCACTT and 3 mM of MgSO4. For PCR amplification, an initial
denaturation step was performed for 10 min at 95°C, followed by 40 cycles of 5 sec
denaturation at 95°C, 5 sec annealing at customized annealing temperature and 5 sec
elongation at 72°C. The final step included 1 min at 95°C, 1 min at 70°C and a melting curve
C step increments, 5 sec hold before each acquisition) from 65°C to 95°C.
For other assays, qPCR was performed by adding the required amount of DNA
sample into 19 µL of PCR master mix containing 1X Fast SYBR Green Master Mix (Thermo
Fisher), 200 nM of forward and reverse primers. For PCR amplification, an initial
denaturation step was performed for 20 sec at 95°C, followed by 40 cycles of 3 s denaturation
at 95°C, 20 s annealing and elongation at 60°C.
Validation of CmCGG methylation level in gDNA with HpaII-qPCR
assay.
1 µg mESC gDNA was incubated with 50 units of HpaII (NEB, 50 units/µL) and
1X CutSmart buffer in 50 µL reaction for 16 hours at 37°C. No HpaII was added for control
reaction. 1 µL Proteinase K was added to the reaction and incubated at 40°C for 30 minutes
followed by inactivation of Proteinase K for 10 minutes at 95°C. Ct value of HpaII digested
sample or control sample was measured by qPCR assay as above with corresponding primer
sets for specific CCGG positions (listed in Table 9).
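The text does not state how the Ct values are turned into a methylation level. One common way to do it, shown here purely as an assumption, is the ΔCt method: HpaII cuts only unmethylated CCGG, so the template that survives digestion relative to the undigested control approximates the methylated fraction, 2^-(Ct_digested - Ct_control) at 100% PCR efficiency. The function name and the example Ct values are hypothetical.

```python
def methylation_from_ct(ct_digested, ct_control, efficiency=2.0):
    """Estimate the methylated fraction at a CCGG site from qPCR Ct values.

    Assumptions (not stated in the source): the per-cycle amplification
    factor is `efficiency` (2.0 means perfect doubling) and the undigested
    control represents 100% of the template. The result is capped at 1.0
    because measurement noise can give Ct_digested < Ct_control.
    """
    delta_ct = ct_digested - ct_control
    fraction_intact = efficiency ** (-delta_ct)
    return min(fraction_intact, 1.0)


if __name__ == "__main__":
    # Hypothetical Ct values: one extra cycle after digestion means about half intact.
    print(methylation_from_ct(26.0, 25.0))
```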
Sanger sequencing
The PCR product was purified by Exonuclease I and Shrimp Alkaline Phosphatase
(New England Biolabs) or Zymo-Spin column and processed for Sanger sequencing.
DNA damage test on fragments with different length.
mESC genomic DNA was spiked-in with 0.5% of CpG methylated lambda DNA
and left unfragmented or sonicated with Covaris M220 instrument and size-selected to 500 bp-1
kb or 1 kb-3 kb on Ampure XP beads. 200 ng of DNA were single-oxidised with mTET1CD
and reduced with pyridine borane complex as described above or converted with EpiTect
Bisulfite Kit (Qiagen) according to manufacturer’s protocol. 10 ng of DNA before and after
TAPS and bisulfite conversion were run on 1% agarose gel. To visualize bisulfite-converted
samples, the gel was cooled down for 10 min in an ice bath. 5mC conversion in TAPS samples was
tested by TaqαI digestion assay as described above.
mESCs culture and isolation of genomic DNA
Mouse ESCs (mESCs) E14 were cultured on gelatin-coated plates in Dulbecco’s
Modified Eagle Medium (DMEM) (Invitrogen) supplemented with 15% FBS (Gibco), 2 mM
L-glutamine (Gibco), 1% non-essential amino acids (Gibco), 1% penicillin/streptomycin
(Gibco), 0.1 mM β-mercaptoethanol (Sigma), 1000 units/mL LIF (Millipore), 1 µM
PD0325901 (Stemgent), and 3 µM CHIR99021 (Stemgent). Cultures were maintained at
37°C and 5% CO2 and passaged every 2 days.
For isolation of genomic DNA, cells were harvested by centrifugation for 5 min at
1000 x g and room temperature. DNA was extracted with Quick-DNA Plus kit (Zymo
Research) according to manufacturer’s protocol.
Preparation of mESC gDNA for TAPS and WGBS.
For whole-genome bisulfite sequencing (WGBS), mESC gDNA was spiked-in
with 0.5% of unmethylated lambda DNA. For whole-genome TAPS, mESC gDNA was
spiked-in with 0.5% of methylated lambda DNA and 0.025% of unmodified 2 kb spike-in
control. DNA samples were fragmented by Covaris M220 instrument and size-selected to
200-400 bp on Ampure XP beads. DNA for TAPS was additionally spiked-in with 0.25% of
N5mCNN and N5hmCNN control oligo after size-selection with Ampure XP beads.
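The spike-in percentages above translate into very small absolute amounts. As a worked example, the sketch below computes the mass of each TAPS spike-in for a given gDNA input, assuming the percentages are weight-by-weight relative to the gDNA mass (the text does not say so explicitly); the names used are hypothetical.

```python
# TAPS spike-in levels from the text, in percent of gDNA mass (assumed w/w).
TAPS_SPIKE_INS_PERCENT = {
    "methylated lambda DNA": 0.5,
    "unmodified 2 kb control": 0.025,
    "N5mCNN/N5hmCNN control oligo": 0.25,
}


def spike_in_masses_ng(gdna_ng, percents=TAPS_SPIKE_INS_PERCENT):
    """Return the mass (ng) of each spike-in for a given gDNA input (ng)."""
    return {name: gdna_ng * percent / 100.0 for name, percent in percents.items()}


if __name__ == "__main__":
    for name, mass in spike_in_masses_ng(100.0).items():
        print(f"{name}: {mass:.3f} ng")
```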
Whole Genome Bisulfite Sequencing
For Whole Genome Bisulfite Sequencing (WGBS), 200 ng of fragmented mESC
gDNA spiked-in with 0.5% of unmethylated bacteriophage lambda DNA was used. End-
repair and A-tailing reaction and ligation of methylated adapter (NextFlex) were prepared
with KAPA HyperPlus kit (Kapa Biosystems) according to manufacturer’s protocol.
Subsequently, DNA underwent bisulfite conversion with EpiTect Bisulfite Kit (Qiagen)
according to Illumina’s protocol. Final library was amplified with KAPA Hifi Uracil Plus
Polymerase (Kapa Biosystems) for 6 cycles and cleaned-up on 1X Ampure beads. WGBS
sequencing library was paired-end 80 bp sequenced on a NextSeq 500 sequencer (Illumina)
using a NextSeq High Output kit with 15% PhiX control library spike-in.
Whole-genome TAPS
For whole genome TAPS, 100 ng of fragmented mESC gDNA spiked-in with
0.5% of methylated lambda DNA and 0.025% of unmodified 2 kb spike-in control were used.
End-repair and A-tailing reaction and ligation of Illumina Multiplexing adapters were
prepared with KAPA HyperPlus kit according to manufacturer’s protocol. Ligated DNA was
oxidized with mTET1CD twice and then reduced with pyridine borane according to the
protocols described above. Final sequencing library was amplified with KAPA Hifi Uracil
Plus Polymerase for 5 cycles and cleaned-up on 1X Ampure beads. Whole-genome TAPS
sequencing library was paired-end 80 bp sequenced on a NextSeq 500 sequencer (Illumina)
using one NextSeq High Output kit with 1% PhiX control library spike-in.
Low-input whole-genome TAPS with dsDNA library preparation kits
mESC gDNA prepared as described above for whole-genome TAPS was used for
low-input whole-genome TAPS. Briefly, samples containing 100 ng, 10 ng, and 1 ng of
mESC gDNA were oxidized with NgTET1 once according to the protocol described above.
End-repaired and A-tailing reaction and ligation were performed with NEBNext Ultra II
(New England Biolabs) or KAPA HyperPlus kit according to manufacturer’s protocol.
Subsequently DNA underwent pic-borane reaction as described above. Converted libraries
were amplified with KAPA Hifi Uracil Plus Polymerase and cleaned-up on 1X Ampure
beads.
Low-input whole-genome TAPS with ssDNA library preparation kit
mESC gDNA prepared as described above for whole-genome TAPS was used for
low-input whole-genome TAPS. Briefly, samples containing 100 ng, 10 ng, 1 ng, 100 pg,
and 10 pg of mESC gDNA were oxidized with NgTET1 once and reduced with pic-borane as
described above. Sequencing libraries were prepared with Accel-NGS Methyl-Seq DNA
Library Kit (Swift Biosciences) according to manufacturer’s protocol. Final libraries were
amplified with KAPA Hifi Uracil Plus Polymerase for 6 cycles (100 ng), 9 cycles (10 ng), 13
cycles (1 ng), 16 cycles (100 pg), and 21 cycles (10 pg) and cleaned-up on 0.85X Ampure
beads.
In other experiments, mESC gDNA prepared as described above for whole-
genome TAPS were used for low-input whole-genome TAPS. Briefly, samples containing
100 ng, 10 ng, and 1 ng of mESC gDNA were used for End-repaired and A-tailing reaction
and ligated to Illumina Multiplexing adaptors with KAPA HyperPlus kit according to
manufacturer’s protocol. Ligated samples were then oxidized with mTET1CD once and then
reduced with pyridine borane according to the protocols described above. Converted libraries
were amplified with KAPA Hifi Uracil Plus Polymerase for 5 cycles (100 ng), 8 cycles (10
ng), and 13 cycles (1 ng) and cleaned-up on 1X Ampure XP beads.
Cell-free DNA TAPS
Cell-free DNA TAPS samples were prepared from 10 ng and 1 ng of cell-free
DNA sample. Briefly, samples were oxidized with NgTET1 once and reduced with pic-
borane as described above. Sequencing libraries were prepared with Accel-NGS Methyl-Seq
DNA Library Kit (Swift Biosciences) according to manufacturer’s protocol. Final libraries
were amplified with KAPA Hifi Uracil Plus Polymerase for 9 cycles (10 ng) and 13 cycles (1
ng) and cleaned-up on 0.85X Ampure beads.
In other experiments, cell-free DNA TAPS samples were prepared from 10 ng and
1 ng of cell-free DNA sample as described above for whole-genome TAPS. Briefly, cell-free
DNA samples were used for End-repaired and A-tailing reaction and ligated to Illumina
Multiplexing adaptors with KAPA HyperPlus kit according to manufacturer’s protocol.
Ligated samples were then oxidized with mTET1CD once and then reduced with pyridine
borane according to the protocols described above. Converted libraries were amplified with
KAPA Hifi Uracil Plus Polymerase for 7 cycles (10 ng), and 13 cycles (1 ng) and cleaned-up
on 1X Ampure XP beads.
WGBS data processing
Paired-end reads were downloaded as FASTQ from Illumina BaseSpace and
subsequently quality-trimmed with Trim Galore! v0.4.4
(https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Read pairs where at
least one read was shorter than 35 bp after trimming were removed. Trimmed reads were
mapped to a genome combining the mm9 version of the mouse genome, lambda phage and
PhiX (sequence from Illumina iGENOMES) using Bismark v0.19 using --no_overlap option
(F. Krueger, S. R. Andrews, Bismark: a flexible aligner and methylation caller for Bisulfite-
Seq applications. Bioinformatics 27, 1571-1572 (2011), incorporated herein by reference).
The ‘three-C’ filter was used to remove reads with excessive non-conversion rates. PCR
duplicates were called using Picard v1.119 (http://broadinstitute.github.io/picard/)
MarkDuplicates. Regions known to be prone to mapping artefacts were downloaded
(https://sites.google.com/site/anshulkundaje/projects/blacklists) and excluded from further
analysis (E. P. Consortium, An integrated encyclopedia of DNA elements in the human
genome. Nature 489, 57-74 (2012), incorporated herein by reference).
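The read-pair length filter described above ("at least one read shorter than 35 bp removes the pair") is simple enough to state as code. The following is a plain-Python illustration of that rule on in-memory sequences, not the trimming tool's own implementation; the function names are hypothetical.

```python
MIN_READ_LENGTH = 35  # threshold stated in the text, in bases


def keep_pair(read1_seq, read2_seq, min_len=MIN_READ_LENGTH):
    """A pair is kept only if BOTH mates are at least min_len bases long."""
    return len(read1_seq) >= min_len and len(read2_seq) >= min_len


def filter_pairs(pairs, min_len=MIN_READ_LENGTH):
    """Drop every (read1, read2) pair in which either mate is too short."""
    return [pair for pair in pairs if keep_pair(pair[0], pair[1], min_len)]


if __name__ == "__main__":
    pairs = [
        ("A" * 80, "C" * 80),   # kept
        ("A" * 80, "C" * 34),   # dropped: read 2 is too short
        ("A" * 35, "C" * 35),   # kept: exactly at the threshold
    ]
    print(len(filter_pairs(pairs)))
```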
TAPS data pre-processing
Paired-end reads were downloaded from Illumina BaseSpace and subsequently
quality-trimmed with Trim Galore! v0.4.4. Read pairs where at least one read was shorter
than 35 bp after trimming were removed. Trimmed reads were mapped to a genome
combining spike-in sequences, lambda phage and the mm9 version of the mouse genome
using BWA mem 15 (H. Li, R. Durbin, Fast and accurate short read alignment with
Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009), incorporated herein by
reference) with default parameters. Regions known to be prone to mapping artefacts were
downloaded (https://sites.google.com/site/anshulkundaje/projects/blacklists) and excluded
from further analysis (E. P. Consortium, Nature 489, 57-74 (2012)).
Detection of converted bases in TAPS
Aligned reads were split into original top (OT) and original bottom (OB) strands
using a custom python3 script (MF-filter.py). PCR duplicates were then removed with Picard
MarkDuplicates on OT and OB separately. Overlapping segments in read pairs were
removed using BamUtil clipOverlap (https://github.com/statgen/bamUtil) on the
deduplicated, mapped OT and OB reads separately. Modified bases were then detected using
samtools mpileup and a custom python3 script (MF-caller_MOD.py).
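The two custom steps above can be illustrated with a minimal sketch. It is not the MF-filter.py or MF-caller_MOD.py code, which is not reproduced in this document. It assumes a directional paired-end library, so that a pair with read 1 on the forward strand (or read 2 on the reverse strand) derives from OT, and it assumes pileup base counts are given in forward-strand coordinates, so that a converted base appears as C-to-T on OT and as G-to-A on OB.

```python
def origin_strand(flag: int) -> str:
    """Assign an alignment to OT or OB from its SAM flag.

    Assumption: directional library, so read 1 forward or read 2 reverse
    means the fragment came from the original top strand.
    """
    is_reverse = bool(flag & 0x10)  # SAM flag bit: read mapped to reverse strand
    is_read2 = bool(flag & 0x80)    # SAM flag bit: second read in the pair
    return "OT" if is_reverse == is_read2 else "OB"


def modification_level(strand: str, base_counts: dict) -> float:
    """Fraction of converted reads at one cytosine position.

    base_counts holds forward-strand base calls from a pileup. In TAPS the
    modified base is the one that converts: C->T on OT, G->A on OB.
    """
    unconverted, converted = ("C", "T") if strand == "OT" else ("G", "A")
    n_conv = base_counts.get(converted, 0)
    total = n_conv + base_counts.get(unconverted, 0)
    if total == 0:
        raise ValueError("no informative reads at this position")
    return n_conv / total
```

For example, a position covered by 2 C and 8 T calls on OT reads would be reported as 80% modified under these assumptions.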
Sequencing quality analysis of TAPS and WGBS
Quality score statistics per nucleotide type were extracted from original FASTQ
files as downloaded from Illumina BaseSpace with a python3 script (MF-phredder.py).
Coverage analysis of TAPS and WGBS
Per-base genome coverage files were generated with Bedtools v2.25 genomecov
(A. R. Quinlan, I. M. Hall, BEDTools: a flexible suite of utilities for comparing genomic
features. Bioinformatics 26, 841-842 (2010), incorporated herein by reference). To compare
the relative coverage distributions between TAPS and WGBS, TAPS reads were subsampled
to the corresponding coverage median in WGBS using the -s option of samtools view. In the
analyses comparing coverage in WGBS and subsampled TAPS, clipOverlap was used on
both TAPS and WGBS bam files.
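As a rough illustration of the subsampling step, the value passed to the -s option of samtools view can be derived from the two coverage medians. The sketch below assumes the usual samtools convention that the integer part of the argument seeds the random generator and the decimal part is the fraction of read pairs to keep; the seed of 42 and the function name are arbitrary choices for the example, and the medians 13 and 22 are the WGBS and full-depth TAPS medians reported in Table 8.

```python
def subsample_arg(target_median: float, observed_median: float, seed: int = 42) -> str:
    """Build a SEED.FRACTION string for `samtools view -s`.

    The fraction is the ratio of the target coverage median to the
    observed one, written with four decimal places.
    """
    fraction = target_median / observed_median
    if not 0 < fraction < 0.9999:
        raise ValueError("fraction must lie strictly between 0 and 1")
    digits = f"{fraction:.4f}".split(".")[1]
    return f"{seed}.{digits}"


# WGBS median 13 and TAPS median 22 give a keep-fraction of about 0.59.
arg = subsample_arg(13, 22)
```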
Analysis of cytosine modifications measured by TAPS and WGBS
The fraction of modified reads per base was calculated from Bismark output, and
the output of MF-caller_MOD.py, respectively. Intersections were performed using Bedtools
intersect, and statistical analyses and figures were generated in R and Matlab. Genomic
regions were visualized using IGV v2.4.6 (J. T. Robinson et al., Integrative genomics viewer.
Nat. Biotechnol. 29, 24-26 (2011), incorporated herein by reference). To plot the coverage
and modification levels around CGIs, all CGI coordinates for mm9 were downloaded from
the UCSC genome browser, binned into 20 windows, and extended by up to 50 windows of
size 80 bp on both sides (as long as they did not reach half the distance to the next CGI).
Average modification levels (in CpGs) and coverage (in all bases, both strands) in each bin
were computed using Bedtools map. The values for each bin were again averaged and
subsequently plotted in Matlab.
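The CGI windowing described above can be sketched as follows. This is one possible reading of the procedure, not the code actually used: each CGI body is split into 20 near-equal windows, and flanking 80 bp windows are added on each side until either 50 windows are reached or the next window would cross the midpoint to the neighbouring CGI. All function names are hypothetical.

```python
def cgi_bins(start: int, end: int, n_bins: int = 20):
    """Split the interval [start, end) into n_bins near-equal windows."""
    edges = [start + (end - start) * i // n_bins for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


def flank_count(gap_to_neighbour: int, window: int = 80, max_windows: int = 50) -> int:
    """Number of flanking windows that fit within half the gap to the next CGI."""
    return min(max_windows, (gap_to_neighbour // 2) // window)


def flank_windows(cgi_start, cgi_end, gap_up, gap_down, window=80, max_windows=50):
    """Return (upstream, downstream) flanking windows in genomic order."""
    n_up = flank_count(gap_up, window, max_windows)
    n_down = flank_count(gap_down, window, max_windows)
    up = [(cgi_start - (i + 1) * window, cgi_start - i * window) for i in range(n_up)]
    down = [(cgi_end + i * window, cgi_end + (i + 1) * window) for i in range(n_down)]
    return up[::-1], down


body = cgi_bins(1000, 2000)
upstream, downstream = flank_windows(1000, 2000, gap_up=1000, gap_down=10000)
```

In this toy case the upstream neighbour is only 1000 bp away, so only 6 upstream windows are produced, while the downstream side receives the full 50.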
Data processing time simulation
Synthetic paired-end sequencing reads were simulated using ART42 based on the
lambda phage genome (with parameters -p -ss NS50 --errfree --minQ 15 -k 0 -nf 0 -l 75 -c
0 -m 240 -s 0 -ir 0 -ir2 0 -dr 0 -dr2 0 -sam -rs 10). 50% of all CpG positions were
subsequently marked as modified and two libraries were created, either as TAPS (convert
modified bases) or as WGBS (convert unmodified bases), using a custom python3 script. The
reads were then processed following the pipeline used for each of the methods in the paper.
Processing time was measured with the Linux command time. All steps of the analysis were
performed in single-threaded mode on one Intel Xeon CPU with 250GB of memory.
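The difference between the two simulated libraries can be made concrete with a small sketch. It is not the custom script referred to above; it only illustrates the stated rule that TAPS converts modified cytosines while WGBS converts unmodified ones, applied here to a single forward-strand read with a hypothetical set of modified positions.

```python
def convert_read(seq: str, modified: set, method: str) -> str:
    """Apply an in-silico C-to-T conversion to one read.

    modified: 0-based positions of modified cytosines in seq.
    method:   "TAPS" converts modified C; "WGBS" converts unmodified C.
    """
    if method not in ("TAPS", "WGBS"):
        raise ValueError("method must be 'TAPS' or 'WGBS'")
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            is_modified = i in modified
            if (method == "TAPS") == is_modified:
                base = "T"
        out.append(base)
    return "".join(out)


read = "ACGTCGC"
taps_read = convert_read(read, {1}, "TAPS")  # only the modified C at index 1 changes
wgbs_read = convert_read(read, {1}, "WGBS")  # every other C changes instead
```

The TAPS read keeps most of its original cytosines, which is consistent with the balanced base composition discussed in the results.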
Results and Discussion
It was discovered that pic-BH3 can readily convert 5fC and 5caC to DHU by a
previously unknown reductive decarboxylation/deamination reaction (Fig. 4). The reaction
was shown to be quantitative both in single nucleoside and in oligonucleotides using MALDI
(Figs. 2-3, and 6-7).
An 11mer 5caC-containing DNA oligo was used as a model to screen chemicals
that could react with 5caC, as monitored by matrix-assisted laser desorption/ionization mass
spectroscopy (MALDI). Certain borane-containing compounds were found to efficiently
react with the 5caC oligo, resulting in a molecular weight reduction of 41 Da (Figs. 1 and 2).
Pyridine borane and its derivative 2-picoline borane (pic-borane) were selected for further
study as they are commercially available and environmentally benign reducing agents.
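The 41 Da shift is what one would expect for replacing a 5caC base with DHU. As a quick check, assuming the standard base formulas C5H5N3O3 for 5-carboxylcytosine and C4H6N2O2 for dihydrouracil and rounded average atomic masses, the difference can be computed directly; the rest of the oligo is unchanged, so the same difference applies to the whole molecule.

```python
# Rounded average atomic masses in Da.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(formula: dict) -> float:
    """Sum the average atomic masses for an elemental composition."""
    return sum(AVERAGE_MASS[element] * count for element, count in formula.items())


FIVE_CAC = {"C": 5, "H": 5, "N": 3, "O": 3}  # 5-carboxylcytosine base
DHU = {"C": 4, "H": 6, "N": 2, "O": 2}       # dihydrouracil base

# Mass lost when one 5caC base is converted to DHU (about 41 Da).
shift = formula_mass(FIVE_CAC) - formula_mass(DHU)
```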
The reaction was repeated on a single 5caC nucleoside, which confirmed that pyridine
borane and pic-borane convert 5caC to dihydrouracil (DHU) (Figs. 3, 4B). Interestingly,
pyridine borane and pic-borane were found to also convert 5fC to DHU through an apparent
reductive decarboxylation/deamination mechanism (Figs. 4C and 6). The detailed
mechanism of both reactions remains to be defined. Quantitative analysis of the borane
reaction on the DNA oligo by HPLC-MS/MS confirms that pic-borane converts 5caC and
5fC to DHU with around 98% efficiency and has no activity against unmethylated cytosine,
5mC or 5hmC (Fig. 2B).
As a uracil derivative, DHU can be recognized by both DNA and RNA
polymerases as thymine. Therefore, borane reduction can be used to induce both 5caC-to-T
and 5fC-to-T transitions, and can be used for base-resolution sequencing of 5fC and 5caC,
which we termed Pyridine borane Sequencing (“PS”) (Table 6). The borane reduction of 5fC
and 5caC to T can be blocked through hydroxylamine conjugation (C. X. Song et al.,
Genome-wide profiling of 5-formylcytosine reveals its roles in epigenetic priming. Cell 153,
678-691 (2013), incorporated herein by reference) and EDC coupling (X. Lu et al., Chemical
modification-assisted bisulfite sequencing (CAB-Seq) for 5-carboxylcytosine detection in
DNA. J. Am. Chem. Soc. 135, 9315-9317 (2013), incorporated herein by reference),
respectively (Fig. 6). This blocking allows PS to be used to sequence 5fC or 5caC
specifically (Table 6).
Table 6. Comparison of BS and related methods versus PS for 5fC and 5caC sequencing.
Base BS fCAB-Seq/redBS-Seq caCAB-Seq fC-CET PS PS with 5fC blocking PS with 5caC blocking
C C C C
5mC C C C C C
5hmC C C C C C
5fC T C T
5caC T T T C
Furthermore, TET enzymes can be used to oxidize 5mC and 5hmC to 5caC, and
then subject 5caC to borane reduction in a process herein called TET-Assisted Pyridine
borane Sequencing (“TAPS”) (Fig. 5A-B, Table 1). TAPS can induce a C-to-T transition of
5mC and 5hmC, and therefore can be used for base-resolution detection of 5mC and 5hmC.
In addition, β-glucosyltransferase (βGT) can label 5hmC with glucose and thereby
protect it from TET oxidation (M. Yu et al., Base-resolution analysis of 5-hydroxymethylcytosine
in the mammalian genome. Cell 149, 1368-1380 (2012)) and borane
reduction (Fig. 7), enabling the selective sequencing of only 5mC, in a process referred to
herein as TAPSβ (Fig. 5B, Table 1). 5hmC sites can then be deduced by subtraction of
TAPSβ from TAPS measurements. Alternatively, potassium perruthenate (KRuO4), a
reagent previously used in oxidative bisulfite sequencing (oxBS) (M. J. Booth et al.,
Quantitative Sequencing of 5-Methylcytosine and 5-Hydroxymethylcytosine at Single-Base
Resolution. Science 336, 934-937 (2012)), can be used to replace TET as a chemical oxidant
to specifically oxidize 5hmC to 5fC (Fig. 7). This approach, referred to herein as
Chemical-Assisted Pyridine borane Sequencing (“CAPS”), can be used to sequence 5hmC specifically
(Fig. 5B, Table 1). Therefore, TAPS and related methods can in principle offer a
comprehensive suite to sequence all four cytosine epigenetic modifications (Fig. 5B, Table 1,
Table 6).
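The subtraction used to deduce 5hmC can be illustrated with a short sketch. It assumes per-site modification levels (fractions between 0 and 1) from a TAPS run and a TAPSβ run keyed by genomic position; the function name, the restriction to sites present in both runs, and the flooring of negative differences at zero are choices made for this example rather than part of the described method.

```python
def deduce_5hmc(taps: dict, taps_beta: dict) -> dict:
    """Estimate per-site 5hmC as the TAPS level minus the TAPSβ level.

    TAPS reports 5mC + 5hmC, TAPSβ reports 5mC only, so the difference
    approximates 5hmC. Only sites measured in both runs are returned.
    """
    shared = taps.keys() & taps_beta.keys()
    return {site: max(0.0, taps[site] - taps_beta[site]) for site in shared}


taps_levels = {"chr1:100": 0.9, "chr1:200": 0.5, "chr1:300": 0.2}
beta_levels = {"chr1:100": 0.6, "chr1:200": 0.5, "chr1:400": 0.1}
hmc = deduce_5hmc(taps_levels, beta_levels)
```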
TAPS alone will detect the existing 5fC and 5caC in the genome as well.
However, given the extremely low levels of 5fC and 5caC in genomic DNA under normal
conditions, this will be acceptable. If, under certain conditions, one would like to eliminate
the 5fC and 5caC signals completely, this can also be readily accomplished by protecting the
5fC and 5caC by hydroxylamine conjugation and EDC coupling, respectively, thereby
preventing conversion to DHU.
The performance of TAPS was evaluated in comparison with bisulfite sequencing,
the current standard and most widely used method for base-resolution mapping of 5mC and
5hmC. Naegleria TET-like oxygenase (NgTET1) and mouse Tet1 (mTet1) were used
because both can efficiently oxidize 5mC to 5caC in vitro. To confirm the 5mC-to-T
transition, TAPS was applied to model DNA containing fully methylated CpG sites and
showed that it can effectively convert 5mC to T, as demonstrated by restriction enzyme
digestion (Fig. 8A-B) and Sanger sequencing (Fig. 9A). TAPSβ and CAPS were also
validated by Sanger sequencing (Fig. 12).
TAPS was also applied to genomic DNA (gDNA) from mouse embryonic stem
cells (mESCs). HPLC-MS/MS quantification showed that, as expected, 5mC accounts for
98.5% of cytosine modifications in the mESC gDNA; the remainder is composed of 5hmC
(1.5%) and trace amounts of 5fC and 5caC, and no DHU (Fig. 9B). After TET oxidation, about
96% of cytosine modifications were oxidized to 5caC and 3% were oxidized to 5fC (Fig. 9B).
After borane reduction, over 99% of the cytosine modifications were converted into DHU
(Fig. 9B). These results demonstrate that both TET oxidation and borane reduction work
efficiently on genomic DNA.
Both TET oxidation and borane reduction are mild reactions, with no notable DNA
degradation compared to bisulfite (Fig. 10A-D), and thereby provide high DNA recovery.
Another notable advantage over bisulfite sequencing is that TAPS is non-destructive and can
preserve DNA up to 10 kb long (Fig. 10C). Moreover, DNA remains double stranded after
TAPS (Fig. 10A-C), and the conversion is independent of the DNA length (Fig. 15A-B).
In addition, because DHU is close to a natural base, it is compatible with various
DNA polymerases and isothermal DNA or RNA polymerases (Figs. 13A-B) and does not
show a bias compared to T/C during PCR (Fig. 14).
Whole genome sequencing was performed on two samples of mESC gDNA, one
converted using TAPS and the other using standard whole-genome bisulfite sequencing
(WGBS) for comparison.
To assess the accuracy of TAPS, spike-ins of different lengths were added that
were either fully unmodified, or in vitro methylated using CpG Methyltransferase (M.SssI) or
GpC Methyltransferase (M.CviPI) (using the above methods). For short spike-ins (120mer-1
and 120mer-2) containing 5mC and 5hmC, near complete conversion was observed for both
modifications on both strands in both CpG and non-CpG contexts (Fig. 17A-B).
100 ng gDNA was used for TAPS, compared to 200 ng gDNA for WGBS. To
assess the accuracy of TAPS, we added three different types of spike-in controls. Lambda
DNA where all CpGs were fully methylated was used to estimate the false negative rate
(non-conversion rate of 5mC); a 2 kb unmodified amplicon was used to estimate the false positive
rate (conversion rate of unmodified C); synthetic oligo spike-ins containing both a methylated
and hydroxymethylated C surrounded by any other base (N5mCNN and N5hmCNN,
respectively) were used to compare the conversion rate on 5mC and 5hmC in different
sequence contexts. The combination of mTet1 and pyridine borane achieved the highest 5mC
conversion rate (96.5% and 97.3% in lambda and synthetic spike-ins, respectively) and the
lowest conversion rate of unmodified C (0.23%) (Fig. 18A-B and Fig. 16). A false negative
rate between 2.7% and 3.5%, with a false-positive rate of only 0.23%, is comparable to
bisulfite sequencing: a recent study showed 9 commercial bisulfite kits had average false
negative and false positive rates of 1.7% and 0.6%, respectively (Holmes, E.E. et al.
Performance evaluation of kits for bisulfite-conversion of DNA from tissues, cell lines, FFPE
tissues, aspirates, lavages, effusions, plasma, serum, and urine. PLoS One 9, e93933 (2014)).
The synthetic spike-ins suggest that TAPS works well on both 5mC and 5hmC, and that
TAPS performs only slightly worse in non-CpG contexts. The conversion for 5hmC is 8.2%
lower than 5mC, and the conversion for non-CpG contexts is 11.4% lower than for CpG
contexts (Fig. 18A).
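The false negative range quoted above follows directly from the two 5mC conversion rates, since the false negative rate is defined here as the non-conversion rate of 5mC. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def false_negative_rate(conversion_rate_pct: float) -> float:
    """Non-conversion rate of 5mC, in percent, rounded to one decimal place."""
    return round(100.0 - conversion_rate_pct, 1)


# 5mC conversion rates reported for the lambda and synthetic spike-ins.
fn_lambda = false_negative_rate(96.5)
fn_synthetic = false_negative_rate(97.3)
```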
WGBS data requires special software both for the alignment and
modification-calling steps. In contrast, our processing pipeline uses a standard genomic aligner (bwa),
followed by a custom modification-calling tool that we call “asTair”. When processing
simulated WGBS and TAPS reads (derived from the same semi-methylated source sequence),
TAPS/asTair was more than 3x faster than WGBS/Bismark (Fig. 18C).
Due to the conversion of nearly all cytosine to thymine, WGBS libraries feature an
extremely skewed nucleotide composition which can negatively affect Illumina sequencing.
Consequently, WGBS reads showed substantially lower sequencing quality scores at
cytosine/guanine base pairs compared to TAPS (Fig. 18E). To compensate for the nucleotide
composition bias, at least 10 to 20% PhiX DNA (a base-balanced control library) is
commonly added to WGBS libraries (see, e.g., Illumina’s Whole-Genome Bisulfite
Sequencing on the HiSeq 3000/HiSeq 4000 Systems). Accordingly, we supplemented the
WGBS library with 15% PhiX. This, in combination with the reduced information content of
BS-converted reads, and DNA degradation as a result of bisulfite treatment, resulted in
significantly lower mapping rates for WGBS compared to TAPS (Fig. 18D and Table 7).
Table 7. Mapping and sequencing quality statistics for WGBS and TAPS.
Measure | WGBS | TAPS |
---|---|---|
Total raw reads | 376062375 | 455548210 |
Trimmed reads | 367860813 | 453028186 |
Mapped reads (mm9+spike-ins+PhiX) | 251940139 | 451077132 |
PCR deduplicated reads | 232303596 | 851 |
Mapping rate (mapped reads/trimmed reads) | 68.49% | 99.57% |
Unique mapping rate (unique reads [MAPQ>0 for TAPS]/trimmed reads) | 68.49% | 88.08% |
Unique PCR deduplicated mapping rate (unique PCR deduplicated reads [MAPQ>0 for TAPS]/trimmed reads) | 63.15% | 81.31% |
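The percentage rows of Table 7 are ratios of the read counts above them, each taken relative to the number of trimmed reads. The sketch below recomputes the rates that can be derived from the counts given in the table; the dictionary layout and names are only for illustration.

```python
# Read counts taken from Table 7.
COUNTS = {
    "WGBS": {"trimmed": 367860813, "mapped": 251940139, "dedup": 232303596},
    "TAPS": {"trimmed": 453028186, "mapped": 451077132},
}


def rate(numerator: int, denominator: int) -> float:
    """Percentage of numerator over denominator, rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


wgbs_mapping = rate(COUNTS["WGBS"]["mapped"], COUNTS["WGBS"]["trimmed"])
taps_mapping = rate(COUNTS["TAPS"]["mapped"], COUNTS["TAPS"]["trimmed"])
wgbs_dedup = rate(COUNTS["WGBS"]["dedup"], COUNTS["WGBS"]["trimmed"])
```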
Therefore, for the same sequencing cost (one NextSeq High Output run), the
average depth of TAPS exceeded that of WGBS (21X and 13.1X, respectively; Table 8).
Furthermore, TAPS resulted in fewer uncovered regions, and overall showed a more even
coverage distribution, even after down-sampling to the same sequencing depth as WGBS
(inter-quartile range: 9 and 11, respectively, Fig. 19A and Table 8).
Table 8. Coverage statistics for TAPS, WGBS and TAPS down-sampled to have
approximately the same mean coverage as WGBS. Here, coverage was computed for both
strands at all positions in the genome.
Measure | WGBS | TAPS with down-sampling | TAPS without down-sampling |
---|---|---|---|
Mean | 13.078 | 12.411 | 21.001 |
Variance | 1988.242 | 482.242 | 1371.912 |
median | 13 | 13 | 22 |
qtl25 | 7 | 8 | 15 |
qtl75 | 18 | 17 | 28 |
iqr | 11 | 9 | 13 |
maximum | 116084 | 37329 | 63526 |
For example, CpG Islands (CGIs) in particular were generally better covered by TAPS, even
when controlling for differences in sequencing depth between WGBS and TAPS (Fig. 21A),
while both showed equivalent demethylation inside CGIs (Fig. 22). Moreover, WGBS
showed a slight bias of decreased modification levels in highly covered CpG sites (Fig. 23A),
while our results suggest that TAPS exhibits very little of the modification-coverage bias
(Fig. 23B). These results demonstrate that TAPS dramatically improved sequencing quality
compared to WGBS, while effectively halving the sequencing cost.
The higher and more even genome coverage of TAPS resulted in a larger number
of CpG sites covered by at least three reads. With TAPS, 88.3% of all 43,205,316 CpG sites
in the mouse genome were covered at this level, compared to only 77.5% with WGBS (Fig.
21B and 19B). TAPS and WGBS resulted in highly correlated methylation measurements
across chromosomal regions (Fig. 21D and Fig. 20). On a per-nucleotide basis, 32,755,271
CpG positions were covered by at least three reads in both methods (Fig. 21B). Within these
sites, we defined “modified CpGs” as all CpG positions with a modification level of at least
% (L. Wen et al., Whole-genome analysis of 5-hydroxymethylcytosine and
5-methylcytosine at base resolution in the human brain. Genome Biology 15, R49 (2014)).
Using this threshold, 95.8% of CpGs showed matching modification states between TAPS
and WGBS. 98.5% of all CpGs that were covered by at least three reads and found modified
in WGBS were recalled as modified by TAPS, indicating good agreement between WGBS
and TAPS (Fig. 21C). When comparing modification levels for each CpG covered by at
least three reads in both WGBS and TAPS, good correlation between TAPS and WGBS was
observed (Pearson r = 0.63, p < 2e-16, Fig. 21E). Notably, TAPS identified a subset of
highly modified CpG positions which were missed by WGBS (Fig. 21E, bottom right corner).
We further validated 7 of these CpGs, using an orthogonal restriction digestion and real-time
PCR assay, and confirmed all of them are fully methylated and/or hydroxymethylated (Table 9).
Table 9. Comparison of CmCGG methylation level in mESC gDNA quantified by TAPS,
WGBS and HpaII-qPCR assay. Coverage and methylation level (mC%) by TAPS and WGBS
were computed per strand. Ct value for HpaII digested sample (Ct_HpaII) or control sample
(Ct_ctrl) in the HpaII-qPCR assay was the average of triplicates. mC% is calculated using the
following equation: $\mathrm{mC\%} = 2^{(Ct_{\mathrm{ctrl}} - Ct_{\mathrm{HpaII}})} \times 100\%$.
Position of CmCGG | Ct_HpaII | Ct_ctrl | mC% (HpaII-qPCR assay) | Forward and reverse primer |
---|---|---|---|---|
chr6:135868201 | 29.628 | 29.642 | 101.0% | GCTGCAGATTGGAGCC AAAG TTGATGGTGATGGTGG |
chr3:31339449 | 22.162 | 22.111 | 96.5% | TCAGTGCTCATGGACTC ATACT TGGGAGCAAA GTTGTTG |
chr4:128271030 | 31.304 | 31.279 | 98.3% | CCCACTAGACATGCTCT GCC CAAAATGTTGCTTGCCT |
chr1:58635199 | 22.008 | 22.026 | 101.3% | TCCCTGAGCCCTGATCT AGT AATACTGGCTGACCGG |
chr14:36331351 | 21.228 | 21.053 | | ACACCACAGCAGAAGA GAGC TGTTGCACAG |
chr19:42893499 | 22.515 | 22.558 | 103.0% | GCTGAGCTGTATCCTTG AGGT GGGTATTCCA |
chr3:113611193 | 22.439 | 22.545 | 107.6% | GTGGATCTTCAGTGGTG GCA ATGCTCCCTCATCCTTT |
Negative CCGG site, chr19:9043049 | 27.11 | 21.409 | 1.9% | AGCCTCTGAACTTGACT GCC GCCTGGAACTCCTGAC |
Positive CCGG site, chr15:39335961 | | | 106.1% | GGTCCTTGATCCACCCA GAC ACATGGTGCTGGTCTA |
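The HpaII-qPCR read-out in Table 9 can be reproduced from the Ct values. The sketch below assumes that the table's equation reads mC% = 2^(Ct_ctrl - Ct_HpaII) x 100%, with the first Ct value of each row belonging to the HpaII-digested sample and the second to the control; this reading is consistent with the percentages listed, including the unmethylated negative control, where digestion raises the Ct by several cycles.

```python
def mc_percent(ct_ctrl: float, ct_hpaii: float) -> float:
    """Methylation level (%) from control and HpaII-digested Ct values.

    HpaII cuts unmethylated CCGG, so an unmethylated site amplifies later
    after digestion (higher Ct) and yields a low percentage.
    """
    return 2 ** (ct_ctrl - ct_hpaii) * 100.0


# Rows taken from Table 9 (Ct_ctrl, Ct_HpaII).
site_chr6 = round(mc_percent(29.642, 29.628), 1)
site_chr3 = round(mc_percent(22.111, 22.162), 1)
negative_control = round(mc_percent(21.409, 27.11), 1)
```

Values slightly above 100% simply reflect measurement noise in the Ct difference.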
Together, these results indicate that TAPS can directly replace WGBS, and in fact provides a
more comprehensive view of the methylome than WGBS.
Finally, TAPS was tested with low input DNA and TAPS was shown to work with
as little as 1 ng gDNA and in some instances down to 10 pg of gDNA, close to single-cell
level. TAPS also works effectively with down to 1 ng of circulating cell-free DNA. These
results demonstrate the potential of TAPS for low input DNA and clinical applications (Fig.
24A-C, Fig. 25A-B).
TAPS was tested on three circulating cell-free DNA samples (cfDNA) from one
healthy sample, one Barrett’s oesophagus (Barrett’s) sample, and one pancreatic cancer
sample that were obtained from 1-2 ml of plasma. Standard TAPS protocol was followed and
each sample sequenced to ~10X coverage. Analysis of the cfDNA TAPS results showed that
TAPS provided the same high-quality methylome sequencing from low-input cfDNA as from
bulk genomic DNA, including high 5mC conversion rate (Fig. 26A), low false positive rate
(conversion of unmodified cytosine, Fig. 26B), high mapping rate (Fig. 26C), and low PCR
duplication rate (Fig. 26D). These results demonstrate the power of TAPS for disease
diagnostics from cfDNA.
TAPS can also differentiate methylation from C-to-T genetic variants or single
nucleotide polymorphisms (SNPs), and therefore could detect genetic variants. Methylations and
C-to-T SNPs result in different patterns in TAPS: methylations result in T/G reads in original
top strand (OT)/original bottom strand (OB) and A/C reads in strands complementary to OT
(CTOT) and OB (CTOB), whereas C-to-T SNPs result in T/A reads in OT/OB and
(CTOB/CTOT) (Fig. 27). This further increases the utility of TAPS in providing both
methylation information and genetic variants, and therefore mutations, in one experiment and
sequencing run. This ability of the TAPS method disclosed herein provides integration of
genomic analysis with epigenetic analysis, and a substantial reduction of sequencing cost by
eliminating the need to perform standard whole genome sequencing (WGS).
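The strand logic described above can be written down as a small decision rule. The sketch assumes that a consensus base has already been called separately from OT-derived and OB-derived reads at a reference cytosine, each given in its own strand orientation as in the text (T/G for a modified C, T/A for a C-to-T variant); the function name and the "ambiguous" fallback are illustrative choices, and heterozygous or low-coverage sites are not handled.

```python
def classify_site(ot_base: str, ob_base: str) -> str:
    """Classify a reference C from the bases seen on OT and OB reads.

    A modification changes only the strand carrying the modified C,
    whereas a genetic C-to-T variant changes both strands.
    """
    if ot_base == "C" and ob_base == "G":
        return "unmodified C"
    if ot_base == "T" and ob_base == "G":
        return "modified C"
    if ot_base == "T" and ob_base == "A":
        return "C-to-T variant"
    return "ambiguous"


calls = [classify_site("T", "G"), classify_site("T", "A"), classify_site("C", "G")]
```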
In summary, we have developed a series of PS-derived bisulfite-free,
base-resolution sequencing methods for cytosine epigenetic modifications and demonstrated the
utility of TAPS for whole-methylome sequencing. By using mild enzymatic and chemical
reactions to detect 5mC and 5hmC directly at base-resolution with high sensitivity and
specificity without affecting unmodified cytosines, TAPS outperforms bisulfite sequencing in
providing a high quality and more complete methylome at half the sequencing cost. As such,
TAPS could replace bisulfite sequencing as the new standard in DNA methylcytosine and
hydroxymethylcytosine analysis. Rather than introducing a bulky modification on cytosines
as in the bisulfite-free 5fC sequencing method reported previously (B. Xia et al., Bisulfite-free,
base-resolution analysis of 5-formylcytosine at the genome scale. Nat. Methods 12,
1047-1050 (2015); C. Zhu et al., Single-Cell 5-Formylcytosine Landscapes of Mammalian Early
Embryos and ESCs at Single-Base Resolution. Cell Stem Cell 20, 720-731 (2017)), TAPS
converts modified cytosine into DHU, a near natural base, which can be “read” as T by
common polymerases and is potentially compatible with PCR-free DNA sequencing. TAPS
is compatible with a variety of downstream analyses, including but not limited to
pyrosequencing, methylation-sensitive PCR, restriction digestion, MALDI mass
spectrometry, microarray and whole-genome sequencing. Since TAPS can preserve long
DNA, it can be extremely valuable when combined with long read sequencing technologies,
such as SMRT sequencing and nanopore sequencing, to investigate certain difficult to map
regions. It is also possible to combine pull-down methods with TAPS to further reduce the
sequencing cost and add base-resolution information to the low-resolution affinity-based
maps. Herein, it was demonstrated that TAPS could directly replace WGBS in routine use
while reducing cost, complexity and time required for analysis. This could lead to wider
adoption of epigenetic analyses in academic research and clinical diagnostics.
Claims (44)
1. A method for converting 5-carboxylcytosine (5caC) and/or 5-formylcytosine (5fC) to dihydrouracil (DHU) comprising contacting a nucleic acid sample comprising 5caC and/or 5fC with a borane reducing agent.
2. The method of claim 1, wherein the borane reducing agent comprises an agent selected from the group consisting of 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride.
3. The method of claim 1, wherein the borane reducing agent comprises borane.
4. The method of claim 1, wherein the borane reducing agent comprises sodium borohydride.
5. The method of claim 1, wherein the borane reducing agent comprises sodium cyanoborohydride.
6. The method of claim 1, wherein the borane reducing agent comprises sodium triacetoxyborohydride.
7. The method of claim 1, wherein the borane reducing agent comprises 2-picoline borane.
8. The method of claim 1 comprising contacting the nucleic acid sample with an oxidizing agent prior to contacting with a borane reducing agent.
9. The method of claim 8, wherein the oxidizing agent is a ten-eleven translocation (TET) enzyme.
10. The method of claim 9, wherein the TET enzyme comprises human TET1, human TET2, human TET3, murine TET1, murine TET2, murine TET3, Naegleria TET (NgTET), Coprinopsis cinerea (CcTET), or derivatives or analogues thereof.
11. The method of claim 9, wherein the oxidizing agent comprises a chemical oxidizing agent.
12. The method of claim 11, wherein the chemical oxidizing agent comprises potassium perruthenate (KRuO4) or Cu(II)/TEMPO.
13. The method of claim 8, further comprising adding a blocking group to one or more modified cytosines in the nucleic acid sample.
14. The method of claim 13, wherein the blocking group is added prior to contacting with the oxidizing agent.
15. The method of claim 14, wherein the one or more modified cytosines comprises 5hmC.
16. The method of claim 12, wherein the blocking group comprises a sugar or a uridine diphosphate (UDP)-linked sugar.
17. The method of claim 13, wherein the blocking group is added after contacting with the oxidizing agent and prior to contacting with the borane reducing agent.
18. The method of claim 17, wherein the one or more modified cytosines comprises 5caC or 5fC.
19. The method of claim 18, wherein the blocking group comprises an aldehyde reactive compound.
20. The method of claim 19, wherein the aldehyde reactive compound comprises a hydroxylamine derivative, a hydrazine derivative, or a hydrazide derivative.
21. The method of claim 20, wherein adding the blocking group comprises contacting the nucleic acid sample with (i) a coupling agent and (ii) an amine, hydrazine, or hydroxylamine compound.
22. The method of claim 1, further comprising sequencing the nucleic acid sample after contacting with the borane reducing agent to identify converted cytosine bases.
23. A method for converting a modified cytosine to dihydrouracil (DHU) comprising: i) contacting a nucleic acid sample comprising a modified cytosine with an oxidizing agent to produce 5-carboxylcytosine (5caC) and/or 5-formylcytosine (5fC); and ii) contacting the nucleic acid sample comprising 5caC and/or 5fC with a borane reducing agent.
24. The method of claim 23, wherein the oxidizing agent is a ten-eleven translocation (TET) enzyme.
25. The method of claim 24, wherein the TET enzyme comprises human TET1, human TET2, human TET3, murine TET1, murine TET2, murine TET3, Naegleria TET (NgTET), Coprinopsis cinerea (CcTET), or derivatives or analogues thereof.
26. The method of claim 23, wherein the oxidizing agent comprises a chemical oxidizing agent.
27. The method of claim 26, wherein the oxidizing agent comprises potassium perruthenate (KRuO4) or Cu(II)/TEMPO.
28. The method of claim 23, wherein the borane reducing agent comprises an agent selected from the group consisting of 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride.
29. The method of claim 28, wherein the borane reducing agent comprises borane.
30. The method of claim 28, wherein the borane reducing agent comprises sodium borohydride.
31. The method of claim 28, wherein the borane reducing agent comprises sodium cyanoborohydride.
32. The method of claim 28, wherein the borane reducing agent comprises sodium triacetoxyborohydride.
33. The method of claim 28, wherein the borane reducing agent comprises 2-picoline borane.
34. The method of claim 23, further comprising adding a blocking group to one or more of the modified cytosines in the nucleic acid sample.
35. The method of claim 34, wherein the blocking group is added prior to contacting with the oxidizing agent.
36. The method of claim 35, wherein the one or more modified cytosines comprises 5hmC.
37. The method of claim 36, wherein the blocking group comprises a sugar or a uridine diphosphate (UDP)-linked sugar.
38. The method of claim 34, wherein the blocking group is added after contacting with the oxidizing agent and prior to contacting with the borane reducing agent.
39. The method of claim 38, wherein the one or more modified cytosines comprises 5caC or 5fC.
40. The method of claim 39, wherein the blocking group comprises an aldehyde reactive compound.
41. The method of claim 40, wherein the aldehyde reactive compound comprises a hydroxylamine derivative, a hydrazine derivative, or a hydrazide derivative.
42. The method of claim 41, wherein adding the blocking group comprises contacting the nucleic acid sample with (i) a coupling agent and (ii) an amine, hydrazine, or hydroxylamine compound.
43. The method of claim 23, further comprising sequencing the nucleic acid sample after contacting with the borane reducing agent to identify converted cytosine bases.
44. The method of claim 23, wherein the nucleic acid sample comprises DNA or RNA.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US62/614,798 | 2018-01-08 | ||
US62/660,523 | 2018-04-20 | ||
US62/771,409 | 2018-11-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
NZ793130A (en) | 2022-10-28 |