WO2024254003A1

WO2024254003A1 - Identification and mapping of methylation sites

Info

Publication number: WO2024254003A1
Application number: PCT/US2024/032260
Authority: WO
Inventors: Daniel Wolf; Xi-jun CHEN; Pietro Gatti Lafranconi
Original assignee: Illumina Inc
Current assignee: Illumina Inc
Priority date: 2023-06-05
Filing date: 2024-06-03
Publication date: 2024-12-12
Anticipated expiration: 2025-12-05
Also published as: WO2024254003A9

Abstract

This disclosure relates to methods of identifying and mapping methylation sites. In some embodiments, these methods incorporate cytosine deamination and base excision repair. In some embodiments, these methods incorporate use of hairpin adapters. Methods of sequence analysis are also described herein.

Description

IDENTIFICATION AND MAPPING OF METHYLATION SITES

DESCRIPTION

REFERENCE TO ELECTRONIC SEQUENCE LISTING

[001] The application contains a Sequence Listing which has been submitted electronically in .XML format and is hereby incorporated by reference in its entirety. Said .XML copy, created on May 30, 2024, is named “ILUM0158PCT.xml” and is 46,807 bytes in size. The sequence listing contained in this .XML file is part of the specification and is hereby incorporated by reference herein in its entirety.

FIELD

[002] This disclosure relates to methods of identifying and mapping methylation sites.

BACKGROUND

[003] Cytosine modifications, such as cytosine methylation, can alter gene expression, with methylated cytosines often being associated with transcriptional silencing. Cytosines modified at the 5th carbon position with a methyl group generate 5-methylcytosine (5mC) and oxidation of 5mC generates 5-hydroxymethylcytosine (5hmC).

[004] Bisulfite sequencing is widely used to detect 5mC and 5hmC at single base resolution from DNA samples, but this process involves extreme temperatures and pH that can trigger DNA degradation. Thus, bisulfite treatment can induce fragmentation, loss of DNA, and biased sequencing data. Furthermore, since cytosines are disproportionately damaged (as compared to 5mC or 5hmC) by bisulfite sequencing, resulting sequencing libraries have an unbalanced nucleotide composition with reduced mapping rates and skewed GC content representation. In summary, libraries generated using bisulfite sequencing often do not adequately cover the genome.

[005] A more recently described method is TET-assisted pyridine borane sequencing (TAPS), which combines an enzymatic and a chemical reaction to detect 5mC and 5hmC (see Liu Y et al. Nat Biotechnol. 37(4):424-9 (2019)). The TET1 enzyme is used to oxidize 5mC and 5hmC to 5-carboxycytosine (5caC), and the 5caCs are then reduced to dihydrouracil (DHU) using pyridine borane. Subsequent PCR converts DHU to thymine, which allows for differentiation between cytosines and modified cytosines. This method can be hindered, however, by the difficulty of preparing hyperactive TET1, which is required for complete oxidation of 5mC to 5caC. In addition, the method of Liu et al. 2019 includes a bead-based purification step, which can lead to loss of yield from washes and recovery steps.

[006] Accordingly, alternative means of generating methylation sequencing data are needed. The present disclosure describes means of identifying and mapping methylation sites. Some embodiments described herein use enzymatic means to avoid chemical reactions that may damage DNA.

SUMMARY

[007] In accordance with the description, this disclosure describes methods of methylation analysis. In some embodiments, the present methods incorporate enzymatic steps to generate double-stranded breaks at methylated cytosines to incorporate methylation-specific adapters. In some embodiments, the present methods describe how to dehybridize regions of double-stranded DNA to allow for methylation analysis. Further, the present disclosure describes use of hairpin adapters for methylation analysis of library fragments.

[008] Embodiment 1. A method of preparing a methylation-specific DNA library from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising:

• preparing double-stranded library fragments by fragmenting the double-stranded target DNA and incorporating a first adapter at one or both ends of the fragment;

• inducing a double-stranded break of a double-stranded library fragment at a methylated cytosine via enzymatic reactions; and

• ligating a methylation-specific adapter onto the library fragment at an end generated by the double-stranded break, wherein the methylation-specific adapter is different from the first adapter.

[009] Embodiment 2. The method of embodiment 1, wherein the enzymatic reactions comprise base excision repair and endonuclease cleavage.

[0010] Embodiment 3. The method of embodiment 1 or embodiment 2, wherein the base excision repair reaction is performed with a mix of a DNA glycosylase and an apurinic/apyrimidinic endonuclease.

[0011] Embodiment 4. The method of embodiment 2 or embodiment 3, wherein the endonuclease cleavage is performed by an endonuclease.

[0012] Embodiment 5. The method of any one of embodiments 1-4, wherein the method is performed in a single reaction vessel.

[0013] Embodiment 6. The method of any one of embodiments 1-5, wherein the method does not employ a bead-based purification step.

[0014] Embodiment 7. A method of preparing a methylation-specific DNA library from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising:

• preparing library fragments by fragmenting the double-stranded target DNA and incorporating a first adapter at one or both ends of the fragment;

• converting a methylated cytosine in the library fragments into an abasic site;

• generating a nick in the library fragment at the abasic site;

• cleaving the DNA nucleotide opposite the nick to generate a double-stranded break in the library fragment; and

• ligating a methylation-specific adapter onto the library fragment at the end generated by the double-stranded break, wherein the methylation-specific adapter is different from the first adapter.

[0015] Embodiment 8. The method of embodiment 7, wherein the methylated cytosine is converted into an abasic site by an enzymatic reaction.

[0016] Embodiment 9. The method of embodiment 8, wherein the method is performed in a single reaction vessel.

[0017] Embodiment 10. The method of any one of embodiments 1-9, wherein the method does not employ a bead-based purification step.

[0018] Embodiment 11. The method of any one of embodiments 8-10, wherein the enzymatic reaction is base excision repair to excise the methylated cytosine and generate an abasic site.

[0019] Embodiment 12. The method of embodiment 11, wherein the base excision repair reaction is performed with a mix of a DNA glycosylase and an apurinic/apyrimidinic endonuclease.

[0020] Embodiment 13. The method of embodiment 12, wherein the apurinic/apyrimidinic endonuclease is T7 endonuclease.

[0021] Embodiment 14. The method of any one of embodiments 7-13, wherein the methylated cytosine is converted into an abasic site by a chemical reaction followed by an enzymatic reaction.

[0022] Embodiment 15. The method of embodiment 14, wherein the chemical reaction converts a methylated cytosine into a uracil and the enzymatic reaction is excision of the uracil performed by a uracil-specific excision reagent (USER).

[0023] Embodiment 16. The method of embodiment 15, wherein the chemical reaction is borane reduction.

[0024] Embodiment 17. The method of embodiment 16, wherein the borane reduction is TET-assisted pyridine borane sequencing reduction.

[0025] Embodiment 18. The method of any one of embodiments 15-17, wherein the USER is a mixture of uracil DNA glycosylase and endonuclease VIII.

[0026] Embodiment 19. The method of any one of embodiments 1-18, wherein the methylated cytosine is comprised within a CpG site.

[0027] Embodiment 20. The method of embodiment 19, wherein the CpG site is comprised within a region having a GC content of 50% or greater, a length greater than 200bp, and a ratio of observed to expected CpG dinucleotides of greater than 0.6.
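The three criteria recited in Embodiment 20 can be checked computationally for a candidate region. The following is a minimal sketch, not part of the disclosure. It assumes the commonly used definition of the expected CpG count as (number of C × number of G) / region length, which the text above does not specify, and the function names are hypothetical.

```python
def cpg_region_metrics(seq: str) -> dict:
    """Compute length, GC content, and observed/expected CpG ratio for a region."""
    seq = seq.upper()
    length = len(seq)
    n_c = seq.count("C")
    n_g = seq.count("G")
    observed_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (n_c + n_g) / length if length else 0.0
    # Assumption: expected CpG = (#C * #G) / length (not stated in the source text).
    expected_cpg = (n_c * n_g) / length if length else 0.0
    obs_exp = observed_cpg / expected_cpg if expected_cpg else 0.0
    return {"length": length, "gc_content": gc_content, "obs_exp_cpg": obs_exp}


def meets_embodiment_20_criteria(seq: str) -> bool:
    """GC content of 50% or greater, length > 200 bp, observed/expected CpG > 0.6."""
    m = cpg_region_metrics(seq)
    return m["gc_content"] >= 0.5 and m["length"] > 200 and m["obs_exp_cpg"] > 0.6
```

For example, under these assumptions a 240 bp region built from repeats of CCGG satisfies all three criteria, while a region of the same base composition with the C's and G's segregated into two blocks fails the observed/expected test.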

[0028] Embodiment 21. The method of embodiments 7-20, wherein the double-stranded break results in a single guanine overhang on each fragment.

[0029] Embodiment 22. The method of embodiment 21, wherein the methylation-specific adapter comprises a single cytosine overhang.

[0030] Embodiment 23. The method of any one of embodiments 1-22, wherein the methylation-specific adapter comprises a methylation index sequence.

[0031] Embodiment 24. The method of any one of embodiments 1-23, wherein the first adapter comprises a first-read sequencing adapter sequence and the methylation-specific adapter comprises a second-read sequencing adapter sequence.

[0032] Embodiment 25. The method of any one of embodiments 1-24, comprising incorporating a first adapter at one end of the fragment and a second adapter at the other end of the fragment.

[0033] Embodiment 26. The method of embodiment 25, wherein the first adapter comprises a first-read sequencing adapter sequence and the second adapter comprises a second-read sequencing adapter sequence.

[0034] Embodiment 27. The method of embodiment 26, wherein the same second-read sequencing adapter sequence is comprised in the second adapter and in the methylation-specific adapter.

[0035] Embodiment 28. The method of any one of embodiments 1-27, comprising incorporating the first adapter at both ends of each fragment.

[0036] Embodiment 29. The method of any one of embodiments 1-28, wherein the library preparation is by tagmentation.

[0037] Embodiment 30. The method of any one of embodiments 1-29, wherein the library fragments are sequenced after ligating the methylation-specific adapter.

[0038] Embodiment 31. The method of embodiment 30, wherein the sequencing is short-cycle sequencing.

[0039] Embodiment 32. The method of embodiment 31, wherein the short-cycle sequencing comprises less than 250, less than 100, or less than 50 cycles.

[0040] Embodiment 33. The method of any one of embodiments 23-32, wherein the methylation index sequence is used to identify the genome location of the methylated cytosine.

[0041] Embodiment 34. The method of embodiment 33, wherein the nucleotide within a given fragment adjacent to the methylation-specific adapter corresponds to a nucleotide that was adjacent to the methylated cytosine in the target nucleic acid.

[0042] Embodiment 35. The method of embodiment 34, wherein the nucleotide that was adjacent to the methylated cytosine was 5’ of the methylated cytosine.

[0043] Embodiment 36. The method of embodiment 34, wherein the nucleotide that was adjacent to the methylated cytosine was 3’ of the methylated cytosine.

[0044] Embodiment 37. The method of any one of embodiments 30-36, wherein the method does not require amplification.

[0045] Embodiment 38. A method of preparing a DNA library for identifying methylated cytosines from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising:

• dehybridizing a section of the double-stranded target DNA into the two separate single DNA strands;

• deaminating cytosines in the two separate single DNA strands, wherein unmethylated cytosines are converted to uracils and methylated cytosines are converted to thymines, and wherein the deaminating is more efficient on single-stranded DNA compared to double-stranded DNA;

• rehybridizing the two separate single DNA strands into double-stranded DNA;

• performing USER to remove uracils;

• performing gap filling; and

• preparing library fragments by fragmenting the target DNA.

[0046] Embodiment 39. A method of preparing a DNA library for identifying methylated cytosines from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising:

• preparing library fragments by fragmenting the double-stranded target DNA and incorporating one or more adapters at both ends of the double-stranded fragments, wherein at least one adapter is a hairpin adapter that binds both strands at one end of the double-stranded fragment;

• dehybridizing the double-stranded DNA fragments into a first single DNA strand and a second single DNA strand linked by the hairpin adapter;

• protecting the first single DNA strand with a first complementary DNA strand, thereby generating a double-stranded DNA comprising the first single DNA strand;

• deaminating cytosines in the second single DNA strand;

• dehybridizing the first complementary DNA strand;

• rehybridizing the first single DNA strand and second single DNA strand into double-stranded DNA;

• performing USER to remove uracils in the second single DNA strand;

• performing a first gap filling;

• dehybridizing the double-stranded DNA fragments into the first single DNA strand and the second single DNA strand linked by the hairpin adapter;

• protecting the second single DNA strand with a second complementary DNA strand, thereby generating a double-stranded DNA comprising the second single DNA strand;

• deaminating cytosines in the first single DNA strand;

• dehybridizing the second complementary strand;

• rehybridizing the first single DNA and second single DNA strand into double-stranded DNA;

• performing USER to remove uracils in the first single DNA strand; and

• performing a second gap filling.

[0047] Embodiment 40. The method of embodiment 38 or embodiment 39, wherein protecting the first single DNA strand with a first complementary DNA strand comprises binding an extension primer to an adapter attached to the first single DNA strand and extending a first extended strand of DNA complementary to the first single DNA.

[0048] Embodiment 41. The method of embodiment 38 or embodiment 39, wherein protecting the first single DNA strand with a first complementary DNA strand comprises binding an oligonucleotide comprising the first complementary DNA strand to the first single DNA strand.

[0049] Embodiment 42. The method of any one of embodiments 38-41, wherein protecting the second single DNA strand with a second complementary DNA strand comprises binding an extension primer to an adapter attached to the second single DNA strand and extending a second extended strand of DNA complementary to the first single DNA.

[0050] Embodiment 43. The method of any one of embodiments 38-41, wherein protecting the second single DNA strand with a second complementary DNA strand comprises binding an oligonucleotide comprising the second complementary DNA strand to the second single DNA strand.

[0051] Embodiment 44. A method of preparing a DNA library for identifying methylated cytosines in a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising:

• preparing library fragments by fragmenting the double-stranded target DNA and incorporating one or more adapters at both ends of the double-stranded fragments, wherein at least one adapter is a hairpin adapter that binds both strands at one end of the double-stranded fragment;

• binding an extension primer to an adapter attached to the first single DNA strand and extending a first extended strand of DNA complementary to the first single DNA strand and thereby generating double-stranded DNA;

• deaminating cytosines in the second single DNA strand, wherein unmethylated cytosines are converted to uracils and methylated cytosines are converted to thymines, and wherein the deaminating is more efficient on single-stranded DNA compared to double-stranded DNA;

• dehybridizing the first extended strand;

• rehybridizing the first single DNA strand and second single DNA strand into double-stranded DNA;

• performing a base excision repair reaction to remove uracils generated from deaminating cytosines in the second single DNA strand;

• performing a first gap filling;

• dehybridizing the double-stranded DNA fragments into the first single DNA strand and the second single DNA strand linked by the hairpin adapter;

• binding an extension primer to an adapter attached to the second single DNA strand and extending a second extended strand of DNA complementary to the second single DNA strand and thereby generating double-stranded DNA;

• deaminating cytosines in the first single DNA strand, wherein unmethylated cytosines are converted to uracils and methylated cytosines are converted to thymines, and wherein the deaminating is more efficient on single-stranded DNA compared to double-stranded DNA;

• dehybridizing the second extended strand;

• rehybridizing the first single DNA and second single DNA strand into a double-stranded DNA fragment;

• performing a base excision repair reaction to remove uracils generated from deaminating cytosines in the first single DNA strand; and

• performing a second gap filling.

[0052] Embodiment 45. The method of any one of embodiments 38-44, wherein the method is performed in a single reaction vessel.

[0053] Embodiment 46. The method of any one of embodiments 38-45, wherein the method does not employ a bead-based purification step.

[0054] Embodiment 47. The method of any one of embodiments 38-46, wherein the dehybridizing is performed with a helicase or with a recombinase plus single-stranded DNA binding protein.

[0055] Embodiment 48. The method of any one of embodiments 38-47, wherein the deaminating is performed with a deaminase.

[0056] Embodiment 49. The method of embodiment 48, wherein the deaminase is APOBEC3A.

[0057] Embodiment 50. The method of embodiment 48 or 49, wherein the deaminating converts unmethylated cytosines into uracils and methylated cytosines into thymines.

[0058] Embodiment 51. The method of any one of embodiments 38-50, wherein the gap filling is performed with a polymerase.

[0059] Embodiment 52. The method of embodiment 51, wherein the USER is a mixture of a uracil DNA glycosylase and an apurinic/apyrimidinic endonuclease.

[0060] Embodiment 53. The method of embodiment 52, wherein the apurinic/apyrimidinic endonuclease is endonuclease VIII.

[0061] Embodiment 54. The method of any one of embodiments 38-54, where performing USER and performing gap filling leads to cytosines being incorporated into positions that had been uracils.

[0062] Embodiment 55. The method of embodiment 54, wherein unmethylated cytosines after gap filling correspond to the unmethylated cytosines in the double-stranded target DNA.

[0063] Embodiment 56. The method of embodiment 54 or 55, wherein thymines mismatched with guanines in complementary library fragments after gap filling correspond to positions of methylated cytosines in the double-stranded target DNA.

[0064] Embodiment 57. The method of any one of embodiments 39-56, wherein the library fragments are immobilized on a solid support.

[0065] Embodiment 58. The method of embodiment 57, wherein the library fragments are immobilized on the solid support via a sequence comprised in at least one adapter.

[0066] Embodiment 59. The method of any one of embodiments 39-58, wherein the hairpin adapter comprises a modification to block extension.

[0067] Embodiment 60. The method of embodiment 59, wherein the modification to block extension is a non-nucleic acid moiety.

[0068] Embodiment 61. The method of any one of embodiments 39-60, wherein the hairpin adapter comprises a cleavable linker.

[0069] Embodiment 62. The method of any one of embodiments 39-61, wherein the hairpin adapter comprises one or more cytosines.

[0070] Embodiment 63. The method of embodiment 62, wherein the hairpin adapter comprises 2 or 3 cytosines.

[0071] Embodiment 64. The method of any one of embodiments 39-63, wherein a hairpin adapter is attached to both ends of the fragment.

[0072] Embodiment 65. The method of any one of embodiments 39-64, wherein one or more hairpin adapter is cleaved after the second gap filling.

[0073] Embodiment 66. The method of any one of embodiments 39-65, further comprising sequencing fragments after the second gap filling.

[0074] Embodiment 67. The method of embodiment 66, wherein sequencing diversity and alignment efficiency are retained.

[0075] Embodiment 68. The method of embodiment 66 or embodiment 67, further comprising analyzing sequencing data for mismatched thymines and guanines in complementary library fragments.

[0076] Embodiment 69. The method of any one of embodiments 66-68, wherein fragments are seeded onto a flow cell before sequencing.

[0077] Embodiment 70. The method of embodiment 69, wherein double-stranded fragments are seeded.

[0078] Embodiment 71. The method of embodiment 69 or 70, wherein fragments are not amplified before seeding.

[0079] Embodiment 72. The method of any one of embodiments 69-71, wherein fragments are extended, amplified, and linearized after seeding and before sequencing.

[0080] Additional objects and advantages will be set forth in part in the description which follows, and in part will be understood from the description, or may be learned by practice. The objects and advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.

[0081] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.

[0082] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one (several) embodiment(s) and together with the description, serve to explain the principles described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0083] Figure 1 provides an overview of a method of preparing a DNA library comprising fragments comprising non-methylated regions and fragments with a methylation-specific adapter corresponding to an adapter comprising a methylation-specific index added at a position that had been a methylated base. The fragments comprising the methylation-specific adapter provide information on the methylated base and its genome location. LP = library preparation.

[0084] Figure 2 summarizes means of converting 5mC into C. Chemical conversion of a 5-methylcytosine (5mC) into an apurinic site (AP site) can be followed by conversion of the AP site into a cytosine (Cyt) by base excision repair (BER). Alternatively, an enzymatic process can include oxidative deamination to convert 5mC into a thymine (Thy) and further conversion into a cytosine by base excision repair (BER).

[0085] Figures 3A and 3B summarize sodium bisulfite-based methods and alternatives to bisulfite-based methods for methylation sequencing. The sodium bisulfite method and non-bisulfite base conversion via enzymatic methyl sequencing (EM-seq) or TAPS (A) and non-bisulfite deamination/base excision repair (BER) (B) methods are shown. Sodium bisulfite chemically modifies DNA and results in the conversion of unmethylated cytosines (C’s) to uracils (U’s) and then to thymines (T’s); however, 5mC and 5hmC are not converted under these reaction conditions and are read as C in sequencing (A). The EM-seq method, as described in Vaisvila et al., bioRxiv preprint posted May 16, 2020; https://doi.org/10.1101/2019.12.20.884692, converts C’s to U’s and then to T’s, while methylated C’s remain as methylated C’s and are read as C in sequencing (A). TAPS (as described in Liu Y et al. Nat Biotechnol. 37(4):424-9 (2019)) converts methylated C’s to U’s and then to T’s, while C’s remain as C’s (A). The cytosine deamination/BER (non-bisulfite deamination) method described herein converts methylated C’s to T’s by deamination, while non-methylated C’s are converted to U’s by deamination and then converted to C’s by BER (B). The oval labeled “m” indicates methyl in a 5mC, and the oval labeled “hm” indicates a hydroxymethyl in 5hmC. 5mC bases treated with a cytosine deaminase result in thymine bases, providing a signal for assessing sequence-specific methylation state of cytosines when sequenced. For example, APOBEC3A is a cytidine deaminase that recognizes single-stranded DNA and catalyzes the deamination of cytosine (C) to uracil (U), 5-methylcytosine (5mC) to thymine (T), and 5-hydroxymethylcytosine to 5-hydroxymethyluracil. Sequences shown in Figure 3A correspond to SEQ ID NOs: 1-3.
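The read-out differences summarized for Figures 3A and 3B can be illustrated with a small lookup table. The sketch below is illustrative only: it models just the final base read at each original cytosine position as described in the paragraph above, ignores 5hmC and incomplete conversion, and uses hypothetical names (`READOUT`, `sequenced_read`).

```python
# Base read in sequencing for each original cytosine state, per the figure description.
# "C" = unmethylated cytosine, "mC" = methylated cytosine.
READOUT = {
    "bisulfite": {"C": "T", "mC": "C"},        # C -> U -> T; methylated C read as C
    "EM-seq": {"C": "T", "mC": "C"},           # C -> U -> T; methylated C read as C
    "TAPS": {"C": "C", "mC": "T"},             # methylated C -> T; C remains C
    "deamination/BER": {"C": "C", "mC": "T"},  # mC -> T; C -> U -> C after BER
}


def sequenced_read(seq: str, methylated_positions: set, method: str) -> str:
    """Return the sequence as it would be read after the given conversion method.

    seq is the original strand; methylated_positions holds 0-based indices of
    cytosines that are methylated. Non-cytosine bases are left unchanged.
    """
    table = READOUT[method]
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            state = "mC" if i in methylated_positions else "C"
            out.append(table[state])
        else:
            out.append(base)
    return "".join(out)
```

With the strand `ACGTCG` and a methylated cytosine at index 1, this sketch reads `ACGTTG` for the bisulfite and EM-seq rows and `ATGTCG` for the TAPS and deamination/BER rows, which shows why the latter two leave most of the sequence unchanged.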

[0086] Figures 4A and 4B show methods of incorporating methylation-specific index sequences into fragments at sites corresponding to methylated C’s. Intact fragments comprise adapters at both ends of the fragment, while fragments that had comprised methylated C now have one end with an adapter and one end without an adapter (A). Ligation can be used to add an adapter comprising a methylation-specific index, and optionally comprising a sequencing primer sequence and an adapter sequence (such as an adapter sequence that mediates attachment to a complementary oligonucleotide immobilized on a sequencing flow cell) to the end that lacks an adapter (B). In this way, fragments with the methylation-specific index can be used to identify the position of methylated C’s in the target nucleic acid that was fragmented.

[0087] Figure 5 shows how GC residues, including those comprised in regions of sequence with relatively high CpG content such as CpG islands, can lead to a break in fragments with a single-G overhang using the present methods. A CpG island may comprise a region with 40% CpG content or greater, 50% CpG content or greater, 60% CpG content or greater, or 70% CpG content or greater, as described herein. Such an overhang can be used to ligate an adapter with a single-C overhang. In this way, fragments comprising the adapter with the single-C overhang can be identified as having a position with a methylated cytosine.

[0088] Figure 6 shows how methylation-specific DNA libraries can be generated from genomic DNA (i.e., a target nucleic acid comprising genomic DNA). In one embodiment, a standard library prep (LP) can be used to incorporate different adapters at the two ends of fragments (such as an adapter comprising a first-read sequencing adapter sequence at one end and a second-read sequencing adapter sequence at the other end). An adapter can be added at an end that previously corresponded to a methylated base, as that end will be missing an adapter, wherein this added adapter may comprise a methylation-specific index and a first- or second-read sequencing adapter sequence. In some embodiments, the library contains fragments for generating sequencing data of non-methylated regions from fragments that do not comprise a methylation-specific adapter sequence (whole genome sequencing (WGS) and MethSeq method). Alternatively, single-adapter LP can be used to incorporate the same adapter (such as one comprising a first-read sequencing adapter) at both ends of each fragment. After this library prep, an adapter can be added at any end that previously corresponded to a methylated base, wherein this adapter may comprise a methylation-specific index and a second-read sequencing adapter sequence. In this embodiment (MethSeq enrichment), fragments that did not comprise a methylated base will not cluster, as they only have a first-read sequencing adapter. With MethSeq enrichment, only fragments that comprised a methylated base will have an adapter with a first-read sequencing adapter sequence at one end and an adapter with a second-read sequencing adapter sequence at the other end, and as such will generate paired-end sequencing data.
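The adapter logic described for Figure 6 can be expressed as a simple classification of fragments by the adapters at their two ends. This is a minimal sketch with hypothetical names (`Fragment`, `can_generate_paired_end_data`, the index label `IDX1`). It assumes, as a simplification of the description above, that a fragment yields paired-end data only when one end carries a first-read and the other a second-read sequencing adapter sequence.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_READ = "first-read"
SECOND_READ = "second-read"


@dataclass(frozen=True)
class Fragment:
    """A library fragment described only by the adapters at its two ends."""
    end_a: Optional[str]                     # sequencing adapter type, or None if missing
    end_b: Optional[str]
    methylation_index: Optional[str] = None  # set when a methylation-specific adapter was ligated


def can_generate_paired_end_data(fragment: Fragment) -> bool:
    # Assumption: both adapter types must be present, one at each end.
    return {fragment.end_a, fragment.end_b} == {FIRST_READ, SECOND_READ}


def marks_methylated_site(fragment: Fragment) -> bool:
    return fragment.methylation_index is not None


# MethSeq enrichment (single-adapter LP): the same adapter at both ends.
intact = Fragment(FIRST_READ, FIRST_READ)
# After a break at a methylated base and ligation of the methylation-specific adapter.
tagged = Fragment(FIRST_READ, SECOND_READ, methylation_index="IDX1")
```

In this model the intact single-adapter fragment is excluded, while the tagged fragment both generates paired-end data and reports a methylated site, which mirrors the enrichment behavior described for the single-adapter library prep.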

[0089] Figure 7 shows a non-bisulfite sequencing (NBS) method for mapping 5mC’s in genomic DNA. In this method, a region of double-stranded DNA can be dehybridized (i.e., unzipping of the duplex), followed by cytosine deamination, reannealing, BER, and library preparation. This method converts methylated C’s to T’s by deamination, while non-methylated C’s are converted to U’s by deamination and then converted back to C’s by BER. In sequencing of the resulting fragments, complementary fragments that have a mismatch of T/G indicate the presence of a methylated C. Sequences shown in Figure 7 correspond to SEQ ID NOs: 4-13, while the P5 and P7 sequences are SEQ ID NOs: 22 and 23, respectively. UDG = Uracil DNA glycosylase; APEndo = apurinic/apyrimidinic (AP) endonuclease; SSB = single-stranded binding protein. Superscript or subscript T indicates a T that is mispaired with a G in the complementary strand (i.e., a mismatched T). A double-underlined C indicates a normal C (i.e., unmethylated), and a C with white text on black background indicates a methylated C. A double-underlined U indicates an unmethylated C that has been converted to a U. A white T on a black background indicates a methylated C that has been converted to T. A bold A indicates an A paired with a T that was converted from a methylated C. These markings are also used in Figures 8-10.
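The T/G mismatch signal described for Figure 7 can be sketched as a comparison of the two complementary strands of a fragment. The example below is a simplified illustration, not the analysis method of the disclosure. It assumes error-free, full-length reads of both strands, each written 5' to 3', and the function name `call_methylated_positions` is hypothetical.

```python
def call_methylated_positions(top: str, bottom: str) -> list:
    """Report positions where a T is paired with a G in the complementary strand.

    top and bottom are the two strands of one fragment, each written 5'->3'.
    Positions are 0-based indices along the top strand. A T on the top strand
    opposite a G marks a methylated C on the top strand; a G on the top strand
    opposite a T marks a methylated C on the bottom strand.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    # Reverse the bottom strand so that opposite[i] is the base paired with top[i].
    opposite = bottom[::-1]
    calls = []
    for i, (t, b) in enumerate(zip(top, opposite)):
        if t == "T" and b == "G":
            calls.append((i, "top"))
        elif t == "G" and b == "T":
            calls.append((i, "bottom"))
    return calls
```

As a worked case, the duplex `ACGTA` / `TACGT` with both CpG cytosines methylated becomes `ATGTA` / `TATGT` after deamination and BER, and the sketch reports a top-strand call at index 1 and a bottom-strand call at index 2. An unconverted duplex produces no calls because every T is paired with an A.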

[0090] Figure 8 shows a “single-pot” NBS method for mapping 5mC in genomic DNA. Sequences shown in Figure 8 correspond to SEQ ID NOs: 4-6 and 9-13.

[0091] Figure 9 shows a NBS method for mapping 5mC in genomic DNA using one hairpin adapter. In this method, double-stranded fragments comprise a hairpin adapter at one end. APOBEC3A = apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3A; hyb = hybridize; dehyb = dehybridize. Sequences shown in Figure 9 correspond to SEQ ID NOs: 14-21.

[0092] Figure 10 shows a NBS method for mapping 5mC in genomic DNA using two hairpin adapters. In this method, double-stranded fragments comprise a hairpin adapter at both ends. Sequences shown in Figure 10 correspond to SEQ ID NOs: 14-21.

[0093] Figures 11A-11C show a schematic of direct seeding of Y-adapter double-stranded library fragments (A), analysis of colocalization of individual strands to proximal locations (B), and results from a 36-cycle MiSeq (Illumina) sequencing run (C). Figure 11A shows a schematic of seeding of a double-stranded library fragment comprising a R1 strand and a R2 strand complementary to each other, wherein extension, amplification, and linearization occur after seeding to allow for proximal locations of the R1 and R2 strands on the surface for sequencing (such as a flowcell). The circled regions in Figure 11B indicate where a R1 strand and a R2 strand (wherein the R1 strand and the R2 strand are labeled with different fluorescent markers) of a double-stranded fragment are proximal to each other when seeded onto a sequencing surface with undenatured conditions. Proximal localization of the two strands indicates that undenatured conditions lead to seeding of a high percentage of intact double-stranded fragments, which can allow for easier resolution of methylation analysis data as described in Example 3. In contrast, these complementary R1 and R2 strands are not in proximal locations when double-stranded fragments are seeded under denatured (control) conditions, as shown by the lack of circled clusters in Figure 11B. Similar localization was seen with a 36-cycle MiSeq experiment, showing that approximately 60% of clusters with undenatured conditions were true paired-end (PE) clusters generated by seeding of intact double-stranded fragments (Figure 11C).

DESCRIPTION OF THE SEQUENCES

[0094] Table 1 provides a listing of certain sequences referenced herein.

DESCRIPTION OF THE EMBODIMENTS

I. Methods of Methylation Analysis Based on Cytosine Deamination and Base Excision Repair

[0095] As used herein, “methylation analysis” refers to evaluating whether cytosines in a given double-stranded target nucleic acid are methylated. In some embodiments, the methylated cytosine is 5-methylcytosine (5mC).

[0096] In some embodiments, methylation analysis is performed using cytosine deamination and base excision repair, as described below. Such methods can be used with a variety of samples.

[0097] In some embodiments, a method described herein is performed in a single reaction vessel. For example, methods using enzymatic steps can be performed under conditions compatible with multiple enzymes.

[0098] In some embodiments, a method does not employ a bead-based purification step. Many other methods of methylation analysis require bead-based purification steps (see, for example, Liu et al. 2019). Since bead-based purification is associated with sample loss from washing steps and incomplete binding, avoiding bead-based purification can improve yield in comparison to other methods.

A. Double-Stranded Target Nucleic Acid

[0099] The present methods can be used with any target nucleic acid comprising DNA, including genomic DNA and cell-free DNA. In some embodiments, the target DNA comprises double-stranded target DNA.

[00100] In some embodiments, the double-stranded target nucleic acid is comprised in a sample comprising DNA and other materials. In some embodiments, the target nucleic acid is comprised in a clinical sample from a patient. Methylation analysis is relevant to a wide range of clinical samples, including clinical samples comprising tumor cells. Exemplary clinical samples comprising DNA include biopsy and liquid biopsy samples. In some embodiments, a clinical sample comprises cell-free DNA (cfDNA), such as fetal DNA or circulating tumor DNA (ctDNA), which may be measured from plasma or blood samples. Exemplary uses of methylation markers with a clinical sample comprising ctDNA include early detection, estimation of prognosis, evaluation of minimal residual disease and risk of relapse, selection of treatment, and evaluation of treatment resistance (see, for example, Lianidou Molecular Oncology 15: 1683-1700 (2021)).

[00101] While cfDNA and ctDNA samples are being widely adopted for analysis, their characteristics can hamper analysis, such as their common size of 160 to 200 base pair fragments that limits the length of sequence reads (see, for example, Yu PLOS One 17(4):e0266889 (2022)). Further, ctDNA may only comprise 1% of total cfDNA in a sample. In some embodiments, the present methods of methylation analysis incorporating enzymatic processing minimize damage to samples comprising DNA (such as cfDNA and ctDNA) during processing in comparison to other types of methylation analysis. In some embodiments, sequencing results with methylation analysis prepared using the present methods are improved compared to sequencing results with methylation analysis prepared using methods known in the art due to decreased DNA damage from methylation analysis steps with the present methods.

[00102] In some embodiments, a double-stranded target DNA comprises multiple cytosines. In some embodiments, multiple cytosines in a double-stranded target DNA are comprised within a CpG site. As used herein, a CpG site refers to any region of the genome wherein cytosine and guanine appear consecutively on the same strand of a nucleic acid, with the “p” in CpG representing the phosphodiester bond joining the cytosine and guanine. CpG sequences may occur at relatively high frequency in some regions of the genome. In some embodiments, a CpG site is comprised within a region having a GC content of 50% or greater, a length greater than 200 bp, and a ratio of observed to expected CpG dinucleotides of greater than 0.6. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. J Mol Biol 196: 261-282 (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G), where N is the total number of nucleotides in the sequence being analyzed.
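The three region criteria and the observed/expected formula above can be illustrated with a minimal Python sketch. The function names `cpg_stats` and `meets_region_criteria` are hypothetical and chosen only for this illustration, and the sketch assumes a single strand given as a plain string of A, C, G, and T characters.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction, and Obs/Exp CpG ratio for one strand.

    Hypothetical helper for illustration; assumes seq contains only A/C/G/T.
    """
    seq = seq.upper()
    n = len(seq)                 # N: total nucleotides analyzed
    c = seq.count("C")           # Number of C
    g = seq.count("G")           # Number of G
    cpg = seq.count("CG")        # Number of CpG ("CG" cannot overlap itself)
    gc_fraction = (c + g) / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": obs_exp}


def meets_region_criteria(seq: str,
                          min_gc: float = 0.5,
                          min_len: int = 200,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply the thresholds stated in the text: GC content of 50% or
    greater, length greater than 200 bp, and Obs/Exp CpG greater than 0.6."""
    s = cpg_stats(seq)
    return (s["gc_fraction"] >= min_gc
            and s["length"] > min_len
            and s["obs_exp_cpg"] > min_obs_exp)
```

For example, a 240-nucleotide repeat of `ACGT` has 60 cytosines, 60 guanines, and 60 CpG dinucleotides, giving a ratio of 60 × 240 / (60 × 60) = 4.0 and a GC content of 50%, so it satisfies all three thresholds in this sketch.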

[00103] The term “CpG island,” as used herein, refers to a region of nucleic acid with a high frequency of CpG sites. While not limited to a specific percentage, CpG islands may commonly include nucleic acids having greater than 15% CpG content. In some embodiments, a CpG island as described herein comprises a region of nucleic acid having 10% CpG content or greater, 15% CpG content or greater, or 20% CpG content or greater. In some embodiments, CpG islands are known to have relatively high rates of methylated cytosines as compared to other cytosines in the genome (see, for example, Uroshlev et al., Scientific Reports 10:8635 (2020)). Accordingly, the present methods may have particular use in evaluating the methylation status of CpG islands and other regions with relatively high CpG content.

[00104] In some embodiments, use of the present methods allows for a double-stranded break (generated at a position that was a methylated cytosine) to result in a single guanine overhang on a fragment prepared from a double-stranded target DNA. An adapter comprising a single cytosine overhang can be used to selectively ligate opposite a single guanine overhang, as shown in Figure 5.

[00105] If multiple cytosines are in close proximity, fragments generated by double-stranded breaks (generated at a position that was a methylated cytosine) may have a single guanine overhang at both ends, allowing for ligation of an adapter with a single cytosine at both ends of a fragment. In some embodiments, use of the present methods allows for double-stranded breaks to result in a guanine overhang at both ends of a fragment prepared from a double-stranded target DNA with cytosines in close proximity. CpG islands and other DNA regions having a high number of methylated cytosines (such as 40% CpG content or greater, 50% CpG content or greater, 60% CpG content or greater, or 70% CpG content or greater) may be identified by the presence of fragments comprising adapter sequences (such as methylation-specific index sequences) at both ends that were added by ligation with an adapter comprising a single cytosine overhang.

B. Cytosine Deamination

[00106] In some embodiments, cytosine deamination is used to modify nonmethylated and methylated cytosines. Methods known to modify methylated cytosines (i.e., 5mC) are shown in Figure 2. In some embodiments, as shown in Figure 3B, cytosine deamination can convert a nonmethylated cytosine to a uracil and convert a methylated cytosine to a thymine. These approaches can offer an alternative to use of bisulfite for methylation sequencing.

[00107] In some embodiments, cytosine deamination is performed using apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC). APOBEC allows for a bisulfite-free means of modifying methylated cytosines and nonmethylated cytosines. The deaminating properties of the non-damaging enzyme APOBEC avoid the DNA damage that is associated with harsh chemical bisulfite treatment (see, for example, Schutsky et al., Nucleic Acids Research 45: 13 (2017)).

[00108] In some embodiments, cytosine deamination is performed via a chemical reaction. In some embodiments, the chemical reaction is borane reduction. In some embodiments, the borane reduction is TET-assisted pyridine borane sequencing reduction.

C. Base Excision Repair

[00109] In some embodiments, base excision repair (BER) is performed after cytosine deamination. As used herein, “base excision repair” or “BER” refers to a method that can repair damaged DNA. In some embodiments, BER is an enzymatic reaction. In some embodiments, BER can be initiated by DNA glycosylases, which recognize and remove specific damaged or inappropriate bases to form apurinic/apyrimidinic (AP) sites. AP sites may also be referred to as abasic sites.

[00110] In some embodiments, a methylated cytosine is converted into an abasic site by a chemical reaction followed by an enzymatic reaction to remove the abasic site. In some embodiments, a chemical reaction converts a methylated cytosine into a uracil, and the uracil can be excised by a uracil-specific excision reagent (USER).

[00111] In some embodiments, a polymerase can be used to fill in the gap generated by a BER reaction. When the gap is filled in, the AP site will be replaced with a nucleotide that is complementary to the nucleotide in the opposite strand. For example, when a nonmethylated cytosine undergoes cytosine deamination to generate a uracil, BER will cause this uracil to be removed and a polymerase will fill in this gap with a cytosine. In this way, at the end of the method, a nonmethylated cytosine will be retained as a nonmethylated cytosine. In contrast, methylated cytosines will be thymines after the cytosine deamination and base excision repair methodology.
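The base-level outcome of deamination followed by BER can be traced with a small Python simulation. This is only an illustrative model under stated assumptions: it follows a single strand, treats the partner bases opposite each cytosine (guanines) as unchanged, and uses the hypothetical function names `deaminate`, `partner_bases`, and `base_excision_repair`, none of which come from the disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_bases(strand: str) -> str:
    """Bases on the opposite strand, listed in the same left-to-right
    order so that partner[i] pairs with strand[i]."""
    return "".join(COMPLEMENT[b] for b in strand)


def deaminate(strand: str, methylated: set) -> str:
    """Model of cytosine deamination: a nonmethylated C becomes U and
    a methylated C (index listed in `methylated`) becomes T."""
    out = []
    for i, base in enumerate(strand):
        if base == "C":
            out.append("T" if i in methylated else "U")
        else:
            out.append(base)
    return "".join(out)


def base_excision_repair(deaminated: str, partner: str) -> str:
    """Model of BER with gap filling: each U is excised and replaced by
    the complement of the base on the opposite strand. T is not excised."""
    return "".join(
        COMPLEMENT[p] if base == "U" else base
        for base, p in zip(deaminated, partner)
    )


original = "ACGTCG"
partner = partner_bases(original)            # guanines opposite each C
converted = deaminate(original, methylated={1})
repaired = base_excision_repair(converted, partner)
print(original, converted, repaired)
```

In this model the cytosine at index 1 is methylated and ends as a thymine, while the nonmethylated cytosine at index 4 passes through uracil and is restored to cytosine, leaving a T/G mismatch only at the formerly methylated position.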

[00112] In some embodiments, the BER is performed using a USER reaction. As used herein, “USER” or “uracil-specific excision reagent” refers to an enzyme or enzyme mix that generates a single nucleotide gap in a double-stranded DNA at the location of a uracil. In a USER reaction, uracil DNA glycosylase (UDG) can catalyze the excision of a uracil base, forming an abasic (apyrimidinic) site while leaving the phosphodiester backbone intact. The lyase activity of an AP endonuclease can break the phosphodiester backbone at the 3’ and 5’ sides of the abasic site so that base-free deoxyribose is released. In some embodiments, the AP endonuclease is endonuclease VIII or T7 endonuclease. In some embodiments, a USER enzyme mix can combine two enzymatic activities to generate a single nucleotide gap at the location of a uracil residue. In some embodiments, a polymerase can be used to fill in the gap generated by a USER reaction. In some embodiments, a base excision repair reaction is performed with a mix of a DNA glycosylase and an apurinic/apyrimidinic endonuclease.

[00113] In some embodiments, library preparation is performed after BER, as shown in Step 4 of Figure 7.

[00114] In some embodiments, a thymine generated by cytosine deamination of a methylated cytosine will not be subject to BER and will be retained as a thymine after the BER step. In some embodiments, this results in T (resulting from a methylated cytosine) being paired with a G in a complementary strand. This mismatch between strands of T/G instead of C/G can be readily resolved with sequence bioinformatics. In some embodiments, resolution can be increased by using true paired-end sequencing, wherein library fragments themselves (and not amplicons of fragments) are sequenced, which can allow for confirmation of mismatches directly from paired sequences on a flowcell.

[00115] Since generally most cytosines will not be methylated in a target nucleic acid, the present method has an advantage of having only methylated cytosines changed to thymines by the method, and nonmethylated cytosines will be retained as cytosines. Therefore, sequencing analysis would have fewer mismatches to map, as nonmethylated cytosines would show normal C/G pairing between complementary strands.

II. Methylation Analysis Using Dehybridization of a Section of Target DNA

[00116] As shown in Figure 7, methylation analysis can be performed by “unzipping” a region of a double-stranded target DNA, such that the region is available as two strands of single-stranded DNA. Such a method can be summarized as (1) unzipping the duplex and performing cytosine deamination on the two single-stranded regions as described herein, (2) reannealing the two single-stranded regions into the duplex, (3) performing BER, such as with UDG/APEndo/polymerase enzyme combination, (4) performing library preparation, and (5) sequencing and analyzing sequence data. A single-pot reaction (i.e., wherein the reaction is performed in a single-reaction vessel) using dehybridization is shown in Figure 8.

[00117] In some embodiments, a method of preparing a DNA library for identifying methylated cytosines from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprises (a) dehybridizing a section of the double-stranded target DNA into the two separate single DNA strands; (b) deaminating cytosines in the two separate single DNA strands, wherein unmethylated cytosines are converted to uracils and methylated cytosines are converted to thymines, and wherein the deaminating is more efficient on single-stranded DNA compared to double-stranded DNA; (c) rehybridizing the two separate single DNA strands into double-stranded DNA; (d) performing USER to remove uracils; (e) performing gap filling; and (f) preparing library fragments by fragmenting the target DNA. In some embodiments, such a method is performed in a single-reaction vessel with a single enzyme mix (such as UDG/APEndo/Polymerase).

[00118] In some embodiments, the method does not employ a bead-based purification step. The lack of bead-based purification can improve yields and reduce hands-on preparation time. For example, the present methods do not require a bead-based pulldown of biotinylated nucleic acid products, as used in some methods in the art, such as Liu et al. 2019. This is especially beneficial for target nucleic acids present in a sample at a low copy number.

[00119] In some embodiments, the dehybridizing is performed with a helicase or with a recombinase plus single-stranded DNA binding protein. These enzymes can allow for regions of the duplex of the double-stranded target DNA to “unzip” into two separate strands that can be acted on for deamination of cytosines. In some embodiments, the deaminating is performed with a deaminase. In some embodiments, the deaminase is APOBEC3A. In some embodiments, the deaminating converts unmethylated cytosines into uracils and methylated cytosines into thymines. In some embodiments, gap filling is performed with a polymerase. In some embodiments, the USER is a mixture of a uracil DNA glycosylase and an apurinic/apyrimidinic endonuclease. In some embodiments, the apurinic/apyrimidinic endonuclease is endonuclease VIII. In some embodiments, performing USER and gap filling leads to cytosines being incorporated into positions that had been uracils. In some embodiments, unmethylated cytosines after gap filling correspond to the unmethylated cytosines in the double-stranded target DNA. After deamination and BER steps, library fragments can be prepared with any standard method, followed by sequencing. The sequencing data can be analyzed as described herein to evaluate positions of methylated cytosines in the sequenced fragments.

A. Sequencing and Analysis

[00120] Sequencing may be performed after (1) treatment of library fragments with a method described herein or (2) after treatment of a target nucleic acid with a method described herein and preparation of library fragments. In either case, library fragments may have cytosine methylation that is evaluated using sequencing information.

[00121] In some embodiments, library fragments are prepared on a solid support and sequenced after release from the solid support. Such means of releasing sequencing templates from the surface of a solid support are well-known in the art. The incorporated materials of US Patent Nos. 7,985,565 and 7,115,400 describe methods of solid-phase nucleic acid amplification which allow amplification products to be immobilized on a solid support in order to form arrays comprised of clusters or “colonies” of immobilized nucleic acid molecules.

[00122] Exemplary sequencing by synthesis (SBS) procedures, fluidic systems and detection platforms that can be readily adapted for use with library fragments and/or amplicons produced by the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497; US 7,057,026; WO 91/06678; WO 07/123744; US 7,329,492; US 7,211,414; US 7,315,019; US 7,405,281, and US 2008/0108082, each of which is incorporated herein by reference.

[00123] In some embodiments, sequencing is performed after amplifying. In some embodiments, amplification is not performed before sequencing. A number of different sequencing methods are known to those skilled in the art, such as those described in US 9,683,230 and US 10,920,219, each of which is incorporated by reference herein in its entirety.

[00124] In some embodiments, the sequencing fragments are deposited on a flow cell. In some embodiments, the sequencing fragments are hybridized to complementary primers grafted to the flow cell or surface. In some embodiments, the sequences of the sequencing fragments are detected by array sequencing or next-generation sequencing methods, such as sequencing-by-synthesis.

[00125] The P5 and P7 primers are used on the surface of commercial flow cells sold by Illumina, Inc., for sequencing on various Illumina platforms. Such primer sequences are described in US Patent Publication No. 2011/0059865 A1, which is incorporated herein by reference in its entirety. While the P5 and P7 primers are given as examples, it is to be understood that any suitable amplification primers can be used in the examples presented herein.

[00126] In some embodiments, a sequencing primer used for sequencing comprises a sequence fully or partially complementary to one or more unique primer binding sequences comprised in the sequencing template. In some embodiments, a sequencing primer comprises at least an A2 sequence (SEQ ID NO: 40), at least an A14 sequence (SEQ ID NO: 4), or at least a B15 sequence (SEQ ID NO: 5), or their complements.

[00127] An integrated system of the present disclosure can include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more nucleic acid fragments, the system comprising components such as pumps, valves, reservoirs, fluidic lines, and the like. A flow cell can be configured and/or used in an integrated system for detection of target nucleic acids. Exemplary flow cells are described, e.g., in US 2010/0111768 A1 and US 13/273,666, each of which is incorporated herein by reference. As exemplified for flow cells, one or more of the fluidic components of an integrated system can be used for an amplification method and for a detection method. Taking a nucleic acid sequencing embodiment as an example, one or more of the fluidic components of an integrated system can be used for an amplification method set forth herein and for the delivery of sequencing reagents in a sequencing method such as those exemplified above. Alternatively, an integrated system can include separate fluidic systems to carry out amplification methods and to carry out detection methods. Examples of integrated sequencing systems that are capable of creating amplified nucleic acids and also determining the sequence of the nucleic acids include, without limitation, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and devices described in US 13/273,666, which is incorporated herein by reference.

[00128] As used herein, a “paired-end cluster” refers to sequencing that allows users to sequence both ends of a double-stranded fragment to generate sequences aligned as read pairs. Sequences aligned as read pairs enable more accurate read alignment and the ability to detect insertion-deletion (indel) variants, which is not possible with single-read data (see, for example, Advantages of paired-end and single-read sequencing, Illumina 2021).

[00129] As used herein, “true paired-end cluster” refers to a cluster generated from direct seeding of a double-stranded fragment (and not an amplicon) to a solid support, such as a flow cell. In other words, a “true paired-end cluster” indicates that the double-stranded fragment was not amplified and/or denatured after fragmentation and before seeding. Such seeding may be performed with double-stranded fragments comprising a Y-adapter at one or both ends of the fragment. In some embodiments, sequencing is performed without amplification of fragments (i.e., fragments are sequenced and not amplicons of fragments). In some embodiments, fragments are seeded onto a flow cell before sequencing. In some embodiments, double-stranded fragments are seeded. In some embodiments, fragments are not amplified before seeding. In some embodiments, fragments are extended, amplified, and linearized after seeding and before sequencing. Figures 11A-11C show a schematic of the method (Figure 11A) and representative data (Figure 11B and Figure 11C) on true paired-end sequencing.

[00130] In some embodiments, library fragments are sequenced after ligating the methylation-specific adapter that comprises a methylation-specific index. In some embodiments, the methylation-specific index allows for easy identification of sites of methylation in sequencing data. Described herein are a number of means of incorporating a methylation-specific adapter, such as WGS & MethSeq and MethSeq enrichment, as shown in Figure 6.

[00131] In some embodiments, the sequencing is short-cycle sequencing. In some embodiments, the short-cycle sequencing comprises less than 250, less than 100, or less than 50 cycles. In some embodiments, a methylation index sequence, as described below, is used to identify the genome location of the methylated cytosine. Short-cycle sequencing may be of particular use for MethSeq where only fragments comprising a methylation-specific adapter are sequenced.

[00132] Sequencing of fragments after use of the methods described herein of methylation analysis allows the user to identify positions of methylated cytosines. In some embodiments, the nucleotide within a given fragment adjacent to the methylation-specific adapter corresponds to a nucleotide that was adjacent to the methylated cytosine in the target nucleic acid. In some embodiments, the nucleotide that was adjacent to the methylated cytosine was 5’ of the methylated cytosine. In some embodiments, the nucleotide that was adjacent to the methylated cytosine was 3’ of the methylated cytosine.

[00133] In some embodiments, sequencing diversity and alignment efficiency are retained. In other methods of methylation analysis, DNA can be damaged by harsh conditions of the methods, such as with bisulfite sequencing. By avoiding such prior-art methods, the present methods cause relatively little DNA damage, which improves sequencing results. Further, methods of true paired-end sequencing (including sequencing of fragments without amplification) can improve alignment efficiency since complementary strands are in close proximity on a sequencing surface.

[00134] In some embodiments, the method comprises analyzing sequencing data for mismatched thymines and guanines in complementary library fragments. In some embodiments, this T and G mismatch indicates that the sequenced T was a methylated cytosine in the double-stranded target nucleic acid.
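As a hedged illustration of this analysis, the Python sketch below scans two complementary strand sequences for T/G mismatches. It is not the disclosed bioinformatics pipeline: the function name `call_methylated_positions` is hypothetical, both strands are assumed to be given 5’ to 3’, to be of equal length, and to be fully overlapping, and sequencing errors are ignored.

```python
def call_methylated_positions(top_5to3: str, bottom_5to3: str):
    """Return (top_calls, bottom_calls) in top-strand coordinates.

    top_calls: positions where the top strand reads T opposite a G,
               suggesting a methylated cytosine on the top strand.
    bottom_calls: positions where the top strand reads G opposite a T,
               suggesting a methylated cytosine on the bottom strand.
    Hypothetical helper; assumes equal-length, fully overlapping strands.
    """
    if len(top_5to3) != len(bottom_5to3):
        raise ValueError("strands must have equal length in this sketch")
    # Reversing the bottom strand (without complementing) lines up each
    # bottom base with the top base it pairs with.
    partner = bottom_5to3[::-1]
    top_calls, bottom_calls = [], []
    for i, (t, b) in enumerate(zip(top_5to3, partner)):
        if t == "T" and b == "G":
            top_calls.append(i)
        elif t == "G" and b == "T":
            bottom_calls.append(i)
    return top_calls, bottom_calls


# Duplex ACGTCG / CGACGT with one methylated C converted on each strand.
print(call_methylated_positions("ATGTCG", "CGATGT"))
```

Positions with ordinary A/T or C/G pairing are skipped, which reflects the point made above that nonmethylated cytosines leave no mismatch to map.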

III. Methods of Methylation Analysis Based on Incorporation of Methylation-Specific Adapters

[00135] In some embodiments, a DNA library is prepared using methylation-specific adapters. In some embodiments, incorporation of adapters occurs during or after a standard method of library preparation. The present methods are not limited by the method of library preparation and can be used with tagmentation, fragmentation, or any other method of preparing a library of fragments. In some embodiments, the library preparation is by tagmentation. As used herein, “tagmentation” is a process that involves the modification of DNA by a transposome complex comprising transposase enzyme complexed with adaptors comprising transposon end sequence. Protocols available for tagmentation are well-known in the art, such as those described for the Illumina DNA Nextera® XT DNA Library Preparation Kit (see Nextera XT Reference Guide, Document 770-2012-011).

[00136] In some embodiments, a method of preparing a methylation-specific DNA library from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprises preparing double-stranded library fragments by fragmenting the double-stranded target DNA and incorporating a first adapter at one or both ends of the fragment; inducing a double-stranded break of a double-stranded library fragment at a methylated cytosine via enzymatic reactions; and ligating a methylation-specific adapter onto the library fragment at an end generated by the double-stranded break.

[00137] In some embodiments, a method of preparing a methylation-specific DNA library from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprises (1) preparing library fragments by fragmenting the double-stranded target DNA and incorporating a first adapter at one or both ends of the fragment; (2) converting a methylated cytosine in the library fragments into an abasic site; (3) generating a nick in the library fragment at the abasic site; (4) cleaving the DNA nucleotide opposite the nick to generate a double-stranded break in the library fragment; and (5) ligating a methylation-specific adapter onto the library fragment at the end generated by the double-stranded break.

[00138] In some embodiments, a methylation-specific adapter is different from a first adapter. For example, the first adapter and the methylation-specific adapter may comprise different sequencing primer binding sequences. Figure 1 shows a representative method using a methylation-specific adapter (i.e., a new adapter is added at methylated bases). Figures 4A and 4B provide more information on a representative methylation-specific adapter, which may comprise a methylation-specific index sequence, a sequencing primer binding sequence, and one or more additional adapter sequences (such as for PCR amplification). A methylation-specific adapter may comprise a variety of different adapter sequences as described herein.

[00139] In some embodiments, the first adapter comprises a first-read sequencing adapter sequence and the methylation-specific adapter comprises a second-read sequencing adapter sequence. In this way, sequencing results are biased towards those fragments that comprise a first adapter at one end and a methylation-specific adapter at the other end.

[00140] In some embodiments, a method comprises incorporating a first adapter at one end of the fragment and a second adapter at the other end of the fragment. In some embodiments, the first adapter comprises a first-read sequencing adapter sequence and the second adapter comprises a second-read sequencing adapter sequence. In some embodiments, the same second-read sequencing adapter sequence is comprised in the second adapter and in the methylation-specific adapter.

A. Adapters

[00141] In some embodiments, an adapter comprises a unique molecular identifier (UMI), primer sequence, anchor sequence, universal sequence, spacer region, index sequence, capture sequence, barcode sequence, cleavage sequence, sequencing-related sequence, and combinations thereof. As used herein, a “barcode sequence” refers to a sequence that may be used to differentiate samples. As used herein, a sequencing-related sequence may be any sequence related to a later sequencing step. A sequencing-related sequence may work to simplify downstream sequencing steps. For example, a sequencing-related sequence may be a sequence that would otherwise be incorporated via a step of ligating an adapter to nucleic acid fragments. In some embodiments, the adapter sequence comprises a P5 or P7 sequence (or their complement) to facilitate binding to a flow cell in certain sequencing methods.

[00142] Such sequences may be comprised in a first adapter and/or a methylation-specific adapter. In some embodiments, a methylation-specific adapter comprises a methylation-specific index that allows a user to identify these adapters on fragment sequences.

[00143] In some embodiments, an adapter comprises a UMI.

[00144] In some embodiments, an adapter may comprise a tag. The term “tag” as used herein refers to a portion or domain of a polynucleotide that exhibits a sequence for a desired intended purpose or application. Tag domains can comprise any sequence provided for any desired purpose. For example, in some embodiments, a tag domain comprises one or more restriction endonuclease recognition sites. In some embodiments, a tag domain comprises one or more regions suitable for hybridization with a primer for a cluster amplification reaction. In some embodiments, a tag domain comprises one or more regions suitable for hybridization with a primer for a sequencing reaction. It will be appreciated that any other suitable feature can be incorporated into a tag domain. In some embodiments, the tag domain comprises a sequence having a length from 5 bp to 200 bp. In some embodiments, the tag domain comprises a sequence having a length from 10 bp to 100 bp. In some embodiments, the tag domain comprises a sequence having a length from 20 bp to 50 bp. In some embodiments, the tag domain comprises a sequence having a length of 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150 or 200 bp.

[00145] The tag can include one or more functional sequences or components (e.g., primer sequences, anchor sequences, universal sequences, spacer regions, or index tag sequences) as needed or desired.

[00146] In some embodiments, the tag comprises a region for cluster amplification. In some embodiments, the tag comprises a region for priming a sequencing reaction.

[00147] In some embodiments a tag comprises an A14 primer sequence. In some embodiments, a tag comprises a B15 primer sequence.

[00148] The present disclosure is not limited to the type of adaptor sequences which could be used and a skilled artisan will recognize additional sequences which may be of use for library preparation and next generation sequencing.

[00149] As used herein, a “first-read sequencing adapter sequence” and a “second-read sequencing adapter sequence” refer to different sequences that can bind to primers during sequencing reactions. Different sequencing protocols use different “first-read sequencing adapters” and “second-read sequencing adapters,” and these adapters vary by manufacturer and equipment. In other words, the order and identity of sequencing reads is arbitrary for a given sequencing method. Thus, “first-read” and “second-read” sequencing adapters, as used herein, simply require the presence of two read sequencing adapters; they do not require that a specific adapter must be used for the first sequencing read versus second sequencing read in any downstream sequencing method after preparation of a sequencing library. Those skilled in the art could choose to first run a downstream sequencing reaction with a “second-read” sequencing adapter and then a “first-read” sequencing adapter if they so choose.

[00150] As used herein, a “Y-adapter” refers to a double-stranded adapter that can be attached to the end of double-stranded DNA library fragments. The Y-adapter comprises two strands, wherein a portion of the two strands closer to the DNA library fragments is complementary while a portion of the two strands further from the DNA library fragments is non-complementary. Accordingly, the Y-adapter takes on a Y-shape. The portions of the two strands that are non-complementary do not base-pair with each other and are thus free to bind to other oligonucleotides. For example, one strand in the non-complementary portion can bind to a capture oligonucleotide. Thus, a sequence comprised in a Y-adapter can bind to a capture oligonucleotide on a solid support. In other words, a sequence comprised within the Y-adapter may be complementary to all or part of a capture oligonucleotide to allow seeding of fragments comprising the Y-adapter on a solid support. Figure 11A shows a representative Y-adapter.
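The stem-and-arm geometry of a Y-adapter can be made concrete with a short Python sketch that splits two adapter strands into their non-complementary arms and their complementary stem. The function names and the example sequences are hypothetical and are not sequences from Table 1. The sketch assumes both strands are written 5’ to 3’ and that the fragment-proximal end is the 3’ end of the first strand and the 5’ end of the second strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def split_y_adapter(top: str, bottom: str):
    """Return (top_arm, stem_length, bottom_arm) for a Y-adapter.

    Walks inward from the fragment-proximal end, pairing the 3' end of
    `top` with the 5' end of `bottom`, and stops at the first pair that
    is not complementary. Hypothetical helper for illustration only.
    """
    stem = 0
    for i in range(min(len(top), len(bottom))):
        if COMPLEMENT[top[-1 - i]] != bottom[i]:
            break
        stem += 1
    top_arm = top[:len(top) - stem]   # free 5' arm of the top strand
    bottom_arm = bottom[stem:]        # free 3' arm of the bottom strand
    return top_arm, stem, bottom_arm


# Made-up sequences: an 8-base complementary stem plus two unpaired arms.
stem_seq = "CTCTTCCG"
top = "AATGATACGG" + stem_seq
bottom = reverse_complement(stem_seq) + "ATCTCGTATG"
print(split_y_adapter(top, bottom))
```

In this toy example the two returned arms are the single-stranded portions that remain free to hybridize to other oligonucleotides, such as a capture oligonucleotide on a solid support.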

B. Conversion of AP sites to Double-stranded Breaks in the DNA

[00151] In some embodiments, AP sites can be cleaved by an AP endonuclease to generate a double-stranded break in the nucleic acid. In some embodiments, the AP endonuclease is endonuclease VIII or T7 endonuclease.

C. Whole Genome Sequencing with MethSeq

[00152] As used herein, “WGS & MethSeq” refers to a method designed to allow for both whole genome sequencing of a target nucleic acid and sequencing to identify methylation sites in a target nucleic acid. A representative workflow of WGS & MethSeq is shown in Figure 6.

[00153] Certain components of the WGS & MethSeq workflow allow for preparation of sequenceable fragments of non-methylated regions of a double-stranded target DNA and sequenceable fragments identifying the sequence and location of methylated cytosines in methylated regions of the same target DNA. In some embodiments, library fragment preparation incorporates a first adapter at one end of fragments and a second adapter at the other end of fragments. In some embodiments, the first adapter comprises a first-read sequencing adapter sequence and the second adapter comprises a second-read sequencing adapter sequence. In some embodiments, fragments are then subjected to methods described herein for preparing a double-stranded break at methylated cytosines, and a methylation-specific adapter (comprising a methylation-specific index sequence) is ligated to positions of the double-stranded break. In this way, the position of methylated cytosines in double-stranded target DNA can be marked by the presence in sequencing data of a methylation-specific index sequence in sequenced fragments.

[00154] In some embodiments, the same second-read sequencing adapter sequence is comprised in the second adapter and in the methylation-specific adapter. This can allow for sequencing of both fragments that did not comprise a methylated cytosine (and will comprise a second adapter) and those fragments that did comprise a methylated cytosine (and will comprise a methylation-specific adapter). Accordingly, a user can choose a WGS & MethSeq method if they wish to gather both sequencing data on the full sequence of the double-stranded target DNA and data on positions of methylated cytosines.

D. MethSeq Enrichment

[00155] As used herein, “MethSeq Enrichment” refers to a method designed to enrich for sequencing library fragments that identify methylation sites in a target nucleic acid. A representative workflow of MethSeq Enrichment is shown in Figure 6.

[00156] Certain components of the MethSeq Enrichment workflow allow for preparation of sequenceable fragments identifying the sequence and location of methylated cytosines in methylated regions of a double-stranded target DNA, without preparing sequenceable fragments from regions of the same target DNA that are non-methylated. In some embodiments, library fragment preparation incorporates a first adapter at both ends of fragments. In some embodiments, fragments are then subjected to methods described herein for preparing a double-stranded break at methylated cytosines, and a methylation-specific adapter (comprising a methylation-specific index sequence) is ligated to positions of the double-stranded break. In some embodiments, the first adapter comprises a first-read sequencing adapter sequence, and the methylation-specific adapter comprises a second-read sequencing adapter sequence. In this way, the position of methylated cytosines in double-stranded target DNA can be marked by the presence in sequencing data of a methylation-specific index sequence in sequenced fragments.

[00157] A user may choose to use a MethSeq Enrichment method when they are interested in identifying methylated cytosines, but the sequence of the double-stranded DNA is already known. For example, if a user has enriched for double-stranded target DNA of interest in determining prognosis based on methylation status of known cancer-related genes, they may then use MethSeq Enrichment to specifically determine the positions and levels of methylated cytosines. In such a case, the user may want only the methylation data and can save time and resources by not sequencing library fragments that lack methylation.

E. Solid support

[00158] In some embodiments, library fragments or other DNA fragments used in methods described herein are immobilized on a solid support. In some embodiments, library fragments are immobilized on the solid support via a sequence comprised in at least one adapter. In some embodiments, the at least one adapter is at one or both ends of the library fragment.

[00159] Methods of immobilizing library fragments or other DNA fragments (and later releasing them) are well-known in the art and can be useful for washing steps and other parts of workflows.

[00160] Certain embodiments may make use of solid supports comprised of an inert substrate or matrix (e.g. glass slides, polymer beads etc.) which has been functionalized, for example by application of a layer or coating of an intermediate material comprising reactive groups which permit covalent attachment to biomolecules, such as polynucleotides. Examples of such supports include, but are not limited to, polyacrylamide hydrogels supported on an inert substrate such as glass, particularly polyacrylamide hydrogels as described in WO 2005/065814 and US 2008/0280773, the contents of which are incorporated herein in their entirety by reference. In such embodiments, the biomolecules (e.g. polynucleotides) may be directly covalently attached to the intermediate material (e.g. the hydrogel) but the intermediate material may itself be non-covalently attached to the substrate or matrix (e.g. the glass substrate). The term “covalent attachment to a solid support” is to be interpreted accordingly as encompassing this type of arrangement.

[00161] The terms “solid surface,” “solid support” and other grammatical equivalents herein refer to any material that is appropriate for or can be modified to be appropriate for the attachment of the transposome complexes. As will be appreciated by those in the art, the number of possible substrates is very large. Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon™, etc.), polysaccharides, nylon or nitrocellulose, ceramics, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses, plastics, optical fiber bundles, and a variety of other polymers. Particularly useful solid supports and solid surfaces for some embodiments are located within a flow cell apparatus. Exemplary flow cells are set forth in further detail below.

[00162] In some embodiments, the solid support comprises a patterned surface suitable for immobilization of transposome complexes in an ordered pattern. A “patterned surface” refers to an arrangement of different regions in or on an exposed layer of a solid support. For example, one or more of the regions can be features where one or more transposome complexes are present. The features can be separated by interstitial regions where transposome complexes are not present. In some embodiments, the pattern can be an x-y format of features that are in rows and columns. In some embodiments, the pattern can be a repeating arrangement of features and/or interstitial regions. In some embodiments, the pattern can be a random arrangement of features and/or interstitial regions. In some embodiments, the transposome complexes are randomly distributed upon the solid support. In some embodiments, the transposome complexes are distributed on a patterned surface. Exemplary patterned surfaces that can be used in the methods and compositions set forth herein are described in US App. No. 13/661,524 or US Pat. App. Publ. No. 2012/0316086 A1, each of which is incorporated herein by reference.

[00163] In some embodiments, the solid support comprises an array of wells or depressions in a surface. This may be fabricated as is generally known in the art using a variety of techniques, including, but not limited to, photolithography, stamping techniques, molding techniques and microetching techniques. As will be appreciated by those in the art, the technique used will depend on the composition and shape of the array substrate.

[00164] The composition and geometry of the solid support can vary with its use. In some embodiments, the solid support is a planar structure such as a slide, chip, microchip and/or array. As such, the surface of a substrate can be in the form of a planar layer. In some embodiments, the solid support comprises one or more surfaces of a flow cell. The term “flow cell” as used herein refers to a chamber comprising a solid surface across which one or more fluid reagents can be flowed. Examples of flow cells and related fluidic systems and detection platforms that can be readily used in the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497; US 7,057,026; WO 91/06678; WO 07/123744; US 7,329,492; US 7,211,414; US 7,315,019; US 7,405,281, and US 2008/0108082, each of which is incorporated herein by reference.

[00165] In some embodiments, the solid support or its surface is non-planar, such as the inner or outer surface of a tube or vessel. In some embodiments, the solid support comprises microspheres or beads. By “microspheres” or “beads” or “particles” or grammatical equivalents herein is meant small discrete particles. Suitable bead compositions include, but are not limited to, plastics, ceramics, glass, polystyrene, methylstyrene, acrylic polymers, paramagnetic materials, thoria sol, carbon graphite, titanium dioxide, latex or cross-linked dextrans such as Sepharose, cellulose, nylon, cross-linked micelles and teflon, as well as any other materials outlined herein for solid supports. “Microsphere Selection Guide” from Bangs Laboratories, Fishers Ind. is a helpful guide. In certain embodiments, the microspheres are magnetic microspheres or beads.

[00166] Attachment of a nucleic acid to a support, whether rigid or semi-rigid, can occur via covalent or non-covalent linkage(s). Exemplary linkages are set forth in US Pat. Nos. 6,737,236; 7,259,258; 7,375,234 and 7,427,678; and US Pat. Pub. No. 2011/0059865 A1, each of which is incorporated herein by reference. In some embodiments, a nucleic acid or other reaction component can be attached to a gel or other semisolid support that is in turn attached or adhered to a solid-phase support. In such embodiments, the nucleic acid or other reaction component will be understood to be solid phase.

[00167] In some embodiments, the solid support comprises microparticles, beads, a planar support, a patterned surface, or wells. In some embodiments, the planar support is an inner or outer surface of a tube.

[00168] In some embodiments, a solid support has a prepared library of tagged DNA fragments immobilized thereon. Such a library can then be processed using one or more methods of methylation analysis described herein. In some embodiments, library fragments are prepared by surface-bound transposomes. In some embodiments, on-bead tagmentation allows for a more uniform tagmentation reaction compared to in-solution tagmentation reactions.

[00169] The density of these surface-bound transposomes can be modulated by varying the density of the first polynucleotide or by the amount of transposase added to the solid support. For example, in some embodiments, the transposome complexes are present on the solid support at a density of at least 10³, 10⁴, 10⁵, or 10⁶ complexes per mm².

IV. Methods of Methylation Sequencing Using Library Fragments Comprising One or More Hairpin Adapters

[00170] In some embodiments, methylation analysis includes use of hairpin adapters.

[00171] As used herein, a “hairpin” refers to a nucleic acid comprising a pair of nucleic acid sequences that are at least partially complementary to each other. These two nucleic acid sequences that are at least partially complementary can bind to each other and mediate folding of a nucleic acid. In some embodiments, the two nucleic acid sequences that are at least partially complementary generate a nucleic acid with a hairpin secondary structure.

[00172] A “hairpin adapter,” as used herein, refers to an adapter that comprises at least one pair of nucleic acid sequences that are at least partially complementary to each other. In some embodiments, a hairpin adapter has a folded secondary structure. In some embodiments, base pairing between a pair of nucleic acid sequences that are at least partially complementary to each other “locks” the adapter into a hairpin secondary structure. A hairpin adapter can be incorporated at either end of a double-stranded DNA fragment.

[00173] In the present methods, a hairpin adapter can allow for two strands of a double-stranded DNA fragment to dehybridize into a single-stranded conformation, while retaining the association of the two strands. In other words, the two strands of a double-stranded fragment remain attached to each other while dehybridized when a hairpin adapter is used.

[00174] In some embodiments, a hairpin adapter comprises a modification to block extension. In this way, extension can be performed selectively in a desired manner.

[00175] In some embodiments, the modification to block extension is a non-nucleic acid moiety.

[00176] In some embodiments, the hairpin adapter comprises a cleavable linker. As used herein, a “cleavable” linker may be one that is generated in an enzymatic reaction. For example, one or more cytosines in a hairpin adapter can be converted to uracils (as shown in step 6 of Figure 9), and these uracils can then be subject to BER to cleave the hairpin adapter, leaving a double-stranded DNA fragment wherein the two strands are hybridized together but are not linked via a hairpin linker.

[00177] In some embodiments, a hairpin adapter comprises one or more cytosines. In some embodiments, a hairpin adapter comprises 2 or 3 cytosines.

[00178] In some embodiments, one or more hairpin adapters are cleaved after the second gap filling.

[00179] In some embodiments, a method comprises methylation analysis incorporating dehybridizing and protecting each single DNA strand comprised in a double-stranded DNA fragment with a hairpin adapter at at least one end, as shown in Figure 9. In some embodiments, a method of preparing a DNA library for identifying methylated cytosines from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprises (a) dehybridizing a section of the double-stranded target DNA into the two separate single DNA strands; (b) deaminating cytosines in the two separate single DNA strands, wherein unmethylated cytosines are converted to uracils and methylated cytosines are converted to thymines, and wherein the deaminating is more efficient on single-stranded DNA compared to double-stranded DNA; (c) rehybridizing the two separate single DNA strands into double-stranded DNA; (d) performing USER to remove uracils; (e) performing gap filling; and (f) preparing library fragments by fragmenting the target DNA. Steps of dehybridizing, deaminating cytosines, and performing USER can be performed using methods described in Sections I and II above.

[00180] In some embodiments, a method comprises steps of sequentially dehybridizing and protecting a first single DNA strand and then a second single DNA strand that are comprised in a double-stranded DNA fragment with a hairpin adapter at at least one end. Such a method is shown in Figure 9. As used herein, “protecting” a single DNA strand means to hybridize this single DNA strand with another strand and thus “protect” this single DNA strand from a deamination reaction. In some embodiments, reactions, such as deamination, preferentially occur on a single DNA strand, as compared to a double-stranded DNA region. In this way, hybridizing a single DNA strand to a complementary strand can “protect” this strand. In some embodiments, protecting is performed by extending a complementary DNA strand from a primer region in a hairpin adapter, such that a single DNA strand is bound to the extension product of a complementary single DNA strand. In some embodiments, a complementary DNA strand is added to a reaction mix and can bind to a single DNA strand. As shown in Figure 9, a “new strand” (such as an extended complementary strand) can be dehybridized, such as with an increase in heat. Since the two strands of the double-stranded fragment are attached via the hairpin adapter, they remain attached and can rehybridize to each other (such as with a cooling of the reaction mixture).

[00181] In some embodiments, a method of preparing a DNA library for identifying methylated cytosines from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprises (a) preparing library fragments by fragmenting the double-stranded target DNA and incorporating one or more adapters at both ends of the double-stranded fragments, wherein at least one adapter is a hairpin adapter that binds both strands at one end of the double-stranded fragment; (b) dehybridizing the double-stranded DNA fragments into a first single DNA strand and a second single DNA strand linked by the hairpin adapter; (c) protecting the first single DNA strand with a first complementary DNA strand, thereby generating a double-stranded DNA comprising the first single DNA strand; (d) deaminating cytosines in the second single DNA strand; (e) dehybridizing the first complementary DNA strand; (f) rehybridizing the first single DNA strand and second single DNA strand into double-stranded DNA; (g) performing USER to remove uracils in the second single DNA strand; (h) performing a first gap filling; (i) dehybridizing the double-stranded DNA fragments into the first single DNA strand and the second single DNA strand linked by the hairpin adapter; (j) protecting the second single DNA strand with a second complementary DNA strand, thereby generating a double-stranded DNA comprising the second single DNA strand; (k) deaminating cytosines in the first single DNA strand; (l) dehybridizing the second complementary DNA strand; (m) rehybridizing the first single DNA strand and second single DNA strand into double-stranded DNA; (n) performing USER to remove uracils in the first single DNA strand; and (o) performing a second gap filling.
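The order of operations in steps (b) through (o) can be illustrated with a small simulation. The following Python sketch is illustrative only and is not part of the disclosed methods: strands are modeled as plain strings, methylated positions are supplied as hypothetical index sets, and the function names `deaminate` and `run_round` are assumptions introduced for the example. Adapters, the hairpin linker, and enzyme kinetics are not modeled.

```python
def deaminate(seq, methylated):
    """Unmethylated C becomes U, methylated C becomes T, other bases are unchanged."""
    return "".join(
        ("T" if i in methylated else "U") if base == "C" else base
        for i, base in enumerate(seq)
    )


def run_round(strands, methylated, protected):
    """One round: deaminate each unprotected strand, then remove uracils and gap fill."""
    result = {}
    for name, seq in strands.items():
        if name == protected:
            # Assumed fully duplexed with its complementary strand, so left untouched.
            result[name] = seq
        else:
            converted = deaminate(seq, methylated[name])
            # USER followed by gap filling restores a cytosine at each uracil.
            result[name] = converted.replace("U", "C")
    return result


# Hypothetical 8-base insert; positions are 0-based within each strand (5' to 3').
strands = {"first": "ACGTCCGA", "second": "TCGGACGT"}
methylated = {"first": {1, 5}, "second": {1}}

# Steps (b)-(h): the first strand is protected while the second is converted.
after_round_1 = run_round(strands, methylated, protected="first")
# Steps (i)-(o): the second strand is protected while the first is converted.
after_round_2 = run_round(after_round_1, methylated, protected="second")
print(after_round_2)
```

In this model only one strand is exposed to the deaminase in each round, and thymines produced in the first round are unaffected by the second round because the deaminase acts only on cytosines.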

[00182] Protecting of the first and second single DNA strand can be performed in a number of ways, and the method is not limited by the choice of how to protect a single strand. In some embodiments, protecting the first single DNA strand with a first complementary DNA strand comprises binding an extension primer to an adapter attached to the first single DNA strand and extending a first extended strand of DNA complementary to the first single DNA strand. In some embodiments, protecting the first single DNA strand with a first complementary DNA strand comprises binding an oligonucleotide comprising the first complementary DNA strand to the first single DNA strand. In some embodiments, protecting the second single DNA strand with a second complementary DNA strand comprises binding an extension primer to an adapter attached to the second single DNA strand and extending a second extended strand of DNA complementary to the second single DNA strand. In some embodiments, protecting the second single DNA strand with a second complementary DNA strand comprises binding an oligonucleotide comprising the second complementary DNA strand to the second single DNA strand.

A. Methods with Fragments Comprising Two Hairpin Adapters

[00183] In some embodiments, a hairpin adapter is attached to both ends of a library fragment. A method wherein a hairpin adapter is attached to both ends of a library fragment is shown in Figure 10.

[00184] When a hairpin adapter is attached to both ends of a library fragment, the two strands of the double-stranded fragment can be opened (such as with an increase in temperature of the reaction mixture), a first DNA strand can be protected, a second DNA strand can be acted on by an enzyme (such as APOBEC3A for deamination), the two DNA strands can be rehybridized, and BER can be performed. Then the two strands of the double-stranded fragment can be opened again, the second DNA strand can be protected, the first DNA strand can be acted on by an enzyme (such as APOBEC3A for deamination), the two DNA strands can be rehybridized, and BER can be performed again. At this stage, the hairpin adapters may be cleaved (such as by cleavage of the linkers marked “X” in Figure 10) to produce a double-stranded library fragment without any hairpin adapters that is ready for sequencing.

V. Methods of Methylation Analysis of Concatenated Sequencing Templates or Library Fragments Comprising Multiple Inserts

[00185] In some embodiments, the present methods are used with concatenated sequencing templates comprising multiple inserts, as described further herein.

A. Library Fragments Comprising Multiple Inserts

[00186] In some embodiments, library fragments comprise multiple insert sequences, wherein each insert comprises a portion of one or more target nucleic acids. A single polynucleotide (i.e., library fragment) comprising multiple insert sequences allows for sequencing of multiple regions of the one or more target nucleic acids in the same region of a flowcell. In this way, more regions of the one or more target nucleic acids can be sequenced without the need for a larger flowcell.

[00187] In some embodiments, the polynucleotides are generated from 2 separate library products based on hybridizing of a hybridization sequence in one library product to the complement of a hybridization sequence in the other library product to form a hybridized adduct, followed by elongation to produce a concatenated nucleic acid sequencing template. Such methods are described in Application No. PCT/US2021/055878, which is incorporated herein by reference in its entirety.

[00188] In some embodiments, library fragments comprising more than one insert sequence can be used in methods of methylation analysis as described herein. For example, one or both ends of a library fragment comprising multiple inserts may comprise a hairpin adapter, and methods of methylation analysis described herein for library fragments comprising hairpin adapters may be performed.

B. Methylation Analysis of Concatenated Sequencing Templates

[00189] In some embodiments, concatenated sequencing templates comprising an insert sequence and a copy of the same insert may be used for methylation analysis. Exemplary concatenated sequencing templates are described in Application No. PCT/US2021/055878, which is incorporated herein by reference in its entirety. These concatenated sequencing templates may comprise “two copies” of an insert sequence, however, a copy of an insert sequence would not comprise modified nucleotides (such as modified cytosines) in the absence of conditions to promote them.

[00190] In some embodiments, a concatenated sequencing template may be subjected to methylation analysis using a SSB and helicase or recombinase with a cytosine deaminase, followed by BER.

[00191] A number of different methods of preparing sequencing templates with multiple inserts would be known to those skilled in the art, such as Duplex Sequencing (Schmitt, et al. Proc. Natl. Acad. Sci. U. S. A. 109:14508-14513 (2012)), Duplex Proximity Sequencing (Pro-Seq, as described in Pel et al. PLoS One 13:1-19 (2018)), CypherSeq (Gregory et al. Nucleic Acids Res. 44:e22 (2016)), o2n-seq (Wang et al. Nat. Commun. 8, 15335 (2017)), Circle Sequencing (Lou et al., Proc. Natl. Acad. Sci. U. S. A. 110:19872-19877 (2013)), Bot Sequencing (Hoang et al. Proc. Natl. Acad. Sci. U. S. A. 113:9846-9851 (2016) and Abascal et al. Nature 593, 405-410 (2021)), and Bae et al., bioRxiv, 10.1101/2021.06.11.448110, posted June 12, 2021, each of which is incorporated herein in its entirety. Methylation analysis described herein may be used with any of these types of sequencing templates comprising multiple inserts to assess methylation status of an insert sequence from a target nucleic acid and a copy of this insert (wherein the copy does not comprise modified cytosines).

EXAMPLES

Example 1. Cytosine Deamination and Base Excision Repair to Identify Methylated Cytosines

[00192] A method of methylation analysis may first use a cytosine deaminase to deaminate unmethylated cytosines to form uracils and to deaminate methylated cytosines to form thymines. Following cytosine deaminase treatment, the DNA will be treated with a uracil DNA glycosylase (UDG) and an apyrimidinic (AP) endonuclease to mediate base excision repair (BER), followed by polymerase treatment, to remove the uracils and replace each with a cytosine as shown in Figure 3B. In this way, the methylated cytosines will be retained as thymines.

[00193] In this way, the final products will have methylated cytosines converted to thymines while unmethylated cytosines will remain as cytosines. This method has an advantage over existing methylation analyses by limiting the amount of DNA damage compared to other methods, as the present method only converts methylated cytosines to thymines, which will not affect sequencing diversity or alignment efficiency. Further, this procedure will require less time than other methylation identification methods.
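The effect on nucleotide composition can be illustrated with a toy comparison. The following Python sketch is illustrative only: it contrasts an idealized bisulfite-style conversion (unmethylated cytosines read as thymines, methylated cytosines retained) with the idealized outcome of the deamination and BER method of this example (methylated cytosines read as thymines, unmethylated cytosines retained). The sequence, the methylated position, and the function names are hypothetical, and complete conversion with no DNA damage is assumed in both cases.

```python
from collections import Counter


def bisulfite_like(seq, methylated):
    """Idealized bisulfite readout: unmethylated C reads as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def deaminase_ber(seq, methylated):
    """Idealized deamination plus BER readout: methylated C reads as T, other C retained."""
    return "".join(
        "T" if base == "C" and i in methylated else base
        for i, base in enumerate(seq)
    )


# Hypothetical sequence with five cytosines, one of which (index 1) is methylated.
seq = "ACGTCCGACCGT"
methylated = {1}

bisulfite_counts = Counter(bisulfite_like(seq, methylated))
present_counts = Counter(deaminase_ber(seq, methylated))
print("bisulfite-like:", dict(bisulfite_counts))
print("deaminase + BER:", dict(present_counts))
```

Because most cytosines in this toy sequence are unmethylated, the bisulfite-style readout loses most of its cytosines, while the deamination and BER readout changes only the single methylated position.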

Example 2. Methods Using Methylation-Specific Indexes

[00194] Methylated bases can be converted to double-stranded breaks that are then used to generate methylation-specific library constructs. The presence of a methylation-specific index within a methylation-specific adapter leads to methylated regions being easily identified, with the actual methylated base being at the first position sequenced after the primer. A short, e.g. 1x36, run can thus provide the locus of a methylation event, while a full run using standard oligonucleotides will provide WGS information without requiring a separate prep. When the method is used to evaluate WGS information and methylation events, the method may be referred to as “MethSeq” or “WGS & MethSeq” while other methods described herein may be used to enrich for sequencing of methylation events and may be referred to as “MethSeq Enrichment.”

[00195] Both MethSeq and MethSeq Enrichment include preparation of a double-stranded break at a position that corresponded to a methylated cytosine and incorporating a methylation-specific adapter at the site of the double-stranded break (Figures 4A and 4B). The methylation-specific adapter incorporated at the position of the double-stranded break may comprise a methylation-specific index (used to identify that fragments comprising this index were methylated and led to a double-stranded break), a sequencing primer sequence, and an adapter sequence (such as an adapter for mediating binding to immobilized oligonucleotides on the surface of a flow cell). As shown in Figure 5, double-stranded breaks generated by BER can generate a single G overhang, which can be used to mediate preferential ligation of an adapter with a complementary C overhang.

[00196] Figure 6 shows an outline of MethSeq (WGS & MethSeq) and MethSeq Enrichment. The user can determine whether they prefer to collect sequencing data on nonmethylated regions (i.e., WGS results from WGS & MethSeq) or only information on methylated regions (MethSeq Enrichment).

[00197] As shown in Figure 6 for a method of MethSeq (WGS & MethSeq), fragments are prepared with adapters at both ends. The adapters at the different ends of the fragments may have different sequences to allow for paired-end sequencing. For example, commercially available means of library preparation can be used such as Illumina Nextera XT Library Preparation Kit or Illumina DNA Prep with Enrichment. Following this library preparation, each fragment can be sequenced using paired-end sequencing.

[00198] When double-stranded breaks are generated at the site of a methylated cytosine, a different adapter can be ligated at the site of this break. As this different adapter comprises a methylation-specific index sequence, the user can identify (1) fragments that had comprised a methylated cytosine and (2) that the methylated cytosine was at the first position of the fragment after the adapter ligation site.
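The use of the methylation-specific index in data analysis can be sketched as follows. This Python example is illustrative only and makes several assumptions that are not part of the disclosure: the index sequences are hypothetical placeholders, each read is assumed to have already been aligned so that the reference coordinate of its first sequenced insert base is known, and all reads are treated as forward-strand alignments (for reverse-strand alignments the first sequenced base would map to the other end of the alignment).

```python
from dataclasses import dataclass

# Hypothetical index sequences; real values depend on the adapters used.
METH_INDEX = "ACGTACGT"
STANDARD_INDEX = "TTGCAAGC"


@dataclass(frozen=True)
class Read:
    index: str  # index sequence recovered from the adapter
    chrom: str  # reference sequence to which the insert aligned
    start: int  # 0-based reference coordinate of the first sequenced insert base


def split_reads(reads):
    """Separate reads carrying the methylation-specific index from all other reads."""
    meth = [r for r in reads if r.index == METH_INDEX]
    other = [r for r in reads if r.index != METH_INDEX]
    return meth, other


def methylated_loci(reads):
    """Report the first insert position of each methylation-marked read, deduplicated."""
    meth, _ = split_reads(reads)
    return sorted({(r.chrom, r.start) for r in meth})


reads = [
    Read(METH_INDEX, "chr1", 100),
    Read(STANDARD_INDEX, "chr1", 40),
    Read(METH_INDEX, "chr1", 100),
    Read(METH_INDEX, "chr2", 7),
]
print(methylated_loci(reads))
```

In a WGS & MethSeq analysis, the reads without the methylation-specific index would be passed to a standard whole genome analysis, while in a MethSeq Enrichment analysis few or no such reads would be expected.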

[00199] Alternatively, for a method of MethSeq Enrichment, the user can incorporate a given first adapter at both ends of all fragments (for example, by performing a symmetric tagmentation where all transposon complexes comprise the first adapter in the transferred strand to tag all fragment ends, as described in WO 2015168161, which is incorporated herein in its entirety). These fragments with the same adapter at both ends will not sequence. The library of fragments can then be subjected to methods of preparing double-stranded breaks at the positions of methylated cytosines, followed by ligation of a methylation-specific adapter at all these breaks. In a representative example, the first adapter could comprise a first-read sequencing adapter sequence, while the methylation-specific adapter could comprise a second-read sequencing adapter sequence. In this way, only fragments that comprised a methylated cytosine would incorporate a methylation-specific adapter, and only these fragments would cluster and sequence. In this way, MethSeq Enrichment can dramatically reduce the amount of DNA to be sequenced, which reduces time and cycle number required for sequencing.

Example 3. Identification of Methylated Cytosines Using Dehybridization of Genomic DNA

[00200] Identification of methylated cytosines in a double-stranded target nucleic acid (such as dsDNA) can be performed using (i) a helicase or recombinase and (ii) a single-stranded binding protein (SSB) to slowly unzip (i.e., dehybridize) the fragment. Both helicases and recombinases can dehybridize dsDNA, while SSBs can stabilize single DNA strands (such as to maintain a replication bubble).

[00201] In this method, as shown in Figure 7, cytosine deamination (step 1), reannealing (step 2), and BER (step 3) can be performed before preparation of library fragments (step 4). In other words, conversion of methylated cytosines to thymines can be performed on the target nucleic acid, and then library fragments are prepared.

[00202] As a “bubble” forms where the two strands of the nucleic acid are dehybridized, cytosine deamination (such as with APOBEC) can occur to convert methylated cytosines into thymines and nonmethylated cytosines into uracils (step 1 of Figure 7). The two strands of the nucleic acid can then be reannealed (as with a change in temperature, as shown in step 2 of Figure 7).

[00203] BER can then be performed with a mixture of UDG, AP endonuclease (AP Endo) and a polymerase (as shown in step 3 of Figure 7). In this step, nonmethylated cytosines that were converted into uracils will be converted back to cytosines, while methylated cytosines will remain converted to thymines. The thymines corresponding to positions that had been methylated cytosines will be mispaired with a guanine in the opposite strand.

[00204] Library preparation can be performed (as shown in step 4 of Figure 7), such as with fragmenting and adding P5 and P7 adapter sequences, followed by sequencing. Based on sequencing results, positions wherein a thymine is paired with a guanine in the complementary strand will indicate a position that was a methylated cytosine in the target nucleic acid. In other words, this T/G mismatch in final sequencing results can be used to identify a position that was a methylated cytosine in the target nucleic acid before the deamination/BER method. In contrast, positions that were originally a nonmethylated cytosine in the target nucleic acid will still be paired with a guanine in the complementary strand (i.e., a C/G pairing). In this way, paired-end sequencing can be used to determine the locations of methylated cytosines in the starting target nucleic acid by analyzing results for T/G pairing in opposite strands from the same double-stranded fragment.
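The T/G mismatch analysis can be sketched in code. The following Python example is illustrative only and assumes that reads from the two complementary strands of the same double-stranded fragment have already been paired, are error-free, are of equal length, and are each written 5' to 3'. The function name `find_tg_mismatches` and the example sequences are hypothetical.

```python
def find_tg_mismatches(top, bottom):
    """Locate T/G mismatches between two complementary strand reads.

    Both reads are written 5' to 3'. Returned positions are 0-based
    coordinates on the top strand.
    """
    if len(top) != len(bottom):
        raise ValueError("strand reads must have equal length")
    # Reversing the bottom read places each base opposite its top-strand partner.
    paired = bottom[::-1]
    # T on top opposite G on bottom: methylated cytosine on the top strand.
    top_hits = [i for i, (t, b) in enumerate(zip(top, paired)) if t == "T" and b == "G"]
    # G on top opposite T on bottom: methylated cytosine on the bottom strand.
    bottom_hits = [i for i, (t, b) in enumerate(zip(top, paired)) if t == "G" and b == "T"]
    return top_hits, bottom_hits


# Hypothetical fragment. Before conversion the strands were ACGTCCGA and TCGGACGT,
# with methylated cytosines at top positions 1 and 5 and at bottom read position 1.
top_read = "ATGTCTGA"
bottom_read = "TTGGACGT"
print(find_tg_mismatches(top_read, bottom_read))
```

Positions with a normal C/G or A/T pairing are not reported, so unmethylated cytosines that were restored by BER do not appear in the output.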

[00205] As shown in Figures 11A-11C, this analysis can be simplified when double-stranded fragments are seeded onto a flow cell before amplification, without denaturing the double-stranded fragments before seeding. This is because complementary strands generated from a given double-stranded fragment will be immobilized in close proximity on the flow cell, and analysis can incorporate this aspect to determine T/G mismatching that is present in single strands that represent complementary strands from the same double-stranded fragment.

[00206] A single-pot reaction can be performed with a helicase or recombinase plus SSB as shown in Figure 8. In this method, the unzipping of the duplex, the cytosine deamination, and the BER can be performed in a single reaction vessel (step 1 of Figure 8) followed by library preparation (step 2 of Figure 8) and sequencing as described above to identify positions that were methylated cytosines in the target nucleic acid.

Example 4. Identification of Methylated Cytosines Using Dehybridization of Double- Stranded Fragments Comprising a Single Hairpin Adapter

[00207] Identification of methylated cytosines can be performed with a double-stranded library fragment comprising a hairpin adapter at one end of the fragment. Such a library fragment can be prepared from a target nucleic acid and can comprise an insert sequence comprising a sequence from the target nucleic acid. Such fragments can be generated with standard methods, such as tagmentation or fragmentation and adapter ligation.

[00208] Figure 9 shows a method of identifying methylated cytosines in a library fragment that has a double-stranded adapter comprising P7/P7’ (i.e., SEQ ID NO: 23 and its complement) at one end and a hairpin adapter comprising P5/P5’ (i.e., SEQ ID NO: 22 and its complement) at the other end. When the fragment is “opened” (i.e., the two strands of the insert are dehybridized as with heat), a primer can be bound to a sequence in one adapter (such as a P7 primer that can bind to a P7’ sequence comprised in an adapter) and extended with a polymerase (as shown in Figure 9, step 1). In this way, a complementary strand can be extended that protects the strand comprising the P7’ sequence. This complementary strand may be extended until it reaches a linker comprising a non-nucleic acid moiety (i.e., the “linker” shown in Figure 9). In this way, the “CCC” sequence comprised in the hairpin is protected from the cytosine deamination (in the next step of the method) by the extended strand.

[00209] After preparing this complementary strand, deamination of the strand comprising the P7 sequence can be performed with a cytosine deaminase (such as APOBEC3A) to convert methylated cytosines to thymines and nonmethylated cytosines to uracils (as shown in Figure 9, step 2). After the deamination, the fragment can be “closed” (i.e., the two strands of the insert are rehybridized as with a decrease in temperature, as shown in Figure 9, step 3). At this point, BER can be performed to excise uracils (generated from nonmethylated cytosines) using a UDG/APEndo/polymerase mixture, with the polymerase gap-filling excision sites with cytosines (as shown in Figure 9, step 4). In this way, the strand comprising the P7 sequence has now been modified such that methylated cytosines that were originally in this strand are now thymines.
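The net effect of deamination followed by BER on one exposed strand can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not stated in the disclosure: conversion is treated as complete, and gap filling is modeled as restoring a cytosine at every excised uracil because the rehybridized complementary strand presents a guanine at that position. The function names and the methylated positions are hypothetical.

```python
def deaminate(strand, methylated):
    """Deaminate an exposed strand: 5mC -> T, unmethylated C -> U."""
    out = []
    for i, base in enumerate(strand):
        if base == "C":
            out.append("T" if i in methylated else "U")
        else:
            out.append(base)
    return "".join(out)


def base_excision_repair(strand):
    """Model uracil excision and gap filling opposite G as U -> C."""
    return strand.replace("U", "C")


strand = "ACGTCGACCG"
methylated = {1, 7}  # hypothetical 5mC positions (0-based)
after_deamination = deaminate(strand, methylated)
after_ber = base_excision_repair(after_deamination)
print(after_deamination)  # uracils mark the unmethylated cytosines
print(after_ber)          # only the methylated positions remain changed (as T)
```

After both steps the strand differs from its starting sequence only at the methylated positions, which is what allows those positions to be read out later as T/G mismatches.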

[00210] Next, the hairpin can be opened as with an increase in heat, and a primer that binds a sequence in another adapter (such as a P5 primer that binds P5’, the complement of P5) can be bound to the library fragment to extend a complementary strand that protects the strand comprising the P7 sequence using a polymerase (as shown in Figure 9, step 5). This strand can be extended until it reaches a non-nucleotide linker, which would not protect the “CCC” sequence from deamination. In this way, the CCC in the hairpin can be used to generate a cleavage site in the hairpin.

[00211] The strand comprising the P7’ sequence (the complement of SEQ ID NO: 23, P7) is now free to be cytosine deaminated with APOBEC3A to convert methylated cytosines to thymines and nonmethylated cytosines to uracils (Figure 9, step 6). Further, the CCC comprised in the hairpin can be converted to UUU, as the CCC is not protected by a complementary strand. After the deamination, the fragment can again be “closed,” and the extended strand can be dehybridized (Figure 9, step 7). BER can be performed with UDG/APEndo/polymerase to excise uracils (generated from nonmethylated cytosines), and the polymerase can then gap-fill the excision site with cytosines (Figure 9, step 8). The BER can also convert any uracils that had been generated by deamination of cytosines comprised in the linker (shown as CCC to UUU conversion) to allow the hairpin to be cleaved during the BER reaction.

[00212] After these reactions, the modified fragment (i.e., fragment subjected to the method outlined in Figure 9) can be seeded onto a flow cell and sequenced. This seeding may be performed with further denaturing or amplification, such that true paired-end sequencing can be performed. In the sequencing results, positions with a mismatch between T/G in complementary single-stranded fragments can be used to identify that this T corresponds to the position of a methylated cytosine in the target nucleic acid used to prepare the library.

Example 5. Identification of Methylated Cytosines Using Dehybridization of Double-Stranded Fragments Comprising Two Hairpin Adapters

[00213] Identification of methylated cytosines can also be performed using double-stranded fragments comprising two hairpin adapters, i.e., a DNA library pin as shown in Figure 10. Each hairpin adapter can comprise a cleavable, non-nucleotide linker (“X”), where elongation by a polymerase will stop.

[00214] The double-stranded fragment can be “opened” by denaturing the two insert sequences, such as with an increase in heat. Since each end of the fragment comprises a hairpin adapter that attaches the two complementary insert sequences, the denaturing does not allow for the two strands to be released from the fragment. Binding of a primer (such as P7-B15 as shown in Figure 10, step 1) allows for elongation to protect a first strand of the double-stranded insert.

[00215] Then, the single-stranded insert sequence (i.e., the second strand comprising P5-B15 and P7’-A14’) can be deaminated with APOBEC3A to convert nonmethylated cytosines to uracils and methylated cytosines to thymines, followed by closing of the DNA library pin and dehybridization of the elongated strand (Figure 10, steps 2 and 3). BER can then be performed with UDG/APEndo/polymerase to remove the uracils and fill-in the excision gaps with cytosines.

[00216] Then the DNA library pin can be reopened (such as with an increase in heat) and a P7-A14 primer can be bound to allow for elongation of a complementary strand to the second strand of the double-stranded insert sequence (Figure 10, step 5). After deamination of nonmethylated cytosines to uracils and methylated cytosines to thymines (Figure 10, step 6), the DNA library pin can be closed and the elongated strand can be dehybridized (Figure 10, step 7). BER can be performed with UDG, APEndo, and a polymerase to excise uracils and gap-fill the excision site with a cytosine (Figure 10, step 8). The cleavable linker X can then be cleaved to produce a modified double-stranded library fragment that is ready to be sequenced.
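The two protect-and-convert rounds of Figure 10 can be summarized at the level of the whole duplex. The sketch below is illustrative only and rests on stated assumptions: each round is collapsed into a single step in which only the exposed strand is changed, conversion is treated as complete, and the two strands are stored index-aligned (the second strand written 3'→5'). The names `convert_round`, "first", and "second", and the methylated positions, are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert_round(duplex, methylated, exposed):
    """One open/protect/deaminate/close/BER round on the exposed strand.

    Only the exposed strand is changed: its 5mC positions become T.
    Unmethylated C would become U and then be repaired back to C, so it is
    left unchanged here. The protected strand is returned as-is.
    """
    strand = duplex[exposed]
    converted = "".join(
        "T" if base == "C" and i in methylated[exposed] else base
        for i, base in enumerate(strand)
    )
    return {**duplex, exposed: converted}


top = "ACGTCGA"
duplex = {"first": top, "second": top.translate(COMPLEMENT)}  # index-aligned
methylated = {"first": {1}, "second": {2}}  # hypothetical 5mC positions

duplex = convert_round(duplex, methylated, exposed="second")  # Figure 10, steps 1-4
duplex = convert_round(duplex, methylated, exposed="first")   # Figure 10, steps 5-8
print(duplex)
```

After both rounds, each strand carries a T at its own methylated positions while the opposite strand still carries the original G, so methylation on either strand appears as a T/G mismatch in the finished fragment.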

[00217] After seeding of the modified fragments (i.e., fragment subjected to the method outlined in Figure 10) onto a flow cell, sequencing can be performed with true paired-end sequencing. In the sequencing results, positions with a mismatch between T/G in complementary single-stranded fragments will indicate that this T corresponds to the position of a methylated cytosine in the target nucleic acid used to prepare the library.

EQUIVALENTS

[00218] The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the embodiments. The foregoing description and Examples detail certain embodiments and describe the best mode contemplated by the inventors. It will be appreciated, however, that no matter how detailed the foregoing may appear in text, the embodiment may be practiced in many ways and should be construed in accordance with the appended claims and any equivalents thereof.

[00219] As used herein, the term about refers to a numeric value, including, for example, whole numbers, fractions, and percentages, whether or not explicitly indicated. The term about generally refers to a range of numerical values (e.g., +/-5-10% of the recited range) that one of ordinary skill in the art would consider equivalent to the recited value (e.g., having the same function or result). When terms such as at least and about precede a list of numerical values or ranges, the terms modify all of the values or ranges provided in the list. In some instances, the term about may include numerical values that are rounded to the nearest significant figure.

Claims

What is Claimed is:

1. A method of preparing a methylation-specific DNA library from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising: a. preparing double-stranded library fragments by fragmenting the double-stranded target DNA and incorporating a first adapter at one or both ends of the fragment; b. inducing a double-stranded break of a double-stranded library fragment at a methylated cytosine via enzymatic reactions; and c. ligating a methylation-specific adapter onto the library fragment at an end generated by the double-stranded break, wherein the methylation-specific adapter is different from the first adapter.

2. The method of claim 1, wherein the enzymatic reactions comprise base excision repair and endonuclease cleavage.

3. The method of claim 1 or claim 2, wherein the base excision repair reaction is performed with a mix of a DNA glycosylase and an apurinic/apyrimidinic endonuclease.

4. The method of any one of claims 2 or claim 3, wherein the endonuclease cleavage is performed by an endonuclease.

5. The method of any one of claims 1-4, wherein the method is performed in a single reaction vessel.

6. The method of any one of claims 1-5, wherein the method does not employ a bead-based purification step.

7. A method of preparing a methylation-specific DNA library from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising: a. preparing library fragments by fragmenting the double-stranded target DNA and incorporating a first adapter at one or both ends of the fragment; b. converting a methylated cytosine in the library fragments into an abasic site; c. generating a nick in the library fragment at the abasic site; d. cleaving the DNA nucleotide opposite the nick to generate a double-stranded break in the library fragment; and e. ligating a methylation-specific adapter onto the library fragment at the end generated by the double-stranded break, wherein the methylation-specific adapter is different from the first adapter.

8. The method of claim 7, wherein the methylated cytosine is converted into an abasic site by an enzymatic reaction.

9. The method of claim 8, wherein the method is performed in a single reaction vessel.

10. The method of any one of claims 1-9, wherein the method does not employ a bead-based purification step.

11. The method of any one of claims 8-10, wherein the enzymatic reaction is base excision repair to excise the methylated cytosine and generate an abasic site.

12. The method of claim 11, wherein the base excision repair reaction is performed with a mix of a DNA glycosylase and an apurinic/apyrimidinic endonuclease.

13. The method of claim 12, wherein the apurinic/apyrimidinic endonuclease is T7 endonuclease.

14. The method of any one of claims 7-13, wherein the methylated cytosine is converted into an abasic site by a chemical reaction followed by an enzymatic reaction.

15. The method of claim 14, wherein the chemical reaction converts a methylated cytosine into a uracil and the enzymatic reaction of excising the uracil is performed by a uracil-specific excision reagent (USER).

16. The method of claim 15, wherein the chemical reaction is borane reduction.

17. The method of claim 16, wherein the borane reduction is TET-assisted pyridine borane sequencing reduction.

18. The method of any one of claims 15-17, wherein the USER is a mixture of uracil DNA glycosylase and endonuclease VIII.

19. The method of any one of claims 1-18, wherein the methylated cytosine is comprised within a CpG site.

20. The method of claim 19, wherein the CpG site is comprised within a region having a GC content of 50% or greater, a length greater than 200bp, and a ratio of observed to expected CpG dinucleotides of greater than 0.6.

21. The method of claims 7-20, wherein the double-stranded break results in a single guanine overhang on each fragment.

22. The method of claim 21, wherein the methylation-specific adapter comprises a single cytosine overhang.

23. The method of any one of claims 1-22, wherein the methylation-specific adapter comprises a methylation index sequence.

24. The method of any one of claims 1-23, wherein the first adapter comprises a first-read sequencing adapter sequence and the methylation-specific adapter comprises a second-read sequencing adapter sequence.

25. The method of any one of claims 1-24, comprising incorporating a first adapter at one end of the fragment and a second adapter at the other end of the fragment.

26. The method of claim 25, wherein the first adapter comprises a first-read sequencing adapter sequence and the second adapter comprises a second-read sequencing adapter sequence.

27. The method of claim 26, wherein the same second-read sequencing adapter sequence is comprised in the second adapter and in the methylation-specific adapter.

28. The method of any one of claims 1-27, comprising incorporating the first adapter at both ends of each fragment.

29. The method of any one of claims 1-28, wherein the library preparation is by tagmentation.

30. The method of any one of claims 1-29, wherein the library fragments are sequenced after ligating the methylation-specific adapter.

31. The method of claim 30, wherein the sequencing is short-cycle sequencing.

32. The method of claim 31, wherein the short-cycle sequencing comprises less than 250, less than 100, or less than 50 cycles.

33. The method of any one of claims 23-32, wherein the methylation index sequence is used to identify the genome location of the methylated cytosine.

34. The method of claim 33, wherein the nucleotide within a given fragment adjacent to the methylation-specific adapter corresponds to a nucleotide that was adjacent to the methylated cytosine in the target nucleic acid.

35. The method of claim 34, wherein the nucleotide that was adjacent to the methylated cytosine was 5’ of the methylated cytosine.

36. The method of claim 34, wherein the nucleotide that was adjacent to the methylated cytosine was 3’ of the methylated cytosine.

37. The method of any one of claims 30-36, wherein the method does not require amplification.

38. A method of preparing a DNA library for identifying methylated cytosines from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising: a. dehybridizing a section of the double-stranded target DNA into the two separate single DNA strands; b. deaminating cytosines in the two separate single DNA strands, wherein unmethylated cytosines are converted to uracils and methylated cytosines are converted to thymines, and wherein the deaminating is more efficient on single-stranded DNA compared to double-stranded DNA; c. rehybridizing the two separate single DNA strands into double-stranded DNA; d. performing USER to remove uracils; e. performing gap filling; and f. preparing library fragments by fragmenting the target DNA.

39. A method of preparing a DNA library for identifying methylated cytosines from a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising: a. preparing library fragments by fragmenting the double-stranded target DNA and incorporating one or more adapters at both ends of the double-stranded fragments, wherein at least one adapter is a hairpin adapter that binds both strands at one end of the double-stranded fragment; b. dehybridizing the double-stranded DNA fragments into a first single DNA strand and a second single DNA strand linked by the hairpin adapter; c. protecting the first single DNA strand with a first complementary DNA strand, thereby generating a double-stranded DNA comprising the first single DNA strand; d. deaminating cytosines in the second single DNA strand; e. dehybridizing the first complementary DNA strand; f. rehybridizing the first single DNA strand and second single DNA strand into double-stranded DNA; g. performing USER to remove uracils in the second single DNA strand; h. performing a first gap filling; i. dehybridizing the double-stranded DNA fragments into the first single DNA strand and the second single DNA strand linked by the hairpin adapter; j. protecting the second single DNA strand with a first complementary DNA strand, thereby generating a double-stranded DNA comprising the second single DNA strand; k. deaminating cytosines in the first single DNA strand; l. dehybridizing the second complementary strand; m. rehybridizing the first single DNA and second single DNA strand into double-stranded DNA; n. performing USER to remove uracils in the first single DNA strand; and o. performing a second gap filling.

40. The method of claim 38 or claim 39, wherein protecting the first single DNA strand with a first complementary DNA strand comprises binding an extension primer to an adapter attached to the first single DNA strand and extending a first extended strand of DNA complementary to the first single DNA.

41. The method of claim 38 or claim 39, wherein protecting the first single DNA strand with a first complementary DNA strand comprises binding an oligonucleotide comprising the first complementary DNA strand to the first single DNA strand.

42. The method of any one of claims 38-41, wherein protecting the second single DNA strand with a second complementary DNA strand comprises binding an extension primer to an adapter attached to the second single DNA strand and extending a second extended strand of DNA complementary to the first single DNA.

43. The method of any one of claims 38-41, wherein protecting the second single DNA strand with a second complementary DNA strand comprises binding an oligonucleotide comprising the second complementary DNA strand to the second single DNA strand.

44. A method of preparing a DNA library for identifying methylated cytosines in a double-stranded target DNA comprising both methylated and unmethylated cytosines comprising: a. preparing library fragments by fragmenting the double-stranded target DNA and incorporating one or more adapters at both ends of the double-stranded fragments, wherein at least one adapter is a hairpin adapter that binds both strands at one end of the double-stranded fragment; b. dehybridizing the double-stranded DNA fragments into a first single DNA strand and a second single DNA strand linked by the hairpin adapter; c. binding an extension primer to an adapter attached to the first single DNA strand and extending a first extended strand of DNA complementary to the first single DNA strand and thereby generating double-stranded DNA; d. deaminating cytosines in the second single DNA strand, wherein unmethylated cytosines are converted to uracils and methylated cytosines are converted to thymines, and wherein the deaminating is more efficient on single-stranded DNA compared to double-stranded DNA; e. dehybridizing the first extended strand; f. rehybridizing the first single DNA strand and second single DNA strand into double-stranded DNA; g. performing a base excision repair reaction to remove uracils generated from deaminating cytosines in the second single DNA strand; h. performing a first gap filling; i. dehybridizing the double-stranded DNA fragments into the first single DNA strand and the second single DNA strand linked by the hairpin adapter; j. binding an extension primer to an adapter attached to the second single DNA strand and extending a second extended strand of DNA complementary to the second single DNA strand and thereby generating double-stranded DNA; k. deaminating cytosines in the first single DNA strand, wherein unmethylated cytosines are converted to uracils and methylated cytosines are converted to thymines, and wherein the deaminating is more efficient on single-stranded DNA compared to double-stranded DNA; l. dehybridizing the second extended strand; m. rehybridizing the first single DNA and second single DNA strand into a double-stranded DNA fragment; n. performing a base excision repair reaction to remove uracils and deaminating cytosines in the first single DNA strand; and o. performing a second gap filling.

45. The method of any one of claims 38-44, wherein the method is performed in a single reaction vessel.

46. The method of any one of claims 38-45, wherein the method does not employ a bead-based purification step.

47. The method of any one of claims 38-46, wherein the dehybridizing is performed with a helicase or with a recombinase plus single-stranded DNA binding protein.

48. The method of any one of claims 38-47, wherein the deaminating is performed with a deaminase.

49. The method of claim 48, wherein the deaminase is APOBEC3A.

50. The method of claim 48 or 49, wherein the deaminating converts unmethylated cytosines into uracils and methylated cytosines into thymines.

51. The method of any one of claims 38-50, wherein the gap filling is performed with a polymerase.

52. The method of claim 51, wherein the USER is a mixture of a uracil DNA glycosylase and an apurinic/apyrimidinic endonuclease.

53. The method of claim 52, wherein the apurinic/apyrimidinic endonuclease is endonuclease VIII.

54. The method of any one of claims 38-54, where performing USER and performing gap filling leads to cytosines being incorporated into positions that had been uracils.

55. The method of claim 54, wherein unmethylated cytosines after gap filling correspond to the unmethylated cytosines in the double-stranded target DNA.

56. The method of claim 54 or 55, wherein thymines mismatched with guanines in complementary library fragments after gap filling correspond to positions of methylated cytosines in the double-stranded target DNA.

57. The method of any one of claims 39-56, wherein the library fragments are immobilized on a solid support.

58. The method of claim 57, wherein the library fragments are immobilized on the solid support via a sequence comprised in at least one adapter.

59. The method of any one of claims 39-58, wherein the hairpin adapter comprises a modification to block extension.

60. The method of claim 59, wherein the modification to block extension is a non-nucleic acid moiety.

61. The method of any one of claims 39-60, wherein the hairpin adapter comprises a cleavable linker.

62. The method of any one of claims 39-61, wherein the hairpin adapter comprises one or more cytosines.

63. The method of claim 62, wherein the hairpin adapter comprises 2 or 3 cytosines.

64. The method of any one of claims 39-63, wherein a hairpin adapter is attached to both ends of the fragment.

65. The method of any one of claims 39-64, wherein one or more hairpin adapter is cleaved after the second gap filling.

66. The method of any one of claims 39-65, further comprising sequencing fragments after the second gap filling.

67. The method of claim 66, wherein sequencing diversity and alignment efficiency are retained.

68. The method of claim 66 or claim 67, further comprising analyzing sequencing data for mismatched thymines and guanines in complementary library fragments.

69. The method of any one of claims 66-68, wherein fragments are seeded onto a flow cell before sequencing.

70. The method of claim 69, wherein double-stranded fragments are seeded.

71. The method of claim 69 or 70, wherein fragments are not amplified before seeding.

72. The method of claim 69-71, wherein fragments are extended, amplified, and linearized after seeding and before sequencing.