CN117062910A - Improved library preparation method - Google Patents



Publication number
CN117062910A
Authority
CN
China
Prior art keywords
transposon
modified
sequence
transposon end
end sequence
Legal status
Pending
Application number
CN202280024450.7A
Other languages
Chinese (zh)
Inventor
艾莉森·容汉斯
安吉丽卡·玛丽·巴尔·沙勒梅比尔
凯拉·布斯比
史蒂芬·M·格罗斯
Current Assignee
Illumina Inc
Original Assignee
Illumina Inc
Application filed by Illumina Inc
Priority claimed from PCT/US2022/022167 (WO2022212269A1)
Publication of CN117062910A


Abstract

Disclosed herein is a modified transposon end sequence comprising a mosaic end sequence, wherein the mosaic end sequence comprises one or more mutations compared to a wild type mosaic end sequence, and wherein the mutations comprise substitutions with uracil, inosine, ribose, 8-oxoguanine, thymine glycol, a modified purine, or a modified pyrimidine. Also disclosed are transposome complexes comprising these modified transposon end sequences and methods of using them to prepare libraries.

Description

Improved library preparation method
Cross Reference to Related Applications
The present application claims the benefit of priority from U.S. provisional application No. 63/167,150 filed on month 29 of 2021 and U.S. provisional application No. 63/224,201 filed on month 21 of 2021, the contents of each of which are incorporated herein by reference in their entirety for any purpose.
Sequence listing
The present application is filed with a sequence listing in electronic format. The sequence listing is provided in a file named "2022-03-25_01243-0027-00pct_sequence_listing_st25.txt", created in March 2022 and 5,634 bytes in size. The information in the electronic sequence listing is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to modified transposon end sequences comprising a mosaic end sequence, wherein the mosaic end sequence comprises one or more mutations compared to a wild type mosaic end sequence, wherein the mutations comprise substitutions using uracil, inosine, ribose, 8-oxoguanine, thymine diol, a modified purine, or a modified pyrimidine. The disclosure also relates to transposome complexes comprising these modified transposon end sequences and methods of using these modified transposon end sequences to prepare libraries.
Background
NGS requires fragmentation of DNA samples, but each current approach has limitations: (a) mechanical methods require expensive dedicated equipment, (b) enzymatic strategies perform variably depending on sample concentration and reaction time, and (c) fragment tagging (tagmentation) methods constrain the structure of the library adaptors.
The first step in preparing NGS libraries is DNA fragmentation, which generates DNA fragments whose size distribution is centered on an optimal length, typically a few hundred base pairs. DNA fragmentation methods can be classified as mechanical or enzymatic. Mechanical methods include sonication, acoustic shearing, and nebulization (see Maria S. Poptsova et al., Scientific Reports (2014)). All of these mechanical methods require specialized dedicated instruments and may introduce DNA damage. Enzymatic methods, by contrast, need no specialized equipment, which lowers the user's upfront costs. For this reason, users may prefer library preparation products that rely on enzymatic fragmentation.
In addition to transposases, such as those in some Illumina library preparation products, other classes of enzymes that can fragment DNA include restriction enzymes and nicking enzymes. Restriction enzymes recognize and cleave specific sites, which biases fragmentation, so they are not typically used for NGS applications. Nicking enzymes, by contrast, introduce random single-strand breaks in the DNA substrate. One product for nicking-enzyme-based fragmentation is NEBNext dsDNA Fragmentase, in which one enzyme introduces random nicks in the substrate DNA and a separate enzyme cleaves the complementary strand opposite the nick, fragmenting the DNA. An exemplary protocol for this method is NEBNext dsDNA Fragmentase (see NEBNext dsDNA Fragmentase for DNA Sample Prep for the Illumina Platform, NEB, 2019).
Because NEBNext dsDNA Fragmentase fragments DNA without adding an adaptor sequence, this workflow is compatible with various existing ligation-based library preparation workflows, including PCR-free methods. However, these enzymes turn over many times, so fragmentation depends on both reaction time and DNA concentration, and users often need to optimize the reaction for their specific sample type to obtain an appropriate fragment size distribution (see Joseph P. Dunham and Maren L. Friesen, Cold Spring Harbor Protocols 9:820-34 (2013)). Transposase-mediated fragmentation, in contrast, is limited to a single turnover because it depends on preloaded transposon substrates, but it requires the mosaic end sequence to be introduced into the DNA fragments.
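The contrast between multi-turnover and single-turnover fragmentation can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not a kinetic model from this disclosure: it assumes that cuts land uniformly at random on a linear molecule (so n cuts give n + 1 fragments), that a fragmenting enzyme keeps cutting at a constant, made-up rate, and that each preloaded transposome makes at most one insertion. All function names and numbers are illustrative.

```python
def mean_fragment_length(dna_length_bp: int, n_cuts: int) -> float:
    """Average fragment size when n_cuts breaks split a linear molecule into n_cuts + 1 pieces."""
    return dna_length_bp / (n_cuts + 1)


def cuts_fragmenting_enzyme(enzyme_units: float, cuts_per_unit_per_min: float, minutes: float) -> int:
    """Multi-turnover (hypothetical rate): each enzyme keeps cutting, so cuts scale with dose and time."""
    return int(enzyme_units * cuts_per_unit_per_min * minutes)


def cuts_transposome(n_loaded_transposomes: int, minutes: float) -> int:
    """Single turnover: each preloaded transposome inserts at most once.

    `minutes` is intentionally unused: once the loaded complexes are spent,
    a longer incubation adds no further cuts.
    """
    return n_loaded_transposomes


if __name__ == "__main__":
    dna_bp = 50_000  # hypothetical input molecule
    for minutes in (5, 10, 20):
        frag = mean_fragment_length(dna_bp, cuts_fragmenting_enzyme(1.0, 20.0, minutes))
        tn5 = mean_fragment_length(dna_bp, cuts_transposome(100, minutes))
        print(f"{minutes:>2} min: fragmenting enzyme ~{frag:.0f} bp, transposome ~{tn5:.0f} bp")
```

In this toy model, doubling the incubation time roughly halves the mean fragment size for the fragmenting enzyme, while the transposome result depends only on how many complexes were loaded, mirroring the time and concentration dependence described above.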
In summary, many users prefer enzymatic fragmentation methods because they do not require specialized equipment and are better suited to high-throughput applications. However, current fragmenting-enzyme methods lack the advantages of bead-linked transposomes (BLT), such as the DNA quantification and library normalization that BLT provides, which distinguish BLT-based methods from methods that use a fragmenting enzyme.
A key requirement for Tn5 transposition is the "mosaic end" (ME), which is specifically recognized by Tn5 transposase and is required for its transposition activity. Tn5 transposase naturally recognizes "outside end" (OE) and "inside end" (IE) sequences, which have been shown to be highly intolerant of mutations: most mutations reduce activity. Later work showed that chimeric sequences derived from the IE and OE, termed mosaic ends (ME), combined with mutant Tn5 enzymes, increased transposition activity approximately 100-fold relative to the natural system. This hyperactive system forms the basis of the Illumina technology formerly known as Nextera, including Illumina DNA Flex PCR-Free (Research Use Only, RUO). The crystal structure of Tn5 transposase in complex with a DNA substrate shows that 13 of the 19 base pairs make nucleobase-specific contacts, while other bases have been shown to play a role in catalysis.
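The 19-bp mosaic end and the position numbering used in the embodiments can be made concrete with a short sketch. It assumes that the wild type ME (SEQ ID NO: 1, whose sequence is not reproduced in this section) is the widely used hyperactive Tn5 mosaic end 5'-AGATGTGTATAAGAGACAG-3'. That assumption is consistent with the embodiments, since positions 16 to 19 of this sequence are A, C, A and G. The helper names are illustrative.

```python
# Assumption: SEQ ID NO: 1 is the common hyperactive Tn5 mosaic end (19 bp).
MOSAIC_END = "AGATGTGTATAAGAGACAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def base_at(seq: str, position: int) -> str:
    """1-based lookup, matching the A16/C17/A18/G19 numbering of the embodiments."""
    return seq[position - 1]


if __name__ == "__main__":
    assert len(MOSAIC_END) == 19
    print([f"{base_at(MOSAIC_END, p)}{p}" for p in range(16, 20)])  # ['A16', 'C17', 'A18', 'G19']
    # A strand complementary to the ME (e.g. a second transposon end) is its reverse complement.
    print(reverse_complement(MOSAIC_END))  # CTGTCTCTTATACACATCT
```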
Tn5 transposase and bead-linked transposomes (BLT) are powerful tools for simultaneous enzymatic DNA fragmentation and adaptor tagging (tagmentation) in NGS library preparation. Tagmentation removes the need for separate mechanical or enzymatic fragmentation, enzymatic end repair, and adaptor ligation of the sample DNA, which simplifies library preparation. A limitation of these systems, however, is that a 19-nucleotide single-stranded mosaic end sequence must be incorporated adjacent to the 5' end of each library insert. Standard library preparation readily accommodates this, but it makes it difficult to build libraries with additional features such as fork adaptors, barcodes, and unique molecular identifiers (UMIs) while maintaining compatibility with standard sequencing methods.
The fragmenting-enzyme BLT (fBLT) technology described herein overcomes these technical challenges by exploiting the unique advantages of BLT while removing the constraint of previous fragment tagging methods that a defined 19-bp sequence lie adjacent to the library insert. By decoupling enzymatic fragmentation from adaptor tagging, features such as fork adaptors, barcodes, and UMIs can be added while maintaining compatibility with standard sequencing methods. These advantages make fBLT suitable for a variety of applications, such as UMI library preparation and PCR-free library preparation.
The modified transposon end sequences disclosed herein can remove the requirement that the 19-bp mosaic end sequence lie adjacent to the library insert and enable hybrid Tn5/ligation library preparation methods, allowing BLT to be used in library preparation workflows originally developed around ligation chemistry. The present disclosure shows that Tn5 tolerates a number of mutations and nucleobase modifications within the mosaic end substrate.
Disclosure of Invention
According to the present specification, a library preparation method may include transposition with a bead-linked transposome (BLT), cleavage of the modified mosaic end sequence contained in the transposon end, and adaptor ligation.
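The three steps above can be sketched as simple string operations on the transferred strand. This is a hypothetical illustration, not the disclosed protocol: it assumes the ME is 5'-AGATGTGTATAAGAGACAG-3' (taken here as SEQ ID NO: 1), ignores the complementary strand and the short gap left by transposition, and models cleavage at a modified base at ME position k as removing that base and everything 5' of it, as in base excision at a uracil. The exact cut site relative to the modified residue differs between cleavage enzymes. The adaptor sequence and function names are placeholders.

```python
from dataclasses import dataclass

# Assumptions: SEQ ID NO: 1 is this 19-bp Tn5 mosaic end; the adaptor is a placeholder.
MOSAIC_END = "AGATGTGTATAAGAGACAG"
PLACEHOLDER_ADAPTOR = "acgtacgt"  # hypothetical, lower case so it stands out


@dataclass(frozen=True)
class TaggedFragment:
    transferred_strand: str  # 5'->3': mosaic end joined to the genomic insert
    modified_position: int   # 1-based ME position carrying the cleavable base


def tagment(insert: str, modified_position: int) -> TaggedFragment:
    """Transposition joins the 3' end of the ME to the 5' end of the insert."""
    if not 1 <= modified_position <= len(MOSAIC_END):
        raise ValueError("modified position must lie within the mosaic end")
    return TaggedFragment(MOSAIC_END + insert, modified_position)


def cleave_at_modified_base(fragment: TaggedFragment) -> str:
    """Drop ME bases 1..k (including the modified base); keep ME bases k+1..19 plus the insert."""
    return fragment.transferred_strand[fragment.modified_position:]


def ligate(adaptor: str, strand: str) -> str:
    """Join an adaptor to the 5' end of the trimmed strand."""
    return adaptor + strand


if __name__ == "__main__":
    for k in (16, 19):
        trimmed = cleave_at_modified_base(tagment("GATTACA", k))
        print(k, trimmed, ligate(PLACEHOLDER_ADAPTOR, trimmed))
```

With the modified base at positions 16 to 19, between three and zero ME bases remain on the fragment under this model, in line with the 0-3 bases recited in embodiment 74.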
Embodiment 1. A modified transposon end sequence comprising a mosaic end sequence, wherein the mosaic end sequence comprises one or more mutations compared to a wild type mosaic end sequence, wherein the mutations comprise substitutions with:
a. uracil;
b. inosine;
c. ribose;
d. 8-oxoguanine;
e. thymine glycol;
f. a modified purine; or
g. a modified pyrimidine.
Embodiment 2. The modified transposon end sequence according to embodiment 1, wherein the wild type mosaic end sequence comprises SEQ ID No. 1, and further wherein the one or more mutations comprise substitutions at A16, C17, A18 and/or G19.
Embodiment 3. The modified transposon end sequence of embodiment 1 or 2, wherein the mosaic end sequence comprises no more than 8 mutations compared to the wild type sequence.
Embodiment 4. The modified transposon end sequence of embodiment 2, wherein the mosaic end sequence comprises one or more mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18 and/or G19.
Embodiment 5. The modified transposon end sequence of embodiment 2, wherein the mosaic end sequence comprises one to four substitution mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18 and/or G19.
Embodiment 6. The modified transposon end of embodiment 2, wherein the mosaic end sequence has one substitution mutation compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18 and/or G19.
Embodiment 7. The modified transposon end of embodiment 2, wherein the mosaic end sequence has two substitution mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18 and/or G19.
Embodiment 8. The modified transposon end of embodiment 2, wherein the mosaic end sequence has three substitution mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18 and/or G19.
Embodiment 9. The modified transposon end of embodiment 2, wherein the mosaic end sequence has four substitution mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18 and/or G19.
Embodiment 10. The modified transposon end sequence of any one of embodiments 2 to 9, wherein:
the substitution at A16 is A16T, A16C, A16G, A16U, A16 inosine, A16 ribose, A16-8-oxoguanine, A16 thymine glycol, A16 modified purine, or A16 modified pyrimidine;
the substitution at C17 is C17T, C17A, C17G, C17U, C17 inosine, C17 ribose, C17-8-oxoguanine, C17 thymine glycol, C17 modified purine, or C17 modified pyrimidine;
the substitution at A18 is A18G, A18T, A18C, A18U, A18 inosine, A18 ribose, A18-8-oxoguanine, A18 thymine glycol, A18 modified purine, or A18 modified pyrimidine; and/or
the substitution at G19 is G19T, G19C, G19A, G19U, G19 inosine, G19 ribose, G19-8-oxoguanine, G19 thymine glycol, G19 modified purine, or G19 modified pyrimidine.
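The substitution names of embodiment 10 can be checked mechanically against the wild type sequence. The sketch below is a hypothetical helper that parses names such as "A16U", "C17 inosine" or "A18-8-oxoguanine", verifies that the named wild type base matches an assumed SEQ ID NO: 1 (5'-AGATGTGTATAAGAGACAG-3'), and returns the modified sequence as a list of residue names. The "r" prefix used for a ribonucleotide is an ad hoc convention for this example, not a standard notation.

```python
import re

# Assumption: SEQ ID NO: 1 is the common 19-bp Tn5 mosaic end.
MOSAIC_END = "AGATGTGTATAAGAGACAG"

# wild type base, full 1-based position, optional "-" or " " separator, replacement residue
_NOTATION = re.compile(r"([ACGT])(\d+)(?!\d)[- ]?(.+)")


def apply_substitution(wild_type: str, notation: str) -> list[str]:
    """Apply one substitution such as 'A16U' or 'G19 ribose' and return residue names."""
    match = _NOTATION.fullmatch(notation.strip())
    if match is None:
        raise ValueError(f"unrecognised substitution: {notation!r}")
    expected, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(wild_type):
        raise ValueError(f"position {position} is outside the sequence")
    if wild_type[position - 1] != expected:
        raise ValueError(
            f"{notation}: wild type base at {position} is {wild_type[position - 1]}, not {expected}"
        )
    residues = list(wild_type)
    # 'ribose' keeps the base but swaps in a ribonucleotide, written here as 'rA', 'rG', ...
    residues[position - 1] = f"r{expected}" if replacement == "ribose" else replacement
    return residues


if __name__ == "__main__":
    for name in ("A16U", "C17 inosine", "A18-8-oxoguanine", "G19 ribose"):
        print(name, apply_substitution(MOSAIC_END, name)[15:])
```

Validating the wild type base catches names that do not fit the reference, for example "C16T" when position 16 is A.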
Embodiment 11. The modified transposon end sequence of any one of embodiments 2 to 9, wherein the mutation comprises a substitution with:
a. uracil;
b. inosine;
c. ribose;
d. 8-oxoguanine;
e. thymine glycol;
f. a modified purine; and/or
g. a modified pyrimidine.
Embodiment 12. The modified transposon end sequence of any one of embodiments 2 to 11, wherein the modified transposon end sequence comprises one mutation at A16, C17, A18, or G19.
Embodiment 13. The modified transposon end sequence of any one of embodiments 2 to 11, wherein the modified transposon end sequence comprises two mutations selected from the group consisting of mutations at A16, C17, A18, and G19.
Embodiment 14. The modified transposon end sequence of any one of embodiments 2 to 11, wherein the modified transposon end sequence comprises three mutations selected from the group consisting of mutations at A16, C17, A18, and G19.
Embodiment 15. The modified transposon end sequence of any one of embodiments 2 to 11, wherein the modified transposon end sequence comprises four mutations at A16, C17, A18 and G19.
Embodiment 16. The modified transposon end of any one of embodiments 2 to 11, wherein the modified transposon end sequence has one to four substitution mutations at A16, C17, A18 and/or G19 compared to SEQ ID No. 1.
Embodiment 17. The modified transposon end of any one of embodiments 1 to 11, wherein the modified transposon end sequence has one substitution mutation compared to the wild type sequence.
Embodiment 18. The modified transposon end of any one of embodiments 1 to 11, wherein the modified transposon end sequence has two substitution mutations compared to the wild type sequence.
Embodiment 19. The modified transposon end of any one of embodiments 1 to 11, wherein the modified transposon end sequence has three substitution mutations compared to the wild type sequence.
Embodiment 20. The modified transposon end of any one of embodiments 1 to 11, wherein the modified transposon end sequence has four substitution mutations compared to the wild type sequence.
Embodiment 21. The modified transposon end of any one of embodiments 1 to 20, wherein the modified purine is 3-methyladenine or 7-methylguanine.
Embodiment 22. The modified transposon end of any one of embodiments 1-20 wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
Embodiment 23. A transposome complex, the transposome complex comprising:
a. a transposase;
b. a first transposon comprising a modified transposon end sequence comprising uracil, inosine, ribose, 8-oxoguanine, thymine glycol, a modified purine, and/or a modified pyrimidine; and
c. a second transposon comprising a second transposon end sequence complementary to at least a portion of the first transposon end sequence.
Embodiment 24. The transposome complex of embodiment 23 wherein the first transposon comprises ribose, uracil, inosine, 8-oxoguanine, thymine glycol, a modified purine, and/or a modified pyrimidine, and the transposome complex is in solution.
Embodiment 25. The transposome complex of embodiment 23, wherein the first transposon comprises uracil, inosine, 8-oxoguanine, thymine diol, a modified purine and/or a modified pyrimidine, and the transposome complex is immobilized on a solid support.
Embodiment 26. The transposome complex of any one of embodiments 23-25, wherein the first transposon comprises the modified transposon end sequence of any one of embodiments 1-22.
Embodiment 27. The transposome complex of any one of embodiments 23-26 wherein the transposase is Tn5.
Embodiment 28. The transposome complex of any one of embodiments 23-27, wherein the first transposon is a transferred strand.
Embodiment 29. The transposome complex of any one of embodiments 23-28, wherein the second transposon is a non-transferred strand.
Embodiment 30. The transposome complex of any one of embodiments 23-29, wherein uracil in the first transposon base pairs with A in the second transposon.
Embodiment 31. The transposome complex of any one of embodiments 23-30 wherein inosine in the first transposon base pairs with C in the second transposon.
Embodiment 32. The transposome complex of any one of embodiments 23-31 wherein ribose in the first transposon is base paired with A, C, T or G in the second transposon.
Embodiment 33. The transposome complex of any one of embodiments 23-32, wherein thymine diol in the first transposon base pairs with A in the second transposon.
Embodiment 34. The transposome complex of any one of embodiments 23-33 wherein the modified purine in the first transposon is 3-methyladenine base paired with T in the second transposon.
Embodiment 35. The transposome complex of any one of embodiments 23-34 wherein the modified purine in the first transposon is 7-methylguanine base paired with C in the second transposon.
Embodiment 36. The transposome complex of any one of embodiments 23-34 wherein the modified pyrimidine in the first transposon is a 5-methylcytosine, a 5-formylcytosine, or a 5-carboxycytosine base-paired with a G base in the second transposon.
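The pairings listed in embodiments 30 to 36 can be used to write out a second transposon end that is fully complementary to a modified first transposon end (one of the two options in embodiment 48). This is a hypothetical sketch: residues are plain strings, a ribonucleotide is written with an ad hoc "r" prefix (e.g. "rA") and pairs according to its base as in embodiment 32, the ME is assumed to be 5'-AGATGTGTATAAGAGACAG-3', and 8-oxoguanine is left out because these embodiments do not name its partner.

```python
# Partner base opposite each residue of the first (transferred) transposon,
# following embodiments 30-36; canonical bases use Watson-Crick pairing.
PARTNER = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "U": "A",
    "inosine": "C",
    "thymine glycol": "A",
    "3-methyladenine": "T",
    "7-methylguanine": "C",
    "5-methylcytosine": "G",
    "5-formylcytosine": "G",
    "5-carboxycytosine": "G",
}

MOSAIC_END = "AGATGTGTATAAGAGACAG"  # assumed SEQ ID NO: 1


def partner_of(residue: str) -> str:
    """Return the base placed opposite one residue of the first transposon."""
    # A ribonucleotide ("rA", "rC", ...) pairs according to its base (embodiment 32).
    if len(residue) == 2 and residue[0] == "r":
        residue = residue[1:]
    try:
        return PARTNER[residue]
    except KeyError:
        raise ValueError(f"no pairing rule for {residue!r}") from None


def fully_complementary_strand(first_transposon: list[str]) -> str:
    """Second transposon end written 5'->3', pairing antiparallel with the first."""
    return "".join(partner_of(r) for r in reversed(first_transposon))


if __name__ == "__main__":
    modified = list(MOSAIC_END)
    modified[15] = "U"  # A16U
    print(fully_complementary_strand(list(MOSAIC_END)))
    print(fully_complementary_strand(modified))
```

For an A16U first transposon this places A rather than T opposite position 16; embodiment 49 also allows the alternative of a second transposon end complementary to SEQ ID NO: 1.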
Embodiment 37. The transposome complex of any one of embodiments 23-36, wherein the first transposon or the second transposon comprises an affinity element.
Embodiment 38. The transposome complex of embodiment 37 wherein the first transposon comprises an affinity element.
Embodiment 39. The transposome complex of embodiment 38, wherein the affinity element is attached to the 5' end of the first transposon.
Embodiment 40. The transposome complex of embodiment 38 or embodiment 39, wherein the first transposon included in the targeted transposome complex comprises a linker.
Embodiment 41. The transposome complex of embodiment 40 wherein the linker has a first end attached to the 5' end of the first transposon and a second end attached to an affinity element.
Embodiment 42. The transposome complex of embodiment 37 wherein the second transposon comprises an affinity element.
Embodiment 43. The transposome complex of embodiment 42 wherein the affinity element is attached to the 3' end of the second transposon.
Embodiment 44. The transposome complex of embodiment 43 wherein the second transposon comprises SEQ ID No. 13.
Embodiment 45. The transposome complex of embodiment 44 wherein the second transposon comprises a linker.
Embodiment 46. The transposome complex of embodiment 45 wherein the linker has a first end attached to the 3' end of the second transposon and a second end attached to an affinity element.
Embodiment 47. The transposome complex of any one of embodiments 37-46, wherein the affinity element comprises biotin, avidin, streptavidin, an antibody, or an oligonucleotide.
Embodiment 48. The transposome complex of any one of embodiments 23-47, wherein the second transposon comprises:
a. a second transposon end sequence complementary to SEQ ID No. 1; or
b. a second transposon end that is fully complementary to the first transposon end.
Embodiment 49. The transposome complex of embodiment 48, wherein the first transposon comprises a modified transposon end sequence comprising an A16U, A16-8-oxoguanine, or A16 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
Embodiment 50. The transposome complex of embodiment 48, wherein the first transposon comprises a modified transposon end sequence comprising a C17-8-oxoguanine or C17 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
Embodiment 51. The transposome complex of embodiment 48, wherein the first transposon comprises a modified transposon end sequence comprising an A18-8-oxoguanine or A18 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
Embodiment 52. The transposome complex of embodiment 48, wherein the first transposon comprises a modified transposon end sequence comprising a G19-8-oxoguanine or G19 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
Embodiment 53. The transposome complex of any one of embodiments 23 to 52, wherein the transposome complex is in solution.
Embodiment 54. A solid support having immobilized thereon a transposome complex according to any one of embodiments 23-52.
Embodiment 55. A method of fragmenting a double-stranded nucleic acid, the method comprising combining a sample comprising double-stranded nucleic acid with the transposome complex of any one of embodiments 23-53 or the solid support of embodiment 54, and preparing a fragment.
Embodiment 56. A method of preparing a double stranded nucleic acid fragment lacking all or a portion of the first transposon end, the method comprising:
a. combining a sample comprising nucleic acid with the transposome complex of any one of embodiments 23-53 or with the solid support of embodiment 54, and preparing a fragment; and
b. combining the sample with (1) an endonuclease or (2) a DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites, and cleaving the first transposon end at the uracil, inosine, ribose, 8-oxoguanine, thymine glycol, modified purine, and/or modified pyrimidine within the mosaic end sequence to remove all or a portion of the first transposon end from the fragment.
Embodiment 57. The method of embodiment 56, wherein the modified purine is 3-methyladenine or 7-methylguanine.
Embodiment 58. The method of embodiment 56, wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
Embodiment 59. The method of embodiment 57 or embodiment 58, further comprising sequencing the fragment after removing all or a portion of the first transposon end from the fragment.
Embodiment 60. The method of embodiment 59, wherein the method does not require amplification of the fragment prior to sequencing.
Embodiment 61. The method of embodiment 59, wherein the fragments are amplified prior to sequencing.
Embodiment 62. The method of any one of embodiments 59 to 61, further comprising enriching the fragment of interest after ligating the adaptors and prior to sequencing.
Embodiment 63. A method of preparing a double stranded nucleic acid fragment comprising an adaptor, the method comprising:
a. combining a sample comprising nucleic acid with the transposome complex of any one of embodiments 23-53 or with the solid support of embodiment 54, and preparing a fragment;
b. combining the sample with (1) an endonuclease or (2) a DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites, and cleaving the first transposon end at the uracil, inosine, ribose, 8-oxoguanine, thymine diol, modified purine, and/or modified pyrimidine within the mosaic end sequence to remove all or a portion of the first transposon end from the fragment; and
c. ligating adaptors to the 5' and/or 3' ends of the fragments.
Embodiment 64. The method of embodiment 63, wherein the modified purine is 3-methyladenine or 7-methylguanine.
Embodiment 65. The method of embodiment 63, wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
Embodiment 66. The method of any one of embodiments 56 to 65, wherein the nucleic acid is double stranded DNA.
Embodiment 67. The method of any one of embodiments 56-65, wherein the nucleic acid is RNA and a double-stranded cDNA or DNA:RNA duplex is generated prior to combining with the transposome complex.
Embodiment 68. The method of any one of embodiments 56-67, wherein all or a portion of the cleaved first transposon end is separated from the remainder of the sample.
Embodiment 69. The method of any one of embodiments 63 to 68, further comprising filling in the 3' end of the fragment and phosphorylating the 5' end of the fragment with a kinase prior to ligation.
Embodiment 70. The method of embodiment 69, wherein the filling in is performed with T4 DNA polymerase.
Embodiment 71. The method of embodiment 70, further comprising adding a single A overhang to the 3' end of the fragment.
Embodiment 72. The method of embodiment 71 wherein a polymerase adds the single A overhang.
Embodiment 73. The method of embodiment 72, wherein the polymerase is (i) Taq or (ii) a Klenow fragment lacking exonuclease activity.
Embodiment 74. The method of any one of embodiments 56 to 73, wherein the fragment comprises 0-3 bases of the mosaic end sequence.
Embodiment 75. The method of any one of embodiments 56 to 74, wherein preparing the fragments yields at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the number of fragments obtained with a transposome complex whose first transposon comprises a transposon end sequence comprising the wild type mosaic end sequence of SEQ ID No: 1.
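Embodiment 75 compares fragment yield against a wild type ME control. A minimal sketch of that arithmetic follows, using hypothetical fragment counts; how fragments are counted (for example by molarity or by read number) is not specified here and is left to the caller, and the function names are illustrative.

```python
from typing import Optional

THRESHOLDS_PCT = (90, 80, 70, 60, 50)  # the "at least N%" tiers recited in embodiment 75


def relative_yield_pct(modified_count: float, wild_type_count: float) -> float:
    """Fragments from the modified-ME transposome as a percentage of the wild type control."""
    if wild_type_count <= 0:
        raise ValueError("wild type count must be positive")
    return 100.0 * modified_count / wild_type_count


def highest_tier_met(pct: float) -> Optional[int]:
    """Return the highest tier satisfied, or None if the yield is below 50%."""
    for tier in THRESHOLDS_PCT:
        if pct >= tier:
            return tier
    return None


if __name__ == "__main__":
    pct = relative_yield_pct(7_200, 9_000)  # hypothetical counts
    print(f"{pct:.1f}% of wild type -> tier {highest_tier_met(pct)}")
```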
Embodiment 76. The method of any one of embodiments 63-75, further comprising sequencing the fragments after ligating the adaptors.
Embodiment 77. The method of embodiment 76, wherein the method does not require amplification of the fragment prior to sequencing.
Embodiment 78. The method of embodiment 77, wherein the fragments are amplified prior to sequencing.
Embodiment 79. The method of any one of embodiments 76 to 78, further comprising enriching the fragment of interest after ligating the adaptors and prior to sequencing.
Embodiment 80. The method of any one of embodiments 56 to 79, wherein the modified transposon end sequence comprises uracil and the combination of a DNA glycosylase and an endonuclease/lyase recognizing abasic sites is a Uracil Specific Excision Reagent (USER).
Embodiment 81. The method of embodiment 80 wherein the USER is a mixture of uracil DNA glycosylase and endonuclease VIII or endonuclease III.
Embodiment 82. The method of any one of embodiments 56 to 79, wherein the modified transposon end sequence comprises inosine and the endonuclease is endonuclease V.
Embodiment 83. The method of any one of embodiments 56-79, wherein the modified transposon end sequence comprises ribose and the endonuclease is RNase HII.
Embodiment 84. The method of any one of embodiments 56-79, wherein the modified transposon end sequence comprises 8-oxoguanine and the endonuclease is formamidopyrimidine-DNA glycosylase (FPG) or oxoguanine glycosylase (OGG).
Embodiment 85. The method of any one of embodiments 56 to 79, wherein the modified transposon end sequence comprises thymine diol and the DNA glycosylase is endonuclease III (Nth) or endonuclease VIII.
Embodiment 86. The method of any one of embodiments 56 to 79, wherein the modified transposon end sequence comprises a modified purine, the DNA glycosylase is human 3-alkyladenine DNA glycosylase, and the endonuclease is endonuclease III or endonuclease VIII.
Embodiment 87. The method of embodiment 86, wherein the modified purine is 3-methyladenine or 7-methylguanine.
Embodiment 88. The method of any one of embodiments 56-79, wherein the modified transposon end sequence comprises a modified pyrimidine and (1) the DNA glycosylase is thymine-DNA glycosylase (TDG) or the mammalian DNA glycosylase methyl-CpG-binding domain protein 4 (MBD4), and the endonuclease/lyase recognizing abasic sites is endonuclease III or endonuclease VIII; or (2) the endonuclease is the DNA glycosylase/lyase ROS1.
Embodiment 89. The method of embodiment 88, wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
Embodiment 90. The method of any one of embodiments 56-89, wherein the first transposon comprises a modified transposon end sequence comprising more than one mutation selected from uracil, inosine, ribose, 8-oxoguanine, thymine diol, a modified purine, or a modified pyrimidine, and the (1) endonuclease or (2) combination of a DNA glycosylase with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites is an enzyme mixture.
Embodiment 91. The method of embodiment 90, wherein the modified purine is 3-methyladenine or 7-methylguanine.
Embodiment 92. The method of embodiment 90 wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
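Embodiments 80 to 92 pair each cleavable residue with one or more cleavage reagents, and embodiment 90 combines them into an enzyme mixture when a transposon end carries more than one kind of modification. The sketch below encodes those pairings as a lookup table and greedily assembles a mixture, preferring options that reuse enzymes already chosen. It is a hypothetical planning aid: the abbreviations (UDG, Endo III, Endo VIII, Endo V, AAG for human 3-alkyladenine DNA glycosylase, and so on) are shorthand introduced here, the heat and alkaline alternatives to an abasic-site lyase are not modelled, buffer and temperature compatibility is ignored, and a greedy choice is not guaranteed to give the smallest possible mixture.

```python
# Cleavage options per residue, as paired in embodiments 80-89.
# Each option is a set of enzymes that together cleave at that residue.
CLEAVAGE_OPTIONS: dict[str, list[frozenset[str]]] = {
    "uracil": [frozenset({"UDG", "Endo VIII"}), frozenset({"UDG", "Endo III"})],  # USER
    "inosine": [frozenset({"Endo V"})],
    "ribose": [frozenset({"RNase HII"})],
    "8-oxoguanine": [frozenset({"FPG"}), frozenset({"OGG"})],
    "thymine glycol": [frozenset({"Endo III"}), frozenset({"Endo VIII"})],
    "3-methyladenine": [frozenset({"AAG", "Endo III"}), frozenset({"AAG", "Endo VIII"})],
    "7-methylguanine": [frozenset({"AAG", "Endo III"}), frozenset({"AAG", "Endo VIII"})],
}
for _modified_c in ("5-methylcytosine", "5-formylcytosine", "5-carboxycytosine"):
    CLEAVAGE_OPTIONS[_modified_c] = [
        frozenset({"TDG", "Endo III"}), frozenset({"TDG", "Endo VIII"}),
        frozenset({"MBD4", "Endo III"}), frozenset({"MBD4", "Endo VIII"}),
        frozenset({"ROS1"}),
    ]


def enzyme_mix(modifications: list[str]) -> set[str]:
    """Greedily choose one cleavage option per modification, reusing enzymes where possible."""
    mix: set[str] = set()
    for mod in modifications:
        if mod not in CLEAVAGE_OPTIONS:
            raise KeyError(f"no cleavage rule listed for {mod!r}")
        # pick the option that adds the fewest new enzymes; ties keep the listed order
        best = min(CLEAVAGE_OPTIONS[mod], key=lambda option: len(option - mix))
        mix |= best
    return mix


if __name__ == "__main__":
    print(sorted(enzyme_mix(["uracil", "thymine glycol"])))
    print(sorted(enzyme_mix(["uracil", "inosine", "8-oxoguanine"])))
```

In the first example the thymine glycol option that reuses Endo VIII from the USER-style pair is chosen, so no extra enzyme is added.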
Embodiment 93. The method of any one of embodiments 63-92, wherein cleaving the first transposon end generates a cohesive end for ligating the adaptors.
Embodiment 94. The method of embodiment 93 wherein the sticky ends are longer than one base.
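Whether an adaptor can be ligated to a fragment end depends on the two overhangs being complementary, whether that is the single A/T pairing used with A-tailing (embodiments 71 to 73) or a cohesive end longer than one base (embodiment 94). The sketch below is a simplified, hypothetical check: each end is described by the polarity of its single-stranded overhang and the overhang bases written 5'->3', and two ends are treated as compatible when the polarities match and one overhang is the reverse complement of the other. It ignores mismatches that a ligase may tolerate and any chemical end modifications; the class and function names are illustrative.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class DuplexEnd:
    polarity: str       # "5'" or "3'" overhang, or "blunt"
    overhang: str = ""  # single-stranded bases, written 5'->3'

    def __post_init__(self) -> None:
        if self.polarity not in ("5'", "3'", "blunt"):
            raise ValueError(f"unknown polarity {self.polarity!r}")
        if (self.polarity == "blunt") != (self.overhang == ""):
            raise ValueError("blunt ends have no overhang; overhang ends need bases")


def can_anneal(a: DuplexEnd, b: DuplexEnd) -> bool:
    """True if two ends could pair for ligation under this simplified model."""
    if a.polarity != b.polarity:
        return False
    if a.polarity == "blunt":
        return True
    return a.overhang == reverse_complement(b.overhang)


if __name__ == "__main__":
    a_tailed_fragment = DuplexEnd("3'", "A")
    t_overhang_adaptor = DuplexEnd("3'", "T")
    print(can_anneal(a_tailed_fragment, t_overhang_adaptor))
    print(can_anneal(DuplexEnd("5'", "CAG"), DuplexEnd("5'", "CTG")))
```

Under this model a multi-base overhang must match at every position, so it discriminates more strictly between compatible and incompatible ends than a single A/T overhang does.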
Embodiment 95. The method of any one of embodiments 63-94, wherein the adaptors comprise double-stranded adaptors.
Embodiment 96. The method of any one of embodiments 63 to 95, wherein adaptors are added to the 5' and 3' ends of the fragments.
Embodiment 97. The method of embodiment 96, wherein the adaptors added to the 5' and 3' ends of the fragments are different.
Embodiment 98. The method of any one of embodiments 63-97, wherein the adaptor comprises a Unique Molecular Identifier (UMI), a primer sequence, an anchor sequence, a universal sequence, a spacer, an index sequence, a capture sequence, a barcode sequence, a cleavage sequence, a sequencing-related sequence, or a combination thereof.
Embodiment 99. The method of embodiment 98, wherein the adaptor comprises a UMI.
Embodiment 100. The method of embodiment 99, wherein the adaptors comprising a UMI are ligated to the 3' and 5' ends of the fragments.
Embodiment 101. The method of any one of embodiments 63-100, wherein the adapter is a fork adapter.
Embodiment 102. The method of any one of embodiments 63-101, wherein the ligating is performed with a DNA ligase.
Embodiment 103. The method of any one of embodiments 63-102, wherein the method is performed in a single reaction vessel.
Embodiment 104. The method of any one of embodiments 56 to 103, wherein the density of transposomes immobilized on the solid surface is selected to modulate fragment size and library yield of the immobilized fragments.
Embodiment 105. The method of any one of embodiments 56 to 104, wherein the method allows for bead-based normalization.
Embodiment 106. The method of any one of embodiments 56 to 105, wherein the sample comprises partially fragmented DNA.
Embodiment 107. The method of any one of embodiments 56 to 106, wherein the sample is formalin-fixed paraffin-embedded tissue or cell-free DNA.
Embodiment 108. The method of any one of embodiments 56-107, wherein the library comprises fragments prepared by a single fragment tagging (tagmentation) event.
Embodiment 109. A transposon pair having a first transposon and a second transposon, wherein the first transposon comprises a modified transposon end sequence according to any one of embodiments 1 to 22, and wherein the second transposon comprises:
a. a transposon end sequence comprising a mosaic end sequence complementary to the wild type mosaic end sequence; or
b. a transposon end sequence that is fully complementary to the first transposon end.
Embodiment 110. The transposon pair of embodiment 109, wherein the first transposon comprises a modified transposon end sequence comprising an A16U, A16 8-oxoguanine or A16 inosine substitution compared to SEQ ID NO: 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID NO: 1 or a second transposon end fully complementary to the first transposon end.
Embodiment 111. The transposon pair of embodiment 109, wherein the first transposon comprises a modified transposon end sequence comprising a C17 8-oxoguanine or C17 inosine substitution compared to SEQ ID NO: 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID NO: 1 or a second transposon end fully complementary to the first transposon end.
Embodiment 112. The transposon pair of embodiment 109, wherein the first transposon comprises a modified transposon end sequence comprising an A18 8-oxoguanine or A18 inosine substitution compared to SEQ ID NO: 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID NO: 1 or a second transposon end fully complementary to the first transposon end.
Embodiment 113. The transposon pair of embodiment 109, wherein the first transposon comprises a modified transposon end sequence comprising a G19 8-oxoguanine or G19 inosine substitution compared to SEQ ID NO: 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID NO: 1 or a second transposon end fully complementary to the first transposon end.
Additional objects and advantages will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice. These objects and advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one (or more) embodiments and, together with the description, serve to explain the principles described herein.
Drawings
FIGS. 1A and 1B show an overview of the fragmentase approach. (A) The fragmentase-Tn5 method of the invention uses modifications of the Tn5 mosaic end substrate to enable selective cleavage of the mosaic end and subsequent ligation of adapters. (B) A standard competitor workflow, in which the input DNA is mechanically sheared or enzymatically fragmented, followed by end repair and adapter ligation. FIGS. 1A and 1B show ligation of a Y-adapter containing all of the standard adapter sequences for Illumina sequencing (P5-i5-A14-ME and ME'-B15'-i7'-P7'). In an alternative configuration, short Y-adapters containing only A14-ME and ME'-B15' may be used, and the remaining adapter sequences may be added by PCR, for example as described in FIG. 2 of U.S. Patent Publication No. 20180201992A1, which is incorporated herein by reference in its entirety.
FIG. 2 outlines the mechanism of Tn5 transposase in standard tagmentation-based library preparation. Tn5 transposase is preloaded with a transposon DNA substrate consisting of cognate "mosaic ends" and additional adapter sequences (such as A14 and B15 for the Illumina method). During tagmentation, these transposomes act on genomic DNA, resulting in simultaneous fragmentation and tagging with adapter sequences. The A14 and B15 sequences are SEQ ID NOs: 11 and 12, respectively. The ME sequence and its complement (ME') are SEQ ID NOs: 1 and 4, respectively.
FIG. 3 outlines how bead-linked transposomes (BLTs) enable a normalization-free workflow. Conjugating the transposomes to magnetic beads normalizes the amount of DNA converted into library. In addition, selecting the transposome density provides some control over library fragment size. The library may also be size-selected by solid phase reversible immobilization (SPRI) for further control of fragment size. gDNA = genomic DNA.
FIG. 4 outlines enzymatic fragmentation using a fragmentase. In the illustrated method, enzyme 1 introduces random nicks into one strand, and enzyme 2 cuts the strand opposite each nick to produce dsDNA breaks. The resulting DNA fragments typically have a 1-4 base overhang at the 5' end. An exemplary protocol for this method uses NEBNext dsDNA Fragmentase (see NEBNext dsDNA Fragmentase for DNA Sample Prep for the Illumina Platform, NEB, 2019; NEBNext dsDNA Fragmentase product details available at www.nebj.jp/products/details/1020, accessed ... 17, 2021).
Figure 5 illustrates a potential mechanism for removing the mosaic end sequences. Possible enzymatic strategies include the use of restriction enzymes, single-stranded dnases or DNA repair enzymes. In some embodiments, DNA repair enzymes are attractive due to their specificity.
FIGS. 6A and 6B show an analysis of Tn5v3 activity with mutated mosaic end sequences. (A) Canonical substitutions at different positions are reported relative to the transfer strand sequence (SEQ ID NO: 1), with corresponding substitutions made in the non-transfer strand (SEQ ID NO: 4), except where noted; in those cases, the substitution was made only in the transfer strand, which was annealed to the wild-type non-transfer strand. At position A16, T, C and G were substituted. At position C17, T, A and G were substituted. At position A18, G, T and C were substituted. At position G19, T, C and A were substituted. Other substitutions relative to SEQ ID NO: 1 were made as noted. (B) Activity of Tn5v3 transposomes prepared with DNA modifications in the TS. Uracil base-pairs with A, and inosine base-pairs with C. The sequences shown are the transfer strand TS (SEQ ID NO: 1) and the non-transfer strand NTS (SEQ ID NO: 4). AU = arbitrary units.
FIGS. 7A to 7C show library preparation using the Tn5-fragmentase method. (A) Workflow diagram for library preparation. (B) Diagram of cleavage at a modified base, either in a one-step reaction with a modification-specific endonuclease or in a two-step reaction with a modification-specific glycosylase followed by an AP lyase/endonuclease or heat. (C) Electropherograms of libraries prepared with DNA modifications. The libraries were treated with USER, endonuclease V or RNase HII according to the manufacturer's protocol (NEB). In this experiment, a large amount of adapter dimer (peak at ~160 bp) was observed, possibly due to a non-optimal ligation adapter concentration. ATL = A-tailing. LIG = ligation.
FIGS. 8A and 8B show a comparison of uracil modification sites within the ME. (A) Electropherograms of libraries generated with alternative mosaic ends. (B) Qubit yields of libraries with alternative MEs. USER incubation times of 20 minutes and 60 minutes were tested.
FIG. 9 summarizes fragmentase-method libraries, showing the expected ME "scars" adjacent to the library insert. Because of variable UMI length, some libraries are shifted by 1 bp. The ME scar for each modification site was present as expected. The A16U transfer strand sequence and the T16A non-transfer strand sequence are SEQ ID NOs: 5 and 6, respectively. The C17U transfer strand sequence and the G17A non-transfer strand sequence are SEQ ID NOs: 7 and 8, respectively. The A18U transfer strand sequence and the T18A non-transfer strand sequence are SEQ ID NOs: 9 and 10, respectively.
FIGS. 10A-10C show representative fBLT library preparations. (A) Workflow used in this study. (B) Library yields from Enrichment BLT (eBLT) and fragmentase BLT (fBLT) library preparations. (C) Representative Bioanalyzer traces of eBLT and fBLT libraries. Additional workflows with fBLT are disclosed herein.
FIG. 11 shows an overview of fBLT, with representative modified transposon ends comprising a transfer strand with a G19I mutation (SEQ ID NO: 14) and a biotinylated non-transfer strand useful for the immobilization of transposomes (SEQ ID NO: 13). B=biotin.
FIG. 12 shows fBLT results with different modified bases in the mosaic end (ME sequence) of the first transposon. Positions 16-19 are the modified positions of SEQ ID NO: 1. Oxo G = 8-oxoguanine; AU = activity units.
FIGS. 13A to 13C show results for different types of fBLT. (A) Percent conversion for the A18 inosine (I18), C17 8-oxoguanine (O17) and G19U (U19) mutations. (B) Variant-calling performance for I18, O17 and U19. (C) Percent conversion for I18, G19I (I19), O17, A18O (O18) and G19O (O19). Overall, the results indicate high fBLT performance with inosine-substituted mosaic ends, with G19I (I19) performing best.
FIG. 14 presents data on chimeric reads. Modified transposon ends comprising uracil produce a higher percentage of chimeric reads than modified transposon ends comprising inosine or 8-oxoguanine.
FIGS. 15A and 15B compare fBLT with library preparations using other fragmentation methods (i.e., dsDNA Fragmentase, or sonication with a Covaris sonicator following standard procedures). (A) Summary of the workflows. (B) Summary of the sensitivity and specificity of variant calling, measured with the different methods, for 50 ng of input gDNA consisting of a 1% mixture of NA12877 in an NA12878 background (84 heterozygous variants at a 0.5% variant allele frequency (VAF)).
FIG. 16 shows the error rates of library preparation methods that use different fragmentation approaches; samples prepared by sonication have significantly higher error rates.
FIG. 17 shows the library conversion efficiencies of different fragmentation methods. Overall, fBLT outperforms the other methods. Sample 1 is a 1% mixture of NA12877 genomic DNA in an NA12878 background. Samples 2-6 are formalin-fixed paraffin-embedded (FFPE) tissue. dCq is a measure of DNA quality, with higher values corresponding to lower-quality samples. Thus, the higher conversion efficiency of sample 1 compared with the other samples reflects the fact that library conversion is typically lower for FFPE tissue because of its lower DNA quality.
FIGS. 18A and 18B summarize a method of recovering, with fBLT, fragments generated by a single tagmentation event from FFPE tissue samples. (A) Summary of the workflow. (B) Percentage of fragments rescued from different tissues. Sample numbers are the same as in FIG. 17. dCq is a measure of DNA quality, with higher values corresponding to lower-quality samples. The lower percentage of rescued fragments for sample 1 compared with FFPE samples 2-6 indicates the higher quality of the genomic DNA sample (i.e., few fragments arise from a single tagmentation event and most arise from two tagmentation events). A higher proportion of fragments was rescued from FFPE tissue than from genomic DNA, likely because the lower quality of FFPE DNA leads to more single tagmentation events.
FIG. 19 summarizes some of the advantages and flexibility of the tagmentation scheme using fBLT. In particular, the method allows ligation of adapters suited to the different workflows a user may wish to pursue. As shown, the adapters may comprise unique molecular identifiers (UMIs) for distinguishing unique fragments from amplicons of the same fragment. Alternatively, forked adapters may be used in a workflow that includes PCR to incorporate an index, or indexed forked adapters may be used in a PCR-free workflow.
FIG. 20 outlines a standard workflow for library preparation using fBLT with optional enrichment. Boxes and triangles indicate steps at which the user must handle the reaction sample. The total library preparation time of about 5.5 hours is similar to that of other ligation-based library preparation methods. The optional enrichment may be used, for example, to enrich a cancer-related panel when preparing libraries from FFPE tissue samples from cancer patients.
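The 0.5% VAF follows from the mixture design, assuming each of the 84 variants is heterozygous in NA12877 and absent from NA12878, and that both samples contribute equal DNA per genome: 1% of genomes carry the variant on one of two alleles, so 0.01 × 0.5 = 0.005. The sketch below encodes that arithmetic together with the usual definitions of sensitivity and specificity; the function names are hypothetical, and any counts passed to them here are illustrative, not results from FIG. 15B.

```python
def expected_vaf(minor_fraction: float, alt_alleles_in_minor: int = 1, ploidy: int = 2) -> float:
    """Expected variant allele frequency for a variant present only in the minor sample.

    Assumes equal DNA per genome in both samples and ignores sampling noise.
    """
    return minor_fraction * alt_alleles_in_minor / ploidy


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of known variants that were called."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of known non-variant positions that were correctly left uncalled."""
    return true_negatives / (true_negatives + false_positives)


print(f"expected VAF: {expected_vaf(0.01):.2%}")  # expected VAF: 0.50%
```

For example, with hypothetical counts, calling 80 of the 84 spiked-in variants would give `sensitivity(80, 4)` of about 0.952.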
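Conversion efficiency is not defined in this passage. One common formulation, used here only as an assumption, divides the number of unique (deduplicated) source molecules observed at a locus by the number of haploid genome copies in the input. With roughly 3.3 pg per haploid human genome, 50 ng of gDNA contains about 15,000 copies of each autosomal locus. The helper names below are hypothetical.

```python
HAPLOID_HUMAN_GENOME_PG = 3.3  # approximate mass of one haploid human genome, in picograms


def haploid_copies(input_ng: float) -> float:
    """Approximate number of haploid genome copies (i.e., copies of each locus) in an input mass."""
    return input_ng * 1000.0 / HAPLOID_HUMAN_GENOME_PG


def conversion_efficiency(unique_molecules_at_locus: float, input_ng: float) -> float:
    """Assumed definition: unique library molecules at a locus per input copy of that locus."""
    return unique_molecules_at_locus / haploid_copies(input_ng)


print(round(haploid_copies(50)))  # 15152
```

Under this definition, lower-quality FFPE DNA would show up as fewer unique molecules per input copy at the same input mass.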
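The rescue measurement in FIG. 18B can be illustrated by classifying each library fragment by how many of its two ends came from a tagmentation event rather than a pre-existing break. The sketch assumes, hypothetically, that a transposition-derived end can be recognized (for example, by the expected ME scar) and counts only fragments with at least one such end; the input tuples are illustrative, not data from the figure.

```python
from collections import Counter


def tagmentation_events(left_end_tagmented: bool, right_end_tagmented: bool) -> int:
    """0, 1 or 2: how many fragment ends arose from transposition rather than pre-existing breaks."""
    return int(left_end_tagmented) + int(right_end_tagmented)


def percent_rescued(fragment_ends: list[tuple[bool, bool]]) -> float:
    """Percentage of tagmented fragments that carry only one transposition-derived end."""
    counts = Counter(tagmentation_events(left, right) for left, right in fragment_ends)
    tagmented = counts[1] + counts[2]
    return 100.0 * counts[1] / tagmented if tagmented else 0.0


# Hypothetical fragments: two from double tagmentation, two from a single event.
example = [(True, True), (True, False), (False, True), (True, True)]
print(percent_rescued(example))  # 50.0
```

A fragment with only one transposition-derived end would lack a second transposon-borne adapter in a standard tagmentation workflow; ligating adapters after ME cleavage presumably gives such fragments a second adapter, which is why they can be rescued.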
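The UMI use case in FIG. 19 amounts to grouping reads that share both mapping coordinates and UMI into one source molecule. A minimal exact-match sketch follows; the read tuple layout is an assumption, and production tools typically also tolerate sequencing errors within the UMI.

```python
from collections import Counter

# Assumed read representation: (chromosome, start, end, umi).
Read = tuple[str, int, int, str]


def umi_families(reads: list[Read]) -> Counter:
    """Count reads per (position, UMI) family; each family approximates one original fragment."""
    return Counter(reads)


reads = [
    ("chr1", 100, 250, "ACGT"),
    ("chr1", 100, 250, "ACGT"),  # PCR duplicate of the first read
    ("chr1", 100, 250, "TTGA"),  # same coordinates, different UMI: a distinct molecule
    ("chr2", 5, 90, "ACGT"),
]
families = umi_families(reads)
print(len(families))  # 3 unique molecules
```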
Sequence description
Table 1 below lists certain sequences cited herein. In the table, /3 biotin N/ and /5 Phos/ refer to a 3' biotin and a 5' phosphate, respectively. /I8oxo G/ refers to an internal 8-oxo-G nucleotide, and /38 oxo G/ refers to an 8-oxo-G nucleotide at the 3' position.
Table 1: sequence description
Detailed Description
I. Modified transposon ends with mutations in the mosaic end sequences
Described herein are modified transposon end sequences comprising a mosaic end sequence. In some embodiments, these modified transposon end sequences comprise a mosaic end sequence that allows the mosaic end sequence to be cleaved and removed after transposition. A key requirement for transposition is the "mosaic end" (ME), which is specifically recognized by Tn5 and is required for its transposition activity. Tn5 naturally recognizes both "outside end" (OE) and "inside end" (IE) sequences (Table 2), which have been shown to be highly intolerant of mutation, with most mutations reducing activity (see J.C. Makris et al., PNAS 85(7): 2224-28 (1988)). Later work showed that a chimeric sequence derived from the IE and OE, known as the "mosaic end" (Table 2), together with mutant Tn5 enzymes, increased transposition activity approximately 100-fold relative to the native system (see Maggie Zhou et al., Journal of Molecular Biology 276(5): 913-25 (1998)); this hyperactive system is used in the Illumina DNA Flex PCR-Free (RUO) product. The crystal structure of Tn5 in complex with a DNA substrate shows nucleobase-specific crystal contacts at 13 of the 19 base pairs (see Douglas R. Davies et al., Science 289(5476): 77-85 (2000)), while other bases have been shown to play a role in catalysis (see Mindy Steiniger-White et al., Journal of Molecular Biology (5): 971-82 (2002)). Tn5 activity has typically been assessed with an in vivo reporter system (the papillation assay described in Zhou et al., J. Mol. Biol. 276: 913-925 (1998)).
In Table 2, sequence in normal font indicates the consensus sequence, and sequence in bold italics is derived from the native IE substrate.
A representative wild-type mosaic end sequence (transfer strand) is SEQ ID NO: 1. A variety of mutant Tn5 transposases and transposon ends are described in WO 2015160895 and US 9080211 (each incorporated herein by reference in its entirety) and are applicable to the methods described herein.
Several DNases or enzyme combinations can mediate the selective removal of modified bases such as uracil, inosine, ribose, 8-oxoG, thymine glycol, modified purines and modified pyrimidines (see Table 3; "Characteristics of DNA Repair Enzymes and Structure-Specific Endonucleases," New England Biolabs, downloaded from www.international.neb.com/tools-and-resources/selection-charts on January 20, 2022; and Jacobs and Schär, Chromosoma 121:1-20 (2012)). Such enzymes include modification-specific endonucleases and modification-specific glycosylases. Modified purines for use with a modification-specific glycosylase include 3-methyladenine (3mA) and 7-methylguanine (7mG). Modified pyrimidines for use with a modification-specific glycosylase may include 5-methylcytosine (5mC), 5-formylcytosine (5fC) and 5-carboxycytosine (5caC). Selective removal of uracil and 8-oxoG with DNA repair enzymes has been used in certain sequencing platforms.
Because only one strand of the mosaic end, called the "transfer strand," is covalently attached to the library insert during transposition, incorporating such modified bases into the transfer strand in particular enables selective cleavage and removal of the ME transfer strand. However, cleavage and removal of this type requires mutating the mosaic end from its canonical sequence (SEQ ID NO: 1).
* The N-glycosylase can be paired with an AP lyase/endonuclease (e.g., Endo III or Endo VIII). Alternatively, because abasic sites are chemically labile, they can be cleaved with heat and/or alkaline conditions.
In Table 3, Endo = endonuclease, Fpg = formamidopyrimidine-DNA glycosylase, OGG = 8-oxoguanine glycosylase, hAAG = human 3-alkyladenine DNA glycosylase, UNG = uracil-N-glycosylase, Nth = product of the cloned nth gene, TDG = thymine-DNA glycosylase, MBD4 = methyl-CpG-binding domain protein 4 (a mammalian DNA glycosylase), and ROS1 = endonuclease ROS1 (with bifunctional DNA glycosylase/lyase activity).
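To make the pairing between modified bases and removal routes easier to scan, the sketch below encodes it as a lookup table. Table 3 itself is not reproduced in this text, so the base-to-enzyme assignments are an assumption reconstructed from the abbreviations above, the enzymes named in FIG. 7C (USER, endonuclease V, RNase HII) and commonly reported enzyme specificities; `REMOVAL_STRATEGIES` and `removal_steps` are hypothetical names. The one-step versus two-step split follows the footnote above and FIG. 7B.

```python
# Hypothetical lookup: Table 3 is not reproduced here, so these pairings are assumptions.
REMOVAL_STRATEGIES: dict[str, tuple[str, str]] = {
    "uracil": ("USER (UDG + Endo VIII)", "glycosylase"),
    "inosine": ("Endo V", "endonuclease"),
    "ribose": ("RNase HII", "endonuclease"),
    "8-oxoguanine": ("Fpg or OGG", "glycosylase"),
    "thymine glycol": ("Nth (Endo III)", "glycosylase"),
    "3-methyladenine": ("hAAG", "glycosylase"),
    "7-methylguanine": ("hAAG", "glycosylase"),
    "5-methylcytosine": ("ROS1", "glycosylase"),
    "5-formylcytosine": ("TDG", "glycosylase"),
    "5-carboxycytosine": ("TDG", "glycosylase"),
}


def removal_steps(modified_base: str) -> list[str]:
    """Return the assumed one- or two-step route for cutting a strand at `modified_base`."""
    enzyme, kind = REMOVAL_STRATEGIES[modified_base]
    if kind == "endonuclease":
        # One step: the enzyme nicks the strand at (or next to) the modified nucleotide.
        return [f"cleave with {enzyme}"]
    # Two steps: the glycosylase leaves an abasic (AP) site that must then be cut. The second
    # step may come from the same mix (as in USER) or from the glycosylase's own lyase activity.
    return [
        f"excise the base with {enzyme}",
        "cleave the AP site (AP lyase/endonuclease, heat, or alkaline conditions)",
    ]


print(removal_steps("inosine"))  # ['cleave with Endo V']
```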
Disclosed herein is a modified transposon end sequence comprising a mosaic end sequence, wherein the mosaic end sequence comprises one or more mutations compared to a wild-type mosaic end sequence, wherein the mutations comprise substitutions using: uracil; inosine; ribose; 8-oxoguanine; thymine glycol; a modified purine (such as 3mA or 7mG); or a modified pyrimidine. In some embodiments, these substitutions are used in a method of cleaving the transposon ends after transposition, as described below.
In some embodiments, the mosaic end sequence is a mosaic end sequence for use with a Tn5 transposase. In some embodiments, the modified transposon end sequence has a mutation in the mosaic end sequence compared to SEQ ID NO: 1.
In some embodiments, the modified transposon end sequence comprises a mosaic end sequence comprising one or more mutations compared to SEQ ID NO: 1, wherein the one or more mutations comprise substitutions at A16, C17, A18 and/or G19. In some embodiments, the modified transposon end sequence comprises a mosaic end sequence comprising a substitution at A16. In some embodiments, the modified transposon end sequence comprises a mosaic end sequence comprising a substitution at C17. In some embodiments, the modified transposon end sequence comprises a mosaic end sequence comprising a substitution at A18. In some embodiments, the modified transposon end sequence comprises a mosaic end sequence comprising a substitution at G19. In some embodiments, the modified transposon end sequence comprises SEQ ID NO: 5, 7, 9 or 14-22. Data for representative modified transposon end sequences are shown in FIG. 6A (transposition in solution) and FIG. 12 (transposition mediated by fBLT).
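The substitution notation used in this section (wild-type base, position in SEQ ID NO: 1, replacement) can be checked mechanically. The sketch below assumes that SEQ ID NO: 1 is the widely used 19-bp Tn5 mosaic end AGATGTGTATAAGAGACAG, which places A, C, A and G at positions 16-19 as the text requires, and uses U, I and O as display codes for uracil, inosine and 8-oxoguanine (O follows the O17/O18/O19 labels of FIG. 13). The generated strings are illustrative and are not taken from the sequence listing; `apply_substitutions` is a hypothetical helper.

```python
import re

WT_ME = "AGATGTGTATAAGAGACAG"  # assumed SEQ ID NO: 1 (transfer strand, 5'->3')

# Wild-type base, 1-based position, replacement (canonical base or U/I/O display code).
SUBSTITUTION = re.compile(r"([ACGT])(\d+)([ACGTUIO])")


def apply_substitutions(codes: list[str], reference: str = WT_ME) -> str:
    """Return the reference with each substitution (e.g. 'A16U', 'G19I') applied."""
    seq = list(reference)
    for code in codes:
        match = SUBSTITUTION.fullmatch(code)
        if match is None:
            raise ValueError(f"unrecognized substitution: {code}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(reference):
            raise ValueError(f"{code}: position outside the {len(reference)}-bp mosaic end")
        if reference[position - 1] != wild_type:
            raise ValueError(f"{code}: reference has {reference[position - 1]} at {position}")
        seq[position - 1] = replacement
    return "".join(seq)


print(apply_substitutions(["A16U"]))  # AGATGTGTATAAGAGUCAG
print(apply_substitutions(["G19I"]))  # AGATGTGTATAAGAGACAI
```

Checking the wild-type base catches mislabeled positions: `apply_substitutions(["C16U"])` raises a `ValueError` because position 16 is an A.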
In some embodiments, the mosaic end sequence comprises more than one mutation. In some embodiments, the mosaic end sequence comprises no more than 8 mutations compared to the wild-type sequence (SEQ ID NO: 1 in some embodiments).
In addition to the one or more mutations at A16, C17, A18 and/or G19, the mosaic end sequence may contain additional mutations. In some embodiments, the mosaic end sequence comprises one or more mutations compared to SEQ ID NO: 1 in addition to the one or more mutations at A16, C17, A18 and/or G19. In some embodiments, the mosaic end sequence comprises one to four substitution mutations compared to SEQ ID NO: 1 in addition to the one or more mutations at A16, C17, A18 and/or G19.
In some embodiments, the mosaic end sequence has one substitution mutation compared to SEQ ID NO: 1 in addition to the one or more mutations at A16, C17, A18 and/or G19. In some embodiments, the mosaic end sequence has two substitution mutations compared to SEQ ID NO: 1 in addition to the one or more mutations at A16, C17, A18 and/or G19. In some embodiments, the mosaic end sequence has three substitution mutations compared to SEQ ID NO: 1 in addition to the one or more mutations at A16, C17, A18 and/or G19. In some embodiments, the mosaic end sequence has four substitution mutations compared to SEQ ID NO: 1 in addition to the one or more mutations at A16, C17, A18 and/or G19.
In some embodiments, the substitution at A16 is A16T, A16C, A16G, A16U, A16 inosine, A16 ribose, A16 8-oxoguanine, A16 thymine glycol, A16 modified purine, or A16 modified pyrimidine; the substitution at C17 is C17T, C17A, C17G, C17U, C17 inosine, C17 ribose, C17 8-oxoguanine, C17 thymine glycol, C17 modified purine, or C17 modified pyrimidine; the substitution at A18 is A18G, A18T, A18C, A18U, A18 inosine, A18 ribose, A18 8-oxoguanine, A18 thymine glycol, A18 modified purine, or A18 modified pyrimidine; and/or the substitution at G19 is G19T, G19C, G19A, G19U, G19 inosine, G19 ribose, G19 8-oxoguanine, G19 thymine glycol, G19 modified purine, or G19 modified pyrimidine. In some embodiments, the modified purine is 3mA or 7mG. In some embodiments, the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
In some embodiments, the mutation comprises a substitution using: uracil; inosine; ribose; 8-oxoguanine; thymine glycol; a modified purine; and/or a modified pyrimidine. In some embodiments, these mutations allow the mosaic end sequence to be cleaved after transposition.
In some embodiments, the modified transposon end sequence comprises a mutation at A16, C17, A18 or G19.
In some embodiments, the modified transposon end sequence comprises two mutations selected from mutations at A16, C17, A18 and G19. In some embodiments, the modified transposon end sequence comprises three mutations selected from mutations at A16, C17, A18 and G19. In some embodiments, the modified transposon end sequence comprises four mutations, at A16, C17, A18 and G19.
In some embodiments, the modified transposon end sequence has one to four substitution mutations at A16, C17, A18 and/or G19 compared to SEQ ID NO: 1. In some embodiments, the modified transposon end sequence has one substitution mutation compared to the wild-type sequence (SEQ ID NO: 1 in some embodiments). In some embodiments, the modified transposon end sequence has two substitution mutations compared to the wild-type sequence (SEQ ID NO: 1 in some embodiments). In some embodiments, the modified transposon end sequence has three substitution mutations compared to the wild-type sequence (SEQ ID NO: 1 in some embodiments). In some embodiments, the modified transposon end sequence has four substitution mutations compared to the wild-type sequence (SEQ ID NO: 1 in some embodiments).
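The embodiments above allow any non-empty subset of the four positions A16, C17, A18 and G19 to carry a substitution. If each substituted position can take any of the ten options listed for it (three canonical bases plus uracil, inosine, ribose, 8-oxoguanine, thymine glycol, a modified purine and a modified pyrimidine, each class counted once), the number of distinct variants is a sum over position subsets, which equals 11^4 - 1 because each position has eleven states including wild type. The sketch below checks the closed form against explicit enumeration; it is an illustrative count only and ignores the additional mutations outside positions 16-19 that are also contemplated.

```python
from itertools import combinations, product
from math import comb

POSITIONS = ("A16", "C17", "A18", "G19")
N_OPTIONS = 10  # options per substituted position, with each modification class counted once


def count_variants(n_positions: int = len(POSITIONS), n_options: int = N_OPTIONS) -> int:
    """Choose k of the positions, then one option at each chosen position, for k = 1..n."""
    return sum(comb(n_positions, k) * n_options**k for k in range(1, n_positions + 1))


def enumerate_variants():
    """Yield each variant as a tuple of (position, option_index) pairs."""
    for k in range(1, len(POSITIONS) + 1):
        for chosen in combinations(POSITIONS, k):
            for options in product(range(N_OPTIONS), repeat=k):
                yield tuple(zip(chosen, options))


print([comb(len(POSITIONS), k) for k in range(1, 5)])  # [4, 6, 4, 1] position sets
print(count_variants(), (N_OPTIONS + 1) ** len(POSITIONS) - 1)  # 14640 14640
```

The 4, 6, 4 and 1 position sets correspond to the one-, two-, three- and four-position embodiments.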
II. Hybrid transposition-ligation library preparation methods
Disclosed herein are library preparation methods that couple transposition with adapter ligation. These library preparation methods may therefore be referred to as "hybrid transposition-ligation library preparation." Such methods may use modified Tn5 mosaic end sequences that allow the transferred transposon ends to be cleaved after transposition (as shown in FIG. 1A). As used herein, "hybrid Tn5-ligation method" refers to a method involving transposition, cleavage of a mosaic end sequence, and ligation of adapters.
In some embodiments, cleaving the mosaic end sequence allows it to be removed from the library fragment. Although the methods of the invention use ligation after cleavage of the mosaic end sequences to incorporate adapters for potential downstream sequencing, the methods are not limited to embodiments requiring ligation of adapter sequences.
BLT designed for cleavage of the mosaic end sequence may be referred to as "fragmentase BLT" (fBLT). Although fBLT does not itself contain a fragmentase, it is designed to produce fragments similar to those produced with a fragmentase, because the resulting fragments lack all or part of the mosaic end sequence: after fragments are generated by transposition, cleavage removes all or part of the mosaic end sequence.
The methods of the invention can decouple the enzymatic fragmentation and adapter-tagging activities of transposases (such as Tn5 transposase) through programmed cleavage of the mosaic end sequences from library fragments. As described herein, transposases (Tn5 in some embodiments) can tolerate many mutations and nucleobase modifications within the mosaic end substrate. Incorporating modified bases into the transfer strand of the mosaic end enables its selective enzymatic cleavage and removal. This technique eliminates the constraint of a 19-bp mosaic end sequence adjacent to the library insert and enables hybrid transposase-ligation library preparation methods, allowing fBLT to be used in library preparation workflows developed around ligation chemistry. The methods of the invention thus improve on existing workflows that mechanically shear or enzymatically fragment dsDNA, followed by end repair and adapter ligation (FIG. 1B).
Described herein is a method of preparing double-stranded nucleic acid fragments comprising an adapter, the method comprising: combining a sample comprising a nucleic acid with a transposome complex to prepare fragments; combining the sample with an enzyme or mixture of enzymes that cleaves the first transposon end at uracil, inosine, ribose, 8-oxoguanine, thymine glycol, a modified purine and/or a modified pyrimidine within the mosaic end sequence, thereby removing all or a portion of the first transposon end from the fragments; and ligating adapters to the 5' and/or 3' ends of the fragments. In some embodiments, the modified purine is 3-methyladenine or 7-methylguanine. In some embodiments, the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine. In some embodiments, the enzyme or mixture of enzymes is a modification-specific endonuclease or a modification-specific DNA glycosylase. In some embodiments, the modification-specific DNA glycosylase is used with an endonuclease/lyase, which need not be modification-specific; instead, the endonuclease/lyase recognizes the abasic site.
In some embodiments, the enzyme or mixture of enzymes is (1) an endonuclease, or (2) a DNA glycosylase combined with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites.
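The three steps of this method can be traced on a single strand as a string-level sketch. It assumes that SEQ ID NO: 1 is the widely used 19-bp Tn5 mosaic end AGATGTGTATAAGAGACAG (consistent with the A16, C17, A18 and G19 positions named above), that the 3' end of the transfer strand is joined to the 5' end of the insert, and that cleavage removes the modified base and everything 5' of it, leaving ME positions k+1 to 19 as a scar. That excision model fits a glycosylase plus AP-cleavage route; enzymes such as endonuclease V nick on the 3' side of the modified nucleotide, so the scar length would differ. The function names and the adapter string are hypothetical placeholders.

```python
WT_ME = "AGATGTGTATAAGAGACAG"  # assumed SEQ ID NO: 1 (Tn5 mosaic end, transfer strand)


def tagment(insert: str, me: str = WT_ME) -> str:
    """Step 1: the ME transfer strand is covalently joined to the 5' end of the insert."""
    return me + insert


def cleave(strand: str, modified_position: int, me_length: int = len(WT_ME)) -> str:
    """Step 2: remove ME positions 1..k (k = 1-based modified position); positions k+1..19 remain."""
    if not 1 <= modified_position <= me_length:
        raise ValueError("modified position must lie within the mosaic end")
    return strand[modified_position:]


def ligate(adapter: str, strand: str) -> str:
    """Step 3: attach an adapter to the newly exposed end."""
    return adapter + strand


insert = "TTGACCA"  # illustrative insert sequence
for site in (16, 17, 18, 19):
    scarred = cleave(tagment(insert), site)
    scar = scarred[: len(scarred) - len(insert)] or "(no scar)"
    print(site, scar, ligate("<ADAPTER>", scarred))
```

Under these assumptions, a modification at A16 leaves a CAG scar, C17 leaves AG, A18 leaves G, and G19 leaves no ME bases on the insert.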
In some embodiments, the process is performed in a single reaction vessel. In other words, the process may be carried out without the need to separate the reaction products from each other.
A. Transposome complexes
As used herein, a "transposome complex" or "transposome" is composed of at least one transposase (or other enzyme as described herein) and a transposon recognition sequence. The present invention is not limited to a particular transposase.
In some systems, the transposase binds to a transposon recognition sequence to form a functional complex capable of catalyzing a transposition reaction. In certain aspects, the transposon recognition sequence is a double-stranded transposon end sequence. The transposase or integrase binds to a transposase recognition site in the target nucleic acid and inserts the transposon recognition sequence into the target nucleic acid. In some such insertion events, one strand of the transposon recognition sequence (or end sequence) is transferred into the target nucleic acid, also resulting in a cleavage event. Exemplary transposition procedures and systems that may be readily adapted for use with the transposases of the present disclosure are described, for example, in PCT Publication No. WO10/048605, U.S. Patent Publication No. 2012/0301925, U.S. Patent Publication No. 2012/13470087, or U.S. Patent Publication No. 2013/0143774, each of which is incorporated herein by reference in its entirety.
"Transposase" means an enzyme capable of forming a functional complex with a composition comprising a transposon end (e.g., a transposon end composition) and catalyzing the insertion or transposition of that composition into a double-stranded target nucleic acid. Transposases as described herein may also include integrases from retrotransposons and retroviruses.
Exemplary transposases that can be used with certain embodiments provided herein include (or are encoded by): Tn5 transposase, Sleeping Beauty (SB) transposase, Vibrio harveyi transposase, MuA transposase and a Mu transposase recognition site comprising R1 and R2 end sequences, Staphylococcus aureus Tn552, Ty1, Tn7 transposase, Tn10 and IS10, Mariner transposase, Tc1, P element, Tn3, bacterial insertion sequences, retroviruses, and yeast retrotransposons. Further examples include IS5, Tn10, Tn903, IS911, and engineered versions of transposase family enzymes. The methods described herein may also use combinations of transposases rather than a single transposase.
In some embodiments, the transposase is Tn5, Tn7, MuA, or Vibrio harveyi transposase or an active mutant thereof. In other embodiments, the transposase is a Tn5 transposase or a mutant thereof. In other embodiments, the transposase is a Tn5 transposase or an active mutant thereof. In some embodiments, the Tn5 transposase is a hyperactive Tn5 transposase or an active mutant thereof. In some aspects, the Tn5 transposase is a Tn5 transposase as described in PCT Publication WO2015/160895, which is incorporated herein by reference. In some aspects, the Tn5 transposase is a hyperactive Tn5 with mutations at positions 54, 56, 372, 212, 214, 251 and 338 relative to the wild-type Tn5 transposase. In some aspects, the Tn5 transposase is a hyperactive Tn5 with the following mutations relative to the wild-type Tn5 transposase: E54K, M56A, L372P, K212R, P214R, G251R and A338V. In some embodiments, the Tn5 transposase is a fusion protein. In some embodiments, the Tn5 transposase fusion protein comprises a fused elongation factor Ts (Tsf) tag. In some embodiments, the Tn5 transposase is a hyperactive Tn5 transposase comprising mutations at amino acids 54, 56 and 372 relative to the wild-type sequence. In some embodiments, the hyperactive Tn5 transposase is a fusion protein, optionally wherein the fusion partner is elongation factor Ts (Tsf). In some embodiments, the recognition site is a Tn5-type transposase recognition site (Goryshin and Reznikoff, J. Biol. Chem., 273:7367, 1998). In one embodiment, a transposase recognition site that forms a complex with a hyperactive Tn5 transposase (e.g., EZ-Tn5™ transposase, Epicentre Biotechnologies, Madison, Wis.) is used. In some embodiments, the Tn5 transposase is a wild-type Tn5 transposase.
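The Tn5 mutation list above uses standard protein substitution notation (wild-type residue, position, new residue). A short parser, with hypothetical helper names, confirms that the named mutations E54K, M56A, L372P, K212R, P214R, G251R and A338V fall exactly at the listed positions 54, 56, 372, 212, 214, 251 and 338, and that positions 54, 56 and 372 form a subset of them. It checks notation only; it does not model the Tn5 protein sequence.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION = re.compile(rf"([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])")


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Parse a list such as 'E54K, M56A and L372P' into (wild_type, position, mutant) tuples."""
    parsed = []
    for token in spec.replace(" and ", ",").split(","):
        token = token.strip()
        if not token:
            continue
        match = MUTATION.fullmatch(token)
        if match is None:
            raise ValueError(f"not a substitution: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


listed = parse_mutations("E54K, M56A, L372P, K212R, P214R, G251R and A338V")
positions = {position for _, position, _ in listed}
print(positions == {54, 56, 372, 212, 214, 251, 338})  # True
print({54, 56, 372} <= positions)  # True
```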
As used throughout, the term transposase refers to an enzyme that is capable of forming a functional complex with a composition comprising a transposon (e.g., transposon composition) and, in an in vitro transposition reaction, catalyzes the insertion or transposition of the transposon-containing composition into a double stranded target nucleic acid incubated therewith. Transposases of the provided methods may also include integrases from retrotransposons and retroviruses. Exemplary transposases useful in the provided methods include the wild-type or mutant forms of Tn5 transposase and MuA transposase.
A "transposition reaction" is a reaction in which one or more transposons are inserted into a target nucleic acid at random or nearly random sites. The essential components of a transposition reaction are a transposase and DNA oligonucleotides that exhibit the nucleotide sequences of the transposon, including the transferred transposon sequence and its complement (i.e., the non-transferred transposon end sequence), as well as any other components needed to form a functional transposition or transposome complex. The methods of the present disclosure are exemplified using a transposition complex formed by a hyperactive Tn5 transposase and a Tn5-type transposon end, or by a MuA or HyperMu transposase and a Mu transposon end comprising R1 and R2 end sequences (see, e.g., Goryshin, I. and Reznikoff, W. S., J. Biol. Chem., 273:7367, 1998; Mizuuchi, Cell, 35:785, 1983; Savilahti, H. et al., EMBO J., 14:4893, 1995; each incorporated herein by reference in its entirety). However, any transposition system capable of inserting transposon ends in a random or nearly random manner with sufficient efficiency to tag a target nucleic acid for its intended purpose can be used in the provided methods. Other examples of known transposition systems that can be used in the provided methods include, but are not limited to, Staphylococcus aureus Tn552, Ty1, transposon Tn7, Tn10 and IS10, Mariner transposase, Tc1, P elements, Tn3, bacterial insertion sequences, retroviruses, and yeast retrotransposons (see, e.g., Colegio, O. R. et al., J. Bacteriol., 183:2384-8, 2001; Kirby, C. et al., Mol. Microbiol., 43:173-86, 2002; Devine, S. E. and Boeke, J. D., Nucleic Acids Res., 22:3765-72, 1994; International Patent Application WO 95/23875; Craig, N. L., Science, 271:1512, 1996; Craig, N. L., Curr. Top. Microbiol. Immunol., 204:27-48, 1996; Kleckner, N. et al., Curr. Top. Microbiol. Immunol., 204:49-82, 1996; Lampe, D. J. et al., EMBO J., 15:5470-9, 1996; Plasterk, R. H., Curr. Top. Microbiol. Immunol., 204:125-43, 1996; Gloor, G. B., Methods Mol. Biol., 260:97-114, 2004; Ichikawa, H. and Ohtsubo, E., J. Biol. Chem., 265:18829-32, 1990; Ohtsubo, E. and Sekine, Y., Curr. Top. Microbiol. Immunol., 204:1-26, 1996; Brown, P. O. et al., Proc. Natl. Acad. Sci. USA, 86:2525-9, 1989; Boeke, J. D. and Corces, V. G., Annu. Rev. Microbiol., 43:403-34, 1989; each incorporated herein by reference in its entirety).
Methods for inserting transposons into a target sequence may be performed in vitro using any transposon system for which a suitable in vitro transposition system is available or can be developed based on knowledge in the art. Generally, an in vitro transposition system suitable for use in the methods of the present disclosure requires at least a transposase of sufficient purity, sufficient concentration, and sufficient in vitro transposition activity, and a transposon end with which the transposase forms a functional complex capable of catalyzing a transposition reaction. Suitable transposon end sequences that may be used include, but are not limited to, wild-type, derivative, or mutant transposon end sequences that form a complex with a transposase selected from wild-type, derivative, or mutant transposases.
In some embodiments, the transposase comprises a Tn5 transposase. In some embodiments, the Tn5 transposase is a hyperactive Tn5 transposase.
In some embodiments, the transposome complex comprises a dimer of two molecules of a transposase. In some embodiments, the transposome complex is a homodimer in which two molecules of the transposase each bind to the same type of first transposon and second transposon (e.g., the sequences of the two transposons bound to each monomer are the same, thereby forming a "homodimer"). In some embodiments, the compositions and methods described herein employ two populations of transposome complexes. In some embodiments, the transposases in each population are the same. In some embodiments, the transposome complexes in each population are homodimers, wherein a first population has a first adapter sequence in each monomer and a second population has a different adapter sequence in each monomer.
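When two homodimer populations carrying different adapters (called A and B here) are mixed, each fragment end can receive either adapter. The sketch below works out the expected share of A-A, A-B, and B-B fragments under a toy model that is an assumption, not part of the disclosure: each end comes from an independent insertion, and an insertion is made by the A population with probability `p_a`. The function name and the A/B labels are hypothetical.

```python
from itertools import product


def end_combination_fractions(p_a: float) -> dict:
    """Expected fraction of fragments per adapter-end combination.

    Toy model (assumption): each fragment end comes from an independent
    insertion, made by the adapter-A homodimer population with probability
    p_a and otherwise by the adapter-B population.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must lie between 0 and 1")
    prob = {"A": p_a, "B": 1.0 - p_a}
    fractions = {"A-A": 0.0, "A-B": 0.0, "B-B": 0.0}
    # Enumerate (left end, right end) pairs; A-B and B-A fall into one class.
    for left, right in product("AB", repeat=2):
        key = "-".join(sorted((left, right)))
        fractions[key] += prob[left] * prob[right]
    return fractions


print(end_combination_fractions(0.5))  # {'A-A': 0.25, 'A-B': 0.5, 'B-B': 0.25}
```

With equal populations, about half of the fragments are expected to carry one adapter of each type in this model; unequal mixing lowers that share to 2·p_a·(1 − p_a).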
The term "transposon end" refers to a double-stranded DNA that exhibits only the nucleotide sequences ("transposon end sequences") necessary to form a complex with a transposase or integrase that is functional in an in vitro transposition reaction. In some embodiments, the transposon end is capable of forming a functional complex with a transposase in a transposition reaction. As non-limiting examples, transposon ends may include a 19-bp outer end ("OE") transposon end, an inner end ("IE") transposon end, or a "mosaic end" ("ME") transposon end recognized by a wild-type or mutant Tn5 transposase, or the R1 and R2 transposon ends described in US 2010/0120098, the contents of which are incorporated herein by reference in their entirety. Transposon ends may comprise any nucleic acid or nucleic acid analogue suitable for forming a functional complex with a transposase or integrase in an in vitro transposition reaction. For example, a transposon end may comprise DNA, RNA, modified bases, unnatural bases, or modified backbones, and may comprise a nick in one or both strands. Although the term "DNA" is used in this disclosure in connection with compositions of transposon ends, it should be understood that any suitable nucleic acid or nucleic acid analogue may be used for the transposon ends.
The term "transfer strand" refers to the portion of a transposon pair that is transferred from a sample to a nucleic acid fragment during a transposition reaction. Similarly, the term "non-transferred strand" refers to the portion of a transposon pair that is not transferred from a sample to a nucleic acid fragment during a transposition reaction. Within a transposon pair, the transfer strand and the non-transfer strand may be fully or partially complementary. In an in vitro transposition reaction, the 3' end of the transfer strand is joined or transferred to the target DNA, whereas the non-transferred strand, whose transposon end sequence is fully or partially complementary to the transferred transposon end sequence, is not joined or transferred to the target DNA.
In some embodiments, the transfer strand and the non-transfer strand are covalently joined. For example, in some embodiments, the transferred strand sequence and the non-transferred strand sequence are provided on a single oligonucleotide, e.g., in a hairpin configuration. Thus, although the free end of the non-transferred strand is not joined directly to the target DNA by the transposition reaction, the non-transferred strand becomes indirectly attached to the DNA fragment because it is connected to the transferred strand by the loop of the hairpin structure. Additional examples of transposome structures and of methods of making and using transposomes can be found in US 2010/0120098, the contents of which are incorporated herein by reference in their entirety.
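The hairpin configuration can be sketched as one oligonucleotide laid out 5'-[non-transferred region]-[loop]-[transferred region]-3', so that the 3' end of the transferred region stays free for strand transfer. This is a minimal sketch: it assumes the 19-bp Tn5 mosaic end 5'-AGATGTGTATAAGAGACAG-3' as the transferred region, omits any adapter sequence, and uses a hypothetical "TTTT" loop; `hairpin_transposon` is an illustrative name, not a described reagent.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumption: the 19-bp Tn5 mosaic end, used here only as an example sequence.
ME = "AGATGTGTATAAGAGACAG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unmodified DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_transposon(transferred: str, loop: str = "TTTT") -> str:
    """Single oligonucleotide carrying both strands of a transposon end.

    Layout 5'-[non-transferred region]-[loop]-[transferred region]-3'; the
    first region is the reverse complement of the last, so the oligo folds
    back on itself. The default loop is a hypothetical placeholder.
    """
    return reverse_complement(transferred) + loop + transferred


oligo = hairpin_transposon(ME)
print(oligo)  # CTGTCTCTTATACACATCTTTTTAGATGTGTATAAGAGACAG
```

Putting the transferred region at the 3' end reflects the statement above that the 3' end of the transfer strand is the end joined to the target DNA.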
As used herein, "transposome complex" and "transposome" are equivalent.
In some embodiments, the first transposon comprises a transfer strand in a transposition reaction. In some embodiments, the second transposon comprises a non-transferred strand in a transposition reaction.
In some embodiments, the transposomes comprise modified transposon ends having mutations in the mosaic end sequences.
In some embodiments, the transposome complex comprises a transposase; a first transposon comprising a modified transposon end sequence comprising uracil, inosine, ribose, 8-oxoguanine, thymine glycol, a modified purine, and/or a modified pyrimidine; and a second transposon comprising a second transposon end sequence complementary to at least a portion of the first transposon end sequence.
In some embodiments, the first transposon comprises ribose and the transposome complex is in solution. In some embodiments, the first transposon comprises uracil, inosine, 8-oxoguanine, thymine glycol, a modified purine, and/or a modified pyrimidine, and the transposome complex is immobilized on a solid support.
In some embodiments, the first transposon comprises a modified transposon end sequence. In some embodiments, the transposase is Tn5. In some embodiments, the first transposon is a transfer strand. In some embodiments, the second transposon is a non-transferred strand.
In some embodiments, uracil in the first transposon base pairs with A in the second transposon. In some embodiments, inosine in the first transposon base pairs with C in the second transposon. In some embodiments, a ribonucleotide (ribose) in the first transposon base pairs with A, C, T, or G in the second transposon. In some embodiments, thymine glycol in the first transposon base pairs with A in the second transposon. In some embodiments, the modified purine in the first transposon is 3-methyladenine, which base pairs with T in the second transposon. In some embodiments, the modified purine in the first transposon is 7-methylguanine, which base pairs with C in the second transposon. In some embodiments, the modified pyrimidine in the first transposon is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine, which base pairs with G in the second transposon.
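The pairing rules above determine what a fully complementary second transposon looks like. The sketch encodes them as a lookup table and derives the second strand 5'->3'. Assumptions: SEQ ID No. 1 is taken to be the 19-bp Tn5 mosaic end AGATGTGTATAAGAGACAG (consistent with the A16, C17, A18, and G19 positions named in this section, although the sequence listing is not reproduced here); residue tokens such as "Tg", "m3A", or "rA" are hypothetical shorthand; and 8-oxoguanine is left out because its partner is not stated above.

```python
# Pairing partners taken from the embodiments above. Token names are
# hypothetical shorthand: "Tg" thymine glycol, "m3A" 3-methyladenine,
# "m7G" 7-methylguanine, "5mC"/"5fC"/"5caC" modified cytosines, and
# "rA"/"rC"/"rG"/"rU" ribonucleotides (partner follows the base).
PARTNER = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "U": "A", "I": "C", "Tg": "A",
    "m3A": "T", "m7G": "C",
    "5mC": "G", "5fC": "G", "5caC": "G",
    "rA": "T", "rC": "G", "rG": "C", "rU": "A",
}


def complementary_strand(first: list) -> str:
    """Second-strand DNA (5'->3') fully complementary to the first strand.

    `first` lists the first transposon's residues 5'->3', one token each.
    """
    try:
        return "".join(PARTNER[tok] for tok in reversed(first))
    except KeyError as err:
        raise ValueError(f"no pairing rule for residue {err.args[0]!r}") from None


ME = "AGATGTGTATAAGAGACAG"  # assumed to correspond to SEQ ID No. 1
a16u = list(ME)
a16u[15] = "U"  # A16U substitution (position 16, 1-based)

print(complementary_strand(list(ME)))  # CTGTCTCTTATACACATCT
print(complementary_strand(a16u))      # CTGACTCTTATACACATCT
```

The two outputs differ only opposite position 16: a strand complementary to the wild-type end carries T there, while a strand fully complementary to the A16U end carries A.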
In some embodiments, the second transposon comprises a sequence complementary to SEQ ID No. 1. In some embodiments, the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1. In some embodiments, the second transposon comprises SEQ ID No. 4.
In some cases, the second transposon end may be fully complementary to the first transposon end, while in other cases it may be only partially complementary. Without being bound by theory, a transposase may have higher activity when the transposon end pair (i.e., the first transposon and the second transposon) contains fewer mutations. For example, a transposome complex whose second transposon comprises a sequence complementary to SEQ ID No. 1 (i.e., complementary to the wild-type mosaic end sequence) may mediate greater activity than a transposome complex whose second transposon end is complementary to a first transposon end comprising a modified transposon end sequence as described herein. In other cases, the second transposon may be fully complementary to the first transposon to promote tighter annealing of the transposon pair.
In some embodiments, the first transposon comprises a modified transposon end sequence comprising an A16U, A16-8-oxoguanine or A16 inosine substitution compared to SEQ ID No. 1 and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
In some embodiments, the first transposon comprises a modified transposon end sequence comprising a C17-8-oxoguanine or C17 inosine substitution compared to SEQ ID No. 1 and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
In some embodiments, the first transposon comprises a modified transposon end sequence comprising an A18-8-oxoguanine or A18 inosine substitution compared to SEQ ID No. 1 and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
In some embodiments, the first transposon comprises a modified transposon end sequence comprising a G19-8-oxoguanine or G19 inosine substitution compared to SEQ ID No. 1 and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
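The substitutions listed above use a "wild-type base, position, new residue" notation (e.g., A16U). The sketch applies such a substitution to a sequence and refuses it if the named wild-type base does not match the sequence, which catches numbering errors. Assumptions: SEQ ID No. 1 is taken to be the Tn5 mosaic end AGATGTGTATAAGAGACAG, and the string notation accepted by `apply_substitution` (including the "8-oxoG" token) is hypothetical.

```python
import re

# Assumption: SEQ ID No. 1 taken to be the 19-bp Tn5 mosaic end.
ME = "AGATGTGTATAAGAGACAG"

# Hypothetical notation: wild-type base, 1-based position, optional "-", new residue.
_SUBSTITUTION = re.compile(r"([ACGT])(\d+)-?(U|I|8-oxoG)")


def apply_substitution(seq: str, notation: str) -> list:
    """Return seq as residue tokens with one substitution such as "A16U" applied."""
    match = _SUBSTITUTION.fullmatch(notation)
    if match is None:
        raise ValueError(f"unrecognized substitution: {notation!r}")
    wild_type, position, new_residue = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[position - 1] != wild_type:
        # Guards against numbering errors, e.g. "A17U" when position 17 is C.
        raise ValueError(f"position {position} is {seq[position - 1]}, not {wild_type}")
    tokens = list(seq)
    tokens[position - 1] = new_residue
    return tokens


for notation in ("A16U", "C17-8-oxoG", "A18I", "G19I"):
    print(notation, apply_substitution(ME, notation)[15:])

print("".join(apply_substitution(ME, "A16U")))  # AGATGTGTATAAGAGUCAG
```

Returning a list of tokens rather than a string keeps multi-character residues such as 8-oxoguanine at a single position.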
In some embodiments, the transposome complex comprises a transposon pair comprising a first transposon and a second transposon, wherein the first transposon comprises a modified transposon end sequence as described herein, and wherein the second transposon comprises a transposon end sequence comprising a mosaic end sequence complementary to a wild-type mosaic end sequence. The transposon pair may comprise any of the modified transposon end sequences described herein.
In some embodiments, there is a mismatch between the first transposon and the second transposon at a position where the first transposon comprises a mutation compared to the wild-type mosaic end sequence. In other words, the first and second transposons need not be perfectly complementary (e.g., U in the first transposon need not base pair with A in the second transposon).
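Whether a given transposon pair contains such a mismatch can be checked position by position, remembering that the strands are antiparallel: position i of the first transposon faces position len − i (0-based) of the second when both are written 5'->3'. A minimal sketch, assuming equal-length, fully overlapping ends, the Tn5 mosaic end as SEQ ID No. 1, and a pairing table limited to the residues this example needs; `mismatch_positions` is a hypothetical name.

```python
# Minimal pairing table (assumption: only the residues needed for this example).
PARTNER = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A", "I": "C"}

ME = "AGATGTGTATAAGAGACAG"  # assumed to correspond to SEQ ID No. 1


def mismatch_positions(first: list, second: str) -> list:
    """1-based positions of the first transposon that the second fails to pair with.

    `first` is a list of residue tokens 5'->3'; `second` is the opposite strand,
    also 5'->3', so position i of `first` faces second[len(first) - i].
    """
    if len(first) != len(second):
        raise ValueError("this sketch assumes equal-length, fully overlapping ends")
    n = len(first)
    return [i for i in range(1, n + 1) if PARTNER[first[i - 1]] != second[n - i]]


a16u = list(ME)
a16u[15] = "U"  # A16U
print(mismatch_positions(a16u, "CTGTCTCTTATACACATCT"))  # [16]
print(mismatch_positions(a16u, "CTGACTCTTATACACATCT"))  # []
```

Pairing an A16U first transposon with a strand complementary to the wild-type end leaves exactly one U·T mismatch at position 16, which is the situation this paragraph describes.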
1. Transposome complexes in solution
In some embodiments, the transposome complexes are in solution; these may be referred to as solution-phase transposome complexes or soluble transposome complexes. In some embodiments, double-stranded nucleic acids (such as DNA) bound by the solution-phase transposome complexes undergo fragment tagging to yield nucleic acid fragments that are free in solution. This process, referred to herein as "fragment tagging" (also known as tagmentation), generally involves modification of DNA by a transposome complex comprising a transposase complexed with an adapter comprising a transposon end sequence. In some embodiments, the solution is a fragment tagging buffer.
Protocols for fragment tagging in solution are well known in the art, such as those described for the Nextera XT DNA library preparation kit (see the Nextera XT reference guide, document 770-2012-011). Representative data for fragment tagging in solution are shown in FIGS. 7C-9.
In some embodiments, certain modified transposon ends may perform better when the transposition reaction is performed in solution. For example, a modified transposon end comprising ribose may perform better when included in a transposome complex in solution than when the transposome complex is immobilized on a solid support.
In some embodiments, the modified transposon ends contained in the solution-phase transposome complex comprise uracil, inosine, 8-oxoguanine, thymine glycol, a modified purine, and/or a modified pyrimidine. In some embodiments, the modified transposon ends contained in the solution-phase transposome complex comprise ribose.
In another example, a modified transposon end comprising a modification at position 16 of SEQ ID No. 1 may perform better when included in a transposome complex in solution than when included in a transposome complex immobilized on a solid support. This difference may be caused by a variety of factors, such as the affinity of the different modified transposon ends for the transposase and the method used to prepare the bead-linked transposomes.
2. Immobilized transposome complexes and bead-linked transposomes
In some embodiments, the transposome complexes are immobilized on a solid support. In some embodiments, the solid support is contained in a fragment tagging buffer. In some embodiments, double-stranded nucleic acids (such as DNA) bound by the immobilized transposome complexes undergo fragment tagging to yield immobilized nucleic acid fragments. Bead-immobilized transposome complexes used for fragment tagging may be referred to as "fBLT". A representative protocol for library preparation with fBLT is shown in FIG. 20.
In some embodiments, the first transposon comprises uracil, inosine, 8-oxoguanine, thymine glycol, a modified purine, and/or a modified pyrimidine, and the transposome complex is immobilized on a solid support.
In some embodiments, the density of transposomes immobilized on the solid surface is selected to modulate the fragment size and library yield of the immobilized fragments. In some embodiments, the transposome complexes are present on the solid support at a density of at least 10³, 10⁴, 10⁵, or 10⁶ complexes/mm².
In some embodiments, the length of the double-stranded nucleic acid fragments in the immobilized library is modulated by increasing or decreasing the density of transposome complexes on the solid support.
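Why density controls fragment length can be illustrated with a toy simulation: more insertions along the same molecule leave shorter pieces between them. This is a sketch under simplifying assumptions that are not from the disclosure: insertion sites are uniform and independent (real Tn5 insertion has sequence bias), and `n_insertions` merely stands in for surface density. Only fragments bounded by two insertion sites are counted.

```python
import random
from statistics import mean


def interior_fragment_lengths(molecule_length: int, n_insertions: int, seed: int = 0) -> list:
    """Lengths of fragments bounded by two insertion sites (toy model).

    Sites are drawn uniformly at random without replacement; treating
    n_insertions as a proxy for transposome density is an assumption.
    Needs n_insertions >= 2 to produce any fragment.
    """
    rng = random.Random(seed)
    sites = sorted(rng.sample(range(1, molecule_length), n_insertions))
    return [right - left for left, right in zip(sites, sites[1:])]


for n in (100, 1_000, 10_000):
    lengths = interior_fragment_lengths(1_000_000, n)
    print(n, round(mean(lengths)))
```

In this model the mean fragment length is roughly the molecule length divided by (n_insertions + 1), so a tenfold increase in insertions gives roughly tenfold shorter fragments.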
Many different types of immobilized transposomes can be used in these methods, as described in US 9683230, which is incorporated herein in its entirety.
In the methods and compositions presented herein, the transposome complexes are immobilized on a solid support. In some embodiments, the transposome complex and/or the capture oligonucleotide is immobilized on the support via one or more polynucleotides, such as a polynucleotide comprising a transposon end sequence. In some embodiments, the transposome complexes may be immobilized via a linker molecule that couples the transposase to the solid support. In some embodiments, both the transposase and the polynucleotide are immobilized on the solid support. When referring to the immobilization of a molecule (e.g., a nucleic acid) on a solid support, the terms "immobilized" and "attached" are used interchangeably herein and are intended to encompass direct or indirect and covalent or non-covalent attachment, unless otherwise indicated explicitly or by context. In some embodiments, covalent attachment may be used, but generally all that is required is that the molecule (e.g., nucleic acid) remain immobilized or attached to the support under the conditions in which the support is intended to be used (e.g., in applications requiring nucleic acid amplification and/or sequencing).
Certain embodiments may utilize a solid support composed of an inert substrate or matrix (e.g., glass slides, polymer beads, etc.) that has been functionalized, for example, by application of a layer or coating of an intermediate material containing reactive groups that allow covalent attachment of biomolecules such as polynucleotides. Examples of such supports include, but are not limited to, polyacrylamide hydrogels supported on an inert substrate (such as glass), in particular the polyacrylamide hydrogels described in WO 2005/065814 and US 2008/0280773, the contents of which are incorporated herein by reference in their entirety. In such embodiments, the biomolecules (e.g., polynucleotides) may be directly covalently attached to the intermediate material (e.g., hydrogel), while the intermediate material itself may be non-covalently attached to the substrate or matrix (e.g., glass substrate). The term "covalently attached to a solid support" should accordingly be construed to cover this type of arrangement.
The terms "solid surface", "solid support", and other grammatical equivalents herein refer to any material that is suitable for, or can be modified to be suitable for, the attachment of a transposome complex. As will be appreciated by those skilled in the art, the number of possible substrates is very large. Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon™, etc.), polysaccharides, nylon or nitrocellulose, ceramics, resins, silica or silica-based materials (including silicon and modified silicon), carbon, metals, inorganic glasses, plastics, fiber optic strands, and a variety of other polymers. Solid supports and solid surfaces that are particularly useful for some embodiments are located within a flow cell device. An exemplary flow cell is described in further detail below.
In some embodiments, the solid support comprises a patterned surface adapted to immobilize the transposome complexes in an ordered pattern. A "patterned surface" refers to an arrangement of different regions in or on an exposed layer of a solid support. For example, one or more of these regions may be features where one or more transposome complexes are present. The features may be separated by interstitial regions where transposome complexes are not present. In some embodiments, the pattern may be an x-y format of features in rows and columns. In some embodiments, the pattern may be a repeating arrangement of features and/or interstitial regions. In some embodiments, the pattern may be a random arrangement of features and/or interstitial regions. In some embodiments, the transposome complexes are randomly distributed on the solid support. In some embodiments, the transposome complexes are distributed on the patterned surface. Exemplary patterned surfaces that can be used in the methods and compositions set forth herein are described in U.S. Application No. 13/661,524 and U.S. Patent Application Publication No. 2012/0316086 A1, each of which is incorporated herein by reference.
In some embodiments, the solid support comprises an array of wells or depressions in its surface. Such arrays may be fabricated using a variety of techniques, including, but not limited to, photolithography, imprinting, molding, and microetching, as is generally known in the art. Those skilled in the art will appreciate that the technique used will depend on the composition and shape of the array substrate.
The composition and geometry of the solid support may vary with its use. In some embodiments, the solid support is a planar structure, such as a slide, chip, microchip, and/or array. Thus, the surface of the substrate may be in the form of a planar layer. In some embodiments, the solid support comprises one or more surfaces of a flow cell. As used herein, the term "flow cell" refers to a chamber that includes a solid surface across which one or more fluidic reagents can flow. Examples of flow cells and related fluidic systems and detection platforms that can be readily used in the methods of the present disclosure are described, for example, in Bentley et al., Nature, 456:53-59 (2008); WO 04/018497; US 7,057,026; WO 91/06678; WO 07/123744; US 7,329,492; US 7,211,414; US 7,315,019; US 7,405,281; and US 2008/0108082, each of which is incorporated herein by reference.
In some embodiments, the solid support or its surface is non-planar, such as the inner or outer surface of a tube or vessel. In some embodiments, the solid support comprises microspheres or beads. As used herein, "microsphere", "bead", "particle", and grammatical equivalents refer to small discrete particles. Suitable bead compositions include, but are not limited to, plastics, ceramics, glass, polystyrene, methylstyrene, acrylic polymers, paramagnetic materials, thoria sol, carbon graphite, titanium dioxide, latex, cross-linked dextrans (such as agarose gel), cellulose, nylon, cross-linked micelles, and Teflon; any of the other materials outlined herein for the solid support may also be used. The "Microsphere Selection Guide" from Bangs Laboratories, Fishers, Indiana, is a useful guide. In certain embodiments, the microspheres are magnetic microspheres or beads.
The beads need not be spherical; irregular particles may be used. Alternatively or additionally, the beads may be porous. Bead sizes range from nanometers (e.g., 100 nm) to millimeters (e.g., 1 mm), with beads of about 0.2 to 200 microns, or 0.5 to 5 microns, being used, although smaller or larger beads may be used in some embodiments.
In some embodiments, fragment tagging on beads gives a more consistent reaction than fragment tagging in solution.
The density of these surface-bound transposomes may be adjusted by varying the density of the first polynucleotide or the amount of transposase added to the solid support. For example, in some embodiments, the transposome complexes are present on the solid support at a density of at least 10³, 10⁴, 10⁵, or 10⁶ complexes per mm².
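A surface density can be translated into an expected number of complexes per bead using the bead's surface area (π·d² for a sphere of diameter d). This minimal sketch assumes a smooth, non-porous sphere with complexes spread evenly over it; the porous or irregular beads mentioned above would differ, and `complexes_per_bead` is an illustrative name.

```python
import math


def complexes_per_bead(density_per_mm2: float, diameter_um: float) -> float:
    """Expected transposome complexes on one bead.

    Assumes a smooth, non-porous sphere (surface area = pi * d^2) with the
    complexes spread evenly; porous or irregular beads would differ.
    """
    diameter_mm = diameter_um / 1000.0  # 1 mm = 1000 um
    return density_per_mm2 * math.pi * diameter_mm ** 2


for density in (1e3, 1e4, 1e5, 1e6):
    for diameter in (1.0, 5.0):
        print(f"{density:.0e} /mm^2, {diameter} um bead -> {complexes_per_bead(density, diameter):.3g}")
```

For example, at 10⁶ complexes/mm² a 1-µm sphere has a surface area of about 3.1 × 10⁻⁶ mm², or roughly three complexes, which shows how strongly the per-bead count depends on bead size at a given density.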
Attachment of the nucleic acid to the support, whether rigid or semi-rigid, may be covalent or non-covalent. Exemplary linkages are set forth in U.S. Patent Nos. 6,737,236, 7,259,258, 7,375,234, and 7,427,678 and U.S. Patent Publication No. 2011/0059865 A1, each of which is incorporated herein by reference. In some embodiments, the nucleic acid or other reaction component may be attached to a gel or other semi-solid support that is in turn attached or adhered to a solid support. In such embodiments, the nucleic acid or other reaction component is understood to be in the solid phase.
In some embodiments, the solid support comprises microparticles, beads, a planar support, a patterned surface, or wells. In some embodiments, the planar support is an inner or outer surface of a tube.
In some embodiments, a prepared library of tagged DNA fragments is immobilized on the solid support.
In some embodiments, the solid support comprises a capture oligonucleotide and a first polynucleotide immobilized thereon, wherein the first polynucleotide comprises a 3' portion comprising a transposon end sequence and a first tag.
In some embodiments, the solid support further comprises a transposase bound to the first polynucleotide to form a transposome complex.
In some embodiments, the solid support comprises a capture oligonucleotide and a second polynucleotide immobilized thereon, wherein the second polynucleotide comprises a 3' portion comprising a transposon end sequence and a second tag.
In some embodiments, the solid support further comprises a transposase bound to the second polynucleotide to form a transposome complex.
In some embodiments, the kit comprises a solid support as described herein. In some embodiments, the kit further comprises a transposase. In some embodiments, the kit further comprises a reverse transcriptase polymerase. In some embodiments, the kit further comprises a second solid support for immobilizing DNA.
A number of different methods of immobilizing transposome complexes have been described, such as those described in WO 2018/156519, which is incorporated herein in its entirety. In some embodiments, the first transposon included in the transposome complex comprises an affinity element. In some embodiments, the affinity element is attached to the 5' end of the first transposon. In some embodiments, the first transposon comprises a linker. In some embodiments, the linker has a first end attached to the 5' end of the first transposon and a second end attached to the affinity element.
In some embodiments, the transposome complex further comprises a second transposon that is complementary to at least a portion of the first transposon end sequence. In some embodiments, the second transposon comprises an affinity element. In some embodiments, the affinity element is attached to the 3' end of the second transposon. In some embodiments, the second transposon comprises SEQ ID No. 13. In some embodiments, the second transposon comprises a linker. In some embodiments, the linker has a first end attached to the 3' end of the second transposon and a second end attached to the affinity element.
In some embodiments, the affinity element comprises biotin, avidin, streptavidin, an antibody, or an oligonucleotide. In some embodiments, the affinity element is biotin. In some embodiments, the affinity element comprises an oligonucleotide that can bind to a capture oligonucleotide comprised on the surface of a solid support. In some embodiments, the affinity element comprises an antibody that can bind to a ligand comprised on the surface of a solid support.
As used herein, "bead-linked transposomes" or "BLT" refers to transposomes immobilized on beads. Bead-linked transposomes are a key technology in certain NGS library preparation methods, such as Illumina library preparation products. BLT combines the benefits of enzymatic Tn5-mediated fragment tagging with the additional advantages of library normalization and of not requiring quantification of the input DNA (fig. 3 and Stephen Bruinsma et al., BMC Genomics 19(1): 1-16 (2018)). A disadvantage of solution-based fragment tagging is that library fragment size depends directly on the ratio between genomic DNA substrate and Tn5 enzyme, so this ratio must be controlled and any variation in it becomes a source of performance variability. BLT gives better control of library fragment size because a predetermined amount of transposomes is conjugated to the solid support. Furthermore, the known amount of bead-bound transposomes sets an upper limit on the amount of DNA substrate that can be converted into library, which results in library normalization.
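The normalizing effect of a fixed, bead-bound transposome amount can be illustrated with the simplest possible saturation model: converted DNA equals the input until the capacity of the beads is reached, and stays at that capacity afterwards. Both the hard `min` cap and the 100-ng capacity are hypothetical simplifications for illustration, not values or kinetics from the disclosure.

```python
def converted_input(input_ng: float, capacity_ng: float) -> float:
    """Toy saturation model: DNA converted into library is capped by capacity.

    capacity_ng stands for the (hypothetical) amount of DNA that the
    bead-bound transposomes in one reaction can convert.
    """
    if input_ng < 0 or capacity_ng < 0:
        raise ValueError("amounts must be non-negative")
    return min(input_ng, capacity_ng)


capacity = 100.0  # hypothetical value, not from the disclosure
for amount in (25.0, 100.0, 250.0, 500.0):
    print(amount, converted_input(amount, capacity))
```

Above the capacity, inputs that differ several-fold give the same output, which is the sense in which the bead-bound transposome amount normalizes libraries without prior quantification of input DNA.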
In some embodiments, the solid support comprises a transposome complex as described herein immobilized thereon. In some embodiments, the solid support comprises a bead (i.e., fBLT). Representative data generated with fBLT are shown in fig. 10A-18B.
B. Transposition reaction for fragmentation
Transposition is an enzyme-mediated process by which DNA sequences are inserted, deleted, and replicated in the genome. This process has been adapted for widespread use in fragmenting double-stranded nucleic acids (such as double-stranded DNA and DNA:RNA duplexes). Transposition can generate DNA fragments without using the standard enzymatic fragmentation protocol outlined in fig. 4. In some embodiments, the method of preparing library fragments using modified transposon end sequences is performed with a transposome immobilized on a solid support (such as fBLT, as shown in fig. 19). Preparing a library with fBLT may take about 5.5 hours (as shown in fig. 20), similar to other ligation-based library preparation methods.
In some embodiments, generating fragments with modified transposon ends (such as with fBLT) by the methods of the invention avoids the oxidative DNA damage that occurs during sonication. Such sonication-induced oxidative DNA damage is well known in the art (see, e.g., Costello et al., Nucleic Acids Research 41(6): e67 (2013)). For example, the use of fBLT reduces false-positive G>T transversions about 50-fold; these transversions may be driven by oxidative damage to guanine during sonication.
Although this transposition reaction is described with Tn5, other transposases (such as those described herein) may mediate similar reactions.
The well-studied Escherichia coli (E. coli) Tn5 transposon is mobilized by a "cut-and-paste" mechanism. First, the Tn5 transposase Tnp (hereinafter Tn5) recognizes conserved substrate sequences on either side of the transposon DNA and excises, or "cuts", the transposon DNA from the genome. Tn5 then inserts, or "pastes", the transposon DNA into the target DNA.
Tn5 has been used in many library preparation reagents (such as those of Illumina) because of its ability to perform fragment tagging, i.e., to simultaneously "tag" and "fragment" genomic DNA, which greatly reduces the time and complexity of conventional sonication/ligation-based library preparation protocols. For library preparation, Tn5 is preloaded with transposons consisting of a conserved substrate sequence, termed a "mosaic end" or "end sequence", attached to an adaptor sequence (e.g., the Illumina A14 and B15 adaptor sequences). This transposome complex, comprising the Tn5 transposase and the adaptor-carrying transposon sequences, is then mixed with the genomic DNA sample. Because the transposons used for library preparation carry only short adaptor sequences, the reaction both fragments the genomic DNA and tags the fragments with these short adaptor sequences (fig. 2).
In some embodiments, transposition with a modified transposon end described herein produces results equivalent to transposition with a wild-type transposon end (i.e., a transposon end that does not contain a mutation described herein). In some embodiments, preparing fragments with a transposome complex as described herein yields at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the number of fragments obtained with a transposome complex comprising a first transposon whose transposon end sequence comprises the wild-type mosaic end sequence of SEQ ID No. 1.
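The percentage thresholds above are simple ratios of fragment counts. The sketch computes the relative yield of a modified-end complex against the wild-type complex and reports the highest listed threshold it reaches. The counts in the example are hypothetical, and how fragments are counted (e.g., reads or molecules) is left open here as it is in the text.

```python
def relative_fragment_yield(modified_count: int, wild_type_count: int) -> float:
    """Fragments from the modified-end complex as a percentage of wild type."""
    if wild_type_count <= 0:
        raise ValueError("wild_type_count must be positive")
    return 100.0 * modified_count / wild_type_count


def highest_threshold_met(percent: float, thresholds=(50, 60, 70, 80, 90)):
    """Largest listed threshold that the relative yield reaches, or None."""
    met = [t for t in thresholds if percent >= t]
    return max(met) if met else None


pct = relative_fragment_yield(850, 1000)  # hypothetical counts
print(pct, highest_threshold_met(pct))  # 85.0 80
```

Multiplying by 100 before dividing keeps the example's percentage exact in floating point.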
In some embodiments, the transposition reaction is performed with a transposome complex comprising a modified transposon end at the 3' end of the transfer strand.
1. Library fragments generated by single fragment tagging events
In general, fragment tagging methods for library preparation require both ends of a fragment to be tagged so that the adaptor sequences needed for sequencing are incorporated. However, the fragmentation method of the present disclosure (such as with fBLT) makes it possible to prepare fragments from a single fragment tagging event, in which a mosaic end sequence is added at only one end of the fragment. After fragment tagging/cleavage with a single fragment tagging event, both ends of the fragment may undergo end repair followed by adaptor ligation, as shown in FIG. 18A. Because, with standard methods, a single fragment tagging event typically yields a fragment that cannot be sequenced (adaptor sequences are incorporated at only one end), the ability to sequence fragments produced by a single fragment tagging event may be referred to as "rescuing" the single fragment tagging event (as shown in FIG. 18B).
In some embodiments, the transposition reaction for fragmentation improves the preparation of libraries from samples comprising partially fragmented nucleic acids. In some embodiments, the transposition reaction for fragmentation can be used to fragment one end of a DNA molecule followed by end repair and ligation of adaptors at the fragmented and non-fragmented ends of the molecule. In such a workflow, cleavage of the ME sequence is performed only at one end of the fragment, but the other end of the fragment may also be subjected to end repair followed by adaptor ligation, as shown in FIG. 18A. In this way, adaptors are ligated to both ends of the fragments.
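The rescue logic described above can be illustrated with a toy model that tracks only which ends of a fragment carry an adaptor. This is a simplification under stated assumptions: each end is either "tagged" (created by a transposition event) or "native" (a pre-existing break, e.g. in partially fragmented DNA), and a fragment counts as sequenceable only when both ends carry adaptors. The workflow labels and function names are hypothetical.

```python
def adaptor_ends(end_types, workflow):
    """Count the ends of one fragment that end up carrying an adaptor.

    end_types: two strings, each "tagged" (created by a transposition event) or
    "native" (a pre-existing break, e.g. in partially fragmented DNA).
    workflow: "tagmentation" (adaptors come only from the transposon) or
    "fblt_ligation" (ME removed, then every end is repaired and ligated).
    """
    if workflow == "tagmentation":
        return sum(1 for end in end_types if end == "tagged")
    if workflow == "fblt_ligation":
        return len(end_types)
    raise ValueError(f"unknown workflow: {workflow!r}")


def sequenceable(end_types, workflow):
    """A fragment is treated as sequenceable only if both ends carry an adaptor."""
    return adaptor_ends(end_types, workflow) == 2


fragments = [("tagged", "tagged"), ("tagged", "native"), ("native", "tagged")]
for wf in ("tagmentation", "fblt_ligation"):
    print(wf, sum(sequenceable(f, wf) for f in fragments))
# tagmentation 1
# fblt_ligation 3
```

In this toy model, the two single-event fragments are lost under tagmentation alone but recovered when every end is repaired and ligated.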
In some embodiments, the fBLT workflow allows rescue of library fragments prepared with a single fragment tagging event. Rescuing such fragments may improve results for samples containing partially fragmented DNA, among others. This is because fragments prepared from partially fragmented DNA by two fragment tagging events may be shorter than the preferred length for sequencing and are therefore lost from the sequencing data. This effect may be partially offset by rescuing single fragment tagging events using the methods described herein.
In some embodiments, the sample comprises partially fragmented DNA. In some embodiments, the sample comprising partially fragmented DNA is formalin-fixed paraffin-embedded (FFPE) tissue or cell-free DNA. In some embodiments, the library comprises fragments prepared by a single fragment tagging event.
2. Normalization with fBLT
The fBLT described herein can be used for library normalization. As used herein, "normalization" or "library normalization" refers to the process of diluting libraries of variable concentration to the same or similar concentration prior to volumetric pooling.
In some embodiments, normalization helps ensure a uniform read distribution for all samples during sequencing. In other words, normalized libraries may help ensure uniform representation in the final sequencing data. In some embodiments, library normalization using fBLT avoids the downstream steps of manual normalization protocols.
The need to manually normalize library concentrations is well known in the art (see, e.g., Best Practices for Manually Normalizing Library Concentrations, Illumina, 2021). In some embodiments, normalization with fBLT does not require calculation of library concentrations. In this way, the user can avoid time-consuming and cumbersome calculations and dilutions during normalization.
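For contrast, the manual calculation that bead-based normalization is intended to avoid can be sketched as follows. This is a generic sketch, not an Illumina protocol: it assumes an average mass of about 660 g/mol per base pair of double-stranded DNA and uses the dilution relation C1V1 = C2V2. Function names and example values are hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to nM, using ~660 g/mol per bp."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def dilution_volumes(stock_nm: float, target_nm: float, final_ul: float):
    """C1*V1 = C2*V2: return (library uL, diluent uL) to reach target_nm in final_ul."""
    if stock_nm < target_nm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical library: 3.3 ng/uL with a mean fragment length of 500 bp.
stock = library_molarity_nm(3.3, 500)  # about 10 nM
print(round(stock, 2), dilution_volumes(10.0, 2.0, 50.0))  # 10.0 (10.0, 40.0)
```

Repeating this per library, with a separate concentration measurement and fragment-size estimate for each, is the work that bead-based normalization sidesteps.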
Some bead-linked transposome (BLT) methods, such as Illumina DNA Prep (M) Tagmentation (previously called Nextera DNA Flex), allow bead-based normalization. In some embodiments, fBLT similarly allows bead-based normalization. Normalizing with a bead-based approach avoids the time and potential sample loss of performing a separate normalization protocol after library preparation.
C. Adjustable library fragment size determination using fBLT
In some embodiments, fBLT (as opposed to solution-phase transposomes) generate consistent fragment sizes and library yields. U.S. Patent No. 9,683,230 and U.S. Publication No. 2018-0155709 (each incorporated herein by reference in its entirety) describe the use of BLT to control the size of library fragments.
Fragment size is a function of the ratio of the amount of transposomes to the amount and size of the DNA, and of the duration of the reaction. However, even when these parameters are controlled in a solution-phase fragment tagging reaction, size-selective fractionation is often required as an additional step to remove excess small fragments. Fragment size may therefore be better controlled with BLT than with solution-phase fragment tagging.
In some embodiments, fBLT allows the final fragment size to be selected based on the spatial separation of the bound transposomes, irrespective of the amount of transposome beads added to the fragment tagging reaction. Another limitation of solution-based fragment tagging is that the products of the fragment tagging reaction typically require some form of purification before and after PCR amplification, which usually involves transferring the reaction between tubes. In contrast, the fragment tagging products on fBLT can be washed and then released for amplification or other downstream processing, obviating the need for sample transfer. For example, in embodiments in which the transposomes are assembled on paramagnetic beads, the fragment tagging reaction products can easily be purified by immobilizing the beads with a magnet and washing. Thus, in some embodiments, both fragment tagging and other downstream processing (such as PCR amplification) can be performed in a single tube, vessel, droplet, or other container.
In some embodiments, the density of transposomes immobilized on the solid surface is selected to modulate the fragment size and library yield of the immobilized fragments. In some embodiments, the spacing of active transposomes on the bead surface of fBLT can be used to control the size distribution of the inserted sequences. For example, gaps on the surface of the beads can be filled with inactive transposomes (e.g., transposomes with inactive transposons).
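One way to reason about filling gaps with inactive transposomes is a toy geometric model: diluting active transposomes lowers their surface density and so widens the typical distance between neighbouring active sites. The sketch below assumes that active transposomes are randomly placed on the bead surface (a 2-D Poisson process, whose mean nearest-neighbour distance is 1/(2·sqrt(density))). The densities are hypothetical, and the link to insert size is only qualitative (wider spacing is expected to favour longer inserts), not a prediction from the source.

```python
import math


def mean_active_spacing_nm(total_per_um2: float, active_fraction: float) -> float:
    """Mean nearest-neighbour distance (nm) between *active* transposomes.

    Models active transposome positions on the bead surface as a 2-D Poisson
    process (an assumption), for which the mean nearest-neighbour distance is
    1 / (2 * sqrt(density)).
    """
    if total_per_um2 <= 0:
        raise ValueError("surface density must be positive")
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    density = total_per_um2 * active_fraction   # active transposomes per um^2
    return 1000.0 / (2.0 * math.sqrt(density))  # um -> nm


# Hypothetical loading: replacing 75% of active transposomes with inactive ones.
print(mean_active_spacing_nm(10_000, 1.0), mean_active_spacing_nm(10_000, 0.25))  # 5.0 10.0
```

Under this model, quartering the active fraction doubles the mean spacing, since spacing scales with the inverse square root of density.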
D. Removal of mosaic ends
To convert Tn5 into a fragmenting enzyme system, a mechanism is required to selectively remove the mosaic end sequences after transposition. Potential mechanisms include (1) restriction enzymes that recognize sequences within the mosaic ends, (2) single-stranded DNases that exploit the 9-nucleotide gap present on either side of the inserted sequence after transposition, and (3) (a) endonucleases, or (b) DNA glycosylases in combination with heat, alkaline conditions, or endonucleases/lyases that recognize abasic sites (see fig. 5). Restriction enzymes are disadvantageous because they also cleave at other recognition sites within the genomic DNA, introducing bias. Single-stranded nucleases could also potentially be used to remove the mosaic end sequences. However, double-stranded DNA is known to "breathe" at its ends, which often results in off-target digestion of double-stranded DNA that is difficult to control (see Neelam A. Desai and Vepatu Sharkar, FEMS Microbiology Reviews (5): 457-91 (2003)).
Selective enzymatic cleavage of the mosaic ends, as used in the methods of the present invention, is a very attractive mechanism for converting Tn5 into a fragmenting enzyme system (i.e., one that generates fragments lacking the mosaic ends). As used herein, "base modification" or "DNA base modification" refers to a modified base (such as those described in table 3) at a position in a double-stranded nucleic acid that is recognized by an enzyme, such as (a) an endonuclease or (b) a DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites, which then triggers cleavage at that position. In some embodiments, the endonuclease or DNA glycosylase is modification-specific.
In some embodiments, the base modification is cleaved using (1) an endonuclease or (2) a DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites. For example, a DNA glycosylase can create an abasic site, which is then cleaved by heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites. The USER reagent is an exemplary enzyme mixture comprising a DNA glycosylase and an endonuclease/lyase that recognizes abasic sites. Users can choose how to cleave at abasic sites according to their preferred workflow. FIG. 7B outlines how a modification-specific endonuclease cleaves at a modified base in a 1-step reaction, or how a modification-specific glycosylase followed by an AP lyase/endonuclease or heat cleaves at a modified base in a 2-step reaction.
Fragments prepared with a modification at one or more of positions 16-19 of SEQ ID NO. 1, in which the transposition reaction is followed by cleavage at the modified base(s), will comprise an insert with a 5' overhang carrying a 5' phosphate and a 3'-OH, together with 0-3 bases of the ME sequence.
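The "0-3 bases of the ME sequence" follows from where the cut falls. A minimal sketch, assuming (i) SEQ ID NO. 1 is the canonical 19-bp Tn5 mosaic end AGATGTGTATAAGAGACAG (the sequence listing is not reproduced in this section, so this is an assumption), and (ii) the modified nucleotide itself is excised, as in the glycosylase plus AP lyase route. Enzymes that cut at a fixed distance from the lesion would shift the count. The function name is hypothetical.

```python
# Assumption: SEQ ID NO. 1 is the canonical 19-bp Tn5 mosaic end shown here;
# the sequence listing itself is not reproduced in this section.
MOSAIC_END = "AGATGTGTATAAGAGACAG"


def residual_me_bases(modified_position: int, me: str = MOSAIC_END) -> str:
    """ME bases left on the insert after cleavage at a modified base.

    Positions are 1-based, read 5'->3' on the transferred strand, whose 3' end is
    joined to the target DNA. Assumes the modified nucleotide itself is excised
    (glycosylase plus AP lyase route), so only bases 3' of it stay on the insert.
    """
    if not 1 <= modified_position <= len(me):
        raise ValueError("position lies outside the mosaic end")
    return me[modified_position:]


print({p: residual_me_bases(p) for p in range(16, 20)})
# {16: 'CAG', 17: 'AG', 18: 'G', 19: ''}
```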
In some embodiments, cleavage of the modified mosaic end sequence is mediated by (a) an endonuclease or (b) a DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites. In some embodiments, (a) the endonuclease or (b) the DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites mediates cleavage at uracil, inosine, ribose, 8-oxoguanine, thymine diol, a modified purine, and/or a modified pyrimidine.
In some embodiments, (a) the endonuclease or (b) the DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites is USER, endonuclease V, RNase HII, formamidopyrimidine-DNA glycosylase (FPG), 8-oxoguanine glycosylase (OGG), endonuclease III (Nth), endonuclease VIII, a mixture of human alkyladenine DNA glycosylase plus endonuclease VIII or endonuclease III, a mixture of thymine-DNA glycosylase (TDG) or mammalian methyl-CpG binding domain protein 4 (MBD4) plus endonuclease VIII or endonuclease III, or the DNA glycosylase/lyase ROS1. In some embodiments, ROS1 may act as a modification-specific endonuclease owing to its bifunctional glycosylase/lyase activity.
In some embodiments, the modified transposon end sequence comprises uracil, and the mixture of N-glycosylase and apurinic/apyrimidinic (AP) site lyase/endonuclease is USER (Uracil-Specific Excision Reagent). In some embodiments, USER is a mixture of uracil DNA glycosylase and endonuclease VIII or endonuclease III.
In some embodiments, the modified transposon end sequence comprises inosine and the endonuclease is endonuclease V. In some embodiments, the modified transposon end sequence comprises ribose, and the endonuclease is RNase HII.
In some embodiments, the modified transposon end sequence comprises 8-oxoguanine and the endonuclease is formamidopyrimidine-DNA glycosylase (FPG) or 8-oxoguanine glycosylase (OGG).
In some embodiments, the modified transposon end sequence comprises thymine diol and the DNA glycosylase is endonuclease III (Nth) or endonuclease VIII.
In some embodiments, the modified transposon end sequence comprises a modified purine, and the DNA glycosylase and endonuclease/lyase recognizing abasic sites are a mixture of human alkyladenine DNA glycosylase (hAAG) plus endonuclease VIII or endonuclease III.
In some embodiments, the modified transposon end sequence comprises a modified pyrimidine, and the DNA glycosylase is TDG or MBD4, and the endonuclease/lyase recognizing the abasic site is endonuclease VIII or endonuclease III. An alternative modification-specific endonuclease for use with a modified transposon end comprising a modified pyrimidine is ROS1.
In some embodiments, the first transposon comprises a modified transposon end sequence comprising more than one mutation selected from uracil, inosine, ribose, 8-oxoguanine, thymine diol, a modified purine, and/or a modified pyrimidine, and the mixture comprises an endonuclease, or a DNA glycosylase and an endonuclease/lyase that recognizes abasic sites. In some embodiments, this mixture comprises more than one enzyme selected from the group consisting of: USER, endonuclease V, RNase HII, formamidopyrimidine-DNA glycosylase (FPG), 8-oxoguanine glycosylase (OGG), endonuclease III (Nth), endonuclease VIII, a mixture of hAAG plus endonuclease VIII/endonuclease III, a mixture of TDG or MBD4 plus endonuclease VIII/endonuclease III, or ROS1. In some embodiments, combining a modified transposon end sequence comprising more than one mutation with the corresponding endonuclease(s) and/or DNA glycosylase(s) plus endonuclease/lyase improves the cleavage efficiency of the mosaic end sequence, compared with combining a modified transposon end sequence comprising a single mutation with a single endonuclease, or a single DNA glycosylase plus endonuclease/lyase. ROS1 is a single enzyme with both glycosylase and lyase activity.
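The pairings in the embodiments above amount to a lookup from modification to cleavage reagent(s). A minimal sketch with a hypothetical helper that assembles a mixture for a transposon end carrying one or more modifications. The strings are descriptive labels rather than catalogue names, and always taking the first listed option is an arbitrary choice for illustration, not a recommendation from the source.

```python
# Reagent options paired with each modification in the embodiments above.
# Strings are descriptive labels, not catalogue names.
CLEAVAGE_REAGENTS = {
    "uracil": ["USER"],
    "inosine": ["endonuclease V"],
    "ribose": ["RNase HII"],
    "8-oxoguanine": ["FPG", "OGG"],
    "thymine diol": ["endonuclease III (Nth)", "endonuclease VIII"],
    "modified purine": ["hAAG + endonuclease VIII or III"],
    "modified pyrimidine": ["TDG or MBD4 + endonuclease VIII or III", "ROS1"],
}


def cleavage_mixture(modifications):
    """Choose the first listed reagent for each distinct modification, keeping order."""
    mixture = []
    for mod in modifications:
        if mod not in CLEAVAGE_REAGENTS:
            raise KeyError(f"no cleavage reagent listed for {mod!r}")
        reagent = CLEAVAGE_REAGENTS[mod][0]
        if reagent not in mixture:
            mixture.append(reagent)
    return mixture


print(cleavage_mixture(["uracil", "inosine", "uracil"]))  # ['USER', 'endonuclease V']
```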
In some embodiments, the method of fragmenting a double-stranded nucleic acid comprises combining a sample comprising the double-stranded nucleic acid with a transposome complex and preparing the fragment.
In some embodiments, a method of preparing a double-stranded nucleic acid fragment lacking all or a portion of a first transposon end comprises combining a sample comprising nucleic acid with a transposome complex and preparing the fragment; and combining the sample with (1) an endonuclease or (2) a DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites, thereby cleaving the first transposon end at uracil, inosine, ribose, 8-oxoguanine, thymine diol, a modified purine, and/or a modified pyrimidine within the mosaic end sequence to remove all or a portion of the first transposon end from the fragment. In some embodiments, the modified purine is 3-methyladenine or 7-methylguanine. In some embodiments, the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine. In some embodiments, the method cleaves all or a portion of the first transposon end (transfer strand) from the fragment.
In some embodiments, cleavage of the first transposon end generates a cohesive end for ligating adaptors. As used herein, a "cohesive end" is an end of a double-stranded fragment in which one strand is longer than the other (i.e., there is an overhang) and which allows ligation of adaptors comprising complementary overhangs.
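Whether an adaptor can be ligated to a cohesive end depends on the two overhangs being complementary. A minimal sketch that treats overhangs as plain strings, a simplification that ignores ligase requirements such as the 5' phosphate. Function names and example overhangs are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhangs_anneal(fragment_overhang: str, adaptor_overhang: str) -> bool:
    """True if two overhangs of the same polarity (both 5' or both 3', each written
    5'->3') can base pair, i.e. one is the reverse complement of the other."""
    return (len(fragment_overhang) == len(adaptor_overhang)
            and reverse_complement(fragment_overhang) == adaptor_overhang.upper())


print(overhangs_anneal("A", "T"), overhangs_anneal("AATT", "AATT"), overhangs_anneal("A", "A"))
# True True False
```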
In some embodiments, the adaptors are added after all or a portion of the first transposon ends are removed from the fragment. In some embodiments, the adaptors are added by ligation. In some embodiments, end repair and A-tailing enable adaptor ligation. Those skilled in the art will appreciate other means of adding adaptors, such as PCR amplification or click chemistry.
E. Ligation of adaptors
In some embodiments, a method of preparing a double-stranded nucleic acid fragment comprising an adapter comprises combining a sample comprising a nucleic acid with a transposome complex as described herein and preparing the fragment; combining the sample with (1) an endonuclease or (2) a DNA glycosylase in combination with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites, thereby cleaving the first transposon end at uracil, inosine, ribose, 8-oxoguanine, thymine diol, a modified purine, and/or a modified pyrimidine within the mosaic end sequence to remove all or a portion of the first transposon end from the fragment; and ligating adaptors to the 5' and/or 3' ends of the fragments. A representative overview of the steps is shown in fig. 7A.
In some embodiments, the sequence comprising the adapter is ligated to the library fragment after removal of all or a portion of the mosaic end sequence. Fragments that have undergone adaptor ligation to the 5 'and/or 3' ends of the fragment may be referred to as "tagged fragments".
In some embodiments, ligation is performed with a DNA ligase.
In some embodiments, the adaptors comprise double-stranded adaptors.
In some embodiments, adaptors are added to the 5 'and 3' ends of the fragments. In some embodiments, the adaptors added to the 5 'and 3' ends of the fragments are different.
Various library preparation methods that include an adaptor ligation step are known in the art, such as TruSeq and TruSight Oncology 500 (see, e.g., RNA Sample Preparation v Guide, 15026495 Rev. F, Illumina, 2014), and adaptors used with these and other ligation methods may be used in the methods of the invention (see, e.g., Illumina Adapter Sequences, Illumina, 2021). Adapters useful in the present invention also include those described in WO 2008/093098, WO 2008/096146, WO 2018/208699, and WO 2019/055715, each of which is incorporated herein by reference in its entirety.
In some embodiments, adaptor ligation may allow more flexible incorporation of adaptors (such as longer adaptors) than tagging fragments via fragment tagging, in which the adaptor sequences are incorporated into fragments during the transposition reaction. In some fragment tagging methods, additional adaptor sequences (such as those described in U.S. Patent Publication No. 20180201992 A1) are incorporated by PCR; the methods of the present invention may avoid the need for this additional PCR step.
Ligation techniques are commonly used to prepare NGS libraries for sequencing. In some embodiments, the ligation step uses an enzyme to ligate specific adapters to both ends of the DNA fragments. In some embodiments, a single A base is added to the 3' end of each strand of the blunt-ended fragments, ready for ligation to sequencing adaptors. In some embodiments, each adapter contains a T-base overhang that is complementary to the A overhang, allowing the adapter to be ligated to the A-tailed fragmented DNA.
Adaptor ligation schemes are known to have advantages over other methods. For example, adaptor ligation can be used to generate fully complementary sequencing primer hybridization sites for single-read, paired-end, and indexed reads. In some embodiments, adaptor ligation eliminates the need for additional PCR steps to add index tags and index primer sites.
In some embodiments, the adapter comprises a Unique Molecular Identifier (UMI), a primer sequence, an anchor sequence, a universal sequence, a spacer, an index sequence, a capture sequence, a barcode sequence, a cleavage sequence, a sequencing-related sequence, or a combination thereof. As used herein, "barcode sequence" refers to a sequence that can be used to distinguish samples. As used herein, a sequencing-related sequence may be any sequence that is related to a subsequent sequencing step. Sequencing-related sequences can be used to simplify downstream sequencing steps. For example, a sequencing-related sequence may be a sequence that is incorporated by the step of ligating an adapter to a nucleic acid fragment. In some embodiments, the adaptor sequences comprise P5 or P7 sequences (or their complements) to facilitate binding to the flow cell in certain sequencing methods.
In some embodiments, the adapter comprises UMI. In some embodiments, adaptors comprising UMI are ligated to the 3 'and 5' ends of the fragments.
In some embodiments, the adapter may be a fork adapter. As used herein, "fork-shaped adaptor" refers to an adaptor comprising two nucleic acid strands, wherein each strand comprises a region complementary to the other strand and a region not complementary to the other strand. In some embodiments, the two nucleic acid strands of the fork adapter are annealed together via their complementary regions prior to ligation. In some embodiments, the complementary regions each comprise 12 nucleotides. In some embodiments, a fork adaptor is ligated to both strands at an end of a double-stranded DNA fragment. In some embodiments, a fork adaptor is ligated to one end of a double-stranded DNA fragment. In some embodiments, fork adaptors are ligated to both ends of a double-stranded DNA fragment. In some embodiments, the fork adaptors on opposite ends of the fragments are different. In some embodiments, one strand of a fork adapter is phosphorylated at its 5' end to facilitate ligation to a fragment. In some embodiments, one strand of the fork adapter has a phosphorothioate linkage immediately preceding the 3' T. In some embodiments, the 3' T is an overhang (i.e., it does not pair with a nucleotide in the other strand of the fork adaptor). In some embodiments, the 3' T overhang can base pair with the A tail present on the library fragment. In some embodiments, the phosphorothioate linkage blocks exonuclease digestion of the 3' T overhang. In some embodiments, after adaptor ligation, PCR with partially complementary primers is used to extend the ends and split the fork.
In some embodiments, the adapter may comprise a tag. As used herein, the term "tag" refers to a portion or domain of a polynucleotide that contains a sequence intended for a desired purpose or application. The tag domain may comprise any sequence provided for any desired purpose. For example, in some embodiments, the tag domain comprises one or more restriction endonuclease recognition sites. In some embodiments, the tag domain comprises one or more regions suitable for hybridization with a primer for a cluster amplification reaction. In some embodiments, the tag domain comprises one or more regions suitable for hybridization with a primer for a sequencing reaction. It should be appreciated that any other suitable feature may be incorporated into the tag domain. In some embodiments, the tag domain comprises a sequence 5 bp to 200 bp in length. In some embodiments, the tag domain comprises a sequence 10 bp to 100 bp in length. In some embodiments, the tag domain comprises a sequence 20 bp to 50 bp in length. In some embodiments, the tag domain comprises a sequence 5 bp, 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, 20 bp, 30 bp, 40 bp, 50 bp, 60 bp, 70 bp, 80 bp, 90 bp, 100 bp, 150 bp, or 200 bp in length.
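The fork structure described above (a short complementary stem, non-complementary arms, and a 3'-T overhang) can be checked with a small sketch that measures the stem length. Assumptions: both strands are written 5'->3', the stem is formed by the 3' end of the T-bearing strand and the 5' end of its partner, and only Watson-Crick pairs count. The strand sequences and function name are hypothetical, not real adaptor sequences.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def fork_stem_length(top: str, bottom: str, overhang: int = 1) -> int:
    """Length of the double-stranded stem of a fork (Y-shaped) adaptor.

    Both strands are written 5'->3'. The stem is assumed to be formed by the 3' end
    of `top` (minus its unpaired 3'-T overhang) and the 5' end of `bottom`; counting
    stops at the first non-pairing position, where the single-stranded arms begin.
    """
    core = top[: len(top) - overhang]
    n = 0
    while n < min(len(core), len(bottom)) and (core[-1 - n], bottom[n]) in PAIRS:
        n += 1
    return n


# Hypothetical strands: 8-nt arms, a 12-nt stem and a 3'-T overhang on the top strand.
top = "AAAAAAAA" + "ACGTTGCAGTCA" + "T"
bottom = "TGACTGCAACGT" + "AAAAAAAA"
print(fork_stem_length(top, bottom))  # 12
```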
The tag may include one or more functional sequences or components (e.g., primer sequences, anchor sequences, universal sequences, spacers, or index tag sequences) as needed or desired.
In some embodiments, the tag comprises a region for cluster amplification. In some embodiments, the tag comprises a region for initiating a sequencing reaction.
In some embodiments, the method further comprises amplifying the fragments on the solid support using a polymerase and an amplification primer corresponding to a portion of the tag. In some embodiments, the portion of the adapter that is ligated to the fragment after removal of all or a portion of the mosaic end sequences comprises an amplification primer sequence. In some embodiments, the tag of the first transposon comprises an amplification primer sequence.
In some embodiments, the tag comprises an A14 primer sequence. In some embodiments, the tag comprises a B15 primer sequence.
In some embodiments, the transposomes on a single bead carry a unique index, and if a large number of such indexed beads are used, phased transcripts will be produced.
Adaptors that are ligated to library fragments may have advantages over adaptors that are incorporated during fragment tagging. For example, by labeling individual fragments with unique sequence tags prior to PCR, Unique Molecular Identifiers (UMIs) can be used to achieve high-sensitivity variant detection (see Jesse J. Salk et al., Nature Reviews Genetics (5): 269-85 (2018)). Some library preparation products, such as TSO 500 (Illumina), include ligation-based UMIs, in which the UMI sequence is incorporated adjacent to the library insert so that it is sequenced as part of the insert read. Thus, fBLT enables the use of existing ligation-based products (such as existing adaptors and protocols) while remaining compatible with existing enrichment workflows and on-board sequencing primers.
FIG. 19 presents some representative adaptor workflows that a user may wish to employ with fBLT. For example, a high-sensitivity UMI workflow can be used in which the adaptors incorporate UMIs. Alternatively, standard fork adaptors can be used in a PCR workflow that adds UMIs during PCR amplification. In addition, a PCR-free workflow can be used with indexed fork adaptors, avoiding the need for PCR. Thus, an advantage of fBLT is that it allows the user to select the adaptors best suited to a particular library preparation. Other library preparation methods, such as fragment tagging, are more restrictive in the composition of the adaptor sequences that can be used.
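The UMI principle described above can be illustrated with a minimal consensus sketch. It assumes reads have already been aligned and their UMIs extracted; production pipelines also tolerate errors within the UMI itself and may use duplex information, which this toy version omits. Names and data are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_umi_families(reads):
    """Collapse reads into one consensus base per original molecule.

    reads: iterable of (umi, start, base) tuples, where `start` is the mapping
    position and `base` is the read's base at a site of interest. Reads sharing
    (umi, start) are treated as copies of one tagged fragment; the family's
    majority base is kept, so an error in a minority of copies is discarded.
    """
    families = defaultdict(list)
    for umi, start, base in reads:
        families[(umi, start)].append(base)
    return {key: Counter(bases).most_common(1)[0][0] for key, bases in families.items()}


reads = [
    ("ACGTAC", 100, "G"), ("ACGTAC", 100, "G"), ("ACGTAC", 100, "T"),  # one error
    ("TTGCAA", 100, "T"), ("TTGCAA", 100, "T"),                        # a true T allele
]
print(collapse_umi_families(reads))
# {('ACGTAC', 100): 'G', ('TTGCAA', 100): 'T'}
```

The lone T in the first family is treated as a PCR or sequencing error, while the T supported by every read of the second family is retained as a real variant.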
F. Sample and target nucleic acid
In some embodiments, the sample comprises nucleic acid. The nucleic acid contained in the sample may be referred to as a "target nucleic acid". In some embodiments, the sample comprises DNA. In some embodiments, the DNA is genomic DNA. In some embodiments, the target nucleic acid is double-stranded DNA.
In some embodiments, the sample comprises RNA. In some embodiments, RNA can be converted to double-stranded cDNA or DNA: RNA duplex (i.e., RNA that hybridizes to a single strand of cDNA).
In some embodiments, the nucleic acid is double-stranded DNA. In some embodiments, the nucleic acid is RNA and a double-stranded cDNA or DNA: RNA duplex is generated prior to combining with the transposome complex.
The biological sample may be of any type comprising nucleic acids. For example, a sample may include nucleic acids in a variety of states of purification, including purified nucleic acids. However, the sample need not be fully purified and may comprise nucleic acids mixed with, for example, proteins, other nucleic acid materials, other cellular components, and/or any other contaminants. In some embodiments, the biological sample comprises a mixture of nucleic acids, proteins, other nucleic acid materials, other cellular components, and/or any other contaminants present in substantially the same proportions as found in vivo. For example, in some embodiments, these components are present in the same proportions as found in intact cells. In some embodiments, the biological sample has a 260/280 absorbance ratio of less than or equal to 2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7, or 0.60. In some embodiments, the biological sample has a 260/280 absorbance ratio of at least 2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7, or 0.60. Because the methods provided herein allow nucleic acids to bind to a solid support, other contaminants can be removed simply by washing the solid support after surface-bound transposition has occurred. Biological samples may include, for example, crude cell lysates or whole cells. For example, a crude cell lysate applied to a solid support in the methods illustrated herein need not be subjected to one or more of the separation steps conventionally used to separate nucleic acids from other cell components. Exemplary isolation procedures are shown in Maniatis et al., Molecular Cloning: A Laboratory Manual, 2nd edition, 1989, and Short Protocols in Molecular Biology, Ausubel et al., incorporated herein by reference.
In some embodiments, the sample applied to the solid support has a 260/280 absorbance ratio of less than or equal to 1.7.
Thus, in some embodiments, a biological sample may include, for example, blood, plasma, serum, lymph, mucus, sputum, urine, semen, cerebrospinal fluid, bronchial aspirates, feces, macerated tissue, or lysates thereof, or any other biological sample comprising nucleic acids.
In some embodiments, the sample is blood. In some embodiments, the sample is a cell lysate. In some embodiments, the cell lysate is a crude cell lysate. In some embodiments, the method further comprises lysing cells in the sample after applying the sample to the solid support to produce a cell lysate.
In some embodiments, the sample is a biopsy sample. In some embodiments, the biopsy sample is a liquid or solid sample. In some embodiments, a biopsy sample from a cancer patient is used to evaluate the sequence of interest to determine if the subject has certain mutations or variants in the predicted gene.
One advantage of the methods and compositions presented herein is that biological samples can be added to the flow cell and that subsequent lysis and purification steps can both be performed in the flow cell without further transfer or treatment steps, simply by flowing the necessary reagents into the flow cell.
G. Gap-filling ligation, phosphorylation and A-tailing
In some embodiments, gaps in the DNA sequence left after the transposition event may also be filled using a strand displacement extension reaction that comprises Bst DNA polymerase and dNTP mix. In some embodiments, gap-filling ligation is performed using an extension-ligation mix buffer.
In some embodiments, the method comprises treating the plurality of 5' fragments with a polymerase and a ligase to extend and ligate the strands to produce the fully double stranded fragments.
The library of double stranded DNA fragments can then optionally be amplified (such as by cluster amplification) and sequenced with sequencing primers.
In some embodiments, all or a portion of the cleaved first transposon end is separated from the remainder of the sample.
In some embodiments, the method further comprises filling in the 3' end of the fragment and phosphorylating the 5' end of the fragment with a kinase prior to ligation. In some embodiments, the ends generated by cleavage of the mosaic end sequences are not blunt (i.e., one strand of the double-stranded fragment has a cohesive overhang relative to the other). In some embodiments, the cohesive overhangs are not filled in, and adaptors having complementary cohesive ends are ligated to them.
In some embodiments, the fragment comprises 0-3 bases of the mosaic end sequence. In some embodiments, one strand of a double-stranded fragment has a different number of bases from the mosaic end sequence than the other strand (i.e., the ends of the fragment have overhangs and are not blunt ends). In some embodiments, overhangs generated by cleavage of the mosaic end sequences are filled in. In some embodiments, filling the ends generated by cleavage of the mosaic end sequences is performed with T4 DNA polymerase.
In some embodiments, the method further comprises adding a single A overhang to the 3' end of the fragment. In some embodiments, adding a single A overhang may be referred to as "A-tailing". In some embodiments, A-tailing improves ligation of adaptors (such as fork adaptors). In some embodiments, one strand of a fork adapter comprises a T overhang that can base pair with the A tail on a fragment.
In some embodiments, the polymerase adds a single A overhang. In some embodiments, the polymerase is (i) Taq or (ii) a Klenow fragment lacking exonuclease activity.
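The end-processing steps above (fill-in, A-tailing, and ligation to a T-overhang adaptor) can be traced with a small sketch that tracks only bases and overhang polarity. It is a simplification under stated assumptions: phosphorylation state and enzyme kinetics are ignored, ends are represented as (polarity, overhang) tuples, and the overhang sequence and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def end_repair(polarity: str, overhang: str):
    """Blunt one fragment end the way T4 DNA polymerase does.

    A 5' overhang is filled in: the recessed 3' end is extended by the overhang's
    reverse complement. A 3' overhang is removed by the 3'->5' exonuclease.
    Returns (bases_added, bases_removed), both written 5'->3'.
    """
    if polarity == "5prime":
        return reverse_complement(overhang), ""
    if polarity == "3prime":
        return "", overhang
    if polarity == "blunt":
        return "", ""
    raise ValueError(f"unknown polarity: {polarity!r}")


def a_tail(end):
    """Add a single non-templated 3' A to a blunt end (Taq or Klenow exo-)."""
    if end != ("blunt", ""):
        raise ValueError("A-tailing expects a blunt end")
    return ("3prime", "A")


def ligatable(fragment_end, adaptor_end) -> bool:
    """Ends anneal if polarities match and the overhangs are reverse complements."""
    (pol_f, oh_f), (pol_a, oh_a) = fragment_end, adaptor_end
    return pol_f == pol_a and reverse_complement(oh_f) == oh_a


# Hypothetical 5' overhang left after cleavage within the mosaic end.
print(end_repair("5prime", "CTG"))                           # ('CAG', '')
print(ligatable(a_tail(("blunt", "")), ("3prime", "T")))     # True
print(ligatable(("blunt", ""), ("3prime", "T")))             # False
```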
H. Amplification
The disclosure also relates to amplifying fragments produced according to the methods provided herein. In some embodiments, the fragment is tagged at one or both ends of the fragment by ligation of adaptors. In some embodiments, the immobilized fragments are amplified on a solid support. In some embodiments, the solid support is the same as the solid support upon which surface-bound transposition occurs. In such embodiments, the methods and compositions provided herein allow sample preparation by amplification and optionally by sequencing steps on the same solid support from the initial sample introduction step.
In some embodiments, the fragments are amplified prior to sequencing.
For example, in some embodiments, the immobilized fragments are amplified using a cluster amplification method, as exemplified by the disclosures of U.S. Patent Nos. 7,985,565 and 7,115,400, the contents of each of which are incorporated herein by reference in their entirety. The incorporated materials of U.S. Patent Nos. 7,985,565 and 7,115,400 describe methods of solid-phase nucleic acid amplification that allow the amplification products to be immobilized on a solid support to form arrays comprising clusters or "colonies" of immobilized nucleic acid molecules. Each cluster or colony on such an array is formed from a plurality of identical immobilized polynucleotide strands and a plurality of identical immobilized complementary polynucleotide strands. The array so formed is generally referred to herein as a "clustered array". The products of solid-phase amplification reactions, such as those described in U.S. Patent Nos. 7,985,565 and 7,115,400, are so-called "bridged" structures formed by annealing pairs of immobilized polynucleotide strands and immobilized complementary strands (both strands, in some embodiments, being immobilized to the solid support via covalent attachment at the 5' end). Cluster amplification is an example of a method in which an immobilized nucleic acid template is used to generate immobilized amplicons. Other suitable methods may also be used to generate immobilized amplicons from immobilized DNA fragments generated according to the methods provided herein. For example, one or more clusters or colonies may be formed via solid-phase PCR, whether or not one or both of the amplification primers of each pair are immobilized.
In other embodiments, the fragments are amplified in solution. For example, in some embodiments, the fragments are cleaved or otherwise released from the solid support, and then the amplification primers hybridize to the released molecules in solution. In other embodiments, the amplification primers hybridize to the tagged fragments to perform one or more initial amplification steps, followed by subsequent amplification steps in solution. In some embodiments, the immobilized nucleic acid template may be used to generate a solution phase amplicon.
It will be appreciated that any of the amplification methods described herein or generally known in the art may be used with universal or target-specific primers to amplify a tagged fragment. Suitable amplification methods include, but are not limited to, polymerase chain reaction (PCR), strand displacement amplification (SDA), transcription-mediated amplification (TMA), and nucleic acid sequence-based amplification (NASBA), as described in U.S. Pat. No. 8,003,354, which is incorporated herein by reference in its entirety. These amplification methods may be used to amplify one or more nucleic acids of interest. For example, the immobilized DNA fragments may be amplified by PCR (including multiplex PCR), SDA, TMA, NASBA, or the like. In some embodiments, primers specific for the nucleic acid of interest are included in the amplification reaction.
Other suitable nucleic acid amplification methods include oligonucleotide extension and ligation, rolling circle amplification (RCA) (Lizardi et al., Nat. Genet. 19:225-232 (1998), incorporated herein by reference), and the oligonucleotide ligation assay (OLA) (see generally U.S. Pat. Nos. 7,582,420, 5,185,243, 5,679,524 and 5,573,907; EP 0 320 308 B1; EP 0 336 731 B1; EP 0 439 182 B1; WO 90/01069; WO 89/12696; and WO 89/09835, all of which are incorporated herein by reference). It will be appreciated that these amplification methods can be designed to amplify immobilized DNA fragments. For example, in some embodiments, the amplification method may comprise a ligation probe amplification or OLA reaction containing primers specific for the nucleic acid of interest. In some embodiments, the amplification method may include a primer extension-ligation reaction containing primers specific for the nucleic acid of interest. As a non-limiting example of primer extension and ligation primers that can be designed specifically to amplify a nucleic acid of interest, amplification can use primers of the kind used in the GoldenGate assay (Illumina, Inc., San Diego, CA), as exemplified by U.S. Pat. Nos. 7,582,420 and 7,611,869, each of which is incorporated herein by reference in its entirety.
Exemplary isothermal amplification methods that may be used in the methods of the present disclosure include, but are not limited to, multiple displacement amplification (MDA), as exemplified by Dean et al., Proc. Natl. Acad. Sci. USA 99:5261-66 (2002), and isothermal strand displacement nucleic acid amplification, as exemplified by U.S. Pat. No. 6,214,587, each of which is incorporated herein by reference in its entirety. Other non-PCR-based methods that may be used in the present disclosure include, for example, strand displacement amplification (SDA), described in Walker et al., Molecular Methods for Virus Detection, Academic Press, Inc., 1995; U.S. Pat. Nos. 5,455,166 and 5,130,238; and Walker et al., Nucl. Acids Res. 20:1691-1696 (1992); and hyperbranched strand displacement amplification, described in Lage et al., Genome Research 13:294-307 (2003), each of which is incorporated herein by reference in its entirety. Isothermal amplification methods can be used for random-primer amplification of genomic DNA with a strand-displacing polymerase such as Phi29 polymerase or the Bst DNA polymerase large fragment (5'→3' exo-). These polymerases are used for their high processivity and strand displacement activity; the high processivity allows them to produce fragments 10 kb to 20 kb in length. As described above, smaller fragments can be produced under isothermal conditions using a polymerase having low processivity and strand displacement activity, such as Klenow polymerase. Additional descriptions of amplification reactions, conditions, and components are set forth in detail in U.S. Pat. No. 7,670,810, which is incorporated herein by reference in its entirety.
Another nucleic acid amplification method useful in the present disclosure is tagged PCR, which uses a population of two-domain primers having a constant 5' region followed by a random 3' region, as described in Grothues et al., Nucleic Acids Res. 21(5):1321-1322 (1993), which is incorporated herein by reference in its entirety. The first rounds of amplification are performed to allow a large number of priming events on heat-denatured DNA, based on individual hybridization of the randomly synthesized 3' region. Because of the nature of the 3' region, the priming sites are expected to be random throughout the genome. Unbound primers can then be removed, and further replication can be performed using primers complementary to the constant 5' region.
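The priming logic of tagged PCR can be illustrated with a toy Python sketch. It assumes a hypothetical constant 5' domain (TAG below is a placeholder, not the sequence used by Grothues et al.) and a short random 3' domain, and it reports only exact-match priming sites on one strand; real random priming tolerates mismatches and occurs on both strands.

```python
import random

TAG = "ACGTACGTACGT"  # hypothetical constant 5' domain (placeholder only)

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def make_two_domain_primers(n_primers, random_len, seed=0):
    """Primers = constant 5' tag followed by a random 3' domain of random_len bases."""
    rng = random.Random(seed)
    return [TAG + "".join(rng.choice("ACGT") for _ in range(random_len))
            for _ in range(n_primers)]

def priming_sites(template, primer):
    """0-based positions on the template (written 5'->3') where the primer's random
    3' domain anneals with a perfect match, i.e. where the template carries the
    reverse complement of that domain."""
    site = revcomp(primer[len(TAG):])
    return [i for i in range(len(template) - len(site) + 1)
            if template[i:i + len(site)] == site]
```

Because each primer carries a different random 3' domain, the sites returned for a population of primers are scattered along the template, while every product inherits the same constant region, which is what allows the later rounds to use a single primer based on that region.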
I. Sequencing
In some embodiments, the method further comprises sequencing the fragment after removing all or a portion of the first transposon end from the fragment.
In some embodiments, the method further comprises sequencing the fragment after ligating the adaptors. In some embodiments, the method does not require fragment amplification prior to sequencing. In some embodiments, the fragments are amplified prior to sequencing.
In some embodiments, the method further comprises enriching for fragments of interest after ligating the adaptors and before sequencing. Enrichment can be performed with a variety of commercially available reagents, such as those for Illumina RNA Prep with Enrichment (see the Reference Guide, Illumina document no. 1000000124435).
The disclosure also relates to sequencing tagged fragments produced according to the methods provided herein. In some embodiments, the method comprises sequencing one or more 5'-tagged and/or 3'-tagged fragments, or fully double-stranded tagged fragments, after ligating adaptors to one or both ends of the fragments. In some embodiments, the adaptor comprises a sequencing primer binding sequence to facilitate sequencing.
The tagged fragments may be sequenced according to any suitable sequencing method, such as direct sequencing, including sequencing-by-synthesis, sequencing-by-ligation, sequencing-by-hybridization, nanopore sequencing, and the like. In some embodiments, the tagged fragments are sequenced on a solid support. In some embodiments, the solid support used for sequencing is the same as the solid support on which the adaptor ligation occurs. In some embodiments, the solid support used for sequencing is the same as the solid support on which amplification occurs.
One exemplary sequencing method is sequencing-by-synthesis (SBS). In SBS, the extension of a nucleic acid primer along a nucleic acid template (e.g., a target nucleic acid or amplicon thereof) is monitored to determine the sequence of nucleotides in the template. The underlying chemical process may be polymerization (e.g., catalyzed by a polymerase). In certain polymerase-based SBS embodiments, fluorescently labeled nucleotides are added to the primers (and thus the primers are extended) in a template-dependent manner, such that detection of the order and type of nucleotides added to the primers can be used to determine the sequence of the template.
A flow cell provides a convenient solid support for housing amplified DNA fragments produced by the methods of the present disclosure. One or more amplified DNA fragments in this format may be subjected to SBS or other detection techniques that involve repeated delivery of reagents in cycles. For example, to initiate a first SBS cycle, one or more labeled nucleotides, a DNA polymerase, and the like may be flowed into or through a flow cell containing one or more amplified nucleic acid molecules. Sites where primer extension results in incorporation of a labeled nucleotide can then be detected. Optionally, the nucleotides may further include a reversible termination property that terminates further primer extension once a nucleotide has been added to the primer. For example, a nucleotide analog having a reversible terminator moiety may be added to the primer such that subsequent extension cannot occur until a deblocking agent is delivered to remove the moiety. Thus, for embodiments using reversible termination, a deblocking reagent may be delivered to the flow cell (before or after detection occurs). Washes may be performed between the various delivery steps. The cycle may then be repeated n times to extend the primer by n nucleotides, thereby detecting a sequence of length n. Exemplary SBS procedures, fluidic systems, and detection platforms that can be readily adapted for use with amplicons produced by the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008); WO 04/018497; US 7,057,026; WO 91/06678; WO 07/123744; US 7,329,492; US 7,211,414; US 7,315,019; US 7,405,281; and US 2008/0108082, each of which is incorporated herein by reference.
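The reversible-terminator cycle (incorporate one nucleotide, detect its label, deblock, wash, repeat n times) can be sketched in Python for a single ideal template. This is an illustrative model only: the function name is hypothetical, and it assumes the sequencing primer anneals at the template's 3' end, perfect incorporation in every cycle, and no phasing.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sbs_read(template, n_cycles):
    """Return the bases detected over n_cycles of reversible-terminator SBS.

    template is written 5'->3'. The primer is assumed to anneal at the template's
    3' end, so each cycle copies the next template base moving toward its 5' end.
    """
    read = []
    for cycle in range(min(n_cycles, len(template))):
        template_base = template[len(template) - 1 - cycle]
        # Incorporation: exactly one labeled, 3'-blocked nucleotide is added.
        incorporated = COMPLEMENT[template_base]
        # Detection: the label reports which base was added in this cycle.
        read.append(incorporated)
        # Deblocking and washing would remove the label and terminator here.
    return "".join(read)
```

For example, sbs_read("AACG", 4) returns "CGTT", the reverse complement of the template, and n cycles yield at most n bases.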
Other sequencing procedures that use cyclic reactions, such as pyrosequencing, can be used. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into a nascent nucleic acid strand (Ronaghi et al., Analytical Biochemistry 242(1):84-9 (1996); Ronaghi, Genome Res. 11(1):3-11 (2001); Ronaghi et al., Science 281(5375):363 (1998); U.S. Pat. Nos. 6,210,891, 6,258,568, and 6,274,320, each of which is incorporated herein by reference).
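Because a dispensed nucleotide can extend through an entire homopolymer run, pyrosequencing output is usually read as a flowgram: one light signal per nucleotide flow, scaling with the number of bases incorporated. The sketch below computes an idealized flowgram; the flow order TACG and the strictly proportional signal are assumptions for illustration, and the input is the strand being synthesized (the complement of the template).

```python
def flowgram(synthesized, flow_order="TACG", max_flows=1000):
    """Idealized pyrosequencing flowgram, one signal per nucleotide flow.

    synthesized: the strand being built (complement of the template), 5'->3'.
    Each flow adds the whole homopolymer run of the dispensed base; the signal is
    assumed to equal the number of bases added (real signals are noisier).
    """
    signals = []
    pos = 0
    while pos < len(synthesized) and len(signals) < max_flows:
        base = flow_order[len(signals) % len(flow_order)]
        run = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
    return signals
```

For "TTAGC" the result is [2, 1, 0, 1, 0, 0, 1]: the first T flow reports a two-base homopolymer, and flows whose base does not match the next position give no signal.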
Some embodiments may use methods involving real-time monitoring of DNA polymerase activity. For example, nucleotide incorporations can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides, or with zero-mode waveguides (ZMWs). Techniques and reagents for FRET-based sequencing are described, for example, in Levene et al., Science 299:682-686 (2003); Lundquist et al., Opt. Lett. 33:1026-1028 (2008); and Korlach et al., Proc. Natl. Acad. Sci. USA 105:1176-1181 (2008), the disclosures of which are incorporated herein by reference.
Some SBS embodiments include detecting protons released upon incorporation of a nucleotide into an extension product. For example, sequencing based on detection of released protons can use an electrical detector and associated techniques commercially available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary), or the sequencing methods and systems described in US 2009/0026082 A1, US 2009/0125889 A1, US 2010/0137543 A1, or US 2010/0282617 A1, each of which is incorporated herein by reference. The methods set forth herein for amplifying target nucleic acids using kinetic exclusion can be readily applied to substrates used for detecting protons. More specifically, the methods set forth herein can be used to produce clonal populations of amplicons for proton detection.
Another useful sequencing technique is nanopore sequencing (see, e.g., Deamer et al., Trends Biotechnol. 18:147-151 (2000); Deamer et al., Acc. Chem. Res. 35:817-825 (2002); Li et al., Nat. Mater. 2:611-615 (2003), the disclosures of which are incorporated herein by reference). In some nanopore embodiments, the target nucleic acid, or individual nucleotides removed from the target nucleic acid, pass through a nanopore. As the nucleic acid or nucleotide passes through the nanopore, each nucleotide type can be identified by measuring fluctuations in the electrical conductance of the pore (U.S. Pat. No. 7,001,792; Soni et al., Clin. Chem. 53:1996-2001 (2007); Healy, Nanomed. 2:459-481 (2007); Cockroft et al., J. Am. Chem. Soc. 130:818-820 (2008), the disclosures of which are incorporated herein by reference).
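For the embodiment in which individual nucleotides pass through the pore, identification by conductance can be pictured as assigning each measured current level to the nearest reference level. The levels below are hypothetical placeholders, not values for any real pore; practical strand-sequencing pores respond to several bases at once and rely on far more sophisticated models.

```python
# Hypothetical mean current levels (pA) for single nucleotides; placeholders only.
REFERENCE_LEVELS = {"A": 80.0, "C": 65.0, "G": 50.0, "T": 95.0}

def call_bases(event_means, levels=REFERENCE_LEVELS):
    """Call one base per current event by choosing the nearest reference level."""
    return "".join(min(levels, key=lambda base: abs(levels[base] - x)) for x in event_means)
```

For example, events measured at 79.0, 51.2, 94.0 and 66.0 pA would be called as "AGTC" under these placeholder levels.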
Exemplary methods for array-based expression and genotyping analysis that can be applied to detection according to the present disclosure are described in U.S. Pat. Nos. 7,582,420; 6,890,741; 6,913,884; and 6,355,431, and U.S. Pat. Pub. Nos. 2005/0053980 A1, 2009/0186349 A1, and 2005/0181440 A1, each of which is incorporated herein by reference.
An advantage of the methods set forth herein is that they provide for rapid and efficient detection of a plurality of target nucleic acids in parallel. Accordingly, the present disclosure provides integrated systems capable of preparing and detecting nucleic acids using techniques known in the art, such as those exemplified above. An integrated system of the present disclosure may include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more immobilized DNA fragments, such as pumps, valves, reservoirs, fluidic lines, and the like. A flow cell can be configured and/or used in an integrated system for detection of target nucleic acids. Exemplary flow cells are described, for example, in US 2010/011768 A1 and US 2012/0270305 A1, each of which is incorporated herein by reference. As exemplified for flow cells, one or more fluidic components of an integrated system can be used for both an amplification method and a detection method. Taking a nucleic acid sequencing embodiment as an example, one or more fluidic components of an integrated system can be used both for an amplification method set forth herein and for delivering sequencing reagents in a sequencing method such as those exemplified above. Alternatively, an integrated system can include separate fluidic systems to carry out the amplification method and the detection method. Examples of integrated sequencing systems capable of producing amplified nucleic acids and also determining their sequences include, without limitation, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and the devices described in U.S. Pat. Pub. No. 2012/0270305, which is incorporated herein by reference.
Examples
Example 1: Representative method for preparing libraries using fBLT
The method of using Tn5 and fBLT may comprise the following steps.
1. The Tn5 enzyme is complexed with a transposon carrying a mutated mosaic end (ME) that contains a modified DNA base (e.g., uracil, 8-oxoguanine) near the 3' end of the transferred strand. If desired, the transposon DNA may be biotinylated to facilitate formation of bead-linked transposomes (BLT; in this case, fBLT).
2. The resulting transposomes are used to fragment input DNA (such as genomic DNA).
3. The resulting fragmented DNA is treated with an appropriate enzyme to cleave the transferred strand: either (1) an endonuclease, or (2) a DNA glycosylase combined with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites (e.g., USER or Fpg). Depending on the position of the modified base and the nature of the enzyme used for cleavage, some bases of the mosaic end may remain attached to the library fragment.
4. The 3' ends of the library fragments are filled in using a DNA polymerase. Depending on the enzyme used, a kinase may also be needed to ensure proper phosphorylation for ligation.
5. A single A overhang is added to the 3' ends of the library fragments using an A-tailing polymerase (e.g., Taq, or exonuclease-deficient Klenow fragment).
6. The appropriate library adaptors are ligated to the library insert sequences using DNA ligase.
In this way, library fragments can be generated by transposition while adaptors are added to the fragments by ligation. This method allows all or part of the first transposon end to be removed from the fragment before the adaptors are ligated. In some embodiments, all or part of the first transposon end is separated from the remainder of the sample before ligation.
Other modifications of this scheme are also possible; for example, alternative strategies, such as chemical methods, may be used to cleave the mosaic ends selectively. Furthermore, the short remnant of the mosaic end left on the fragments could potentially serve as a "sticky end" of more than 1 bp for more robust ligation, as an alternative to A-tailing the sample DNA, which relies on the weaker hybridization of single-base overhangs between sample and adaptors before ligation.
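Why a multi-base sticky end could ligate more robustly than a single A/T overhang can be made concrete: two overhangs of the same polarity anneal end-to-end only if one is the reverse complement of the other, and the number of base pairs holding the ends together equals the overhang length. The Python sketch below checks this; the 3-nt overhangs in the usage note are hypothetical and are not taken from the source.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def annealing_bp(fragment_overhang, adaptor_overhang):
    """Base pairs formed when two same-polarity overhangs anneal (0 if they cannot).

    Overhangs are written 5'->3'; they pair end-to-end only when one is the
    reverse complement of the other.
    """
    if adaptor_overhang == revcomp(fragment_overhang):
        return len(fragment_overhang)
    return 0
```

A conventional A-tailed fragment and T-overhang adaptor give annealing_bp("A", "T") == 1, whereas a hypothetical 3-nt pair such as "CAG" and "CTG" gives 3 bp of pairing before ligation.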
Example 2: Mutation analysis of mosaic ends with Tn5v3
Mutation screening experiments focused on the 4 base pairs at the 3' end of the transferred strand. The activity of Tn5v3 with modified transposon ends comprising mutated mosaic end sequences was measured using a FRET activity assay.
As shown in FIG. 6A, Tn5v3 recognized a variety of canonical base mutations within the mosaic end sequence with only modest decreases in activity. Even mosaic ends carrying multiple mutations were tolerated, although with an approximately 2-fold loss of activity. Interestingly, the A18C mutation resulted in poor activity, but activity was rescued when the mutated transferred strand was annealed to the wild-type non-transferred strand.
Because Tn5v3 was shown to tolerate canonical mutations within the mosaic end, particularly near the 3' end of the transferred strand, modified bases such as those listed in Table 3 were next tested for tolerance. Transposons were prepared with individual uracil, inosine, or ribose substitutions and assayed using the FRET activity assay. All tested MEs containing a modified base were tolerated by Tn5, albeit with moderately reduced activity (FIG. 6B).
Example 3: Library preparation and sequencing using transposition and ligation
Based on the finding that uracil, inosine and ribose modifications of the ME transferred strand are well tolerated by Tn5v3, library preparation using the fragmenting-enzyme Tn5 method was attempted with soluble transposomes (also known as solution-phase transposomes; see, e.g., the method outlined in FIG. 7A). Transposomes carrying modified bases were incubated with 1 ng lambda DNA (NEB N3011S) following available protocols for soluble fragment tagging, such as those described for the Illumina Nextera XT DNA Library Preparation Kit (see the Nextera XT Reference Guide, Document 770-2012-011). The DNA library was then treated with an appropriate enzyme, i.e., (1) an endonuclease or (2) a DNA glycosylase combined with heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites, followed by treatment with T4 DNA polymerase to fill in the 3' ends of the library fragments. The library was then processed with Illumina A-tailing and ligation reagents to ligate forked adaptors containing UMIs. After PCR amplification, the resulting libraries were analyzed on a Bioanalyzer. All three modified base-enzyme pairs resulted in library formation, with USER showing the highest conversion, as evidenced by fragment peaks between 300 and 400 bp (FIG. 7C).
Example 4: Comparison of USER-fragmenting enzyme libraries with alternative modification sites
The uracil-USER pair (i.e., uracil substitution in the mosaic end sequence, with USER as the enzyme that cleaves the mosaic end sequence after transposition) was selected for further characterization of the library preparation workflow. Transposomes with uracil at position 16, 17, 18 or 19 of the mosaic end (U16, U17, U18 and U19) were tested alongside the wild-type (WT) mosaic end (SEQ ID NO: 1). Electrophoretic analysis of the resulting libraries showed that the fragment size distribution varied with the modification site (FIG. 8A). Furthermore, Qubit quantification showed that library yield was highest with the U16 ME and decreased as the uracil modification moved closer to the 3' end (FIG. 8B). Because transposomes were normalized by FRET activity for library preparation, these differences may reflect variability in USER recognition of the alternative ME substrates. Importantly, wild-type transposomes did not yield libraries, indicating that cleavage of the mosaic end transferred strand is necessary to generate ligation-compatible library fragments, probably because of the gap-filling step with T4 DNA polymerase. In other words, adaptors are not ligated to the library fragments unless the mosaic end sequences are cleaved.
The USER enzyme mixture works by excising uracil bases, so for these modification sites, 0-3 bases of the mosaic end are expected to remain in the resulting library. The U16, U17 and U18 libraries were sequenced and assessed for evidence of this ME "scar" adjacent to the library insert. Because the UMI ligation adaptors used in this study contained variable 6-7 bp UMI sequences adjacent to the "T" overhang, the position of the scar within reads was expected to be offset by up to 1 bp. Sampling 100,000 sequences from each library type showed the expected sequence features for each ME modification site (FIG. 9).
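The expected scars can be enumerated with a short Python sketch. It assumes the widely used 19-bp Tn5 mosaic end AGATGTGTATAAGAGACAG, whose positions 16-19 (A, C, A, G) match the A16/C17/A18/G19 numbering used in this document; SEQ ID NO: 1 itself is not reproduced here. It further assumes that once USER removes the uracil, only the ME bases 3' of that position stay joined to the insert.

```python
# Assumption: the 19-bp Tn5 mosaic end, transferred strand written 5'->3'.
MOSAIC_END = "AGATGTGTATAAGAGACAG"

def modified_me(position, base="U"):
    """ME with the base at 1-based `position` replaced by `base` (e.g. U16)."""
    i = position - 1
    return MOSAIC_END[:i] + base + MOSAIC_END[i + 1:]

def user_scar(position):
    """ME bases expected to remain on the insert after USER removes the uracil at
    1-based `position`: the bases 3' of the uracil, which stay joined to the target."""
    return MOSAIC_END[position:]
```

Under these assumptions the sketch reproduces the expected 0-3 base range: U16 leaves CAG, U17 leaves AG, U18 leaves G, and U19 leaves no ME bases.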
Example 5: Fragmenting-enzyme bead-linked transposomes (fBLT)
Bead-linked transposomes are typically prepared by biotinylating the transposon DNA, which allows the resulting transposomes to bind to streptavidin beads. Initial efforts to immobilize U16 transposomes resulted in significantly lower BLT activity than expected (data not shown). Based on this finding, a hybrid transposon consisting of a U16 transferred strand and a wild-type non-transferred strand was used for pilot studies because of its improved performance on BLT. These fBLTs were loaded at a transposome density of 66 active units per μl (AU/μl) to achieve a library fragment distribution similar to that of the enrichment BLT (eBLT) used in Illumina DNA Prep with Enrichment.
Preliminary studies were performed to assess the feasibility of fBLT-based library preparation. The input DNA consisted of equimolar amounts of NA12877 and NA12878 human gDNA mixed with SspI-linearized PhiX DNA (approximately 15,000 genome copies each; equivalent to 50 ng of human gDNA).
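The relationship between input mass and genome copies is a standard conversion, sketched below in Python. It assumes an average of 650 g/mol per double-stranded base pair, a haploid human genome of about 3.1 × 10⁹ bp, and a PhiX genome of 5,386 bp; these constants are assumptions and are not stated in the source.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS = 650.0            # assumed average g/mol per double-stranded base pair

def genome_copies(mass_ng, genome_bp):
    """Double-stranded genome copies contained in mass_ng nanograms of DNA."""
    grams_per_copy = genome_bp * BP_MASS / AVOGADRO
    return mass_ng * 1e-9 / grams_per_copy

def mass_for_copies_ng(copies, genome_bp):
    """Nanograms of DNA holding `copies` double-stranded genome copies."""
    return copies * genome_bp * BP_MASS / AVOGADRO * 1e9
```

Under these assumptions, genome_copies(50, 3.1e9) is about 14,900 haploid genome equivalents, in line with the approximately 15,000 copies quoted above, and 15,000 PhiX copies weigh about 8.7 × 10⁻⁵ ng (under 0.1 pg).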
fBLT libraries were prepared using a streamlined workflow in which the USER cleavage, end repair and A-tailing steps were combined (FIG. 10A). For comparison, libraries were also prepared with eBLT according to the protocol in the Illumina DNA Prep with Enrichment Reference Guide (document 1000000048041). The resulting fBLT libraries had a median yield of about 300 ng and a fragment size distribution similar to that of the eBLT libraries (FIGS. 10B and 10C). The slightly larger fragment size of the fBLT libraries may be attributable to the 0.8X SPRI cleanup used after adaptor ligation.
After library preparation, the libraries were enriched according to the protocol in the Illumina RNA Prep with Enrichment Reference Guide (Illumina document no. 1000000124435), using a pooled panel consisting of the TruSight Cancer panel and a custom panel targeting the whole PhiX genome. The libraries were sequenced, and the FASTQ files from the fBLT samples were trimmed to remove UMI sequences. fBLT and eBLT were compared using DRAGEN Enrichment v3.7.5 without UMI processing to characterize performance on the TruSight Cancer panel. Duplicate samples of each library type, consisting of 10% NA12877 in an NA12878 background, were analyzed. The fBLT libraries performed similarly to the eBLT libraries but with a lower mean target coverage depth (Table 4). fBLT performance could potentially be improved by further workflow and BLT optimization.
In Table 4, data are reported as the average of two replicates. Samples consisted of 10% NA12877 gDNA spiked into an NA12878 gDNA background and were enriched using the TruSight Cancer panel.
This method provides an enzymatic approach to DNA sample fragmentation for NGS library preparation, in which the resulting fragments can be used for adaptor ligation. Benefits to the user include eliminating the need to purchase an expensive sonicator as capital equipment, and the ease and speed of a high-throughput enzymatic method for fragmenting sample nucleic acids. fBLT technology builds on the benefits of BLT technology and extends its compatibility to a variety of ligation-based approaches. The key innovation enabling this advance is the incorporation of modified bases into the mosaic end sequence, which allows site-specific cleavage of the transferred first transposon end while preserving recognition by Tn5. By decoupling the enzymatic fragmentation and adaptor tagging steps of library preparation, features such as forked adaptors, barcodes and UMIs can be added while compatibility with standard sequencing methods is maintained. Given these advantages, fBLT can be used in a variety of applications, such as UMI library preparation and PCR-free library preparation.
Example 6: Optimization of conditions for the fBLT method
Multiple fBLTs containing different modified transposon ends were examined. Relative to SEQ ID NO: 1, the modified transposon ends comprise substitutions at position A16, C17, A18 or G19 of the mosaic end sequence.
Bead-linked transposomes (fBLT) carrying modified mosaic ends were incubated with 10-50 ng of human sample DNA following available protocols for BLT fragment tagging, such as those described for the Illumina DNA Prep with Enrichment kit (see the Illumina DNA Prep with Enrichment Reference Guide, document 1000000048041). The DNA library was then treated with an appropriate DNA endonuclease to cleave the mosaic ends. The samples were then taken through a ligation-based library preparation workflow, in which the fragmented sample DNA was treated with Illumina end repair, A-tailing and ligation reagents for adaptor ligation.
Library conversion results with the different fBLTs are shown in FIG. 12. These experiments directly assessed BLT activity without downstream sequencing. Inosine, oxoguanine and uracil substitutions at position A16 all resulted in lower BLT activity than the same substitutions at positions C17, A18 and G19. Thus, although modification at position A16 of SEQ ID NO: 1 is well tolerated in soluble transposomes, modifications at the other positions give higher BLT activity. Transposon ends modified at position A16 may therefore be more valuable for methods using soluble transposomes.
A number of different fBLTs with inosine, oxoguanine or uracil substitutions at positions C17, A18 and G19 were then evaluated, as shown in FIGS. 13A to 13C. The data show that inosine modification gave the best performance, as measured by library conversion efficiency and variant calling metrics. In particular, the G19I (I19) modification performed well and can be used with a biotinylated non-transferred strand to allow immobilization as fBLT (as outlined in FIG. 11).
Although G19I showed excellent performance, G19U (U19) produced a relatively high number of chimeric reads (reads whose portions map to different chromosomes) compared with the A18I (I18) and C17O (O17) modifications (FIG. 14). These chimeric reads may arise from a variety of factors, such as differences in endonuclease performance (e.g., uracil cleavage by USER reagent may not be as robust as cleavage of other modified nucleotides by their respective endonucleases). Chimeric reads are undesirable sequencing artifacts, so for certain methods it may be preferable for the user to avoid uracil modifications (and the subsequent cleavage of mosaic end sequences with USER reagent) to reduce the risk of chimeric reads.
Taken together, these data indicate that A18 and G19 modifications, such as G19I and A18I, can give high activity with fBLT.
Example 7: Comparison of fBLT with other fragmentation methods
Using the workflow shown in FIG. 15A, fragmentation with A18I (I18) fBLT was compared with fragmentation using NEBNext dsDNA Fragmentase and with fragmentation by sonication.
Samples were sheared by sonication using a Covaris sonicator (model LE220) according to the manufacturer's recommended protocol for 175 bp fragments (see, e.g., Quick Guide to DNA Shearing with LE, Covaris, May 2020).
Sample fragmentation with NEBNext dsDNA Fragmentase was performed according to the manufacturer's protocol and reagents. A time-course study with a variety of samples of interest was performed to determine optimal incubation conditions in advance. Incubation for 10 minutes at room temperature (about 20 °C) was the best single condition across all samples of interest.
Fragmentation with fBLT was performed as described in example 6.
All fragments were end-repaired and A-tailed in the same manner using Illumina reagents. The same pool of UMI-containing forked adaptors was ligated to the fragments prepared by each method. This highlights an advantage of the fBLT method of the present invention: it can use existing adaptors developed for other types of ligation-based library preparation.
After PCR amplification, the resulting libraries were enriched with the TruSight Cancer panel according to the protocol in the Illumina RNA Prep with Enrichment Reference Guide (Illumina document no. 1000000124435).
The evaluation sample was 50 ng of input genomic DNA (gDNA) consisting of a 1% mixture of NA12877 in an NA12878 background. This mixture contains 84 expected heterozygous variants, each at a variant allele frequency (VAF) of 0.5%. As shown in FIG. 15B, the NEBNext dsDNA Fragmentase and fBLT methods showed higher sensitivity and specificity than the sonication protocol.
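The 0.5% figure follows from simple mixing arithmetic: a variant that is heterozygous in NA12877 and absent from NA12878 is carried by half of the NA12877 alleles, which make up 1% of the input. A minimal Python sketch, assuming equal DNA content per genome in both samples and no variant allele in the background; the function names are illustrative.

```python
def expected_vaf(minor_fraction, allele_fraction_in_minor=0.5):
    """Expected VAF of a variant carried only by the minor sample in a two-sample mix.

    allele_fraction_in_minor: 0.5 for heterozygous, 1.0 for homozygous variants.
    Assumes equal DNA per genome in both samples and no variant allele in the background.
    """
    return minor_fraction * allele_fraction_in_minor

def sensitivity(called_true_variants, expected_variants):
    """Fraction of the expected variants that were detected."""
    return called_true_variants / expected_variants
```

expected_vaf(0.01) gives 0.005 (0.5%), and the same formula gives 0.05 for heterozygous variants in the 10% mixtures described earlier. Sensitivity here is simply the fraction of the 84 expected variants that are called.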
Error rates with the different fragmentation methods were also evaluated. As shown in FIG. 16, significantly higher duplex error rates and single-strand forward error rates were observed for the sonicated samples than for samples prepared with fBLT or NEBNext dsDNA Fragmentase. Such an increase in error rate generally indicates more noise in the data, i.e., more variability in the sequenced library fragments.
The G>T error rate of samples prepared using fBLT was 1.4 × 10⁻⁵, whereas that of samples prepared by sonication was 70 × 10⁻⁵. These data indicate an approximately 50-fold reduction in false-positive G>T transversions for the fBLT method compared with sonication. This improvement may arise because the fBLT method avoids the oxidative damage to guanine that sonication can induce.
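The approximately 50-fold figure is the ratio of the two rates. The Python lines below also convert each rate into an expected number of false G>T calls, under the assumption (not stated in the source) that the rates are expressed per sequenced G base; the function names are illustrative.

```python
SONICATION_G_TO_T = 70e-5
FBLT_G_TO_T = 1.4e-5

def fold_reduction(rate_reference, rate_test):
    """How many times lower rate_test is than rate_reference."""
    return rate_reference / rate_test

def expected_false_calls(rate, n_g_bases):
    """Expected erroneous G>T calls among n_g_bases, assuming a per-G-base rate."""
    return rate * n_g_bases
```

fold_reduction(SONICATION_G_TO_T, FBLT_G_TO_T) is approximately 50, and per million G bases the assumed rates correspond to roughly 700 false G>T calls after sonication versus roughly 14 with fBLT.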
FIG. 17 shows that, across a series of samples, the fBLT method outperformed the enzymatic NEBNext dsDNA Fragmentase method and gave library conversion efficiencies similar to those of the sonication protocol.
fBLT is therefore a library preparation method with improved sensitivity and specificity and a reduced error rate compared with other fragmentation-based library preparation methods currently in use. Sample 1 in FIG. 17 is a genomic DNA sample, whereas samples 2-6 are formalin-fixed paraffin-embedded (FFPE) samples. As is well known in the art, the higher conversion efficiency of sample 1 reflects the higher quality of DNA in genomic DNA samples compared with FFPE samples. The increasing dCq values for samples 2-6 indicate that sample quality declined with increasing sample number, while sample 1 (genomic DNA) had the highest quality.
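If dCq is read as a qPCR quality metric (the Cq of a sample minus that of a reference, an interpretation assumed here rather than stated in the source), each additional cycle corresponds to roughly half as much amplifiable template at ideal efficiency. A minimal Python sketch of that standard relationship, with an illustrative function name:

```python
def relative_amplifiable_dna(delta_cq, efficiency=1.0):
    """Amplifiable template relative to the reference, from a qPCR delta Cq.

    With amplification efficiency E (1.0 = perfect doubling), product grows by
    (1 + E) per cycle, so a sample that needs delta_cq extra cycles holds
    (1 + E) ** -delta_cq as much amplifiable template as the reference.
    """
    return (1.0 + efficiency) ** (-delta_cq)
```

At ideal efficiency, a dCq of 1 corresponds to half as much amplifiable DNA and a dCq of 3 to one eighth, which is why higher dCq values indicate poorer sample quality.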
FIG. 19 summarizes some of the advantages of the fBLT method, including that a user can select from a variety of adaptors for ligation to fragments prepared with fBLT, depending on the preferred downstream workflow. For example, a user wanting a streamlined workflow can use indexed forked adaptors, which avoid the need to incorporate index sequences by downstream PCR. Similarly, a user wanting to call variants with high sensitivity can use adaptors comprising UMIs (UMI adaptors), so that amplicons of the same original fragment can be identified from the sequencing results after PCR amplification. fBLT can thus combine the advantages of fragment tagging for library preparation with the flexibility of ligation-based methods, in which a variety of adaptors can be incorporated into library fragments.
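How UMI adaptors let PCR duplicates be recognized can be sketched in Python: reads sharing a mapping position, strand and UMI are grouped as copies of one original fragment, and a per-position majority vote suppresses errors introduced in individual copies. This is a simplified illustration with hypothetical field names; it ignores sequencing errors within the UMI itself, which production tools handle.

```python
from collections import Counter, defaultdict

def group_by_umi(reads):
    """Group reads into UMI families keyed by (chrom, start, strand, umi).

    Each read is a dict with those keys plus 'seq'; reads sharing all four keys
    are treated as PCR copies of one original library fragment.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"], read["umi"])
        families[key].append(read["seq"])
    return dict(families)

def consensus(seqs):
    """Per-position majority base across equal-length reads (ties: first seen)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))
```

For example, three reads with the same position and UMI reading "ACGT", "ACTT" and "ACGT" collapse to one molecule with consensus "ACGT", so the single T in the middle read is treated as a PCR or sequencing error rather than a variant.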
Example 8 fBLT for use with formalin-fixed paraffin-embedded samples
A protocol was developed for preparing library fragments from formalin-fixed paraffin-embedded (FFPE) samples using fBLT. FFPE samples may contain critical information, such as profiles from tumor samples, but FFPE material is typically highly fragmented, which can interfere with standard library preparation protocols.
As shown in FIG. 18A, DNA in FFPE tissue is typically partially fragmented. A standard fragment tagging protocol requires 2 fragment tagging events per library fragment (i.e., one fragment tagging event at each end of the fragment). However, because the starting DNA in FFPE samples is already partially fragmented, requiring 2 fragment tagging events on DNA from FFPE tissue can result in a high proportion of very small fragments that are undesirable for sequencing.
In contrast, with fBLT, fragments produced by a single fragment tagging event (i.e., fragments in which only one end was tagged) can be rescued by adaptor ligation. After cleavage of the mosaic ends, both ends of the fragment can be repaired and ligated to adaptors. Thus, library fragments from FFPE tissue can be generated by a single fragment tagging event followed by ligation at both ends of the fragment, which rescues the fragment.
As shown in FIG. 18B, fragments prepared from FFPE DNA by a single fragment tagging event can be rescued with fBLT. Thus, the fBLT workflow may improve library preparation from FFPE tissue and from other samples that may contain partially fragmented DNA.
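The difference between the two schemes can be expressed as a simple rule. The Python sketch below is a deliberately simplified, hypothetical model. Each fragment end is labeled either as created by a fragment tagging event or as a pre-existing break, as in partially fragmented FFPE DNA. The model assumes that a standard protocol needs a tagging event at both ends, while fBLT needs only one, because both ends then receive adaptors by ligation. Under those assumptions it counts how many fragments each scheme would retain.

```python
from enum import Enum
from typing import Iterable, Tuple


class EndType(Enum):
    TAGGED = "tagged"        # end created by a fragment tagging event
    NATIVE_BREAK = "native"  # pre-existing break, e.g. from FFPE processing


Fragment = Tuple[EndType, EndType]


def usable_standard(fragment: Fragment) -> bool:
    """Standard fragment tagging: adaptor sequence must come from tagging at both ends."""
    return all(end is EndType.TAGGED for end in fragment)


def usable_fblt(fragment: Fragment) -> bool:
    """fBLT (simplified): one tagging event suffices; adaptors are ligated to both ends."""
    return any(end is EndType.TAGGED for end in fragment)


def count_usable(fragments: Iterable[Fragment]) -> Tuple[int, int]:
    """Return (retained by standard protocol, retained by fBLT)."""
    items = list(fragments)
    return sum(map(usable_standard, items)), sum(map(usable_fblt, items))


T, N = EndType.TAGGED, EndType.NATIVE_BREAK
fragments = [(T, T), (T, N), (N, T), (T, N)]
print(count_usable(fragments))  # (1, 4)
```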
Example 9 Workflow of fBLT library preparation and optional enrichment
Based on optimization experiments, a preliminary workflow of fBLT library preparation followed by enrichment was developed. This workflow is shown in FIG. 20. In general, after BLT fragment tagging, the fragment-tagged products are cleaned up and the mosaic ends (ME) are then cleaved. After end repair and A-tailing, adaptors are ligated (the user may select the adaptors, such as adaptors comprising UMIs). The user can then perform solid phase reversible immobilization (SPRI) bead purification, followed, if desired, by index PCR and another SPRI bead purification. This workflow may take about 5.5 hours, which is similar to other ligation-based library preparation schemes.
If the user wishes to enrich the library, this can be done, for example, by hybridization followed by capture. Such methods may take about 5 hours.
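The step order and approximate times in Example 9 can be summarized as a small data structure. The Python sketch below is only an illustrative encoding of the text, and all names in it are hypothetical. The text gives no per-step durations, so none are modeled. The totals simply add the approximately 5.5-hour library preparation and the approximately 5-hour optional enrichment.

```python
from typing import List, Tuple

# Library preparation steps in the order described in Example 9.
LIBRARY_PREP_STEPS = (
    "BLT fragment tagging",
    "cleanup of fragment-tagged products",
    "mosaic end (ME) cleavage",
    "end repair and A-tailing",
    "adaptor ligation (user-selected, e.g. UMI adaptors)",
    "SPRI bead purification",
    "index PCR (if desired)",
    "SPRI bead purification (if desired)",
)
# Optional enrichment, e.g. hybridization followed by capture.
ENRICHMENT_STEPS = ("hybridization", "capture")

LIBRARY_PREP_HOURS = 5.5  # approximate, per the text
ENRICHMENT_HOURS = 5.0    # approximate, per the text


def plan(enrich: bool) -> Tuple[List[str], float]:
    """Return the ordered steps and the approximate total time in hours."""
    steps = list(LIBRARY_PREP_STEPS)
    hours = LIBRARY_PREP_HOURS
    if enrich:
        steps.extend(ENRICHMENT_STEPS)
        hours += ENRICHMENT_HOURS
    return steps, hours


steps, hours = plan(enrich=True)
print(len(steps), hours)  # 10 10.5
```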
Equivalents
The above written description is considered to be sufficient to enable one skilled in the art to practice the embodiments. The foregoing description and examples detail certain embodiments and describe the best mode contemplated by the inventors. It should be understood, however, that no matter how detailed the foregoing may appear in text, the embodiments may be practiced in many ways and should be construed in accordance with the appended claims and any equivalents thereof.
As used herein, the term "about" refers to a numerical value, including, for example, whole numbers, fractions, and percentages, whether or not explicitly indicated. The term "about" generally refers to a range of numerical values (e.g., +/-5-10% of the recited range) that one of ordinary skill in the art would consider equivalent to the recited value (e.g., having the same function or result). When terms such as "at least" and "about" precede a list of numerical values or ranges, the terms modify all of the values or ranges provided in the list. In some cases, the term "about" may include numerical values rounded to the nearest significant figure.
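As a purely illustrative reading of the +/-5-10% example above, and not a legal construction of the term, the Python sketch below checks whether a value falls within a chosen fractional tolerance of a recited value. The function name and the default 10% tolerance are assumptions.

```python
def within_about(value: float, recited: float, tolerance: float = 0.10) -> bool:
    """Return True if value lies within +/- tolerance (as a fraction) of recited."""
    if not 0.05 <= tolerance <= 0.10:
        raise ValueError("tolerance should be between 0.05 and 0.10 (5-10%)")
    return abs(value - recited) <= tolerance * abs(recited)


# "about 5.5 hours", as in Example 9
print(within_about(5.2, 5.5))        # True: 0.3 h is within 10% (0.55 h)
print(within_about(5.2, 5.5, 0.05))  # False: 0.3 h exceeds 5% (0.275 h)
```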
Sequence listing
<110> ILLUMINA, INC.
<120> improved library preparation method
<130> 01243-0027-00PCT
<150> US 63/167,150
<151> 2021-03-29
<150> US 63/224,201
<151> 2021-07-21
<160> 22
<170> PatentIn version 3.5
<210> 1
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> Mosaic End (ME) sequence
<400> 1
agatgtgtat aagagacag 19
<210> 2
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> outer end (OE) sequence
<400> 2
ctgactctta tacacaagt 19
<210> 3
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> inner end (IE) sequence
<400> 3
ctgtctcttg atcagatct 19
<210> 4
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> mosaic end (ME') (non-transferred strand)
<400> 4
ctgtctctta tacacatct 19
<210> 5
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> modified ME with A16U substitution (transferred strand)
<400> 5
agatgtgtat aagagucag 19
<210> 6
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> modified ME' (non-transferred strand)
<400> 6
ctgactctta tacacatct 19
<210> 7
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> modified ME with C17U substitution (transferred strand)
<400> 7
agatgtgtat aagagauag 19
<210> 8
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> modified ME' (non-transferred strand)
<400> 8
ctatctctta tacacatct 19
<210> 9
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> modified ME with A18U substitution (transferred strand)
<400> 9
agatgtgtat aagagacug 19
<210> 10
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> modified ME' (non-transferred strand) with T18A substitution
<400> 10
cagtctctta tacacatct 19
<210> 11
<211> 14
<212> DNA
<213> artificial sequence
<220>
<223> A14 primer sequence
<400> 11
tcgtcggcag cgtc 14
<210> 12
<211> 15
<212> DNA
<213> artificial sequence
<220>
<223> B15 primer sequence
<400> 12
gtctcgtggg ctcgg 15
<210> 13
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> biotinylated ME
<220>
<221> modified base
<222> (1)..(1)
<223> 5' phosphate
<220>
<221> modified base
<222> (19)..(19)
<223> 3' biotin
<400> 13
ctgtctctta tacacatct 19
<210> 14
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> I19 TS, modified ME with G19I substitution
<220>
<221> misc_feature
<222> (19)..(19)
<223> n is inosine
<400> 14
agatgtgtat aagagacan 19
<210> 15
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> U19 TS, modified ME with G19U substitution
<400> 15
agatgtgtat aagagacau 19
<210> 16
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> O16 TS, modified ME with A16O substitution
<220>
<221> modified base
<222> (16)..(16)
<223> 8-oxo-guanine
<400> 16
agatgtgtat aagaggcag 19
<210> 17
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> O17 TS, modified ME with C17O substitution
<220>
<221> modified base
<222> (17)..(17)
<223> 8-oxo-guanine
<400> 17
agatgtgtat aagagagag 19
<210> 18
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> O18 TS, modified ME with A18O substitution
<220>
<221> modified base
<222> (18)..(18)
<223> 8-oxo-guanine
<400> 18
agatgtgtat aagagacgg 19
<210> 19
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> O19 TS, modified ME with G19O substitution
<220>
<221> modified base
<222> (19)..(19)
<223> 8-oxo-guanine
<400> 19
agatgtgtat aagagacag 19
<210> 20
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> I16 TS, modified ME with A16I substitution
<220>
<221> misc_feature
<222> (16)..(16)
<223> n is inosine
<400> 20
agatgtgtat aagagncag 19
<210> 21
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> I17 TS, modified ME with C17I substitution
<220>
<221> misc_feature
<222> (17)..(17)
<223> n is inosine
<400> 21
agatgtgtat aagaganag 19
<210> 22
<211> 19
<212> DNA
<213> artificial sequence
<220>
<223> I18 TS, modified ME with A18I substitution
<220>
<221> misc_feature
<222> (18)..(18)
<223> n is inosine
<400> 22
agatgtgtat aagagacng 19
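The relationships among the listed sequences can be checked programmatically. The Python sketch below uses hypothetical helper names. It reverse-complements a transferred-strand sequence, pairing uracil with adenine, and it lists the positions at which a modified ME differs from SEQ ID NO: 1. Run on the listing, it shows that SEQ ID NOs: 6, 8 and 10 are each fully complementary to the corresponding uracil-modified ME (SEQ ID NOs: 5, 7 and 9). The sketch handles only a, c, g, t and u. Inosine (recorded as n) and 8-oxoguanine (recorded as g) are outside its scope.

```python
from typing import Dict, List

# Watson-Crick partners; uracil in a transferred strand pairs with adenine.
PARTNER: Dict[str, str] = {"a": "t", "t": "a", "g": "c", "c": "g", "u": "a"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(PARTNER[base] for base in reversed(seq.lower()))


def substitutions(reference: str, variant: str) -> List[str]:
    """List differences as e.g. 'A16U' (1-based positions)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        f"{ref.upper()}{pos}{var.upper()}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Sequences copied from the listing (spaces removed).
SEQ = {
    1: "agatgtgtataagagacag",   # ME
    4: "ctgtctcttatacacatct",   # ME'
    5: "agatgtgtataagagucag",   # ME, A16U
    6: "ctgactcttatacacatct",   # modified ME'
    7: "agatgtgtataagagauag",   # ME, C17U
    8: "ctatctcttatacacatct",   # modified ME'
    9: "agatgtgtataagagacug",   # ME, A18U
    10: "cagtctcttatacacatct",  # modified ME'
}

assert reverse_complement(SEQ[1]) == SEQ[4]
for transferred, non_transferred in ((5, 6), (7, 8), (9, 10)):
    print(transferred, substitutions(SEQ[1], SEQ[transferred]),
          reverse_complement(SEQ[transferred]) == SEQ[non_transferred])
```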

Claims (113)

1. A modified transposon end sequence comprising a mosaic end sequence, wherein the mosaic end sequence comprises one or more mutations compared to a wild type mosaic end sequence, wherein the mutations comprise substitutions using:
a. uracil;
b. inosine;
c. ribose;
d. 8-oxoguanine;
e. thymine glycol;
f. a modified purine; or
g. a modified pyrimidine.
2. The modified transposon end sequence of claim 1, wherein the wild type mosaic end sequence comprises SEQ ID No. 1, and further wherein the one or more mutations comprise substitutions at A16, C17, A18, and/or G19.
3. The modified transposon end sequence of claim 1 or 2, wherein the mosaic end sequence comprises no more than 8 mutations compared to the wild type sequence.
4. The modified transposon end sequence of claim 2, wherein the mosaic end sequence comprises one or more mutations in addition to the one or more mutations at A16, C17, A18, and/or G19 compared to SEQ ID No. 1.
5. The modified transposon end sequence of claim 2, wherein the mosaic end sequence comprises one to four substitution mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18, and/or G19.
6. The modified transposon end of claim 2, wherein the mosaic end sequence has one substitution mutation compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18, and/or G19.
7. The modified transposon end of claim 2, wherein the mosaic end sequence has two substitution mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18, and/or G19.
8. The modified transposon end of claim 2, wherein the mosaic end sequence has three substitution mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18, and/or G19.
9. The modified transposon end of claim 2, wherein the mosaic end sequence has four substitution mutations compared to SEQ ID No. 1 in addition to the one or more mutations at A16, C17, A18, and/or G19.
10. The modified transposon end sequence of any one of claims 2 to 9, wherein:
the substitution at A16 is A16T, A16C, A16G, A16U, A16 inosine, A16 ribose, A16 8-oxoguanine, A16 thymine glycol, A16 modified purine, or A16 modified pyrimidine;
the substitution at C17 is C17T, C17A, C17G, C17U, C17 inosine, C17 ribose, C17 8-oxoguanine, C17 thymine glycol, C17 modified purine, or C17 modified pyrimidine;
the substitution at A18 is A18G, A18T, A18C, A18U, A18 inosine, A18 ribose, A18 8-oxoguanine, A18 thymine glycol, A18 modified purine, or A18 modified pyrimidine; and/or
the substitution at G19 is G19T, G19C, G19A, G19U, G19 inosine, G19 ribose, G19 8-oxoguanine, G19 thymine glycol, G19 modified purine, or G19 modified pyrimidine.
11. The modified transposon end sequence of any one of claims 2 to 9, wherein the mutation comprises a substitution using:
a. uracil;
b. inosine;
c. ribose;
d. 8-oxoguanine;
e. thymine glycol;
f. a modified purine; and/or
g. a modified pyrimidine.
12. The modified transposon end sequence of any one of claims 2 to 11, wherein the modified transposon end sequence comprises a mutation at A16, C17, A18, or G19.
13. The modified transposon end sequence of any one of claims 2 to 11, wherein the modified transposon end sequence comprises two mutations selected from the group consisting of mutations at A16, C17, A18, and G19.
14. The modified transposon end sequence of any one of claims 2 to 11, wherein the modified transposon end sequence comprises three mutations selected from the group consisting of mutations at A16, C17, A18, and G19.
15. The modified transposon end sequence of any one of claims 2 to 11, wherein the modified transposon end sequence comprises four mutations at A16, C17, A18, and G19.
16. The modified transposon end of any one of claims 2 to 11, wherein the modified transposon end sequence has one to four substitution mutations at A16, C17, A18, and/or G19 compared to SEQ ID No. 1.
17. The modified transposon end of any one of claims 1 to 11, wherein the modified transposon end sequence has one substitution mutation compared to the wild type sequence.
18. The modified transposon end of any one of claims 1-11, wherein the modified transposon end sequence has two substitution mutations compared to the wild type sequence.
19. The modified transposon end of any one of claims 1-11, wherein the modified transposon end sequence has three substitution mutations compared to the wild type sequence.
20. The modified transposon end of any one of claims 1-11, wherein the modified transposon end sequence has four substitution mutations compared to the wild type sequence.
21. The modified transposon end of any one of claims 1 to 20, wherein the modified purine is 3-methyladenine or 7-methylguanine.
22. The modified transposon end of any one of claims 1-20, wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
23. A transposome complex, the transposome complex comprising:
a. a transposase;
b. a first transposon comprising a modified transposon end sequence comprising uracil, inosine, ribose, 8-oxoguanine, thymine glycol, a modified purine, and/or a modified pyrimidine; and
c. a second transposon comprising a second transposon end sequence complementary to at least a portion of the first transposon end sequence.
24. The transposome complex of claim 23 wherein the first transposon comprises ribose, uracil, inosine, 8-oxoguanine, thymine diol, a modified purine, and/or a modified pyrimidine, and the transposome complex is in solution.
25. The transposome complex of claim 23 wherein the first transposon comprises uracil, inosine, 8-oxoguanine, thymine diol, a modified purine, and/or a modified pyrimidine, and the transposome complex is immobilized on a solid support.
26. A transposome complex as defined in any one of claims 23 to 25 wherein the first transposon comprises the modified transposon end sequence of any one of claims 1-22.
27. The transposome complex of any one of claims 23-26 wherein the transposase is Tn5.
28. The transposome complex of any one of claims 23-27 wherein the first transposon is a transfer strand.
29. The transposome complex of any one of claims 23-28 wherein the second transposon is a non-transferred strand.
30. The transposome complex of any one of claims 23-29 wherein uracil in the first transposon base pairs with A in the second transposon.
31. The transposome complex of any one of claims 23-30 wherein inosine in the first transposon base pairs with C in the second transposon.
32. A transposome complex as claimed in any one of claims 23 to 31 wherein ribose in the first transposon base pairs with A, C, T or G in the second transposon.
33. The transposome complex of any one of claims 23-32 wherein thymine diol in the first transposon base pairs with A in the second transposon.
34. The transposome complex of any one of claims 23-33 wherein the modified purine in the first transposon is 3-methyladenine base paired with T in the second transposon.
35. The transposome complex of any one of claims 23-34 wherein the modified purine in the first transposon is 7-methylguanine base paired with C in the second transposon.
36. A transposome complex as defined in any one of claims 23 to 34 wherein the modified pyrimidine in the first transposon is a 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine base paired with a G base in the second transposon.
37. The transposome complex of any one of claims 23-36, wherein the first transposon or the second transposon comprises an affinity element.
38. The transposome complex of claim 37, wherein the first transposon comprises an affinity element.
39. The transposome complex of claim 38, wherein the affinity element is attached to the 5' end of the first transposon.
40. The transposome complex of claim 38 or 39, wherein the first transposon included in the targeted transposome complex comprises a linker.
41. The transposome complex of claim 40 wherein the linker has a first end attached to the 5' end of the first transposon and a second end attached to an affinity element.
42. The transposome complex of claim 37, wherein the second transposon comprises an affinity element.
43. The transposome complex of claim 42 wherein the affinity element is attached to the 3' end of the second transposon.
44. The transposome complex of claim 43 wherein the second transposon comprises SEQ ID No. 13.
45. The transposome complex of claim 44 wherein the second transposon comprises a linker.
46. The transposome complex of claim 45 wherein the linker has a first end attached to the 3' end of the second transposon and a second end attached to an affinity element.
47. The transposome complex of any one of claims 37-46, wherein the affinity element comprises biotin, avidin, streptavidin, an antibody, or an oligonucleotide.
48. The transposome complex of any one of claims 23-47, wherein the second transposon comprises:
a. a second transposon end sequence complementary to SEQ ID No. 1; or
b. a second transposon end that is fully complementary to the first transposon end.
49. The transposome complex of claim 48 wherein the first transposon comprises a modified transposon end sequence comprising an A16U, A16 8-oxoguanine, or A16 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
50. The transposome complex of claim 48 wherein the first transposon comprises a modified transposon end sequence comprising a C17 8-oxoguanine or C17 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
51. The transposome complex of claim 48 wherein the first transposon comprises a modified transposon end sequence comprising an A18 8-oxoguanine or A18 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
52. The transposome complex of claim 48 wherein the first transposon comprises a modified transposon end sequence comprising a G19 8-oxoguanine or G19 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
53. The transposome complex as defined in any one of claims 23-52, wherein the transposome complex is in solution.
54. A solid support having immobilized thereon a transposome complex according to any one of claims 23-52.
55. A method of fragmenting double-stranded nucleic acids, the method comprising combining a sample comprising double-stranded nucleic acids with a transposome complex as defined in any one of claims 23 to 53 or a solid support as defined in claim 54, and preparing fragments.
56. A method of preparing a double stranded nucleic acid fragment lacking all or a portion of the first transposon end, the method comprising:
a. combining a sample comprising nucleic acid with a transposome complex of any one of claims 23-53 or with a solid support of claim 54, and preparing a fragment; and
b. combining the sample with (1) an endonuclease or (2) a DNA glycosylase and thermal, alkaline conditions, or a combination of endonucleases/lyases that recognize abasic sites, and cleaving the first transposon end at uracil, inosine, ribose, 8-oxoguanine, thymine glycol, a modified purine, and/or a modified pyrimidine within the mosaic end sequence to remove all or a portion of the first transposon end from the fragment.
57. The method of claim 56, wherein the modified purine is 3-methyladenine or 7-methylguanine.
58. A method according to claim 56, wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
59. The method of claim 57 or 58, further comprising sequencing the fragment after removing all or a portion of the first transposon end from the fragment.
60. The method of claim 59, wherein the method does not require amplification of the fragment prior to sequencing.
61. The method of claim 59, wherein the fragments are amplified prior to sequencing.
62. The method of any one of claims 59 to 61, further comprising enriching for fragments of interest after ligating adaptors and prior to sequencing.
63. A method of preparing a double-stranded nucleic acid fragment comprising an adaptor, the method comprising:
a. combining a sample comprising nucleic acid with a transposome complex of any one of claims 23-53 or with a solid support of claim 54, and preparing a fragment;
b. combining the sample with (1) an endonuclease or (2) a DNA glycosylase and thermal, alkaline conditions, or a combination of endonucleases/lyases that recognize abasic sites, and cleaving the first transposon end at uracil, inosine, ribose, 8-oxoguanine, thymine diol, a modified purine, and/or a modified pyrimidine within the mosaic end sequence to remove all or a portion of the first transposon end from the fragment; and
c. ligating adaptors to the 5' and/or 3' ends of the fragments.
64. The method according to claim 63, wherein the modified purine is 3-methyladenine or 7-methylguanine.
65. The method of claim 63, wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
66. The method of any one of claims 56-65, wherein the nucleic acid is double stranded DNA.
67. The method of any one of claims 56-65, wherein the nucleic acid is RNA and a double-stranded cDNA or DNA:RNA duplex is generated prior to combining with the transposome complex.
68. The method of any one of claims 56 to 67, wherein all or a portion of the first transposon end that is cleaved is separated from the remainder of the sample.
69. The method of any one of claims 63 to 68, further comprising filling in the 3' end of the fragment and phosphorylating the 3' end of the fragment with a kinase prior to ligation.
70. The method of claim 69, wherein the filling is with T4 DNA polymerase.
71. The method of claim 70, further comprising adding a single A overhang to the 3' end of the fragment.
72. The method of claim 71, wherein a polymerase adds the single A overhang.
73. The method of claim 72, wherein the polymerase is (i) Taq or (ii) a Klenow fragment lacking exonuclease activity.
74. The method of any one of claims 56-73, wherein the fragment comprises 0-3 bases of the mosaic terminal sequence.
75. The method of any one of claims 56-74, wherein preparing a fragment results in preparing at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the number of fragments prepared with a transposome complex comprising a first transposon whose transposon end sequence comprises a wild type mosaic end sequence comprising SEQ ID No. 1.
76. The method of any one of claims 63-75, further comprising sequencing the fragment after ligating the adaptors.
77. The method of claim 76, wherein the method does not require amplification of the fragment prior to sequencing.
78. The method of claim 77, wherein the fragments are amplified prior to sequencing.
79. The method of any one of claims 76 to 78, further comprising enriching for fragments of interest after ligating the adaptors and prior to sequencing.
80. The method of any one of claims 56 to 79, wherein the modified transposon end sequence comprises uracil and the combination of DNA glycosylase and endonuclease/lyase recognizing abasic sites is Uracil Specific Excision Reagent (USER).
81. The method of claim 80, wherein the USER is a mixture of uracil DNA glycosylase and endonuclease VIII or endonuclease III.
82. The method of any one of claims 56 to 79, wherein the modified transposon end sequence comprises inosine and the endonuclease is endonuclease V.
83. The method of any one of claims 56 to 79, wherein the modified transposon end sequence comprises ribose and the endonuclease is RNase HII.
84. The method of any one of claims 56-79, wherein the modified transposon end sequence comprises 8-oxoguanine and the endonuclease is formamidopyrimidine DNA glycosylase (FPG) or 8-oxoguanine DNA glycosylase (OGG).
85. The method of any one of claims 56 to 79, wherein the modified transposon end sequence comprises thymine diol and the DNA glycosylase is Endonuclease III (Nth) or Endonuclease VIII.
86. The method of any one of claims 56-79, wherein the modified transposon end sequence comprises a modified purine and the DNA glycosylase is a human 3-alkyladenine DNA glycosylase and the endonuclease is endonuclease III or endonuclease VIII.
87. The method of claim 86, wherein the modified purine is 3-methyladenine or 7-methylguanine.
88. The method of any one of claims 56 to 79, wherein the modified transposon end sequence comprises a modified pyrimidine and:
a. the DNA glycosylase is thymine-DNA glycosylase (TDG) or the mammalian DNA glycosylase methyl-CpG-binding domain protein 4 (MBD4), and the endonuclease/lyase recognizing abasic sites is endonuclease III or endonuclease VIII; or
b. the endonuclease is the DNA glycosylase/lyase ROS1.
89. The method of claim 88, wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
90. The method of any one of claims 56-89, wherein the first transposon comprises a modified transposon end sequence comprising more than one mutation selected from uracil, inosine, ribose, 8-oxoguanine, thymine diol, a modified purine, or a modified pyrimidine, and the combination of (1) an endonuclease or (2) a DNA glycosylase and heat, alkaline conditions, or an endonuclease/lyase that recognizes abasic sites is an enzyme mixture.
91. The method of claim 90, wherein the modified purine is 3-methyladenine or 7-methylguanine.
92. The method of claim 90, wherein the modified pyrimidine is 5-methylcytosine, 5-formylcytosine, or 5-carboxycytosine.
93. The method of any one of claims 63-92, wherein cleaving the first transposon end generates a cohesive end for ligating the adaptors.
94. The method of claim 93, wherein the sticky end is longer than one base.
95. The method of any one of claims 63-94, wherein the adaptors comprise double-stranded adaptors.
96. The method of any one of claims 63-95, wherein adaptors are added to the 5' and 3' ends of the fragments.
97. The method of claim 96, wherein the adaptors added to the 5' and 3' ends of the fragments are different.
98. The method of any one of claims 63-97, wherein the adapter comprises a Unique Molecular Identifier (UMI), a primer sequence, an anchor sequence, a universal sequence, a spacer, an index sequence, a capture sequence, a barcode sequence, a cleavage sequence, a sequencing-related sequence, or a combination thereof.
99. The method of claim 98, wherein the adapter comprises a UMI.
100. The method of claim 99, wherein adaptors comprising UMIs are ligated to the 3' and 5' ends of the fragments.
101. The method of any one of claims 63-100, wherein the adapter is a fork adapter.
102. The method of any one of claims 63-101, wherein the ligating is performed with a DNA ligase.
103. The method of any one of claims 63 to 102, wherein the method is carried out in a single reaction vessel.
104. The method of any one of claims 56 to 103, wherein the density of transposomes immobilized on the solid surface is selected to modulate fragment size and library yield of the immobilized fragments.
105. The method of any one of claims 56 to 104, wherein the method allows for bead-based normalization.
106. The method of any one of claims 56-105, wherein the sample comprises partially fragmented DNA.
107. The method of any one of claims 56 to 106, wherein the sample is formalin-fixed paraffin-embedded tissue or cell free DNA.
108. The method of any one of claims 56-107, wherein the library comprises fragments prepared by a single fragment tagging event.
109. A transposon pair having a first transposon and a second transposon, wherein the first transposon comprises the modified transposon end sequence of any one of claims 1 to 22, and wherein the second transposon comprises:
a. a transposon end sequence comprising a mosaic end sequence complementary to the wild type mosaic end sequence; or
b. a transposon end sequence that is fully complementary to the first transposon end.
110. The pair of transposons of claim 109, wherein the first transposon comprises a modified transposon end sequence comprising an A16U, A16 8-oxoguanine, or A16 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
111. The pair of transposons of claim 109, wherein the first transposon comprises a modified transposon end sequence comprising a C17 8-oxoguanine or C17 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
112. The pair of transposons of claim 109, wherein the first transposon comprises a modified transposon end sequence comprising an A18 8-oxoguanine or A18 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
113. The pair of transposons of claim 109, wherein the first transposon comprises a modified transposon end sequence comprising a G19 8-oxoguanine or G19 inosine substitution compared to SEQ ID No. 1, and the second transposon comprises a second transposon end sequence complementary to SEQ ID No. 1 or a second transposon end fully complementary to the first transposon end.
CN202280024450.7A 2021-03-29 2022-03-28 Improved library preparation method Pending CN117062910A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/167,150 2021-03-29
US202163224201P 2021-07-21 2021-07-21
US63/224,201 2021-07-21
PCT/US2022/022167 WO2022212269A1 (en) 2021-03-29 2022-03-28 Improved methods of library preparation

Publications (1)

Publication Number Publication Date
CN117062910A true CN117062910A (en) 2023-11-14

Family

ID=88657681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280024450.7A Pending CN117062910A (en) 2021-03-29 2022-03-28 Improved library preparation method

Country Status (1)

Country Link
CN (1) CN117062910A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination