WO2020180813A1 - Compositions and methods for adaptor design and nucleic acid library construction for rolony-based sequencing

Publication number
WO2020180813A1
Authority
WO
WIPO (PCT)
Prior art keywords
adaptor
sequencing
nucleic acid
binding site
stranded
Application number
PCT/US2020/020694
Other languages
French (fr)
Inventor
Yanhong Tong
Thomas PERROUD
Dietrich Wilhelm Karl Lueerssen
Original Assignee
Qiagen Sciences, Llc
Qiagen Manchester Ltd.
Application filed by Qiagen Sciences, Llc and Qiagen Manchester Ltd.

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6853 Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6855 Ligating adaptors

Definitions

  • NGS Next generation sequencing
  • Rolonies rolling circle colonies
  • Rolonies generated by amplification of a circularized DNA fragment offer certain advantages as a template for sequencing, including high image efficiency due to bright signals from hundreds of reaction sites being in the compact rolony, reduced reagent consumption due to the compactness of rolonies allowing for high density arrays, and improved sequencing accuracy.
  • the present disclosure provides adaptors, kits, and methods for nucleic acid library construction for rolony-based sequencing.
  • the present disclosure provides a method of producing a library of circular, single-stranded nucleic acid templates, each circular, single-stranded nucleic acid template comprising a strand of a double-stranded target nucleic acid, a strand of a first adaptor or the complement thereof, and a strand of a second adaptor or the complement thereof, the method comprising:
  • adding a first adaptor to a 5’ terminus of a sense strand and to a 3’ terminus of an antisense strand of the plurality of fragments of double-stranded nucleic acids, wherein the first adaptor comprises:
  • a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site;
  • UMI unique molecular identifier
  • a double stranded linker region of about 15 to about 35 bases for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site;
  • c. adding a second adaptor to a 3’ terminus of the sense strand and to a 5’ terminus of the antisense strand of the plurality of fragments of double-stranded nucleic acids to produce a library of linear, double-stranded nucleic acid templates, wherein the second adaptor comprises a second universal primer binding site, and wherein the second universal primer binding site comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site;
  • d. optionally amplifying the library of linear, double-stranded nucleic acid templates with a first universal primer that binds to the first primer binding site and a second universal primer that binds to the second primer binding site;
  • the present disclosure provides a set of partially double-stranded adaptors for producing a library of circular, single-stranded nucleic acid templates
  • each adaptor of the set comprises:
  • a single-stranded region comprising a first sequencing primer binding site, a unique molecular identifier (UMI), and a sample index sequence, wherein the first sequencing primer binding site further comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site;
  • kits for producing a library of circular, single-stranded nucleic acid templates comprising:
  • a second adaptor comprising a second universal primer binding site, wherein the second universal primer binding site comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site, and
  • FIG. 1 shows an exemplary first adaptor scheme, including a linker, molecular barcode (also referred to as unique molecular identifier (UMI)), sample index, and first sequencing primer.
  • FIG. 2 shows an exemplary second adaptor for rolony-based sequencing, including a target-specific sequence and a second universal primer sequence.
  • FIG. 3 shows exemplary steps (steps A to F) using a target specific PCR primer during library construction and clonal amplification to generate a bottom-strand rolony.
  • the target specific PCR primer is used to generate a double stranded nucleic acid molecule containing a region of interest (ROI) and 1st and 2nd adaptors.
  • ROI region of interest
  • the 5’ end of the 1st adaptor of the top strand is phosphorylated for circularization and ligation.
  • Rolling circle amplification (RCA) primer hybridizes to single-stranded, circular nucleic acid template (top strand) and is used to generate a bottom-strand rolony having sequencing primer binding sites for Seq 1A primer and Seq 2A primer.
  • RCA rolling circle amplification
  • FIG. 4 shows exemplary steps (steps A to F) using a target specific PCR primer during library construction and clonal amplification to generate a top-strand rolony.
  • the target specific PCR primer is used to generate a double stranded nucleic acid molecule containing a region of interest (ROI) and 1st and 2nd adaptors.
  • the 2nd adaptor of the bottom strand is phosphorylated for circularization and ligation.
  • RCA primer hybridizes to single-stranded, circular nucleic acid template (bottom strand) and is used to generate a top-strand rolony having sequencing primer binding sites for Seq 1B primer and Seq 2B primer.
  • FIG. 5 shows an embodiment of library construction and clonal amplification workflow. Steps labeled 3A-3F refer to steps or products depicted in Figure 3 with the corresponding label. Steps labeled 4A-4F refer to steps or products depicted in Figure 4 with the corresponding label.
  • FIG. 6 shows an embodiment of library construction and clonal amplification for paired-end sequencing.
  • Steps labeled 3A-3F refer to steps or products depicted in Figure 3 with the corresponding label.
  • Steps labeled 4A-4F refer to steps or products depicted in Figure 4 with the corresponding label.
  • Top and bottom rolonies are seeded on the same flow cell, with separate inlets and outlets. Sequencing for each strand is performed in separated areas of flow cell at the same time in the sequencer.
  • FIG. 7 shows an embodiment of library construction with a first adaptor and a universal adaptor (second adaptor).
  • the first adaptor and universal adaptor (second adaptor) are joined to a region of interest (ROI) via blunt ligation.
  • the ligation product is amplified using a pair of universal primers, one of which is 5’ phosphorylated.
  • the top strand is circularized for clonal amplification (rolling circle amplification (RCA)) to generate a bottom-strand rolony having sequencing primer binding sites for Seq 1A primer and Seq 2A primer.
  • FIG. 8 shows an embodiment of library construction with a first adaptor and a universal adaptor (second adaptor).
  • the first adaptor and universal adaptor (second adaptor) are joined to a region of interest (ROI) via blunt ligation.
  • the ligation product is amplified using a pair of universal primers, one of which is 5’ phosphorylated.
  • the bottom strand is circularized for clonal amplification (RCA) to generate a top-strand rolony having sequencing primer binding sites for Seq 1B primer and Seq 2B primer.
  • FIG. 9 shows an embodiment of library construction compatible for use with dual indices and production of a bottom strand rolony.
  • the bottom-strand rolony can be sequentially sequenced by sequencing primer Seq 1A (hybridizes to the linker region of the first adaptor) for sequencing ROI, sequencing primer Seq 2A (hybridizes to the first sequencing primer site of the first adaptor) for sequencing first sample index and UMI, and sequencing primer Seq 3A (hybridizes to the 3rd universal primer binding site of the second adaptor) for sequencing the second sample index.
  • FIG. 10 shows an embodiment of library construction compatible for use with dual indices and production of a top-strand rolony.
  • the top-strand rolony can be sequentially sequenced using sequencing primer Seq 1B (hybridizes to the linker region of the first adaptor) for sequencing the first sample index and UMI, sequencing primer Seq 2B (hybridizes to the bridge oligonucleotide binding site of the 2nd adaptor and optionally a portion of the first sequencing primer site of the first adaptor) for sequencing the second sample index, and sequencing primer Seq 3B (hybridizes to the 3rd universal primer binding site of the second adaptor) for sequencing the ROI.
  • FIG. 11 shows an exemplary second adaptor comprising a second sample index.
  • FIG. 12 shows an exemplary library construct as described in Example 1 and depicted in step D of Figure 3 (“3D structure”), which is circularized by a bridge oligonucleotide (see step E of Figure 3), ligated, and amplified by RCA using a RCA amplification primer to produce a bottom strand rolony.
  • the sequencing primers Seq 1 and Seq 2 (corresponding to Seq 1A and Seq 2A in step F of Figure 3, respectively) bind to primer binding sites within the first adaptor sequence.
  • BC bar code (unique molecular identifier (UMI)).
  • Index sample index.
  • I insert sequence or region of interest sequence.
  • Seq 2 sequencing primer #2 (for sequencing region of interest).
  • Seq 1 sequencing primer #1 (for sequencing UMI and sample index).
  • FIG. 13 shows an exemplary library construct as described in Example 2 and depicted in step D of Figure 3 (“3D structure”), which is circularized by a bridge oligonucleotide (see step E of Figure 3) and amplified by RCA using a RCA
  • Sequencing primer Seq 2 (corresponding to Seq 2A in step F of Fig. 3) binds to a primer binding site within the first adaptor sequence (linker region), while sequencing primer Seq 1 (corresponding to Seq 1A in step F of Fig. 3) binds to a primer binding site created by the junction of the first adaptor and second adaptor upon circularization and ligation.
  • Seq 2 sequencing primer #2 (for sequencing region of interest).
  • Seq 1 sequencing primer #1 (for sequencing UMI and sample index).
  • FIG. 14 shows an exemplary library construct as described in Example 3 and depicted in step D of Figure 4 (“4D structure”), which is circularized by a bridge oligonucleotide (see step E of Figure 4) and amplified by RCA using a RCA
  • sequencing primer Seq 1 (corresponding to Seq 1B of Figure 4) binds to a primer binding site within the first adaptor sequence
  • sequencing primer Seq 1 (corresponding to Seq 1B of Figure 4) binds to a primer binding site created by the junction of the first adaptor and SPE primer upon circularization.
  • BC bar code (unique molecular identifier (UMI)).
  • Index sample index.
  • I insert sequence or region of interest sequence.
  • Seq 1 sequencing primer #1 (for sequencing sample index and UMI).
  • Seq 2 sequencing primer #2 (for sequencing region of interest).
  • Fig. 15 shows the sequence of the first adaptor in Table 2.
  • Fig. 16 shows the sequence of the first adaptor in Table 3.
  • Fig. 17 shows the sequence of the first adaptor in Table 4.
  • the present disclosure provides adaptor design and nucleic acid library construction for rolony-based sequencing. Specifically, the present disclosure provides inter alia a partially double-stranded adaptor (referred to as “first adaptor” below) for generating circular, single-stranded nucleic acid templates.
  • the first adaptor comprises: (i) a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site; and (ii) a double-stranded linker region of about 15 to about 35 bases for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site.
  • the present disclosure also provides inter alia a method for constructing a nucleic acid library for rolony-based sequencing.
  • the first adaptor provided herein may be added to one end of a double-stranded target nucleic acid fragment, while a second adaptor may be added to the other end of the fragment.
  • the second adaptor comprises a second universal primer binding site, which in turn comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site.
  • the resulting target nucleic acid fragment flanked by the first and second adaptor may be optionally amplified, denatured, and circularized in the presence of the bridge oligonucleotide to generate a circular, single-stranded nucleic acid template.
  • Such a template may be further amplified to generate rolonies via rolling circle amplification (RCA) and subsequently sequenced.
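The circularization step described above can be pictured with a short sketch. It assumes that the bridge oligonucleotide anneals across the junction formed when the 3’ end of the linear template (carrying the second portion of the bridge binding site) meets its 5’ end (carrying the first portion). All sequences, the `arm` length, and the function names are hypothetical placeholders chosen for illustration; they are not taken from the disclosure.

```python
# Hypothetical placeholder sequences (5'->3'); not sequences from the disclosure.
FIRST_ADAPTOR = "ACGTTGCAAGCTTAGG"
INSERT = "TTTTGGGGCCCCAAAA"
SECOND_ADAPTOR = "GATCCGATTACGGCAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def bridge_for(template: str, arm: int) -> str:
    """Design a bridge oligo that anneals to `arm` bases at each end of the template."""
    # After circularization the 3' end is immediately followed by the 5' end.
    junction = template[-arm:] + template[:arm]
    return reverse_complement(junction)


def bridge_spans_ends(template: str, bridge: str, arm: int) -> bool:
    """Check that the bridge pairs with both the 3' end and the 5' end of the template."""
    if len(bridge) != 2 * arm:
        return False
    target = reverse_complement(bridge)
    return template.endswith(target[:arm]) and template.startswith(target[arm:])


template = FIRST_ADAPTOR + INSERT + SECOND_ADAPTOR
bridge = bridge_for(template, arm=8)
print(bridge, bridge_spans_ends(template, bridge, arm=8))
```

In this toy model a template lacking either adaptor end is not bridged, which mirrors why both portions of the bridge oligonucleotide binding site are needed before ligation can close the circle.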
  • the libraries of nucleic acid templates constructed according to the method disclosed herein may be used in sequencing target nucleic acids useful in diagnosing and monitoring diseases (e.g., cancers), characterizing diseases (e.g., responsiveness to particular treatments), and other areas where obtaining target nucleic acid sequences is desirable.
  • any ranges provided herein include all the values in the ranges.
  • the term “or” is generally employed in its sense including “and/or” (i.e., to mean either one, both, or any combination thereof of the alternatives) unless the content dictates otherwise.
  • the singular forms “a,” “an,” and “the” include plural referents unless the content dictates otherwise.
  • “comprise” and their variants are used synonymously and are to be construed as non-limiting.
  • the term “about” refers to ± 10% of a reference value.
  • “about 50°C” refers to “50°C ± 5°C” (i.e., 50°C ± 10% of 50°C).
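The ± 10% convention for “about” can be expressed as a small helper. This is a minimal sketch; the function name `about_range` and its `tolerance` parameter are illustrative assumptions rather than anything defined in the disclosure.

```python
def about_range(reference: float, tolerance: float = 0.10) -> tuple:
    """Return the (low, high) bounds implied by 'about <reference>'."""
    delta = abs(reference) * tolerance  # 10% of the reference value by default
    return reference - delta, reference + delta


low, high = about_range(50.0)  # bounds for "about 50°C"
print(low, high)
```

The same helper applies to any value qualified by “about”, such as the endpoints of a length range.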
  • nucleic acid refers to a polymer comprising ribonucleosides or deoxyribonucleosides that are covalently bonded typically by phosphodiester linkages between subunits.
  • Nucleic acids include DNA and RNA.
  • DNA includes, but is not limited to, genomic DNA, linear DNA, circular DNA, plasmid DNA, cDNA, and cell free DNA (e.g., tumor derived or fetal DNA).
  • RNA includes but is not limited to hnRNA, mRNA, noncoding RNA, cell free RNA (e.g., tumor derived RNA).
  • Non coding RNA includes but is not limited to rRNA, tRNA, lncRNA (long non coding RNA), lincRNA (long intergenic non coding RNA), miRNA, and siRNA.
  • A “target nucleic acid,” also referred to as “target sequence,” “region of interest” (ROI), or “insert sequence,” refers to a nucleic acid molecule of interest.
  • a target nucleic acid may be from any source, such as a cell sample, tissue sample, fluid sample, or organism from a plant, animal, virus, bacteria, fungus, parasite, insect, mammal, bird, reptile, amphibian, or human, or a forensic sample or environmental sample.
  • Exemplary samples include whole blood, blood products, plasma, serum, red blood cells, white blood cells, buffy coat, urine, sputum, saliva, semen, lymphatic fluid, amniotic fluid, cerebrospinal fluid, peritoneal effusions, pleural effusions, fluid from cysts, synovial fluid, vitreous humor, aqueous humor, bursa fluid, eye washes, eye aspirates, pulmonary lavage, bone marrow aspirates, lung aspirates, biopsy samples, swab samples, animal (including human) or plant tissues, including but not limited to samples from liver, spleen, kidney, lung, intestine, brain, heart, muscle, pancreas, cell cultures, lysates, extracts, or materials and fractions obtained from the samples described above or any cells and microorganisms and viruses that may be present on or in a sample and the like.
  • a target nucleic acid may be a naturally occurring sequence (e.g., DNA, genomic DNA (gDNA), cDNA, mitochondrial DNA, cell free DNA (cfDNA), RNA, mRNA, rRNA, tRNA, cfRNA, long non-coding RNA, microRNA), artificial sequence, or a combination thereof.
  • a target nucleic acid may be from a gene, a regulatory element, a non-coding sequence, or a combination thereof.
  • a target nucleic acid may be single-stranded or double-stranded.
  • a target nucleic acid may be obtained or isolated directly from a sample, or a product of a fragmentation reaction, a reverse transcription reaction, an amplification reaction, and the like, of nucleic acids obtained from a sample.
  • Target nucleic acids can be isolated from a sample according to methods known in the art to provide a nucleic acid sample (e.g., DNA, RNA).
  • a target nucleic acid may be of any appropriate length.
  • a target nucleic acid may have a length in a particular size range, for example, about 50 to about 2,000 nucleotides, about 50 to about 1,000 nucleotides, about 50 to about 750 nucleotides, about 50 to about 600 nucleotides, about 50 to about 500 nucleotides, about 50 to about 400 nucleotides, about 50 to about 300 nucleotides, about 50 to about 200 nucleotides, about 100 to about 2,000 nucleotides, about 100 to about 1,000 nucleotides, about 100 to about 750 nucleotides, about 100 to about 600 nucleotides, about 100 to about 500 nucleotides, about 100 to about 400 nucleotides, about 100 to about 300 nucleotides, about 100 to about 200 nucleotides, about 150 to about 2,000 nucleotides, about 150 to about 1,000 nucleotides, about 150 to about 750 nucleotides, about 150 to about 600 nucleotides, about 150 to about 500 nucleotides, about 150 to about 400 nucleotides, about 150 to about 300 nucleotides, or about 150 to about 200 nucleotides in length.
  • a target nucleic acid may have a length in the range of about 30 to 400 nucleotides.
  • the members of the library may have similar lengths, e.g., within a specific length range.
  • the optimal target nucleic acid size for the library is determined by a number of factors, including sequencing application (e.g., de novo sequencing vs. re-sequencing) and selected next generation sequencing platform.
  • target nucleic acids (e.g., genomic DNA, RNA, or cDNA) are fragmented.
  • Fragmenting nucleic acids may be performed physically, enzymatically, or chemically from larger nucleic acids to a desired size range.
  • Physical fragmentation includes acoustic shearing, sonication, and hydrodynamic shearing.
  • Enzymatic fragmentation may use an endonuclease that cleaves target nucleic acids into small fragments with 5’ phosphate and 3’ hydroxyl groups.
  • Chemical fragmentation may be accomplished using heat or a divalent metal cation (e.g., magnesium or zinc).
  • target nucleic acids are subjected to size selection to obtain target nucleic acids within a defined or desired size range.
  • A “nucleic acid template” refers to a nucleic acid construct that comprises a target nucleic acid flanked by a “first adaptor” and a “second adaptor.”
  • a first adaptor refers to an adaptor sequence 5’ to the target nucleic acid
  • a second adaptor refers to an adaptor 3’ to the target nucleic acid.
  • a first adaptor refers to an adaptor sequence 5’ to one strand (e.g., the sense strand) of the target nucleic acid
  • a second adaptor refers to an adaptor sequence 3’ to the strand of the target nucleic acid.
  • the sense strand of a double-stranded target nucleic acid may be either of the two strands of the target nucleic acid.
  • the antisense strand of the target nucleic acid is the strand other than the sense strand.
  • a nucleic acid template may be linear or circular.
  • a nucleic acid template may be single stranded or double stranded.
  • the target nucleic acid is directly adjacent to a first adaptor, a second adaptor, or both the first adaptor and second adaptor.
  • additional bases e.g., 1, 2 or more bases
  • a nucleic acid template is a member of a library of nucleic acid templates.
  • a nucleic acid template is DNA.
  • An “adaptor” refers to an engineered nucleic acid that is added to each end of a target nucleic acid to produce a nucleic acid template for sequencing.
  • An adaptor may comprise a subsequence for a particular function, e.g., library construction, library amplification, immobilization on a substrate, sequencing of nucleic acid templates, or any combination thereof.
  • an adaptor may comprise a restriction endonuclease recognition site, primer binding site for amplification during library construction (e.g., universal primer, target specific primer, single primer extension primer), binding site for a bridge oligonucleotide for circularization of a template nucleic acid, binding site for immobilizing a template nucleic acid on a substrate, primer binding site for sequencing (e.g., primer binding site for sequencing by synthesis methods or probe binding site for combinatorial probe anchor ligation (cPAL) methods), sample index sequence, unique molecular identifier (UMI) sequence, or any combination thereof.
  • An adaptor may comprise multiple, functionally distinct subsequences.
  • bases 1-26 at the 5’ end of the top strand are the sequence of a first sequencing primer (Seq 2) (SEQ ID NO: 10), bases 1-17 at the 5’ end of the top strand (i.e., 5’CTC ACA CTC ACC ACG TC) are the sequence of a first universal primer (universal primer 1) (SEQ ID NO:3); bases 1-10 at the 5’ end of the top strand (i.e., 5’CTC ACA CTC A) (SEQ ID NO: 36) are a portion of a bridge oligonucleotide binding site; the 26 bases at the 3’ terminus of the top strand (not including the T-overhang) (i.e., CTC ACT CGT CAC AGC ACC T
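The nested coordinates quoted for the 5’ end of the top strand can be checked with simple slicing. The sketch below uses only the two sequences spelled out in full in the text (universal primer 1, bases 1-17, and the bridge binding portion, bases 1-10); the helper name `bases` is an illustrative assumption, and the truncated 3’ sequence is deliberately left out.

```python
def bases(seq: str, start: int, end: int) -> str:
    """Return bases start..end of seq using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


# Sequences as quoted in the text, with the triplet spacing removed.
UNIVERSAL_PRIMER_1 = "CTC ACA CTC ACC ACG TC".replace(" ", "")  # bases 1-17
BRIDGE_PORTION = "CTC ACA CTC A".replace(" ", "")               # bases 1-10

print(len(UNIVERSAL_PRIMER_1), len(BRIDGE_PORTION))
# The bridge binding portion is a prefix of the universal primer sequence,
# so one stretch of the adaptor serves both functions.
print(bases(UNIVERSAL_PRIMER_1, 1, 10) == BRIDGE_PORTION)
```

This illustrates how functionally distinct subsequences of an adaptor can overlap rather than sit side by side.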
  • An adaptor may be single-stranded, double-stranded, or partially double-stranded.
  • the length of a single-stranded or double-stranded adaptor may vary depending upon the particular sequencing platform selected and intended use, but may range from about 3 nucleotides to about 200 nucleotides, from about 5 nucleotides to about 150 nucleotides, from about 10 nucleotides to about 100 nucleotides, from about 15 nucleotides to about 100 nucleotides, from about 20 nucleotides to about 100 nucleotides, from about 40 nucleotides to about 100 nucleotides, from about 5 nucleotides to about 80 nucleotides, from about 10 nucleotides to about 80 nucleotides, or from about 15 nucleotides to about 80 nucleotides.
  • the adaptor length is 15-100 nucleotides.
  • one of the strands may have a length as described above for a single-stranded or double-stranded adaptor.
  • an adaptor may comprise one or more modified nucleotides, e.g., having modifications to the nitrogenous base, 5-carbon sugar, phosphate moiety, or any combination thereof.
  • a “primer binding site” or “primer binding sequence” refers to a sequence to which a primer (or oligonucleotide) specifically binds.
  • Primer binding sequences are of sufficient length to allow hybridization of a primer.
  • the primer or a portion thereof is completely complementary to the primer binding sequence.
  • the primer or a portion thereof is substantially complementary to the primer binding site, that is, at least 90% of the nucleotides of the primer or the portion thereof are complementary to the nucleotides of the primer binding site.
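The 90% threshold for substantial complementarity can be sketched as a position-by-position comparison. The code assumes an ungapped alignment of a primer and a binding site of equal length, both written 5’ to 3’; the function names and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_complementary(primer: str, site: str) -> int:
    """Count primer bases that pair with the binding site (antiparallel, ungapped)."""
    if len(primer) != len(site):
        raise ValueError("this sketch only compares sequences of equal length")
    expected = reverse_complement(site)  # a perfectly complementary primer
    return sum(p == e for p, e in zip(primer.upper(), expected))


def is_substantially_complementary(primer: str, site: str, min_percent: int = 90) -> bool:
    """True if at least min_percent of primer bases are complementary to the site."""
    matches = count_complementary(primer, site)
    return matches * 100 >= min_percent * len(primer)  # integer math avoids rounding issues


site = "ACGTACGTACGTACGTACGT"        # hypothetical 20-nt binding site
perfect = reverse_complement(site)   # completely complementary primer
two_mismatches = "TT" + perfect[2:]  # 18 of 20 bases still pair
print(is_substantially_complementary(perfect, site),
      is_substantially_complementary(two_mismatches, site))
```

A real primer design tool would also consider gaps, mismatch position, and melting temperature, none of which are modeled here.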
  • a primer binding site is at least about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides long and/or at most about 60,
  • an adaptor comprises two or more primer binding sites
  • the two or more primer binding sites may be overlapping, partially overlapping, or non-overlapping.
  • the two or more primer binding sites may be immediately adjacent to each other or separated by one or more nucleotides (e.g., about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides) and/or about 40, 35, or 30 or less nucleotides.
  • A “sample index,” also referred to as “index” or “index sequence,” refers to a component of an adaptor comprising a unique combination of bases that identifies template nucleic acids belonging to a common library or sample.
  • Including sample indexes in template nucleic acids allows for multiplexing, e.g., sequencing of multiple different libraries or multiple different samples in a single reaction.
  • an index sequence can be used to orientate a sequence imager for purposes of detecting individual sequencing reactions.
  • an index sequence is about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long.
  • An index sequence may be from about 2 nucleotides to about 25 nucleotides in length, from about 5 nucleotides to about 20 nucleotides in length, or from about 8 nucleotides to about 15 nucleotides in length.
  • a template nucleic acid comprises a single sample index. Sample multiplexing has the inherent risk of index mis-assignment (cross-talk), which occurs when a sequence read derived from one sample in a pool of samples is incorrectly matched to a sample index from a different sample in the pool of samples.
  • Index cross talk can be introduced by a variety of mechanisms. Dual sample indices (dual indices) may minimize the incidence of index cross-talk and improve sequencing accuracy and sensitivity. The use of dual indices may also increase multiplexing capability by combination of the two indices.
  • a template nucleic acid comprises dual sample indices.
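How dual indices limit cross-talk and expand multiplexing can be illustrated with a toy demultiplexer. The read tuples, the sample sheet, and the index sequences below are hypothetical; the sketch assumes exact index matching and a sample sheet in which each sample has its own pair of indices.

```python
from collections import defaultdict


def demultiplex(reads, sample_sheet):
    """Assign reads to samples by their (first index, second index) pair.

    reads: iterable of (index1, index2, read_name)
    sample_sheet: dict mapping (index1, index2) -> sample name
    """
    bins = defaultdict(list)
    for index1, index2, name in reads:
        # A pair that is not on the sample sheet is set aside, not guessed.
        sample = sample_sheet.get((index1, index2), "undetermined")
        bins[sample].append(name)
    return dict(bins)


def multiplex_capacity(n_first: int, n_second: int) -> int:
    """Number of distinct samples addressable by combining two index sets."""
    return n_first * n_second


sheet = {("AAAC", "GGGT"): "sample_A", ("CCCA", "TTTG"): "sample_B"}
reads = [
    ("AAAC", "GGGT", "read1"),
    ("CCCA", "TTTG", "read2"),
    ("AAAC", "TTTG", "read3"),  # first index of A paired with second index of B
]
print(demultiplex(reads, sheet))
print(multiplex_capacity(8, 12))
```

With a single index, `read3` would have been assigned to one of the two samples; with two indices the mismatched pair is detectable and can be discarded.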
  • a “unique molecular identifier,” also referred to as “bar code” or “molecular bar code,” refers to a component of an adaptor comprising a unique combination of bases that is used to identify unique nucleic acid molecules.
  • a UMI may be used to identify PCR duplicates derived from the same nucleic acid molecule that were generated during library amplification. Thus, a UMI may be used to de-duplicate sequencing reads derived from a single molecule.
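UMI-based de-duplication can be sketched as grouping reads into families and keeping one representative per family. The sketch assumes that reads sharing both a UMI and a target identifier derive from the same original molecule, and it takes the most frequent sequence in a family as the representative; the read tuples and names are hypothetical.

```python
from collections import Counter, defaultdict


def deduplicate(reads):
    """Collapse PCR duplicates into one representative read per (UMI, target) family.

    reads: iterable of (umi, target_id, sequence)
    returns: dict mapping (umi, target_id) -> (representative sequence, family size)
    """
    families = defaultdict(Counter)
    for umi, target_id, sequence in reads:
        families[(umi, target_id)][sequence] += 1

    collapsed = {}
    for key, counts in families.items():
        representative, _ = counts.most_common(1)[0]  # most frequent sequence in the family
        collapsed[key] = (representative, sum(counts.values()))
    return collapsed


reads = [
    ("AACGT", "geneX:100", "ACGTACGT"),
    ("AACGT", "geneX:100", "ACGTACGT"),
    ("AACGT", "geneX:100", "ACGTACGA"),  # likely amplification or sequencing error
    ("GGTCA", "geneX:100", "ACGTACGT"),  # different original molecule
]
for family, (sequence, size) in deduplicate(reads).items():
    print(family, sequence, size)
```

Taking a majority sequence within each family is one simple way to separate errors introduced during amplification or sequencing from variants present in the original template.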
  • a UMI is about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long.
  • a UMI may be from about 2 nucleotides to about 25 nucleotides in length, from about 5 nucleotides to about 20 nucleotides in length, or from about 8 nucleotides to about 15 nucleotides in length.
  • a UMI is designed to have between 2 and 15 degenerate base positions, but preferably has between 6 and 12 base positions.
  • A “degenerate base position” is a base position that more than 1 nucleotide (e.g., 2, 3, or 4 different nucleotides) may occupy.
  • a UMI is designed to assign a completely unique sequence tag to each target nucleic acid molecule.
  • a UMI is not designed to assign a completely unique sequence tag to each molecule, but rather is designed to have a low probability of assigning any given sequence tag to a particular molecule. The greater the number of possible UMI sequences, the lower the probability of any particular sequence being assigned to a molecule.
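The relationship between the number of degenerate positions and the chance that two molecules share a tag can be made concrete. The sketch assumes each degenerate position is drawn uniformly and independently from the allowed nucleotides, which is an idealization; the function names are illustrative.

```python
def umi_diversity(positions: int, bases_per_position: int = 4) -> int:
    """Number of distinct UMI sequences for a given number of degenerate positions."""
    return bases_per_position ** positions


def collision_probability(positions: int, molecules: int, bases_per_position: int = 4) -> float:
    """Probability that a given molecule shares its UMI with at least one other molecule."""
    n = umi_diversity(positions, bases_per_position)
    # Each of the other (molecules - 1) tags misses this one with probability 1 - 1/n.
    return 1.0 - (1.0 - 1.0 / n) ** (molecules - 1)


for positions in (6, 12):
    print(positions, umi_diversity(positions), collision_probability(positions, molecules=1000))
```

Increasing the number of degenerate positions grows the pool of possible tags exponentially, which is why longer UMIs lower the probability that any particular tag is assigned to more than one molecule.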
  • UMI sequences are used to track the lineage of molecules from initial copying through amplification, processing and sequencing. They can be used to distinguish sequences that arise from polymerase misincorporations or sequencer errors from sequences that are derived from true mutant template molecules. UMIs can also be used to distinguish sequences that have the wrong sample index assignment as a result of cross-over of sample indices during pooled amplification.
  • UMI sequences can be assigned to more than one target nucleic acid molecule
  • meaningful analysis of UMI sequences requires first identifying target nucleic acid sequences (e.g., nucleic acid variants) and then analyzing the distribution of UMI sequences associated with those target nucleic acid sequences.
  • the number of different UMIs in a first adaptor may be at least 100, 1,000, 5,000, 100,000, 500,000, 1,000,000, or 5,000,000.
  • a first adaptor comprises from 5’ to 3’: a single-stranded region comprising a first sequencing primer binding site and a first sample index sequence, wherein the first sequencing primer binding site further comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site; and a double-stranded linker region of about 20 nucleotides to about 30 nucleotides for ligation, wherein the double stranded linker region further comprises a second sequencing primer binding site (Figure 1).
  • the double-stranded linker region of the first adaptor is designed for dual purposes: direct ligation of the first adaptor to double stranded target nucleic acids and to provide a sequencing primer binding site for sequencing the target nucleic acid (region of interest) (see, e.g., Figure 3, bottom-strand rolony and Figure 4, top-strand rolony).
  • the UMI, sample index, or any portion thereof is not contained within the double-stranded linker region of the first adaptor.
  • the first adaptor may further comprise a UMI between the first sequencing primer binding site and the first sample index sequence or between the first sample index sequence and the double-stranded linker region.
  • the first sequencing primer binding site of the first adaptor is designed for multiple purposes: to provide a sequencing primer binding site for sequencing the UMI and sample index (see, e.g., Figure 3, bottom-strand rolony); to provide a universal primer binding site for library enrichment; to provide a first portion of a bridge oligonucleotide binding site for circularization; and optionally to provide a sequencing primer binding site for sequencing the region of interest (see, e.g., Figure 4, top-strand rolony).
  • “linker” or “linker region” generally refers to the double-stranded nucleic acid sequence that is part of an adaptor and directly ligated with a target nucleic acid.
  • the first adaptors present in one or more libraries of nucleic acid templates comprise a shared or common linker region sequence.
  • the double-stranded linker region is about 15 to about 35 nucleotides in length, such as 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length, preferably 20-30 nucleotides in length.
  • the double-stranded linker region of the first adaptor may be formed by annealing the first adaptor’s two complementary strands of different lengths that possess a complementary linker region. In some embodiments, it may be advantageous for the double-stranded linker region of the first adaptor to be as short as possible without loss of function. “Function” in this context means that the double-stranded linker region forms a stable duplex under standard reaction conditions for an enzyme-catalyzed nucleic acid ligation reaction (e.g., incubation at a temperature ranging from about 4°C to about 40°C in a ligation buffer appropriate for the enzyme), such that the two strands forming the double-stranded linker region of the first adaptor remain partially annealed during ligation of the first adaptor to a target nucleic acid.
  • the double-stranded linker region of the first adaptor may be of sufficient length and have a certain percent of GC content to reach the desired Tm for sequencing primer hybridization on the selected sequencing instrument.
  • the Tm requirement depends on the sequencing temperature, which is defined by the enzymes and buffers utilized during sequencing.
  • the double-stranded linker region of the first adaptor can be about 20-30 nucleotides in length, with about 50-80% GC content and a Tm above 60°C in the sequencing buffer.
  • the relatively lengthy double-stranded linker region can also improve adaptor structure uniformity during the annealing process of adaptor production/manufacturing, which can improve ligation efficiency.
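  • The length and GC-content windows given above for the linker region (about 20-30 nucleotides, about 50-80% GC) can be checked with a short helper. This is a sketch under stated assumptions: the function names and the candidate sequences are invented for illustration, and the Tm criterion is deliberately not evaluated here because it depends on the sequencing buffer and enzyme.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def linker_in_range(seq, min_len=20, max_len=30, min_gc=0.50, max_gc=0.80):
    """True if the sequence falls inside the length and GC windows."""
    return (min_len <= len(seq) <= max_len
            and min_gc <= gc_fraction(seq) <= max_gc)

# Hypothetical candidate linker strands.
candidate = "GCGTACCGGATCCGTGACCTGCAG"  # 24 nt, 16 of 24 bases are G or C
at_rich = "ATATATATATATATATATATAT"      # 22 nt, no G or C

print(len(candidate), round(gc_fraction(candidate), 2), linker_in_range(candidate))
print(len(at_rich), round(gc_fraction(at_rich), 2), linker_in_range(at_rich))
```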
  • different sample index sequences or different UMI sequences of the first adaptors may require different optimal conditions for ligating the first adaptors to target nucleic acids.
  • the first sequencing binding site of the first adaptor may be of sufficient length and have a certain percent of GC content sufficient to reach the desired Tm for sequencing primer hybridization on the selected sequencing instrument (see, e.g., Example 1).
  • the length and GC content of the first sequencing binding site of the first adaptor can be reduced, because a portion of the first sequencing binding site is provided by the second universal primer binding site of the second adaptor following circularization of the template nucleic acid (see, Example 2 and Figure 12).
  • the first sequencing binding site of the first adaptor can be about 10-20 nucleotides in length, with about 30-80% GC content.
  • modified nucleotides (e.g., having modifications to the nitrogenous base, 5-carbon sugar, phosphate moiety, or any combination thereof), spacers, or both are incorporated into the first adaptor to improve system working performance, automation and surface fixation.
  • spacer modifications include C3 spacer, C6 spacer, C12 spacer, spacer 9, spacer 18 (hexaethyleneglycol), dSpacer (abasic furan), ribospacer rSpacer, PC spacer, and hexanediol.
  • the first sequencing primer binding site of the first adaptor further comprises a first portion of a bridge oligonucleotide binding site.
  • bridge oligonucleotide, also known as “guide oligonucleotide,” refers to a nucleic acid sequence designed for circularization of linear, single-stranded nucleic acid templates.
  • the bridge oligonucleotide comprises a sequence complementary to the 5’ end and 3’ end of the two flanking adaptors.
  • the 5’ end and 3’ end of the single-stranded nucleic acid template hybridize to the bridge oligonucleotide, which brings the 5’ end and 3’ end of the single-stranded nucleic acid template into close proximity for ligation.
  • the 5’ end of the single-stranded nucleic acid template is phosphorylated prior to the ligation reaction to enhance ligation efficiency (see, e.g., Figures 3, 4 and 7-10).
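  • The geometry of bridge-assisted circularization can be sketched in code: the bridge oligonucleotide is the reverse complement of the junction formed when the 3’ end of the linear template is placed next to its 5’ end. The arm length, the function names, and the template sequence below are assumptions for illustration only; actual bridge sequences are given elsewhere in the disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def bridge_oligo(template, arm=10):
    """Bridge that pairs with the last `arm` bases (3' end) and the first
    `arm` bases (5' end) of a linear single-stranded template."""
    if len(template) < 2 * arm:
        raise ValueError("template is shorter than the two bridge arms")
    # Junction read 5'->3' across the ligation point: 3' end, then 5' end.
    junction = template[-arm:] + template[:arm]
    return reverse_complement(junction)

# Hypothetical linear template (adaptor-flanked insert), written 5' to 3'.
template = "ACGTCAGGCTTTTTTTTTGGATCCATGA"
bridge = bridge_oligo(template, arm=4)
print(bridge)
```

  • Because the bridge is derived from the actual template ends, a non-templated 3’ A left by a non-proof-reading polymerase would automatically be matched by a T at the junction position of the bridge in this model.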
  • a second adaptor comprises a second universal primer binding site, wherein the second universal primer binding site in turn comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site.
  • the second adaptor is single-stranded.
  • the second universal primer binding site is designed for multiple purposes: to provide a universal primer binding site for library enrichment; to provide a second portion of a bridge oligonucleotide binding site for circularization; and optionally to provide a third sequencing primer binding site for sequencing the target nucleic acid.
  • the third sequencing binding site of the second adaptor may be of sufficient length and have a certain percent of GC content sufficient to reach the desired Tm for sequencing primer hybridization on the selected sequencing instrument.
  • the length and GC content of the third sequencing binding site of the second adaptor can be reduced, because part of the third sequencing binding site is provided by the first universal primer binding site of the first adaptor following circularization of the template nucleic acid (see, Figures 13, 14).
  • the third sequencing binding site of the second adaptor can be about 10-20 nucleotides in length, with about 30-80% GC content.
  • the second adaptor further comprises a target-specific sequence 5’ to the second universal primer binding site (see “Target specific PCR primer” in Figure 2).
  • the presence of the target-specific sequence in the second adaptor allows target enrichment via PCR using a first universal primer and the target specific PCR primer.
  • a second adaptor comprises from 5’ to 3’: a second portion of a bridge oligonucleotide binding site; a second sample index; a third universal primer binding site; and a target nucleic acid specific sequence, wherein the bridge oligonucleotide binding site further comprises a fourth sequencing primer binding site (for sequencing sample index) and the third universal primer binding site further comprises a fifth sequencing primer binding site (for sequencing ROI) (see Figure 11).
  • a second adaptor may comprise a third sequencing primer binding site, a second sample index, a second universal primer binding site, and a target-nucleic acid specific sequence, wherein the third sequencing primer binding site further comprises a second portion of the bridge oligonucleotide binding site (Figures 9-11).
  • a second adaptor is a universal adaptor that comprises a second universal primer binding site without any target nucleic acid specific sequence (see Figures 7 and 8). Such a second adaptor may be useful in whole genome sequencing or other assays that do not require target enrichment.
  • An exemplary universal adaptor is as follows:
  • An adaptor may be added to a target nucleic acid using a variety of methods including enzymatic ligation (blunt-end ligation, sticky-end ligation), chemical ligation, or primer extension.
  • the first adaptor is preferably added to a target nucleic acid via ligation (see, e.g., Figures 3, 4, and 7-10).
  • a second adaptor comprises a target-specific sequence
  • it is preferably added to a target nucleic acid via primer extension using the second adaptor as a primer (see, e.g., Figures 3 and 4).
  • An adaptor may be added to a target nucleic acid in whole (see, e.g., Figures 3, 4, 7 and 8) or in phases where adjacent or overlapping pieces are assembled (see, e.g., Figures 9 and 10, wherein the second adaptor is added via target-enrichment PCR amplification and universal PCR amplification).
  • primer refers to an oligonucleotide that is
  • a primer may have about 10 to about 100 nucleotides in length, about 12 to about 80 nucleotides in length, or about 15 to about 50 nucleotides in length. In certain embodiments, a primer may have about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
  • a primer may comprise DNA, RNA, one or more modified nucleotides that contain modifications to the nitrogenous base, 5-carbon sugar, and/or phosphate moieties, or a combination thereof.
  • modified nucleotides include nucleotides comprising 2’-O-methylribose, 5-hydroxybutynyl-2’-deoxyuridine
  • polynucleotides i.e., a sequence of nucleotides related by Watson-Crick base-pairing rules.
  • sequence “A-G-T” is complementary to the sequence “T-C-A.”
  • Complementarity may be “partial,” in which only some of the nucleic acids’ bases are matched according to the base pairing rules. Or, there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of
  • Such hybridization preferably corresponds to stringent hybridization conditions. Again, such hybridization may occur with “near” or “substantial” complementarity of the antisense oligomer to the target sequence, as well as with exact complementarity.
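  • The base-pairing example above (“A-G-T” pairing with “T-C-A”) can be reproduced with a position-by-position complement. The sketch also shows the same partner strand written in its own 5’ to 3’ direction, since paired strands run antiparallel. The function names are assumptions made for illustration.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Base-by-base Watson-Crick partner, aligned position by position."""
    return "".join(PAIR[base] for base in seq.upper())

def reverse_complement(seq):
    """The partner strand written in its own 5' to 3' direction."""
    return complement(seq)[::-1]

print(complement("AGT"))          # partner bases aligned under A-G-T
print(reverse_complement("AGT"))  # same partner strand read 5' to 3'
```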
  • the melting temperature (Tm) of an oligonucleotide used in the present disclosure is the temperature at which 50% of the oligonucleotide is duplexed with its perfect complement and 50% is free in a solution, such as 115 mM KCl.
  • Tm is determined by measuring the absorbance change of the oligonucleotide with its complement as a function of temperature (i.e., generation of a melting curve). The Tm is the reading halfway between the double-stranded DNA and single stranded DNA plateaus in the melting curve.
  • Factors influencing Tm include length of the
  • oligonucleotide molecule e.g., an oligonucleotide that is 14-20 nucleotides in length
  • Tm e.g., an oligonucleotide that is 14-20 nucleotides in length
  • Tm = 2 °C × (A + T) + 4 °C × (G + C)
  • the above formula assigns 2°C to each A-T pair and 4°C to each G-C pair.
  • the Tm then is the sum of these values for all individual pairs in a DNA double strand.
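  • The rule described above, 2°C per A-T pair plus 4°C per G-C pair, is straightforward to compute. The sketch below is illustrative only; the function name `wallace_tm` and the example oligonucleotide are assumptions, and the rule itself is an approximation intended for short oligonucleotides such as the 14-20 nucleotide range mentioned above.

```python
def wallace_tm(seq):
    """Approximate Tm in degrees C: 2 per A/T base plus 4 per G/C base."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

oligo = "ACGTACGTACGTACGT"  # hypothetical 16-mer with 8 A/T and 8 G/C bases
print(wallace_tm(oligo))    # 2*8 + 4*8 = 48
```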
  • a primer may be 100% complementary or partially complementary to the primer binding sequence in an adaptor to which it hybridizes.
  • a primer is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementary to the primer binding sequence in an adaptor to which it hybridizes.
  • The “% complementary” is determined based on the length of the primer binding sequence. For example, if a 20-nucleotide primer has 14 nucleotides complementary to a 15-nucleotide primer binding sequence in an adaptor, the % complementary is 93% (i.e., 14/15).
  • a primer further comprises additional sequence at the 5' end of the primer that is not complementary to the template nucleic acid sequence (e.g., primer binding site in the adaptor).
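  • The worked example above divides by the length of the primer binding sequence, not by the length of the primer. A one-line helper makes the denominator explicit; the function name is an assumption for illustration.

```python
def percent_complementary(matching_nt, binding_site_len):
    """Percent complementarity relative to the primer binding sequence."""
    return 100.0 * matching_nt / binding_site_len

# 20-nt primer, 14 complementary bases, 15-nt binding sequence.
value = percent_complementary(14, 15)
print(round(value))  # 93
```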
  • the non-complementary portion of a primer may be at a length that does not interfere with the hybridization between the primer and its primer binding site. In some embodiments, the non-complementary portion is about 1 to 50, 1 to 40, 1 to 30, or 1 to 20 nucleotides long.
  • primers include but are not limited to an “extension primer,” a “universal primer,” a “target-specific primer,” a “RCA amplification primer,” or a “sequencing primer.”
  • An “extension primer” is used in a primer extension reaction by a DNA polymerase.
  • a primer extension reaction is a single primer extension (SPE) reaction where a SPE primer comprising a target nucleic acid specific sequence and a 5’ universal primer binding site repeatedly hybridizes to the same target locus from different nucleic acid templates resulting in target nucleic acid enrichment.
  • An extension primer may be referred to as a “PCR primer,” an “amplification primer” or the like when used in an amplification reaction such as PCR.
  • an extension primer is about 10 to about 50 nucleotides long, such as about 15 to about 35 nucleotides long.
  • a “universal primer” or a “universal PCR primer” refers to a primer that binds to sequence present in the nucleic acid template. Typically, the universal primer hybridizes to common sequences present in adaptors or target-specific primers. The universal primer can bind to and direct primer extension from the universal priming site. Universal primers may be used to amplify a library of target nucleic acid templates to be sequenced. A universal primer may be referred to as a “boosting primer” when used in combination with a target specific primer for target enrichment PCR.
  • a universal primer is about 15 to about 25 nucleotides long.
  • A “target-specific primer,” “target-specific nucleic acid primer,” or the like refers to a primer that hybridizes to target nucleic acid specific sequence, rather than adaptor specific sequence.
  • a target-specific primer may comprise an additional region, such as a universal primer binding sequence.
  • the region specific to a target nucleic acid sequence in a target-specific primer is about 13 to about 25 nucleotides long, and the additional region if present is about 10 to about 20 nucleotides long.
  • the overall length of a target-specific primer is preferably about 23 to about 45 nucleotides if the primer comprises the additional region.
  • A “RCA amplification primer” or the like refers to a primer used for RCA amplification. Its sequence may be a portion of a first adaptor (e.g., the linker sequence of the first adaptor or a substantial portion thereof) or a second adaptor or a substantial portion thereof.
  • a substantial portion of a first or second adaptor refers to a portion of the first or second adaptor that is at least 10, 11, 12, 13, 14, or 15 nucleotides in length.
  • a RCA primer is about 13 to about 20 nucleotides long. Additional description of RCA primers may be found in the subsection “Rolling Circle
  • A “sequencing primer” refers to a primer that is used in sequencing reactions, e.g., sequencing-by-synthesis reactions or sequencing-by-ligation reactions, such as a combinatorial probe-anchor ligation reaction (cPAL).
  • a sequencing primer is about 15 to about 35 nucleotides long, such as about 15 to about 30 nucleotides long.
  • the present disclosure provides methods of producing a library of circular, single-stranded nucleic acid templates, each circular single-stranded nucleic acid template comprising a strand of a double-stranded target nucleic acid, a strand of a first adaptor or the complement thereof, and a strand of a second adaptor or the complement thereof.
  • MPS massively parallel sequencing
  • at least one library of nucleic acid templates is produced and individual constructs in the library are sequenced in parallel.
  • large numbers of libraries are pooled together and sequenced simultaneously during a single sequencing run.
  • MPS methods are typically performed on a large library or pool of libraries of nucleic acid templates.
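  • When several libraries are pooled and sequenced in one run, reads are assigned back to their library of origin by sample index. The sketch below illustrates that assignment under simplifying assumptions: exact index matching only, a hypothetical `(index_read, roi_read)` record layout, and invented index sequences and sample names.

```python
from collections import defaultdict

def demultiplex(reads, index_to_sample):
    """Assign each read to a sample by exact match of its sample index.

    `reads` is an iterable of (index_read, roi_read) pairs; reads whose
    index is not in `index_to_sample` go to the "undetermined" bin.
    """
    by_sample = defaultdict(list)
    for index_read, roi_read in reads:
        sample = index_to_sample.get(index_read, "undetermined")
        by_sample[sample].append(roi_read)
    return dict(by_sample)

# Hypothetical indices and reads.
index_to_sample = {"TTAGGCAT": "library_1", "CAGTCCGA": "library_2"}
pooled = [
    ("TTAGGCAT", "ACGTACGT"),
    ("CAGTCCGA", "GGGTTTAA"),
    ("TTAGGCAT", "ACGTACGA"),
    ("NNNNNNNN", "TTTTTTTT"),
]
result = demultiplex(pooled, index_to_sample)
print(result)
```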
  • the complement of a strand of an adaptor is an oligonucleotide that is of about the same (including the same) length as the strand of the adaptor and is completely complementary to the strand of the adaptor.
  • an exemplary first adaptor is a partially double-stranded oligonucleotide with the longer strand that is 60 nucleotides long and the shorter strand that is 30 nucleotides long
  • the complement of the longer strand of the first adaptor would be about 60 nucleotides long and is completely complementary to the longer strand (i.e., contains no mismatch, no internal insertion, and no internal deletion).
  • an exemplary second adaptor is a target-specific primer that is 40 nucleotides in length and comprises a target-specific sequence and a universal primer sequence
  • the complement of the second adaptor is about 40 nucleotides in length and is completely complementary to the second adaptor.
  • the method comprises: a) providing a plurality of fragments of double-stranded target nucleic acids; b) adding a first adaptor to a 5’ terminus of a sense strand and to a 3’ terminus of an antisense strand of the plurality of fragments of double-stranded nucleic acids, wherein the first adaptor comprises: (i) a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site; (ii) a double stranded linker region of about 15 to about 35 bases (preferably about 20 to 30 bases) for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site; c) adding a second adaptor to a
  • oligonucleotide and ligating the first adaptor and second adaptor, thereby producing the library of circular, single-stranded nucleic acid templates.
  • the double-stranded target nucleic acids are obtained from isolated nucleic acids from a sample (e.g., genomic DNA).
  • the double-stranded target nucleic acids may be fragmented by physical, chemical, or enzymatic means, and fragments of double-stranded target nucleic acids of a desired size range are selected.
  • the ends of the size selected fragments of double-stranded target nucleic acids may then be repaired to produce blunt-ended, size-selected double-stranded target nucleic acids.
  • 3’ A-tails may then be added to the blunt-ended, size-selected fragments of double-stranded target nucleic acids using a DNA polymerase.
  • Matching 3’ T overhangs may be added to the first adaptor to facilitate ligation with the A-tailed, double-stranded, target nucleic acids.
  • the first adaptor may be present at one or both ends of the A-tailed, double-stranded target nucleic acids.
  • “Ligation” means to form a covalent bond or linkage between the termini of two or more nucleic acids.
  • the ligation may be an enzymatic ligation, which forms a phosphodiester linkage between the 5’ carbon of a terminal nucleotide of one DNA strand and the 3’ carbon of another DNA strand, or a chemical ligation.
  • a library of nucleic acid templates is generated which have common sequences at their 5' and 3' ends (step B of Figures 3 and 4, ligation products).
  • the term “common” is interpreted as meaning common to all templates in the library.
  • all templates within the library will contain regions of common sequence at (or proximal to) their 5' and 3' ends.
  • blunt-ended, size-selected double-stranded target nucleic acids may be directly ligated to the double-stranded linker region of the first adaptor (see, e.g., Figures 7 and 8).
  • the end of the first adaptor and/or the end of the universal adaptor can be modified to prevent non-desired ligation.
  • the modification can be chemical modification, for example but not limited to, C3 spacer.
  • the second adaptor may comprise a target nucleic acid specific sequence (see, e.g., Figures 3 and 4).
  • the second adaptor hybridizes to the double-stranded target nucleic acids via the target nucleic acid specific sequence, which is used to enrich target nucleic acids with the first universal primer via single primer extension.
  • the library of nucleic acid templates comprises the first adaptor at only one end (step C of Figures 3 and 4, SPE amplification products).
  • The other end of the nucleic acid templates is replaced by the second adaptor (step C of Figures 3 and 4, SPE amplification products).
  • the library of nucleic acid templates undergoes another round of amplification using the first universal primer and the second universal primer (step D of Figures 3 and 4).
  • a proof-reading DNA polymerase is used during the step of universal PCR amplification.
  • when a non-proof-reading DNA polymerase is used during the step of universal PCR amplification, it should be noted that a non-templated A is added to the 3’ end of the amplicons.
  • Any corresponding bridge oligonucleotide used for circularization should be designed to accommodate this “A” addition (e.g., a corresponding “T” may be added in the bridge oligonucleotide at the junction of the first adaptor and second adaptor, see also Tables 2-4).
  • the amplified library of double-stranded nucleic acid templates may then be prepared for circularization.
  • the library is denatured, for example, by heat or chemical treatment (e.g., NaOH, high salt concentration, high pH), to produce a library of linear, single-stranded nucleic acid templates.
  • the single-stranded nucleic acid templates may preferably undergo 5’ phosphorylation to facilitate circularization and increase ligation strand specificity.
  • the 5’ phosphorylation group can be added enzymatically, for example using a T4 polynucleotide kinase.
  • the 5’ phosphorylation group can also be added to the strand that is to be circularized during the universal PCR step by using a 5’ phosphorylated universal primer in the universal amplification reaction (step D of Figures 3 and 4, universal amplification products).
  • the linear, single-stranded nucleic acid template is circularized by ligating the first adaptor and second adaptor (step E of Figures 3 and 4).
  • a single stranded DNA ligase (e.g., CircLigase™)
  • double stranded DNA ligase (e.g., T4 DNA ligase)
  • a bridge oligonucleotide hybridizes to the 5’ end and 3’ end of the two flanking adaptor molecules and brings the 5’ end and 3’ end of the single-stranded nucleic acid template into close proximity to facilitate ligation (see, e.g., Figures 12-14).
  • Step D of Figures 3 and 4 shows that different strands of the double-stranded DNA templates are phosphorylated.
  • the corresponding rolonies are concatemers of the bottom strand, named “bottom-strand rolony” (step F of Figure 3).
  • the corresponding rolonies are concatemers of the top strand, named “top-strand rolony” (step F of Figure 4).
  • the second adaptor does not comprise any target nucleic acid specific sequence.
  • a library of circular, single-stranded nucleic acid templates may be constructed in a method using a first adaptor and a second adaptor that is a universal adaptor (see Figures 7 and 8).
  • the method comprises: a) providing a plurality of fragments of double-stranded target nucleic acids; b) adding a first adaptor to a 5’ terminus of a sense strand and to a 3’ terminus of an antisense strand of the plurality of fragments of double-stranded nucleic acids, wherein the first adaptor comprises: (i) a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site; (ii) a double stranded linker region of about 15 to about 35 bases for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site; c) adding a second adaptor to a 3’ terminus of the sense strand and to
  • a library of circular, single-stranded nucleic acid templates may be constructed for use with dual sample indices (see Figures 9 and 10).
  • a second sample index may be introduced into library constructs in a variety of ways.
  • An exemplary second adaptor comprising a second sample index is shown in Figure 11.
  • the second adaptor comprises from 5’ to 3’: (i) a bridge oligonucleotide binding site; (ii) a second sample index; (iii) a 3rd universal primer binding site; and (iv) a target nucleic acid specific sequence, wherein the bridge oligonucleotide binding site comprises a portion of the bridge oligonucleotide binding site and a 4th sequencing primer binding site, or portion thereof (e.g., for sequencing second sample index) and the 3rd universal primer binding site further comprises a 5th sequencing primer binding site (e.g., for sequencing ROI).
  • the second adaptor is added to the target nucleic acid in a series of PCR with portions of the second adaptor as shown in steps C and D of Figures 9 and 10, and Figure 11.
  • steps A to C are the same as those in Figure 3 except that the “target specific PCR primer” is referred to as the “2nd adaptor” in Figure 3.
  • in step D of Figure 9, a 5’ phosphorylated universal primer and a primer comprising (i) a bridge oligonucleotide binding site; (ii) a second sample index; and (iii) a 3rd universal primer binding site of the second adaptor as described above are used in universal PCR amplification to generate double stranded template nucleic acids.
  • Rolonies may then be produced from the library of circular, single-stranded nucleic acid templates prepared as described above.
  • A “rolony” or “rolling circle colony” is a single-stranded DNA concatemer that is produced by rolling circle amplification (RCA) of a circularized DNA fragment.
  • A “concatemer” refers to a long, continuous DNA molecule that comprises multiple copies of the same DNA sequence linked in series.
  • a concatemer may comprise at least 2, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 monomers, wherein each monomer comprises a nucleic acid template (e.g., first adaptor-target nucleic acid-second adaptor).
  • “DNA nanoballs” or “DNBs” are single-stranded DNA concatemers of sufficient length to form random coils that fill a roughly spherical volume in solution (e.g., SSC buffer at room temperature). In some embodiments, DNA nanoballs have a diameter of from about 100 to 300 nm.
  • “concatemer,” “DNA nanoball,” and “rolony” may be used interchangeably.
  • Rolling circle amplification refers to amplification of a circular, nucleic acid template using at least one primer that hybridizes to one strand of the circular nucleic acid template to produce rolonies that represent the other strand of the circular nucleic acid template.
  • a rolling circle amplification primer may comprise random sequence, sequence that hybridizes to an adaptor, or sequence that hybridizes to a junction region of two adaptors created when the nucleic acid template was circularized.
  • the RCA primer hybridizes to the linker region of the first adaptor, the first sequencing primer binding site of the first adaptor, or the second universal primer binding site of the second adaptor.
  • Using an RCA primer that hybridizes to the “sense” or “top” strand of the circular nucleic acid template for RCA produces a “bottom-strand rolony” (steps E and F of Figure 3 and Figure 12).
  • Using an RCA primer that hybridizes to the “antisense” or “bottom” strand of the circular nucleic acid template for RCA produces a “top-strand rolony” (steps E and F of Figure 4 and Figure 14).
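  • The strand relationship described above can be modeled in a few lines: rolling circle amplification of a circular template yields a concatemer whose monomers are the reverse complement of that template, so a top-strand circle gives a bottom-strand rolony and vice versa. This is a simplified sketch; the function names, the toy template, and the fixed copy number are assumptions for illustration, and the arbitrary start point on the circle is ignored.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def rolony(circular_template, copies):
    """Model an RCA product: `copies` tandem monomers, each the reverse
    complement of the circular template that was copied."""
    monomer = reverse_complement(circular_template)
    return monomer * copies

top_strand_circle = "ACGGT"  # toy circular template (hypothetical)
product = rolony(top_strand_circle, copies=2)
print(product)               # bottom-strand monomer repeated twice
```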
  • Each monomer in the rolonies produced according to the methods provided in the present disclosure comprises two separate sequencing primer binding sites on the same strand (see “Seq 1A” and “Seq 2A” in step F of Figure 3 for bottom-strand rolony and “Seq 1B” and “Seq 2B” in step F of Figure 4 for top-strand rolony).
  • the rolonies may be used as templates for sequencing reactions.
  • RCA based clonal amplification provides a simple solution that can often eliminate the need for emulsion PCR (ePCR) and thereby provide the option of eliminating an often expensive and labor-intensive step in many next generation sequencing methods.
  • ePCR emulsion PCR
  • DNA polymerase having suitable strand displacement activities is used to produce the rolonies.
  • DNA polymerases having strand displacement activity include, but are not limited to, Phi29, Bst DNA polymerase, SensiPhi DNA polymerase, Klenow fragment of DNA polymerase I, and Deep-VentR DNA polymerase
  • Table 1 shows the differences of the rolonies generated by different DNA strands.
  • Rolonies produced according to the present disclosure may be immobilized on a substrate.
  • a substrate comprises a plurality of sites for attachment of a plurality of rolonies.
  • Exemplary substrates include planar substrates (e.g., slides), non-planar substrates, bead substrates, or arrays comprising spots or wells.
  • Exemplary materials used for substrates include glass, ceramic, silica, silicon, quartz, various plastics, metal, elastomer (e.g., silicone), and polyacrylamide.
  • Rolonies may be immobilized to the surface of a substrate using a variety of techniques, including covalent and non-covalent attachment.
  • a substrate surface may comprise short oligonucleotides that form complexes, e.g., double-stranded duplexes, with a component (e.g., an adaptor sequence or a portion thereof) of the rolonies.
  • a substrate surface may comprise reactive functionalities that interact with complementary functionalities on the rolonies to form a covalent linkage (chemical attachment). For example, during RCA, modified nucleotides may be used to incorporate moieties such as bromide or thiol that can then be used in a crosslinking reaction.
  • Thiol-modified DNA can be covalently linked to a mercaptosilanized glass via an alkylating reagent such as iodoacetamide.
  • rolonies are immobilized through non-specific interactions with the substrate surface, such as via electrostatic interactions, hydrogen bonding, van der Waals forces, etc.
  • rolonies can be non-specifically, electrostatically deposited onto glass surfaces with polyamine attached.
  • rolonies are deposited onto a solid substrate randomly so that the rolonies on the resulting substrate do not form a defined pattern.
  • rolonies may be confined to discrete regions on a substrate. The discrete regions may be arranged in a pattern, e.g., rectilinear pattern, hexagonal pattern, etc. A regular pattern or array may be advantageous for detection and analysis of sequencing data.
  • rolonies are immobilized on a flow cell.
  • a flow cell is a glass slide containing small fluidic channels, through which polymerases, dNTPs and buffers can be pumped.
  • the glass inside the channels may be dotted with short oligonucleotides complementary to at least a portion of an adaptor sequence of rolonies.
  • Rolonies may be hybridized to these oligonucleotides and thus immobilized onto the flow cell.
  • a flow cell or its fluidic channels may be coated with moieties that non-specifically (i.e., not in a sequence-dependent manner) bind to rolonies.
  • the coating may be uniform on the flow cell surface or its fluidic channel surface or may be patterned with areas capable of binding rolonies separated by those incapable of binding rolonies.
  • the method comprises hybridizing a sequencing primer that is complementary to at least a portion of at least one adaptor.
  • The output of a sequencing reaction is called a “sequence read,” which is a single, uninterrupted series of nucleotides representing the sequence of at least a portion of the rolony.
  • Any suitable sequencing method may be used to determine the sequence of at least a portion of the rolonies produced from the library of circular nucleic acid templates, including for example, sequencing by synthesis, sequencing by ligation, combinatorial probe anchor ligation (cPAL), pyrosequencing, etc. Sequencing by synthesis has been described in U.S. Pat. Nos. 6,210,891; 6,828,100, 6,833,246;
  • the sequencing method comprises sequential sequencing.
  • “sequential sequencing” refers to a sequencing process involving multiple different sequencing primers sequentially used in a sequencing run on the same substrate (e.g., flow cell).
  • An exemplary sequential sequencing method using two different sequencing primers provides: 1) hybridization of the second sequencing primer (Seq 2) to concatemers produced from a library of circular nucleic acid templates; 2) sequencing at least one portion of the concatemer with X cycles following the second sequencing primer, thereby generating a first sequencing fragment; 3) removing the first sequencing fragment in the sequencing instrument; 4) hybridization of the first sequencing primer (Seq 1) to the concatemers produced from the library of circular nucleic acid templates; 5) sequencing at least one portion of the concatemer with Y cycles following the first sequencing primer, thereby generating a second sequencing fragment.
  • X and/or Y is/are more than 2, 3, 4, 5, 6, 7, 8, 9, or 10 cycles.
  • the regions of interest may be sequenced first (see Figures 12, 13). This outcome is a significant difference from single-read sequencing which sequences the sample index and unique molecular identifier first. Sequencing the region of interest first may offer the advantage of high quality signal.
  • the order of the sequencing primers can be changed, e.g., to change the order of what portion of the rolony is sequenced first (e.g., ROI or UMI/sample index).
  • a library of nucleic acid templates constructed with the first adaptor and second adaptor according to the methods provided in the present disclosure may be used for single-end sequencing or paired-end, rolony-based sequencing (see Figure 6).
  • “paired-end sequencing,” also referred to as “pairwise sequencing,” generally refers to obtaining two sequencing “reads” of a template nucleic acid from both ends or strands of a single template nucleic acid.
  • paired-end sequencing may involve obtaining sequencing reads from a top strand rolony and bottom strand rolony produced from a single double stranded template nucleic acid. Paired-end sequencing offers the advantage of improved accuracy and ability to identify indels. There is significantly more information that may be gained from sequencing two stretches each of “N” bases from a single template nucleic acid than from sequencing “N” bases from each of two independent template nucleic acids in a random fashion.
  • the SPE amplification products can be separated into two reactions for the following steps: universal PCR amplification with phosphorylated primers for one specific strand (either top or bottom strand), separate clonal amplifications to generate top strand and bottom strand rolonies.
  • the top strand and bottom strand rolonies can be seeded on the same flow cell, designed with two separate inlets and outlets (see Figure 6, step 3). Sequencing of the top strand rolonies and bottom strand rolonies is performed in separate areas of the flow cell at the same time in the sequencer.
  • the top strand and bottom strand rolonies are seeded on different flow cells, each designed with a single inlet and outlet. Sequencing of the top strand rolonies and bottom strand rolonies is performed in different flow cells at the same time in the sequencer.
  • the present disclosure also provides a set of first adaptors for preparing a library of rolonies for sequencing.
  • the set of first adaptors comprises a plurality of partially double-stranded adaptors, each adaptor of the set comprises:
  • a single-stranded region comprising a first sequencing primer binding site, a unique molecular identifier (UMI), and a sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site;
  • a double stranded linker region of about 20 to about 30 bases for ligation to a double-stranded nucleic acid, wherein the double stranded linker region comprises a second sequencing primer binding site;
  • the present disclosure also provides a plurality of sets of first adaptors for preparing a library of rolonies for sequencing.
  • Each set of first adaptors is as described above wherein the first sample index sequences of different sets are different from each other.
  • Different sets of first adaptors are added to one end of target nucleic acids from different samples or sources.
  • a second adaptor is then added to the other end of the target nucleic acids.
  • the resulting nucleic acids comprising the target nucleic acids flanked by the first and second adaptors may be combined together for amplification (optional), circularization, RCA amplification, and sequencing.
  • the present disclosure provides use of the set or the plurality of sets of partially double-stranded first adaptors in preparing a library of rolonies for sequencing.
  • the present disclosure also provides a kit for preparing a library of rolonies for sequencing comprising one or more of the following: (1) a set of first adaptors; (2) a second adaptor; (3) a first universal primer; (4) a second universal primer; (5) a bridge oligonucleotide; (6) an RCA primer; (7) a first sequencing primer; (8) one or more additional sequencing primers.
  • the kit may further comprise a DNA ligase; a DNA polymerase with or without proofreading activity; a DNA polymerase with strand displacement activity, a DNA polymerase for sequencing, reaction buffers suitable for ligation, primer extension or sequencing, or any combination thereof.
  • kits are typically contained in separate vessels or compartments. However, when appropriate, some of the components may be provided as a mixture or composition. Additional descriptions of the components are provided in other sections, including the Examples, of the present disclosure.
  • In a related aspect, the present disclosure provides use of the kits for preparing a library of rolonies for sequencing.
  • EXAMPLE 1 DESIGN OF OLIGONUCLEOTIDES/ADAPTORS FOR PRODUCTION OF BOTTOM-
  • Table 2 shows exemplary oligonucleotide/adaptor sequences for designing a template nucleic acid and production of a bottom-strand rolony, where the binding site for the sequencing primer for the region of interest (Seq 2) and sequencing primer for the sample index and UMI (Seq 1) are both only present in the first adaptor (Figure 3).
  • Figure 12 shows the corresponding structure of 3D (linear universal amplification product) and 3E (circular nucleic acid template product).
  • the RCA amplification primer is designed based on the universal sequence of the second adaptor.
  • Table 3 shows exemplary oligonucleotide/adaptor sequences for designing a template nucleic acid and production of a bottom-strand rolony where the binding site for the sequencing primer for the ROI (Seq 2) is present in the first adaptor, and the binding site for the sequencing primer for sample index and barcode (UMI) (Seq 1) is created by the junction of the first adaptor and SPE primer upon circularization.
  • Figure 13 shows the corresponding structure of 3D (linear universal amplification product) and 3E (circular nucleic acid template product).
  • the sequencing primer binding site for Seq 1 in the first adaptor can be shorter than the one used in example 1 due to the contribution of additional bases from the second adaptor to the sequencing binding site.
  • the RCA amplification primer is designed based on the linker region of the first adaptor. This RCA amplification primer design can also be applied to Example 1.
  • Table 4 shows exemplary oligonucleotide/adaptor sequences for designing a template nucleic acid and production of a top-strand rolony where the binding site for the sequencing primer for the sample index and barcode (UMI) (Seq 2) is present in the first adaptor, and the binding site for the sequencing primer for the ROI (Seq 1) is created by the junction of the first adaptor and second adaptor upon circularization.
  • Figure 14 shows the corresponding structure of 4D (linear universal amplification product) and 4E (circular template nucleic acid product).
  • the sequencing primer binding site for Seq 2 in the first adaptor can be shorter than the one used in example 1 due to the contribution of additional bases from the second adaptor to the Seq 1 sequencing binding site.
  • the RCA amplification primer is designed based on the linker region of the first adaptor. This RCA amplification primer design can also be applied to Example 1.
  • oligonucleotide/adaptor sequences in this example can be used to generate a top-strand rolony and the corresponding bottom-strand rolony in separate tubes; the two rolonies can then be sequenced in different regions of a flow cell at the same time with different sequencing primers for paired-end sequencing.
  • BC barcode or unique molecular identifier (UMI)


Abstract

The present disclosure provides compositions and methods for adaptor design and nucleic acid library construction for rolony-based sequencing. Also provided are kits for preparing a library of rolonies for sequencing.

Description

COMPOSITIONS AND METHODS FOR ADAPTOR DESIGN AND NUCLEIC ACID LIBRARY CONSTRUCTION FOR ROLONY-BASED SEQUENCING
STATEMENT REGARDING SEQUENCE LISTING
The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is 830109_417WO_SEQUENCE_LISTING.txt. The text file is 7.8 KB, was created on February 27, 2020, and is being submitted electronically via EFS-Web.
BACKGROUND
Next generation sequencing (NGS) has been widely used for the detection and confirmation of genetic changes. Rolonies (rolling circle colonies), which are single stranded DNA concatemers produced by rolling circle amplification of a circularized DNA fragment, offer certain advantages as a template for sequencing, including high image efficiency due to bright signals from hundreds of reaction sites being in the compact rolony, reduced reagent consumption due to the compactness of rolonies allowing for high density arrays, and improved sequencing accuracy. However, current adapter designs and library construction methods for rolony-based sequencing do not accommodate all of the following features: (i) ability to sequence multiple, distinct, separate fragments in the same sequencing run (e.g., target sequence, index sequence, unique molecular identifier sequence); (ii) compatible with use of single primer extension (SPE); (iii) compatible with use of unique molecular identifier; (iv) can be used for paired-end sequencing; and (v) can be used with dual sample indexes.
BRIEF SUMMARY
The present disclosure provides adaptors, kits, and methods for nucleic acid library construction for rolony-based sequencing.
In one aspect, the present disclosure provides a method of producing a library of circular, single-stranded nucleic acid templates, each circular, single-stranded nucleic acid template comprising a strand of a double-stranded target nucleic acid, a strand of a first adaptor or the complement thereof, and a strand of a second adaptor or the complement thereof, the method comprising:
a. providing a plurality of fragments of double-stranded target nucleic acids;
b. adding a first adaptor to a 5’ terminus of a sense strand and to a 3’ terminus of an antisense strand of the plurality of fragments of double-stranded nucleic acids, wherein the first adaptor comprises:
(i) a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site;
(ii) a double stranded linker region of about 15 to about 35 bases for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site;
c. adding a second adaptor to a 3’ terminus of the sense strand and to a 5’ terminus of the antisense strand of the plurality of fragments of double-stranded nucleic acids to produce a library of linear, double-stranded nucleic acid templates, wherein the second adaptor comprises a second universal primer binding site, and wherein the second universal primer binding site comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site;
d. optionally amplifying the library of linear, double-stranded nucleic acid templates with a first universal primer that binds to the first primer binding site and a second universal primer that binds to the second primer binding site;
e. denaturing the library of linear, double-stranded nucleic acid templates to produce a library of linear, single-stranded nucleic acid templates; and
f. circularizing the library of linear, single-stranded nucleic acid templates by adding a bridge oligonucleotide and ligating the first adaptor and second adaptor, thereby producing the library of circular, single-stranded nucleic acid templates.
In another aspect, the present disclosure provides a set of partially double-stranded adaptors for producing a library of circular, single-stranded nucleic acid templates,
wherein the set comprises a plurality of partially double-stranded adaptors; wherein each adaptor of the set comprises:
(i) a single-stranded region comprising a first sequencing primer binding site, a unique molecular identifier (UMI), and a sample index sequence, wherein the first sequencing primer binding site further comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site;
(ii) a double stranded linker region of about 15 to about 35 bases for ligation to a double-stranded nucleic acid, wherein the double stranded linker region comprises a second sequencing primer binding site; and
wherein the plurality of the adaptors are identical to each other except their UMI sequences are different from each other.
In a further aspect, the present disclosure provides a kit for producing a library of circular, single-stranded nucleic acid templates, comprising:
(i) the set of partially double-stranded adaptors provided herein;
(ii) a second adaptor comprising a second universal primer binding site, wherein the second universal primer binding site comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site, and
(iii) the bridge oligonucleotide.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
FIG. 1 shows an exemplary first adaptor scheme, including a linker, molecular barcode (also referred to as unique molecular identifier (UMI)), sample index, and first sequencing primer.
FIG. 2 shows an exemplary second adaptor for rolony-based sequencing, including a target-specific sequence and a second universal primer sequence.
FIG. 3 shows exemplary steps (steps A to F) using a target specific PCR primer during library construction and clonal amplification to generate a bottom-strand rolony. The target specific PCR primer is used to generate a double stranded nucleic acid molecule containing a region of interest (ROI) and 1st and 2nd adaptors. The 5’ end of 1st adaptor of top strand is phosphorylated for circularization and ligation. Rolling circle amplification (RCA) primer hybridizes to single-stranded, circular nucleic acid template (top strand) and is used to generate a bottom-strand rolony having sequencing primer binding sites for Seq 1A primer and Seq 2A primer.
FIG. 4 shows exemplary steps (steps A to F) using a target specific PCR primer during library construction and clonal amplification to generate a top-strand rolony. The target specific PCR primer is used to generate a double stranded nucleic acid molecule containing a region of interest (ROI) and 1st and 2nd adaptors. The 5’ end of 2nd adaptor of bottom strand is phosphorylated for circularization and ligation. RCA primer hybridizes to single-stranded, circular nucleic acid template (bottom strand) and is used to generate a top-strand rolony having sequencing primer binding sites for Seq 1B primer and Seq 2B primer.
FIG. 5 shows an embodiment of library construction and clonal amplification workflow. Steps labeled 3A-3F refer to steps or products depicted in Figure 3 with the corresponding label. Steps labeled 4A-4F refer to steps or products depicted in Figure 4 with the corresponding label.
FIG. 6 shows an embodiment of library construction and clonal amplification for paired-end sequencing. Steps labeled 3A-3F refer to steps or products depicted in Figure 3 with the corresponding label. Steps labeled 4A-4F refer to steps or products depicted in Figure 4 with the corresponding label. Top and bottom rolonies are seeded on the same flow cell, with separate inlets and outlets. Sequencing for each strand is performed in separate areas of the flow cell at the same time in the sequencer.
FIG. 7 shows an embodiment of library construction with a first adaptor and a universal adaptor (second adaptor). The first adaptor and universal adaptor (second adaptor) are joined to a region of interest (ROI) via blunt ligation. The ligation product is amplified using a pair of universal primers, one of which is 5’ phosphorylated. The top strand is circularized for clonal amplification (rolling circle amplification (RCA)) to generate a bottom-strand rolony having sequencing primer binding sites for Seq 1A primer and Seq 2A primer.
FIG. 8 shows an embodiment of library construction with a first adaptor and a universal adaptor (second adaptor). The first adaptor and universal adaptor (second adaptor) are joined to a region of interest (ROI) via blunt ligation. The ligation product is amplified using a pair of universal primers, one of which is 5’ phosphorylated. The bottom strand is circularized for clonal amplification (RCA) to generate a top-strand rolony having sequencing primer binding sites for Seq 1B primer and Seq 2B primer.
FIG. 9 shows an embodiment of library construction compatible for use with dual indices and production of a bottom-strand rolony. The bottom-strand rolony can be sequentially sequenced by sequencing primer Seq 1A (hybridizes to the linker region of the first adaptor) for sequencing ROI, sequencing primer Seq 2A (hybridizes to the first sequencing primer site of the first adaptor) for sequencing first sample index and UMI, and sequencing primer Seq 3A (hybridizes to the 3rd universal primer binding site of the second adaptor) for sequencing the second sample index.
FIG. 10 shows an embodiment of library construction compatible for use with dual indices and production of a top-strand rolony. The top-strand rolony can be sequentially sequenced using sequencing primer Seq 1B (hybridizes to the linker region of the first adaptor) for sequencing the first sample index and UMI, sequencing primer Seq 2B (hybridizes to the bridge oligonucleotide binding site of the 2nd adaptor and optionally a portion of the first sequencing primer site of the first adaptor) for sequencing the second sample index, and sequencing primer Seq 3B (hybridizes to the 3rd universal primer binding site of the second adaptor) for sequencing the ROI.
FIG. 11 shows an exemplary second adaptor comprising a second sample index.
FIG. 12 shows an exemplary library construct as described in Example 1 and depicted in step D of Figure 3 (“3D structure”), which is circularized by a bridge oligonucleotide (see step E of Figure 3), ligated, and amplified by RCA using a RCA amplification primer to produce a bottom strand rolony. The sequencing primers Seq 1 and Seq 2 (corresponding to Seq 1A and Seq 2A in step F of Figure 3, respectively) bind to primer binding sites within the first adaptor sequence. BC = bar code (unique molecular identifier (UMI)). Index = sample index. I = insert sequence or region of interest sequence. Seq 2 = sequencing primer #2 (for sequencing region of interest). Seq 1 = sequencing primer #1 (for sequencing UMI and sample index).
FIG. 13 shows an exemplary library construct as described in Example 2 and depicted in step D of Figure 3 (“3D structure”), which is circularized by a bridge oligonucleotide (see step E of Figure 3) and amplified by RCA using a RCA amplification primer to produce a bottom strand rolony. Sequencing primer Seq 2 (corresponding to Seq 2A in step F of Fig. 3) binds to a primer binding site within the first adaptor sequence (linker region), while sequencing primer Seq 1 (corresponding to Seq 1A in step F of Fig. 3) binds to a primer binding site created by the junction of the first adaptor and second adaptor upon circularization and ligation. BC = bar code (unique molecular identifier (UMI)). Index = sample index. I = insert sequence or region of interest sequence. Seq 2 = sequencing primer #2 (for sequencing region of interest). Seq 1 = sequencing primer #1 (for sequencing UMI and sample index).
FIG. 14 shows an exemplary library construct as described in Example 3 and depicted in step D of Figure 4 (“4D structure”), which is circularized by a bridge oligonucleotide (see step E of Figure 4) and amplified by RCA using a RCA amplification primer to produce a top strand rolony. Sequencing primer Seq 2 (corresponding to Seq 2B in step F of Figure 4) binds to a primer binding site within the first adaptor sequence, while sequencing primer Seq 1 (corresponding to Seq 1B of Figure 4) binds to a primer binding site created by the junction of the first adaptor and SPE primer upon circularization. BC = bar code (unique molecular identifier (UMI)). Index = sample index. I = insert sequence or region of interest sequence. Seq 1 = sequencing primer #1 (for sequencing sample index and UMI). Seq 2 = sequencing primer #2 (for sequencing region of interest).
Fig. 15 shows the sequence of the first adaptor in Table 2.
Fig. 16 shows the sequence of the first adaptor in Table 3.
Fig. 17 shows the sequence of the first adaptor in Table 4.
DETAILED DESCRIPTION
The present disclosure provides adaptor design and nucleic acid library construction for rolony-based sequencing. Specifically, the present disclosure provides inter alia a partially double-stranded adaptor (referred to as “first adaptor” below) for generating circular, single-stranded nucleic acid templates. The first adaptor comprises: (i) a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site; and (ii) a double-stranded linker region of about 15 to about 35 bases for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site. Such an adaptor design, especially the relatively long double-stranded linker region, provides multiple advantages as described in detail below.
The present disclosure also provides inter alia a method for constructing a nucleic acid library for rolony-based sequencing. According to such a method, the first adaptor provided herein may be added to one end of a double-stranded target nucleic acid fragment, while a second adaptor may be added to the other end of the fragment. The second adaptor comprises a second universal primer binding site, which in turn comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site. The resulting target nucleic acid fragment flanked by the first and second adaptor may be optionally amplified, denatured, and circularized in the presence of the bridge oligonucleotide to generate a circular, single-stranded nucleic acid template. Such a template may be further amplified to generate rolonies via rolling circle amplification (RCA) and subsequently sequenced.
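The workflow in the preceding paragraph can be illustrated with a minimal string-level sketch. It is not the disclosed protocol: the class and function names (`LinearTemplate`, `junction`, `bridge_oligo`, `circularize`, `rolling_circle_product`) and the toy sequences in the usage note are hypothetical, and enzymatic details (phosphorylation, ligation efficiency, strand choice) are ignored. The sketch assumes one template strand written 5’ to 3’ as first adaptor, target, second adaptor, and shows how a bridge oligonucleotide complementary to the two adaptor ends brings them together, and how RCA yields a concatemer complementary to the circle.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class LinearTemplate:
    """One strand of a linear template, written 5' to 3' (hypothetical model)."""
    first_adaptor: str
    target: str
    second_adaptor: str

    def sequence(self) -> str:
        return self.first_adaptor + self.target + self.second_adaptor


def junction(template: LinearTemplate, n_second: int, n_first: int) -> str:
    """Sequence formed when the 3' end of the second adaptor is joined to the
    5' end of the first adaptor on circularization."""
    if n_second <= 0 or n_first <= 0:
        raise ValueError("both junction arms must be at least 1 base long")
    return template.second_adaptor[-n_second:] + template.first_adaptor[:n_first]


def bridge_oligo(template: LinearTemplate, n_second: int, n_first: int) -> str:
    """Bridge oligonucleotide modelled as the reverse complement of the junction,
    so that it anneals across both adaptor ends."""
    return reverse_complement(junction(template, n_second, n_first))


def circularize(template: LinearTemplate) -> str:
    """Represent the circle as a linear string whose last base is understood
    to be ligated to its first base."""
    return template.sequence()


def rolling_circle_product(circle: str, copies: int) -> str:
    """Concatemer made by copying the circle `copies` times; the product is
    complementary to the circle, so each unit is its reverse complement."""
    return reverse_complement(circle) * copies
```

With toy inputs such as `LinearTemplate("AACCGGTT", "ATGCATGC", "GGATCC")`, the junction only appears once the circle is read across its wrap-around point (for example by searching `circle + circle`), which mirrors the idea that a primer binding site can be created by joining the two adaptors on circularization.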
The libraries of nucleic acid templates constructed according to the methods disclosed herein have one or more of the following characteristics:
i) possessing an asymmetric structure (target nucleic acid interposed between a longer first adaptor and a shorter second adaptor),
ii) being multifunctional, which permits sequencing of multiple, distinct, separate fragments by for example sequential sequencing, thus minimizing signal loss with increased sequencing cycles,
iii) being compatible with single primer extension,
iv) being compatible with use of unique molecular identifiers,
v) being compatible with dual sample indices,
vi) being able to be used for paired-end sequencing based on the rolony formation from different strands of double-stranded library input,
vii) avoiding the sequencing of the low diversity region (linker region of the first adaptor) and shortening the refocus frequency and turnaround time of image-based focusing sequencer,
viii) allowing sequencing the target nucleic acid first rather than the UMI and sample index, thus providing the feasibility to sequence the regions of interest using the cycles with the lowest phasing and highest quality,
ix) allowing flexible universal primer and bridge oligonucleotide design, and
x) improving the ligation efficiency and consistency by for example including a relatively long linker region in designing adaptors.
The libraries of nucleic acid templates constructed according to the method disclosed herein may be used in sequencing target nucleic acids useful in diagnosing and monitoring diseases (e.g., cancers), characterizing diseases (e.g., responsiveness to particular treatments), and other areas where obtaining target nucleic acid sequences is desirable.
In the following description, any ranges provided herein include all the values in the ranges. It should also be noted that the term “or” is generally employed in its sense to include “and/or” (i.e., to mean either one, both, or any combination thereof of the alternatives) unless the content dictates otherwise. Also, as used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content dictates otherwise. The terms “include,” “have,” “comprise” and their variants are used synonymously and to be construed as non-limiting. The term “about” refers to ± 10% of a reference value. For example, “about 50°C” refers to “50°C ± 5°C” (i.e., 50°C ± 10% of 50°C).
A. Target Nucleic Acids and Template Nucleic Acids
The term “nucleic acid,” “nucleic acids,” or “polynucleotide” as used herein refers to a polymer comprising ribonucleosides or deoxyribonucleosides that are covalently bonded typically by phosphodiester linkages between subunits. Nucleic acids include DNA and RNA. DNA includes, but is not limited to, genomic DNA, linear DNA, circular DNA, plasmid DNA, cDNA, cell free DNA (e.g., tumor derived or fetal DNA). RNA includes but is not limited to hnRNA, mRNA, noncoding RNA, cell free RNA (e.g., tumor derived RNA). Non coding RNA includes but is not limited to rRNA, tRNA, lncRNA (long non coding RNA), lincRNA (long intergenic non coding RNA), miRNA, and siRNA.
A “target nucleic acid,” also referred to as “target sequence,” “region of interest” (ROI), or “insert sequence,” refers to a nucleic acid molecule of interest. A target nucleic acid may be from any source, such as a cell sample, tissue sample, fluid sample, or organism from a plant, animal, virus, bacteria, fungus, parasite, insect, mammal, bird, reptile, amphibian, or human, or a forensic sample or environmental sample. Exemplary samples include whole blood, blood products, plasma, serum, red blood cells, white blood cells, buffy coat, urine, sputum, saliva, semen, lymphatic fluid, amniotic fluid, cerebrospinal fluid, peritoneal effusions, pleural effusions, fluid from cysts, synovial fluid, vitreous humor, aqueous humor, bursa fluid, eye washes, eye aspirates, pulmonary lavage, bone marrow aspirates, lung aspirates, biopsy samples, swab samples, animal (including human) or plant tissues, including but not limited to samples from liver, spleen, kidney, lung, intestine, brain, heart, muscle, pancreas, cell cultures, lysates, extracts, or materials and fractions obtained from the samples described above or any cells and microorganisms and viruses that may be present on or in a sample and the like. A target nucleic acid may be a naturally occurring sequence (e.g., DNA, genomic DNA (gDNA), cDNA, mitochondrial DNA, cell free DNA (cfDNA), RNA, mRNA, rRNA, tRNA, cfRNA, long non-coding RNA, microRNA), artificial sequence, or a combination thereof. A target nucleic acid may be from a gene, a regulatory element, a non-coding sequence, or a combination thereof. A target nucleic acid may be single-stranded or double-stranded.
A target nucleic acid may be obtained or isolated directly from a sample, or a product of a fragmentation reaction, a reverse transcription reaction, an amplification reaction, and the like, of nucleic acids obtained from a sample. Target nucleic acids can be isolated from a sample according to methods known in the art to provide a nucleic acid sample (e.g., DNA, RNA).
A target nucleic acid may be of any appropriate length. In certain embodiments, a target nucleic acid may have a length in a particular size range, for example, about 50 to about 2,000 nucleotides, about 50 to about 1,000 nucleotides, about 50 to about 750 nucleotides, about 50 to about 600 nucleotides, about 50 to about 500 nucleotides, about 50 to about 400 nucleotides, about 50 to about 300 nucleotides, about 50 to about 200 nucleotides, about 100 to about 2,000 nucleotides, about 100 to about 1,000 nucleotides, about 100 to about 750 nucleotides, about 100 to about 600 nucleotides, about 100 to about 500 nucleotides, about 100 to about 400 nucleotides, about 100 to about 300 nucleotides, about 100 to about 200 nucleotides, about 150 to about 2,000 nucleotides, about 150 to about 1,000 nucleotides, about 150 to about 750 nucleotides, about 150 to about 600 nucleotides, about 150 to about 500 nucleotides, about 150 to about 400 nucleotides, about 150 to about 300 nucleotides, or about 150 to about 200 nucleotides in length. Preferably, a target nucleic acid may have a length in the range of about 30 to 400 nucleotides. In a library of nucleic acid templates, each comprising a target nucleic acid sequence, the members of the library may have similar lengths, e.g., within a specific length range. The optimal target nucleic acid size for the library is determined by a number of factors, including sequencing application (e.g., de novo sequencing vs. re-sequencing) and selected next generation sequencing platform.
In certain embodiments, target nucleic acids (e.g., genomic DNA, RNA, or cDNA) are fragmented. Fragmenting nucleic acids may be performed physically, enzymatically, or chemically from larger nucleic acids to a desired size range. Physical fragmentation includes acoustic shearing, sonication, and hydrodynamic shearing. Enzymatic fragmentation may use an endonuclease that cleaves target nucleic acids into small fragments with 5’ phosphate and 3’ hydroxyl groups. Chemical fragmentation may be accomplished using heat or divalent metal cation (e.g., magnesium or zinc). In certain embodiments, target nucleic acids are subjected to size selection to obtain target nucleic acids within a defined or desired size range.
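Fragmentation followed by size selection can be pictured with a minimal in-silico sketch. The functions `fragment` and `size_select`, the explicit cut positions, and the length bounds used in the usage note are hypothetical; real fragmentation is stochastic and size selection is performed physically (for example on gels or beads), which this sketch does not attempt to model.

```python
from typing import Iterable, List


def fragment(seq: str, cut_sites: Iterable[int]) -> List[str]:
    """Split `seq` at the given 0-based cut positions and return the pieces
    in order; empty pieces (from duplicate or terminal cuts) are dropped."""
    bounds = [0, *sorted(cut_sites), len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: Iterable[str], min_len: int, max_len: int) -> List[str]:
    """Keep only fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]
```

For example, calling `size_select(fragment(genome, cuts), 50, 400)` on a hypothetical `genome` string would retain only pieces of 50 to 400 nucleotides, one of the size ranges listed above.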
A “nucleic acid template” refers to a nucleic acid construct that comprises a target nucleic acid flanked by a “first adaptor” and a “second adaptor.” A first adaptor refers to an adaptor sequence 5’ to the target nucleic acid, and a second adaptor refers to an adaptor 3’ to the target nucleic acid. In embodiments involving a double stranded target nucleic acid, a first adaptor refers to an adaptor sequence 5’ to one strand (e.g., the sense strand) of the target nucleic acid and a second adaptor refers to an adaptor sequence 3’ to the strand of the target nucleic acid. The sense strand of a double-stranded target nucleic acid may be any of the two strands of the target nucleic acid. The antisense strand of the target nucleic acid is the strand other than the sense strand. A nucleic acid template may be linear or circular. A nucleic acid template may be single stranded or double stranded. In certain embodiments, the target nucleic acid is directly adjacent to a first adaptor, a second adaptor, or both the first adaptor and second adaptor. In certain embodiments, additional bases (e.g., 1, 2 or more bases) are present between the target nucleic acid and the first adaptor, between the target nucleic acid and the second adaptor, or both. In certain embodiments, a nucleic acid template is a member of a library of nucleic acid templates. In certain embodiments, a nucleic acid template is DNA.
B. Adaptors
An “adaptor” refers to an engineered nucleic acid that is added to each end of a target nucleic acid to produce a nucleic acid template for sequencing. An adaptor may comprise a subsequence for a particular function, e.g., library construction, library amplification, immobilization on a substrate, sequencing of nucleic acid templates, or any combination thereof. For example, an adaptor may comprise a restriction endonuclease recognition site, primer binding site for amplification during library construction (e.g., universal primer, target specific primer, single primer extension primer), binding site for a bridge oligonucleotide for circularization of a template nucleic acid, binding site for immobilizing a template nucleic acid on a substrate, primer binding site for sequencing (e.g., primer binding site for sequencing by synthesis methods or probe binding site for combinatorial probe anchor ligation (cPAL) methods), sample index sequence, unique molecular identifier (UMI) sequence, or any combination thereof. An adaptor may comprise multiple, functionally distinct subsequences. Functionally distinct subsequences may be completely overlapping, partially overlapping, or non-overlapping within an adaptor. For example, in the first adaptor shown in Table 2, bases 1-26 at the 5’ end of the top strand (i.e., 5’ CTC ACA CTC ACC ACG TCG GCT CGC AG) are the sequence of a first sequencing primer (Seq 2) (SEQ ID NO: 10); bases 1-17 at the 5’ end of the top strand (i.e., 5’ CTC ACA CTC ACC ACG TC) are the sequence of a first universal primer (universal primer 1) (SEQ ID NO: 3); bases 1-10 at the 5’ end of the top strand (i.e., 5’ CTC ACA CTC A) (SEQ ID NO: 36) are a portion of a bridge oligonucleotide binding site; and the 26 bases at the 3’ terminus of the top strand (not including the T-overhang) (i.e., CTC ACT CGT CAC AGC ACC TCC TCC GC) are a ligation linker sequence and the sequence of a second sequencing primer (Seq 1) (SEQ ID NO: 9).
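The nesting of functional subsequences in the Table 2 example can be checked directly from the sequences quoted above. The following is a minimal sketch only; the variable and function names are illustrative and are not part of the disclosure.

```python
# Sequences as quoted in the text (top strand of the first adaptor of Table 2),
# with the grouping spaces removed.
SEQ2_PRIMER = "CTCACACTCACCACGTCGGCTCGCAG"  # bases 1-26, first sequencing primer (Seq 2)
UNIVERSAL_PRIMER_1 = "CTCACACTCACCACGTC"    # bases 1-17, universal primer 1
BRIDGE_PORTION = "CTCACACTCA"               # bases 1-10, portion of bridge binding site


def nested_at_5prime(outer: str, inner: str) -> bool:
    """Return True if `inner` occupies bases 1..len(inner) of `outer`."""
    return outer.startswith(inner)


if __name__ == "__main__":
    print(nested_at_5prime(SEQ2_PRIMER, UNIVERSAL_PRIMER_1))
    print(nested_at_5prime(UNIVERSAL_PRIMER_1, BRIDGE_PORTION))
```

Under this reading, the bridge oligonucleotide binding portion lies within the universal primer sequence, which in turn lies within the sequencing primer sequence, so one stretch of the adaptor serves three functions.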
An adaptor may be single-stranded, double-stranded, or partially double-stranded. The length of a single-stranded or double-stranded adaptor may vary depending upon the particular sequencing platform selected and intended use, but may range from about 3 nucleotides to about 200 nucleotides, from about 5 nucleotides to about 150 nucleotides, from about 10 nucleotides to about 100 nucleotides, from about 15 nucleotides to about 100 nucleotides, from about 20 nucleotides to about 100 nucleotides, from about 40 nucleotides to about 100 nucleotides, from about 5 nucleotides to about 80 nucleotides, from about 10 nucleotides to about 80 nucleotides, or from about 15 nucleotides to about 80 nucleotides. Preferably, the adaptor length is 15-100 nucleotides. For a partially double-stranded adaptor, one of the strands may have a length as described above for a single-stranded or double-stranded adaptor. In certain embodiments, an adaptor may comprise one or more modified nucleotides, e.g., having modifications to the nitrogenous base, 5-carbon sugar, phosphate moiety, or any combination thereof.
As used herein, a “primer binding site” or “primer binding sequence” refers to a sequence to which a primer (or oligonucleotide) specifically binds. Primer binding sequences are of sufficient length to allow hybridization of a primer. In certain embodiments, the primer or a portion thereof is completely complementary to the primer binding sequence. In certain other embodiments, the primer or a portion thereof is substantially complementary to the primer binding site, that is, at least 90% of the nucleotides of the primer or the portion thereof are complementary to the nucleotides of the primer binding site. In certain embodiments, a primer binding site is at least about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides long and/or at most about 60, 55, 50, 45, 40, 35, or 30 nucleotides long. In embodiments where an adaptor comprises two or more primer binding sites, the two or more primer binding sites may be overlapping, partially overlapping, or non-overlapping. In embodiments wherein the two or more primer binding sites are non-overlapping in the same adaptor, they may be immediately adjacent to each other or separated by one or more nucleotides (e.g., about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides) and/or about 40, 35, or 30 or fewer nucleotides.
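The positional relationships between two primer binding sites described above can be sketched as a small classifier. This is an illustration only: sites are given as hypothetical 1-based inclusive coordinates within an adaptor, and "completely overlapping" is assumed here to mean that one site lies entirely within the other.

```python
from typing import Tuple

Site = Tuple[int, int]  # (start, end), 1-based inclusive positions in the adaptor


def classify_sites(a: Site, b: Site) -> str:
    """Classify how two binding sites are arranged relative to each other."""
    first, second = sorted([a, b])
    if second[0] > first[1]:
        gap = second[0] - first[1] - 1  # nucleotides between the two sites
        if gap == 0:
            return "non-overlapping (adjacent)"
        return f"non-overlapping (separated by {gap} nt)"
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    if a_in_b or b_in_a:
        return "completely overlapping"
    return "partially overlapping"


if __name__ == "__main__":
    # Bases 1-26 and bases 1-17 of the same strand, as in the Table 2 example.
    print(classify_sites((1, 26), (1, 17)))
```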
As used herein, a “sample index,” also referred to as “index” or “index sequence,” refers to a component of an adaptor comprising a unique combination of bases that identifies template nucleic acids belonging to a common library or sample. The use of sample indexes in template nucleic acids allows for multiplexing, e.g., sequencing of multiple different libraries or multiple different samples in a single reaction. In some embodiments, an index sequence can be used to orientate a sequence imager for purposes of detecting individual sequencing reactions. In certain embodiments, an index sequence is about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long. An index sequence may be from about 2 nucleotides to about 25 nucleotides in length, from about 5 nucleotides to about 20 nucleotides in length, or from about 8 nucleotides to about 15 nucleotides in length. In an embodiment, a template nucleic acid comprises a single sample index. Sample multiplexing has the inherent risk of index mis-assignment (cross-talk), which occurs when a sequence read derived from one sample in a pool of samples is incorrectly matched to a sample index from a different sample in the pool of samples. Index cross-talk can be introduced by a variety of mechanisms. Dual sample indices (dual indices) may minimize the incidence of index cross-talk and improve sequencing accuracy and sensitivity. The use of dual indices may also increase multiplexing capability by combination of the two indices. In an embodiment, a template nucleic acid comprises dual sample indices.
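How dual indices can both expand multiplexing capacity and flag possible cross-talk may be illustrated with a short sketch. The index sequences, sample names, and function names below are hypothetical, and exact-match lookup is a simplifying assumption (it does not tolerate sequencing errors in the index reads).

```python
from typing import Dict, Optional, Tuple

# Hypothetical sample sheet: (first index, second index) -> sample name.
SAMPLE_SHEET: Dict[Tuple[str, str], str] = {
    ("ACGTACGT", "TTGCAAGC"): "sample_A",
    ("GGATCCAA", "CAGTTGAC"): "sample_B",
}


def assign_sample(index1: str, index2: str,
                  sheet: Dict[Tuple[str, str], str] = SAMPLE_SHEET) -> Optional[str]:
    """Return the sample for an expected index pair, or None for an unexpected pair.

    A read carrying two valid indices in an unexpected combination is a
    candidate for index cross-talk and is left unassigned.
    """
    return sheet.get((index1, index2))


def max_dual_index_combinations(n_first: int, n_second: int) -> int:
    """Number of distinct index pairs available from two index sets."""
    return n_first * n_second


if __name__ == "__main__":
    print(assign_sample("ACGTACGT", "TTGCAAGC"))  # expected pair
    print(assign_sample("ACGTACGT", "CAGTTGAC"))  # mixed pair, left unassigned
    print(max_dual_index_combinations(8, 12))
```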
As used herein, a “unique molecular identifier” (UMI), also referred to as “bar code” or “molecular bar code,” refers to a component of an adaptor comprising a unique combination of bases that is used to identify unique nucleic acid molecules. A UMI may be used to identify PCR duplicates derived from the same nucleic acid molecule that were generated during library amplification. Thus, a UMI may be used to de-duplicate sequencing reads derived from a single molecule. In certain embodiments, a UMI is about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long. A UMI may be from about 2 nucleotides to about 25 nucleotides in length, from about 5 nucleotides to about 20 nucleotides in length, or from about 8 nucleotides to about 15 nucleotides in length.
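De-duplication by UMI can be sketched as grouping reads that share both a UMI and a target sequence. This is a simplified illustration under stated assumptions: reads are represented as hypothetical (UMI, target) tuples, matching is exact, and a practical pipeline would typically group by mapped position and tolerate sequencing errors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str]  # (UMI sequence, target sequence or locus label)


def collapse_duplicates(reads: Iterable[Read]) -> Dict[Read, int]:
    """Group reads into families; each family is taken to derive from one molecule.

    Returns a mapping from (UMI, target) to the number of reads in that family.
    """
    families: Dict[Read, int] = defaultdict(int)
    for umi, target in reads:
        families[(umi, target)] += 1
    return dict(families)


if __name__ == "__main__":
    reads = [
        ("AACCGGTT", "locus_1"),
        ("AACCGGTT", "locus_1"),  # PCR duplicate of the first read
        ("TTGGCCAA", "locus_1"),  # same locus, different original molecule
    ]
    print(collapse_duplicates(reads))
```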
A UMI is designed to have between 2 and 15 degenerate base positions, but preferably has between 6 and 12 base positions. A “degenerate base position” is a base position that more than 1 nucleotide (e.g., 2, 3, or 4 different nucleotides) may occupy. In certain embodiments, a UMI is designed to assign a completely unique sequence tag to each target nucleic acid molecule. In certain other embodiments, a UMI is not designed to assign a completely unique sequence tag to each molecule, but rather is designed to have a low probability of assigning any given sequence tag to a particular molecule. The greater the number of possible UMI sequences, the lower the probability of any particular sequence being assigned to a molecule. When many target nucleic acid molecules are copied and tagged, the same UMI sequence can be assigned to more than one template molecule. UMI sequences are used to track the lineage of molecules from initial copying through amplification, processing and sequencing. They can be used to distinguish sequences that arise from polymerase misincorporations or sequencer errors from sequences that are derived from true mutant template molecules. UMIs can also be used to distinguish sequences that have the wrong sample index assignment as a result of cross-over of sample indices during pooled amplification. Because the same UMI sequence can be assigned to more than one target nucleic acid molecule, meaningful analysis of UMI sequences requires first identifying target nucleic acid sequences (e.g., nucleic acid variants) and then analyzing the distribution of UMI sequences associated with those target nucleic acid sequences. The number of different UMIs in a first adaptor may be at least 100, 1,000, 5,000, 100,000, 500,000, 1,000,000, or 5,000,000.
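The relationship between the number of degenerate positions and the chance that two molecules receive the same UMI can be made concrete with a short calculation. This sketch assumes every degenerate position admits the same number of bases and that UMIs are assigned uniformly and independently, which real adaptor pools only approximate; the function names are illustrative.

```python
def umi_diversity(degenerate_positions: int, bases_per_position: int = 4) -> int:
    """Number of possible UMI sequences for the given number of degenerate positions."""
    return bases_per_position ** degenerate_positions


def prob_shared_umi(n_molecules: int, diversity: int) -> float:
    """Probability that at least two of n molecules carry the same UMI.

    Assumes uniform, independent UMI assignment (an idealization).
    """
    if n_molecules > diversity:
        return 1.0
    p_all_unique = 1.0
    for k in range(n_molecules):
        p_all_unique *= (diversity - k) / diversity
    return 1.0 - p_all_unique


if __name__ == "__main__":
    for positions in (6, 12):
        d = umi_diversity(positions)
        print(positions, d, prob_shared_umi(100, d))
```

With fully degenerate positions, 6 positions give 4,096 possible UMIs and 12 positions give 16,777,216, so the preferred range spans roughly four orders of magnitude in diversity.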
As disclosed above, nucleic acid templates are flanked by a first adaptor and a second adaptor. A first adaptor comprises from 5’ to 3’: a single-stranded region comprising a first sequencing primer binding site and a first sample index sequence, wherein the first sequencing primer binding site further comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site; and a double-stranded linker region of about 20 nucleotides to about 30 nucleotides for ligation, wherein the double stranded linker region further comprises a second sequencing primer binding site (Figure 1). The double-stranded linker region of the first adaptor is designed for dual purposes: direct ligation of the first adaptor to double stranded target nucleic acids and to provide a sequencing primer binding site for sequencing the target nucleic acid (region of interest) (see, e.g., Figure 3, bottom-strand rolony and Figure 4, top-strand rolony). Preferably, the UMI, sample index, or any portion thereof is not contained within the double-stranded linker region of the first adaptor. In certain embodiments, the first adaptor may further comprise a UMI between the first sequencing primer binding site and the first sample index sequence or between the first sample index sequence and the double-stranded linker region. The first sequencing primer binding site of the first adaptor is designed for multiple purposes: to provide a sequencing primer binding site for sequencing the UMI and sample index (see, e.g., Figure 3, bottom-strand rolony); to provide a universal primer binding site for library enrichment; to provide a first portion of a bridge oligonucleotide binding site for circularization; and optionally to provide a sequencing primer binding site for sequencing the region of interest (see, e.g., Figure 4, top-strand rolony).
As used herein, “linker” or “linker region” generally refers to the double-stranded nucleic acid sequence that is part of an adaptor and directly ligated with a target nucleic acid. In some embodiments, the first adaptors present in one or more libraries of nucleic acid templates comprise a shared or common linker region sequence. In certain embodiments, the double-stranded linker region is about 15 to about 35 nucleotides in length, such as 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length, preferably 20-30 nucleotides in length. The double-stranded linker region of the first adaptor may be formed by annealing the first adaptor’s two complementary strands of different lengths that possess a complementary linker region. In some embodiments, it may be advantageous for the double-stranded linker region of the first adaptor to be as short as possible without loss of function. “Function” in this context means that the double-stranded linker region forms a stable duplex under standard reaction conditions for an enzyme-catalyzed nucleic acid ligation reaction (e.g., incubation at a temperature ranging from about 4° C to about 40° C in a ligation buffer appropriate for the enzyme), such that the two strands forming the double-stranded linker region of the first adaptor remain partially annealed during ligation of the first adaptor to a target nucleic acid.
In some embodiments, it may be advantageous for the double-stranded linker region of the first adaptor to be of sufficient length and have a certain percent of GC content to reach the desired Tm for sequencing primer hybridization on the selected sequencing instrument. The Tm requirement depends on the sequencing temperature, which is defined by the enzymes and buffers utilized during sequencing. For example, the double-stranded linker region of the first adaptor can be about 20-30 nucleotides in length, with about 50-80% GC content, with Tm more than 60°C in the sequencing buffer. The relatively lengthy double-stranded linker region can also improve adaptor structure uniformity during the annealing process of adaptor production/manufacturing, which can improve ligation efficiency. Specifically, if the double-stranded linker region is shortened, different sample index sequences or different UMI sequences of the first adaptors may require different optimal conditions for ligating the first adaptors to target nucleic acids.
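The example length and GC-content ranges given above can be checked against a candidate linker sequence as follows. This is a minimal sketch: the function names are illustrative, the sequence used is the 26-base ligation linker quoted earlier for the first adaptor of Table 2, and Tm in the sequencing buffer is not computed here because it depends on the buffer and instrument.

```python
LINKER = "CTCACTCGTCACAGCACCTCCTCCGC"  # 26-base ligation linker quoted in the text


def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def within_example_ranges(seq: str,
                          min_len: int = 20, max_len: int = 30,
                          min_gc: float = 0.50, max_gc: float = 0.80) -> bool:
    """Check a linker against the example ranges (20-30 nt, 50-80% GC)."""
    return min_len <= len(seq) <= max_len and min_gc <= gc_fraction(seq) <= max_gc


if __name__ == "__main__":
    print(len(LINKER), round(100 * gc_fraction(LINKER), 1), within_example_ranges(LINKER))
```

The quoted linker contains 17 G or C bases out of 26, about 65% GC, which falls inside both example ranges.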
In some embodiments, it may be advantageous for the first sequencing binding site of the first adaptor to be of sufficient length and have a certain percent of GC content sufficient to reach the desired Tm for sequencing primer hybridization on the selected sequencing instrument (see, e.g., Example 1).
In some embodiments, the length and GC content of the first sequencing binding site of the first adaptor can be reduced, because a portion of the first sequencing binding site is provided by the second universal primer binding site of the second adaptor following circularization of the template nucleic acid (see Example 2 and Figure 12). For example, the first sequencing binding site of the first adaptor can be about 10-20 nucleotides in length, with about 30-80% GC content. In some embodiments, modified nucleotides (e.g., having modifications to the nitrogenous base, 5-carbon sugar, phosphate moiety, or any combination thereof), spacers, or both are incorporated into the first adaptor to improve system working performance, automation and surface fixation. Examples of spacer modifications include C3 spacer, C6 spacer, C12 spacer, spacer 9, spacer 18 (hexaethyleneglycol), dSpacer (abasic furan), ribospacer rSpacer, PC spacer, and hexanediol.
As noted above, the first sequencing primer binding site of the first adaptor further comprises a first portion of a bridge oligonucleotide binding site. As used herein, “bridge oligonucleotide,” also known as “guide oligonucleotide,” refers to a nucleic acid sequence designed for circularization of linear, single-stranded nucleic acid templates. The bridge oligonucleotide comprises a sequence complementary to the 5’ end and 3’ end of the two flanking adaptors. The 5’ end and 3’ end of the single-stranded nucleic acid template hybridize to the bridge oligonucleotide, which brings the 5’ end and 3’ end of the single-stranded nucleic acid template in close proximity for ligation. Preferably, the 5’ end of the single-stranded nucleic acid template is phosphorylated prior to the ligation reaction to enhance ligation efficiency (see, e.g., Figures 3, 4 and 7-10).
A second adaptor comprises a second universal primer binding site, wherein the second universal primer binding site in turn comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site. In some embodiments, the second adaptor is single-stranded. The second universal primer binding site is designed for multiple purposes: to provide a universal primer binding site for library enrichment; to provide a second portion of a bridge oligonucleotide binding site for circularization; and optionally to provide a third sequencing primer binding site for sequencing the target nucleic acid.
In some embodiments, it may be advantageous for the third sequencing binding site of the second adaptor to be of sufficient length and have a certain percent of GC content sufficient to reach the desired Tm for sequencing primer hybridization on the selected sequencing instrument. In some embodiments, the length and GC content of the third sequencing binding site of the second adaptor can be reduced, because part of the third sequencing binding site is provided by the first universal primer binding site of the first adaptor following circularization of the template nucleic acid (see Figures 13 and 14). For example, the third sequencing binding site of the second adaptor can be about 10-20 nucleotides in length, with about 30-80% GC content.
In certain embodiments, the second adaptor further comprises a target-specific sequence 5’ to the second universal primer binding site (see “Target specific PCR primer” in Figure 2). The presence of the target-specific sequence in the second adaptor allows target enrichment via PCR using a first universal primer and the target specific PCR primer.
In certain embodiments, a second adaptor comprises from 5’ to 3’: a second portion of a bridge oligonucleotide binding site; a second sample index; a third universal primer binding site; and a target nucleic acid specific sequence, wherein the bridge oligonucleotide binding site further comprises a fourth sequencing primer binding site (for sequencing sample index) and the third universal primer binding site further comprises a fifth sequencing primer binding site (for sequencing ROI) (see Figure 11). In embodiments using dual sample indices, a second adaptor may comprise a third sequencing primer binding site, a second sample index, a second universal primer binding site, and a target-nucleic acid specific sequence, wherein the third sequencing primer binding site further comprises a second portion of the bridge oligonucleotide binding site (Figures 9-11).
In certain other embodiments, a second adaptor is a universal adaptor that comprises a second universal primer binding site without any target nucleic acid specific sequence (see Figures 7 and 8). Such a second adaptor may be useful in whole genome sequencing or other assays that do not require target enrichment. An exemplary universal adaptor is as follows:
GTAAAACGACGGCCAGTCAAGCTATGGAACACCACGTCCA (SEQ ID NO: 34)
CATTTTGCTGCCGGTCAGTTCGATACCTTGTGGTGCAGGT (SEQ ID NO: 35)
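The two strands of the exemplary universal adaptor can be checked for Watson-Crick pairing in the register in which they are printed. This sketch assumes the second strand (SEQ ID NO: 35) is displayed 3’ to 5’ beneath the first; that orientation is inferred from the base-by-base pairing and is not stated explicitly here. Function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

TOP = "GTAAAACGACGGCCAGTCAAGCTATGGAACACCACGTCCA"     # SEQ ID NO: 34, as printed
BOTTOM = "CATTTTGCTGCCGGTCAGTTCGATACCTTGTGGTGCAGGT"  # SEQ ID NO: 35, as printed


def pairs_in_register(top: str, bottom: str) -> bool:
    """True if every base of `top` pairs with the base printed beneath it."""
    return len(top) == len(bottom) and all(PAIR[t] == b for t, b in zip(top, bottom))


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


if __name__ == "__main__":
    print(len(TOP), pairs_in_register(TOP, BOTTOM))
    print(reverse_complement(TOP))  # bottom strand rewritten 5' to 3'
```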
An adaptor may be added to a target nucleic acid using a variety of methods including enzymatic ligation (blunt-end ligation, sticky-end ligation), chemical ligation, or primer extension. For example, the first adaptor is preferably added to a target nucleic acid via ligation (see, e.g., Figures 3, 4, and 7-10). In certain embodiments where a second adaptor comprises a target-specific sequence, it is preferably added to a target nucleic acid via primer extension using the second adaptor as a primer (see, e.g., Figures 3 and 4). An adaptor may be added to a target nucleic acid in whole (see, e.g., Figures 3, 4, 7 and 8) or in phases where adjacent or overlapping pieces are assembled (see, e.g., Figures 9 and 10 wherein the second adaptor is added via target-enrichment PCR amplification and universal PCR amplification).
Exemplary adaptors that can be used according to the methods of the present disclosure are provided in Tables 2-4.
C. Primers
As used herein, the term “primer” refers to an oligonucleotide that is complementary to a primer binding site of a template nucleic acid and capable of being extended using the template nucleic acid as a template. A primer may have about 10 to about 100 nucleotides in length, about 12 to about 80 nucleotides in length, or about 15 to about 50 nucleotides in length. In certain embodiments, a primer may have about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides in length.
A primer may comprise DNA, RNA, one or more modified nucleotides that contain modifications to the nitrogenous base, 5-carbon sugar, and/or phosphate moieties, or a combination thereof. Examples of modified nucleotides include nucleotides comprising 2’-O-methylribose, 5-hydroxybutynyl-2’-deoxyuridine (Integrated DNA Technologies), 2-amino-2’-deoxyadenosine (IBA Lifesciences), 5-methyl-2’-deoxycytidine (IBA Lifesciences), locked nucleic acids (LNA), peptide nucleic acid, and phosphorodiamidate morpholinos.
As used herein, “complementary” and “complementarity” refer to polynucleotides (i.e., a sequence of nucleotides) related by Watson-Crick base-pairing rules. For example, the sequence “A-G-T” is complementary to the sequence “T-C-A.” Complementarity may be “partial,” in which only some of the nucleic acids’ bases are matched according to the base pairing rules. Or, there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. Two sequences are described as “complementary” to one another when hybridization occurs in an antiparallel configuration.
A primer “specifically hybridizes” or “specifically binds” to a primer binding site if the primer hybridizes to the target under reaction conditions for which the primer is used (e.g., amplification conditions, primer extension conditions, and sequencing reaction conditions) with a Tm substantially greater than 45°C, preferably at least 50°C, and typically 60°C-80°C or higher. Such hybridization preferably corresponds to stringent hybridization conditions. Again, such hybridization may occur with “near” or “substantial” complementarity of the antisense oligomer to the target sequence, as well as with exact complementarity.
The melting temperature (Tm) of an oligonucleotide used in the present disclosure is the temperature at which 50% of the oligonucleotide is duplexed with its perfect complement and 50% is free in a solution, such as 115 mM KCl. Tm is determined by measuring the absorbance change of the oligonucleotide with its complement as a function of temperature (i.e., generation of a melting curve). The Tm is the reading halfway between the double-stranded DNA and single stranded DNA plateaus in the melting curve. Factors influencing Tm include the length of the oligonucleotide molecule, the specific sequence of the oligonucleotide, and buffer components. Alternatively, the Tm of an oligonucleotide (e.g., an oligonucleotide that is 14-20 nucleotides in length) may be calculated based on the following formula:
Tm = 2 °C(A + T) + 4 °C(G + C)
The above formula assigns 2°C to each A-T pair and 4°C to each G-C pair. The Tm then is the sum of these values for all individual pairs in a DNA double strand.
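The formula above can be applied as in the following minimal sketch. The function name is illustrative; the example uses the 17-base universal primer 1 sequence quoted earlier for the first adaptor of Table 2, which falls within the 14-20 nucleotide range mentioned for this estimate.

```python
def estimate_tm(seq: str) -> int:
    """Estimate Tm in degrees C as 2*(A + T) + 4*(G + C) for a short oligonucleotide."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    universal_primer_1 = "CTCACACTCACCACGTC"  # 7 A/T bases and 10 G/C bases
    print(estimate_tm(universal_primer_1))    # 2*7 + 4*10 = 54
```

The estimate ignores buffer composition and nearest-neighbor effects, so it is best read as a rough screen rather than a substitute for a measured melting curve.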
A primer may be 100% complementary or partially complementary to the primer binding sequence in an adaptor to which it hybridizes. In certain embodiments, a primer is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementary to the primer binding sequence in an adaptor to which it hybridizes.
The “% complementary” is determined based on the length of the primer binding sequence. For example, if a 20-nucleotide primer has 14 nucleotides complementary to a 15-nucleotide primer binding sequence in an adaptor, the % complementary is 93% (i.e., 14/15).
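The calculation in the example above can be written out as a short sketch. The function name is illustrative; the denominator is the length of the primer binding sequence, not the length of the primer, as stated in the text.

```python
def percent_complementary(matching_bases: int, binding_site_length: int) -> float:
    """Percent complementarity relative to the primer binding sequence length."""
    if binding_site_length <= 0 or not 0 <= matching_bases <= binding_site_length:
        raise ValueError("matching_bases must be between 0 and binding_site_length")
    return 100.0 * matching_bases / binding_site_length


if __name__ == "__main__":
    # 20-nucleotide primer, 14 matches to a 15-nucleotide binding sequence.
    value = percent_complementary(14, 15)
    print(round(value))   # 93
    print(value >= 90.0)  # satisfies an "at least 90%" embodiment
```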
In certain embodiments, a primer further comprises additional sequence at the 5' end of the primer that is not complementary to the template nucleic acid sequence (e.g., primer binding site in the adaptor). The non-complementary portion of a primer may be at a length that does not interfere with the hybridization between the primer and its primer binding site. In some embodiments, the non-complementary portion is about 1 to 50, 1 to 40, 1 to 30, or 1 to 20 nucleotides long.
Examples of primers include but are not limited to an “extension primer,” a “universal primer,” a “target-specific primer,” a “RCA amplification primer,” or a “sequencing primer.”
An “extension primer” is used in a primer extension reaction by a DNA polymerase. In some embodiments, a primer extension reaction is a single primer extension (SPE) reaction where a SPE primer comprising a target nucleic acid specific sequence and a 5’ universal primer binding site repeatedly hybridizes to the same target locus from different nucleic acid templates resulting in target nucleic acid enrichment. An extension primer may be referred to as a “PCR primer,” an “amplification primer” or the like when used in an amplification reaction such as PCR. Preferably, an extension primer is about 10 to about 50 nucleotides long, such as about 15 to about 35 nucleotides long.
A “universal primer” or a “universal PCR primer” refers to a primer that binds to sequence present in the nucleic acid template. Typically, the universal primer hybridizes to common sequences present in adaptors or target-specific primers. The universal primer can bind to and direct primer extension from the universal priming site. Universal primers may be used to amplify a library of target nucleic acid templates to be sequenced. A universal primer may be referred to as a “boosting primer” when used in combination with a target specific primer for target enrichment PCR. Preferably, a universal primer is about 15 to about 25 nucleotides long.
A “target-specific primer,” “target-specific nucleic acid primer,” or the like refers to a primer that hybridizes to target nucleic acid specific sequence, rather than adaptor specific sequence. In addition to a region that is specific to a target nucleic acid sequence, a target-specific primer may comprise an additional region, such as a universal primer binding sequence. Preferably, the region specific to a target nucleic acid sequence in a target-specific primer is about 13 to about 25 nucleotides long, and the additional region if present is about 10 to about 20 nucleotides long. The overall length of a target-specific primer is preferably about 23 to about 45 nucleotides if the primer comprises the additional region.
A “RCA amplification primer” or the like refers to a primer used for RCA amplification. Its sequence may be a portion of a first adaptor (e.g., the linker sequence of the first adaptor or a substantial portion thereof) or a second adaptor or a substantial portion thereof. A substantial portion of a first or second adaptor refers to a portion of the first or second adaptor that is at least 10, 11, 12, 13, 14, or 15 nucleotides in length. Preferably, a RCA primer is about 13 to about 20 nucleotides long. Additional description of RCA primers may be found in the subsection “Rolling Circle Amplification” and the Examples below.
A “sequencing primer” refers to a primer that is used in sequencing reactions, e.g., sequencing-by-synthesis reactions or sequencing-by-ligation reactions, such as a combinatorial probe-anchor ligation reaction (cPAL). Preferably, a sequencing primer is about 15 to about 35 nucleotides long, such as about 15 to about 30 nucleotides long.
Exemplary primers that can be used in the methods according to the present disclosure are provided in Tables 2-4.
D. Library Construction
The present disclosure provides methods of producing a library of circular, single-stranded nucleic acid templates, each circular single-stranded nucleic acid template comprising a strand of a double-stranded target nucleic acid, a strand of a first adaptor or the complement thereof, and a strand of a second adaptor or the complement thereof. In massively parallel sequencing (MPS) methods, at least one library of nucleic acid templates is produced and individual constructs in the library are sequenced in parallel. Frequently, large numbers of libraries are pooled together and sequenced simultaneously during a single sequencing run. Thus, while reference may be made with respect to a target nucleic acid or a nucleic acid template, it will be recognized that MPS methods are typically performed on a large library or pool of libraries of nucleic acid templates.
The complement of a strand of an adaptor is an oligonucleotide that is of about the same (including the same) length as the strand of the adaptor and is completely complementary to the strand of the adaptor. For example, if an exemplary first adaptor is a partially double-stranded oligonucleotide with the longer strand that is 60 nucleotides long and the shorter strand that is 30 nucleotides long, then the complement of the longer strand of the first adaptor would be about 60 nucleotides long and is completely complementary to the longer strand (i.e., contains no mismatch, no internal insertion, and no internal deletion). If an exemplary second adaptor is a target-specific primer that is 40 nucleotides in length and comprises a target-specific sequence and a universal primer sequence, then the complement of the second adaptor is about 40 nucleotides in length and is completely complementary to the second adaptor.
In one aspect, the method comprises: a) providing a plurality of fragments of double-stranded target nucleic acids; b) adding a first adaptor to a 5’ terminus of a sense strand and to a 3’ terminus of an antisense strand of the plurality of fragments of double-stranded nucleic acids, wherein the first adaptor comprises: (i) a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site; (ii) a double stranded linker region of about 15 to about 35 bases (preferably about 20 to 30 bases) for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site; c) adding a second adaptor to a 3’ terminus of the sense strand and to a 5’ terminus of the antisense strand of the plurality of fragments of double-stranded nucleic acids to produce a library of linear, double-stranded nucleic acid templates, wherein the second adaptor comprises a second universal primer binding site, wherein the second universal primer binding site comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site; d) optionally amplifying the library of linear, double-stranded nucleic acid templates with a first universal primer that binds to the first primer binding site and a second universal primer that binds to the second primer binding site; e) denaturing the library of linear, double-stranded nucleic acid templates to produce a library of linear, single-stranded nucleic acid templates; and f) circularizing the library of linear, single-stranded nucleic acid templates by adding a bridge oligonucleotide and ligating the first adaptor and second adaptor, thereby producing the library of circular, single-stranded nucleic acid templates.
The double-stranded target nucleic acids are obtained from isolated nucleic acids from a sample (e.g., genomic DNA). The double-stranded target nucleic acids may be fragmented by physical, chemical, or enzymatic means, and fragments of double-stranded target nucleic acids of a desired size range are selected. The ends of the size selected fragments of double-stranded target nucleic acids may then be repaired to produce blunt-ended, size-selected double-stranded target nucleic acids. In certain embodiments, 3’ A-tails may then be added to the blunt-ended, size-selected fragments of double-stranded target nucleic acids using a DNA polymerase. Matching 3’ T overhangs may be added to the first adaptor to facilitate ligation with the A-tailed, double-stranded, target nucleic acids. At this time, the first adaptor may be present at one or both ends of the A-tailed, double-stranded target nucleic acids. “Ligation” means to form a covalent bond or linkage between the termini of two or more nucleic acids. The ligation may be an enzymatic ligation, which forms a phosphodiester linkage between a 5’ carbon terminal nucleotide of one DNA strand with a 3’ carbon of another DNA strand, or a chemical ligation. After the step of ligation with the first adaptor, a library of nucleic acid templates is generated which have common sequences at their 5' and 3' ends (step B of Figures 3 and 4, ligation products). In this context the term “common” is interpreted as meaning common to all templates in the library. As explained in further detail below, all templates within the library will contain regions of common sequence at (or proximal to) their 5' and 3' ends. In certain other embodiments, blunt-ended, size-selected double-stranded target nucleic acids may be directly ligated to the double-stranded linker region of the first adaptor (see, e.g., Figures 7 and 8). To improve ligation efficiency, the end of the first adaptor and/or the end of the universal adaptor can be modified to prevent non-desired ligation. The modification can be a chemical modification, for example but not limited to, a C3 spacer. In certain embodiments, the second adaptor may comprise a target nucleic acid specific sequence (see, e.g., Figures 3 and 4). The second adaptor hybridizes to the double-stranded target nucleic acids via the target nucleic acid specific sequence, which is used to enrich target nucleic acids with the first universal primer via single primer extension. After the step of target enrichment, the library of nucleic acid templates comprises the first adaptor at only one end (step C of Figures 3 and 4, SPE amplification products). The other end of the nucleic acid templates is replaced by the second adaptor (step C of Figures 3 and 4, SPE amplification products).
The library of nucleic acid templates undergoes another round of amplification using the first universal primer and the second universal primer (step D of Figures 3 and 4). In some embodiments, a proof-reading DNA polymerase is used during the step of universal PCR amplification. In embodiments where a non-proof-reading DNA polymerase is used during the step of universal PCR amplification, it should be noted that a non-templated 3’ A is added to the 3’ end of the amplicons. Any corresponding bridge oligonucleotide used for circularization should be designed to accommodate this “A” addition (e.g., a corresponding “T” may be added in the bridge oligonucleotide at the junction of the first adaptor and second adaptor; see also Tables 2-4).
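The junction design described above can be illustrated with a short sketch. The Python below is a hypothetical illustration and does not use the sequences of Tables 2-4: the helper names (`reverse_complement`, `design_bridge`), the toy template, and the arm length are assumptions chosen only to show where the extra “T” would sit in a bridge oligonucleotide when the amplicon carries a non-templated 3’ A.

```python
# Illustrative sketch only; sequences, names, and arm length are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_bridge(template: str, arm: int, non_templated_a: bool) -> str:
    """Design a bridge oligonucleotide (5'->3') for a linear template.

    template: single-stranded template, 5'->3', as templated (no extra A).
    arm: number of bases the bridge pairs with at each end of the template.
    non_templated_a: True if the polymerase appended a 3' A to the amplicon.
    """
    if arm <= 0 or 2 * arm > len(template):
        raise ValueError("arm must be positive and at most half the template length")
    three_prime_end = template[-arm:]
    five_prime_end = template[:arm]
    extra = "A" if non_templated_a else ""
    # Sequence across the junction once the 3' end is ligated to the 5' end.
    junction = three_prime_end + extra + five_prime_end
    # The bridge must base-pair with that junction, so it is its reverse complement.
    return reverse_complement(junction)


toy_template = "GATTACAGGCCTTAACCGGT"  # hypothetical 20-base template
print(design_bridge(toy_template, 4, False))  # AATCACCG
print(design_bridge(toy_template, 4, True))   # AATCTACCG
```

In the second call the bridge carries one additional “T” between its two arms, which pairs with the non-templated “A” at the junction.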
The amplified library of double-stranded nucleic acid templates may then be prepared for circularization. The library is denatured, for example, by heat or by chemical means (e.g., NaOH, high salt concentration, high pH), to produce a library of linear, single-stranded nucleic acid templates. The single-stranded nucleic acid templates may preferably undergo 5’ phosphorylation to facilitate circularization and increase ligation strand specificity. The 5’ phosphorylation group can be added enzymatically, for example using a T4 polynucleotide kinase. The 5’ phosphorylation group can also be added to the strand that is to be circularized during the universal PCR step by using a 5’ phosphorylated universal primer in the universal amplification reaction (step D of Figures 3 and 4, universal amplification products). In some embodiments, the linear, single-stranded nucleic acid template is circularized by ligating the first adaptor and second adaptor (step E of Figures 3 and 4). A single stranded DNA ligase (e.g., CircLigase™) or double stranded DNA ligase (e.g., T4 DNA ligase) may be used. In some embodiments, a bridge oligonucleotide hybridizes to the 5’ end and 3’ end of the two flanking adaptor molecules and brings the 5’ end and 3’ end of the single-stranded nucleic acid template in close proximity to facilitate ligation (see, e.g., Figures 12-14).
Step D of Figures 3 and 4 shows that different strands of the double-stranded DNA templates are phosphorylated. When the top strand is phosphorylated by PCR using universal primers with phosphorylation at the 5’ end of the first adaptor, the corresponding rolonies are concatemers of the bottom strand, referred to as a “bottom-strand rolony” (step F of Figure 3). When the bottom strand is phosphorylated by PCR using primers with phosphorylation at the 5’ end of the second adaptor, the corresponding rolonies are concatemers of the top strand, referred to as a “top-strand rolony” (step F of Figure 4).
In some other embodiments, the second adaptor does not comprise any target nucleic acid specific sequence. For example, a library of circular, single-stranded nucleic acid templates may be constructed in a method using a first adaptor and a second adaptor that is a universal adaptor (see Figures 7 and 8). The method comprises: a) providing a plurality of fragments of double-stranded target nucleic acids; b) adding a first adaptor to a 5’ terminus of a sense strand and to a 3’ terminus of an antisense strand of the plurality of fragments of double-stranded nucleic acids, wherein the first adaptor comprises: (i) a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site; (ii) a double stranded linker region of about 15 to about 35 bases for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site; c) adding a second adaptor to a 3’ terminus of the sense strand and to a 5’ terminus of the antisense strand of the plurality of fragments of double-stranded nucleic acids to produce a library of linear, double-stranded nucleic acid templates, wherein the second adaptor comprises a second universal primer binding site; d) amplifying the library of linear, double-stranded nucleic acid templates with a first universal primer that binds to the first primer binding site and a second universal primer that binds to the second primer binding site; e) denaturing the library of linear, double-stranded nucleic acid templates to produce a library of linear, single-stranded nucleic acid templates; and f) circularizing the library of linear, single-stranded nucleic acid templates by adding a bridge oligonucleotide and ligating the first adaptor and second adaptor, thereby producing the library of circular, single-stranded nucleic acid templates.
In certain further embodiments, a library of circular, single-stranded nucleic acid templates may be constructed for use with dual sample indices (see Figures 9 and 10).
A second sample index may be introduced into library constructs in a variety of ways. An exemplary second adaptor comprising a second sample index is shown in Figure 11. In some embodiments, the second adaptor comprises from 5’ to 3’: (i) a bridge oligonucleotide binding site; (ii) a second sample index; (iii) a 3rd universal primer binding site; and (iv) a target nucleic acid specific sequence, wherein the bridge oligonucleotide binding site comprises a portion of the bridge oligonucleotide binding site and a 4th sequencing primer binding site, or portion thereof (e.g., for sequencing the second sample index) and the 3rd universal primer binding site further comprises a 5th sequencing primer binding site (e.g., for sequencing the ROI). In some embodiments, the second adaptor is added to the target nucleic acid in a series of PCR steps with portions of the second adaptor as shown in steps C and D of Figures 9 and 10, and Figure 11. For example, in Figure 9, steps A to C are the same as those in Figure 3 except that the “target specific PCR primer” is referred to as the “2nd adaptor” in Figure 3. In step D of Figure 9, a 5’ phosphorylated universal primer and a primer comprising (i) a bridge oligonucleotide binding site; (ii) a second sample index; and (iii) a 3rd universal primer binding site of the second adaptor as described above are used in universal PCR amplification to generate double stranded template nucleic acids.
While the methods described in this section pertain to producing a library of nucleic acid templates, it is understood that these methods could also be readily applied to a method of producing a circular, single-stranded nucleic acid template, which may be used for production of a rolony for sequencing.
E. Rolling Circle Amplification
Rolonies may then be produced from the library of circular, single-stranded nucleic acid templates prepared as described above. A “rolony” or “rolling circle colony” is a single-stranded DNA concatemer that is produced by rolling circle amplification (RCA) of a circularized DNA fragment. A “concatemer” refers to a long, continuous DNA molecule that comprises multiple copies of the same DNA sequence linked in series. A concatemer may comprise at least 2, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 monomers, wherein each monomer comprises a nucleic acid template (e.g., first adaptor-target nucleic acid-second adaptor). “DNA nanoballs” or “DNBs” are single-stranded DNA concatemers of sufficient length to form random coils that fill a roughly spherical volume in solution (e.g., SSC buffer at room temperature). In some embodiments, DNA nanoballs have a diameter of from about 100 to 300 nm. As used herein, “concatemer,” “DNA nanoball,” and “rolony” may be used interchangeably.
“Rolling circle amplification” (RCA) refers to amplification of a circular nucleic acid template using at least one primer that hybridizes to one strand of the circular nucleic acid template to produce rolonies that represent the other strand of the circular nucleic acid template. A rolling circle amplification primer may comprise random sequence, sequence that hybridizes to an adaptor, or sequence that hybridizes to a junction region of two adaptors created when the nucleic acid template was circularized. In some embodiments, the RCA primer hybridizes to the linker region of the first adaptor, the first sequencing primer binding site of the first adaptor, or the second universal primer binding site of the second adaptor. Using an RCA primer that hybridizes to the “sense” or “top” strand of the circular nucleic acid template for RCA produces a “bottom-strand rolony” (steps E and F of Figure 3 and Figure 12). Using an RCA primer that hybridizes to the “antisense” or “bottom” strand of the circular nucleic acid template for RCA produces a “top-strand rolony” (steps E and F of Figure 4 and Figure 14). Each monomer in the rolonies produced according to the methods provided in the present disclosure comprises two separate sequencing primer binding sites on the same strand (see “Seq 1A” and “Seq 2A” in step F of Figure 3 for the bottom-strand rolony and “Seq 1B” and “Seq 2B” in step F of Figure 4 for the top-strand rolony). The rolonies may be used as templates for sequencing reactions.
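The strand relationship described above (a primer bound to one strand of the circle yields a concatemer of the other strand) can be sketched in a few lines of Python. This is a simplified string model under stated assumptions, not a description of the actual enzymatic reaction: the function name `rolling_circle`, the toy 12-base circle, and the fixed copy number are all hypothetical.

```python
# Simplified string model of RCA; names and sequences are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def rolling_circle(circle: str, primer: str, copies: int) -> str:
    """Return the concatemer (5'->3') copied from a circular template.

    circle: circular single-stranded template, written 5'->3' from an
            arbitrary start (for example, the ligation junction).
    primer: RCA primer, 5'->3'; it must be complementary to the circle and
            may span the junction.
    copies: number of monomers to synthesize.
    """
    site = reverse_complement(primer)   # region of the circle bound by the primer
    start = (circle + circle).find(site)  # doubling lets the site span the junction
    if len(primer) > len(circle) or start == -1:
        raise ValueError("primer does not hybridize to the circular template")
    end = (start + len(site)) % len(circle)
    # Rotate the circle so that it ends with the primer binding site; the
    # copied strand is then the reverse complement of that rotation.
    rotated = circle[end:] + circle[:end]
    monomer = reverse_complement(rotated)
    return monomer * copies


toy_circle = "ACGTTGCAAGGC"  # hypothetical circular template
product = rolling_circle(toy_circle, "CTTGCA", 3)
print(product)  # CTTGCAACGTGC repeated three times
```

Each monomer of the product begins with the primer sequence and is complementary to the circle, which is why a primer bound to the top strand gives a bottom-strand concatemer in this model.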
RCA based clonal amplification provides a simple solution that can often eliminate the need for emulsion PCR (ePCR) and thereby provide the option of eliminating an often expensive and labor-intensive step in many next generation sequencing methods.
Preferably, a DNA polymerase having suitable strand displacement activities is used to produce the rolonies. DNA polymerases having strand displacement activity include, but are not limited to, Phi29, Bst DNA polymerase, SensiPhi DNA polymerase, Klenow fragment of DNA polymerase I, and Deep-VentR DNA polymerase (NEB#M0258).
Table 1 shows the differences of the rolonies generated by different DNA strands. With use of the first adaptor, the sequencing of both kinds of rolonies (top strand and bottom strand) can avoid sequencing the low diversity linker region of the first adaptor.
Table 1: Comparison of Rolonies Generated from Two Different Strands
[Table 1 is provided as an image in the original publication.]
F. Substrates
Rolonies produced according to the present disclosure may be immobilized on a substrate. A substrate comprises a plurality of sites for attachment of a plurality of rolonies. Exemplary substrates include planar substrates (e.g., slides), non-planar substrates, bead substrates, or arrays comprising spots or wells. Exemplary materials used for substrates include glass, ceramic, silica, silicon, quartz, various plastics, metal, elastomer (e.g., silicone), and polyacrylamide.
Rolonies may be immobilized on the surface of a substrate using a variety of techniques, including covalent and non-covalent attachment. In one embodiment, a substrate surface may comprise short oligonucleotides that form complexes, e.g., double-stranded duplexes, with a component (e.g., an adaptor sequence or a portion thereof) of the rolonies. In one embodiment, a substrate surface may comprise reactive functionalities that interact with complementary functionalities on the rolonies to form a covalent linkage (chemical attachment). For example, during RCA, modified nucleotides may be used to incorporate moieties such as bromide or thiol that can then be used in a crosslinking reaction. Thiol-modified DNA can be covalently linked to a mercaptosilanized glass via an alkylating reagent such as iodoacetamide. In another embodiment, rolonies are immobilized through non-specific interactions with the substrate surface, such as via electrostatic interactions, hydrogen bonding, van der Waals forces, etc. For example, rolonies can be non-specifically, electrostatically deposited onto glass surfaces with polyamine attached.
In certain embodiments, rolonies are deposited onto a solid substrate randomly so that the rolonies on the resulting substrate do not form a defined pattern. In certain other embodiments, rolonies may be confined to discrete regions on a substrate. The discrete regions may be arranged in a pattern, e.g., a rectilinear pattern, hexagonal pattern, etc. A regular pattern or array may be advantageous for detection and analysis of sequencing data.
In certain embodiments, rolonies are immobilized on a flow cell. A flow cell is a glass slide containing small fluidic channels, through which polymerases, dNTPs and buffers can be pumped. The glass inside the channels may be dotted with short oligonucleotides complementary to at least a portion of an adaptor sequence of rolonies. Rolonies may be hybridized to these oligonucleotides and thus immobilized onto the flow cell. Alternatively, a flow cell or its fluidic channels may be coated with moieties that non-specifically (i.e., not in a sequence-dependent manner) bind to rolonies. The coating may be uniform on the flow cell surface or its fluidic channel surface or may be patterned with areas capable of binding rolonies separated by those incapable of binding rolonies.
Methods of forming arrays of rolonies have also been described in Patent Publication Nos. WO2007120208, WO2006073504, WO2007133831, and US2007099208, each of which is incorporated herein by reference in its entirety.
G. Sequencing
In certain embodiments, following production of rolonies as described above, and optionally immobilizing the rolonies on a substrate surface, at least a portion of the rolony is sequenced. In certain embodiments, the method comprises hybridizing a sequencing primer that is complementary to at least a portion of at least one adaptor.
The output of a sequencing reaction is called a “sequence read,” which is a single, uninterrupted series of nucleotides representing the sequence of at least a portion of the rolony.
Any suitable sequencing method may be used to determine the sequence of at least a portion of the rolonies produced from the library of circular nucleic acid templates, including, for example, sequencing by synthesis, sequencing by ligation, combinatorial probe anchor ligation (cPAL), pyrosequencing, etc. Sequencing by synthesis has been described in U.S. Pat. Nos. 6,210,891; 6,828,100; 6,833,246; 6,911,345; 6,969,488; 6,897,023; and 6,787,308; Patent Publication Nos. 20040106130; 20030064398; and 20030022207; Margulies et al., 2005, Nature 437:376-380; Ronaghi et al., 1996, Anal. Biochem. 242:84-89; Constans, A, 2003, The Scientist 17(13):36; and Bentley et al., 2008, Nature 456(7218):53-59. Sequencing by ligation has been described in Patent Publication Nos. WO1999019341, WO2005082098, WO2006073504, and WO2011/044437, and in Shendure et al., 2005, Science, 309:1728-1739. Pyrosequencing has been described in Ronaghi et al., 1996, Anal. Biochem. 242:84-89.
In one embodiment, the sequencing method comprises sequential sequencing. As used herein, “sequential sequencing” refers to a sequencing process involving multiple different sequencing primers sequentially used in a sequencing run on the same substrate (e.g., flow cell).
In embodiments where multiple sequencing primers are used in the same sequencing run, the order of addition of the sequencing primers may vary. An exemplary sequential sequencing method using two different sequencing primers (Seq 1 and Seq 2) provides: 1) hybridization of the second sequencing primer (Seq 2) to concatemers produced from a library of circular nucleic acid templates; 2) sequencing at least one portion of the concatemer with X cycles following the second sequencing primer, thereby generating a first sequencing fragment; 3) removing the first sequencing fragment in the sequencing instrument; 4) hybridization of the first sequencing primer (Seq 1) to the concatemers produced from the library of circular nucleic acid templates; and 5) sequencing at least one portion of the concatemer with Y cycles following the first sequencing primer, thereby generating a second sequencing fragment. In some embodiments, X and/or Y is/are more than 2, 3, 4, 5, 6, 7, 8, 9, or 10 cycles. In some embodiments where the order of addition of sequencing primers calls for Seq 2 first, the regions of interest may be sequenced first (see Figures 12 and 13). This outcome differs significantly from single-read sequencing, which sequences the sample index and unique molecular identifier first. Sequencing the region of interest first may offer the advantage of a high-quality signal. In some embodiments, the order of the sequencing primers can be changed, e.g., to change which portion of the rolony is sequenced first (e.g., ROI or UMI/sample index).
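The five-step order of operations above can be sketched as a toy string model. Everything in this Python example is an assumption made for illustration: the function names, the toy monomer layout (Seq 1 site, UMI, sample index, Seq 2 site, ROI), and the sequences. The monomer is written in the orientation of the newly synthesized strand, so a read is simply the bases that follow the primer sequence.

```python
# Toy model of sequential sequencing; layout and sequences are hypothetical.

def read_from_primer(copy_strand: str, primer: str, cycles: int) -> str:
    """Return `cycles` bases following `primer` on a repeating monomer."""
    n = len(copy_strand)
    start = (copy_strand * 2).find(primer)  # primer site may wrap around the monomer
    if len(primer) > n or start == -1:
        raise ValueError("primer site not found in monomer")
    begin = start + len(primer)
    # The concatemer repeats, so indexing wraps around the monomer.
    return "".join(copy_strand[(begin + i) % n] for i in range(cycles))


def sequential_sequencing(copy_strand: str, steps) -> dict:
    """Run primers one after another; each read starts from a fresh primer.

    steps: list of (label, primer, cycles) in the order the primers are added.
    The removal of the previous sequencing fragment is modeled by computing
    every read independently.
    """
    reads = {}
    for label, primer, cycles in steps:
        reads[label] = read_from_primer(copy_strand, primer, cycles)
    return reads


seq1, umi, index = "AACCGG", "TTGA", "CAGT"      # hypothetical Seq 1 site, UMI, index
seq2, roi, spacer = "GGATCC", "ATATGCGC", "TTTT"  # hypothetical Seq 2 site, ROI, filler
monomer = seq1 + umi + index + seq2 + roi + spacer

# Seq 2 first (X = 8 cycles), then Seq 1 (Y = 8 cycles).
reads = sequential_sequencing(monomer, [("ROI", seq2, 8), ("UMI+index", seq1, 8)])
print(reads)  # {'ROI': 'ATATGCGC', 'UMI+index': 'TTGACAGT'}
```

Swapping the two entries in `steps` models the alternative order, in which the UMI and sample index are read before the region of interest.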
A library of nucleic acid templates constructed with the first adaptor and second adaptor according to the methods provided in the present disclosure may be used for single-end sequencing or paired-end, rolony-based sequencing (see Figure 6). As used herein, “paired-end sequencing,” also referred to as “pairwise sequencing,” generally refers to obtaining two sequencing “reads” of a template nucleic acid from both ends or strands of a single template nucleic acid. In embodiments involving a circular template nucleic acid, paired-end sequencing may involve obtaining sequencing reads from a top strand rolony and bottom strand rolony produced from a single double stranded template nucleic acid. Paired-end sequencing offers the advantage of improved accuracy and ability to identify indels. There is significantly more information that may be gained from sequencing two stretches each of “N” bases from a single template nucleic acid than from sequencing “N” bases from each of two independent template nucleic acids in a random fashion.
During the step of library construction, the SPE amplification products (see, e.g., step C of Figures 3 and 4) can be separated into two reactions for the following steps: universal PCR amplification with phosphorylated primers for one specific strand (either top or bottom strand), and separate clonal amplifications to generate top strand and bottom strand rolonies. The top strand and bottom strand rolonies can be seeded on the same flow cell, designed with two separate inlets and outlets (see Figure 6, step 3). Sequencing of the top strand rolonies and bottom strand rolonies is performed in separate areas of the flow cell at the same time in the sequencer.
In some embodiments, the top strand and bottom strand rolonies are seeded on different flow cells, each designed with a single inlet and outlet. Sequencing of the top strand rolonies and bottom strand rolonies is performed in different flow cells at the same time in the sequencer.
H. Sets of 1st Adaptors for Preparing Library of Rolonies for Sequencing
The present disclosure also provides a set of first adaptors for preparing a library of rolonies for sequencing. The set of first adaptors comprises a plurality of partially double-stranded adaptors, wherein each adaptor of the set comprises:
(i) a single-stranded region comprising a first sequencing primer binding site, a unique molecular identifier (UMI), and a sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site;
(ii) a double stranded linker region of about 20 to about 30 bases for ligation to a double-stranded nucleic acid, wherein the double stranded linker region comprises a second sequencing primer binding site;
wherein the plurality of the adaptors are identical to each other except their UMI sequences are different from each other.
Different components of the partially double-stranded first adaptors are discussed above in the “Adaptors” section.
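As a rough illustration of a set whose members are identical except for their UMIs, the Python sketch below enumerates every UMI of a given length and attaches the same primer site, sample index, and linker to each. The element order in the single-stranded region (primer site, then UMI, then sample index), the placeholder sequences, and the exhaustive enumeration are assumptions made for this example; in practice a UMI region might instead be synthesized as degenerate bases.

```python
# Hypothetical sketch of an adaptor set that differs only in the UMI.
from itertools import product


def build_first_adaptor_set(primer_site: str, sample_index: str,
                            linker: str, umi_length: int) -> list:
    """Return one adaptor record per possible UMI of `umi_length` bases."""
    adaptors = []
    for umi_bases in product("ACGT", repeat=umi_length):
        umi = "".join(umi_bases)
        adaptors.append({
            "umi": umi,
            # Assumed order: sequencing primer site, UMI, sample index.
            "single_stranded": primer_site + umi + sample_index,
            # The double-stranded linker is shared by every adaptor in the set.
            "linker": linker,
        })
    return adaptors


adaptor_set = build_first_adaptor_set(
    primer_site="AACCGGTTAACC",             # placeholder sequence
    sample_index="CAGTCAGT",                # placeholder sequence
    linker="GATCGATCGATCGATCGATCGATCGA",    # placeholder 26-base linker
    umi_length=5,
)
print(len(adaptor_set))  # 1024, i.e. 4**5 distinct UMIs
```

With a 5-base UMI the set already holds 1,024 distinct adaptors; longer UMIs grow the set by a factor of four per added base.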
The present disclosure also provides a plurality of sets of first adaptors for preparing a library of rolonies for sequencing. Each set of first adaptors is as described above, wherein the first sample index sequences of different sets are different from each other. Different sets of first adaptors are added to one end of target nucleic acids from different samples or sources. A second adaptor is then added to the other end of the target nucleic acids. The resulting nucleic acids comprising the target nucleic acids flanked by the first and second adaptors may be combined together for amplification (optional), circularization, RCA amplification, and sequencing. In a related aspect, the present disclosure provides use of the set or the plurality of sets of partially double-stranded first adaptors in preparing a library of rolonies for sequencing.
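Because each sample receives a set of first adaptors with its own sample index, pooled reads can later be assigned back to their sample of origin. The Python sketch below shows one simple way this could be done; the read layout (UMI followed by sample index), the exact-match lookup, and all names and sequences are assumptions for illustration rather than a method described in this disclosure.

```python
# Hypothetical demultiplexing sketch; read layout and names are assumptions.
from collections import defaultdict


def demultiplex(reads, umi_length: int, index_to_sample: dict) -> dict:
    """Count distinct UMIs per sample from pooled UMI+index reads.

    reads: iterable of reads assumed to start with the UMI, followed
           directly by the sample index.
    index_to_sample: mapping of sample index sequence to sample name; all
           index sequences are assumed to have the same length.
    """
    index_length = len(next(iter(index_to_sample)))
    umis_by_sample = defaultdict(set)
    for read in reads:
        umi = read[:umi_length]
        index = read[umi_length:umi_length + index_length]
        sample = index_to_sample.get(index)
        if sample is not None:  # reads with an unknown index are skipped
            umis_by_sample[sample].add(umi)
    return {sample: len(umis) for sample, umis in umis_by_sample.items()}


pooled_reads = ["TTGACAGT", "TTGACAGT", "ACGTCAGT", "GGGGTCGA", "AAAATTTT"]
samples = {"CAGT": "sample_1", "TCGA": "sample_2"}
print(demultiplex(pooled_reads, 4, samples))  # {'sample_1': 2, 'sample_2': 1}
```

Duplicate reads that share both UMI and sample index collapse to a single count, which is the usual reason for carrying a UMI alongside the index.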
I. Kits for Preparing Library of Rolonies for Sequencing
The present disclosure also provides a kit for preparing a library of rolonies for sequencing comprising one or more of the following: (1) a set of first adaptors; (2) a second adaptor; (3) a first universal primer; (4) a second universal primer; (5) a bridge oligonucleotide; (6) an RCA primer; (7) a first sequencing primer; and (8) one or more additional sequencing primers. These components are described in other sections above.
In certain embodiments, the kit may further comprise a DNA ligase; a DNA polymerase with or without proofreading activity; a DNA polymerase with strand displacement activity; a DNA polymerase for sequencing; reaction buffers suitable for ligation, primer extension, or sequencing; or any combination thereof.
The components of the kits are typically contained in separate vessels or compartments. However, when appropriate, some of the components may be provided as a mixture or composition. Additional descriptions of the components are provided in other sections, including the Examples, of the present disclosure.
In a related aspect, the present disclosure provides use of the kits for preparing a library of rolonies for sequencing.
EXAMPLES
EXAMPLE 1: DESIGN OF OLIGONUCLEOTIDES/ADAPTORS FOR PRODUCTION OF BOTTOM-STRAND ROLONY
Table 2 shows exemplary oligonucleotide/adaptor sequences for designing a template nucleic acid and production of a bottom-strand rolony, where the binding site for the sequencing primer for the region of interest (Seq 2) and sequencing primer for the sample index and UMI (Seq 1) are both only present in the first adaptor (Figure 3). Figure 12 shows the corresponding structure of 3D (linear universal amplification product) and 3E (circular nucleic acid template product). In this example, the RCA amplification primer is designed based on the universal sequence of the second adaptor.
Table 2: Oligonucleotide/Adaptor Sequences for Example 1
[Table 2 is provided as an image in the original publication.]
Underlined sequence = 26 nucleotide long double-stranded region
BC = barcode or unique molecular identifier (UMI)
Index = sample index
EXAMPLE 2: DESIGN OF OLIGONUCLEOTIDES/ADAPTORS FOR PRODUCTION OF BOTTOM-STRAND ROLONY
Table 3 shows exemplary oligonucleotide/adaptor sequences for designing a template nucleic acid and production of a bottom-strand rolony where the binding site for the sequencing primer for the ROI (Seq 2) is present in the first adaptor, and the binding site for the sequencing primer for sample index and barcode (UMI) (Seq 1) is created by the junction of the first adaptor and SPE primer upon circularization. Figure 13 shows the corresponding structure of 3D (linear universal amplification product) and 3E (circular nucleic acid template product). By this design, the sequencing primer binding site for Seq 1 in the first adaptor can be shorter than the one used in Example 1 due to the contribution of additional bases from the second adaptor to the sequencing primer binding site. In this example, the RCA amplification primer is designed based on the linker region of the first adaptor. This RCA amplification primer design can also be applied to Example 1.
Table 3: Oligonucleotide/Adaptor Sequences for Example 2
[Table 3 is provided as an image in the original publication.]
Underlined sequence = 26 nucleotide long double-stranded region
BC = barcode or unique molecular identifier (UMI)
Index = sample index
EXAMPLE 3: DESIGN OF OLIGONUCLEOTIDES/ADAPTORS FOR PRODUCTION OF TOP-STRAND ROLONY
Table 4 shows exemplary oligonucleotide/adaptor sequences for designing a template nucleic acid and production of a top-strand rolony where the binding site for the sequencing primer for the sample index and barcode (UMI) (Seq 2) is present in the first adaptor, and the binding site for the sequencing primer for the ROI (Seq 1) is created by the junction of the first adaptor and second adaptor upon circularization. Figure 14 shows the corresponding structure of 4D (linear universal amplification product) and 4E (circular template nucleic acid product). By this design, the sequencing primer binding site for Seq 2 in the first adaptor can be shorter than the one used in example 1 due to the contribution of additional bases from the second adaptor to the Seq 1 sequencing binding site. In this example, the RCA amplification primer is designed based on the linker region of the first adaptor. This RCA amplification primer design can also be applied to Example 1.
The oligonucleotide/adaptor sequences in this example, together with the oligonucleotide/adaptor sequences in Example 2, can be used to generate a top-strand rolony and the corresponding bottom-strand rolony in separate tubes; these rolonies can then be sequenced on a flow cell at different regions at the same time with different sequencing primers for paired-end sequencing.
Table 4: Oligonucleotide/Adaptor Sequences for Example 3
[Table 4 is provided as an image in the original publication.]
Underlined sequence = 26 nucleotide long double-stranded region
BC = barcode or unique molecular identifier (UMI)
Index = sample index
The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments.
These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
This application claims the benefit of priority to U.S. Provisional Application No. 62/814,417, filed March 6, 2019, which application is hereby incorporated by reference in its entirety.

Claims

1. A method of producing a library of circular, single-stranded nucleic acid templates, each circular, single-stranded nucleic acid template comprising a strand of a double-stranded target nucleic acid, a strand of a first adaptor or the complement thereof, and a strand of a second adaptor or the complement thereof, the method comprising:
a. providing a plurality of fragments of double-stranded target nucleic acids;
b. adding a first adaptor to a 5’ terminus of a sense strand and to a 3’ terminus of an antisense strand of the plurality of fragments of double-stranded nucleic acids, wherein the first adaptor comprises:
(i) a single-stranded region comprising a first sequencing primer binding site, an optional unique molecular identifier (UMI), and a first sample index sequence, wherein the first sequencing primer binding site comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site;
(ii) a double stranded linker region of about 15 to about 35 bases for ligation to the plurality of fragments of double-stranded nucleic acids, wherein the double stranded linker region comprises a second sequencing primer binding site;
c. adding a second adaptor to a 3’ terminus of the sense strand and to a 5’ terminus of the antisense strand of the plurality of fragments of double-stranded nucleic acids to produce a library of linear, double-stranded nucleic acid templates, wherein the second adaptor comprises a second universal primer binding site, and wherein the second universal primer binding site comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site;
d. optionally amplifying the library of linear, double-stranded nucleic acid templates with a first universal primer that binds to the first primer binding site and a second universal primer that binds to the second primer binding site;
e. denaturing the library of linear, double-stranded nucleic acid templates to produce a library of linear, single-stranded nucleic acid templates; and
f. circularizing the library of linear, single-stranded nucleic acid templates by adding a bridge oligonucleotide and ligating the first adaptor and second adaptor, thereby producing the library of circular, single-stranded nucleic acid templates.
2. The method of claim 1, wherein the plurality of fragments of double-stranded target nucleic acids are derived from genomic DNA.
3. The method of claim 1 or 2, wherein the plurality of fragments of double-stranded target nucleic acids is generated by:
a. isolating genomic DNA from a sample,
b. optionally fragmenting the genomic DNA,
c. optionally selecting fragments of genomic DNA of a desired size range,
d. repairing the ends of the genomic DNA of step a., the fragmented genomic DNA of step b., or the size selected fragments of genomic DNA of step c. to produce blunt-ended fragments of genomic DNA, and
e. adding 3’A-tails to the blunt-ended fragments of genomic DNA of step d., thereby producing the plurality of fragments of double-stranded target nucleic acids.
4. The method of any one of claims 1-3, wherein the first adaptor is added to the plurality of fragments of double-stranded nucleic acids by ligation.
5. The method of any one of claims 1-4, wherein the second adaptor further comprises a target nucleic acid specific sequence, and is added to the plurality of fragments of double-stranded nucleic acids by single primer extension.
6. The method of any one of claims 1-4, wherein the second adaptor is added to the plurality of fragments of double-stranded nucleic acids by ligation.
7. The method of any one of claims 1-6, wherein the second adaptor further comprises a second sample index sequence and optionally, a fourth sequencing primer binding site.
8. The method of any one of claims 1-7, further comprising producing rolonies from the library of circular nucleic acid templates.
9. The method of claim 8, wherein the rolonies are produced by rolling circle amplification.
10. The method of claim 8 or 9, wherein the rolonies comprise top-strand rolonies, bottom-strand rolonies, or both.
11. The method of any one of claims 8-10, further comprising immobilizing the rolonies on a substrate surface.
12. The method of any one of claims 8-11, further comprising sequencing at least a portion of at least one of the rolonies.
13. The method of claim 12, wherein a portion of the target nucleic acid, a portion of the first adaptor, a portion of the second adaptor, or any combination thereof is sequenced.
14. The method of claim 12 or 13, wherein the sequencing is sequencing by synthesis, pyrosequencing, or sequencing by ligation.
15. The method of any one of claims 12-14, wherein the sequencing is single read sequencing or paired-end sequencing.
16. The method of any one of claims 12-15, wherein the sequencing is sequential sequencing.
17. A set of partially double-stranded adaptors for producing a library of circular, single-stranded nucleic acid templates, wherein the set comprises a plurality of partially double-stranded adaptors; wherein each adaptor of the set comprises:
(i) a single-stranded region comprising a first sequencing primer binding site, a unique molecular identifier (UMI), and a sample index sequence, wherein the first sequencing primer binding site further comprises a first universal primer binding site and a first portion of a bridge oligonucleotide binding site;
(ii) a double stranded linker region of about 15 to about 35 bases for ligation to a double-stranded nucleic acid, wherein the double stranded linker region comprises a second sequencing primer binding site; and
wherein the plurality of the adaptors are identical to each other except their UMI sequences are different from each other.
18. The set of partially double-stranded adaptors of claim 17, comprising at least 1,000 different adaptors.
19. A kit for producing a library of circular, single-stranded nucleic acid templates, comprising:
(i) the set of partially double-stranded adaptors of claim 17,
(ii) a second adaptor comprising a second universal primer binding site, wherein the second universal primer binding site comprises a second portion of the bridge oligonucleotide binding site and optionally a third sequencing primer binding site, and
(iii) the bridge oligonucleotide.
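
The adaptor set of claims 17-18 and the bridge-mediated circularization of claim 1, step f, can be illustrated in silico. The following Python is a minimal sketch only and is not part of the claimed subject matter. All sequences, lengths, and function names are hypothetical placeholders. It assumes that the first adaptor sits at the 5' end of the single-stranded template and the second adaptor at the 3' end, and that the bridge oligonucleotide is exactly complementary to the junction formed when those two ends are brought together. It also models a rolling circle product simply as tandem copies of the complement of the circle.

```python
from itertools import islice, product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_umis(length, count):
    """Return `count` distinct UMIs of `length` bases (enumerated in order, not random)."""
    if count > 4 ** length:
        raise ValueError("only %d distinct sequences of length %d exist"
                         % (4 ** length, length))
    return ["".join(b) for b in islice(product("ACGT", repeat=length), count)]


def build_adaptors(prefix, umis, suffix):
    """Build adaptors that are identical to each other except for the UMI."""
    return [prefix + umi + suffix for umi in umis]


def bridge_for(template, arm):
    """Bridge oligo complementary to the junction of the template's 3' and 5' ends."""
    return reverse_complement(template[-arm:] + template[:arm])


def can_circularize(template, bridge):
    """True if the bridge anneals exactly across the 3'/5' junction of the template."""
    arm = len(bridge) // 2
    return reverse_complement(bridge) == template[-arm:] + template[:arm]


def rolling_circle(circle, copies):
    """Model a rolony as `copies` tandem repeats of the circle's complement."""
    return reverse_complement(circle) * copies


# Hypothetical placeholder sequences, chosen only for illustration.
FIRST_ADAPTOR_5P = "AAGGCCTTAGCA"    # stands in for the 5' part of the first adaptor
SECOND_ADAPTOR_3P = "TCCAGGTTACGA"   # stands in for the 3' part of the second adaptor
INDEX_AND_LINKER = "ACGT"            # stands in for sample index plus linker region

umis = make_umis(5, 1000)            # 4**5 = 1024 possible 5-mers, so 1,000 fit
adaptors = build_adaptors(FIRST_ADAPTOR_5P, umis, INDEX_AND_LINKER)
template = adaptors[0] + "TTGACCATGCA" + SECOND_ADAPTOR_3P
bridge = bridge_for(template, arm=6)

print(len(set(adaptors)), can_circularize(template, bridge))
```

In this sketch a UMI of five bases is the shortest that can supply the at least 1,000 different adaptors recited in claim 18, since four bases give only 256 distinct sequences.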
PCT/US2020/020694 2019-03-06 2020-03-02 Compositions and methods for adaptor design and nucleic acid library construction for rolony-based sequencing WO2020180813A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962814417P 2019-03-06 2019-03-06
US62/814,417 2019-03-06

Publications (1)

Publication Number Publication Date
WO2020180813A1 (en) 2020-09-10

Family

ID=69904245

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/020694 WO2020180813A1 (en) 2019-03-06 2020-03-02 Compositions and methods for adaptor design and nucleic acid library construction for rolony-based sequencing

Country Status (1)

Country Link
WO (1) WO2020180813A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022101162A1 (en) * 2020-11-13 2022-05-19 Miltenyi Biotec B.V. & Co. KG Paired end sequential sequencing based on rolling circle amplification
EP4001432A1 (en) * 2020-11-13 2022-05-25 Miltenyi Biotec B.V. & Co. KG Algorithmic method for efficient indexing of genetic sequences using associative arrays
WO2023201487A1 (en) * 2022-04-18 2023-10-26 京东方科技集团股份有限公司 Adapter, adapter ligation reagent, kit, and library construction method
WO2024036445A1 (en) * 2022-08-15 2024-02-22 深圳华大智造科技股份有限公司 Method for preparing sequencing library and kit for preparing sequencing library

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999019341A1 (en) 1997-10-10 1999-04-22 President & Fellows Of Harvard College Replica amplification of nucleic acid arrays
US6210891B1 (en) 1996-09-27 2001-04-03 Pyrosequencing Ab Method of sequencing DNA
US20030022207A1 (en) 1998-10-16 2003-01-30 Solexa, Ltd. Arrayed polynucleotides and their use in genome analysis
US20030064398A1 (en) 2000-02-02 2003-04-03 Solexa, Ltd. Synthesis of spatially addressed molecular arrays
US20040106130A1 (en) 1994-06-08 2004-06-03 Affymetrix, Inc. Bioarray chip reaction apparatus and its manufacture
US6787308B2 (en) 1998-07-30 2004-09-07 Solexa Ltd. Arrayed biomolecules and their use in sequencing
US6828100B1 (en) 1999-01-22 2004-12-07 Biotage Ab Method of DNA sequencing
US6833246B2 (en) 1999-09-29 2004-12-21 Solexa, Ltd. Polynucleotide sequencing
US6897023B2 (en) 2000-09-27 2005-05-24 The Molecular Sciences Institute, Inc. Method for determining relative abundance of nucleic acid sequences
US6911345B2 (en) 1999-06-28 2005-06-28 California Institute Of Technology Methods and apparatus for analyzing polynucleotide sequences
WO2005082098A2 (en) 2004-02-27 2005-09-09 President And Fellows Of Harvard College Polony fluorescent in situ sequencing beads
US6969488B2 (en) 1998-05-22 2005-11-29 Solexa, Inc. System and apparatus for sequential processing of analytes
WO2006073504A2 (en) 2004-08-04 2006-07-13 President And Fellows Of Harvard College Wobble sequencing
US20070099208A1 (en) 2005-06-15 2007-05-03 Radoje Drmanac Single molecule arrays for genetic and chemical analysis
WO2007120208A2 (en) 2005-11-14 2007-10-25 President And Fellows Of Harvard College Nanogrid rolling circle dna sequencing
WO2007133831A2 (en) 2006-02-24 2007-11-22 Callida Genomics, Inc. High throughput genome sequencing on dna arrays
WO2011044437A2 (en) 2009-10-09 2011-04-14 Stc.Unm Polony sequencing methods
WO2015188192A2 (en) * 2014-06-06 2015-12-10 Cornell University Method for identification and enumeration of nucleic acid sequence, expression, copy, or dna methylation changes, using combined nuclease, ligase, polymerase, and sequencing reactions

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BENTLEY ET AL., NATURE, vol. 456, no. 7218, 2008, pages 53 - 59
CONSTANS, A, THE SCIENTIST, vol. 17, no. 13, 2003, pages 36
MARGULIES ET AL., NATURE, vol. 437, 2005, pages 376 - 380
RONAGHI ET AL., ANAL. BIOCHEM., vol. 242, 1996, pages 84 - 89
SHENDURE ET AL., SCIENCE, vol. 309, 2005, pages 1728 - 1739

Similar Documents

Publication Publication Date Title
CN110997932B (en) Single cell whole genome library for methylation sequencing
JP7032930B2 (en) Methods and Arrays for Generating and Sequencing Monoclonal Clusters of Nucleic Acids
WO2020180813A1 (en) Compositions and methods for adaptor design and nucleic acid library construction for rolony-based sequencing
EP3555305B1 (en) Method for increasing throughput of single molecule sequencing by concatenating short dna fragments
US20200370105A1 (en) Methods for performing spatial profiling of biological molecules
CN110062809B (en) Single-stranded circular DNA library for sequencing circular consensus sequences
AU2020315955A1 (en) Methods and compositions for high throughput sample preparation using double unique dual indexing
US20140274729A1 (en) Methods, compositions and kits for generation of stranded rna or dna libraries
CN104080958A (en) Compositions and methods for directional nucleic acid amplification and sequencing
WO2021128441A1 (en) Controlled strand-displacement for paired-end sequencing
JP6479759B2 (en) Nucleic acid amplification method on solid support
JP2017527295A (en) Linker elements and methods for constructing sequencing libraries using them
CN111936635A (en) Generation of single stranded circular DNA templates for single molecule sequencing
US20220364169A1 (en) Sequencing method for genomic rearrangement detection
JP2018527928A (en) High molecular weight DNA sample tracking tag for next generation sequencing
US20220195417A1 (en) Multiplex assembly of nucleic acid molecules
EP4347869A1 (en) Massive generation of chemically ligateable probes for multiplexed fish
US20210017596A1 (en) Sequential sequencing methods and compositions
EP2456892A2 (en) Method for sequencing a polynucleotide template
WO2022256227A1 (en) Methods for fragmenting complementary dna
WO2023025784A1 (en) Optimised set of oligonucleotides for bulk rna barcoding and sequencing
WO2022256228A1 (en) Method for producing a population of symmetrically barcoded transposomes
WO2023096674A1 (en) Encoded assays

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20713151; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20713151; Country of ref document: EP; Kind code of ref document: A1)