WO2022101162A1 - Paired end sequential sequencing based on rolling circle amplification - Google Patents
Paired end sequential sequencing based on rolling circle amplification
- Publication number: WO2022101162A1
- Application number: PCT/EP2021/081027
- Authority: WO, WIPO (PCT)
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6853—Nucleic acid amplification reactions using modified primers or templates
- C12Q1/6855—Ligating adaptors
Definitions
- NGS next generation sequencing
- the method of the invention allows the sequencing of a plurality of polynucleotide molecules where specific adapters are ligated to double-stranded DNA molecules.
- the double stranded polynucleotide molecules are denatured after adapter sequence ligation and circularized.
- Fig. 1 shows the process of the invention, in which targeted DNA libraries are used to generate sense and anti-sense circular templates that are used in rolling circle amplification to produce DNA concatemers forming DNA nanoballs called rolonies.
- the generated rolonies are sequenced in segments capturing both the added unique identifier information for each strand and the target DNA of interest.
- Fig. 2 shows the sequential sequencing events for both the sense and antisense DNA.
- Primers 1 and 2 are used in a first round of sequencing generating sequencing reads A and C corresponding to the target DNA insert of the DNA library.
- the second round of sequencing uses primers 3 and 4 on the same immobilized rolonies to generate sequencing reads B and D corresponding to the identifier region containing a unique molecular identifier (UMI) and barcode.
- UMI unique molecular identifier
- Fig. 3 shows the results obtained when using a library of human DNA to generate paired-end sequencing reads using the invention described.
- the DNA reads were generated using a sequence-by-synthesis platform capable of sequencing rolonies immobilized on a solid surface and following the invention described in Fig. 1.
- the amount and percentage of unique paired reads and paired-groups (repeats of unique pairs due to PCR amplification of the DNA library) are indicated for a set of pre-defined tiles on the flowcell, based on the number of sequencing reads analyzed: 15,612,769 reads (partial sequencing run analysis).
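The counting described for Fig. 3 can be illustrated with a minimal Python sketch. It is not taken from the source: the function name `summarize_pairs`, the input layout (one `(barcode, umi)` tuple per sense/anti-sense pair found) and the reading of "unique paired reads" as distinct identifiers and "paired-groups" as identifiers seen more than once are all assumptions made for illustration.

```python
from collections import Counter


def summarize_pairs(pair_keys, total_reads):
    """Summarize paired reads for a set of tiles.

    pair_keys   -- one (barcode, umi) tuple per sense/anti-sense pair found
                   (hypothetical input layout, not defined in the source)
    total_reads -- number of sequencing reads analyzed
    """
    counts = Counter(pair_keys)
    # Assumption: a "unique pair" is a distinct (barcode, umi) identifier.
    unique_pairs = len(counts)
    # Assumption: a "paired-group" is an identifier observed more than once,
    # e.g. because of PCR duplicates in the library.
    paired_groups = sum(1 for n in counts.values() if n > 1)
    # Each pair consumes two reads (one sense, one anti-sense).
    reads_in_pairs = 2 * len(pair_keys)
    percent = 100.0 * reads_in_pairs / total_reads if total_reads else 0.0
    return {
        "pairs": len(pair_keys),
        "unique_pairs": unique_pairs,
        "paired_groups": paired_groups,
        "percent_reads_paired": percent,
    }
```

With four pairs of which two share an identifier, the sketch reports three unique pairs and one paired-group.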
- the method of the invention may be used for a target double stranded DNA nucleic acid library with a length of 50 to 2000 nucleotides.
- the target double stranded DNA nucleic acid library may be used as is, i.e. as target double stranded DNA, or may be obtained by segmentation/fragmentation of a double stranded DNA.
- adapters are used which contain regions that allow for the circularization of the template DNA using a guide oligonucleotide ligation approach.
- such adapter regions are referred to as sequencing regions A1, A2 and A3.
- the adaptors also include a barcode region BR and a universal identifier region UMI, so that the sense and the anti-sense strands can be uniquely identified as pairs, and a portion that allows the hybridization of primers, allowing the sequencing of the DNA nanoballs/rolonies in multiple sections and in more than one round of sequencing if required.
- the circularized DNA templates generated from both sense and anti-sense strands are used in rolling circle amplification (RCA) to generate multiple copies of DNA that are used for sequencing.
- RCA rolling circle amplification
- the thus obtained copies of DNA concatemers are hereinafter referred to as “rolonies” or “DNA nanoballs”.
- the circularized single-stranded DNA template fragments from each strand are used to generate individual rolonies and therefore the positive and negative strands are located on different rolonies.
- rolonies are preferably attached randomly to a solid surface, for example via electrostatic charges (on surfaces like polyamines, silicon dioxide, titanium, hexamethyldisilazane or others) or via NHS ester-activated crosslinkers.
- the first portion of each polynucleotide molecule that generated a rolony (sense strand) is attached to a first location of the surface and the second portion of each polynucleotide molecule that generated a rolony (anti-sense strand) is attached to a second location of the surface.
- Each of the rolonies which comprises either the first or the second portion of the target polynucleotide molecule (sense and anti-sense) is sequenced in two segments sequentially.
- the first segment reads the actual targeted DNA and the second segment, the information contained in the adaptor portion and containing the unique molecular identifier (UMI) and sample barcodes. These two sequences coming from the same rolonies are linked together by co-localization and merged into one unique DNA read.
- the segment sequences coming from rolonies originating from the same polynucleotide sequence (positive and negative strand), but located randomly on the surface, are linked/paired by using the unique identifier contained in one of the adaptors.
- step a) the target double stranded DNA nucleic acid library containing adaptor regions is denatured into a mixture of sense and anti-sense DNA single strands.
- any double-stranded adapted DNA library containing a fragmented targeted DNA region to be sequenced can be used as starting material for the method of the invention.
- the target double stranded DNA nucleic acid library is obtained by fragmentation of a target double stranded DNA.
- step b) the sense and anti-sense DNA single strands are provided at the 3’ and 5’ ends with sequencing regions A1, A2 and A3, a barcode region BR and a universal identifier region UMI to obtain sense and anti-sense oligonucleotides having the general formula
- (5’) A1-UMI-BR-A3 - sense DNA single strand-A2 (3’)
- (3’) A1-UMI-BR-A3 - anti sense DNA single strand-A2 (5’)
- wherein A1, A2 and A3 each comprise 5 to 50 nucleotides;
- BR comprises 3 to 20 nucleotides
- UMI comprises 9 to 15 nucleotides
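The general formula and its length limits can be expressed as a small data structure. The following Python sketch is illustrative only: the class name `Construct`, the helper `violations` and any example sequences are hypothetical, while the length ranges are the ones stated above.

```python
from dataclasses import dataclass

# Length limits (inclusive) as stated for the general formula.
LIMITS = {
    "A1": (5, 50),
    "A2": (5, 50),
    "A3": (5, 50),
    "BR": (3, 20),
    "UMI": (9, 15),
}


@dataclass(frozen=True)
class Construct:
    """One adapted strand laid out as A1-UMI-BR-A3-insert-A2."""
    a1: str
    umi: str
    br: str
    a3: str
    insert: str
    a2: str

    def sequence(self) -> str:
        # Concatenate the regions in the order given by the formula.
        return self.a1 + self.umi + self.br + self.a3 + self.insert + self.a2


def violations(construct: Construct) -> list:
    """Return the names of regions whose length is outside the stated limits."""
    bad = []
    for name, (low, high) in LIMITS.items():
        length = len(getattr(construct, name.lower()))
        if not low <= length <= high:
            bad.append(name)
    return bad
```

An empty list from `violations` means every adaptor region fits its range; the insert itself is not checked here.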
- two adaptors flank the target insert DNA. The first consists of a spacer region serving for the hybridization of sequencing primers (A1), followed by a UMI region of n > 8 nucleotides and a barcode region of n > 3 nucleotides, followed by another spacer region serving for the hybridization of a second set of sequencing primers (adaptor A3).
- the second adaptor contains a spacer region serving for the hybridization of the third set of sequencing primers (adaptor A2) and completes the library construct.
- step c) the mixture of the sense and anti-sense DNA oligonucleotides is divided into two fractions, i.e. the double-stranded adapted DNA library is distributed into 2 tubes in equal amounts, labeled sense and antisense.
- the two mixtures of sense and anti-sense DNA oligonucleotides in the two fractions are provided with oligonucleotide guides comprising 5 to 50 nucleotides capable of binding to A1 and A2 of the same oligonucleotide to each fraction.
- One fraction receives a guide oligonucleotide complementary to the sense strand of A1 and A2 and one fraction receives a guide oligonucleotide complementary to the anti-sense strand of A1 and A2.
- the DNA is heat denatured at 95 °C and cold shocked at 4 °C to anneal the bridge oligonucleotide onto the denatured single stranded DNA library.
- the bridge oligos are complementary to each extremity of the adapter region (A1 and A2), bringing the 5’ and 3’ end of the DNA library fragment in close proximity of one another.
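How a bridge (splint) oligonucleotide relates to the adapter ends can be sketched in Python. This is a simplified illustration, not the source's design procedure: it assumes the strand reads 5'-A1 ... A2-3', so that after circularization the 3' end of A2 sits next to the 5' end of A1, and it takes the bridge as the reverse complement of that junction. The function names and the `arm` parameter (bases taken from each adapter) are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def bridge_oligo(a1: str, a2: str, arm: int = 10) -> str:
    """Design a bridge oligo spanning the A2/A1 ligation junction.

    Assumes the linear strand is 5'-A1 ... A2-3'. The junction formed on
    circularization is the last `arm` bases of A2 followed by the first
    `arm` bases of A1; the bridge anneals to it in antiparallel orientation.
    """
    junction = a2[-arm:] + a1[:arm]
    return reverse_complement(junction)
```

For example, `bridge_oligo("ACCTTT", "GGGCAT", arm=3)` spans the junction `CATACC` and returns `GGTATG`.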
- the DNA library is circularized by ligation with a DNA ligase such as T4 DNA ligase into a circular template DNA library.
- the circularization reaction is purified by treating the mixture with exonuclease I and III to eliminate the un-ligated non-circular DNA and excess bridge oligonucleotides.
- the purified single strand circular template is replicated by a polymerase capable of rolling circle amplification into a plurality of DNA concatemers forming a DNA nanoball or rolony.
- an oligonucleotide is used to prime the binding of the replicating enzyme and is hybridized to the same regions used for the hybridization of the sequencing oligonucleotides.
- An equal amount (1:1 ratio) of the sense and antisense RCA products (rolonies) are mixed and placed onto a modified positively charged solid surface like glass or a plastic equivalent (cyclo olefin polymer or others) containing polyamines, silicon dioxide, titanium, hexamethyldisilazane or others.
- the rolonies can interact with the surface via electrostatic charges or via NHS ester-activated crosslinkers.
- step g) the sequence information is obtained from the following nucleotides of the DNA concatemers:
- from A3 in direction to A2 as sequence A using primer 1
- from A2 in direction to A3 as sequence C using primer 2
- from A1 in direction to A3 as sequence B using primer 3
- from A3 in direction to A1 as sequence D using primer 4
- the first segment sequencing (sequences A and C) of both sense and antisense rolonies is performed using two sets of sequencing primers (primers 1 and 2), complementary to A3 for the sense strand and A2 for the anti-sense strand respectively, flanking the insert regions.
- sequences A and C may each have a length of 50-2000 nucleotides whereas the sequences B and D may each have a length of 20 to 50 nucleotides.
- the second segment sequencing of the barcode (BR) and UMI portion of both sense and antisense region is performed using two new sets of sequencing primers (primers 3 and 4) complementary to A1 for the sense strand and A3 for the anti-sense strand, flanking the UMI/barcode region.
- the sequencing is performed using a massively parallel sequencing-by-synthesis approach using fluorescently-labeled nucleotides.
- Steps h) and i)
- Each sequencing round generates two sets of reads (sense and antisense) for each rolony and four sequencing reads in total for each pair of rolonies (originating from the same double-stranded adapted DNA library portion). The thus obtained four sequence reads are then combined into the sequence of the target double stranded DNA nucleic acid library.
- sequences A and B, which originate from the same rolony and are therefore co-localized on the surface, are combined using the rolony coordinates to generate the sequencing read AB for the sense strand.
- likewise, sequences C and D are combined to generate the sequencing read CD for the anti-sense strand.
- the sequencing reads AB and CD contain the insert sequence, the barcode BR and the UMI for the sense strand and the anti-sense strand respectively.
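The co-localization step can be sketched as a join on rolony coordinates. The Python example below is a simplified illustration under stated assumptions: each read is represented as a hypothetical `(tile, x, y, sequence)` tuple, and coordinates from the two sequencing rounds are assumed to match exactly. A real platform would likely need image registration and a distance tolerance, which the source does not describe.

```python
def merge_by_coordinates(first_round, second_round):
    """Merge insert reads and identifier reads from the same rolony.

    first_round  -- iterable of (tile, x, y, insert_sequence)
    second_round -- iterable of (tile, x, y, identifier_sequence)
    Returns a dict mapping (tile, x, y) to (insert, identifier).
    Assumption: coordinates of one rolony are identical in both rounds.
    """
    # Index the second-round reads by their position on the surface.
    identifier_at = {(tile, x, y): seq for tile, x, y, seq in second_round}
    merged = {}
    for tile, x, y, insert in first_round:
        identifier = identifier_at.get((tile, x, y))
        if identifier is not None:
            # Same position in both rounds: treat as one rolony.
            merged[(tile, x, y)] = (insert, identifier)
    return merged
```

Reads without a partner at the same position are dropped in this sketch rather than reported.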
- sequences AB and CD are then paired using the sequence information of the UMIs to generate a consensus sequence of the target double stranded DNA nucleic acid library using information from both sense and anti-sense portions.
- the pairing or matching of the sequences AB and CD may be performed by using the sequence information of the barcode BR and UMI, i.e. their barcode genetic sequence of consecutive nucleotide bases A, T, G, or C. Identical barcode genetic sequences are assigned a partition ID using hash-map functions to indicate a unique “key” element. Sorting of such UMIs in single-cell RNA sequencing experiments is for example described in “UMI-count modeling and differential expression analysis for single-cell RNA sequencing” by Chen et al. Genome Biology (2016). Further, the identification of barcodes for single cell genomics is described by Tambe et al. BMC Bioinformatics (2019) and an implementation of Hamming distance to sort similar dictionary entries is disclosed in “Perfect Hamming code with a hash table for faster genome mapping” by Takenaka et al. BMC Bioinformatics (2011).
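A minimal Python sketch of hash-map based pairing follows. It is not the implementation from the cited works: a plain dictionary serves as the hash map, the identifier is assumed to be the barcode and UMI concatenated into one string, and identifiers read from the anti-sense rolony are assumed to have already been normalized to the same orientation as those from the sense rolony. The optional Hamming-distance fallback and the names `pair_reads` and `max_mismatch` are illustrative choices.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pair_reads(ab_reads, cd_reads, max_mismatch=0):
    """Pair AB (sense) and CD (anti-sense) reads sharing an identifier.

    ab_reads, cd_reads -- lists of (identifier, insert_sequence)
    max_mismatch       -- tolerated Hamming distance between identifiers
    Returns a list of (identifier, sense_insert, antisense_insert).
    """
    # Hash map: identifier -> all anti-sense inserts carrying it.
    by_identifier = defaultdict(list)
    for identifier, insert in cd_reads:
        by_identifier[identifier].append(insert)

    pairs = []
    for identifier, insert in ab_reads:
        if identifier in by_identifier:
            # Exact match: a single hash lookup.
            matches = by_identifier[identifier]
        elif max_mismatch > 0:
            # Fallback: linear scan for near-identical identifiers.
            matches = [
                seq
                for key, seqs in by_identifier.items()
                if len(key) == len(identifier)
                and hamming(key, identifier) <= max_mismatch
                for seq in seqs
            ]
        else:
            matches = []
        for partner in matches:
            pairs.append((identifier, insert, partner))
    return pairs
```

The exact-match path costs one lookup per read; the fallback scan is linear in the number of distinct identifiers and is only meant to show where a Hamming-distance tolerance would fit.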
- a library of human DNA has been used for generating paired-end sequencing reads using the invention described.
- the DNA reads were generated using a sequence-by-synthesis platform capable of sequencing rolonies immobilized on a solid surface and following the invention described in Fig. 1.
- An exemplary process according to the invention is shown in Fig. 1.
- the library DNA from a targeted region consists of a targeted insert region, depicted as a double-stranded region with a solid and a dotted line.
- the insert is flanked by a spacer region (A3), which is the position where sequencing primers 1 and 4 bind (Step 8 and 9).
- A3 spacer region
- A1 and A2 adapters are located at each extremity and complete the library construct.
- Step 1 The double stranded library DNA is split into 2 tubes (sense and antisense) in equal amounts.
- Step 2 The double stranded library DNA is mixed with the appropriate bridge oligonucleotide (anti-sense oligo for sense library strand and sense oligo for anti-sense library strand), heat denatured at 95 °C and cold shocked at 4 °C to anneal the bridge onto the denatured single stranded library DNA.
- appropriate bridge oligonucleotide anti-sense oligo for sense library strand and sense oligo for anti-sense library strand
- Step 3 The denatured single stranded library DNA is circularized by ligation with T4 DNA Ligase.
- Step 4 The ligation reaction mix is treated with Exonuclease I and III to eliminate the un-ligated non-circular DNA and bridge oligonucleotide.
- Step 5 Circularized single stranded DNA is purified with magnetic beads.
- Step 6 Rolling Circle Amplification (RCA) is performed with oligonucleotide primers designed from either the A1 or A2 adaptor region. An RCA primer complementary to the sense strand is used for the sense-strand circle and an RCA primer complementary to the anti-sense strand is used for the anti-sense-strand circle.
- Step 7 The resulting RCA nanoball products are quantified by Qubit.
- the sense and the anti-sense RCA products are mixed in equal amounts and placed onto the flow cell for sequencing.
- Step 8 A 150-200 cycle 1st segment sequencing of the target insert region is performed with sequencing primers 1 and 2.
- Step 9 A 20 cycle 2nd segment sequencing of the UMI and Barcode region is performed with sequencing primers 3 and 4.
- Step 10 Primary sequencing data analysis is performed to generate the DNA sequencing reads.
- Step 11 Secondary sequencing data analysis is performed by a. combining the first and second sequencing reads originating from the same rolonies using the rolony coordinates on the flowcell (co-localization), and b. pairing the reads from two different rolonies originating from the same double stranded DNA (plus and minus strand) using the sequence information of the identifier region (Barcode and UMI).
- Step 12 Determining the amount of unique paired reads and paired-groups (repeats of unique pairs due to PCR amplification of the DNA library) observed for a set of pre-defined tiles on the flowcell.
- Step 13 Establishing a consensus sequence of the double-strand DNA library using information from both sense and anti-sense strand DNA (paired reads).
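Step 13 does not state how the consensus is computed, so the following Python sketch shows only one simple possibility, labeled as an assumption: the anti-sense read is reverse-complemented into the orientation of the sense read, the two are compared position by position, and disagreements are masked with `N`. The function names are hypothetical, and both reads are assumed to start at the same insert position.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def duplex_consensus(sense_read: str, antisense_read: str) -> str:
    """Build a consensus from a paired sense and anti-sense read.

    Assumption: after reverse-complementing the anti-sense read, both
    reads cover the same insert positions from the same start. Positions
    where the strands disagree are reported as 'N'.
    """
    oriented = reverse_complement(antisense_read)
    # zip() stops at the shorter read, so only the shared span is compared.
    return "".join(
        a if a == b else "N" for a, b in zip(sense_read, oriented)
    )
```

Masking with `N` is a conservative choice for illustration; a quality-weighted vote would be an alternative if base qualities were available.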
- A1-A2 sense bridge is used as a splint-bridge to circularize the positive (sense) construct as well as a primer to perform the rolling circle amplification reaction.
- A1-A2 antisense bridge is used as a splint-bridge to circularize the negative
- 150 cycle 1st segment sequencing of the target insert region is performed with target insert sense-minus-2 primer and target insert antisense-minus-2 primer.
- Fig. 3 shows the results obtained when using an E. coli shotgun DNA library to generate paired-end sequencing reads using the invention described. For demonstration, 22 tiles out of 759 total tiles were analyzed. The amount and percentage of unique paired reads and paired-groups (repeats of unique pairs due to PCR amplification of the DNA library) observed based on the number of sequencing reads analyzed:
Abstract
The invention relates to a method for obtaining the sequence of both strands of a DNA nucleic acid library wherein the sense and anti-sense DNA single strands are fragmented and provided with sequencing regions, a barcode region and a universal identifier region which are then sequenced and wherein the sequence information of the fragments is merged into the final sequence by matching the sequence information of the barcode region BR and universal identifier region UMI.
Description
PAIRED END SEQUENTIAL SEQUENCING BASED ON ROLLING CIRCLE AMPLIFICATION
BACKGROUND
[0001] The present invention is directed to a process for DNA/RNA sequencing aided by hash-mapping to identify target DNA moieties.
[0002] Current DNA sequencing technology identifies genetic information obtained from polynucleotide DNA conjugated to adapters comprising a barcode and/or unique molecular identifier, and routinely produces hundreds of millions of short reads spanning tens to hundreds of base pairs.
[0003] Paired-end sequencing is defined as a process to sequence both ends of a DNA fragment and to generate more accurate sequencing data. Since paired-end reads are more likely to align to a reference, the quality of the entire data set improves.
[0004] The technique of paired-end sequencing is well known, for example from Edwards et al., Genomics, 6, 593-608 and Roach et al., Genomics 1995; 26, 345-353, and allows for the determination of two or more reads from two or more locations on a ribonucleic or deoxyribonucleic acid complex.
[0005] One major problem in all sequencing methods is the amount of genetic information to be analysed since DNA or RNA may contain millions of base pairs. Identification and indexing of genetic reads based on contiguous sequences or near-matches is therefore a common challenge within the fields of bioinformatics and next-generation sequencing (NGS).
[0006] The amount of information to be analysed is further increased by attempts to improve the quality of the genetic information collected. Since sequencing errors increase with the length of the DNA or RNA strands to be analysed, the quality of the genetic information can be improved by focusing on rather short-read sequencing methods. The error rate of next generation sequencing (NGS) is often a problem for applications where the detection of low-level base mutations is critical. Pairing of sequencing reads (paired-end) is a way to improve the accuracy and sensitivity of assays. However, this approach further increases the amount of genetic information to be analysed and, inter alia, the processing time.
SUMMARY
[0007] It was found that paired-end sequencing methods of short-read sequences can be improved by gaining information from two templates that originated from the same DNA duplex. The additional information that is captured makes it possible to reduce the sequencing errors in short reads, as one strand can be used to confirm the base determination of the other, and helps the proper alignment of the reads onto a reference sequence.
[0008] Finally, by linking two reads that are separated from each other by a known distance or that partly overlap, the overall average length of the generated reads is increased, which permits an easier identification of important mutation types such as insertions, deletions, inversions, genomic rearrangements, repetitive sequence elements, gene fusions and novel transcripts.
[0009] It was therefore an object of the invention to provide a method for obtaining the sequence of both strands of a DNA nucleic acid library, characterized by the steps
a. denaturing the target double stranded DNA nucleic acid library into a mixture of sense and anti-sense DNA single strands
b. providing the sense and anti-sense DNA single strands at the 3’ and 5’ ends with sequencing regions A1, A2 and A3, a barcode region BR and a universal identifier region UMI to obtain sense and anti-sense oligonucleotides having the general formula
(5’) A1-UMI-BR-A3 - sense DNA single strand - A2 (3’)
(3’) A1-UMI-BR-A3 - anti-sense DNA single strand - A2 (5’)
wherein A1, A2 and A3 each comprise 5 to 50 nucleotides;
BR comprises 3 to 20 nucleotides;
UMI comprises 9 to 15 nucleotides
c. dividing the mixture of the sense and anti-sense DNA oligonucleotides into two fractions
d. providing to each fraction oligonucleotide guides comprising 5 to 50 nucleotides capable of binding to A1 and A2 of the same oligonucleotide
e. circularizing the sense and anti-sense DNA oligonucleotides by ligation with a DNA ligase into circular templates
f. multiplying the circular templates of each fraction into DNA concatemers, combining the fractions and localizing the DNA concatemers on a surface
g. determining the following sequences of nucleotides of the DNA concatemers
from A3 in direction to A2 as sequence A
from A2 in direction to A3 as sequence C
from A1 in direction to A3 as sequence B
from A3 in direction to A1 as sequence D
h. merging the sequences A and B to generate sequence AB and sequences C and D to generate sequence CD by colocalization using solid surface rolony coordinates
i. pairing the sequences AB and CD by matching the sequence information of the barcode region BR and universal identifier region UMI.
[0010] The present approach integrates the usage of two pairs of sequencing primers, one pair for each strand of a given portion of a polynucleotide duplex of interest, and allows the concomitant sequencing of the positive and negative strands.
[0011] The method of the invention allows the sequencing of a plurality of polynucleotide molecules in which specific adapters are ligated to double-stranded DNA molecules. The double-stranded polynucleotide molecules are denatured after adapter sequence ligation and circularized.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] Fig. 1 shows the process of the invention, in which targeted DNA libraries are used to generate sense and anti-sense circular templates used in rolling circle amplification, producing DNA concatemers that form DNA nanoballs called rolonies. The generated rolonies are sequenced in segments, capturing both the added unique identifier information for each strand and the target DNA of interest.
[0013] Fig. 2 shows the sequential sequencing events for both the sense and antisense DNA. Primers 1 and 2 are used in a first round of sequencing, generating sequencing reads A and C corresponding to the target DNA insert of the DNA library. The second round of sequencing uses primers 3 and 4 on the same immobilized rolonies to generate sequencing reads B and D corresponding to the identifier region containing a unique molecular identifier (UMI) and barcode.
[0014] Fig. 3 shows the results obtained when using a library of human DNA to generate paired-end sequencing reads using the invention described. The DNA reads were generated using a sequence-by-synthesis platform capable of sequencing rolonies immobilized on a solid surface and following the invention described in Fig. 1. The number and percentage of unique paired reads and paired-groups (repeats of unique pairs due to PCR amplification of the DNA library) are indicated for the 15,612,769 sequencing reads analyzed (partial sequencing run analysis) from a set of pre-defined tiles on the flowcell.
DETAILED DESCRIPTION
[0015] The method of the invention may be used for a target double stranded DNA nucleic acid library with a length of 50 to 2000 nucleotides. The target double stranded DNA nucleic acid library may be used as is, i.e. as target double stranded DNA, or may be obtained by segmentation/fragmentation of a double stranded DNA.
[0016] In the invention, adapters are used which contain regions that allow for the circularization of the template DNA using a guide oligonucleotide ligation approach. In the following, such adapters are referred to as sequencing regions A1, A2 and A3.
[0017] The adaptors also include a barcode region BR and a universal identifier region UMI, so that the sense and the anti-sense strands can be uniquely identified as pairs, and a portion that allows the hybridization of primers, allowing the sequencing of the DNA nanoballs/rolonies in multiple sections and in more than one round of sequencing if required.
[0018] The circularized DNA templates generated from both sense and anti-sense strands are used in rolling circle amplification (RCA) to generate multiple copies of DNA that are used for sequencing. The thus obtained copies of DNA concatemers are hereinafter referred to as “rolonies” or “DNA nanoballs”.
[0019] The circularized single-stranded DNA template fragments from each strand are used to generate individual rolonies, and therefore the positive and negative strands are located on different rolonies.
[0020] These rolonies are preferably attached randomly to a solid surface, for example via electrostatic charges (on surfaces like polyamines, silicon dioxide, titanium, hexamethyldisilazane or others) or via NHS ester-activated crosslinkers. Preferably, the first portion of each polynucleotide molecule that generated a rolony (sense strand) is attached to a first location of the surface and the second portion of each polynucleotide molecule that generated a rolony (anti-sense strand) is attached to a second location of the surface. Each of the rolonies, which comprises either the first or the second portion of the target polynucleotide molecule (sense and anti-sense), is sequenced in two segments sequentially. The first segment reads the actual targeted DNA and the second segment reads the information contained in the adaptor portion, which contains the unique molecular identifier (UMI) and sample barcodes. These two sequences coming from the same rolony are linked together by co-localization and merged into one unique DNA read. The segment sequences coming from rolonies originating from the same polynucleotide sequence (positive and negative strand), but located randomly on the surface, are linked/paired by using the unique identifier contained in one of the adaptors.
Step a)
[0021] In step a), the target double stranded DNA nucleic acid library containing adaptor regions is denatured into a mixture of sense and anti-sense DNA single strands. In general, any double-stranded adapted DNA library containing fragmented targeted DNA regions to be sequenced can be used as starting material for the method of the invention.
[0022] In a first embodiment, the target double stranded DNA nucleic acid library is obtained by fragmentation of a target double stranded DNA.
Step b
[0023] In step b), the sense and anti-sense DNA single strands are provided at the 3’ and 5’ ends with sequencing regions A1, A2 and A3, a barcode region BR and a universal identifier region UMI to obtain sense and anti-sense oligonucleotides having the general formula
(5’) A1-UMI-BR-A3 - sense DNA single strand - A2 (3’)
(3’) A1-UMI-BR-A3 - anti-sense DNA single strand - A2 (5’)
wherein A1, A2 and A3 each comprise 5 to 50 nucleotides;
BR comprises 3 to 20 nucleotides;
UMI comprises 9 to 15 nucleotides.
[0024] Two adaptors flank the target insert DNA and complete the library construct. The first adaptor consists of a spacer region serving for the hybridization of sequencing primers (A1), followed by a UMI region of n > 8 nucleotides and a barcode region of n > 3 nucleotides, followed by another spacer region serving for the hybridization of a second set of sequencing primers (A3). The second adaptor contains a spacer region serving for the hybridization of the third set of sequencing primers (A2).
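The layout of the construct can be sketched in a few lines of Python. This is an illustrative model only: the class name, field names and the example sequences are hypothetical, and only the length ranges are taken from the text (A1, A2, A3: 5 to 50 nucleotides; BR: 3 to 20; UMI: 9 to 15).

```python
from dataclasses import dataclass

# Length ranges quoted in the text; everything else here is an assumption.
LIMITS = {"a1": (5, 50), "a2": (5, 50), "a3": (5, 50), "br": (3, 20), "umi": (9, 15)}


@dataclass(frozen=True)
class LibraryConstruct:
    """Hypothetical model of (5') A1-UMI-BR-A3 - insert - A2 (3')."""
    a1: str
    umi: str
    br: str
    a3: str
    insert: str
    a2: str

    def validate(self) -> None:
        # Check every adaptor element against its allowed length range.
        for name, (lo, hi) in LIMITS.items():
            n = len(getattr(self, name))
            if not lo <= n <= hi:
                raise ValueError(f"{name} has {n} nt, expected {lo} to {hi}")

    def linear(self) -> str:
        # 5'->3' sequence of the sense-strand construct before circularization.
        return self.a1 + self.umi + self.br + self.a3 + self.insert + self.a2


# Placeholder sequences, not taken from the application.
example = LibraryConstruct(a1="ACGTA", umi="AAAAAAAAA", br="CCC",
                           a3="GGTTG", insert="ACGTACGT", a2="TTGCA")
example.validate()
```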
Step c
[0025] In step c), the mixture of the sense and anti-sense DNA oligonucleotides is divided into two fractions, i.e. the double-stranded adapted DNA library is distributed into 2 tubes in equal amounts, labeled sense and antisense.
Step d and e
[0026] First, the two fractions of sense and anti-sense DNA oligonucleotides are each provided with oligonucleotide guides comprising 5 to 50 nucleotides capable of binding to A1 and A2 of the same oligonucleotide. One fraction receives a guide oligonucleotide complementary to the sense strand of A1 and A2 and the other fraction receives a guide oligonucleotide complementary to the anti-sense strand of A1 and A2.
Step f
[0027] In this step, the circular templates of each fraction (sense and anti-sense) are multiplied into a series of DNA concatemers using Rolling Circle Amplification (RCA).
[0028] To this end, the DNA is heat denatured at 95 °C and cold shocked at 4 °C to anneal the bridge oligonucleotide onto the denatured single stranded DNA library. The bridge oligos are complementary to each extremity of the adapter region (A1 and A2), bringing the 5’ and 3’ ends of the DNA library fragment in close proximity to one another.
[0029] Then, the DNA library is circularized by ligation with a DNA ligase, such as T4 DNA ligase, into a circular template DNA library.
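How a bridge oligonucleotide relates to the A1 and A2 extremities can be sketched as follows. This is a minimal illustration for a strand written 5’-A1 ... A2-3’; the function names, the `arm` parameter and its default of 10 nucleotides per side are assumptions made here, not values from the application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    # Complement each base, then reverse to obtain the 5'->3' partner strand.
    return seq.translate(COMPLEMENT)[::-1]


def bridge_oligo(a1: str, a2: str, arm: int = 10) -> str:
    """Bridge spanning the A2/A1 junction of a strand written 5'-A1 ... A2-3'.

    Circularization joins the 3' end of A2 to the 5' end of A1, so the
    junction read 5'->3' on the library strand is tail(A2) + head(A1).
    The bridge is the reverse complement of that junction.
    """
    if arm < 1 or arm > min(len(a1), len(a2)):
        raise ValueError("arm must fit inside both A1 and A2")
    junction = a2[-arm:] + a1[:arm]
    return reverse_complement(junction)
```

For the other fraction, the same helper would be applied to the adaptor sequences as they appear 5’ to 3’ on that strand.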
[0030] The circularization reaction is purified by treating the mixture with exonuclease I and III to eliminate the un-ligated non-circular DNA and excess bridge oligonucleotides.
[0031] Preferably, the purified single strand circular template is replicated by a polymerase capable of rolling circle amplification into a plurality of DNA concatemers forming a DNA nanoball or rolony. For this purpose, an oligonucleotide is used to prime the binding of the replicating enzyme and is hybridized to the same regions used for the hybridization of the sequencing oligonucleotides.
[0032] Equal amounts (1:1 ratio) of the sense and antisense RCA products (rolonies) are mixed and placed onto a modified, positively charged solid surface such as glass or a plastic equivalent (cyclo olefin polymer or others) containing polyamines, silicon dioxide, titanium, hexamethyldisilazane or others. The rolonies can interact with the surface via electrostatic charges or via NHS ester-activated crosslinkers.
Step g
[0033] In step g), the sequence information is obtained from the following nucleotides of the DNA concatemers
from A3 in direction to A2 as sequence A using primer 1
from A2 in direction to A3 as sequence C using primer 2
from A1 in direction to A3 as sequence B using primer 3
from A3 in direction to A1 as sequence D using primer 4
[0034] The first segment sequencing of the targeted DNA region (sequences A and C) of both sense and antisense rolonies is performed using two sets of sequencing primers (primers 1 & 2), complementary to A3 for the sense strand and to A2 for the anti-sense strand respectively, flanking the insert region.
[0035] The sequences A and C may each have a length of 50-2000 nucleotides, whereas the sequences B and D may each have a length of 20 to 50 nucleotides.
[0036] After termination of the first reaction using ddNTP and/or a denaturing agent such as betaine, the second segment sequencing of the barcode (BC) and UMI portion of both sense and antisense rolonies is performed using two new sets of sequencing primers (primers 3 and 4), complementary to A1 for the sense strand and to A3 for the anti-sense strand, flanking the UMI/barcode region. The sequencing is performed using a massively parallel sequencing-by-synthesis approach with fluorescently labeled nucleotides.
Step h and i
[0037] Each sequencing round generates two sets of reads (sense and antisense), one read per rolony, giving four sequencing reads in total for each pair of rolonies (originating from the same double-stranded adapted DNA library portion). The thus obtained four sequence reads are then combined into the sequence of the target double stranded DNA nucleic acid library.
[0038] For this purpose, first the sequences A and B, which originate from the same rolony and are therefore co-localized on the surface according to the rolony coordinates, are combined to generate the sequencing read AB for the sense strand. The same applies to sequences C and D to generate the sequencing read CD for the anti-sense strand. The sequencing reads AB and CD contain the insert sequence, the barcode BR and the UMI for the sense strand and the anti-sense strand respectively.
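The co-localization step can be sketched as a nearest-neighbour match between the reads of the two sequencing rounds. The record layout (`tile`, `x`, `y`, `seq`), the tolerance `tol` and the brute-force search are assumptions chosen for brevity; a sequencing platform may instead assign a stable spot identifier during primary analysis.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class SpotRead:
    tile: int    # flowcell tile the rolony sits on
    x: float     # rolony coordinates within the tile
    y: float
    seq: str     # read obtained in one sequencing round


def merge_by_colocalization(round1, round2, tol=1.0):
    """Pair each round-1 read with the closest round-2 read on the same tile.

    Returns (insert_read, identifier_read) tuples; spots without a partner
    within `tol` are dropped.
    """
    by_tile = {}
    for r2 in round2:
        by_tile.setdefault(r2.tile, []).append(r2)

    merged = []
    for r1 in round1:
        candidates = by_tile.get(r1.tile, [])
        best = min(candidates,
                   key=lambda r2: hypot(r1.x - r2.x, r1.y - r2.y),
                   default=None)
        if best is not None and hypot(r1.x - best.x, r1.y - best.y) <= tol:
            merged.append((r1, best))
    return merged
```

Applied to the sense rolonies this yields the A/B pairs that form read AB, and applied to the anti-sense rolonies the C/D pairs that form read CD.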
[0039] The sequences AB and CD are then paired using the sequence information of the UMIs to generate a consensus sequence of the target double stranded DNA nucleic acid library using information from both sense and anti-sense portions.
[0040] The pairing or matching of the sequences AB and CD may be performed by using the sequence information of the barcode BR and UMI, i.e. their barcode genetic sequence of consecutive nucleotide bases A, T, G, or C. Identical barcode genetic sequences are assigned a partition ID using hash-map functions to indicate a unique “key” element. Sorting of such UMIs in single-cell RNA sequencing experiments is for example described in “UMI-count modeling and differential expression analysis for single-cell RNA sequencing” by Chen et al. Genome Biology (2018). Further, the identification of barcodes for single cell genomics is described by Tambe et al. BMC Bioinformatics (2019) and an implementation of Hamming distance to sort similar dictionary entries is disclosed in “Perfect Hamming code with a hash table for faster genome mapping” by Takenaka et al. BMC Bioinformatics (2011).
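A minimal sketch of the hash-map pairing, using a Python dictionary as the hash map and the BR+UMI string as the key. It assumes exact matches and that the identifiers of both strands have already been brought into the same orientation; the function name and the input layout are hypothetical, and tolerant matching (for example by Hamming distance) is left out.

```python
from collections import defaultdict


def pair_by_identifier(ab_reads, cd_reads):
    """Group reads AB (sense) and CD (anti-sense) by their BR+UMI key.

    ab_reads / cd_reads: iterables of (identifier, insert_sequence).
    Returns {identifier: (sense_inserts, antisense_inserts)} for every
    identifier observed on both strands.
    """
    partitions = defaultdict(lambda: ([], []))
    for ident, seq in ab_reads:
        partitions[ident][0].append(seq)   # slot 0: sense strand
    for ident, seq in cd_reads:
        partitions[ident][1].append(seq)   # slot 1: anti-sense strand
    # Keep only partitions that contain at least one read from each strand.
    return {k: v for k, v in partitions.items() if v[0] and v[1]}
```

Partitions holding more than one read per strand correspond to repeated copies of the same original molecule.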
EXAMPLES
[0041] A library of human DNA has been used for generating paired-end sequencing reads using the invention described. The DNA reads were generated using a sequence-by-synthesis platform capable of sequencing rolonies immobilized on a solid surface and following the invention described in Fig. 1.
[0042] An exemplary process according to the invention is shown in Fig. 1.
[0043] Library DNA from the targeted region consists of a targeted insert region, depicted as a double-stranded region with solid and dotted lines. The insert is flanked by a spacer region (A3), which is the position where sequencing primers 1 and 4 bind (Steps 8 and 9). Next come the 9-nucleotide UMI and the 9-nucleotide barcode region. The A1 and A2 adapters are located at each extremity and complete the library construct.
[0044] Step 1: The double stranded library DNA is split into 2 tubes (sense and antisense) in equal amounts.
[0045] Step 2: The double stranded library DNA is mixed with the appropriate bridge oligonucleotide (anti-sense oligo for the sense library strand and sense oligo for the anti-sense library strand), heat denatured at 95 °C and cold shocked at 4 °C to anneal the bridge onto the denatured single stranded library DNA.
[0046] Step 3: The denatured single stranded library DNA is circularized by ligation with T4 DNA Ligase.
[0047] Step 4: The ligation reaction mix is treated with Exonuclease I and III to eliminate the un-ligated non circular DNA and bridge oligonucleotide.
[0048] Step 5: Circularized single stranded DNA is purified with magnetic beads.
[0049] Step 6: Rolling Circle Amplification (RCA) is performed with oligonucleotide primers designed from either the A1 or the A2 adaptor region. An RCA primer complementary to the sense strand is used for the sense-strand circle and an RCA primer complementary to the anti-sense strand is used for the anti-sense-strand circle.
[0050] Step 7: The resulting RCA nanoball products are quantified by Qubit. The sense and anti-sense RCA products are mixed in equal amounts and placed onto the flow cell for sequencing.
[0051] Step 8: 150-200 cycle 1st segment sequencing of the target insert region is performed with sequencing primers 1 and 2.
[0052] Step 9: 20 cycle of the 2nd segment sequencing of the UMI and Barcode region is performed with sequencing primers 3 and 4.
[0053] Step 10: Primary sequencing data analysis is performed to generate the DNA sequencing reads.
[0054] Step 11: Secondary sequencing data analysis is performed:
a. Combining the first and second sequencing reads originating from the same rolonies using the rolony coordinates on the flowcell (co-localization).
b. Pairing the reads from two different rolonies originating from the same double stranded DNA (plus and minus strand) using the sequence information of the identifier region (Barcode and UMI).
[0055] Step 12: Determining the number of unique paired reads and paired-groups (repeats of unique pairs due to PCR amplification of the DNA library) observed for a set of pre-defined tiles on the flowcell.
[0056] Step 13: Establishing a consensus sequence of the double-strand DNA library using information from both sense and anti-sense strand DNA (paired reads).
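One simple way to form a consensus from a paired sense and anti-sense read is sketched below. The text does not specify the consensus rule, so the following is assumed here: both reads cover the same insert interval without insertions or deletions, the anti-sense read is given 5’ to 3’ as sequenced, and positions where the two strands disagree are masked with N.

```python
COMP = str.maketrans("ACGTN", "TGCAN")


def duplex_consensus(sense_read: str, antisense_read: str) -> str:
    """Position-wise consensus of two complementary reads.

    The anti-sense read is reverse-complemented so that it aligns with the
    sense read; a base is kept only where both strands agree.
    """
    aligned = antisense_read.translate(COMP)[::-1]
    if len(aligned) != len(sense_read):
        raise ValueError("reads must cover the same interval")
    return "".join(a if a == b else "N" for a, b in zip(sense_read, aligned))
```

With sense read ACGTTG and anti-sense read CAACGT the consensus is ACGTTG; a disagreement at any position appears as N at that position.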
EXAMPLES
[0057] The construct and the primers used in the experiment according to the invention are depicted in Fig 4.
[0058] A1-A2 sense bridge is used as a splint-bridge to circularize the positive (sense) construct as well as a primer to perform the rolling circle amplification reaction.
[0059] A1-A2 antisense bridge is used as a splint-bridge to circularize the negative (antisense) construct as well as a primer to perform the rolling circle amplification reaction.
[0060] Rolonies from (+) sense and (-) anti-sense circles are loaded onto the flowcell in a 1:1 equal ratio for a sequential paired-end sequencing.
[0061] 150 cycle 1st segment sequencing of the target insert region is performed with target insert sense-minus-2 primer and target insert antisense-minus-2 primer.
[0062] 20 cycle of the 2nd segment sequencing of the UMI and Barcode region is performed with UMI/BC-sense-0 primer and UMI/BC-antisense-0 primer.
[0063] Primary sequencing data analysis is performed to generate the DNA sequencing reads.
[0064] Secondary sequencing data analysis is performed:
[0065] Combining the first and second sequencing reads originating from the same rolonies using the rolony coordinates on the flowcell (co-localization).
[0066] Pairing the reads from two different rolonies originating from the same double stranded DNA ((+) sense and (-) antisense strand) using the sequence information of the identifier region (Barcode and UMI).
[0067] Determining the number of unique paired reads and paired-groups (repeats of unique pairs due to PCR amplification of the DNA library) observed for a set of pre-defined tiles on the flowcell.
[0068] Establishing a consensus sequence of the double-strand DNA library using information from both sense and anti-sense strand DNA (paired reads).
[0069] The sequencing results are shown in Fig 3 and Fig 5. Fig 3 shows the results obtained when using an E. coli shotgun DNA library to generate paired-end sequencing reads using the invention described. For demonstration, 22 tiles out of 759 total tiles were analyzed. The number and percentage of unique paired reads and paired-groups (repeats of unique pairs due to PCR amplification of the DNA library) observed are listed below, based on the number of sequencing reads analyzed:
[0070] 15,612,769 reads (partial sequencing run analysis) for a set of pre-defined tiles on the flowcell.
[0071] Number of unique pairs: 2,708,170
[0072] Number of copies: 9,915,542
[0073] Unique (+) strands identified: 1,256,524
[0074] Unique (-) strands identified: 2,304,544
[0075] Percent of paired reads: 35.69%
[0076] Percent of paired-groups: 63.51%
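The reported percentage of paired-groups appears to be consistent with dividing the number of copies by the number of reads analyzed. This relationship is inferred here from the figures alone and is not stated in the text; the variable names are illustrative.

```python
reads_analyzed = 15_612_769   # reads analyzed, as reported
copies = 9_915_542            # "Number of copies", as reported

# Assumed relationship: paired-groups percentage = copies / reads analyzed.
paired_groups_pct = 100 * copies / reads_analyzed
print(f"{paired_groups_pct:.2f}%")  # prints 63.51%
```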
Claims
1. A method for obtaining the sequence of both strands of a DNA nucleic acid library, characterized by the steps
a. denaturing the target double stranded DNA nucleic acid library into a mixture of sense and anti-sense DNA single strands
b. providing the sense and anti-sense DNA single strands at the 3’ and 5’ ends with sequencing regions A1, A2 and A3, a barcode region BR and a universal identifier region UMI to obtain sense and anti-sense oligonucleotides having the general formula
(5’) A1-UMI-BR-A3 - sense DNA single strand - A2 (3’)
(3’) A1-UMI-BR-A3 - anti-sense DNA single strand - A2 (5’)
wherein A1, A2 and A3 each comprise 5 to 50 nucleotides;
BR comprises 3 to 20 nucleotides;
UMI comprises 9 to 15 nucleotides
c. dividing the mixture of the sense and anti-sense DNA oligonucleotides into two fractions
d. providing to each fraction oligonucleotide guides comprising 5 to 50 nucleotides capable of binding to A1 and A2 of the same oligonucleotide
e. circularizing the sense and anti-sense DNA oligonucleotides by ligation with a DNA ligase into circular templates
f. multiplying the circular templates of each fraction into DNA concatemers, combining the fractions and localizing the DNA concatemers on a surface
g. determining the following sequences of nucleotides of the DNA concatemers
from A3 in direction to A2 as sequence A
from A2 in direction to A3 as sequence C
from A1 in direction to A3 as sequence B
from A3 in direction to A1 as sequence D
h. merging the sequences A and B to generate sequence AB and sequences C and D to generate sequence CD by colocalization using solid surface rolony coordinates
i. pairing the sequences AB and CD by matching the sequence information of the barcode region BR and universal identifier region UMI.
2. Method according to claim 1 characterized in that the target double stranded DNA nucleic acid library has a length of 50 to 2000 nucleotides.
3. Method according to claim 1 characterized in that the target double stranded DNA nucleic acid library is obtained by segmentation/fragmentation of a target double stranded DNA.
4. Method according to any of the claims 1 to 3 characterized in that the DNA concatemers are localized on a positively charged surface.
5. Method according to claim 4 characterized in that the DNA concatemers interact with the surface via electrostatic charges or via NHS ester-activated crosslinkers.
6. Method according to any of the claims 1 to 5 characterized in that the sequences of nucleotides of the DNA concatemers are determined by sequencing by synthesis using fluorescently labeled oligonucleotides.
7. Method according to any of the claims 1 to 6 characterized in that the sequences A and C each have a length of 50-2000 nucleotides.
8. Method according to any of the claims 1 to 7 characterized in that the sequences B and D each have a length of 20 to 50 nucleotides.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20207533.9 | 2020-11-13 | ||
EP20207533 | 2020-11-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022101162A1 (en) | 2022-05-19 |
Family
ID=73448923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2021/081027 WO2022101162A1 (en) | 2020-11-13 | 2021-11-09 | Paired end sequential sequencing based on rolling circle amplification |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022101162A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015188192A2 (en) * | 2014-06-06 | 2015-12-10 | Cornell University | Method for identification and enumeration of nucleic acid sequence, expression, copy, or dna methylation changes, using combined nuclease, ligase, polymerase, and sequencing reactions |
WO2018114706A1 (en) * | 2016-12-20 | 2018-06-28 | F. Hoffmann-La Roche Ag | Single stranded circular dna libraries for circular consensus sequencing |
WO2019117714A1 (en) * | 2017-12-11 | 2019-06-20 | Umc Utrecht Holding B.V. | Methods for preparing nucleic acid molecules for sequencing |
WO2020180813A1 (en) * | 2019-03-06 | 2020-09-10 | Qiagen Sciences, Llc | Compositions and methods for adaptor design and nucleic acid library construction for rolony-based sequencing |
Non-Patent Citations (4)
Title |
---|
CHEN ET AL.: "UMI-count modeling and differential expression analysis for single-cell RNA sequencing", GENOME BIOLOGY, 2018 |
EDWARDS ET AL., GENOMICS, vol. 26, 1995, pages 345 - 353 |
TAKENAKA ET AL.: "Perfect Hamming code with a hash table for faster genome mapping", BMC BIOINFORMATICS, 2011 |
TAMBE ET AL., BMC BIOINFORMATICS, 2019 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21806733 Country of ref document: EP Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21806733 Country of ref document: EP Kind code of ref document: A1 |