WO2023194331A1 - CONSTRUCTION OF SEQUENCING LIBRARIES FROM A RIBONUCLEIC ACID (RNA) USING TAILING AND LIGATION OF cDNA (TLC)
- Publication number
- WO2023194331A1 (PCT/EP2023/058731)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rna
- sequencing
- strand cdna
- domain
- sample
- 108091030145 Retron msr RNA Proteins 0.000 description 1
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 1
- 241000283984 Rodentia Species 0.000 description 1
- 238000012300 Sequence Analysis Methods 0.000 description 1
- 239000008051 TBE buffer Substances 0.000 description 1
- PZBFGYYEXUXCOF-UHFFFAOYSA-N TCEP Chemical compound OC(=O)CCP(CCC(O)=O)CCC(O)=O PZBFGYYEXUXCOF-UHFFFAOYSA-N 0.000 description 1
- 108091046869 Telomeric non-coding RNA Proteins 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-N Thiophosphoric acid Chemical class OP(O)(S)=O RYYWUUFWQRZTIU-UHFFFAOYSA-N 0.000 description 1
- 101710120037 Toxin CcdB Proteins 0.000 description 1
- 108700009124 Transcription Initiation Site Proteins 0.000 description 1
- 229920004890 Triton X-100 Polymers 0.000 description 1
- 108090000631 Trypsin Proteins 0.000 description 1
- 102000004142 Trypsin Human genes 0.000 description 1
- 108020005202 Viral DNA Proteins 0.000 description 1
- 108020000999 Viral RNA Proteins 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 241000269370 Xenopus <genus> Species 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 239000002253 acid Substances 0.000 description 1
- 229960005305 adenosine Drugs 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- HAXFWIACAGNFHA-UHFFFAOYSA-N aldrithiol Chemical compound C=1C=CC=NC=1SSC1=CC=CC=N1 HAXFWIACAGNFHA-UHFFFAOYSA-N 0.000 description 1
- 150000001345 alkine derivatives Chemical class 0.000 description 1
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 1
- 210000004381 amniotic fluid Anatomy 0.000 description 1
- 238000004873 anchoring Methods 0.000 description 1
- 230000000692 anti-sense effect Effects 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 102000036639 antigens Human genes 0.000 description 1
- 108091007433 antigens Proteins 0.000 description 1
- 238000003556 assay Methods 0.000 description 1
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 description 1
- 150000001540 azides Chemical class 0.000 description 1
- 230000001580 bacterial effect Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- DRTQHJPVMGBUCF-PSQAKQOGSA-N beta-L-uridine Natural products O[C@H]1[C@@H](O)[C@H](CO)O[C@@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-PSQAKQOGSA-N 0.000 description 1
- 210000000941 bile Anatomy 0.000 description 1
- 239000012620 biological material Substances 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- OWMVSZAMULFTJU-UHFFFAOYSA-N bis-tris Chemical compound OCCN(CCO)C(CO)(CO)CO OWMVSZAMULFTJU-UHFFFAOYSA-N 0.000 description 1
- 210000000988 bone and bone Anatomy 0.000 description 1
- 239000006227 byproduct Substances 0.000 description 1
- JJWKPURADFRFRB-UHFFFAOYSA-N carbonyl sulfide Chemical compound O=C=S JJWKPURADFRFRB-UHFFFAOYSA-N 0.000 description 1
- 239000000969 carrier Substances 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 239000006285 cell suspension Substances 0.000 description 1
- 108091092328 cellular RNA Proteins 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 238000005119 centrifugation Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000003196 chaotropic effect Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 125000003636 chemical group Chemical group 0.000 description 1
- 238000004587 chromatography analysis Methods 0.000 description 1
- 210000005266 circulating tumour cell Anatomy 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 239000000356 contaminant Substances 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- RGWHQCVHVJXOKC-SHYZEUOFSA-J dCTP(4-) Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)[C@@H](O)C1 RGWHQCVHVJXOKC-SHYZEUOFSA-J 0.000 description 1
- HAAZLUGHYHWQIW-KVQBGUIXSA-N dGTP Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HAAZLUGHYHWQIW-KVQBGUIXSA-N 0.000 description 1
- NHVNXKFIZYSCEB-XLPZGREQSA-N dTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C1 NHVNXKFIZYSCEB-XLPZGREQSA-N 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 230000009615 deamination Effects 0.000 description 1
- 238000006481 deamination reaction Methods 0.000 description 1
- 230000030609 dephosphorylation Effects 0.000 description 1
- 238000006209 dephosphorylation reaction Methods 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000003745 diagnosis Methods 0.000 description 1
- 230000009274 differential gene expression Effects 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 239000012895 dilution Substances 0.000 description 1
- 239000004205 dimethyl polysiloxane Substances 0.000 description 1
- 235000013870 dimethyl polysiloxane Nutrition 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 238000010828 elution Methods 0.000 description 1
- 238000000295 emission spectrum Methods 0.000 description 1
- 230000002357 endometrial effect Effects 0.000 description 1
- 230000006862 enzymatic digestion Effects 0.000 description 1
- 150000002148 esters Chemical class 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 210000001808 exosome Anatomy 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 210000002744 extracellular matrix Anatomy 0.000 description 1
- 210000004905 finger nail Anatomy 0.000 description 1
- 125000001153 fluoro group Chemical group F* 0.000 description 1
- 238000001502 gel electrophoresis Methods 0.000 description 1
- PCHJSUWPFVWCPO-UHFFFAOYSA-N gold Chemical compound [Au] PCHJSUWPFVWCPO-UHFFFAOYSA-N 0.000 description 1
- 229910052737 gold Inorganic materials 0.000 description 1
- 239000010931 gold Substances 0.000 description 1
- YQOKLYTXVFAUCW-UHFFFAOYSA-N guanidine;isothiocyanic acid Chemical compound N=C=S.NC(N)=N YQOKLYTXVFAUCW-UHFFFAOYSA-N 0.000 description 1
- 125000005179 haloacetyl group Chemical group 0.000 description 1
- 238000004128 high performance liquid chromatography Methods 0.000 description 1
- 238000012165 high-throughput sequencing Methods 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 239000000815 hypotonic solution Substances 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 230000002401 inhibitory effect Effects 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 150000002500 ions Chemical class 0.000 description 1
- 230000007794 irritation Effects 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 238000006317 isomerization reaction Methods 0.000 description 1
- 238000002032 lab-on-a-chip Methods 0.000 description 1
- 238000011005 laboratory method Methods 0.000 description 1
- 230000003902 lesion Effects 0.000 description 1
- 239000011344 liquid material Substances 0.000 description 1
- 210000005228 liver tissue Anatomy 0.000 description 1
- 239000012160 loading buffer Substances 0.000 description 1
- 230000001926 lymphatic effect Effects 0.000 description 1
- 230000002934 lysing effect Effects 0.000 description 1
- 210000005075 mammary gland Anatomy 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000002175 menstrual effect Effects 0.000 description 1
- 230000004060 metabolic process Effects 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 230000002438 mitochondrial effect Effects 0.000 description 1
- 239000003607 modifier Substances 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 230000004770 neurodegeneration Effects 0.000 description 1
- 208000015122 neurodegenerative disease Diseases 0.000 description 1
- 238000007899 nucleic acid hybridization Methods 0.000 description 1
- 210000004940 nucleus Anatomy 0.000 description 1
- CXQXSVUQTKDNFP-UHFFFAOYSA-N octamethyltrisiloxane Chemical compound C[Si](C)(C)O[Si](C)(C)O[Si](C)(C)C CXQXSVUQTKDNFP-UHFFFAOYSA-N 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 229910000489 osmium tetroxide Inorganic materials 0.000 description 1
- 239000012285 osmium tetroxide Substances 0.000 description 1
- 229910052760 oxygen Inorganic materials 0.000 description 1
- 239000001301 oxygen Substances 0.000 description 1
- 235000019834 papain Nutrition 0.000 description 1
- 229940055729 papain Drugs 0.000 description 1
- 229920002866 paraformaldehyde Polymers 0.000 description 1
- 230000007170 pathology Effects 0.000 description 1
- 239000008188 pellet Substances 0.000 description 1
- JLFNLZLINWHATN-UHFFFAOYSA-N pentaethylene glycol Chemical compound OCCOCCOCCOCCOCCO JLFNLZLINWHATN-UHFFFAOYSA-N 0.000 description 1
- 239000000137 peptide hydrolase inhibitor Substances 0.000 description 1
- 208000026435 phlegm Diseases 0.000 description 1
- 125000005642 phosphothioate group Chemical group 0.000 description 1
- 210000005059 placental tissue Anatomy 0.000 description 1
- 238000004987 plasma desorption mass spectroscopy Methods 0.000 description 1
- 239000004033 plastic Substances 0.000 description 1
- 229920003023 plastic Polymers 0.000 description 1
- 229920000435 poly(dimethylsiloxane) Polymers 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 230000000379 polymerizing effect Effects 0.000 description 1
- 238000004321 preservation Methods 0.000 description 1
- 108090000765 processed proteins & peptides Proteins 0.000 description 1
- 230000001915 proofreading effect Effects 0.000 description 1
- 230000033117 pseudouridine synthesis Effects 0.000 description 1
- 229950010131 puromycin Drugs 0.000 description 1
- 238000003908 quality control method Methods 0.000 description 1
- 238000013442 quality metrics Methods 0.000 description 1
- 238000011002 quantification Methods 0.000 description 1
- 230000002285 radioactive effect Effects 0.000 description 1
- 239000011535 reaction buffer Substances 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 210000005084 renal tissue Anatomy 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 230000001177 retroviral effect Effects 0.000 description 1
- 210000003705 ribosome Anatomy 0.000 description 1
- 108091092562 ribozyme Proteins 0.000 description 1
- 239000012723 sample buffer Substances 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000007790 scraping Methods 0.000 description 1
- 210000000582 semen Anatomy 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000001953 sensory effect Effects 0.000 description 1
- 239000000243 solution Substances 0.000 description 1
- 238000000527 sonication Methods 0.000 description 1
- 238000001179 sorption measurement Methods 0.000 description 1
- 210000003802 sputum Anatomy 0.000 description 1
- 208000024794 sputum Diseases 0.000 description 1
- 230000000087 stabilizing effect Effects 0.000 description 1
- 150000008163 sugars Chemical class 0.000 description 1
- 229910052717 sulfur Inorganic materials 0.000 description 1
- 125000004434 sulfur atom Chemical group 0.000 description 1
- 210000004243 sweat Anatomy 0.000 description 1
- 210000001179 synovial fluid Anatomy 0.000 description 1
- 210000001138 tear Anatomy 0.000 description 1
- 108091035539 telomere Proteins 0.000 description 1
- 102000055501 telomere Human genes 0.000 description 1
- 210000003411 telomere Anatomy 0.000 description 1
- YLQBMQCUIZJEEH-UHFFFAOYSA-N tetrahydrofuran Natural products C=1C=COC=1 YLQBMQCUIZJEEH-UHFFFAOYSA-N 0.000 description 1
- 125000003396 thiol group Chemical group [H]S* 0.000 description 1
- ATGUDZODTABURZ-UHFFFAOYSA-N thiolan-2-ylideneazanium;chloride Chemical compound Cl.N=C1CCCS1 ATGUDZODTABURZ-UHFFFAOYSA-N 0.000 description 1
- 210000000115 thoracic cavity Anatomy 0.000 description 1
- 230000036962 time dependent Effects 0.000 description 1
- 238000005820 transferase reaction Methods 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 125000002264 triphosphate group Chemical class [H]OP(=O)(O[H])OP(=O)(O[H])OP(=O)(O[H])O* 0.000 description 1
- 239000012588 trypsin Substances 0.000 description 1
- 229940035893 uracil Drugs 0.000 description 1
- DRTQHJPVMGBUCF-UHFFFAOYSA-N uracil arabinoside Natural products OC1C(O)C(CO)OC1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-UHFFFAOYSA-N 0.000 description 1
- 229940045145 uridine Drugs 0.000 description 1
- 230000002861 ventricular Effects 0.000 description 1
- 230000003612 virological effect Effects 0.000 description 1
- 238000003260 vortexing Methods 0.000 description 1
- DGVVWUTYPXICAM-UHFFFAOYSA-N β‐Mercaptoethanol Chemical compound OCCS DGVVWUTYPXICAM-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1096—Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
Definitions
- a nucleotide sequence may be partially complementary to a target, in which not all nucleotides are complementary to every nucleotide in the target nucleic acid in all the corresponding positions.
- a primer may be perfectly (i.e., 100%) complementary to the target nucleic acid, or the primer and the target nucleic acid may share some degree of complementarity which is less than perfect (e.g., 70%, 75%, 85%, 90%, 95%, 99%).
- the percent identity of two nucleotide sequences can be determined by aligning the sequences for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first sequence for optimal alignment).
- hybridization conditions means conditions in which a primer specifically hybridizes to a region of the target nucleic acid (e.g., a template RNA or other region of the double stranded product nucleic acid). Whether a primer specifically hybridizes to a target nucleic acid is determined by such factors as the degree of complementarity between the primer and the target nucleic acid and the temperature at which the hybridization occurs, which may be informed by the melting temperature (Tm) of the primer.
- the melting temperature refers to the temperature at which half of the primer-target nucleic acid duplexes remain hybridized and half of the duplexes dissociate into single strands.
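The two quantities defined above, percent complementarity and melting temperature, can be illustrated with a minimal Python sketch. The function names are hypothetical and do not come from the patent. The Tm estimate uses the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough approximation for short oligonucleotides that is substituted here for illustration only; the source does not specify how Tm is calculated.

```python
# Watson-Crick partners in the DNA alphabet (RNA "U" is mapped to "T" below).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(primer: str, target_site: str) -> float:
    """Percent of primer positions that base-pair with an equal-length target site.

    Both sequences are written 5'->3'. The target is reversed so that each
    primer base is compared with its antiparallel partner.
    """
    primer = primer.upper().replace("U", "T")
    target_site = target_site.upper().replace("U", "T")
    if not primer or len(primer) != len(target_site):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        COMPLEMENT.get(p) == t for p, t in zip(primer, reversed(target_site))
    )
    return 100.0 * paired / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in degrees C (Wallace rule, an assumption of this sketch)."""
    s = primer.upper().replace("U", "T")
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


# Hypothetical 10-mer primer, its perfect target, and a one-mismatch target.
perfect = percent_complementarity("ACGTACGTAC", "GTACGTACGT")
one_mismatch = percent_complementarity("ACGTACGTAC", "GTACGTACGA")
tm_estimate = wallace_tm("ACGTACGTAC")
```

In practice, nearest-neighbor thermodynamic models that account for salt and primer concentration give more reliable Tm values than this rule of thumb.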
- poly(A) is a polyA-sequence.
- the poly(A) sequence is commonly known as a tail that consists of multiple adenosine monophosphates; in other words, it is a stretch of RNA or DNA that has adenine bases.
- polyadenylation is part of the process that produces mature messenger RNA (mRNA) for translation.
- antibody refers to a type of protein that can specifically recognize and bind to a particular antigen, such as a protein, peptide, or specific nucleic acid modification.
- non-template ribonucleotides refers to the terminal transferase catalyzed addition of ribonucleotide to the 3’ end of solid-phase first strand cDNA without base-pairing to a template strand.
- homogenucleotide stretch refers to a stretch of nucleic acid made up of the same nucleotide (e.g., all dCTP, all dGTP, all dTTP, all dATP, all CTP, all GTP, all UTP, or all ATP).
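As an illustration of the stretch defined above, the following Python sketch finds the longest run of a single repeated base in a sequence. The function name and the example oligonucleotide are hypothetical and are not taken from the patent.

```python
from itertools import groupby


def longest_homonucleotide_stretch(seq: str) -> tuple[str, int]:
    """Return (base, length) for the longest run of one repeated base.

    If several runs share the maximum length, the first one is returned.
    """
    if not seq:
        return ("", 0)
    # groupby collapses consecutive identical characters into one group each.
    runs = [(base, sum(1 for _ in group)) for base, group in groupby(seq.upper())]
    return max(runs, key=lambda run: run[1])


# Hypothetical primer fragment containing an oligo(dT)-like stretch.
stretch = longest_homonucleotide_stretch("GGCTTTTTTTTA")
```

A check of this kind could be used, for example, to confirm that a designed hybridization domain really contains the intended run of identical bases.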
- the invention provides methods of preparing a sequencing library from a ribonucleic acid (RNA) sample.
- Sequencing libraries produced by methods of the invention are those whose nucleic acid members include a partial or complete sequencing platform adapter sequence at their termini useful for sequencing using a sequencing platform of interest.
- Sequencing platforms of interest include, but are not limited to, HiSeq, MiSeq, NextSeq and NovaSeq sequencing systems from Illumina®; the PACBIO RS II Sequel systems from Pacific Biosciences; the SOLiD sequencing systems from Life Technologies™; the MinION™, GridION™ and PromethION™ systems from Oxford Nanopore, or any other sequencing platform of interest.
- first strand cDNA synthesis primer contains primer binding domains and is complementary to sequences within the RNA sample itself
- first strand cDNA synthesis primer is complementary to specific sequences that contain primer binding domains and were added to the 3’ end of an RNA precursor prior to cDNA synthesis.
- test sample comprising a plurality of template RNA or RNA precursors
- each of the first adapters comprises (i) a 5’ primer binding domain and a 3’ poly A domain or (ii) a 5’ poly A domain and a 3’ primer binding domain; each of the first strand cDNA synthesis primers comprises an RNA hybridization domain complementary to the template RNA or to the first adapters (e.g., oligo(dT)), and said each of the first strand cDNA synthesis primers is covalently linked to magnetic beads; each of the second adapters comprises primer binding sites, and each of the amplification primers comprises sequencing platform adapter constructs;
- each template RNA of step (a) already contains a known sequence, such as 3’ polyA domain or specific target sequences of interest as primer binding domain, to serve as hybridization domain; therefore it does not require the ligation of first adapters to the 3’ end of each template RNA of step (c).
- the precursor RNA is fragmented.
- nucleotides are added to the 3’ end of the precursor RNA through polyadenylation or ligation.
- each of the first adapters and/or each of the first strand cDNA synthesis primers further comprise a sample barcode and/or unique molecular identifier.
- each of the second adapters further comprises a sample barcode and/or unique molecular identifier.
- each of the second adapters further comprises a sequencing read primer domain.
- each of the first strand cDNA synthesis primers is not covalently linked to magnetic beads.
- the method further comprises tagmenting the plurality of the double stranded cDNA of step (h) with transposomes to generate a tagmented sample, wherein the transposomes comprise a transposase and a transposon nucleic acid; and wherein the transposon nucleic acid comprises a transposon end domain and a second post-tagmentation amplification primer binding domain.
- the RNA hybridization domain comprises a heteronucleotide stretch.
- any of the provided oligonucleotide adapters comprise one or more nucleotide analogs.
- the template RNA or the RNA precursor is messenger RNA.
- the RNA hybridization domain of each of the first strand cDNA synthesis primers is primed using randomers.
- the method further comprises pooling the plurality of first adapters ligated to the plurality of RNA precursors.
- the test sample comprising a plurality of template RNA or precursor RNA is obtained from a single cell.
- the method further comprises subjecting the sequencing library to a sequencing protocol.
- the method further comprises quantitating one or more RNA species of the test sample.
- the resultant first-strand cDNA is separated from the template RNA and is contacted with one source of nucleotide triphosphates (e.g., ATP) (not shown), a terminal transferase (not shown), T4 RNA ligase (not shown) and a second adapter oligonucleotide which includes a primer binding domain (PBS) under tailing and ligation conditions, e.g., conditions sufficient to ribotail and ligate the 3’ end of cDNA, which includes the addition of non-template ribonucleotides to the 3’ end of cDNA (depicted as (rA)i-s) followed by ligation of the second adapter oligonucleotide to the 3’ end of cDNA.
- nucleotide triphosphates e.g., ATP
- PBS primer binding domain
- the resulting first-strand cDNA can be amplified with a primer that binds the primer binding domain of the second adapter oligonucleotide, generating full-length double-stranded cDNA that can be amplified and uncoupled from magnetic beads with primers that bind the primer binding domains at both ends of the cDNA, which can include additional sequencing adaptor sequences, such as the P5 and P7 sequences, as well as the forward and reverse indexes (e.g., i5, i7) for sequencing, as desired.
- additional sequencing adaptor sequences such as the P5 and P7 sequences, as well as the forward and reverse indexes (e.g., i5, i7) for sequencing, as desired.
- each of the first adapters and/or each of the second adapter oligonucleotides further comprises a sample barcode and/or unique molecular identifier.
- the cDNA synthesis primers are covalently linked to magnetic beads generating solid-phase first-strand cDNA.
- the cDNA synthesis primers are uncoupled, requiring additional purification procedures in-between aspects of the invention.
- "conditions sufficient to produce a double stranded product nucleic acid" means reaction conditions that permit hybridization of the first strand cDNA synthesis primer to the template RNA and polymerase-mediated extension of its 3’ end. Achieving suitable reaction conditions may include selecting reaction mixture components, concentrations thereof, and a reaction temperature to create an environment in which the polymerase is active and the relevant nucleic acids in the reaction interact (e.g., hybridize) with one another in the desired manner.
- the reaction mixture may include buffer components that establish an appropriate pH, salt concentrations (e.g., KCl concentration), metal cofactor concentration (e.g., Mg2+ or Mn2+ concentration), and the like, for the extension reaction to occur.
- salt concentrations e.g., KCl concentration
- metal cofactor concentration e.g., Mg2+ or Mn2+ concentration
- nuclease inhibitors e.g., an RNase inhibitor and/or a DNase inhibitor
- additives for facilitating amplification/replication of GC rich sequences e.g., betaine, DMSO, ethylene glycol, 1,2-propanediol, or combinations thereof
- molecular crowding agents e.g., polyethylene glycol, Ficoll, dextran or the like
- enzyme-stabilizing components e.g., DTT, or TCEP, present at a final concentration ranging from 0.1 to 10 mM (e.g., 1 mM)
- any other reaction mixture components useful for facilitating polymerase-mediated extension reaction.
- the reaction mixture can have a pH suitable for the primer extension reaction, which in certain embodiments can range from 5 to 9, such as from 7 to 9, including from 8 to 9, e.g., 8 to 8.5.
- the reaction mixture includes a pH adjusting agent. pH adjusting agents of interest include, but are not limited to sodium hydroxide, hydrochloric acid, phosphoric acid buffer solution, citric acid buffer solution and the like.
- the pH of the reaction mixture can be adjusted to the desired range by adding an appropriate amount of the pH adjusting agent.
- the temperature range suitable for production of the double stranded product nucleic acid may vary according to factors such as the particular polymerase employed, the melting temperatures of any optional primers employed, etc.
- the polymerase is a reverse transcriptase (e.g., an MMLV mutant such as SuperScript® IV reverse transcriptase from ThermoFisher®) and the reaction mixture conditions sufficient to produce the double stranded product nucleic acid include bringing the reaction mixture to a temperature ranging from 4°C to 72°C, such as from 16°C to 70°C, e.g., 37°C to 50°C, including 50°C.
- a reverse transcriptase e.g., an MMLV mutant such as SuperScript® IV reverse transcriptase from ThermoFisher®
- the reaction mixture conditions sufficient to produce the double stranded product nucleic acid include bringing the reaction mixture to a temperature ranging from 4°C to 72°C, such as from 16°C to 70°C, e.g., 37°C to 50°C, including 50°C.
- "conditions sufficient to ribotail and ligate the 3’ end of cDNA" means reaction conditions that permit the terminal transferase-mediated extension of the 3’ end of cDNA with non-template NTPs (e.g., ATP), followed by ligation of the second adapter oligonucleotide to the 3’ end of cDNA.
- Achieving suitable reaction conditions may include selecting reaction mixture components, concentrations thereof, and a reaction temperature to create an environment in which the terminal transferase and RNA ligase are active in the desired manner.
- the reaction mixture may include buffer components that establish an appropriate pH, salt concentrations (e.g., KCl concentration), metal cofactor concentration (e.g., Mg2+ or Mn2+ concentration), and the like, for the extension and ligation reaction to occur.
- buffer components that establish an appropriate pH, salt concentrations (e.g., KCl concentration), metal cofactor concentration (e.g., Mg2+ or Mn2+ concentration), and the like, for the extension and ligation reaction to occur.
- nuclease inhibitors e.g., a DNase inhibitor
- additives that inhibit secondary structures e.g., betaine, DMSO, ethylene glycol, 1,2-propanediol, or combinations thereof
- molecular crowding agents e.g., polyethylene glycol, Ficoll, dextran or the like
- reaction mixture can have a pH suitable for the ligation reaction, which in certain embodiments can range from 5 to 9, such as from 7 to 9, including from 8 to 9, e.g., 7 to 8. In some instances, the reaction mixture includes a pH adjusting agent.
- pH adjusting agents of interest include, but are not limited to sodium hydroxide, hydrochloric acid, phosphoric acid buffer solution, citric acid buffer solution and the like.
- the pH of the reaction mixture can be adjusted to the desired range by adding an appropriate amount of the pH adjusting agent.
- the temperature range suitable for tailing and ligation may vary and include a temperature ranging from 4°C to 37°C.
- the terminal transferase is a terminal deoxynucleotidyl transferase (e.g., TdT from Takara®), which catalyzes the template-independent incorporation of ribonucleotides into the 3’-OH termini of single strand cDNA and is added to the reaction mixture to a final concentration from 0.1 to 10 units/µL (U/µL).
- TdT terminal deoxynucleotidyl transferase
- RNA ligase (e.g., T4 RNA Ligase 1) catalyzes the ligation of a 5’ phosphoryl-terminated nucleic acid donor (e.g., the second adapter oligonucleotide) to the 3’-OH termini of single strand cDNA through the formation of a 3’-5’ phosphodiester bond with hydrolysis of ATP to AMP and PPi, and is added to the reaction mixture to a final concentration from 1 to 50 units/µL (U/µL, e.g., 2.25 U/µL).
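The final enzyme concentrations and temperature range stated above for tailing and ligation can be turned into pipetting volumes and simple range checks. The Python sketch below is a minimal illustration. The range constants are the ones quoted in the text, while the 20 µL reaction volume, the 30 U/µL ligase stock concentration, and all function names are hypothetical assumptions rather than values from the patent.

```python
# Ranges quoted in the text for the tailing and ligation reaction.
TDT_RANGE_U_PER_UL = (0.1, 10.0)     # terminal transferase, final U/µL
LIGASE_RANGE_U_PER_UL = (1.0, 50.0)  # RNA ligase, final U/µL
TEMP_RANGE_C = (4.0, 37.0)           # reaction temperature, °C


def enzyme_volume_ul(final_u_per_ul: float,
                     reaction_volume_ul: float,
                     stock_u_per_ul: float) -> float:
    """Stock volume to add, from C_final * V_reaction = C_stock * V_stock."""
    if reaction_volume_ul <= 0 or stock_u_per_ul <= 0:
        raise ValueError("volumes and stock concentration must be positive")
    if final_u_per_ul > stock_u_per_ul:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_u_per_ul * reaction_volume_ul / stock_u_per_ul


def in_range(value: float, bounds: tuple[float, float]) -> bool:
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


# Hypothetical setup: 20 µL reaction, ligase stock at 30 U/µL (assumed values),
# targeting the 2.25 U/µL example concentration given in the text.
ligase_final = 2.25
ligase_volume = enzyme_volume_ul(ligase_final, 20.0, 30.0)
ligase_ok = in_range(ligase_final, LIGASE_RANGE_U_PER_UL)
temp_ok = in_range(42.0, TEMP_RANGE_C)  # 42 °C lies outside the stated range
```

The same helper applies to the terminal transferase by checking its final concentration against `TDT_RANGE_U_PER_UL`; actual stock concentrations depend on the supplier and must be read from the product documentation.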
- the template ribonucleic acid (RNA) or RNA precursor within the RNA sample or the test sample may be a polymer of any length composed of ribonucleotides.
- the template RNA or precursor RNA may be any type of RNA (or sub-type thereof), including but not limited to, a messenger RNA (mRNA), a microRNA (miRNA), a small interfering RNA (siRNA), a trans-acting small interfering RNA (ta-siRNA), a natural small interfering RNA (nat-siRNA), a ribosomal RNA (rRNA), a transfer RNA (tRNA), a small nucleolar RNA (snoRNA), a small nuclear RNA (snRNA), a long non-coding RNA (lncRNA), a non-coding RNA (ncRNA), a transfer-messenger RNA (tmRNA), a precursor messenger RNA (pre-mRNA), a small Cajal body-specific RNA (scaRNA), a piwi-interacting RNA (piRNA), an endoribonuclease-prepared siRNA (esiRNA), a small temporal RNA (stRNA),
- the template RNA or RNA precursor may be subject to a variety of chemical modifications that can alter its structure, function, stability, or interactions with other molecules. Such modifications include, but are not limited to, methylation, acetylation, phosphorylation, oxidation, deamination, ribose methylation, uridine isomerization, pseudouridylation, and many others. These modifications can occur at various positions of the RNA molecule, including the bases, sugars, or phosphate backbone, and can be catalyzed by various enzymes or chemical reagents.
- RNA sample or test sample that includes the template RNA or RNA precursor may be combined into the reaction mixture in an amount sufficient for producing the product nucleic acid.
- the RNA sample or test sample that includes the template RNA or RNA precursor is isolated from 1 or more, 10 or more, 20 or more, 50 or more, 100 or more, 500 or more cells, such as 750 or more, 1000 or more, 2000 or more cells, including 5000 or more cells.
- the template RNA or RNA precursor may be present in any nucleic acid sample of interest, including but not limited to, a nucleic acid sample isolated from a single cell, a plurality of cells (e.g., cultured cells), a tissue, an organ, a body fluid, and/or an organism (e.g., bacteria, yeast, or higher eukaryotic organisms, such as a plant, a mouse, a worm or the like).
- the nucleic acid sample is isolated from a cell(s), tissue, organ, and/or the like of a mammal (e.g., a human, a rodent (e.g., a mouse), or any other mammal of interest).
- the sample may be isolated from a bodily compartment suitable for use in diagnosis, such as blood, urine, saliva, platelets, microvesicles, exosomes, serum, or other bodily fluids.
- the nucleic acid sample is isolated from a source other than a mammal, such as bacteria, yeast, insects (e.g., drosophila), amphibians (e.g., frogs (e.g., Xenopus)), viruses, plants, or any other non-mammalian nucleic acid sample source.
- the test sample is a biological sample, such as a tissue and/or body fluid sample or a combination thereof.
- Biological samples in accordance with embodiments of the invention can be collected in any clinically acceptable manner.
- a biological sample can comprise a tissue, a body fluid, or a combination thereof.
- a biological sample is collected from a healthy subject.
- a biological sample is collected from a subject who is known to have a particular disease or disorder (e.g., a particular cancer or tumor).
- a biological sample is collected from a subject who is suspected of having a particular disease or disorder.
- tissue refers to a mass of connected cells and/or extracellular matrix material(s).
- tissues that are commonly used in conjunction with the present methods include skin, hair, fingernails, endometrial tissue, nasal passage tissue, central nervous system (CNS) tissue, neural tissue, eye tissue, liver tissue, kidney tissue, placental tissue, mammary gland tissue, gastrointestinal tissue, musculoskeletal tissue, genitourinary tissue, bone marrow, and the like, derived from, for example, a human or nonhuman mammal.
- Tissue samples in accordance with embodiments of the invention can be prepared and provided in the form of any tissue sample types known in the art, such as, for example and without limitation, formalin-fixed paraffin-embedded (FFPE), fresh, and fresh frozen (FF) tissue samples.
- body fluid refers to a liquid material derived from a subject, e.g., a human or non-human mammal.
- body fluids that are commonly used in conjunction with the present methods include mucus, blood, plasma, serum, serum derivatives, synovial fluid, lymphatic fluid, bile, phlegm, saliva, sweat, tears, sputum, amniotic fluid, menstrual fluid, vaginal fluid, semen, urine, cerebrospinal fluid (CSF), such as lumbar or ventricular CSF, gastric fluid, a liquid sample comprising one or more material(s) derived from a nasal, throat, or buccal swab, a liquid sample comprising one or more materials derived from a lavage procedure, such as a peritoneal, gastric, thoracic, or ductal lavage procedure, and the like.
- a biological sample can comprise a fine needle aspirate or biopsied tissue.
- a biological sample can comprise media containing cells or biological material.
- a biological sample can comprise a blood clot, for example, a blood clot that has been obtained from whole blood after the serum has been removed.
- a biological sample can comprise stool.
- a biological sample is drawn whole blood. In one aspect, only a portion of a whole blood sample is used, such as plasma, red blood cells, white blood cells, and platelets.
- a biological sample is separated into two or more component parts in conjunction with the present methods. For example, in some embodiments, a whole blood sample is separated into plasma, red blood cell, white blood cell, and platelet components.
- a sample includes a plurality of nucleic acids not only from the subject from which the sample was taken, but also from one or more other organisms, such as viral or bacterial DNA/RNA that is present within the subject at the time of sampling.
- producing a template RNA may include adding nucleotides to an end of the precursor RNA.
- the precursor RNA is a nonpolyadenylated RNA (e.g., a microRNA, small RNA, or the like), and producing the template RNA includes adenylating (e.g., polyadenylating) the precursor RNA.
- Adenylating the precursor RNA may be performed using any convenient approach.
- by “conditions sufficient to produce a product chimeric nucleic acid” is meant reaction conditions that permit the ligation of the first adapter oligonucleotide to the 3’ end of the RNA precursor, catalyzed by the RNA ligase. Achieving suitable reaction conditions may include selecting reaction mixture components, concentrations thereof, and a reaction temperature to create an environment in which the terminal transferase and RNA ligase are active in the desired manner.
- pH adjusting agents of interest include, but are not limited to, sodium hydroxide, hydrochloric acid, phosphoric acid buffer solution, citric acid buffer solution and the like.
- the pH of the reaction mixture can be adjusted to the desired range by adding an appropriate amount of the pH adjusting agent.
- the temperature range suitable for ligation may vary and include a temperature ranging from 4 °C to 37 °C.
- the RNA ligase is T4 RNA Ligase I, which catalyzes the ligation of a 5’ pre-adenylated nucleic acid donor (e.g., the first adapter oligonucleotide) to the 3’-OH termini of the RNA precursor through the formation of a 3’-5’ phosphodiester bond, and is added to the reaction mixture to a final concentration from 1 to 50 units/µl (U/µl, e.g., 2.25 U/µl).
- the test sample is obtained by a method for purifying ribosome nascent-chain complexes of a biological sample of interest to obtain ribosome-coated mRNA fragments that serve as RNA precursors for the methods described herein.
- test sample is obtained by a method for purifying an RNA molecule from a biological sample, where the RNA molecule carries a particular modification of interest, comprising:
- RNA fragment by contacting the biological sample with an agent capable of cleaving the phosphodiester bond, thereby generating a fragment of the RNA molecule, wherein the majority of fragments is around 100 nucleotides in length; (II) contacting the RNA fragment in said biological sample with a molecule that specifically interacts with a particular modification of interest, wherein said molecule can be a protein such as an antibody;
- (IV) purifying the complex obtained in step (III) to provide RNA fragments containing the modification of interest, wherein said RNA fragments are used as precursor RNA in the method for preparing a sequence library disclosed herein.
- the agent capable of cleaving a phosphodiester bond in step (I) is a chemical agent, such as divalent cations (e.g., zinc, magnesium), which can catalyze the cleavage of the phosphodiester bond under specific conditions.
- enzymatic fragmentation is used, where a ribonuclease or other suitable enzyme is added to the biological sample to cleave the RNA molecule at the site of interest. This approach allows for site-specific cleavage of the RNA molecule and can be optimized to achieve high specificity and efficiency.
- heat fragmentation which involves heating the RNA molecule to high temperatures, can also be used to break the phosphodiester bond and generate RNA fragments for downstream analysis.
- test sample is obtained by a method for purifying an RNA molecule interacting with an RNA binding protein (RBP) of interest in a biological sample, comprising:
- RNA molecule by contacting the RBP-RNA complex with an agent capable of cleaving a bond thereof, thereby generating a fragment of the RNA molecule, wherein the fragment is at least 22 nucleotide bases in length; (3) selecting the RBP-RNA fragment complex in said biological sample with a molecule that specifically interacts with a component of the RBP-RNA fragment complex; and
- (4) purifying the RBP-RNA fragment complex obtained in step (3) to provide RNA fragments interacting with the RBP of interest, wherein said RNA fragments are used as precursor RNA in the method for preparing a sequence library disclosed herein.
- the agent capable of cleaving a bond in step (2) is a nuclease, including but not limited to, RNase A, RNase I, RNase T1 and/or MNase.
- purifying the RNA complex of steps (IV) and (4) of the above-disclosed methods for obtaining test samples comprises a chromatographic method.
- RNA-protein complex of step (IV) and (4) of the above-disclosed methods for obtaining test samples is performed under stringent conditions comprising:
- the covalent bond of step (1) and (III) of the above-disclosed methods for obtaining test samples is formed with irradiation.
- the source of irradiation may emit, in one embodiment, radiation of a discrete wavelength. In another embodiment, the source may emit radiation dispersed throughout a region of the electromagnetic radiation spectrum. In another embodiment, the source may emit a mixture of radiation, some of which is of a discrete wavelength, and some of which is dispersed throughout a region of the electromagnetic radiation spectrum.
- the irradiation may result from a polychromatic irradiation source.
- Polychromatic refers, in one embodiment, to a source that emits radiation of various wavelengths. Such wavelengths may be anywhere in the electromagnetic radiation spectrum.
- the radiation emission spectra of various types of irradiation sources are known in the art.
- the irradiation may result from a mercury light.
- Mercury lamps emit radiation of 254 nm, and may also have polychromatic background emissions at other discrete wavelengths, e.g., 313 nm, 365 nm, 405 nm, 436 nm, 546 nm, 579 nm, 1015 nm and 1140 nm. This is a fairly unique characteristic of these types of lamps (see U.S. Pat. No. 6,611,375).
- the irradiation may result from a two-photon excitation apparatus (So P T et al, Cell Mol Bio (Noisy le grand) 44:771).
- Multiple photon means, in one embodiment, the simultaneous absorption of multiple photons by a reactive molecule. This method is described in detail in U.S. Pat. No. 6,316,153 and references therein.
- the device of the present invention comprises an additional filtering means.
- the filtering means comprises a liquid filter solution that transmits only a specific region of the electromagnetic spectrum.
- sources of irradiation are well known to those skilled in the art (see, for example Diffey, B L, Methods 28:4-13; and Chen J et al, Cancer J. 8: 154-63). Each type of radiation represents a separate embodiment of the present invention.
- a chemical group such as, for example, puromycin is added to RNA to facilitate formation of the covalent bond of step (1). This method is described in Rodriguez-Fonseca C et al (RNA 6:744-54).
- a photoreactive nucleoside (e.g., 4-thiouridine and 6-thioguanosine) can be added to the biological sample of interest to increase crosslinking efficiency at a wavelength which is significantly absorbed by the photoreactive nucleoside, such that covalent cross-links are formed between the modified RNA transcript and a protein and the RNA is not damaged.
- the covalent bond of step (1) or (III) of the above-disclosed methods for obtaining test samples is formed with a chemical.
- the chemical is formaldehyde.
- the chemical is a derivative of formaldehyde.
- the chemical is paraformaldehyde.
- the chemical is glutaraldehyde.
- the chemical is osmium tetroxide.
- the chemical is acetone.
- the chemical is an alcohol.
- the chemical is an NHS ester.
- the chemical is a maleimide.
- the chemical is a haloacetyl.
- the chemical is a pyridyl disulfide.
- the chemical is a sulfhydryl modifier such as SATA, SPDP or Traut's Reagent.
- the chemical is hydrazide.
- the chemical is 1-Ethyl-3-(3-Dimethylaminopropyl)-Carbodiimide Hydrochloride.
- the chemical is an aryl azide or a derivative thereof.
- the chemical is any other cross-linking compound known in the art. The cross-linking compound may, in one embodiment, be applied over a broad range of concentrations. Each type of chemical represents a separate embodiment of the present invention.
- the methods of the present disclosure include combining a polymerase with the plurality of template RNA to generate solid-phase first strand cDNA.
- a variety of polymerases may be employed when practicing the subject methods.
- the polymerase combined into the reaction mixture is a reverse transcriptase (RT).
- Reverse transcriptases suitable for the invention do not need to have template-switch capability and can include, but are not limited to, retroviral reverse transcriptase, retrotransposon reverse transcriptase, retroplasmid reverse transcriptase, retron reverse transcriptases, bacterial reverse transcriptases, group II intron-derived reverse transcriptase, and mutants, variants, derivatives, or functional fragments thereof, e.g., RNase H minus or RNase H reduced enzymes (e.g., Maxima H Minus RT (ThermoFisher) or Superscript RT (ThermoFisher)) (Figure 9).
- the reverse transcriptase may be a Moloney Murine Leukemia Virus reverse transcriptase (MMLV RT).
- a mix of two or more different polymerases is added to the reaction mixture, e.g., for improved processivity, proof-reading, and/or the like.
- the polymerase is one that is heterologous relative to the template, or source thereof.
- the polymerase is combined into the reaction mixture such that the final concentration of the polymerase is sufficient to produce a desired amount of the product nucleic acid.
- the polymerase e.g., Superscript IV
- the first strand reaction mixture further includes a first strand cDNA synthesis primer.
- the first strand cDNA synthesis primer is covalently linked to a magnetic bead, and includes one, two or more domains.
- the primer may include a first (e.g., 3’) domain that hybridizes to the template RNA and a second (e.g., 5’) domain that does not hybridize to the template RNA.
- the sequence of the first and second domain may be independently defined or arbitrary.
- the first domain has a defined sequence (e.g., an oligo dT sequence or an RNA specific sequence) or an arbitrary sequence (e.g., a random sequence, such as a random hexamer sequence) and the sequence of the second domain is defined, e.g., a pre-tagmentation amplification primer binding domain or an amplification primer binding domain and may have any convenient sequence such as a sequencing primer binding domain.
- the first strand cDNA synthesis primer may further include a first post-tagmentation amplification, e.g., PCR amplification, primer binding domain, which may have any convenient sequence such as a sequencing primer binding domain.
- the first strand cDNA synthesis primer includes a barcode domain for identification of the sample after pooling post reverse transcription.
- the first strand cDNA synthesis primer may include a unique molecular identifier or other barcode to mark each RNA molecule converted to cDNA individually.
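To illustrate how a unique molecular identifier can mark each RNA molecule individually, the following minimal sketch counts distinct molecules per transcript by collapsing reads that share the same transcript and identifier. It is an assumption about downstream analysis rather than a procedure described in this disclosure: the function name, the input layout (pairs of transcript ID and identifier sequence), and the use of exact matching without error correction are all hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular identifiers per transcript.

    `reads` is an iterable of (transcript_id, umi) pairs. Reads sharing both
    values are treated as amplification copies of one original molecule.
    Exact matching only; sequencing errors in the identifier are not corrected.
    """
    umis_by_transcript = defaultdict(set)
    for transcript_id, umi in reads:
        umis_by_transcript[transcript_id].add(umi)
    return {t: len(umis) for t, umis in umis_by_transcript.items()}


# Hypothetical example data: two reads of geneA are PCR duplicates.
reads = [("geneA", "ACGT"), ("geneA", "ACGT"), ("geneA", "TTAG"), ("geneB", "ACGT")]
counts = count_molecules(reads)
print(counts)
```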
- the sequence includes all or a component of a sequencing platform adapter construct.
- by “sequencing platform adapter construct” is meant a nucleic acid construct that includes at least a portion of a nucleic acid domain (e.g., a sequencing platform adapter nucleic acid sequence) utilized by a sequencing platform of interest, such as a sequencing platform provided by Illumina® (e.g., the HiSeq, MiSeq, NextSeq or NovaSeq); Pacific Biosciences (e.g., the PACBIO RS II sequencing system); or any other sequencing platform of interest.
- a sequencing platform adapter construct includes one or more nucleic acid domains selected from: a domain (e.g., a “capture site” or “capture sequence”) that specifically binds to a surface-attached sequencing platform oligonucleotide (e.g., the P5 or P7 oligonucleotides attached to the surface of a flow cell in an Illumina® sequencing system); a sequencing primer binding domain (e.g., a domain to which the Read 1 or Read 2 primers of the Illumina® platform may bind); a barcode domain (e.g., a domain that uniquely identifies the sample source of the nucleic acid being sequenced to enable sample multiplexing by marking every molecule from a given sample with a specific barcode or “tag”); a barcode sequencing primer binding domain (a domain to which a primer used for sequencing a barcode binds); a molecular identification domain (e.g., a molecular index tag, such as a randomized tag of 4, 6, or other number of nucleotides).
- a sequencing platform adapter domain when present, may include one or more nucleic acid domain of any length and sequence suitable for the sequencing platform of interest.
- the nucleic acid domains may have a length and sequence that enables a polynucleotide (e.g., an oligonucleotide) employed by the sequencing platform of interest to specifically bind to the nucleic acid domain, e.g., for solid phase amplification and/or sequencing by synthesis of the cDNA insert flanked by the nucleic acid domains.
- the first strand cDNA synthesis primer may include from 3’ to 5’, a first domain that hybridizes to the template RNA, e.g., an oligo dT domain, a barcode domain, a molecular identifier, a sequencing platform adapter domain, such as a sequencing read primer domain, and an amplification primer binding domain.
- the amplification primer binding domain will resemble a pre-tagmentation amplification primer binding domain and the first strand cDNA synthesis primer will also include a post-tagmentation amplification primer binding domain, which may be a unique domain or partially or completely overlap with another domain of the primer, such as the sequencing read primer domain, so long as that domain is compatible with respect to the amplification protocol being performed.
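The domain order described above for the first strand cDNA synthesis primer can be sketched by concatenating the domains in 5’ to 3’ orientation, i.e., the reverse of the 3’ to 5’ listing: amplification primer binding domain, sequencing read primer domain, molecular identifier, barcode domain, oligo dT domain. This is a minimal illustration only. All sequences, domain lengths, and the function name are hypothetical placeholders, not sequences from this disclosure or from any sequencing platform, and the code draws one concrete identifier, whereas a synthesized primer pool would typically carry degenerate positions.

```python
import random


def build_rt_primer(amp_domain, read_domain, barcode, umi_length=8,
                    dt_length=30, rng=None):
    """Assemble a first strand cDNA synthesis primer, written 5' to 3'.

    Order (5' -> 3'): amplification primer binding domain, sequencing read
    primer domain, molecular identifier, sample barcode, oligo dT.
    """
    rng = rng or random.Random()
    # One random identifier instance; length is an illustrative assumption.
    umi = "".join(rng.choice("ACGT") for _ in range(umi_length))
    return amp_domain + read_domain + umi + barcode + "T" * dt_length


# Hypothetical placeholder domains (12 nt, 12 nt and 8 nt respectively).
primer = build_rt_primer("GTCAGTCAGTCA", "CATGCATGCATG", "AACCGGTT",
                         umi_length=8, dt_length=30, rng=random.Random(0))
print(primer)
```

Placing the oligo dT domain at the 3’ end keeps it free to hybridize to the template RNA, while the 5’ domains remain unhybridized, consistent with the first and second domains described above.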
- the first adapter oligonucleotide may be pre-adenylated at its 5’ end and include from 3’ to 5’, a first domain, e.g., an oligo dT domain, a barcode domain, a molecular identifier, a sequencing platform adapter domain, such as a sequencing read primer domain, an amplification primer binding domain, and a chain terminator at its 3’ end, e.g., a near-infrared fluorescent dye.
- the nucleotide sequence of nucleic acid domains useful for sequencing on a sequencing platform of interest may vary and/or change over time.
- Adapter sequences are typically provided by the manufacturer of the sequencing platform (e.g., in technical documents provided with the sequencing system and/or available on the manufacturer’s website). Based on such information, the sequence of any sequencing platform adapter domains of the first strand cDNA synthesis primer, first or second adapter oligonucleotide, amplification primers, and/or the like, may be designed to include all or a portion of one or more nucleic acid domains in a configuration that enables sequencing the nucleic acid insert (corresponding to the template RNA) on the platform of interest.
- the subject methods include combining NTPs into the tailing and ligation reaction mixture.
- a single NTP e.g., ATP
- ATP may be added to the reaction mixture such that the final concentration is from 0.01 to 100 mM, e.g., 1 mM.
- nucleic acids that find use in practicing the methods of the present disclosure may include any useful nucleotide analogue and/or modification, including any of the nucleotide analogues and/or modifications described herein.
- the methods include using the product nucleic acid as a template for second-strand synthesis and/or amplification (e.g., for subsequent sequencing of the amplicons).
- the methods include contacting the product nucleic acid with primers that hybridize to primer binding domains present on both ends of the cDNA, under amplification conditions, such as PCR amplification conditions, sufficient to produce a product double stranded cDNA.
- Amplification conditions that may be employed include the addition of one or more primers (e.g., as described above) and dNTPs.
- the conditions may include combining a thermostable polymerase (e.g., a Taq, Pfu, Tfl, Tth, Tli, and/or other thermostable polymerase) into the reaction mixture.
- Amplification, e.g., PCR amplification, results in the production of a product double stranded cDNA.
- the first strand cDNA is combined with a second adapter oligonucleotide, an NTP (in this example ATP (not shown)), a terminal transferase (not shown) and an RNA ligase (not shown).
- Tailing occurs at the 3’ end of the first strand cDNA molecule through the addition of non-templated nucleotides (indicated by (rA)s), which increases the sensitivity of the subsequent ligation reaction, which joins the 5’ end of the second adapter oligonucleotide to the 3’ end of the first strand cDNA.
- the 5’ end of the mRNA is captured, allowing for downstream amplification and enrichment of full-length cDNA, e.g., by LD PCR (Long Distance PCR).
- the components are included in a reaction mixture under conditions sufficient to produce a ligated nucleic acid product.
- Product double stranded cDNA is produced by contacting the ligated single stranded nucleic acid with amplification primers complementary to PCR primer binding domains present in the first strand cDNA synthesis primer and second adapter oligonucleotide.
- product double stranded cDNA is prepared for full-length sequencing on a long-read sequencing platform of interest.
- product double stranded cDNA is tagmented with one or more transposomes including a transposase and a transposon nucleic acid, where the transposon nucleic acid includes a transposon end domain for binding to the transposon protein and a second post-tagmentation amplification primer binding domain (e.g., a post-tagmentation PCR amplification primer binding domain), to produce a tagmented sample.
- the second post-tagmentation amplification primer binding domain comprises a sequencing read primer domain, e.g., a sequencing read primer domain that is different from any sequencing read primer domain present in the first strand cDNA synthesis primer, or second adapter oligonucleotide.
- the resultant tagmented sample is then subjected to amplification conditions, e.g., PCR amplification conditions, using post-tagmentation first and second amplification, e.g., PCR, primers.
- post-tagmentation first and second amplification primers may vary, and in some instances include sequencing platform adapter domains, e.g., a first primer including a first post-tagmentation amplification primer domain, a first sequencing indexing domain and a first sequencing adapter domain; and a second primer including a second post-tagmentation amplification primer domain, a second sequencing indexing domain and a second sequencing adapter domain, to produce a sequencing library.
- the sequencing platform adapter construct(s) may include any of the nucleic acid domains described elsewhere herein (e.g., a domain that specifically binds to a surface-attached sequencing platform oligonucleotide, a sequencing primer binding domain, a barcode domain, a barcode sequencing primer binding domain, a molecular identification domain, or any combination thereof).
- Such embodiments find use, e.g., where nucleic acids of the tagmented sample do not include all of the adapter domains useful or necessary for sequencing in a sequencing platform of interest, and the remaining adapter domains are provided by the primers used for the amplification of the nucleic acids of the tagmented sample.
- the method may be used to capture 5’ ends of RNAs, e.g., where end-capture is facilitated by the presence of a first post-tagmentation amplification primer binding site in the second adapter oligonucleotide and a 3’ second post-tagmentation PCR primer binding site introduced by tagmentation.
- Capturing the 5’ ends of RNAs finds use, e.g., for 5’ end mutation or splice variant analysis, etc.
- Such a pooling step may include combining each first strand cDNA sample (or aliquot thereof) to be pooled into a single container (e.g., a single tube or other container, e.g., well, microfluidic chamber, droplet, nanowell, etc).
- the pooled solid-phase first strand cDNA sample is then tailed and ligated, e.g., as described above.
- individual sequencing reads can be traced back to particular starting RNA samples using the source, e.g., cell barcode, enabling multiplexed sequencing. Details regarding barcode-based multiplexed sequencing are described, e.g., in Wong et al. (2013) Curr. Protoc. Mol. Biol. Chapter 7:Unit 7.11.
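Tracing reads back to their starting samples by barcode can be sketched as a simple lookup. The example below is a minimal illustration under stated assumptions: the barcode is assumed to occupy the first bases of each read and to match exactly, whereas the actual position of the barcode depends on the library design and on the sequencing platform (which may report it in a separate index read). The function name, barcode sequences, and sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using an exact-match barcode at the read start.

    Returns a dict mapping sample name to the list of read inserts (the read
    with its barcode removed). Reads with an unknown barcode are collected
    under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical 4-nt barcodes and reads, for illustration only.
barcodes = {"AACC": "sample_1", "GGTT": "sample_2"}
reads = ["AACCTTGACA", "GGTTCCATGA", "AACCGGGTTT", "TTTTACGTAC"]
assigned = demultiplex(reads, barcodes, barcode_length=4)
print(assigned)
```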
- the methods of the present disclosure further include subjecting the sequencing library to a sequencing protocol.
- the protocol may be carried out on any suitable sequencing platform.
- Sequencing platforms of interest include, but are not limited to, a sequencing platform provided by Illumina® (e.g., the HiSeq, MiSeq, NextSeq, NovaSeq sequencing systems); Pacific Biosciences (e.g., the PACBIO RS II Sequel sequencing system); or any other sequencing platform of interest.
- the sequencing protocol will vary depending on the particular sequencing system employed. Detailed protocols for sequencing a library, e.g., which may include further amplification (e.g., solid phase amplification), sequencing the amplicon, and analyzing the sequencing data are available from the manufacturer of the sequencing system employed.
- the subject methods may be used to generate sequencing libraries corresponding to mRNAs for downstream sequencing on a sequencing platform of interest.
- the subject methods may be used to generate a sequencing library corresponding to non-polyadenylated RNAs for downstream sequencing on a sequencing platform of interest.
- microRNAs may be polyadenylated and then used as templates for reverse transcription followed by tailing and ligation of cDNA described elsewhere herein. Random or gene-specific priming may also be used, depending on the goal of the researcher.
- the library may be mixed 50:50 with a control library (e.g., Illumina’s PhiX control library) and sequenced on the sequencing platform (e.g., an Illumina® sequencing system).
- the control library sequences may be removed and the remaining sequences mapped to the transcriptome of the source of the mRNAs (e.g., human, mouse, or any other mRNA source).
- processors suitable for the execution of computer programs include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
- a processor will receive instructions and data from a read-only memory or a random-access memory, or both.
- the essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including, by way of example, semiconductor memory devices (e.g., EPROM, EEPROM, solid state drive (SSD), and flash memory devices); magnetic disks (e.g., internal hard disks or removable disks); magneto-optical disks; and optical disks (e.g., CD and DVD disks).
- the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- a reference data set can be stored locally within the computer, and the computer accesses the reference data set within the CPU for comparison purposes.
- Examples of communication networks include, but are not limited to, cell networks (e.g., 3G, 4G or 5G), a local area network (LAN), and a wide area network (WAN), e.g., the Internet.
- the subject matter described herein can be implemented as one or more computer program products, such as one or more computer programs tangibly embodied in an information carrier (e.g., in a non-transitory computer-readable medium) for execution by, or to control the operation of, a data processing apparatus (e.g., a programmable processor, a computer, or multiple computers).
- a computer program also known as a program, software, software application, app, macro, or code
- Systems and methods of the invention can include instructions written in any suitable programming language known in the art, including, without limitation, C, C++, Perl, Java, ActiveX, HTML5, Visual Basic, or JavaScript.
- Suitable computing devices typically include mass memory, at least one graphical user interface, at least one display device, and typically include communication between devices.
- the mass memory illustrates a type of computer-readable media, namely computer storage media.
- Computer storage media may include volatile, non-volatile, removable, and nonremovable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory, or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, Radiofrequency Identification (RFID) tags or chips, or any other medium that can be used to store the desired information, and which can be accessed by a computing device.
- Functions described herein can be implemented using software, hardware, firmware, hardwiring, or combinations of any of these. Any of the software can be physically located at various positions, including being distributed such that portions of the functions are implemented at different physical locations.
- compositions of embodiments of the invention may include, e.g., one or more of any of the reaction mixture components described above with respect to the subject methods.
- the compositions may include one or more of an RNA (e.g., a control RNA), a first adapter oligonucleotide in some instances, a polymerase (e.g., a reverse transcriptase, or the like), a first-strand cDNA synthesis primer having any of the domains described above, a second adapter oligonucleotide having any of the domains described above, dNTPs, NTPs, a terminal transferase, an RNA ligase, a second strand cDNA primer having any of the domains described above, amplification primers having any of the domains described above, a salt, a metal cofactor, one or more nuclease inhibitors (e.g., an RNase inhibitor), one or more enzyme-stabilizing components (e
- the tubes and/or plates in which the composition is present provide for efficient heat transfer to the composition (e.g., when placed in a heat block, water bath, thermocouples, and/or the like), so that the temperature of the composition may be altered within a short period of time, e.g., as necessary for a particular enzymatic reaction to occur.
- the composition is present in a thin-walled polypropylene tube, or a plate having thin-walled polypropylene wells or materials such as aluminum having high heat conductance.
- the compositions of the disclosure may be present in droplets.
- the first strand cDNA synthesis primer may be attached to the solid support or bead by methods known in the art - such as biotin linkage or by covalent linkage - and reaction allowed to proceed on the support.
- the oligos may be synthesized directly on the solid support (e.g., as described in Macosko, E Z et al., Cell 161, 1202-1214, May 21, 2015).
- compositions include, e.g., a microfluidic chip (e.g., a “lab-on-a-chip device”, e.g., a microfluidic device comprising channels and inlets).
- the composition may be present in an instrument configured to bring the composition to a desired temperature, e.g., a temperature-controlled water bath, heat block, heat block adaptor, or the like.
- the instrument configured to bring the composition to a desired temperature may be configured to bring the composition to a series of different desired temperatures, each for a suitable period of time (e.g., the instrument may be a thermocycler).
- kits may include, e.g., one or more of any of the reaction mixture components described above with respect to the subject methods.
- the kits may include: a first strand cDNA synthesis primer including a 3’ oligo(dT) domain and a 5’ amplification primer binding domain; a second adapter oligonucleotide including an amplification primer binding domain, e.g., as described above.
- kits may include a first adapter oligonucleotide including a 5’ amplification primer binding domain and a 3’ polyA domain, a first strand cDNA synthesis primer including an oligo dT domain, and a second adapter oligonucleotide including an amplification primer binding domain, as described above.
- kits may further include amplification primers which may include any of the domains/features described above in the section relating to the methods of the present disclosure.
- kits may further include one or more of a template ribonucleic acid (RNA), components for producing a template RNA from a precursor RNA (e.g., a poly(A) polymerase and associated reagents for polyadenylating a non-polyadenylated precursor RNA), components for purifying RNA-protein complexes of interest, a polymerase (e.g., a reverse transcriptase), a terminal transferase, an RNA ligase (e.g., T4 RNA ligase I), dNTPs, NTPs, a salt, a metal cofactor, one or more nuclease inhibitors (e.g., an RNase inhibitor and/or a DNase inhibitor), one or more molecular crowding agents (e.g., polyethylene glycol, or the like), one or more enzyme-stabilizing components (e.g., DTT), or any other desired kit component(s), such as solid supports, e.g., tubes
- kits may be present in separate containers, or multiple components may be present in a single container. In certain embodiments, it may be convenient to provide the components in a lyophilized form, so that they are ready to use and can be stored conveniently at room temperature.
- a subject kit may further include instructions for using the components of the kit, e.g., to practice the subject method.
- the instructions are generally recorded on a suitable recording medium.
- the instructions may be printed on a substrate, such as paper or plastic, etc.
- the instructions may be present in the kits as a package insert, in the labelling of the container of the kit or components thereof (i.e., associated with the packaging or sub-packaging) etc.
- the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, diskette, Hard Disk Drive (HDD), portable flash drive, etc.
- the actual instructions are not present in the kit, but means for obtaining the instructions from the remote source, e.g., via the internet, are provided.
- An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
- the methods of the present disclosure find use in a variety of applications, including those that require the presence of particular nucleotide sequences at both ends of nucleic acids of interest.
- Such applications exist in the areas of basic research and diagnostics (e.g., clinical diagnostics) and include, but are not limited to, the generation of sequencing libraries.
- Such libraries may include adapter sequences that enable sequencing of the library members using any convenient sequencing platform, including: the HiSeq, MiSeq, NextSeq and NovaSeq sequencing systems from Illumina®, the PACBIO RS II Sequel sequencing system from Pacific Biosciences, the MinION, GridION or PromethION from Oxford Nanopore Technologies, or any other convenient sequencing platform.
- the methods of the present disclosure find use in generating sequencing libraries corresponding to any RNA starting material of interest (e.g., mRNA) and are not limited to polyadenylated RNAs.
- the subject methods may be used to generate sequencing libraries from non-polyadenylated RNAs, including microRNAs, small RNAs, siRNAs, and/or any other type of non-polyadenylated RNAs of interest such as ribosome-associated mRNAs or RNA fragments associated with an RNA-binding protein of interest that were purified with appropriate methods (e.g., CLIP).
- the methods also find use in generating strand-specific information, which can be helpful in determining allele-specific expression or in distinguishing overlapping transcripts in the genome.
- An aspect of the subject methods is that - utilizing a template RNA - a cDNA species having sequencing platform adapter sequences at one or both of its ends is generated, by employing tailing and ligation of first strand cDNAs (TLC) that improves on traditional approaches for generating chimeric nucleic acid molecules and provides an alternative strategy to generate full-length cDNA, preserving the original 5’ end of the template RNA molecule.
- TSOs Template Switch Oligos
- The methods of the present disclosure (TLC) also rely on the incorporation of a short stretch of non-template nucleotides to the 3’ end of first-strand cDNA molecules, but differ in a number of important aspects: i) TLC incorporates ribonucleotides instead of deoxyribonucleotides to mimic the 3’ end of an RNA molecule for subsequent ligation reaction; ii) the non-template overhang is used as a ligation acceptor instead of anchoring sites for the TSO and greatly increases ligation efficiency; iii) TLC uncouples the terminal transferase reaction from reverse transcription, giving higher flexibility in RT conditions such as higher reaction temperatures that are beneficial for long and/or structured molecules; iv) TLC is not dependent on the presence of a 5’ cap structure of the template RNA as observed for TSOs (Wulf, M G et al, J Biol Chem, 294, 18220-18231, 2019), making it less restrictive in terms of the RNA molecules
- Uncoupling of the tailing reaction from the reverse transcription reaction as described in point (iii) and 5’-cap independence in point (iv) are crucial for the applicability of TLC to generate sequencing libraries from a more varied source of input materials.
- uncapped RNA molecules such as specific small RNA species, viral RNA, or RNA fragments obtained after purification of modified RNA or RNA-protein complexes following cross-linking and immunoprecipitation (e.g., CLIP), in which case reverse transcription frequently terminates prematurely at crosslinking sites, preventing the addition of
- the methods of the present disclosure instead employ an enzymatic ligation-based approach for the generation of sequencing libraries from RNA that is fully compatible with standard enzymes in conventional library workflows without extensive purification procedures which minimizes sample loss and lowers input requirements to as few as 500-1000 cells (e.g., equivalent to 5-20 ng of total RNA assuming a concentration of 10-20 pg of RNA/cell).
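The input figures quoted above follow from simple unit arithmetic. The sketch below reproduces it in Python; the function name is hypothetical, and the per-cell RNA content of 10-20 pg is the assumption stated in the text rather than a measured value.

```python
def total_rna_ng(cells: int, pg_per_cell: float) -> float:
    """Estimate total RNA in nanograms for a given cell count.

    pg_per_cell is an assumed average RNA content per cell (picograms);
    1 ng equals 1000 pg.
    """
    return cells * pg_per_cell / 1000.0


# Lower and upper bounds of the range quoted in the text.
low_ng = total_rna_ng(500, 10)     # 500 cells at 10 pg/cell
high_ng = total_rna_ng(1000, 20)   # 1000 cells at 20 pg/cell
print(low_ng, high_ng)
```

Under these assumptions the two bounds come out at 5 ng and 20 ng of total RNA.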
- RNA-binding proteins are instrumental for post-transcriptional gene regulation and play an active part in numerous human pathologies, including neurodegenerative diseases, cancer, as well as infection. Despite their crucial role in regulating all aspects of RNA metabolism, transcriptome-wide methods to profile RNA-protein interactions remain technically challenging. Protein-centric approaches to study RNA-protein interactions mainly rely on cross-linking and immunoprecipitation (CLIP) of RNA-binding proteins (RBPs) and generation of sequencing libraries from co-purified RNA.
- RNA is then 3’ adapter ligated prior to SDS polyacrylamide gel electrophoresis (SDS-PAGE) and transfer onto nitrocellulose from where RNA is liberated, purified and reverse transcribed into cDNA prior to second adapter ligation and PCR amplification to generate sequencing-compatible libraries.
- Major bottlenecks, particularly during library preparation, include extensive purification steps and suboptimal enzymatic reactions such as the second adapter ligation, which lead to sample loss, low complexity libraries and the requirement for large amounts of starting material (~20M cells) and sequencing depth.
- When combined with CLIP, TLC follows the procedure outlined in FIG. 5 (Steps 2 - 9), with RNA precursors resulting from nuclease digestion (Step 2) following the purification of an RNA-binding protein of interest (not pictured). 3’ ends of RNA precursors of interest are then ligated to the first adapter oligonucleotide, containing a primer binding domain (PBS) and a polyA stretch (Step 3). Ligated RNA molecules are then captured on oligo(dT) beads and reverse transcribed into first strand cDNA, with the oligo(dT) serving as first strand cDNA synthesis primer (Step 4).
- Solid-phase cDNA is then separated from template RNA through heat denaturation and used as acceptor molecule for a subsequent ligation reaction (Steps 5-7).
- a tailing strategy is used that results in the addition of a few (e.g., 1-3) (Figure 12) non-template ribonucleotides (e.g., ATP) at the 3’ end of solid-phase first strand cDNA (Step 6).
- CLIP libraries prepared with TLC showed up to 68% overlap with eCLIP peaks, when restricting the comparison to genes with similar expression levels between 293T and HepG2 cells to account for underlying gene expression differences in the cell types that were profiled (Figure 13).
- CLIP libraries prepared with TLC also show improved sensitivity for de novo motif discovery and recapitulate previously reported motifs with high precision and stronger motif enrichment at the peak summit compared to eCLIP libraries (Figure 14 and Figure 15).
- the streamlined TLC library preparation protocol drastically reduces both time and cost of CLIP experiments, while generating high quality RBP binding profiles from low input material.
- the larger number of crosslink-induced deletions further improves both the precision and specificity of CLIP libraries generated with TLC, by providing nucleotide resolution of crosslinking sites and distinguishing true binding sites from copurifying, non-crosslinked fragments.
- input requirements can be further reduced with high quality data obtained from as little as 500 cells, presenting a fully bead-based, single-tube library preparation strategy amenable to automation for high-throughput settings.
- TLC design innovations compared to other protocols include:
- RNA purification via poly(A) capture: introduction of the poly(A) tail during adapter ligation allows capture and purification of RNA molecules within minutes using oligo(dT)-coupled magnetic beads instead of overnight precipitation. Furthermore, this strategy makes purification of RNA-protein complexes via SDS-PAGE optional, thus opening the potential for automation of the entire protocol on a liquid handling system.
- Solid-phase cDNA libraries: oligo(dT)-bead based capture is not only used for purification of RNA, but also for priming reverse transcription, resulting in cDNA covalently linked to magnetic beads. This allows efficient separation of adapter-ligated RNA from first strand cDNA via heat denaturation and facilitates purification and downstream reactions that can be performed on-bead in the same reaction tube.
- ssDNA ligations are inherently inefficient due to the low affinity of RNA ligases for DNA as an acceptor molecule. This causes the permanent loss of molecules that fail to ligate, resulting in low complexity libraries.
- a Terminal Transferase is included in the reaction which incorporates ATP (essential for ligation reaction) in the form of a short ribo-tail at the 3’ end of the cDNA, greatly increasing the affinity, and thus efficiency, of T4 RNA Ligase for the substrates.
- Antibody-bead mixture was washed twice in iCLIP lysis buffer to remove unbound antibody and RNAse-treated cell lysates were added alongside cOmplete EDTA-free Protease Inhibitor Cocktail (Merck, #11836170001) and incubated for 2 hours at 4°C on a rotating wheel.
- beads were washed twice in 200 µl High Salt Buffer (50 mM Tris-HCl pH 7.4, 1 M NaCl, 1 mM EDTA, 1% Igepal CA-630, 0.1% SDS, 0.5% sodium-deoxycholate), with the second wash at 4°C for 3 minutes on a rotating wheel, followed by two washes in 200 µl PNK Wash Buffer (20 mM Tris-HCl, pH 7.4, 10 mM MgCl2, 0.2% Tween-20).
- Nitrocellulose membranes were scanned on Odyssey® CLx Infrared Imager (LI-COR, 9141) with 169 µm resolution to visualise RNA localisation and then placed on filter paper soaked in PBS. Regions of interest were cut out from nitrocellulose membrane corresponding to ~20-100 kDa above the molecular weight of the RBP of interest due to the ligation of L3 adapter (~15.9 kDa) and associated RNA (with 70 nt of RNA averaging ~20 kDa).
- Nitrocellulose pieces were placed in LoBind Eppendorf tubes and 200 µl Proteinase K buffer (100 mM Tris-HCl, pH 7.4, 50 mM LiCl, 1 mM EDTA, 0.2% LiDS) containing 200 µg Proteinase K (Thermo Fisher, #AM2546) were added and incubated at 50°C for 45 minutes at 800 rpm.
- beads were washed twice in 125 µl oligo(dT) Wash Buffer (10 mM Tris-HCl, pH 7.4, 150 mM LiCl, 0.1 mM EDTA) and once in 20 µl 1X First-Strand Buffer (50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2).
- Solid-phase cDNA on beads was washed once in 60 µl oligo(dT) Wash Buffer and once in 20 µl 1X T4 RNA Ligase Buffer (50 mM Tris-HCl, 10 mM MgCl2, 1 mM DTT, pH 7.5). Beads were resuspended in 5 µl of 5’ Adapter mix (2 µl 10X T4 RNA Ligase Buffer, 2 µl of 10 µM L## oligo (see Table 1), 1 µl 100% DMSO), incubated at 75°C for 2 minutes then immediately placed on ice.
- Size-selection of cDNA was performed using ProNEX® Size-Selective Purification System (Promega, #NG2002) with a ratio of 2.8X to enrich for cDNA inserts of at least 20 nucleotides in length (>80bp).
- Library yield was then estimated by amplifying 1 µl of purified cDNA via qPCR using the full length P5 and P7 index primers, and 2-3 cycles were subtracted from the obtained Ct value to set the cycle number for final library amplification.
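The cycle-number rule described for final library amplification can be written as a one-line calculation. The following Python sketch is illustrative only: the function name is hypothetical, and rounding the Ct value to the nearest whole cycle and clamping at zero are assumptions not stated in the text.

```python
def final_pcr_cycles(ct: float, subtract: int = 3) -> int:
    """Return the cycle number for final amplification from a qPCR Ct value.

    The text subtracts 2-3 cycles from the Ct; rounding the Ct first is an
    assumption made here so that the result is a whole number of cycles.
    """
    if subtract not in (2, 3):
        raise ValueError("the text describes subtracting 2 or 3 cycles")
    return max(round(ct) - subtract, 0)


print(final_pcr_cycles(15.2, subtract=3))
print(final_pcr_cycles(15.2, subtract=2))
```

For a Ct of 15.2, this gives 12 or 13 cycles depending on whether 3 or 2 cycles are subtracted.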
- libraries were size-selected again using the ProNEX® Size-Selective Purification System, with a ratio of 1.8X to select fragments larger than 165bp.
- Quality control was performed using the Agilent High Sensitivity DNA Kit (Agilent, #5067-4626) and libraries were quantified using the KAPA Library Quantification Kit (Roche, #KK4824).
- the first adapter ligation was performed for 75 minutes at 25°C. Beads were washed as described above and either directly resuspended in Proteinase K reaction or in 20 µl of RecJ adapter removal reaction (1X NEB Buffer 2 (NEB, #B7002S), 25 U 5’ Deadenylase (NEB, #M0331S), 30 U RecJ endonuclease (NEB, #M0264S), 10 U SUPERase•In and 20% PEG-400) and incubated at 37°C for 30 minutes prior to Proteinase K treatment. Samples were then placed on magnet, and supernatant was transferred to fresh tubes containing oligo(dT) beads, with the remaining library preparation performed as described above.
- TLC-CLIP libraries were sequenced on an Illumina NextSeq500 using the High Output Kit v.2.5 for 75 cycles, using Illumina protocol #15048776. 5% PhiX were added to final library pools for increased complexity and sequencing run was performed with custom configuration, running 86 cycles for Read 1 and 6 index cycles.
- Sequencing data was demultiplexed by i7 index reads using bcl2fastq without any read trimming. Further demultiplexing by in-read 5’ barcodes and trimming of adapter sequences was performed using Flexbar v.3.5.0 (https://github.com/seqan/flexbar)19 in a two-step approach. In the first step, reads are demultiplexed by in-read barcodes allowing no mismatches, and UMIs are moved into the read header. Barcode sequences (see Table 1) including the UMI designated by the wildcard character ’N’ are provided in fasta format, with the arguments “-b barcodes.
- Flexbar-trimmed reads were aligned against hg19 using STAR v.2.7.3a (https://github.com/alexdobin/STAR)20 with the following parameters to keep only uniquely mapping reads, removing the penalty for opening deletions and insertions and fully extending the 5-prime end of reads to preserve the end of cDNA molecules: “--outFilterMultimapNmax 1 --scoreDelOpen 0 --scoreInsOpen 0 --alignEndsType Extend5pOfRead1”. To retain the UMI in the read header during STAR alignment, any space in the header needs to be removed prior to mapping.
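The header clean-up mentioned above (removing spaces so that the UMI stays attached to the read name through alignment) could be done in many ways; the source does not say which tool was used. A minimal Python sketch, assuming standard four-line FASTQ records and an underscore as replacement separator (both assumptions, as is the function name):

```python
def strip_header_spaces(fastq_lines, sep="_"):
    """Replace whitespace in FASTQ header lines with `sep`.

    Assumes well-formed 4-line FASTQ records. Headers are located by line
    index rather than by a leading '@', because quality strings may also
    start with '@'.
    """
    cleaned = []
    for index, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if index % 4 == 0:  # first line of each record is the header
            line = sep.join(line.split())
        cleaned.append(line)
    return cleaned


record = ["@read1 1:N:0 ACGTAC", "TTGACC", "+", "@IIIII"]
print(strip_header_spaces(record))
```

Which separator and field order the downstream tools expect depends on how the UMI was written into the header, so `sep` should be chosen to match that setup.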
- Aligned reads were deduplicated based on unique molecular identifiers using UMI-tools v.1.0.1 (https://github.com/CGATOxford/UMI-tools)21.
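As an illustration of position-plus-UMI deduplication, the Python sketch below collapses reads that share chromosome, strand, start position and an identical UMI, while treating a spliced and an unspliced read at the same start as distinct. It is a simplified stand-in, not the UMI-tools implementation: the `Read` type and its fields are hypothetical, and keeping the first read of each group is an assumption.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    strand: str
    start: int      # mapping position of the read start
    umi: str
    spliced: bool   # True if the alignment contains a splice junction


def deduplicate(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (chrom, strand, start, UMI, spliced) group."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.strand, read.start, read.umi, read.spliced)
        if key not in seen:  # first read of the group is retained
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    Read("a", "chr1", "+", 100, "ACGTA", False),
    Read("b", "chr1", "+", 100, "ACGTA", False),  # duplicate of "a"
    Read("c", "chr1", "+", 100, "TTGCA", False),  # different UMI
    Read("d", "chr1", "+", 100, "ACGTA", True),   # spliced, kept separately
]
print([r.name for r in deduplicate(reads)])
```

Only exact UMI matches are collapsed here, so sequencing errors within a UMI would leave some duplicates in place.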
- the dedup command was used with the parameters “--extract-umi-method read_id --method unique --spliced-is-unique” to group reads with the same mapping position and identical UMI, while treating reads starting at the same position as unique if one is spliced and the other is not.
Peak Calling
- Enriched regions were identified using the peak calling algorithm CLIPper v.2.0.0 (https://github.com/YeoLab/clipper)7,16 with default settings and a p-value cutoff of 0.001 (“--poisson-cutoff 0.01”).
- De novo motif discovery was performed using Homer27 v4.10 on peaks centred on either the apex region obtained from CLIPper or after centring peaks on the position with the highest deletion count.
- findMotifsGenome.pl was used with the parameters “-oligo -basic -rna -len 5 -S 10 -size given” where peak size is a 50-nucleotide window around the apex or with parameter “-size 50” for peaks centred on deletions.
- Intronic antisense Alu sequences were extracted from Repeatmasker and intersected with deletion-centred peaks with a CID ratio larger than 10 from PAGE or noPAGE libraries, yielding splice sites that were either shared between experimental conditions or specific to either PAGE or noPAGE libraries.
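Centring a peak on the position with the highest deletion count, as described above for motif discovery, can be sketched as follows. This Python example is an assumption-laden illustration: the function name, the per-position count list, the 0-based half-open coordinates and the choice of the first position in case of ties are not taken from the source.

```python
def centre_on_max_deletion(chrom, peak_start, deletion_counts, window=50):
    """Return a (chrom, start, end) window centred on the deletion maximum.

    deletion_counts[i] is the deletion count at position peak_start + i.
    Coordinates are 0-based, half-open; ties resolve to the first maximum.
    """
    if not deletion_counts:
        raise ValueError("peak has no positions")
    best_offset = max(range(len(deletion_counts)),
                      key=lambda i: deletion_counts[i])
    centre = peak_start + best_offset
    half = window // 2
    return chrom, max(centre - half, 0), centre + half


print(centre_on_max_deletion("chr1", 1000, [0, 2, 7, 1]))
```

In this example the maximum lies at position 1002, so the 50-nucleotide window spans 977 to 1027.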
- N stands for any nucleotide
- n designates a phosphorothioated DNA base (nucleotide).
- PS phosphorothioate
- TLC follows the procedure outlined in FIG. 5 (Steps 2 - 9), with RNA precursors resulting from chemical fragmentation (Step 2) followed by purification of RNA fragments carrying a modification of interest through affinity purification (not pictured).
Abstract
The present invention provides a method for preparing a sequencing library from a ribonucleic acid (RNA) sample.
Description
CONSTRUCTION OF SEQUENCING LIBRARIES FROM A RIBONUCLEIC ACID (RNA) USING TAILING AND LIGATION OF cDNA (TLC)
FIELD OF THE INVENTION
The invention provides a method for preparing a sequencing library from a ribonucleic acid (RNA) sample.
BACKGROUND OF THE INVENTION
Massively parallel (or “next generation”) and long-read sequencing platforms are rapidly transforming data collection and analysis in genome, epigenome, transcriptome and epitranscriptome research. All current short-read sequencing (SRS) platforms, such as those marketed by Illumina®, Ion Torrent™, Roche™, Life Technologies™, as well as long-read sequencing (LRS) platforms from Pacific Biosciences and Oxford Nanopore Technology (ONT) require the addition of known adapter sequences to each end of a target polynucleotide.
When constructing high-throughput sequencing libraries from a ribonucleic acid (RNA), the generation of double-stranded cDNA is a crucial and often limiting step of available technologies. The most common strategies involve random priming of the second strand relying on the RNaseH activity of Reverse Transcriptases (RTs) resulting in short RNA fragments hybridized to the first-strand cDNA, that can prime second-strand synthesis. However, the location from which second-strand synthesis is initiated is not controlled in this approach, leading to cDNA molecules of variable length. In many instances preservation of, and priming from, the original 3’ end of the first-strand cDNA molecule is desirable to preserve the full length of the original target RNA molecule, including: i) increased transcript coverage during RNA-Seq applications, including single-cell approaches; ii) profiling of 5’ ends of RNA molecules to identify transcription start sites (e.g., rapid amplification of cDNA ends (RACE-Seq)); iii) full-length cDNA sequencing using long-read sequencing platforms; iv) profiling of short RNA species; v) ribosome footprinting; and vi) characterization of short RNA fragments co-purified with RNA-binding proteins or enriched for RNA modifications of interest following UVC crosslinking, in which case the RT termination site needs to be preserved to obtain information regarding the exact crosslinking position.
Current approaches to initiate second-strand synthesis from the 3’ end of cDNA molecules rely on template switch oligos (TSOs) for applications related to i)-iii), adapter ligation to both ends of the RNA target molecule for applications related to iv); and single-stranded DNA (ssDNA) ligation or circularization of first-strand cDNA molecules for applications related to v)-vi).
The main limitations of TSOs are a bias towards capped RNA molecules; a restriction in possible RT conditions (e.g., limitation to low RT temperatures, which are not ideal for structured templates and processivity for long templates); the potential for intramolecular priming leading to truncated cDNA molecules; and high levels of contamination with adapter concatemers.
Limitations of ssDNA ligation and circularization of cDNA are mainly related to inefficient enzymatic reactions leading to permanent loss of cDNA molecules from the library pool and the large number of steps involved including extensive purifications to avoid unwanted amplification of unligated adapters, which otherwise can block subsequent amplification reactions through hybridisation with amplification primers.
There is a need for new methods of preparing a sequencing library that represents the full-length of the original RNA template for sequence analysis.
SUMMARY OF THE INVENTION
An aspect of the present invention provides a method for preparing a sequencing library from a ribonucleic acid (RNA) sample, the method comprising:
(a) obtaining a test sample comprising a plurality of template RNA or RNA precursors,
(b) providing a set of oligonucleotide adapters and primers, the set comprising a plurality of first adapters, a plurality of first strand cDNA synthesis primers, a plurality of second adapters, and a plurality of amplification primers, wherein each of the first adapters comprises (i) a 5’ primer binding domain and a 3’ poly A domain or (ii) a 5’ poly A domain and a 3’ primer binding domain; each of the first strand cDNA synthesis primers comprises an RNA hybridization domain complementary to the template RNA or to the first adapters (e.g., oligo(dT)), and said each of the first strand cDNA synthesis primers is covalently linked to magnetic beads; each of the second adapters comprises primer binding sites, and each of the amplification primers comprises sequencing platform adapter constructs;
(c) ligating the plurality of first adapters to the 3' end of the plurality of RNA precursors to generate template RNA;
(d) generating a plurality of solid-phase first strand cDNA through reverse transcription primed by the first strand cDNA synthesis primer starting from the plurality of template RNA of step (c) or of step (a);
(e) separating the solid-phase first strand cDNA from the plurality of template RNA;
(f) tailing the 3’ ends of the plurality of solid-phase first strand cDNA with non-template ribonucleotides;
(g) ligating the plurality of second adapters to 3' end of the plurality of solid-phase first strand cDNA;
(h) amplifying the plurality of solid-phase cDNA with amplification primers to generate a plurality of double stranded cDNA that are processed into a sequencing library through addition of sequencing platform adapter constructs.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 shows schematic representation of tailing and ligation of cDNA (TLC) strategy from full-length polyadenylated mRNA to obtain and amplify full-length cDNA. The first strand cDNA synthesis primer consists of a 3’ RNA hybridization domain (RHD) (e.g., oligo(dT)) and a 5’ primer binding site (PBS) containing any desirable sequence.
Figure 2 shows schematic representation of tailing and ligation of cDNA (TLC) strategy from full-length polyadenylated mRNA to obtain and amplify full-length cDNA, followed by tagmentation to generate fragments compatible with short-read sequencing platforms (a). The first strand cDNA synthesis primer consists of a 3’ RNA hybridization domain (RHD) (e.g., oligo(dT)), an adjacent post-tagmentation amplification primer binding domain and a 5’ pre-tagmentation amplification primer binding domain containing any desirable sequence. Variations in primer binding domains provided during tagmentation enable specific 5’ end-capture (b), or 3’ end-capture (c). TnRP1 and TnRP2 represent post-tagmentation amplification primer binding domains.
Figure 3 shows schematic representation of tailing and ligation of cDNA (TLC) strategy from fragmented RNA (a) or short RNA species (b), employing a random priming approach during reverse transcription (step 3). The first strand cDNA synthesis primer consists of a 3’ RNA hybridization domain (RHD) (e.g., a randomer depicted as N) and a 5’ primer binding site (PBS) containing any desirable sequence.
Figure 4 shows schematic representation of tailing and ligation of cDNA (TLC) strategy from fragmented RNA (a) or short RNA species (b), employing polyadenylation of the precursor RNA prior to reverse transcription (step 3) to allow priming from the 3’ end of template RNA using oligo(dT) reverse transcription primers. The first strand cDNA synthesis primer consists of a 3’ RNA hybridization domain (RHD) (e.g., oligo(dT)) and a 5’ primer binding site (PBS) containing any desirable sequence.
Figure 5 shows schematic representation of tailing and ligation of cDNA (TLC) strategy from fragmented RNA (a) or short RNA species (b), employing a ligation approach prior to reverse transcription (step 3), which adds a first adapter oligonucleotide to the 3’ end of the RNA precursor, introducing primer binding domains and hybridisation domains (e.g., a poly A stretch). Reverse transcription can then be initiated using first strand cDNA synthesis primers that contain a complementary sequence to the first adapter oligonucleotide (e.g., oligo(dT)).
Figure 6 shows a timecourse of capturing a poly-adenylated first adapter oligonucleotide on oligo(dT)25 Dynabeads.
Figure 7 shows denaturing 10% TBE-Urea PAGE of tailing reaction performed with different Terminal Transferases in the presence of ATP or dATP showing the addition of only a short ribotail in the presence of ribonucleotides.
Figure 8 shows denaturing 10% TBE-Urea PAGE of mock ligation, testing different experimental conditions. TLC ligation conditions are highlighted in black rectangle, eCLIP ligation conditions are highlighted in grey rectangle.
Figure 9 shows agarose gels of optimisation reactions to amplify cDNA from long template RNA. Efficiency to generate molecules of 4kb and 8kb length is shown for different reverse transcriptases and different reaction temperatures.
Figure 10 shows the percentage of usable reads out of total read fraction for different public CLIP libraries compared to TLC-CLIP libraries.
Figure 11 shows TLC libraries do not form concatemers in the absence of RNA input.
Figure 12 shows per base sequence content before (top) and after (bottom) homopolymer trimming of 1-2T bases to remove the overrepresentation of T nucleotides resulting from the ribotailing approach.
Figure 13 shows the fraction of overlap between CLIP libraries produced with TLC and public eCLIP libraries, when restricting the comparison to peaks present on genes that have similar expression levels between 293T (TLC) and HepG2 (eCLIP) cells.
Figure 14 shows the result of de novo motif discovery on RBFOX2 peaks, recapitulating the known binding motif and showing a larger fraction of peaks with motifs for CLIP libraries prepared with TLC compared to eCLIP.
Figure 15 shows the motif density at RBFOX2 peaks, comparing CLIP libraries prepared with TLC and eCLIP.
Figure 16 shows the percentage of reads carrying deletions in CLIP libraries prepared with TLC compared with eCLIP.
Figure 17 shows a high correlation of crosslink-induced deletions at single nucleotide resolution between biological replicates.
Figure 18 shows the nucleotide resolution of crosslink-induced deletions when centred on the consensus motif.
Figure 19 shows that crosslink-induced deletions increase the specificity of CLIP libraries prepared with TLC by distinguishing between crosslinked fragments and co-purifying, non-crosslinked fragments.
Figure 20 shows the percentage of peaks that harbour the consensus motif, depending on the ratio of crosslink-induced deletions per peak.
Figure 21 shows that CLIP libraries prepared with TLC capture high-resolution position-dependent enrichment of RBPs from as little as 500 cells.
Figure 22 shows a representative example of the enrichment of modified RNA, in this case N6-methyladenosine (m6A), over input at specific transcript regions.
Figure 23 shows the dependency of deletions on UV crosslinking conditions (A) and their precise location at single-nucleotide positions (B) which allows the detection of the m6A core motif ‘GGAC’ using de novo motif discovery (C).
DETAILED DESCRIPTION OF THE INVENTION
All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. The publications and applications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. In addition, the materials, methods, and examples are illustrative only and are not intended to be limiting.
In the case of conflict, the present specification, including definitions, will control. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which the subject matter herein belongs. As used herein, the following definitions are supplied in order to facilitate the understanding of the present invention.
The term “comprise” is generally used in the sense of include, that is to say permitting the presence of one or more features or components. Also as used in the specification and claims, the language “comprising” can include analogous embodiments described in terms of “consisting of” and/or “consisting essentially of”.
As used in the specification and claims, the singular form “a”, “an” and “the” include plural references unless the context clearly dictates otherwise.
As used in the specification and claims, the term “and/or” used in a phrase such as “A and/or B” herein is intended to include “A and B”, “A or B”, “A”, and “B”.
A domain refers to a stretch of length of nucleic acid made up of a plurality of nucleotides, where the stretch of length provides a defined function to the nucleic acid. Examples of domains include primer binding domains, hybridization domains, barcode domains (such as source barcode domains), unique molecular identifier domains, sequencing adaptor domains, sequencing indexing domains, etc. While the length of a given domain may vary, in some instances the length ranges from 1 to 100 nt, such as 5 to 50 nt.
Amplification primer binding domains are domains that are configured to bind via hybridization to an amplification primer.
Tagmentation involves fragmentation of double-stranded DNA and simultaneous tagging with primer binding domains and is employed in many next generation sequencing protocols. Pre-tagmentation amplification primer binding domains are domains which are configured to bind to pre-tagmentation amplification primers during an amplification that occurs before a tagmentation step, e.g., a cDNA amplification protocol which occurs prior to a tagmentation step. Post-tagmentation amplification primer binding domains are domains which are configured to bind to post-tagmentation amplification primers during an amplification that occurs after a tagmentation step, e.g., a tagmented sample amplification protocol which occurs after a tagmentation step.
A barcode domain is a domain that serves as an identifier of a nucleic acid. Barcode domains may vary, wherein examples include RNA source barcode domains, e.g., cell barcode domains, host barcode domains, etc.; container barcode domains, such as plate or well barcode domains; in-line barcode domains, indexing barcode domains, etc.
Unique Molecular Identifiers are employed in many next generation sequencing applications. Unique Molecular Identifiers (i.e., UMIs) are randomers of varying length, e.g., ranging in length in some instances from 6 to 12 nts, that can be used for counting of individual molecules of a given molecular species. Counting is achieved by attaching UMIs from a diverse pool of UMIs to individual molecules of a target of interest such that each individual molecule receives a unique UMI. By counting individual transcript molecules, PCR bias can be reduced during NGS library preparation and a more quantitative understanding of the sample population can be achieved. See e.g., U.S. Pat. No. 8,835,358; Fu et al., “Molecular Indexing Enables Quantitative Targeted RNA Sequencing and Reveals Poor Efficiencies in Standard Library Preparations,” PNAS (2014) 5: 1891-1896 and Fu et al., “Digital Encoding of Cellular mRNAs Enabling Precise and Absolute Gene Expression Measurement by Single-Molecule Counting,” Anal. Chem (2014)86:2867-2870.
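By way of non-limiting illustration, the counting principle described above can be sketched in a few lines of Python. The sketch assumes that each sequencing read has already been assigned to a target and that its UMI has been extracted; the function name, the target names and the UMI sequences are hypothetical, and sequencing errors within UMIs are not handled.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per target by collapsing reads that share a UMI.

    `reads` is an iterable of (target, umi) pairs. Reads with the same target
    and the same UMI are treated as PCR copies of one original molecule.
    """
    umis_per_target = defaultdict(set)
    for target, umi in reads:
        umis_per_target[target].add(umi)
    return {target: len(umis) for target, umis in umis_per_target.items()}


# Hypothetical example data: four reads, one of which is a PCR duplicate.
reads = [
    ("geneA", "ACGTAC"),
    ("geneA", "ACGTAC"),  # same target and UMI as the previous read
    ("geneA", "TTGACA"),
    ("geneB", "GGCATA"),
]
counts = count_molecules(reads)
print(counts)  # {'geneA': 2, 'geneB': 1}
```

In this sketch the duplicated read contributes only once to the count for its target, which is how UMIs reduce the influence of PCR bias on quantification.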
The term “complementary” as used herein refers to a nucleotide sequence that basepairs by non-covalent bonds to all or a region of a target nucleic acid (e.g., a template RNA or other region of the double stranded product nucleic acid). In the canonical Watson-Crick basepairing, adenine (A) forms a basepair with thymine (T), as does guanine (G) with cytosine (C) in DNA. In RNA, thymine is replaced by uracil (U). As such, A is complementary to T and G is complementary to C. In RNA, A is complementary to U and vice versa. Typically, “complementary” refers to a nucleotide sequence that is at least partially complementary. The term “complementary” may also encompass duplexes that are fully complementary such that every nucleotide in one strand is complementary to every nucleotide in the other strand in corresponding positions. In certain cases, a nucleotide sequence may be partially complementary to a target, in which not all nucleotides are complementary to every nucleotide in the target nucleic acid in all the corresponding positions. For example, a primer may be perfectly (i.e., 100%) complementary to the target nucleic acid, or the primer and the target nucleic acid may share some degree of complementarity which is less than perfect (e.g., 70%, 75%, 85%, 90%, 95%, 99%). The percent identity of two nucleotide sequences can be determined by aligning the sequences for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first sequence for optimal alignment). The nucleotides at corresponding positions are then compared, and the percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = # of identical positions / total # of positions x 100). When a position in one sequence is occupied by the same nucleotide at the corresponding position in the other sequence, then the molecules are identical at that position.
A non-limiting example of such a mathematical algorithm is described in Karlin et al., Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993). Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0) as described in Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., NBLAST) can be used. In one aspect, parameters for sequence comparison can be set at score=100, wordlength=12, or can be varied (e.g., wordlength=5 or wordlength=20).
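As a minimal, non-limiting sketch of the percent identity calculation defined above, the following Python function compares two sequences that have already been aligned to equal length. The function name, the use of “-” as the gap symbol, and the example sequences are assumptions made for illustration; the alignment step itself (e.g., by BLAST) is not reproduced here.

```python
def percent_identity(aligned_1, aligned_2, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Implements: identical positions / total positions x 100.
    A position holding a gap symbol is never counted as identical.
    """
    if len(aligned_1) != len(aligned_2):
        raise ValueError("sequences must be aligned to the same length")
    total = len(aligned_1)
    if total == 0:
        raise ValueError("sequences must not be empty")
    identical = sum(
        1 for a, b in zip(aligned_1, aligned_2) if a == b and a != gap
    )
    return identical / total * 100


# Hypothetical aligned sequences differing at one of eight positions.
identity = percent_identity("ACGTACGT", "ACGTTCGT")
print(identity)  # 87.5
```

Whether gapped positions are included in the total number of positions varies between tools; this sketch counts them in the total, which is one possible reading of the definition above.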
As used herein, the term “hybridization conditions” means conditions in which a primer specifically hybridizes to a region of the target nucleic acid (e.g., a template RNA or other region of the double stranded product nucleic acid). Whether a primer specifically hybridizes to a target nucleic acid is determined by such factors as the degree of complementarity between the primer and the target nucleic acid and the temperature at which the hybridization occurs, which may be informed by the melting temperature (Tm) of the primer. The melting temperature refers to the temperature at which half of the primer-target nucleic acid duplexes remain hybridized and half of the duplexes dissociate into single strands. The Tm of a duplex may be experimentally determined or predicted using the following formula: Tm = 81.5 + 16.6 (log10[Na+]) + 0.41 (fraction G+C) - (60/N), where N is the chain length and [Na+] is less than 1 M. See Sambrook and Russell (2001; Molecular Cloning: A Laboratory Manual, 3rd ed, Cold Spring Harbor Press, Cold Spring Harbor N.Y., Ch. 10). Other more advanced models that depend on various parameters may also be used to predict Tm of primer/target duplexes depending on various hybridization conditions. Approaches for achieving specific nucleic acid hybridization may be found in, e.g., Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, part I, chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays.” Elsevier (1993).
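For illustration only, the Tm formula above can be evaluated as in the following Python sketch. The constants are taken from the formula exactly as it is written in this specification; treating “fraction G+C” as a value between 0 and 1 is an assumption, and the cited reference should be consulted for the exact form and units. The function name and the example primer are hypothetical.

```python
import math


def predicted_tm(na_molar, gc_fraction, length_nt):
    """Evaluate Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(fraction G+C) - 60/N.

    na_molar    -- sodium ion concentration in mol/l (must be below 1 M)
    gc_fraction -- fraction of G and C bases (assumed here to lie in 0..1)
    length_nt   -- chain length N in nucleotides
    """
    if not 0 < na_molar < 1:
        raise ValueError("the formula is stated for [Na+] below 1 M")
    if length_nt <= 0:
        raise ValueError("chain length must be positive")
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_fraction - 60 / length_nt


# Hypothetical 20-nt primer with 10 G/C bases, evaluated at 0.1 M Na+.
primer = "ACGTTGCAAGCTTGCAACGT"
gc = (primer.count("G") + primer.count("C")) / len(primer)
tm = predicted_tm(0.1, gc, len(primer))
print(f"{tm:.1f}")  # 62.1
```

Such a value is only a rough guide; as noted above, the Tm may instead be determined experimentally or predicted with more advanced models.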
A “poly(A)” is a polyA-sequence. The poly(A) sequence is commonly known as a tail that consists of multiple adenosine monophosphates; in other words, it is a stretch of RNA or DNA that has only adenine bases. In eukaryotes, polyadenylation is part of the process that produces mature messenger RNA (mRNA) for translation.
A “template RNA” or “RNA template” refers to a ribonucleic acid (RNA) molecule which serves as template during reverse transcription, during which an RNA-dependent DNA polymerase, or reverse transcriptase, synthesizes a complementary DNA (cDNA). The template RNA needs to contain a known or desired sequence which can hybridize with a first strand cDNA synthesis primer that is then extended in 5’ - 3’ direction during the reverse transcription reaction, resulting in a first strand cDNA with the first strand cDNA synthesis primer at its 5’ end, followed by a domain complementary to the template RNA at its 3’ end.
A “precursor RNA” or “RNA precursor” refers to a ribonucleic acid (RNA) molecule, or fragments thereof, which requires the addition of nucleotides to its 3’ ends, e.g., through polyadenylation or adapter ligation, before it can serve as template RNA during reverse transcription.
Enzymatic ligation of oligonucleotides is a standard procedure in numerous protocols for oligonucleotide manipulation and is required for sequencing, cloning and many other DNA- and RNA-based technologies. The enzymes involved in the catalysis of the ligation reaction form phosphodiester bonds between 5’-phosphate ends of DNA or RNA and 3’-hydroxyl ends. The ligation reaction can join any 5’-phosphate with any 3’-hydroxyl end, since the reaction is not sequence specific. This lack of substrate specificity is a major advantage for a broad general application of enzymatic ligations and has contributed to its wide application.
As used herein, the term "RNA modifications" refers to a broad class of chemical modifications that can occur on RNA molecules, including messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA). These modifications can include methylation, acetylation, phosphorylation, oxidation, and other chemical changes that alter the structure, stability, and function of RNA. The modifications can occur at various sites within RNA molecules, including the base, sugar, and phosphate moieties, and can affect various aspects of RNA biology, such as gene expression regulation, RNA processing, and translation.
As used herein, the term "antibody" refers to a type of protein that can specifically recognize and bind to a particular antigen, such as a protein, peptide, or specific nucleic acid modification.
As used herein, the term “non-template ribonucleotides” refers to the terminal transferase catalyzed addition of ribonucleotide to the 3’ end of solid-phase first strand cDNA without base-pairing to a template strand.
As used herein, the term “homonucleotide stretch” refers to a stretch of length of nucleic acid made up of the same nucleotide (e.g., all dCTP, all dGTP, all dTTP, all dATP, all CTP, all GTP, all UTP, or all ATP).
As used herein, the term “heteronucleotide stretch” refers to a stretch of length of nucleic acid made up of a plurality of nucleotides.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the methods. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the methods, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the methods. Certain ranges are presented herein with numerical values being preceded by the term “about.” The term “about” is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.
It is appreciated that certain features of the methods, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the methods, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. All combinations of the embodiments are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed, to the extent that such combinations embrace operable processes and/or devices/systems/kits. In addition, all sub-combinations listed in the embodiments describing such variables are also specifically embraced by the present methods and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein. As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present methods. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.
As disclosed herein, the invention provides methods of preparing a sequencing library from a ribonucleic acid (RNA) sample. Sequencing libraries produced by methods of the invention are those whose nucleic acid members include a partial or complete sequencing platform adapter sequence at their termini useful for sequencing using a sequencing platform of interest. Sequencing platforms of interest include, but are not limited to, HiSeq, MiSeq, NextSeq and NovaSeq sequencing systems from Illumina®; the PACBIO RS II Sequel systems from Pacific Biosciences; the SOLiD sequencing systems from Life Technologies™; the MinION™, GridION™ and PromethION™ systems from Oxford Nanopore, or any other sequencing platform of interest.
Methods of preparing sequencing libraries from a ribonucleic acid (RNA) sample are provided. Aspects of the present invention include combining the RNA sample with a first strand cDNA synthesis primer under first strand cDNA synthesis conditions, where in some embodiments the first strand cDNA synthesis primer contains primer binding domains and is complementary to sequences within the RNA sample itself, whereas in other embodiments the first strand cDNA synthesis primer is complementary to specific sequences that contain primer binding domains and were added to the 3’ end of an RNA precursor prior to cDNA synthesis. The resultant first-strand cDNA is combined with one type of nucleoside triphosphates (NTP) and a second adapter oligonucleotide under conditions that allow 3’ tailing and ligation of first-strand cDNA to the second adapter oligonucleotide, which contains primer binding domains. The resultant product is sufficient to produce double-stranded cDNA, which in one embodiment can then be subjected to amplification conditions, e.g., PCR amplification, using first and second amplification primers that include sequencing adaptor constructs to generate libraries for the desired sequencing platform, whereas in other embodiments, the resultant double-stranded cDNA is subjected to tagmentation prior to amplification. Aspects of the invention further include compositions produced by the methods and kits that find use in practicing the methods.
An aspect of the present invention provides a method for preparing a sequencing library from a ribonucleic acid (RNA) sample, the method comprising:
(a) obtaining a test sample comprising a plurality of template RNA or RNA precursors;
(b) providing a set of oligonucleotide adapters and primers, the set comprising a plurality of first adapters, a plurality of first strand cDNA synthesis primers, a plurality of second adapters, and a plurality of amplification primers, wherein each of the first adapters comprises (i) a 5’ primer binding domain and a 3’ poly A domain or (ii) a 5’ poly A domain and a 3’ primer binding domain; each of the first strand cDNA synthesis primers comprises an RNA hybridization domain complementary to the template RNA or to the first adapters (e.g., oligo(dT)), and said each of the first strand cDNA synthesis primers is covalently linked to magnetic beads; each of the second adapters comprises primer binding sites, and each of the amplification primers comprises sequencing platform adapter constructs;
(c) ligating the plurality of first adapters to the 3' end of the plurality of RNA precursors to generate template RNA;
(d) generating a plurality of solid-phase first strand cDNA through reverse transcription primed by the first strand cDNA synthesis primer starting from the plurality of template RNA of step (c) or of step (a);
(e) separating the solid-phase first strand cDNA from the plurality of template RNA;
(f) tailing the 3’ ends of the plurality of solid-phase first strand cDNA with non-template ribonucleotides;
(g) ligating the plurality of second adapters to the 3' end of the plurality of solid-phase first strand cDNA;
(h) amplifying the plurality of solid-phase cDNA with amplification primers to generate a plurality of double stranded cDNA that are processed into a sequencing library through addition of sequencing platform adapter constructs.
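Purely as an illustrative aid, steps (d), (f) and (g) above can be followed at the sequence level with the Python sketch below, which models a poly(A)-containing template RNA, an oligo(dT)-containing first strand cDNA synthesis primer, ribotailing and second adapter ligation as simple string operations. All sequences, domain lengths, the tail length and the function names are hypothetical assumptions and are not taken from the disclosure; ribonucleotides of the tail are written in lower case only to distinguish them from deoxyribonucleotides.

```python
RNA_TO_CDNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand_cdna(template_rna, primer_binding_domain, dt_length):
    """Step (d): primer (5' domain + oligo(dT)) extended along the template.

    The oligo(dT) stretch is assumed to pair with the last `dt_length`
    adenosines of the template; the remainder is copied as its reverse
    complement, so the cDNA reads 5'->3'.
    """
    if not template_rna.endswith("A" * dt_length):
        raise ValueError("template lacks a poly(A) stretch of the given length")
    body = template_rna[:-dt_length]
    copied = "".join(RNA_TO_CDNA[base] for base in reversed(body))
    return primer_binding_domain + "T" * dt_length + copied


def ribotail(cdna, n_ribonucleotides):
    """Step (f): add non-template rA residues (lower case) to the 3' end."""
    return cdna + "a" * n_ribonucleotides


def ligate_second_adapter(tailed_cdna, second_adapter):
    """Step (g): join the second adapter to the 3' end of the tailed cDNA."""
    return tailed_cdna + second_adapter


# Hypothetical inputs.
template = "GGCUAC" + "A" * 8
cdna = first_strand_cdna(template, "ACACGACG", 8)
product = ligate_second_adapter(ribotail(cdna, 3), "CTGAGTCC")
print(product)  # ACACGACGTTTTTTTTGTAGCCaaaCTGAGTCC
```

The printed product carries a primer binding domain at each end, which is the feature that allows the amplification of step (h); bead attachment, separation and amplification are not modelled in this sketch.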
In some embodiments of the method for preparing a sequencing library disclosed herein, each template RNA of step (a) already contains a known sequence, such as 3’ polyA domain or specific target sequences of interest as primer binding domain, to serve as hybridization domain; therefore it does not require the ligation of first adapters to the 3’ end of each template RNA of step (c).
In some embodiments of the method for preparing a sequencing library disclosed herein, the precursor RNA is fragmented.
In some embodiments of the method for preparing a sequencing library disclosed herein, nucleotides are added to the 3’ end of the precursor RNA through polyadenylation or ligation.
In other embodiments of the method for preparing a sequencing library disclosed herein, each of the first adapters and/or each of the first strand cDNA synthesis primers further comprise a sample barcode and/or unique molecular identifier.
In other embodiments of the method for preparing a sequencing library disclosed herein, each of the second adapters further comprises a sample barcode and/or unique molecular identifier.
In some embodiments of the method for preparing a sequencing library disclosed herein, each of the second adapters further comprises a sequencing read primer domain.
In some embodiments of the method for preparing a sequencing library disclosed herein, each of the first strand cDNA synthesis primers is not covalently linked to magnetic beads.
In some further embodiments of the method for preparing a sequencing library disclosed herein, the method further comprises tagmenting the plurality of the double stranded cDNA of step (h) with transposomes to generate a tagmented sample, wherein the transposomes comprise a transposase and a transposon nucleic acid; and wherein the transposon nucleic acid comprises a transposon end domain and a second post-tagmentation amplification primer binding domain.
In some embodiments of the method for preparing a sequencing library disclosed herein, the RNA hybridization domain comprises a heteronucleotide stretch.
In some embodiments of the method for preparing a sequencing library disclosed herein, any of the provided oligonucleotide adapters comprise one or more nucleotide analogs.
In some embodiments of the method for preparing a sequencing library disclosed herein, the template RNA or the RNA precursor is messenger RNA.
In some embodiments of the method for preparing a sequencing library disclosed herein, the RNA hybridization domain of each of the first strand cDNA synthesis primers is primed using randomers.
In some embodiments of the method for preparing a sequencing library disclosed herein, the method further comprises pooling the plurality of first adapters ligated to the plurality of RNA precursors.
In some embodiments of the method for preparing a sequencing library disclosed herein, the method further comprises pooling the plurality of solid-phase first strand cDNA.
In some embodiments of the method for preparing a sequencing library disclosed herein, the test sample comprising a plurality of template RNA or precursor RNA is obtained from a single cell.
In some embodiments of the method for preparing a sequencing library disclosed herein, the method further comprises subjecting the sequencing library to a sequencing protocol.
In some embodiments of the method for preparing a sequencing library disclosed herein, the method further comprises quantitating one or more RNA species of the test sample.
In some embodiments, the methods of the disclosure can be performed according to the schematic diagrammed in FIG. 1. As illustrated in FIG. 1, an RNA sample (squiggly line) can be combined with a reverse transcriptase (not shown), dNTPs (not shown), and a first strand cDNA synthesis primer covalently linked to a magnetic bead, in a reaction mixture under first strand cDNA synthesis conditions, e.g., conditions sufficient to produce a double stranded product nucleic acid that includes the template RNA hybridized to the first strand complementary deoxyribonucleic acid (cDNA), where the first strand cDNA is covalently linked to the magnetic bead and includes the first strand cDNA synthesis primer containing a primer binding domain at its 5’ end and a newly synthesized length or portion that is complementary to domains found in the template RNA. The resultant first-strand cDNA is separated from the template RNA and is contacted with one source of nucleotide triphosphates (e.g., ATP) (not shown), a terminal transferase (not shown), T4 RNA ligase (not shown) and a second adapter oligonucleotide which includes a primer binding domain (PBS) under tailing and ligation conditions, e.g., conditions sufficient to ribotail and ligate the 3’ end of cDNA, which includes the addition of non-template ribonucleotides to the 3’ end of cDNA (depicted as (rA)1-5) followed by ligation of the second adapter oligonucleotide to the 3’ end of cDNA. The resulting first-strand cDNA can be amplified with a primer that binds the primer binding domain of the second adapter oligonucleotide, generating full-length double-stranded cDNA that can be amplified and uncoupled from magnetic beads with primers that bind the primer binding domains at both ends of the cDNA, which can include additional sequencing adaptor sequences, such as the P5 and P7 sequences, as well as the forward and reverse indexes (e.g., i5, i7) for sequencing, as desired.
In another embodiment of the method for preparing a sequence library of the invention, each of the first adapters and/or each of the second adapter oligonucleotides further comprises a sample barcode and/or unique molecular identifier.
In a preferred embodiment of the method of preparing a sequence library of the invention, the cDNA synthesis primers are covalently linked to magnetic beads generating solid-phase first-strand cDNA. In another embodiment, the cDNA synthesis primers are uncoupled, requiring additional purification procedures in-between aspects of the invention.
By “conditions sufficient to produce a double stranded product nucleic acid” is meant reaction conditions that permit hybridization of the first strand cDNA synthesis primer to the template RNA and polymerase-mediated extension of its 3’ end. Achieving suitable reaction conditions may include selecting reaction mixture components, concentrations thereof, and a reaction temperature to create an environment in which the polymerase is active and the relevant nucleic acids in the reaction interact (e.g., hybridize) with one another in the desired manner. For example, in addition to the template RNA, the polymerase, the first strand cDNA synthesis primer and dNTPs, the reaction mixture may include buffer components that establish an appropriate pH, salt concentrations (e.g., KCl concentration), metal cofactor concentration (e.g., Mg2+ or Mn2+ concentration), and the like, for the extension reaction to occur. Other components may be included, such as one or more nuclease inhibitors (e.g., an RNase inhibitor and/or a DNase inhibitor), one or more additives for facilitating amplification/replication of GC rich sequences, (e.g., betaine, DMSO, ethylene glycol, 1,2-propanediol, or combinations thereof), one or more molecular crowding agents (e.g., polyethylene glycol, Ficoll, dextran or the like), one or more enzyme-stabilizing components (e.g., DTT, or TCEP, present at a final concentration ranging from 0.1 to 10 mM (e.g., 1 mM)), and/or any other reaction mixture components useful for facilitating polymerase-mediated extension reaction. The reaction mixture can have a pH suitable for the primer extension reaction, which in certain embodiments can range from 5 to 9, such as from 7 to 9, including from 8 to 9, e.g., 8 to 8.5. In some instances, the reaction mixture includes a pH adjusting agent. pH adjusting agents of interest include, but are not limited to sodium hydroxide, hydrochloric acid, phosphoric acid buffer solution, citric acid buffer solution and the like.
For example, the pH of the reaction mixture can be adjusted to the desired range by adding an appropriate amount of the pH adjusting agent. The temperature range suitable for production of the double stranded product nucleic acid may vary according to factors such as the particular polymerase employed, the melting temperatures of any optional primers employed, etc. According to one embodiment, the polymerase is a reverse transcriptase (e.g., an MMLV mutant such as SuperScript® IV reverse transcriptase from ThermoFisher®) and the reaction mixture conditions sufficient to produce the double stranded product nucleic acid include bringing the reaction mixture to a temperature ranging from 4°C to 72°C, such as from 16°C to 70°C, e.g., 37°C to 50°C, including 50°C.
By “conditions sufficient to ribotail and ligate the 3’ end of cDNA” is meant reaction conditions that permit the terminal transferase-mediated extension of 3’ end of cDNA with non-template NTPs (e.g., ATP), followed by ligation of the second adapter oligonucleotide to the 3’ end of cDNA. Achieving suitable reaction conditions may include selecting reaction mixture components, concentrations thereof, and a reaction temperature to create an environment in which the terminal transferase and RNA ligase are active in the desired manner. For example, in addition to the first strand cDNA, the terminal transferase, the RNA ligase, the second adapter oligonucleotide and one source of NTPs (e.g., ATP), the reaction mixture may include buffer components that establish an appropriate pH, salt concentrations (e.g., KCl concentration), metal cofactor concentration (e.g., Mg2+ or Mn2+ concentration), and the like, for the extension and ligation reaction to occur. Other components may be included, such as nuclease inhibitors (e.g., a DNase inhibitor), one or more additives that inhibit secondary structures, (e.g., betaine, DMSO, ethylene glycol, 1,2-propanediol, or combinations thereof), one or more molecular crowding agents (e.g., polyethylene glycol, Ficoll, dextran or the like), and/or any other reaction mixture components useful for facilitating tailing and ligation. The reaction mixture can have a pH suitable for the ligation reaction, which in certain embodiments can range from 5 to 9, such as from 7 to 9, including from 8 to 9, e.g., 7 to 8. In some instances, the reaction mixture includes a pH adjusting agent. pH adjusting agents of interest include, but are not limited to sodium hydroxide, hydrochloric acid, phosphoric acid buffer solution, citric acid buffer solution and the like. For example, the pH of the reaction mixture can be adjusted to the desired range by adding an appropriate amount of the pH adjusting agent.
The temperature range suitable for tailing and ligation may vary and include a temperature ranging from 4°C to 37°C. According to one embodiment, the terminal transferase is a terminal deoxynucleotidyl transferase (e.g., TdT from Takara®), which catalyzes the template-independent incorporation of ribonucleotides into the 3’-OH termini of single strand cDNA and is added to the reaction mixture to a final concentration from 0.1 to 10 units/µl (U/µl). The RNA ligase (e.g., T4 RNA Ligase 1) catalyzes the ligation of a 5’ phosphoryl-terminated nucleic acid donor (e.g., the second adapter oligonucleotide) to the 3’-OH termini of single strand cDNA through the formation of a 3’-5’ phosphodiester bond with hydrolysis of ATP to AMP and PPi, and is added to the reaction mixture to a final concentration from 1 to 50 units/µl (U/µl, e.g., 2.25 U/µl).
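As a worked, non-limiting example of how the final enzyme concentrations stated above translate into pipetting volumes, the following Python sketch applies the dilution relationship (volume to add = final concentration x reaction volume / stock concentration). The 20 µl reaction volume and both stock concentrations are hypothetical values chosen for the arithmetic and are not taken from the disclosure or from any supplier's specification.

```python
def enzyme_volume_ul(final_u_per_ul, reaction_volume_ul, stock_u_per_ul):
    """Volume of enzyme stock (in µl) giving the desired final concentration."""
    if stock_u_per_ul <= final_u_per_ul:
        raise ValueError("stock must be more concentrated than the final mix")
    return final_u_per_ul * reaction_volume_ul / stock_u_per_ul


REACTION_VOLUME_UL = 20.0  # hypothetical reaction volume

# RNA ligase at the exemplified 2.25 U/µl, from a hypothetical 30 U/µl stock.
ligase_ul = enzyme_volume_ul(2.25, REACTION_VOLUME_UL, 30.0)

# Terminal transferase at 1 U/µl (within the stated 0.1 to 10 U/µl range),
# from a hypothetical 20 U/µl stock.
tdt_ul = enzyme_volume_ul(1.0, REACTION_VOLUME_UL, 20.0)

print(ligase_ul, tdt_ul)  # 1.5 1.0
```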
The template ribonucleic acid (RNA) or RNA precursor within the RNA sample or the test sample may be a polymer of any length composed of ribonucleotides. The template RNA or precursor RNA may be any type of RNA (or sub-type thereof), including but not limited to, a messenger RNA (mRNA), a microRNA (miRNA), a small interfering RNA (siRNA), a trans-acting small interfering RNA (ta-siRNA), a natural small interfering RNA (nat-siRNA), a ribosomal RNA (rRNA), a transfer RNA (tRNA), a small nucleolar RNA (snoRNA), a small nuclear RNA (snRNA), a long non-coding RNA (lncRNA), a non-coding RNA (ncRNA), a transfer-messenger RNA (tmRNA), a precursor messenger RNA (pre-mRNA), a small Cajal body-specific RNA (scaRNA), a piwi-interacting RNA (piRNA), an endoribonuclease-prepared siRNA (esiRNA), a small temporal RNA (stRNA), a signal recognition RNA, a telomere RNA, a ribozyme, or any combination of RNA types thereof or subtypes thereof.
The template RNA or RNA precursor may be subject to a variety of chemical modifications that can alter its structure, function, stability, or interactions with other molecules. Such modifications include, but are not limited to, methylation, acetylation, phosphorylation, oxidation, deamination, ribose methylation, uridine isomerization, pseudouridylation, and many others. These modifications can occur at various positions of the RNA molecule, including the bases, sugars, or phosphate backbone, and can be catalyzed by various enzymes or chemical reagents.
The RNA sample or test sample that includes the template RNA or RNA precursor may be combined into the reaction mixture in an amount sufficient for producing the product nucleic acid. In some embodiments, the RNA sample or test sample that includes the template RNA or RNA precursor is isolated from 1 or more, 10 or more, 20 or more, 50 or more, 100 or more, 500 or more cells, such as 750 or more, 1000 or more, 2000 or more cells, including 5000 or more cells.
The template RNA or RNA precursor may be present in any nucleic acid sample of interest, including but not limited to, a nucleic acid sample isolated from a single cell, a plurality of cells (e.g., cultured cells), a tissue, an organ, a body fluid, and/or an organism (e.g., bacteria, yeast, or higher eukaryotic organisms, such as a plant, a mouse, a worm, or the like). In certain aspects, the nucleic acid sample is isolated from a cell(s), tissue, organ, and/or the like of a mammal (e.g., a human, a rodent (e.g., a mouse), or any other mammal of interest). In other aspects, the sample may be isolated from a bodily compartment suitable for use in diagnosis, such as blood, urine, saliva, platelets, microvesicles, exosomes, serum, or other bodily fluids.
In other aspects, the nucleic acid sample is isolated from a source other than a mammal, such as bacteria, yeast, insects (e.g., drosophila), amphibians (e.g., frogs (e.g., Xenopus)), viruses, plants, or any other non-mammalian nucleic acid sample source.
In some embodiments, the test sample is a biological sample, such as a tissue and/or body fluid sample or a combination thereof. Biological samples in accordance with embodiments of the invention can be collected in any clinically acceptable manner. In some embodiments, a biological sample can comprise a tissue, a body fluid, or a combination thereof. In some embodiments, a biological sample is collected from a healthy subject. In some embodiments, a biological sample is collected from a subject who is known to have a particular disease or disorder (e.g., a particular cancer or tumor). In some embodiments, a biological sample is collected from a subject who is suspected of having a particular disease or disorder.
As used herein, the term "tissue" refers to a mass of connected cells and/or extracellular matrix material(s). Non-limiting examples of tissues that are commonly used in conjunction with the present methods include skin, hair, fingernails, endometrial tissue, nasal passage tissue, central nervous system (CNS) tissue, neural tissue, eye tissue, liver tissue, kidney tissue, placental tissue, mammary gland tissue, gastrointestinal tissue, musculoskeletal tissue, genitourinary tissue, bone marrow, and the like, derived from, for example, a human or nonhuman mammal. Tissue samples in accordance with embodiments of the invention can be prepared and provided in the form of any tissue sample types known in the art, such as, for example and without limitation, formalin-fixed paraffin-embedded (FFPE), fresh, and fresh frozen (FF) tissue samples.
As used herein, the term "body fluid" refers to a liquid material derived from a subject, e.g., a human or non-human mammal. Non-limiting examples of body fluids that are commonly used in conjunction with the present methods include mucous, blood, plasma, serum, serum derivatives, synovial fluid, lymphatic fluid, bile, phlegm, saliva, sweat, tears, sputum, amniotic fluid, menstrual fluid, vaginal fluid, semen, urine, cerebrospinal fluid (CSF), such as lumbar or ventricular CSF, gastric fluid, a liquid sample comprising one or more material(s) derived from a nasal, throat, or buccal swab, a liquid sample comprising one or more materials derived from a lavage procedure, such as a peritoneal, gastric, thoracic, or ductal lavage procedure, and the like.
In some embodiments, a biological sample can comprise a fine needle aspirate or biopsied tissue. In some embodiments, a biological sample can comprise media containing cells or biological material. In some embodiments, a biological sample can comprise a blood clot, for example, a blood clot that has been obtained from whole blood after the serum has been removed. In some embodiments, a biological sample can comprise stool. In one preferred embodiment, a biological sample is drawn whole blood. In one aspect, only a portion of a whole blood sample is used, such as plasma, red blood cells, white blood cells, and platelets. In some embodiments, a biological sample is separated into two or more component parts in conjunction with the present methods. For example, in some embodiments, a whole blood sample is separated into plasma, red blood cell, white blood cell, and platelet components.
In some embodiments, a sample includes a plurality of nucleic acids not only from the subject from which the sample was taken, but also from one or more other organisms, such as viral or bacterial DNA/RNA that is present within the subject at the time of sampling.
Approaches, reagents and kits for isolating RNA from such sources are known in the art. For example, kits for isolating RNA from a source of interest are commercially available. In certain aspects, the RNA is isolated from a fixed biological sample, e.g., formalin-fixed, paraffin-embedded (FFPE) tissue. RNA from FFPE tissue may be isolated using commercially available kits.
In some embodiments as depicted in FIG. 3a, the subject methods include producing the template RNA from a precursor RNA. For example, when it is desirable to control the size of the template RNA that is combined into the reaction mixture, an RNA sample containing RNA precursors from a source of interest may be subjected to shearing/fragmentation, e.g., to generate a sample that includes template RNAs that are shorter in length as compared to precursor non-sheared RNAs (e.g., full-length mRNAs) in the original sample. In some embodiments, the RNA may be used directly from the lysed cell by placing the cell in a suitable buffer (e.g., a hypotonic solution), optionally in the presence of detergent (e.g., Tween-20, Triton X-100, NP-40, and/or Igepal CA-630), so as to lyse the cell. RT reaction components may then be added directly to the lysate without further isolation to generate cDNA from the cellular RNA. The template RNA may be generated by a shearing/fragmentation strategy including, but not limited to, passing the sample one or more times through a micropipette tip or fine-gauge needle, nebulizing the sample, sonicating the sample (e.g., using Bioruptor, Branson, or Covaris sonicator), bead-mediated shearing, enzymatic shearing (e.g., using one or more RNA-shearing enzymes, or by enzymatic digestions, e.g., with restriction enzymes or other endonucleases appropriate for the polynucleotides of interest, including, but not limited to, RNase A, RNase I, RNase T1, and MNase), chemical-based fragmentation (e.g., using divalent cations or a fragmentation buffer, which may be used in combination with heat), or any other suitable approach for shearing/fragmenting a precursor RNA to generate a shorter template RNA. In certain aspects, the template RNA generated by shearing/fragmentation of a starting nucleic acid sample has a particular length, as appropriate for the sequencing platform chosen.
Additional strategies for producing a template RNA from a precursor RNA may be employed as depicted in FIG. 4. For example, producing a template RNA may include adding nucleotides to an end of the precursor RNA. In certain aspects, the precursor RNA is a non-polyadenylated RNA (e.g., a microRNA, small RNA, or the like), and producing the template RNA includes adenylating (e.g., polyadenylating) the precursor RNA. Adenylating the precursor RNA may be performed using any convenient approach. According to certain embodiments, the adenylation is performed enzymatically, e.g., using Poly(A) polymerase or any other enzyme suitable for catalyzing the incorporation of adenine residues at the 3’ terminus of the precursor RNA. Reaction mixtures for carrying out the adenylation reaction may include any useful components, including but not limited to, a polymerase, a buffer (e.g., a Tris-HCl buffer), one or more metal cations (e.g., MgCl2, MnCl2, or combinations thereof), a salt (e.g., NaCl), one or more enzyme-stabilizing components (e.g., DTT), ATP, and any other reaction components useful for facilitating the adenylation of a precursor RNA. The adenylation may be carried out at a temperature (e.g., 30 °C to 50 °C, such as 37 °C) and pH (e.g., pH 7 to pH 8.5, such as pH 7.9) compatible with the polymerase being employed, e.g., Poly(A) polymerase.
In another embodiment, illustrated in FIG. 5, approaches for adding nucleotides to a precursor RNA include ligation-based strategies, where an RNA target can be combined with an RNA ligase (e.g., T4 RNA ligase) and a first adapter oligonucleotide, which contains an amplification primer and a hybridization sequence domain complementary to the cDNA synthesis primer (e.g., polyA) under ligation conditions, e.g., conditions sufficient to produce a product chimeric nucleic acid that includes the template RNA with the first adapter oligonucleotide at its 3’ end. By “conditions sufficient to produce a product chimeric nucleic acid” is meant reaction conditions that permit the ligation of the first adapter oligonucleotide to the 3’ end of the RNA precursor, catalyzed by the RNA ligase. Achieving suitable reaction conditions may include selecting reaction mixture components, concentrations thereof, and a reaction temperature to create an environment in which the terminal transferase and RNA ligase are active in the desired manner. For example, in addition to the RNA precursor, first adapter oligonucleotide, and the RNA ligase, the reaction mixture may include buffer components that establish an appropriate pH, salt concentrations (e.g., KCl concentration), metal cofactor concentration (e.g., Mg2+ or Mn2+ concentration), and the like, for the ligation reaction to occur. Other components may be included, such as nuclease inhibitors (e.g., an RNase inhibitor), one or more additives that inhibit secondary structures (e.g., betaine, DMSO, ethylene glycol, 1,2-propanediol, or combinations thereof), one or more molecular crowding agents (e.g., polyethylene glycol, Ficoll, dextran or the like), and/or any other reaction mixture components useful for facilitating tailing and ligation. The reaction mixture can have a pH suitable for the ligation reaction, which in certain embodiments can range from 5 to 9, such as from 7 to 9, including from 8 to 9, e.g., 7 to 8. In some instances, the reaction mixture includes a pH adjusting agent. pH adjusting agents of interest include, but are not limited to, sodium hydroxide, hydrochloric acid, phosphoric acid buffer solution, citric acid buffer solution and the like. For example, the pH of the reaction mixture can be adjusted to the desired range by adding an appropriate amount of the pH adjusting agent. The temperature range suitable for ligation may vary and include a temperature ranging from 4 °C to 37 °C.
According to one embodiment, the RNA ligase is T4 RNA Ligase I, which catalyzes the ligation of a 5’ pre-adenylated nucleic acid donor (e.g., the first adapter oligonucleotide) to the 3’-OH termini of the RNA precursor through the formation of a 3’-5’ phosphodiester bond, and is added to the reaction mixture to a final concentration from 1 to 50 units/ul (U/ul, e.g., 2.25 U/ul).
In such embodiment, the test sample is obtained by a method for purifying ribosome nascent-chain complexes of a biological sample of interest to obtain ribosome-coated mRNA fragments that serve as RNA precursors for the methods described herein.
In another embodiment, the test sample is obtained by a method for purifying an RNA molecule from a biological sample, where the RNA molecule carries a particular modification of interest, comprising:
(I) cleaving the RNA molecule by contacting the biological sample with an agent capable of cleaving the phosphodiester bond, thereby generating a fragment of the RNA molecule, wherein the majority of fragments is around 100 nucleotides in length;
(II) contacting the RNA fragment in said biological sample with a molecule that specifically interacts with a particular modification of interest, wherein said molecule can be a protein such as an antibody;
(III) contacting the biological sample with an agent that creates a covalent bond between the RNA molecule and the molecule that specifically interacts with the modification of interest, thereby generating a covalently bound complex containing the RNA with the modification of interest;
(IV) purifying the complex obtained in step (III) to provide RNA fragments containing the modification of interest, wherein said RNA fragments are used as precursor RNA in the method for preparing a sequence library disclosed herein.
In an embodiment, the agent capable of cleaving a phosphodiester bond in step (I) is a chemical agent, such as divalent cations (e.g., zinc, magnesium), which can catalyze the cleavage of the phosphodiester bond under specific conditions. The use of divalent cations is advantageous in that they can be easily removed from the reaction mixture, minimizing potential interference with downstream analysis. In another embodiment, enzymatic fragmentation is used, where a ribonuclease or other suitable enzyme is added to the biological sample to cleave the RNA molecule at the site of interest. This approach allows for site-specific cleavage of the RNA molecule and can be optimized to achieve high specificity and efficiency. In another embodiment, heat fragmentation, which involves heating the RNA molecule to high temperatures, can also be used to break the phosphodiester bond and generate RNA fragments for downstream analysis.
In another embodiment, the test sample is obtained by a method for purifying an RNA molecule interacting with an RNA binding protein (RBP) of interest in a biological sample, comprising:
(1) contacting the biological sample with an agent that creates a covalent bond between the RNA molecule and the RBP of interest, thereby generating a covalently bound RBP-RNA complex containing the RNA molecule;
(2) cleaving the RNA molecule by contacting the RBP-RNA complex with an agent capable of cleaving a bond thereof, thereby generating a fragment of the RNA molecule, wherein the fragment is at least 22 nucleotide bases in length;
(3) selecting the RBP-RNA fragment complex in said biological sample with a molecule that specifically interacts with a component of the RBP-RNA fragment complex; and
(4) purifying the RBP-RNA fragment complex obtained in step (3) to provide RNA fragments interacting with the RBP of interest, wherein said RNA fragments are used as precursor RNA in the method for preparing a sequence library disclosed herein.
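The minimum fragment length stated in step (2) can be expressed as a simple length filter. The following Python sketch is illustrative only: the function name and the example sequences are hypothetical, and only the 22-nucleotide threshold is taken from the text.

```python
MIN_FRAGMENT_LENGTH_NT = 22  # threshold stated in step (2)


def keep_usable_fragments(fragments, min_length=MIN_FRAGMENT_LENGTH_NT):
    """Return only the RNA fragments that are at least `min_length` nt long."""
    return [fragment for fragment in fragments if len(fragment) >= min_length]


# Hypothetical fragments of 21, 22 and 30 nucleotides.
example_fragments = ["A" * 21, "ACGU" * 5 + "AC", "G" * 30]
usable = keep_usable_fragments(example_fragments)
print([len(fragment) for fragment in usable])
```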
In an embodiment, the agent capable of cleaving a bond in step (2) is a nuclease, including but not limited to, RNase A, RNase I, RNase T1 and/or MNase.
In an embodiment, purifying the RNA complex of step (IV) and (4) of the above-disclosed methods for obtaining test samples comprises a chromatographic method.
In another embodiment, purifying the RNA-protein complex of step (IV) and (4) of the above-disclosed methods for obtaining test samples is performed under stringent conditions comprising:
(i) washing the complexes with buffer at least 5 times;
(ii) boiling the complexes in a denaturing ionic detergent;
(iii) separating the complexes by SDS-PAGE;
(iv) transferring said complexes to a substrate that preferentially binds RNA covalently crosslinked to protein over RNA not covalently crosslinked to protein; and
(v) digesting said protein with a protease to liberate said fragments of RNA from said RNA-protein complexes.
In another embodiment, purifying the RNA complex of step (IV) and (4) of the above-disclosed methods for obtaining test samples is performed using hybridization or affinity capture of nucleotides.
In one embodiment, the covalent bond of step (1) and (III) of the above-disclosed methods for obtaining test samples is formed with irradiation. The source of irradiation may emit, in one embodiment, radiation of a discrete wavelength. In another embodiment, the source may emit radiation dispersed throughout a region of the electromagnetic radiation spectrum. In another embodiment, the source may emit a mixture of radiation, some of which is of a discrete wavelength, and some of which is dispersed throughout a region of the electromagnetic radiation spectrum.
In one embodiment, the irradiation may result from a polychromatic irradiation source. Polychromatic refers, in one embodiment, to a source that emits radiation of various wavelengths. Such wavelengths may be anywhere in the electromagnetic radiation spectrum. The radiation emission spectra of various types of irradiation sources are known in the art.
In another embodiment, the irradiation may result from a monochromatic irradiation source. Monochromatic refers, in one embodiment, to a source that emits radiation of a single wavelength. In another embodiment, monochromatic refers to a source that emits radiation primarily of a single wavelength.
In another embodiment, the irradiation may result from a mercury light. Mercury lamps emit radiation of 254 nm, and may also have polychromatic background emissions at other discrete wavelengths, e.g., 313 nm, 365 nm, 405 nm, 436 nm, 546 nm, 579 nm, 1015 nm and 1140 nm. This is a fairly unique characteristic of these types of lamps (see U.S. Pat. No. 6,611,375).
In another embodiment, the irradiation may result from a two-photon excitation apparatus (So P T et al, Cell Mol Bio (Noisy le grand) 44:771). In this technique, small structures are formed by multiple photon-induced polymerization or cross-linking of a precursor composition. “Multiple photon” as used herein means, in one embodiment, the simultaneous absorption of multiple photons by a reactive molecule. This method is described in detail in U.S. Pat. No. 6,316,153 and references therein.
In one embodiment, the irradiation used to form the covalent bond of step (1) and (III) of the above-disclosed methods for obtaining test samples is ultraviolet irradiation. Ultraviolet radiation, in one embodiment, is a form of energy that occupies a portion of the electromagnetic radiation spectrum (the electromagnetic radiation spectrum ranges from cosmic rays to radio waves). Ultraviolet radiation can come from many natural and artificial sources. Depending on the source of ultraviolet radiation, it may be accompanied by other (non-ultraviolet) types of electromagnetic radiation (e.g., visible light).
Particular types of ultraviolet radiation are herein described in terms of wavelength. Wavelength is herein described in terms of nanometers (“nm”). In one embodiment, ultraviolet radiation extends from approximately 180 nm to 400 nm. In another embodiment, the ultraviolet radiation has a wavelength of about 254 nm. In another embodiment, the ultraviolet radiation has a different wavelength. When a radiation source, by virtue of filters or other means, does not allow radiation below a particular wavelength (e.g., 320 nm), it is said to have a low end “cutoff” at that wavelength (e.g., “a wavelength cutoff at 300 nanometers”). Similarly, when a radiation source allows only radiation below a particular wavelength (e.g., 360 nm), it is said to have a high end “cutoff” at that wavelength (e.g., “a wavelength cutoff at 360 nanometers”). In another embodiment, the source of ultraviolet radiation is a fluorescent source. All of these sources represent separate embodiments of the present invention. In one embodiment, the device of the present invention comprises an additional filtering means. In one embodiment, the filtering means comprises a liquid filter solution that transmits only a specific region of the electromagnetic spectrum. The use of sources of irradiation is well known to those skilled in the art (see, for example, Diffey, B L, Methods 28:4-13; and Chen J et al, Cancer J. 8:154-63). Each type of radiation represents a separate embodiment of the present invention.
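The low end and high end cutoff definitions can be illustrated with a short Python sketch that applies them to the mercury lamp emission wavelengths listed earlier in this description. The function name and the handling of a wavelength lying exactly at a cutoff are assumptions made for the example; the cutoff values of 300 nm and 400 nm are likewise chosen only for illustration.

```python
# Mercury lamp emission wavelengths (nm) as listed in this description.
MERCURY_LINES_NM = [254, 313, 365, 405, 436, 546, 579, 1015, 1140]


def transmitted(wavelengths_nm, low_cutoff_nm=None, high_cutoff_nm=None):
    """Return the wavelengths that pass a source with the given cutoffs.

    A low end cutoff blocks radiation below that wavelength; a high end
    cutoff allows only radiation below that wavelength. Treating a
    wavelength exactly at the low cutoff as transmitted and exactly at
    the high cutoff as blocked is an assumption of this sketch.
    """
    kept = []
    for wavelength in wavelengths_nm:
        if low_cutoff_nm is not None and wavelength < low_cutoff_nm:
            continue  # blocked by the low end cutoff
        if high_cutoff_nm is not None and wavelength >= high_cutoff_nm:
            continue  # blocked by the high end cutoff
        kept.append(wavelength)
    return kept


# Illustrative cutoffs: keep only lines between 300 nm and 400 nm.
print(transmitted(MERCURY_LINES_NM, low_cutoff_nm=300, high_cutoff_nm=400))
```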
In one embodiment, a chemical group such as, for example, puromycin is added to RNA to facilitate formation of the covalent bond of step (1). This method is described in Rodriguez-Fonseca C et al (RNA 6:744-54).
In some embodiments, a photoreactive nucleoside (e.g., 4-thiouridine and 6-thioguanosine) can be added to the biological sample of interest to increase crosslinking efficiency at a wavelength which is significantly absorbed by the photoreactive nucleoside such that covalent cross-links are formed between the modified RNA transcript and a protein and the RNA is not damaged.
In one embodiment, the covalent bond of step (1) or (III) of the above-disclosed methods for obtaining test samples is formed with a chemical. In one embodiment, the chemical is formaldehyde. In another embodiment, the chemical is a derivative of formaldehyde. In another embodiment, the chemical is paraformaldehyde. In another embodiment, the chemical is glutaraldehyde. In another embodiment, the chemical is osmium tetroxide. In another embodiment, the chemical is acetone. In another embodiment, the chemical is an alcohol. In another embodiment, the chemical is an NHS ester. In another embodiment, the chemical is a maleimide. In another embodiment, the chemical is a haloacetyl. In another embodiment, the chemical is a pyridyl disulfide. In another embodiment, the chemical is a sulfhydryl modifier such as SATA, SPDP or Traut's Reagent. In another embodiment, the chemical is hydrazide. In another embodiment, the chemical is 1-Ethyl-3-(3-Dimethylaminopropyl)-Carbodiimide Hydrochloride. In another embodiment, the chemical is an aryl azide or a derivative thereof. In another embodiment, the chemical is any other cross-linking compound known in the art. The cross-linking compound may, in one embodiment, be applied over a broad range of concentrations. Each type of chemical represents a separate embodiment of the present invention.
The methods of the present disclosure include combining a polymerase with the plurality of template RNA to generate solid-phase first strand cDNA. A variety of polymerases may be employed when practicing the subject methods. In certain aspects, the polymerase combined into the reaction mixture is a reverse transcriptase (RT). Reverse transcriptases suitable for the invention do not need to have template-switch capability and can include, but are not limited to, retroviral reverse transcriptase, retrotransposon reverse transcriptase, retroplasmid reverse transcriptase, retron reverse transcriptases, bacterial reverse transcriptases, group II intron-derived reverse transcriptase, and mutants, variants, derivatives, or functional fragments thereof, e.g., RNase H minus or RNase H reduced enzymes (e.g., Maxima H Minus RT (ThermoFisher) or SuperScript RT (ThermoFisher)) (Figure 9). For example, the reverse transcriptase may be a Moloney Murine Leukemia Virus reverse transcriptase (MMLV RT). In certain aspects, a mix of two or more different polymerases is added to the reaction mixture, e.g., for improved processivity, proof-reading, and/or the like. In some instances, the polymerase is one that is heterologous relative to the template, or source thereof. The polymerase is combined into the reaction mixture such that the final concentration of the polymerase is sufficient to produce a desired amount of the product nucleic acid. In certain aspects, the polymerase (e.g., SuperScript IV) is present in the reaction mixture at a final concentration from 0.1 to 20 units/ul (U/ul), e.g., 2 U/ul.
As summarized above, the first strand reaction mixture further includes a first strand cDNA synthesis primer. In a preferred embodiment of the method, the first strand cDNA synthesis primer is covalently linked to a magnetic bead, and includes one, two or more domains. For example, the primer may include a first (e.g., 3’) domain that hybridizes to the template RNA and a second (e.g., 5’) domain that does not hybridize to the template RNA. The sequence of the first and second domain may be independently defined or arbitrary. In certain aspects, the first domain has a defined sequence (e.g., an oligo dT sequence or an RNA specific sequence) or an arbitrary sequence (e.g., a random sequence, such as a random hexamer sequence) and the sequence of the second domain is defined, e.g., a pre-tagmentation amplification primer binding domain or an amplification primer binding domain and may have any convenient sequence such as a sequencing primer binding domain.
In addition to the first and second domains described above, in which the second domain contains a pre-tagmentation amplification primer binding domain, the first strand cDNA synthesis primer may further include a first post-tagmentation amplification, e.g., PCR amplification, primer binding domain, which may have any convenient sequence such as a sequencing primer binding domain.
In certain aspects, the first strand cDNA synthesis primer includes a barcode domain for identification of the sample after pooling post reverse transcription. In certain aspects, the first strand cDNA synthesis primer may include a unique molecular identifier or other barcode to mark each RNA molecule converted to cDNA individually. In some instances, the sequence includes all or a component of a sequencing platform adapter construct. By “sequencing platform adapter construct” is meant a nucleic acid construct that includes at least a portion of a nucleic acid domain (e.g., a sequencing platform adapter nucleic acid sequence) utilized by a sequencing platform of interest, such as a sequencing platform provided by Illumina® (e.g., the HiSeq, MiSeq, NextSeq or NovaSeq); Pacific Biosciences (e.g., the PACBIO RS II sequencing system); or any other sequencing platform of interest. In certain aspects, a sequencing platform adapter construct includes one or more nucleic acid domains selected from: a domain (e.g., a “capture site” or “capture sequence”) that specifically binds to a surface-attached sequencing platform oligonucleotide (e.g., the P5 or P7 oligonucleotides attached to the surface of a flow cell in an Illumina® sequencing system); a sequencing primer binding domain (e.g., a domain to which the Read 1 or Read 2 primers of the Illumina® platform may bind); a barcode domain (e.g., a domain that uniquely identifies the sample source of the nucleic acid being sequenced to enable sample multiplexing by marking every molecule from a given sample with a specific barcode or “tag”); a barcode sequencing primer binding domain (a domain to which a primer used for sequencing a barcode binds); a molecular identification domain (e.g., a molecular index tag, such as a randomized tag of 4, 6, or other number of nucleotides) for uniquely marking molecules of interest to determine expression levels based on the number of instances a unique tag is sequenced; or any combination of such domains. In certain aspects, a barcode domain (e.g., sample index tag) and a molecular identification domain (e.g., a molecular index tag) may be included in the same nucleic acid.
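The use of a barcode domain together with a molecular identification domain can be illustrated with a minimal Python sketch that estimates molecule counts by counting distinct molecular index tags per sample barcode and gene. The function name, the tuple layout of the reads, and the example values are all hypothetical; real pipelines typically also correct sequencing errors in the tags, which this sketch does not attempt.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct molecular index tags per (sample barcode, gene).

    `reads` is an iterable of (sample_barcode, gene, molecular_tag) tuples.
    Reads sharing the same barcode, gene and tag are treated as copies of
    one original RNA molecule and are counted once.
    """
    tags_seen = defaultdict(set)
    for sample_barcode, gene, molecular_tag in reads:
        tags_seen[(sample_barcode, gene)].add(molecular_tag)
    return {key: len(tags) for key, tags in tags_seen.items()}


# Hypothetical reads: two samples, 6-nt molecular tags.
example_reads = [
    ("ACGTAC", "geneA", "AAAAAA"),
    ("ACGTAC", "geneA", "AAAAAA"),  # amplification duplicate, counted once
    ("ACGTAC", "geneA", "CCCCCC"),
    ("TTGCAA", "geneA", "AAAAAA"),
]
print(count_unique_molecules(example_reads))
```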
A sequencing platform adapter domain, when present, may include one or more nucleic acid domains of any length and sequence suitable for the sequencing platform of interest. The nucleic acid domains may have a length and sequence that enables a polynucleotide (e.g., an oligonucleotide) employed by the sequencing platform of interest to specifically bind to the nucleic acid domain, e.g., for solid phase amplification and/or sequencing by synthesis of the cDNA insert flanked by the nucleic acid domains. Example nucleic acid domains include the P5 (5’-AATGATACGGCGACCACCGA-3’) (SEQ ID NO: 1), P7 (5’-CAAGCAGAAGACGGCATACGAGAT-3’) (SEQ ID NO: 2), Read 1 sequencing primer (5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3’) (SEQ ID NO: 3) and Read 2 sequencing primer (5’-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’) (SEQ ID NO: 4) domains employed on the Illumina®-based sequencing platforms. For example, the first strand cDNA synthesis primer may include from 3’ to 5’, a first domain that hybridizes to the template RNA, e.g., an oligo dT domain, a barcode domain, a molecular identifier, a sequencing platform adapter domain, such as a sequencing read primer domain, and an amplification primer binding domain. In some aspects, the amplification primer binding domain will resemble a pre-tagmentation amplification primer binding domain and the first strand cDNA synthesis primer will also include a post-tagmentation amplification primer binding domain, which may be a unique domain or partially or completely overlap with another domain of the primer, such as the sequencing read primer domain, so long as that domain is compatible with respect to the amplification protocol being performed.
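The domain order of the first strand cDNA synthesis primer described above can be sketched in Python as a simple string assembly. This is an illustration under stated assumptions, not a primer design from the disclosure: the function name, the 6-nt barcode, the 6-nt randomized molecular identifier (written as N), the 20-nt oligo dT length, and the use of the P5 sequence as the amplification primer binding domain are all hypothetical choices. Only the P5 and Read 1 sequences themselves are taken from the text.

```python
# Sequences quoted in the description (SEQ ID NO: 1 and SEQ ID NO: 3).
P5 = "AATGATACGGCGACCACCGA"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"


def build_first_strand_primer(amplification_domain, read_primer_domain,
                              barcode, umi_length=6, oligo_dt_length=20):
    """Return a primer sequence written 5'->3'.

    The description lists the domains from 3' to 5' (oligo dT, barcode,
    molecular identifier, sequencing read primer domain, amplification
    primer binding domain), so the 5'->3' string uses the reverse order.
    """
    if set(barcode) - set("ACGT"):
        raise ValueError("barcode must contain only A, C, G or T")
    molecular_identifier = "N" * umi_length  # randomized positions
    oligo_dt = "T" * oligo_dt_length         # hybridizes to a poly(A) tail
    return (amplification_domain + read_primer_domain
            + molecular_identifier + barcode + oligo_dt)


# Hypothetical barcode; P5 stands in for the amplification domain (assumed).
primer = build_first_strand_primer(P5, READ1, barcode="ACGTAC")
print(primer)
```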
In some aspects, the first adapter oligonucleotide may be pre-adenylated at its 5’ end and include from 3’ to 5’, a first domain, e.g., an oligo dT domain, a barcode domain, a molecular identifier, a sequencing platform adapter domain, such as a sequencing read primer domain, an amplification primer binding domain, and a chain terminator at its 3’ end, e.g., a near-infrared fluorescent dye.
The nucleotide sequence of nucleic acid domains useful for sequencing on a sequencing platform of interest may vary and/or change over time. Adapter sequences are typically provided by the manufacturer of the sequencing platform (e.g., in technical documents provided with the sequencing system and/or available on the manufacturer’s website). Based on such information, the sequence of any sequencing platform adapter domains of the first strand cDNA synthesis primer, first or second adapter oligonucleotide, amplification primers, and/or the like, may be designed to include all or a portion of one or more nucleic acid domains in a configuration that enables sequencing the nucleic acid insert (corresponding to the template RNA) on the platform of interest.
The first strand cDNA synthesis primer and first adapter oligonucleotide may include one or more nucleotides (or analogs thereof) that are modified or otherwise non-naturally occurring. For example, the primer may include one or more nucleotide analogs (e.g., LNA, FANA, 2’-O-Me RNA, 2’-fluoro RNA, or the like), linkage modifications (e.g., phosphorothioates, 3’-3’ and 5’-5’ reversed linkages), 5’ and/or 3’ end modifications (e.g., 5’ and/or 3’ amino, biotin, DIG, phosphate, thiol, dyes, quenchers, etc.), one or more fluorescently labelled nucleotides, a near-infrared fluorescent dye (e.g., LiCOR IR800), or any other feature that provides a desired functionality to the primer that primes cDNA synthesis.
It is desirable to prevent any subsequent extension reactions which use the double stranded product nucleic acid as a template from extending beyond a particular position in the region of the double stranded product nucleic acid corresponding to the primer. For example, according to certain embodiments, the first strand cDNA synthesis primer includes a polymerase blocking modification that prevents a polymerase using the region corresponding to the primer as a template from polymerizing a nascent strand beyond the modification. Useful modifications include, but are not limited to, an abasic lesion (e.g., a tetrahydrofuran derivative), a nucleotide adduct, an iso-nucleotide base (e.g., isocytosine, isoguanine, and/or the like), or in a preferred embodiment the covalent linkage to a solid surface (e.g., paramagnetic beads). Blocking modifications may be included in any of the nucleic acid reagents used when practicing the methods of the present disclosure, including first strand cDNA synthesis primer, first and second adapter oligonucleotides, first and second amplification, e.g., PCR, primers used for amplifying the first-strand cDNA to produce the product of double stranded cDNA, amplification primers used for PCR amplification of tagmentation products, and any combination thereof.
The use of first strand cDNA synthesis primers covalently linked to magnetic beads in step (d) simplifies all downstream procedures due to the ease of working on magnetic beads, which allows easy purification of cDNA and separation from RNA in step (e) using heat denaturation. Oligonucleotides linked to a bead surface are inert to harsh experimental conditions such as high concentrations of proteinase K and denaturing agents, thus enabling the capture and purification of target RNA molecules from a wide range of biological samples, eliminating the need for time-consuming RNA precipitations that are prone to sample loss, especially at low concentrations. Capture of complementary nucleic acid domains on oligo(dT) beads is highly efficient, occurs within minutes (Figure 6) and allows stringent washes to remove traces of proteinase K or other inhibiting agents prior to reverse transcription. Solid-phase cDNA also enables simple purification of the resultant cDNA molecule, which offers more flexibility in the optimization of reverse transcription conditions, as any adverse components for subsequent enzymatic reactions can be efficiently removed. Solid-phase cDNA can directly serve as acceptor molecule in the second adapter ligation without any additional purification procedures, thus minimizing sample loss.
As set forth above, the subject methods of the present disclosure include combining a terminal transferase with the first strand cDNA molecule, where the terminal transferase (TT) is capable of catalyzing template-independent addition of deoxyribonucleotides or ribonucleotides to the 3’ hydroxyl terminus of the cDNA molecule. Terminal transferases are highly processive in the presence of deoxynucleotide triphosphates (dNTPs), but self-terminate after incorporating only a few nucleotide triphosphates (NTPs). The terminal transferase may be capable of incorporating 1 or more additional deoxyribonucleotides at the 3’ end of the nascent DNA strand, in a time-dependent fashion (FIG. 7). In certain aspects, a terminal transferase incorporates 10 or fewer (e.g., 1-3) additional ribonucleotides at the 3’ end of the first strand cDNA. All of the nucleotides may be the same (e.g., creating a homonucleotide stretch at the 3’ end of the first strand cDNA) or at least one of the nucleotides may be different from the other(s). In certain aspects, the terminal transferase activity results in the addition of a homoribonucleotide stretch of 1, 2, 3, 4, or more of the same ribonucleotides (e.g., all ATP, all GTP, all CTP, all UTP). These additional ribonucleotides are useful for increasing the efficiency of the subsequent ligation reaction: they effectively mimic the 3’ end of an RNA molecule rather than a DNA molecule, which increases the affinity of T4 RNA ligase for the acceptor when joining the second adapter molecule to the 3’ end of the first strand cDNA molecule that was extended by a short stretch of non-template ribonucleotides (Bullard and Bowater, 2006; Miura et al., 2019).
The terminal transferase is combined into the reaction mixture such that the final concentration of the terminal transferase is sufficient to produce a desired amount of the product nucleic acid. In certain aspects, the terminal transferase (e.g., Terminal Deoxynucleotidyl Transferase) is present in the reaction mixture at a final concentration from 0.1 to 20 units/ul (U/ul), e.g., 0.35 U/ul.
The methods of the present disclosure include combining an RNA ligase (e.g., T4 RNA ligase I) with the first strand cDNA molecule, where the RNA ligase is capable of catalysing the ligation of a 5’ phosphoryl-terminated nucleic acid donor to a 3’ hydroxyl-terminated nucleic acid acceptor through the formation of a 3’ - 5’ phosphodiester bond with the hydrolysis of ATP to AMP and PPi. Substrates for RNA ligases include single-stranded RNA and DNA, with high substrate affinity for RNA, but lower substrate specificity toward DNA. The inventors improved sensitivity and efficiency of the ssDNA ligation reaction joining the ssDNA second adapter oligonucleotide with the first-strand cDNA through the prior addition of a short stretch (e.g., 1-3 nts) of non-template ribonucleotides as described elsewhere (FIG. 8). The RNA ligase is combined into the reaction mixture such that the final concentration of the RNA ligase is sufficient to produce a desired amount of the product nucleic acid. In certain aspects, the RNA ligase (e.g., T4 RNA ligase I) is present in the reaction mixture at a final concentration from 0.1 to 200 units/ul (U/ul), e.g., 2.25 U/ul.
As set forth above, the subject methods include combining a second adapter oligonucleotide into the tailing and ligation reaction mixture. By “second adapter oligonucleotide” is meant an oligonucleotide which can serve as a donor during the ligation reaction. In this regard, the first strand cDNA molecule, after addition of non-template ribonucleotides, may be referred to as an “acceptor molecule” and the second adapter oligonucleotide may be referred to as a “donor molecule”. As used herein, an “oligonucleotide” can refer to a single-stranded multimer of nucleotides from 2 to 500 nts, e.g., 2 to 200 nts. Oligonucleotides may be synthetic or may be made enzymatically, and, in some embodiments are 10 to 50 nts in length.
The reaction mixture includes the second adapter oligonucleotides at a concentration sufficient to generate the desired ligation product. For example, the second adapter oligonucleotide may be added to the reaction mixture at a final concentration from 0.001 to 100 uM, including 1 uM.
The second adapter oligonucleotide may include one or more nucleotides (or analogs thereof) that are modified or otherwise non-naturally occurring. For example, the second adapter oligonucleotide may include one or more nucleotide analogs (e.g., LNA, FANA, 2’-O-Me RNA, 2’-fluoro RNA, or the like), linkage modifications (e.g., phosphorothioates, 3’-3’ and 5’-5’ reversed linkages), 5’ and/or 3’ end modifications (e.g., 5’ and/or 3’ amino, biotin, DIG, phosphate, thiol, dyes, quenchers, etc.), one or more fluorescently labelled nts, or any other feature that provides a desired functionality to the second adapter oligonucleotide. Any desired nucleotide analogs, linkage modifications and/or end modifications may be included in any of the nucleic acid reagents used when practicing the methods of the present disclosure, including the first strand cDNA synthesis primer, the first and second adapter oligonucleotides, the primers used for amplification (e.g., PCR) of the first-strand cDNA to produce the product double stranded cDNA, and the post-tagmentation primers used for amplification.
The second adapter oligonucleotide includes a pre-tagmentation primer binding or amplification primer binding domain (which may also be referred to as a second strand synthesis amplification primer binding domain). For example, the second adapter oligonucleotide may include a sequence, where subsequent to ligation, second strand synthesis is performed using a primer that is complementary to that sequence. The second strand synthesis produces a second strand DNA complementary to the first strand cDNA. Alternatively, or additionally, the product nucleic acid may be amplified using a primer pair in which one of the primers has a complementary sequence. According to certain embodiments, the second adapter oligonucleotide includes a first post-tagmentation (e.g., PCR) primer binding domain, e.g., for use in amplification of a tagmented product.
According to some embodiments, the second adapter oligonucleotide includes a number of additional components or domains, such as but not limited to: barcode domains, unique molecular identifier domains, a first post-tagmentation amplification primer binding domain (e.g., in those embodiments where such a domain is not present on the first strand cDNA synthesis primer), a sequencing platform adapter construct domain, etc., where these domains may be as described above.
As described above, the subject methods include combining NTPs into the tailing and ligation reaction mixture. In the preferred embodiment, a single NTP (e.g., ATP) is added to the reaction mixture and serves both as the substrate during the tailing reaction catalyzed by terminal transferase and as the cofactor hydrolyzed during the ligation reaction catalyzed by T4 RNA ligase 1. For example, ATP may be added to the reaction mixture such that the final concentration is from 0.01 to 100 mM, e.g., 1 mM.
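The final concentrations given above for the tailing and ligation reaction can be turned into pipetting volumes with the usual dilution relation C1 x V1 = C2 x V2. The following is a minimal sketch only: the final concentrations are the example values from the text (0.35 U/ul terminal transferase, 2.25 U/ul RNA ligase, 1 uM second adapter oligonucleotide, 1 mM ATP), whereas the stock concentrations, the 20 ul reaction volume and the function name are hypothetical and would have to be replaced by the values of the reagents actually used.

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul):
    """Volume of stock giving final_conc in total_volume_ul (C1 * V1 = C2 * V2).

    final_conc and stock_conc must be expressed in the same unit.
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * total_volume_ul / stock_conc


# Hypothetical reaction volume; not taken from the disclosure.
TOTAL_UL = 20.0

# name: (final concentration from the text, hypothetical stock concentration)
components = {
    "terminal transferase (U/ul)": (0.35, 20.0),
    "T4 RNA ligase 1 (U/ul)": (2.25, 30.0),
    "second adapter oligonucleotide (uM)": (1.0, 10.0),
    "ATP (mM)": (1.0, 10.0),
}

volumes = {
    name: stock_volume_ul(final, stock, TOTAL_UL)
    for name, (final, stock) in components.items()
}

# Volume left for buffer, water and the bead-bound first strand cDNA.
remaining_ul = TOTAL_UL - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} ul")
print(f"remaining volume: {remaining_ul:.2f} ul")
```

The same function can be reused for any other component of the reaction mixture, provided the final and stock concentrations share a unit.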
Any nucleic acids that find use in practicing the methods of the present disclosure (e.g., the first strand cDNA synthesis primer, the first and second adapter oligonucleotide, a second strand synthesis primer, one or more primers for amplifying the double stranded product nucleic acid, and/or the like) may include any useful nucleotide analogue and/or modification, including any of the nucleotide analogues and/or modifications described herein.
Once the ligated product nucleic acid, e.g., that includes first strand cDNA linked to the second adapter oligonucleotide, is produced, the methods include using the product nucleic acid as a template for second-strand synthesis and/or amplification (e.g., for subsequent sequencing of the amplicons). According to one embodiment, the methods include contacting the product nucleic acid with primers that hybridize to primer binding domains present on both ends of the cDNA, under amplification conditions, such as PCR amplification conditions, sufficient to produce a product double stranded cDNA. Amplification conditions that may be employed include the addition of one or more primers (e.g., as described above) and dNTPs. The conditions may include combining a thermostable polymerase (e.g., a Taq, Pfu, Tfl, Tth, Tli, and/or other thermostable polymerase) into the reaction mixture. Amplification, e.g., PCR amplification, results in the production of a product double stranded cDNA.
A method of producing a product double stranded cDNA according to one embodiment of the present disclosure is schematically illustrated in FIG. 1. As illustrated in FIG. 1, an RNA sample that includes an mRNA is combined with a first strand cDNA synthesis primer (in this example, an oligo(dT) primer covalently linked to a magnetic bead), a reverse transcriptase (not shown) and dNTPs (not shown) to produce a product first strand cDNA. The resultant cDNA:RNA hybrids are then separated via heat denaturation and solid-phase first strand cDNA is retained through immobilisation on a magnet. The first strand cDNA is combined with a second adapter oligonucleotide, an NTP (in this example ATP (not shown)), a terminal transferase (not shown) and an RNA ligase (not shown). Tailing occurs at the 3’ end of the first strand cDNA molecule through the addition of non-template nucleotides (indicated by (rA)s), which increases the sensitivity of the subsequent ligation reaction which joins the 5’ end of the second adapter oligonucleotide to the 3’ end of the first strand cDNA. In this example, the 5’ end of the mRNA is captured, allowing for downstream amplification and enrichment of full-length cDNA, e.g., by LD PCR (Long Distance PCR). The components are included in a reaction mixture under conditions sufficient to produce a ligated nucleic acid product. Product double stranded cDNA is produced by contacting the ligated single stranded nucleic acid with amplification primers complementary to PCR primer binding domains present in the first strand cDNA synthesis primer and second adapter oligonucleotide.
Following production of the product double stranded cDNA, in one embodiment, product double stranded cDNA is prepared for full-length sequencing on a long-read sequencing platform of interest.
In another embodiment schematically illustrated in FIG. 2, product double stranded cDNA is tagmented with one or more transposomes including a transposase and a transposon nucleic acid, where the transposon nucleic acid includes a transposon end domain for binding to the transposase and a second post-tagmentation amplification primer binding domain (e.g., a post-tagmentation PCR amplification primer binding domain), to produce a tagmented sample. In certain aspects, the second post-tagmentation amplification primer binding domain comprises a sequencing read primer domain, e.g., a sequencing read primer domain that is different from any sequencing read primer domain present in the first strand cDNA synthesis primer or second adapter oligonucleotide. The resultant tagmented sample is then subjected to amplification conditions, e.g., PCR amplification conditions, using post-tagmentation first and second amplification (e.g., PCR) primers. These post-tagmentation first and second amplification primers may vary, and in some instances include sequencing platform adapter domains, e.g., a first primer including a first post-tagmentation amplification primer domain, a first sequencing indexing domain and a first sequencing adapter domain; and a second primer including a second post-tagmentation amplification primer domain, a second sequencing indexing domain and a second sequencing adapter domain, to produce a sequencing library. The sequencing platform adapter construct(s) may include any of the nucleic acid domains described elsewhere herein (e.g., a domain that specifically binds to a surface-attached sequencing platform oligonucleotide, a sequencing primer binding domain, a barcode domain, a barcode sequencing primer binding domain, a molecular identification domain, or any combination thereof). Such embodiments find use, e.g., where nucleic acids of the tagmented sample do not include all of the adapter domains useful or necessary for sequencing in a sequencing platform of interest, and the remaining adapter domains are provided by the primers used for the amplification of the nucleic acids of the tagmented sample.
According to certain embodiments, the methods of preparing sequencing libraries are end-capture methods for quantifying RNA (e.g., mRNA transcripts), e.g., for differential expression analysis as schematically illustrated in FIG. 2 b-c. In certain aspects, the end-capture methods capture the 3’ ends of RNAs, e.g., where end-capture is facilitated by the presence of a first post-tagmentation amplification primer binding site in the first strand cDNA synthesis primer and a second post-tagmentation PCR primer binding site introduced by tagmentation. It will be understood that numerous variations to the above example of end-capture methods are possible. Instead of capturing 3’ ends of RNAs, for example, the method may be used to capture 5’ ends of RNAs, e.g., where end-capture is facilitated by the presence of a first post-tagmentation amplification primer binding site in the second adapter oligonucleotide and a 3’ second post-tagmentation PCR primer binding site introduced by tagmentation. Capturing the 5’ ends of RNAs finds use, e.g., for 5’ end mutation or splice variant analysis, etc. 5’ end capture may be carried out, e.g., by including a post-tagmentation primer binding domain (e.g., an RP2 sequence) in the second adapter oligonucleotide, rather than in the first strand cDNA synthesis primer. According to this variation, post-tagmentation amplification may be carried out using a post-tagmentation amplification primer that binds to the first post-tagmentation primer binding domain originally present in the second adapter oligonucleotide, in conjunction with a post-tagmentation amplification primer that binds to a post-tagmentation primer binding domain, e.g., a TnRP1 or TnRP2 sequence, added during a tagmentation step.
Other variations include, e.g., replacing Illumina®-specific sequencing domains in the various primers/oligonucleotides with sequencing domains required by sequencing systems from, e.g., Pacific Biosciences (e.g., the PACBIO RS II sequencing system), or any other sequencing platform of interest.
In some instances, following production of first strand cDNA and prior to tailing and ligation, the method includes pooling the plurality of solid-phase first strand cDNA with one or more additional first strand cDNAs (e.g., obtained from a different starting RNA source, e.g., cell) to produce a pooled cDNA sample. For example, the combining and contacting steps described above may be performed in parallel for different starting RNA sources, which in some cases can be single cells (e.g., circulating tumour cells or any other single cell of interest). The single cells may be obtained from the same individual or different individuals. According to certain embodiments, the different starting RNA sources are RNA samples obtained from different individuals, e.g., different human patients or other human individuals from whom it is desirable to obtain nucleic acid (e.g., RNA or DNA) sequence information. In certain aspects, the first strand cDNAs are tagged during their production with a unique source identifier (e.g., a cell barcode) corresponding to the starting RNA sample from which the plurality of solid-phase first strand cDNA were generated. The resultant first strand cDNAs produced in parallel may then be pooled prior to tailing and ligation. Such a pooling step may include combining each first strand cDNA sample (or aliquot thereof) to be pooled into a single container (e.g., a single tube or other container, e.g., well, microfluidic chamber, droplet, nanowell, etc.). The pooled solid-phase first strand cDNA sample is then tailed and ligated, e.g., as described above. Upon sequencing the pooled sample, individual sequencing reads can be traced back to particular starting RNA samples using the source identifier, e.g., the cell barcode, enabling multiplexed sequencing. Details regarding barcode-based multiplexed sequencing are described, e.g., in Wong et al. (2013) Curr. Protoc. Mol. Biol. Chapter 7:Unit 7.11.
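The tracing of sequencing reads back to their starting RNA samples described above can be sketched as a simple lookup. This is an illustrative sketch only and not the method of the disclosure: it assumes, hypothetically, that the source barcode occupies the first few bases of each read and that only exact barcode matches are accepted; the barcode sequences, sample names and the function name are invented for the example.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_source, barcode_len):
    """Group reads by source using an exact match on the leading barcode.

    Assumes (hypothetically) that the barcode is the first barcode_len bases
    of the read; real read layouts depend on the library design.
    """
    by_source = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        source = barcode_to_source.get(barcode)
        if source is None:
            unassigned.append(read)  # unknown barcode, kept for inspection
        else:
            by_source[source].append(insert)  # store the read without its barcode
    return dict(by_source), unassigned


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "cell_1", "TGCA": "cell_2"}
reads = ["ACGTGGGAAA", "TGCACCCTTT", "ACGTTTTCCC", "NNNNAAAAAA"]

assigned, unassigned = demultiplex(reads, barcodes, barcode_len=4)
print(assigned)
print(unassigned)
```

Exact matching keeps the sketch short; production demultiplexers commonly tolerate a small number of mismatches in the barcode, which is outside the scope of this example.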
In some aspects of the invention, the methods include the step of obtaining single cells. Obtaining single cells may be done according to any convenient protocol. A single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample. Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually, for example, a 96-well plate, a 384-well plate, or a plate with any number of wells. The multi-well plate can be part of a chip and/or device. The present disclosure is not limited by the number of wells in the multi-well plate.
Following obtainment of single cells, e.g., as described above, mRNA can be released from the cells by lysing the cells. Lysis can be achieved by, for example, heating or freeze-thaw of the cells, or by the use of detergents or other chemical methods, or by a combination of these. However, any suitable lysis method can be used. A mild lysis procedure can advantageously be used to prevent the release of nuclear chromatin, thereby avoiding genomic contamination of the cDNA library, and to minimize degradation of mRNA. For example, heating the cells at 72°C for 2 minutes in the presence of Tween-20 is sufficient to lyse the cells while resulting in no detectable genomic contamination from nuclear chromatin. Alternatively, cells can be heated at 65°C for 10 minutes in water, or at 70°C for 90 seconds in PCR buffer II (Applied Biosystems) supplemented with 0.5% NP-40 (Kurimoto et al., Nucleic Acids Res 34(5):e42 (2006)); or lysis can be achieved with a protease such as Proteinase K or by the use of chaotropic salts such as guanidine isothiocyanate (U.S. Publication No. 2007/0281313).
Synthesis of solid-phase first strand cDNA from template nucleic acid mRNA in the methods described herein can be performed directly on cell lysates, such that a reaction mix for reverse transcription is added directly to cell lysates. Alternatively, mRNA can be purified after its release from cells. This can help to reduce mitochondrial and ribosomal contamination. mRNA purification can be achieved by any method known in the art, for example, by binding the mRNA to a solid phase. Commonly used purification methods include paramagnetic beads (e.g., Dynabeads). Alternatively, specific contaminants, such as ribosomal RNA can be selectively removed using affinity purification.
Where desired, a given single cell workflow may include a pooling step where a cDNA product composition, e.g., made up of synthesized first strand cDNAs or synthesized double stranded cDNAs, is combined or pooled with the cDNA product compositions obtained from one or more additional cells. The number of different cDNA product compositions produced from different cells that are combined or pooled in such embodiments may vary. Prior to, or after pooling, the product cDNA composition(s) can be amplified, e.g., by polymerase chain reaction (PCR), such as described above.
As indicated above, in protocols that include a pooling step, the pooling step can be performed after first adapter oligonucleotide ligation or after first strand cDNA synthesis. As such, in certain embodiments of the methods described herein, RNA precursors are obtained from different samples of interest and a first ligation reaction mixture is added to the RNA precursors, resulting in ligated product RNA templates that include a sample barcode as described above. The barcoded RNA templates are then pooled and purified as desired and subjected to reverse transcription followed by tailing and ligation and finally amplification to produce sequencing libraries. In another embodiment, RNA templates are obtained from different cells or samples of interest and reverse transcription reaction mix is added, resulting in first strand cDNA including a cell or sample barcode. The tagged cDNA samples are then pooled and amplified to produce sequencing libraries. The sequencing libraries produced according to the methods of the present disclosure may exhibit a desired complexity (e.g., high complexity). The “complexity” of a sequencing library relates to the proportion of redundant sequencing reads (e.g., sharing identical start sites) obtained upon sequencing the library. Complexity is inversely related to the proportion of redundant sequencing reads. In a low complexity library, certain target sequences are over-represented, while other targets suffer from little or no coverage. In a high complexity library, the sequencing reads more closely track the known distribution of target nucleic acids in the starting nucleic acid sample, and will include coverage, e.g., for targets known to be present at relatively low levels in the starting sample. The complexity of a library may be determined by mapping the sequencing reads to a reference genome or transcriptome. In combination with the incorporation of unique molecular identifiers, the number of PCR duplicates can be determined according to sequencing reads that have the same genomic starting position and identical unique molecular identifiers. High complexity libraries retain a larger number of sequencing reads after the removal of such PCR duplicates, and this number is increased using TLC over other methodologies (FIG. 9).
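The duplicate criterion described above (same mapped start position and identical unique molecular identifier) can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the analysis used to generate FIG. 9: reads are represented as hypothetical (reference, start, UMI) tuples that have already been mapped, strand and UMI sequencing errors are ignored, and the function names are invented for the example.

```python
def deduplicate(reads):
    """Keep one read per (reference, start position, UMI) combination.

    reads is an iterable of (reference, start, umi) tuples; this tuple
    layout is an assumption made for the sketch.
    """
    seen = set()
    unique = []
    for ref, start, umi in reads:
        key = (ref, start, umi)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def retained_fraction(reads):
    """Fraction of reads remaining after duplicate removal (higher = more complex)."""
    reads = list(reads)
    if not reads:
        return 0.0
    return len(deduplicate(reads)) / len(reads)


# Hypothetical mapped reads, for illustration only.
reads = [
    ("chr1", 100, "AACG"),
    ("chr1", 100, "AACG"),  # PCR duplicate: same start and same UMI
    ("chr1", 100, "TTGA"),  # same start, different UMI: distinct molecule
    ("chr2", 250, "AACG"),
    ("chr2", 250, "AACG"),  # PCR duplicate
]

unique = deduplicate(reads)
fraction = retained_fraction(reads)
print(len(unique), fraction)
```

Comparing the retained fraction between libraries is only meaningful at comparable sequencing depth, since deeper sequencing of the same library yields more duplicates.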
In certain aspects, the methods of the present disclosure further include subjecting the sequencing library to a sequencing protocol. The protocol may be carried out on any suitable sequencing platform. Sequencing platforms of interest include, but are not limited to, a sequencing platform provided by Illumina® (e.g., the HiSeq, MiSeq, NextSeq, NovaSeq sequencing systems); Pacific Biosciences (e.g., the PACBIO RS II or Sequel sequencing system); or any other sequencing platform of interest. The sequencing protocol will vary depending on the particular sequencing system employed. Detailed protocols for sequencing a library, e.g., which may include further amplification (e.g., solid phase amplification), sequencing the amplicon, and analyzing the sequencing data, are available from the manufacturer of the sequencing system employed.
In certain embodiments, the subject methods may be used to generate sequencing libraries corresponding to mRNAs for downstream sequencing on a sequencing platform of interest. According to certain embodiments, the subject methods may be used to generate a sequencing library corresponding to non-polyadenylated RNAs for downstream sequencing on a sequencing platform of interest. For example, microRNAs may be polyadenylated and then used as templates for reverse transcription followed by tailing and ligation of cDNA as described elsewhere herein. Random or gene-specific priming may also be used, depending on the goal of the researcher. The library may be mixed 50:50 with a control library (e.g., Illumina’s PhiX control library) and sequenced on the sequencing platform (e.g., an Illumina® sequencing system). The control library sequences may be removed and the remaining sequences mapped to the transcriptome of the source of the mRNAs (e.g., human, mouse, or any other mRNA source).
Aspects of the invention described herein can be performed using any type of computing device, such as a computer, that includes a processor, e.g., a central processing unit, or any combination of computing devices where each device performs at least part of the process or method. In some embodiments, systems and methods described herein may be performed with a handheld device, e.g., a smart tablet, or a smart phone, or a specialty device produced for the system.
Methods of the invention can be performed using software, hardware, firmware, hardwiring, or combinations of any of these. Features implementing functions can also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations (e.g., imaging apparatus in one room and host workstation in another, or in separate buildings, for example, with wireless or wired connections).
Processors suitable for the execution of computer programs include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random-access memory, or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including, by way of example, semiconductor memory devices (e.g., EPROM, EEPROM, solid state drive (SSD), and flash memory devices); magnetic disks (e.g., internal hard disks or removable disks); magneto-optical disks; and optical disks (e.g., CD and DVD disks). The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, the subject matter described herein can be implemented on a computer having an I/O device, e.g., a CRT, LCD, LED, or projection device for displaying information to the user and an input or output device such as a keyboard and a pointing device (e.g., a mouse or a trackball), by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.
The subject matter described herein can be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), or a front-end component (e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described herein), or any combination of such back-end, middleware, and front-end components. The components of the system can be interconnected through a network by any form or medium of digital data communication, e.g., a communication network. For example, a reference set of data may be stored at a remote location and a computer can communicate across a network to access the reference data set for comparison purposes. In other embodiments, however, a reference data set can be stored locally within the computer, and the computer accesses the reference data set within the CPU for comparison purposes. Examples of communication networks include, but are not limited to, cell networks (e.g., 3G, 4G or 5G), a local area network (LAN), and a wide area network (WAN), e.g., the Internet.
The subject matter described herein can be implemented as one or more computer program products, such as one or more computer programs tangibly embodied in an information carrier (e.g., in a non-transitory computer-readable medium) for execution by, or to control the operation of, a data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). A computer program (also known as a program, software, software application, app, macro, or code) can be written in any form of programming language, including compiled or interpreted languages (e.g., C, C++, Perl), and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. Systems and methods of the invention can include instructions written in any suitable programming language known in the art, including, without limitation, C, C++, Perl, Java, ActiveX, HTML5, Visual Basic, or JavaScript.
A computer program does not necessarily correspond to a file. A program can be stored in a file or a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network. A file can be a digital file, for example, stored on a hard drive, SSD, CD, or other tangible, non-transitory medium. A file can be sent from one device to another over a network (e.g., as packets being sent from a server to a client, for example, through a Network Interface Card, modem, wireless card, or similar).
Suitable computing devices typically include mass memory, at least one graphical user interface, at least one display device, and typically include communication between devices. The mass memory illustrates a type of computer-readable media, namely computer storage media. Computer storage media may include volatile, non-volatile, removable, and nonremovable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory, or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, Radiofrequency Identification (RFID) tags or chips, or any other medium that can be used to store the desired information, and which can be accessed by a computing device. Functions described herein can be implemented using software, hardware, firmware, hardwiring, or combinations of any of these. Any of the software can be physically located at various positions, including being distributed such that portions of the functions are implemented at different physical locations.
Also provided by the present disclosure are compositions. Compositions of embodiments of the invention may include, e.g., one or more of any of the reaction mixture components described above with respect to the subject methods. For example, the compositions may include one or more of an RNA (e.g., a control RNA), a first adapter oligonucleotide in some instances, a polymerase (e.g., a reverse transcriptase, or the like), a first-strand cDNA synthesis primer having any of the domains described above, a second adapter oligonucleotide having any of the domains described above, dNTPs, NTPs, a terminal transferase, an RNA ligase, a second strand cDNA primer having any of the domains described above, amplification primers having any of the domains described above, a salt, a metal cofactor, one or more nuclease inhibitors (e.g., an RNase inhibitor), one or more enzyme-stabilizing components (e.g., DTT), or any other desired reaction mixture component(s). In certain aspects, the subject compositions include a first adapter oligonucleotide in addition to the components listed above.
The subject compositions may be present in any suitable environment. According to one embodiment, the composition is present in a reaction tube (e.g., a 0.2 mL tube, a 0.6 mL tube, a 1.5 mL tube or the like), or a well, or microfluidic chamber, or droplet, or other suitable container.
In certain aspects, the composition is present in two or more (e.g., a plurality of) reaction tubes or wells (e.g., a plate, such as a 96-well plate, a multi-well plate, e.g., containing about 1000, 5000, or more wells). The tubes and/or plates may be made of any suitable material, e.g., polypropylene, PDMS, aluminum, or the like. The containers may also be treated to reduce adsorption of nucleic acids to the walls of the container. In certain aspects, the tubes and/or plates in which the composition is present provide for efficient heat transfer to the composition (e.g., when placed in a heat block, water bath, thermocouples, and/or the like), so that the temperature of the composition may be altered within a short period of time, e.g., as necessary for a particular enzymatic reaction to occur. According to certain embodiments, the composition is present in a thin-walled polypropylene tube, or a plate having thin-walled polypropylene wells or materials such as aluminum having high heat conductance. In some instances, the compositions of the disclosure may be present in droplets. In certain embodiments it may be convenient for the reaction to take place on a solid surface or a bead; in such a case, the first strand cDNA synthesis primer may be attached to the solid support or bead by methods known in the art, such as biotin linkage or covalent linkage, and the reaction allowed to proceed on the support. Alternatively, the oligos may be synthesized directly on the solid support (e.g., as described in Macosko, E Z et al., Cell 161, 1202-1214, May 21, 2015).
Other suitable environments for the subject compositions include, e.g., a microfluidic chip (e.g., a “lab-on-a-chip device”, e.g., a microfluidic device comprising channels and inlets). The composition may be present in an instrument configured to bring the composition to a desired temperature, e.g., a temperature-controlled water bath, heat block, heat block adaptor, or the like. The instrument configured to bring the composition to a desired temperature may be configured to bring the composition to a series of different desired temperatures, each for a suitable period of time (e.g., the instrument may be a thermocycler).
Aspects of the present disclosure also include kits. The kits may include, e.g., one or more of any of the reaction mixture components described above with respect to the subject methods. For example, the kits may include: a first strand cDNA synthesis primer including a 3’ oligo(dT) domain and a 5’ amplification primer binding domain; and a second adapter oligonucleotide including an amplification primer binding domain, e.g., as described above. In other instances, the kits may include a first adapter oligonucleotide including a 5’ amplification primer binding domain and a 3’ polyA domain, a first strand cDNA synthesis primer including an oligo dT domain, and a second adapter oligonucleotide including an amplification primer binding domain, as described above.
The kits may further include amplification primers which may include any of the domains/features described above in the section relating to the methods of the present disclosure.
The kits may further include one or more of a template ribonucleic acid (RNA), components for producing a template RNA from a precursor RNA (e.g., a poly(A) polymerase and associated reagents for polyadenylating a non-polyadenylated precursor RNA), components for purifying RNA-protein complexes of interest, a polymerase (e.g., a reverse transcriptase), a terminal transferase, an RNA ligase (e.g., T4 RNA ligase I), dNTPs, NTPs, a salt, a metal cofactor, one or more nuclease inhibitors (e.g., an RNase inhibitor and/or a DNase inhibitor), one or more molecular crowding agents (e.g., polyethylene glycol, or the like), one or more enzyme-stabilizing components (e.g., DTT), or any other desired kit component(s), such as solid supports, e.g., tubes, beads, microfluidic chips, etc.
In certain embodiments, the kits may include reagents for isolating RNA from a source of RNA. The reagents may be suitable for isolating nucleic acid samples from a variety of RNA sources including single cells, cultured cells, tissues, organs, or organisms. The subject kits may include reagents for isolating a nucleic acid sample from a fixed cell, tissue or organ, e.g., formalin-fixed, paraffin-embedded (FFPE) tissue. Such kits may include one or more deparaffinization agents, one or more agents suitable to de-cross link nucleic acids, and/or the like.
Components of the kits may be present in separate containers, or multiple components may be present in a single container. In certain embodiments, it may be convenient to provide
the components in a lyophilized form, so that they are ready to use and can be stored conveniently at room temperature.
In addition to the above-mentioned components, a subject kit may further include instructions for using the components of the kit, e.g., to practice the subject method. The instructions are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labelling of the container of the kit or components thereof (i.e., associated with the packaging or sub-packaging) etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, diskette, Hard Disk Drive (HDD), portable flash drive, etc. In yet other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g., via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
The methods of the present disclosure find use in a variety of applications, including those that require the presence of particular nucleotide sequences at both ends of nucleic acids of interest. Such applications exist in the areas of basic research and diagnostics (e.g., clinical diagnostics) and include, but are not limited to, the generation of sequencing libraries. Such libraries may include adapter sequences that enable sequencing of the library members using any convenient sequencing platform, including: the HiSeq, MiSeq, NextSeq and NovaSeq sequencing systems from Illumina®, the PACBIO RS II Sequel sequencing system from Pacific Biosciences, the MinION, GridION or PromethION from Oxford Nanopore Technologies, or any other convenient sequencing platform. The methods of the present disclosure find use in generating sequencing libraries corresponding to any RNA starting material of interest (e.g., mRNA) and are not limited to polyadenylated RNAs. For example, the subject methods may be used to generate sequencing libraries from non-polyadenylated RNAs, including microRNAs, small RNAs, siRNAs, and/or any other type of non-polyadenylated RNAs of interest such as ribosome-associated mRNAs or RNA fragments associated with an RNA-binding protein of interest that were purified with appropriate methods (e.g., CLIP). The methods also find use in generating strand-specific information, which can be helpful in
determining allele-specific expression or in distinguishing overlapping transcripts in the genome.
An aspect of the subject methods is that - utilizing a template RNA - a cDNA species having sequencing platform adapter sequences at one or both of its ends is generated, by employing tailing and ligation of first strand cDNAs (TLC) that improves on traditional approaches for generating chimeric nucleic acid molecules and provides an alternative strategy to generate full-length cDNA, preserving the original 5’ end of the template RNA molecule.
Prior art that documents the generation of sequencing libraries from RNA frequently relies on the use of Template Switch Oligos (TSOs) to introduce sequencing platform adapter sequences to the 3’ end of first-strand cDNA molecules, e.g., WO 2021/208036, WO 2020/136438, and WO 2017/048993. These methods rely on the addition of non-templated nucleotides to the 3’ end of the first-strand cDNA (e.g., typically CCC) during reverse transcription, which allow hybridization of the TSO that contains a short complementary sequence (e.g., typically GGG) followed by a non-complementary sequence (e.g., typically sequencing platform adapter sequences) that is added to the first-strand cDNA molecule through extension by the reverse transcriptase.
The methods of the present disclosure (TLC) also rely on the incorporation of a short stretch of non-template nucleotides to the 3’ end of first-strand cDNA molecules, but differ in a number of important aspects: i) TLC incorporates ribonucleotides instead of deoxyribonucleotides to mimic the 3’ end of an RNA molecule for subsequent ligation reaction; ii) the non-template overhang is used as a ligation acceptor instead of anchoring sites for the TSO and greatly increases ligation efficiency; iii) TLC uncouples the terminal transferase reaction from reverse transcription, giving higher flexibility in RT conditions such as higher reaction temperatures that are beneficial for long and/or structured molecules; iv) TLC is not dependent on the presence of a 5’ cap structure of the template RNA as observed for TSOs (Wulf, M G et al, J Biol Chem, 294, 18220-18231, 2019), making it less restrictive in terms of the RNA molecules that can serve as template RNA.
Uncoupling of the tailing reaction from the reverse transcription reaction as described in point (iii) and 5’-cap independence in point (iv) are crucial for the applicability of TLC to generate sequencing libraries from a more varied source of input materials. This includes, but is not limited to, uncapped RNA molecules, such as specific small RNA species, viral RNA, or RNA fragments obtained after purification of modified RNA or RNA-protein complexes following cross-linking and immunoprecipitation (e.g., CLIP), in which case reverse transcription frequently terminates prematurely at crosslinking sites, preventing the addition of non-template nucleotides by the reverse transcriptase and rendering template switching unfeasible.
Prior art relying on non-enzymatic joining of oligonucleotides rather than ligation-based approaches for the generation of sequencing libraries from RNA, such as WO 2019/063803 and WO 2021/130151, relies on the click chemistry concept, a class of highly specific and efficient chemical reactions that occur rapidly under mild conditions, such as the copper-catalyzed reaction of azides with alkynes to give 1,2,3-triazoles (R. Huisgen, 1,3-Dipolar Cycloaddition Chemistry (Ed.: A. Padwa), Wiley, New York, 1984). However, the need to incorporate artificial nucleotides during reverse transcription can make the resulting cDNA incompatible with standard library preparation methods, and the repeated purification procedures required between individual steps to avoid chemical inhibition of downstream enzymatic reactions (e.g., copper-induced inhibition of polymerases) lead to sample loss throughout the workflow and large input requirements of 1 µg of total RNA (ClickTech Library Kit full-length mRNA Seq V2.0).
The methods of the present disclosure instead employ an enzymatic ligation-based approach for the generation of sequencing libraries from RNA that is fully compatible with standard enzymes in conventional library workflows without extensive purification procedures, which minimizes sample loss and lowers input requirements to as few as 500-1000 cells (e.g., equivalent to 5-20 ng of total RNA assuming an RNA content of 10-20 pg per cell). Furthermore, TLC is not prone to concatemerization, and performing TLC library preparation without input RNA does not yield fragments larger than the expected size without any insert (e.g., amplicons consisting purely of amplification primers and first and second adapter oligonucleotides) (Figure 11), which can be easily removed through size-selection using standard techniques known in the art, such as, for example and without limitation, AMPure size selective chemistry or purification via gel electrophoresis. This greatly reduces the amount of non-specific background and, by extension, sequencing cost. Accordingly, the methods of the
present disclosure are more efficient, versatile, cost-effective, and provide more flexibility than traditional approaches.
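The input estimate quoted above (500-1000 cells corresponding to roughly 5-20 ng of total RNA at 10-20 pg of RNA per cell) can be checked with a short calculation. The following Python sketch is illustrative only; the function name is hypothetical and the per-cell RNA content is the assumption stated in the text, not a measured value.

```python
def total_rna_ng(cells: int, pg_per_cell: float) -> float:
    """Estimate total RNA in nanograms from a cell count.

    Assumes a uniform RNA content per cell, as in the text (10-20 pg/cell).
    """
    return cells * pg_per_cell / 1000.0  # 1 ng = 1000 pg


# Lower bound: fewest cells with the lowest assumed RNA content per cell.
low_ng = total_rna_ng(500, 10)
# Upper bound: most cells with the highest assumed RNA content per cell.
high_ng = total_rna_ng(1000, 20)

print(f"Estimated input: {low_ng:g}-{high_ng:g} ng of total RNA")
```

The two bounds reproduce the 5-20 ng range given in the description; actual RNA yield per cell varies with cell type and state.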
Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications without departing from the spirit or essential characteristics thereof. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in this specification, individually or collectively, and any and all combinations of any two or more of said steps or features. The present disclosure is therefore to be considered as in all aspects illustrative and not restrictive, the scope of the invention being indicated by the appended claims, and all changes which come within the meaning and range of equivalency are intended to be embraced therein.
The foregoing description will be more fully understood with reference to the following Examples. Such Examples are, however, exemplary of methods of practising the present invention and are not intended to limit the application and the scope of the invention.
EXAMPLES
RNA-binding proteins are instrumental for post-transcriptional gene regulation and play an active part in numerous human pathologies, including neurodegenerative diseases, cancer, as well as infection. Despite their crucial role in regulating all aspects of RNA metabolism, transcriptome-wide methods to profile RNA-protein interactions remain technically challenging. Protein-centric approaches to study RNA-protein interactions mainly rely on cross-linking and immunoprecipitation (CLIP) of RNA-binding proteins (RBPs) and generation of sequencing libraries from co-purified RNA. Over the years, several variations of this technique emerged, most prominently iCLIP along with derivations such as enhanced CLIP (eCLIP), infrared CLIP (irCLIP) and more recent improvements including iCLIP and improved iCLIP (iiCLIP). These techniques enable the mapping of RNA binding sites at nucleotide resolution and while individual steps differ between protocols, they follow the same overall strategy: cells are cross-linked with UVC light followed by lysis and partial RNA digestion before or after immunoprecipitation of the RBP of interest. Co-purified RNA is then 3’ adapter ligated prior to SDS polyacrylamide gel electrophoresis (SDS-PAGE) and transfer onto nitrocellulose from where RNA is liberated, purified and reverse transcribed into cDNA prior to second adapter ligation and PCR amplification to generate sequencing-compatible libraries.
Major bottlenecks, particularly during library preparation, include extensive purification steps and suboptimal enzymatic reactions such as the second adapter ligation, that lead to sample loss, low complexity libraries and the requirement for large amounts of starting material (~20M cells) and sequencing depth.
The feasibility of TLC was demonstrated by preparing sequencing libraries from RNA fragments that co-precipitated with RNA-binding proteins during crosslinking and immunoprecipitation (CLIP). The increased sensitivity of the library preparation reduces input requirements by a factor of up to 40,000 compared to eCLIP, with high quality libraries obtained from as little as 500 cells. Despite drastically lowered input material, TLC libraries require less PCR amplification compared to eCLIP libraries, increasing library complexity and resulting in a higher number of sequencing reads retained for downstream analysis after the removal of PCR duplicates, which lowers sequencing requirements (Figure 10).
When combined with CLIP, TLC follows the procedure outlined in FIG. 5 (Steps 2-9), with RNA precursors resulting from nuclease digestion (Step 2) following the purification of an RNA-binding protein of interest (not pictured). 3’ ends of RNA precursors of interest are then ligated to the first adapter oligonucleotide, containing a primer binding domain (PBS) and a polyA stretch (Step 3). Ligated RNA molecules are then captured on oligo(dT) beads and reverse transcribed into first strand cDNA, with the oligo(dT) serving as first strand cDNA synthesis primer (Step 4). Solid-phase cDNA is then separated from template RNA through heat denaturation and used as acceptor molecule for a subsequent ligation reaction (Steps 5-7). To increase the efficiency of ssDNA ligation, a tailing strategy is used that results in the addition of a few (e.g., 1-3) (Figure 12) non-template ribonucleotides (e.g., ATP) at the 3’ end of solid-phase first strand cDNA (Step 6). This increases the affinity of T4 RNA Ligase 1 to join the 3’ end of the first strand cDNA molecule with the 5’ phosphorylated second adapter oligonucleotide, containing a sample barcode, a unique molecular identifier and a primer binding site containing the sequence of the read 1 sequencing primer. Following ligation, solid-phase first strand cDNA can be directly amplified via PCR with the addition of necessary sequencing adapters and simultaneously eluted off the magnetic beads. In some aspects, amplification is performed with amplification primers fully complementary to the primer binding sites present on both ends of the cDNA, resulting in short amplicons that may be desirable for additional size selection of the insert. Following size selection, further PCR amplification can be performed to add additional sequencing platform adapter domains to
complete the preparation of sequencing libraries. In this example, the nucleic acids in the library are suitable for sequencing on an Illumina® sequencing system and include the P5 adapter sequence; a Read 1 sequencing primer sequence; a unique molecular identifier surrounding a sample barcode; an insert corresponding to the template RNA of interest; a Read 2 sequencing primer sequence; a reverse index sequence; and a P7 adapter sequence. Such sequencing libraries are compatible with single-end sequencing protocols, with the first 15 nucleotides of the reads corresponding to a 9 nt unique molecular identifier (UMI) for deduplication, that is split around a 6 nt sample barcode for greater multiplexing capacity.
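To illustrate the read layout described above, the following Python sketch splits the first 15 nucleotides of a read into a 9 nt UMI and a 6 nt sample barcode. The text states only that the UMI is split around the barcode; the 5 nt + 4 nt division used here, and the names `parse_read` and `ParsedRead`, are assumptions made for illustration and should be replaced with the actual oligonucleotide design.

```python
from typing import NamedTuple


class ParsedRead(NamedTuple):
    umi: str      # both UMI halves joined (9 nt with the default settings)
    barcode: str  # sample barcode (6 nt with the default settings)
    insert: str   # remainder of the read, derived from the template RNA


def parse_read(seq: str, umi_left: int = 5, barcode_len: int = 6,
               umi_right: int = 4) -> ParsedRead:
    """Split a read into UMI, sample barcode and insert.

    The 5 + 4 split of the UMI around the barcode is a hypothetical layout;
    only the 9 nt UMI and 6 nt barcode lengths come from the description.
    """
    prefix_len = umi_left + barcode_len + umi_right
    if len(seq) < prefix_len:
        raise ValueError(f"read shorter than the {prefix_len} nt UMI/barcode prefix")
    barcode_start = umi_left
    barcode_end = umi_left + barcode_len
    # The UMI is the sequence on both sides of the barcode, concatenated.
    umi = seq[:barcode_start] + seq[barcode_end:prefix_len]
    return ParsedRead(umi, seq[barcode_start:barcode_end], seq[prefix_len:])


# Toy read: left UMI half, barcode, right UMI half, then the insert.
example = parse_read("AAAAA" + "CCGGTT" + "TTTT" + "ACGTACGT")
print(example)
```

Reads sharing the same barcode, UMI and mapping position would then be collapsed as PCR duplicates during downstream analysis.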
In addition to technical improvements during library preparation that reduce both experimental time and cost, TLC-CLIP libraries show superior performance compared to previous methodologies and retain a much larger fraction of sequencing reads that can be used for downstream analysis. This drastically reduces the associated cost for next-generation sequencing, lowering sequencing depth requirements by orders of magnitude (Figure 10).
A direct comparison between CLIP libraries prepared with TLC and public eCLIP datasets showed up to 68% overlap with eCLIP peaks, when restricting the comparison to genes with similar expression levels between 293T and HepG2 cells to account for underlying gene expression differences in the cell types that were profiled (Figure 13). CLIP libraries prepared with TLC also show improved sensitivity for de novo motif discovery and recapitulate previously reported motifs with high precision and stronger motif enrichment at the peak summit compared to eCLIP libraries (Figure 14 and Figure 15).
An additional benefit of the TLC-CLIP protocol compared to other technologies is an increased frequency of crosslink-induced mutations (Figure 16) that occur during reverse transcription and provide exact nucleotide resolution of the observed RNA-protein interaction. Crosslink-induced deletions (CIDs) are highly correlated at the single-nucleotide level (Figure 17), and increase the precision at RBP binding sites and identify the exact nucleotide residues bound by the probed RBP (Figure 18). They can also function as an additional quality filter, by examining the ratio of CIDs at individual nucleotide positions (n_del/n_total reads), which allows efficient filtering of non-crosslinked, co-purifying fragments to increase specificity (Figure 19). This is of particular interest when applying TLC-CLIP without PAGE purification, which enables a 2-day fully automatable workflow and further lowers the input requirements down to 500 cells. Libraries generated without PAGE purification show lower motif enrichment, which
is indicative of a higher level of contaminating background sequences, as expected when removing an additional purification step (Figure 20). Lower motif enrichment is accompanied by lower CID ratios, demonstrating the importance of CIDs as an additional quality filter to discern true binding sites from co-purifying, non-crosslinked fragments in samples with higher background signal. Nevertheless, libraries generated without PAGE purification recapitulate the binding behavior of a given RBP, capturing the position-dependent enrichment in relation to intronic Alu elements for hnRNPc from as little as 500 cells (Figure 21).
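The CID ratio used as a quality filter can be sketched as follows. This Python example computes, for each nucleotide position, the number of reads carrying a deletion divided by the total number of reads, and keeps positions above a cutoff. The thresholds `min_ratio` and `min_reads`, the function names and the toy counts are hypothetical; the description does not specify cutoff values.

```python
from typing import Dict, Tuple


def cid_ratio(n_del: int, n_total: int) -> float:
    """Fraction of reads at a position that carry a crosslink-induced deletion."""
    return n_del / n_total if n_total > 0 else 0.0


def filter_positions(counts: Dict[int, Tuple[int, int]],
                     min_ratio: float = 0.05,
                     min_reads: int = 10) -> Dict[int, float]:
    """Keep positions whose CID ratio and read depth pass the cutoffs.

    `counts` maps position -> (deletion reads, total reads).
    Both cutoffs are illustrative assumptions, not values from the text.
    """
    kept = {}
    for position, (n_del, n_total) in counts.items():
        ratio = cid_ratio(n_del, n_total)
        if n_total >= min_reads and ratio >= min_ratio:
            kept[position] = ratio
    return kept


# Toy data: position -> (reads with a deletion, total reads).
toy_counts = {101: (12, 100), 102: (1, 100), 103: (3, 5)}
print(filter_positions(toy_counts))
```

In this toy input only position 101 passes: position 102 has too low a ratio and position 103 has too few reads to support a reliable estimate.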
Taken together, the streamlined TLC library preparation protocol drastically reduces both time and cost of CLIP experiments, while generating high quality RBP binding profiles from low input material. The larger number of crosslink-induced deletions further improves both the precision and specificity of CLIP libraries generated with TLC, by providing nucleotide resolution of crosslinking sites and distinguishing true binding sites from co-purifying, non-crosslinked fragments. Furthermore, by eliminating the need for PAGE purification, input requirements can be further reduced with high quality data obtained from as little as 500 cells, presenting a fully bead-based, single-tube library preparation strategy amenable to automation for high-throughput settings.
These improvements will open new opportunities in the field of post-transcriptional gene regulation to study RNA-protein interactions in larger settings, for example in combination with siRNA or drug screens, as well as its application to samples of limited quantity.
Used in combination with CLIP, TLC design innovations compared to other protocols include:
1. TLC-L3 oligo: an infrared-dye-conjugated oligo used during the first adapter ligation (first introduced by Zarnegar et al.) that allows visualisation of cross-linked RNA without the need for radioactive isotope labelling. The adapter sequence contains the partial sequence of the Illumina Index Sequencing primer followed by a poly(A) stretch which enables purification of ligated RNA molecules on oligo(dT) beads.
2. RNA purification via poly(A) capture: introduction of the poly(A) tail during adapter ligation allows capture and purification of RNA molecules within minutes using oligo(dT)-coupled magnetic beads instead of overnight precipitation. Furthermore, this strategy makes purification of RNA-protein complexes via SDS-PAGE optional, thus opening the potential for automation of the entire protocol on a liquid handling system.
3. Solid-phase cDNA libraries: Oligo(dT)-bead based capture is not only used for purification of RNA, but also for priming reverse transcription, resulting in cDNA covalently linked to magnetic beads. This allows efficient separation of adapter-ligated RNA from first strand cDNA via heat denaturation and facilitates purification and downstream reactions that can be performed on-bead in the same reaction tube.
4. Ribo-tailing of cDNA for increased ligation efficiency: single-stranded (ss)DNA ligations are inherently inefficient due to the low affinity of RNA ligases for DNA as an acceptor molecule. This causes the permanent loss of molecules that fail to ligate, resulting in low complexity libraries. To improve the ligation efficiency, a Terminal Transferase is included in the reaction which incorporates ATP (essential for ligation reaction) in the form of a short ribo-tail at the 3’ end of the cDNA, greatly increasing the affinity, and thus efficiency, of T4 RNA Ligase for the substrates.
TLC oligonucleotides in combination with CLIP
All adapters and oligos used throughout the protocol were ordered from Integrated DNA Technologies (IDT) and information regarding sequences, scale and purification is provided in Table 1.
The TLC-L3 oligo for the first adapter ligation was synthesized at 250 nmole scale, carrying a 5’ phosphorylation and 3’ IRDye® 800CW (NHS Ester) (v3) modification and was purified using RNase-free HPLC with a total yield of 21.1 nmoles.
Pre-adenylation was performed on 5 nmoles using the 5’ DNA Adenylation Kit (NEB, E2610L) as follows: 50 µl of 100 µM L3 adapter were set up with 25 µl 10X 5’ DNA Adenylation Reaction Buffer, 25 µl 1 mM ATP and 50 µl Mth RNA Ligase (1 nmol) in a total volume of 200 µl. Reaction was incubated at 65°C for 2 hours followed by inactivation at 85°C for 10 minutes, during which it turns cloudy. Reaction was then cleaned up using the Nucleotide Removal Kit (Qiagen, Cat #28304) as follows: 200 µl were mixed with 4.8 ml of PNI buffer, distributed over 10 columns and spun down at 6000 rpm for 30 seconds. Columns were washed once in 750 µl PE buffer, spun for 1 min at 6000 rpm, followed by an empty spin at full speed before transferring columns to a new collection tube. 50 µl H2O were added per column and incubated at RT for 2 minutes before centrifugation at 6000 rpm for 1 minute. Eluates were combined with an approximate final concentration of 10 µM and 1 µM working stocks were prepared and frozen at -20°C. Aliquots can be freeze-thawed at least 20 times without any detectable loss in activity.
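The amounts in the pre-adenylation step can be cross-checked with a short calculation. The Python sketch below assumes that the volumes are in microlitres and the concentrations in micromolar, and that the adapter is fully recovered from the columns, so the result is an upper bound consistent with the "approximate" 10 µM stated above. The function names are hypothetical.

```python
def amount_nmol(volume_ul: float, conc_um: float) -> float:
    """Amount of oligo in nmol (µl x µM gives pmol; divide by 1000 for nmol)."""
    return volume_ul * conc_um / 1000.0


def concentration_um(amount_in_nmol: float, volume_ul: float) -> float:
    """Concentration in µM for a given amount (nmol) in a given volume (µl)."""
    return amount_in_nmol * 1000.0 / volume_ul


# Input: 50 µl of 100 µM L3 adapter.
adapter_nmol = amount_nmol(50, 100)

# Elution: 10 columns, 50 µl of water each, eluates combined.
elution_volume_ul = 10 * 50

# Upper bound on the stock concentration, assuming no loss during clean-up.
stock_um = concentration_um(adapter_nmol, elution_volume_ul)

# Fold dilution needed to reach the 1 µM working stock.
dilution_factor = stock_um / 1.0

print(adapter_nmol, elution_volume_ul, stock_um, dilution_factor)
```

The calculation gives 5 nmol of adapter in 500 µl, i.e., at most 10 µM, so a ten-fold dilution yields the 1 µM working stock.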
Cell culture and generation of CLIP lysates
Adherent 293T cells (ATCC® CRL-1573™) were grown to ~80% confluency in Dulbecco’s Modified Eagle Medium (Gibco, #41966-029) supplemented with 10% FCS (Sigma-Aldrich, #F9665-500ML, Lot #19A124) and 1% Penicillin-Streptomycin-L-Glutamine (MED30-009-CI). Cells were rinsed in ice-cold PBS and crosslinked on ice with 254 nm UV-C light at 0.3 J/cm² in a CL-3000 Ultraviolet Crosslinker (UVPA849-95-0615-02). Cells were collected into PBS by scraping, counted and desired cell number was aliquoted and spun down. Cell pellets were resuspended in iCLIP Lysis buffer (50 mM Tris-HCl pH 7.4, 100 mM NaCl, 1% Igepal CA-630, 0.1% SDS, 0.5% sodium deoxycholate) using 50 µl per 50,000 cells. Lysates were incubated on ice for 5 minutes followed by sonication for 5-10 seconds at 0.5 seconds ON and 0.5 seconds OFF at 10% amplitude using a tip sonicator (Branson LPe 40:0.50:4T). Protein concentration was measured using the Pierce™ Rapid Gold BCA Protein Assay Kit (Thermo Scientific, A53225) and lysates were either processed directly or stored at -80°C.
RNase treatment, immunoprecipitation and first adapter ligation
Protein-G beads (100 µl for 20-30 µg of antibody) were washed twice in 1 ml iCLIP Lysis buffer and resuspended in 100 µl per condition. Per IP, 1 µg of antibody against hnRNPc (Santa Cruz Biotechnology, sc-32308), RBM9 (Bethyl Laboratories, A300-864A), hnRNP A1 (4B10) (Santa Cruz Biotechnology, sc-32301), or hnRNPI (Santa Cruz Biotechnology, sc-56701) was added and the antibody-bead mixture was incubated at room temperature (RT) for 30-60 minutes on a rotating wheel.
Meanwhile cell lysates were treated with different RNase concentrations using 0.25U, 0.025U and 0.005U of RNase I (Thermo Fisher, EN0602) for high, medium and low conditions. RNase dilution was added to cell lysates together with 2 µl Turbo DNase (Thermo Fisher, #AM2238) and lysates were incubated at 37°C for exactly 3 minutes at 1100 rpm, followed by 3 minutes on ice. Cell lysates were spun down for 10 minutes at 4°C at full speed and supernatant was transferred to a new tube.
Antibody-bead mixture was washed twice in iCLIP lysis buffer to remove unbound antibody and RNase-treated cell lysates were added alongside cOmplete EDTA-free Protease Inhibitor Cocktail (Merck, #11836170001) and incubated for 2 hours at 4°C on a rotating wheel. After IP, beads were washed twice in 200 µl High Salt Buffer (50 mM Tris-HCl pH 7.4, 1 M NaCl, 1 mM EDTA, 1% Igepal CA-630, 0.1% SDS, 0.5% sodium-deoxycholate), with the second wash at 4°C for 3 minutes on a rotating wheel, followed by two washes in 200 µl PNK Wash Buffer (20 mM Tris-HCl, pH 7.4, 10 mM MgCl2, 0.2% Tween-20).
Dephosphorylation of 3’ ends was performed in 20 µl of PNK reaction for 30 minutes at 37°C (70 mM Tris-HCl, pH 6.5, 10 mM MgCl2, 1 mM DTT, 10U SUPERase IN RNase Inhibitor (ThermoFisher, #AM2696), 5U T4 Polynucleotide Kinase (NEB, #M0201L)). Beads were washed twice in PNK Wash Buffer and resuspended in 20 µl of ligation mix for overnight incubation at 16°C and 1200 rpm (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 1 mM DTT, 10U SUPERase IN RNase Inhibitor, 10U T4 RNA Ligase (NEB, #M0204), 1 µl of 1 µM L3 adapter and 20% PEG400 (Sigma-Aldrich, #91893)).
TLC-CLIP library preparation with PAGE purification
Following the first adapter ligation, beads were washed twice in 200 µl High Salt Buffer, twice in 200 µl PNK Wash buffer and then resuspended in 20 µl 1X LDS sample buffer (Thermo Fisher, #NP0008) containing 5% beta-mercaptoethanol (Sigma-Aldrich, #M6250). Samples were denatured for 1 minute at 70°C and RNA-protein complexes were resolved on NuPAGE 4-12% Bis-Tris Gels (Thermo Fisher, #WG1402A) at 180V for 1 hour. Transfer was performed onto nitrocellulose (BioRad, #1620115) in 1X NuPAGE transfer buffer (Thermo Fisher, #NP00061) with 10% methanol at 30V for 2 hours at RT.
Nitrocellulose membranes were scanned on Odyssey® CLx Infrared Imager (LI-COR, 9141) with 169 µm resolution to visualise RNA localisation and then placed on filter paper soaked in PBS. Regions of interest were cut out from nitrocellulose membrane corresponding to ~20-100 kDa above the molecular weight of the RBP of interest due to the ligation of L3 adapter (~15.9 kDa) and associated RNA (with 70 nt of RNA averaging ~20 kDa). Nitrocellulose pieces were placed in LoBind Eppendorf tubes and 200 µl Proteinase K buffer (100 mM Tris-HCl, pH 7.4, 50 mM LiCl, 1 mM EDTA, 0.2% LiDS) containing 200 µg Proteinase K (Thermo Fisher, #AM2546) were added and incubated at 50°C for 45 minutes at 800 rpm.
Meanwhile, 10 µl of Oligo(dT)25 Dynabeads™ (Thermo Fisher, #61005) per sample were washed in 1 ml of oligo(dT) Binding Buffer (20 mM Tris-HCl, pH 7.4, 1 M LiCl, 2 mM EDTA) and resuspended in 50 µl of oligo(dT) Binding Buffer per sample. Following Proteinase K treatment, supernatant was transferred to fresh tubes containing 50 µl of oligo(dT) beads and incubated for 10 minutes at RT on a rotating wheel. Following RNA capture, beads were washed twice in 125 µl oligo(dT) Wash Buffer (10 mM Tris-HCl, pH 7.4, 150 mM LiCl, 0.1 mM EDTA) and once in 20 µl 1X First-Strand Buffer (50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2). Beads were resuspended in 10 µl of Reverse Transcription Mix (1X First-Strand Buffer, 0.5 mM dNTPs, 1 mM DTT, 6U SUPERase IN RNase Inhibitor, 20U SuperScript™ IV Reverse Transcriptase (Thermo Fisher, #18090050)) and incubated for 15 minutes at 50°C followed by 10 minutes at 80°C heating up to 96°C. Samples were vortexed for 30 seconds at 96°C and then immediately placed on a magnet on ice. Supernatant containing adapter-ligated RNA was removed and efficiency of elution can be confirmed by dot-blotting on nitrocellulose membrane.
Solid-phase cDNA on beads was washed once in 60 µl oligo(dT) Wash Buffer and once in 20 µl 1X T4 RNA Ligase Buffer (50 mM Tris-HCl, 10 mM MgCl2, 1 mM DTT, pH 7.5). Beads were resuspended in 5 µl of 5’ Adapter mix (2 µl 10X T4 RNA Ligase Buffer, 2 µl of 10 µM L## oligo (see Table 1), 1 µl 100% DMSO), incubated at 75°C for 2 minutes then immediately placed on ice. 4 µl of Ligation Mix (5 mM ATP, 7U Terminal Deoxynucleotidyl Transferase (TdT) (2230B, Takara), 15 U T4 RNA Ligase High Concentration (M0437, NEB)) were added as well as 10 µl 50% PEG8000, and the reaction was mixed by pipetting up and down until beads were resuspended. The reaction was incubated at 37°C for 30 minutes, then cooled down to room temperature. 30 U of T4 RNA Ligase were added, the reaction was mixed by pipetting and incubated at RT overnight with occasional vortexing for 15 seconds at 2000 rpm every two minutes.
Following overnight incubation, the ligation reaction was removed, and beads were washed in 100 µl oligo(dT) Wash Buffer and 20 µl 1X Phusion HF Buffer (Thermo Fisher, #F518L). Beads were resuspended in 25 µl cDNA amplification mix (1X Phusion HF PCR Master Mix (NEB, #M0531L) and 0.5 µM P5 and P7 short primer mix (see Table 1)) and amplification was performed with the following programme: 30 seconds at 98°C; 7 cycles of 10 seconds at 98°C, 30 seconds at 65°C and 30 seconds at 72°C; followed by final extension at 72°C for 3 minutes. Meanwhile, 2 µl of oligo(dT) beads per sample were washed once in 1 ml oligo(dT) Binding Buffer and resuspended in 5 µl per sample. After cDNA amplification, 5 µl of oligo(dT) beads were added and incubated at RT for 5 minutes on a rotating wheel to capture unwanted amplification by-products. Samples were placed on a magnet and supernatant containing amplified cDNA was transferred to a fresh tube.
Size-selection of cDNA was performed using the ProNEX® Size-Selective Purification System (Promega, #NG2002) with a ratio of 2.8X to enrich for cDNA inserts of at least 20 nucleotides in length (>80 bp). Library yield was then estimated by amplifying 1 µl of purified cDNA via qPCR using the full-length P5 and P7 index primers, and 2-3 cycles were subtracted from the obtained Ct value to set the cycle number for final library amplification. Following PCR amplification, libraries were size-selected again using the ProNEX® Size-Selective Purification System, with a ratio of 1.8X to select fragments larger than 165 bp. Quality control was performed using the Agilent High Sensitivity DNA Kit (Agilent, #5067-4626) and libraries were quantified using the KAPA Library Quantification Kit (Roche, #KK4824).
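The cycle-number rule described above (Ct minus 2-3 cycles) can be written as a small helper. This is only an illustrative sketch: rounding the Ct value to the nearest whole cycle is an assumption, as the text does not state how fractional Ct values are handled, and the function name is hypothetical.

```python
def final_pcr_cycles(ct: float, subtract: int = 3) -> int:
    """Cycle number for final library amplification from a qPCR Ct value.

    The text subtracts 2-3 cycles from the Ct; rounding the Ct to the
    nearest whole cycle first is an assumption of this sketch.
    """
    if subtract not in (2, 3):
        raise ValueError("the text specifies subtracting 2 or 3 cycles")
    return max(round(ct) - subtract, 0)


if __name__ == "__main__":
    print(final_pcr_cycles(12.4, subtract=3))
    print(final_pcr_cycles(12.4, subtract=2))
```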
TLC-CLIP library preparation without PAGE purification
When omitting PAGE purification, the first adapter ligation was performed for 75 minutes at 25°C. Beads were washed as described above and either directly resuspended in Proteinase K reaction or in 20 µl of RecJ adapter removal reaction (1X NEB Buffer 2 (NEB, #B7002S), 25U 5’ Deadenylase (NEB, #M0331S), 30U RecJ exonuclease (NEB, #M0264S), 10U SUPERase In and 20% PEG-400) and incubated at 37°C for 30 minutes prior to Proteinase K treatment. Samples were then placed on a magnet, and supernatant was transferred to fresh tubes containing oligo(dT) beads, with the remaining library preparation performed as described above.
Sequencing
TLC-CLIP libraries were sequenced on an Illumina NextSeq500 using the High Output Kit v.2.5 for 75 cycles, using Illumina protocol #15048776. 5% PhiX were added to final library pools for increased complexity and sequencing run was performed with custom configuration, running 86 cycles for Read 1 and 6 index cycles.
Mock ligations and denaturing polyacrylamide gel electrophoresis
Efficiency of second adapter ligation was tested in mock ligations using TLC-CLIP L01 as donor molecule and i7-3 as acceptor. 2 µl 10X T4 RNA Ligase Buffer, 1 µl 10 µM TLC-CLIP L01 oligo, 1 µl 10 µM i7-3 oligo and 1 µl DMSO were mixed and incubated at 75°C for 2 minutes. The reaction was placed on ice and 4 µl of Ligation mix containing 0.2 µl 0.1 M ATP, 0.5 µl TdT and 0.5 µl T4 RNA Ligase High Concentration were added, followed by addition of PEG8000 to the indicated percentage. Ligation was incubated for 30 minutes at 37°C then cooled down to 16°C. Half the reaction was removed after 30 minutes at 16°C; the remaining reaction was incubated overnight. 1 µl of Ligation reaction was mixed with 1 µl Gel Loading Buffer II (Thermo Fisher, #AM8546G) and denatured at 72°C for 3 minutes. Samples were separated on 10% TBE-Urea gels (Thermo Fisher, #EC68752BOC) and stained with 1X SYBR® Gold (Thermo Fisher, #S11494) for 10 minutes in TBE buffer.
Data analysis
Demultiplexing and Trimming with Flexbar
Sequencing data was demultiplexed by i7 index reads using bcl2fastq without any read trimming. Further demultiplexing by in-read 5’ barcodes and trimming of adapter sequences was performed using Flexbar v.3.5.0 (https://github.com/seqan/flexbar)19 in a two-step approach. In the first step, reads are demultiplexed by in-read barcodes allowing no mismatches, and UMIs are moved into the read header. Barcode sequences (see Table 1) including the UMI designated by the wildcard character ’N’ are provided in fasta format, with the arguments “-b barcodes.fasta --barcode-trim-end LTAIL --barcode-error-rate 0 --umi-tags”. In the second step, any adapter contamination at the 3’ end of the reads is removed allowing an error rate of 0.1 with the following arguments: “--adapter-seq 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG' (SEQ ID NO: 5) --adapter-trim-end RIGHT --adapter-error-rate 0.1 --adapter-min-overlap 1”. In addition, potential T-stretches at the 5’ end that are the result of ribotailing during ligation are removed by trimming ‘T’ homopolymers of 1-2 nucleotides in length (Figure 12) using “--htrim-left T --htrim-max-length 2 --htrim-min-length 1”, and reads shorter than 18 nucleotides after trimming are discarded by “--min-read-length 18”.
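The effect of the homopolymer trimming and length filter can be illustrated in a few lines. The following is a simplified sketch and not Flexbar's implementation: it assumes that at most two leading ‘T’ residues are removed from each read, which may differ from how Flexbar treats longer T-stretches, and the function names are hypothetical.

```python
from typing import Tuple


def trim_five_prime_t(seq: str, qual: str, max_len: int = 2) -> Tuple[str, str]:
    """Remove up to `max_len` leading 'T' bases (and their qualities) from a read.

    Assumption of this sketch: longer T-stretches lose only their first
    `max_len` bases; Flexbar itself may handle that case differently.
    """
    n = 0
    while n < max_len and n < len(seq) and seq[n] == "T":
        n += 1
    return seq[n:], qual[n:]


def passes_length_filter(seq: str, min_len: int = 18) -> bool:
    """Mirror of the minimum read length of 18 nucleotides after trimming."""
    return len(seq) >= min_len


if __name__ == "__main__":
    seq, qual = trim_five_prime_t("TTACGTACGTACGTACGTACGT", "I" * 22)
    print(seq, passes_length_filter(seq))
```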
STAR Alignment
Flexbar-trimmed reads were aligned against hg19 using STAR v.2.7.3a (https://github.com/alexdobin/STAR)20 with the following parameters to keep only uniquely mapping reads, remove the penalty for opening deletions and insertions, and fully extend the 5-prime end of reads to preserve the end of cDNA molecules: “--outFilterMultimapNmax 1 --scoreDelOpen 0 --scoreInsOpen 0 --alignEndsType Extend5pOfRead1”. To retain the UMI in the read header during STAR alignment, any space in the header needs to be removed prior to mapping.
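Removing spaces from FASTQ headers before mapping could be done as sketched below. This is a minimal illustration that assumes standard four-line FASTQ records; the header layout in the usage example and the function name are hypothetical, and the optional `replacement` argument is an addition of this sketch (the text only states that spaces are removed).

```python
from typing import Iterable, Iterator


def strip_header_spaces(fastq_lines: Iterable[str], replacement: str = "") -> Iterator[str]:
    """Yield FASTQ lines with spaces removed from header lines only.

    Assumes four-line records, so every fourth line starting at the first
    one is a header. Sequence, separator and quality lines are untouched.
    """
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            line = line.replace(" ", replacement)
        yield line


if __name__ == "__main__":
    record = ["@read1 1:N:0:1_ACGTA\n", "ACGT\n", "+\n", "IIII\n"]
    for out in strip_header_spaces(record):
        print(out)
```

Because aligners commonly keep only the part of the header before the first whitespace as the read name, a UMI placed after a space would otherwise be lost from the aligned output.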
Deduplication of Reads
Aligned reads were deduplicated based on unique molecular identifiers using UMI-tools v.1.0.1 (https://github.com/CGATOxford/UMI-tools)21. The dedup command was used with the parameters “--extract-umi-method read_id --method unique --spliced-is-unique” to group reads with the same mapping position and identical UMI, while treating reads starting at the same position as unique if one is spliced and the other is not.
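The grouping logic described above can be sketched as follows. This is an illustrative simplification and not the UMI-tools implementation: reads are represented by a hypothetical `Read` record, the UMI is assumed to follow the last underscore of the read name, and the first read of each group is kept, whereas UMI-tools applies its own rules for choosing a representative read.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    name: str      # read ID assumed to end in "_<UMI>"
    chrom: str
    strand: str
    start: int     # mapping position of the read start
    spliced: bool  # True if the alignment contains a splice junction


def umi_from_read_id(name: str) -> str:
    """Take the UMI as the text after the last underscore in the read name."""
    return name.rsplit("_", 1)[-1]


def dedup_unique(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (position, strand, spliced status, exact UMI) group."""
    seen: Set[Tuple[str, str, int, bool, str]] = set()
    kept: List[Read] = []
    for read in reads:
        key = (read.chrom, read.strand, read.start, read.spliced,
               umi_from_read_id(read.name))
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = [
        Read("r1_AAC", "chr1", "+", 100, False),
        Read("r2_AAC", "chr1", "+", 100, False),  # duplicate of r1
        Read("r3_AAC", "chr1", "+", 100, True),   # spliced, so kept
    ]
    print([r.name for r in dedup_unique(reads)])
```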
Peak Calling
Enriched regions were identified using the peak calling algorithm CLIPper v.2.0.0 (https://github.com/YeoLab/clipper)7,16 with default settings and a p-value cutoff of 0.001 “--poisson-cutoff 0.01”.
Multiqc and usable reads
General quality metrics of libraries were assessed using FastQC v0.11.7 (https://github.com/s-andrews/FastQC) and QC data were collated using multiqc v.1.9 (https://github.com/ewels/MultiQC)22 to extract information from combined log files to plot usable read fractions.
Deletions
Individual nucleotide positions of crosslink-induced deletions within TLC-CLIP reads were extracted using the htseq-clip tool (https://github.com/EMBL-Hentze-group/htseq-clip)23 with the following parameters: “htseq-clip extract --mate 1 --site d”.
Filtering of peaks
CLIPper peaks were filtered by removing ENCODE blacklisted regions from eCLIP libraries, peaks obtained from TLC-CLIP libraries skipping the ligation step, and peaks from IgG controls for either Rabbit or Mouse IgG depending on the RBP. An additional score filter was applied by requiring -10 log(pval) to be larger than 50 for any downstream analysis. Consensus peaks between replicates were obtained using bedtools intersect requiring a minimum overlap of 25% between peaks.
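The score and overlap criteria can be made concrete with a short sketch. Two assumptions are made here that the text does not state: the logarithm is taken to base 10 (so a score above 50 corresponds to a p-value below 1e-5), and the 25% overlap is measured as a fraction of the first peak, in the manner of the bedtools intersect “-f” option. The function names are hypothetical.

```python
import math


def peak_score(pval: float) -> float:
    """Score as -10 * log10(p-value); the base-10 logarithm is an assumption."""
    return -10.0 * math.log10(pval)


def overlap_fraction(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Fraction of peak A (half-open interval) that is covered by peak B."""
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return overlap / (a_end - a_start)


def keep_peak(pval: float, min_score: float = 50.0) -> bool:
    """Apply the score filter described in the text."""
    return peak_score(pval) > min_score


if __name__ == "__main__":
    print(keep_peak(1e-6), keep_peak(1e-4))
    print(overlap_fraction(0, 100, 75, 200) >= 0.25)
```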
Correlation plots
For correlation plots, deletion positions of individual replicates were concatenated and coverage was calculated using bedtools multicov24. Count data was normalised against total library size using the cpm function from edgeR25 and log2 transformed. Point density plots were generated using the geom_pointdensity package available on Bioconductor and the correlation coefficient was calculated using Pearson correlation.
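The normalisation and correlation steps amount to the arithmetic sketched below. This is a plain Python illustration rather than the edgeR code path: the pseudocount added before the log2 transformation is an assumption of this sketch (edgeR handles zero counts with its own prior count), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log2_cpm(counts: Sequence[int], library_size: int,
             pseudocount: float = 1.0) -> List[float]:
    """Counts per million relative to total library size, then log2.

    The pseudocount avoids log2(0) and is an assumption of this sketch.
    """
    return [math.log2(c * 1e6 / library_size + pseudocount) for c in counts]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rep1 = log2_cpm([10, 0, 25, 3], library_size=2_000_000)
    rep2 = log2_cpm([12, 1, 20, 2], library_size=2_500_000)
    print(pearson(rep1, rep2))
```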
Pairwise comparison at peak level
Fraction of overlap between filtered peaks for individual replicates of either TLC-CLIP, eCLIP or easyCLIP was calculated using the intervene pairwise intersection module (https://intervene.readthedocs.io/en/latest/index.html) requiring a minimum of 25% overlap between peaks. For comparison between TLC-CLIP and eCLIP in HepG2 cells shown in Fig. 12, peaks were restricted to genes with stable gene expression between the two cell lines, as defined by differential gene expression analysis on total RNA-seq data for 293T and HepG2 cells.
De novo motif discovery
De novo motif discovery was performed using Homer27 v4.10 on peaks centred either on the apex region obtained from CLIPper or on the position with the highest deletion count. findMotifsGenome.pl was used with the parameters “-oligo -basic -rna -len 5 -S 10 -size given”, where peak size is a 50-nucleotide window around the apex, or with the parameter “-size 50” for peaks centred on deletions.
Density plot for deletion and motif enrichment
Deletion density in Fig. 14 was calculated using the annotatePeaks.pl function from homer for TLC-CLIP peaks centred onto the consensus motif, with motif files being generated using seq2profile.pl. Tag directories for deletions were generated using the homer makeTagDirectory function on the bed file obtained from htseq-clip. peakSizeEstimate needs to be changed to 1 in the tagInfo.txt file to avoid extension of deletion tags and preserve nucleotide resolution. Deletion enrichment was obtained using annotatePeaks.pl with “-hist 1 -size 100” across deletion-centred peaks as well as peaks shuffled across the set of target genes bound by a given RBP. For RBPs recognising palindromic sequences such as ‘AGGGA’ or ‘CUUUC’ for hnRNPA1 or hnRNPI respectively, the exact position of the crosslinking site cannot be determined during alignment if the deletion falls within the homopolymer stretch. By default, STAR will position the deletion at the first base of the ambiguous sequence based on the DNA sequence, without awareness of the strand orientation of the gene, resulting in an artificial shift of the deletion position between genes on the forward or reverse strand. To remove this artifact, deletion positions for genes on the reverse strand were shifted by two nucleotides for hnRNPA1 and hnRNPI prior to visualisation.
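The strand-dependent correction can be expressed as a one-line adjustment. The sketch below is illustrative only: the text gives the size of the shift (two nucleotides) but not its direction, so the shift towards higher genomic coordinates is an assumption, based on the reasoning that the leftmost genomic base of a three-nucleotide stretch is the 3’-most base for a reverse-strand transcript. The function name is hypothetical.

```python
def adjust_deletion_position(pos: int, strand: str, shift: int = 2) -> int:
    """Shift deletion positions of reverse-strand genes by `shift` nucleotides.

    Assumption of this sketch: the shift is applied towards higher genomic
    coordinates; the source only states that a two-nucleotide shift is used.
    Forward-strand positions are returned unchanged.
    """
    if strand == "-":
        return pos + shift
    return pos


if __name__ == "__main__":
    print(adjust_deletion_position(1000, "+"))
    print(adjust_deletion_position(1000, "-"))
```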
Deletion-centred analysis
Peaks were centred on the maximum deletion position and coverage of this nucleotide position was calculated using bedtools multicov to calculate the CID ratio, indicating the proportion of reads at a given position that carry a deletion. Motif density across peaks with different CID ratios was calculated using annotatePeaks.pl with “-size 100 -hist 5 -norevopp”.
To visualise the percentage of peaks carrying motifs according to CID ratio, findMotifsGenome.pl was used with “-find motif.motif -size 50 -norevopp”. Peak annotation across different transcriptomic and genomic features was performed using annotatePeaks.pl.
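The CID ratio defined in this section can be computed as sketched below. Expressing the ratio as a percentage is an assumption of this sketch, made because thresholds such as “larger than 10” are used elsewhere in the analysis; the function name and the handling of zero coverage are likewise hypothetical.

```python
def cid_ratio(deletion_count: int, coverage: int, as_percent: bool = True) -> float:
    """Proportion of reads covering a position that carry a deletion there.

    Returns a percentage by default (an assumption of this sketch) and 0.0
    for positions without any coverage.
    """
    if coverage == 0:
        return 0.0
    if as_percent:
        return 100.0 * deletion_count / coverage
    return deletion_count / coverage


if __name__ == "__main__":
    print(cid_ratio(5, 50))
    print(cid_ratio(5, 50, as_percent=False))
```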
Deletion Visualisation
Intronic antisense Alu sequences were extracted from RepeatMasker and intersected with deletion-centred peaks with a CID ratio larger than 10 from PAGE or noPAGE libraries, yielding splice sites that were either shared between experimental conditions or specific to either PAGE or noPAGE libraries.
Deletion positions from htseq-clip were merged across all replicates and converted to bam files using bedtools bedtobam. Bigwig files were then generated using the deeptools function bamCoverage with a bin size of 1, normalising for total deletion count (CPM). Heatmaps and coverage profiles were generated using the computeMatrix and plotHeatmap functions from deeptools.
"N" stands for any nucleotide
"n" designates a phosphorothioated DNA base (nucleotide). The phosphorothioate (PS) bond substitutes a sulfur atom for a non-bridging oxygen in the phosphate backbone of an oligo. This modification renders the internucleotide linkage resistant to nuclease degradation.
Adaptation of the above example to the detection of RNA modifications
The TLC library preparation described herein can also be applied to an adaptation of the CLIP protocol described above for the profiling of RNA modifications, including but not limited to N6-methyladenosine (m6A). In this example, TLC follows the procedure outlined in FIG. 5 (Steps 2 - 9), with RNA precursors resulting from chemical fragmentation (Step 2) followed by purification of RNA fragments carrying a modification of interest through affinity purification (not pictured).
Claims
1. A method for preparing a sequencing library from a ribonucleic acid (RNA) sample, the method comprising:
(a) obtaining a test sample comprising a plurality of template RNA or RNA precursors;
(b) providing a set of oligonucleotide adapters and primers, the set comprising a plurality of first adapters, a plurality of first strand cDNA synthesis primers, a plurality of second adapters, and a plurality of amplification primers, wherein each of the first adapters comprises (i) a 5’ primer binding domain and a 3’ poly A domain or (ii) a 5’ poly A domain and a 3’ primer binding domain; each of the first strand cDNA synthesis primers comprises an RNA hybridization domain complementary to the template RNA or to the first adapters (e.g., oligo(dT)), and said each of the first strand cDNA synthesis primers is covalently linked to magnetic beads; each of the second adapters comprises primer binding sites, and each of the amplification primers comprises sequencing platform adapter constructs;
(c) ligating the plurality of first adapters to the 3' end of the plurality of RNA precursors to generate template RNA;
(d) generating a plurality of solid-phase first strand cDNA through reverse transcription primed by the first strand cDNA synthesis primer starting from the plurality of template RNA of step (c) or of step (a);
(e) separating the solid-phase first strand cDNA from the plurality of template RNA;
(f) tailing the 3’ ends of the plurality of solid-phase first strand cDNA with non-template ribonucleotides;
(g) ligating the plurality of second adapters to 3' end of the plurality of solid-phase first strand cDNA;
(h) amplifying the plurality of solid-phase cDNA with amplification primers to generate a plurality of double stranded cDNA that are processed into a sequencing library through addition of sequencing platform adapter constructs.
2. The method of claim 1, wherein each template RNA of step (a) contains a known sequence to serve as hybridization domain.
3. The method of claim 1 or 2, wherein the precursor RNA is fragmented.
4. The method of any one of claims 1-3, wherein nucleotides are added to the 3’ end of the precursor RNA through polyadenylation or ligation.
5. The method of any one of claims 1-4, wherein each of the first adapters and/or each of the first strand cDNA synthesis primers further comprise a sample barcode and/or unique molecular identifier.
6. The method of any one of claims 1-5, wherein each of the second adapters further comprises a sample barcode, unique molecular identifier and/or a sequencing read primer domain.
7. The method of any one of claims 1-6, wherein each of the first strand cDNA synthesis primers is not covalently linked to magnetic beads.
8. The method of any one of claims 1-7, wherein the method further comprises tagmenting the plurality of the double stranded cDNA of step (h) with transposomes to generate a tagmented sample, wherein the transposomes comprise a transposase and a transposon nucleic acid; and wherein the transposon nucleic acid comprises a transposon end domain and a second post-tagmentation amplification primer binding domain.
9. The method of any one of claims 1-8, wherein the test sample is obtained by a method for purifying an RNA molecule carrying a modification of interest in a biological sample, comprising
(a) cleaving the RNA molecule by contacting the biological sample with an agent capable of cleaving the phosphodiester bond, thereby generating a fragment of the RNA molecule, wherein the majority of fragments is around 100 nucleotides in length;
(b) contacting the RNA fragment in said biological sample with a molecule that specifically interacts with a particular modification of interest, wherein said molecule can be a protein, such as an antibody;
(c) contacting the biological sample with an agent that creates a covalent bond between the RNA molecule and the molecule that specifically interacts with the modification of interest, thereby generating a covalently bound complex containing the RNA with the modification of interest;
(d) purifying the complex obtained in step c) to provide RNA fragments containing the modification of interest, wherein said RNA fragments are used as precursor RNA of claim 1.
10. The method of claim 9, wherein the agent capable of cleaving the phosphodiester bond in step a) is a chemical agent, such as divalent cations (e.g., zinc, magnesium).
11. The method of any one of claims 1-8, wherein the test sample is obtained by a method for purifying an RNA molecule interacting with an RNA binding protein (RBP) of interest in a biological sample, comprising
(a) contacting the biological sample with an agent that creates a covalent bond between the RNA molecule and the RBP of interest, thereby generating a covalently bound RBP-RNA complex containing the RNA molecule;
(b) cleaving the RNA molecule by contacting the RBP-RNA complex with an agent capable of cleaving a bond thereof, thereby generating a fragment of the RNA molecule, wherein the fragment is at least 22 nucleotide bases in length;
(c) selecting the RBP-RNA fragment complex in said biological sample with a molecule that specifically interacts with a component of the RBP-RNA fragment complex; and
(d) purifying the RBP-RNA fragment complex obtained in step c) to provide RNA fragments interacting with the RBP of interest, wherein said RNA fragments are used as precursor RNA of claim 1.
12. The method of claim 11, wherein the agent capable of cleaving a bond is a nuclease, such as RNase I, RNase A, RNase T1 or MNase.
13. The method of any one of claims 9-12, wherein purifying the RNA-protein complex of step (d) is performed under stringent conditions comprising:
(i) washing the complexes with buffer at least 5 times;
(ii) boiling the complexes in a denaturing ionic detergent;
(iii) separating the complexes by SDS-PAGE;
(iv) transferring said complexes to a substrate that preferentially binds RNA covalently crosslinked to protein over RNA not covalently crosslinked to protein; and
(v) digesting said protein with a protease to liberate said fragments of RNA from said RNA-protein complexes.
14. The method of any one of claims 1-13, wherein the test sample is a biological sample.
15. The method of any one of claims 1-14, wherein the RNA hybridization domain comprises a heteronucleotide stretch.
16. The method of any one of claims 1-15, wherein any of the provided oligonucleotide adapters comprise one or more nucleotide analogs.
17. The method of any one of claims 1-16, wherein the template RNA or the RNA precursor is messenger RNA.
18. The method of any one of claims 1-17, wherein the RNA hybridization domain of each of the first strand cDNA synthesis primers consists of a randomer.
19. The method of any one of claims 1-18, wherein the method further comprises pooling the plurality of first adapters ligated to the plurality of RNA precursors and/or the plurality of solid-phase first strand cDNA.
20. The method of any one of claims 1-19, wherein the test sample comprising a plurality of template RNA or precursor RNA is obtained from a single cell.
21. The method of any one of claims 1-20, wherein the method further comprises subjecting the sequencing library to a sequencing protocol.
22. The method of any one of claims 1-21, wherein the method further comprises quantitating one or more RNA species of the test sample.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22166453.5 | 2022-04-04 | ||
EP22166453 | 2022-04-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023194331A1 true WO2023194331A1 (en) | 2023-10-12 |
Family
ID=81325102
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2023/058731 WO2023194331A1 (en) | 2022-04-04 | 2023-04-04 | CONSTRUCTION OF SEQUENCING LIBRARIES FROM A RIBONUCLEIC ACID (RNA) USING TAILING AND LIGATION OF cDNA (TLC) |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023194331A1 (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6316153B1 (en) | 1998-04-21 | 2001-11-13 | The University Of Connecticut | Free-form fabricaton using multi-photon excitation |
US6611375B2 (en) | 2001-01-18 | 2003-08-26 | Thermo Corion Corporation | Selectively tuned ultraviolet optical filters and methods of use thereof |
US20050227251A1 (en) * | 2003-10-23 | 2005-10-13 | Robert Darnell | Method of purifying RNA binding protein-RNA complexes |
US20070281313A1 (en) | 2006-05-30 | 2007-12-06 | Hitachi, Ltd. | Methods for quantitative cDNA analysis in single-cell |
US8835358B2 (en) | 2009-12-15 | 2014-09-16 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
WO2017048993A1 (en) | 2015-09-15 | 2017-03-23 | Takara Bio Usa, Inc. | Methods for preparing a next generation sequencing (ngs) library from a ribonucleic acid (rna) sample and compositions for practicing the same |
WO2019063803A1 (en) | 2017-09-29 | 2019-04-04 | Baseclick Gmbh | Click based ligation |
WO2020136438A1 (en) | 2018-12-28 | 2020-07-02 | Biobloxx Ab | Method and kit for preparing complementary dna |
EP3763826A1 (en) * | 2018-02-22 | 2021-01-13 | Osaka University | Analysis/diagnosis method utilizing rna modification |
WO2021130151A1 (en) | 2019-12-23 | 2021-07-01 | Baseclick Gmbh | Method of amplifying mrnas and for preparing full length mrna libraries |
WO2021208036A1 (en) | 2020-04-16 | 2021-10-21 | Singleron (Nanjing) Biotechnologies, Ltd. | A method for detection of whole transcriptome in single cells |
-
2023
- 2023-04-04 WO PCT/EP2023/058731 patent/WO2023194331A1/en unknown
Non-Patent Citations (14)
Title |
---|
ALTSCHUL ET AL., NUCLEIC ACIDS RES., vol. 25, 1997, pages 3389 - 3402 |
CHEN J ET AL., CANCER J., vol. 8, pages 154 - 63 |
DIFFEY, B L, METHODS, vol. 28, pages 4 - 13 |
FU ET AL.: "Digital Encoding of Cellular mRNAs Enabling Precise and Absolute Gene Expression Measurement by Single-Molecule Counting", ANAL. CHEM, vol. 86, 2014, pages 2867 - 2870, XP055285640, DOI: 10.1021/ac500459p |
FU ET AL.: "Molecular Indexing Enables Quantitative Targeted RNA Sequencing and Reveals Poor Efficiencies in Standard Library Preparations", PNAS, vol. 5, 2014, pages 1891 - 1896, XP055568863, DOI: 10.1073/pnas.1323732111 |
KARLIN ET AL., PROC NATL. ACAD. SCI USA, vol. 90, 1993, pages 5873 - 5877 |
KURIMOTO ET AL., NUCLEIC ACID RES, vol. 34, no. 50, 2006, pages e42 |
MACOSKO, E Z, CELL, vol. 161, 21 May 2015 (2015-05-21), pages 1202 - 1214 |
R. HUISGEN: "1 ,3-Dipolar Cycloaddition Chemistry", 1984, WILEY |
RODRIGUEZ-FONSECA C ET AL., RNA, vol. 6, pages 744 - 54 |
SAMBROOKRUSSELL: "Molecular Cloning: A Laboratory Manual", 2001, COLD SPRING HARBOR PRESS |
SO P T ET AL., CELL MOL BIO (NOISY LE GRAND, vol. 44, pages 771 |
TIJSSEN: "Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes", 1993, ELSEVIER, article "Overview of principles of hybridization and the strategy of nucleic acid probe assays" |
WULF, M G ET AL., J BIOL CHEM, vol. 294, 2019, pages 18220 - 18231 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23718623 Country of ref document: EP Kind code of ref document: A1 |