WO2024081622A1 - Improvement to cDNA library priming
- Publication number
- WO2024081622A1 (PCT/US2023/076438)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- primer
- synthetic oligonucleotide
- sequence
- oligonucleotide primer
- oligo
- Prior art date
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6841—In situ hybridisation
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1096—Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
Definitions
- the present invention relates generally to methods for preparing cDNA by reverse transcription of an RNA template and more particularly to improvements to oligo(dT)-priming, which is used to initiate first-strand cDNA synthesis in the reverse transcription reaction.
- RNA: ribonucleic acid
- cDNA: complementary DNA
- reverse transcription begins by adding nucleotides to the 3' end of a short primer strand of DNA that has hybridized to a complementary sequence on the RNA strand.
- oligo(dT): an oligomer of deoxythymidines
- poly(A): polyadenosine
- every base sequence in the resulting cDNA library then includes a homopolymer of continuous dT bases, which creates a number of different problems.
- the long stretch of A:T pairs in the resulting cDNA is unstable and prone to DNA breathing, strand invasion, and mispriming artifacts during PCR.
- the homopolymer can stall the polymerase, reducing the yield of successful molecules during PCR as well as during sequencing. Further, the precise boundaries of the homopolymer sequence are difficult to identify in sequence data.
- This invention modifies the oligo(dT) primer to address these problems and related disadvantages associated with standard oligo(dT) primers.
- the present invention provides oligonucleotide primers and related compositions and methods that improve the stability and replicability of cDNA molecules produced during reverse transcription.
- the primers described here result in fewer sequence reads lost to artifacts during amplification and sequencing and provide easier detection of the end position of each sequence read after sequencing.
- the invention provides synthetic oligonucleotide primers including a 5' region and a 3' region, where the primer includes a span of 10-40 thymine bases that is contiguous except for the substitution of 1-5 non-contiguous thymine bases with a non-thymine base.
- the 5' region of the synthetic oligonucleotide primer may include a sequence that is at least partially complementary to a predetermined primer or adapter sequence.
- the primer may contain no sequence of more than 2 contiguous non-thymine bases in its 5' region outside the span of 10-40 thymine bases.
- the primer may be from 10 to 70 nucleotides in length, or from 10 to 60 nucleotides, or from 10 to 50 nucleotides, or from 10 to 40 nucleotides, or from 10 to 30 nucleotides, or from 10 to 20 nucleotides in length.
- the 1-5 non-thymine bases may each be independently selected from cytosine and guanine.
- the 1-5 non-thymine bases may comprise at least one adenine.
- the 5' region may include one or more variable sequence regions configured to be unique for each primer in a set of primers.
- the variable sequence region may be an index or barcode sequence.
- the primer may further include a terminal 3' base selected from the group consisting of adenine, cytosine, and guanine.
- the primer may further include two terminal 3' bases in the configuration 3'-NV, where N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine and V is a base selected from the group consisting of adenine, cytosine, and guanine.
- the primer may further include a blocking group, such as a biotin molecule or a non-natural nucleotide, covalently attached to the 5' end of the primer.
- the primer may be covalently attached at its 5' terminal end to a solid surface such as a bead.
- the primer may further include a linker sequence between the bead and the 5' terminal end.
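The span rules above (a span of 10-40 positions, 1-5 substitutions, no two substitutions adjacent) can be checked mechanically. The following is a minimal sketch, not part of the patent: the function name `check_oligo_dt_span` is hypothetical, and it assumes the 10-40 length counts every position in the span, substituted or not, and that the caller passes only the span, written 5' to 3'.

```python
def check_oligo_dt_span(span: str) -> list[str]:
    """Return a list of rule violations; an empty list means the span passes.

    Assumption: `span` is only the oligo(dT) span (no adapter, no 3' anchor),
    and its length includes the substituted positions.
    """
    span = span.upper()
    problems = []
    if set(span) - set("ACGT"):
        problems.append("span contains characters other than A, C, G, T")
    if not 10 <= len(span) <= 40:
        problems.append("span length is outside 10-40 bases")
    # Indices of every base that replaces a thymine.
    subs = [i for i, base in enumerate(span) if base != "T"]
    if not 1 <= len(subs) <= 5:
        problems.append("span needs 1-5 non-thymine substitutions")
    # Substituted positions must be non-contiguous.
    if any(b - a == 1 for a, b in zip(subs, subs[1:])):
        problems.append("non-thymine substitutions must not be adjacent")
    return problems
```

For example, `check_oligo_dt_span("T" * 9 + "G" + "T" * 9 + "C" + "T" * 10)` returns an empty list, while an uninterrupted `"T" * 30` is reported as lacking substitutions.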
- the invention also provides methods for preparing complementary deoxyribonucleic acid (cDNA), the methods comprising hybridizing a synthetic oligonucleotide primer as described herein to a target ribonucleic acid (RNA) and synthesizing a first cDNA strand complementary to at least a portion of the RNA molecule.
- the 5' region of the synthetic oligonucleotide primer includes a sequence complementary to at least a portion of a sequencing primer or adapter.
- the 5' terminal end of the synthetic oligonucleotide primer is covalently attached to a blocking group such as a biotin molecule or a non-natural nucleotide.
- the target RNA is one of a plurality of fragmented RNA molecules.
- the synthesizing a first cDNA strand is performed by a reverse transcriptase, optionally a recombinant Moloney murine leukemia virus (MMLV) derived reverse transcriptase.
- MMLV: Moloney murine leukemia virus
- the recombinant MMLV derived reverse transcriptase lacks an RNase H domain when compared to the native MMLV enzyme, for example the RevertAid H Minus reverse transcriptase available from ThermoFisher Scientific, and similar MMLV enzymes.
- the invention also provides a kit of parts for preparing complementary deoxyribonucleic acid (cDNA) comprising a synthetic oligonucleotide primer as described herein and optionally one or more of a reactant mixture, the reactant mixture including deoxynucleoside triphosphates and a source of magnesium, a reverse transcriptase, a template-switch oligonucleotide, and PCR reagents, the PCR reagents including primers of which one is at least partially complementary to the synthetic oligonucleotide primer as described herein and of which the other is optionally at least partially complementary to an adapter sequence to be added to the 3' end of the nascent cDNA strand or to a target sequence within the cDNA, a DNA polymerase, and a buffer solution.
- FIG. 1A illustrates hybridization of a standard oligo(dT) primer to the poly(A) tail of an unknown target mRNA molecule. This primer is not part of the present invention and is illustrated for reference.
- FIG. 1B illustrates double-stranded cDNA molecules produced following primer extension of the primers in FIG. 1A.
- FIG. 2A illustrates a modified oligo(dT) primer of the invention which includes a 5' anchor and its hybridization to the poly(A) tail of an unknown target mRNA molecule as well as the double-stranded cDNA resulting from primer extension.
- FIG. 2B illustrates reduction of “DNA breathing” provided by an embodiment of the modified oligo(dT) primers of the invention.
- FIG. 3 illustrates an advantage of the modified oligo(dT) primers in accordance with one embodiment.
- FIG. 4 depicts the results of an experiment demonstrating the higher quality sequence reads obtained with embodiments of modified oligo(dT) primers of the invention which include a 5' anchor, compared to a reference standard anchor oligo(dT) primer.
- FIG. 5 illustrates the performance of a modified oligo(dT) primer in accordance with the present disclosure compared with that of a standard oligo(dT) primer in the data-processing step of trimming poly(A)-derived cDNA sequence from the sequence reads.
- FIG. 6 shows only the proportions of correctly trimmed reads (zero position error) for the modified and standard oligo(dT) primers of FIG. 5.
- FIG. 7 shows the results of a differential expression experiment utilizing modified oligo(dT) primers of the invention which include a 5' anchor in accordance with one embodiment.
- oligo(dT) primer hybridizes to the polyadenosine sequence found on most mature non-ribosomal RNAs.
- long stretches of A:T pairs in the cDNA molecules of the resulting library are undesirable because their instability renders them prone to DNA breathing, strand invasion, and mispriming artifacts during amplification.
- the homopolymer can stall the polymerase, reducing the yield both during the initial amplification of the library and during sequencing.
- the precise boundaries of the homopolymer sequences are difficult to identify in sequence data.
- the present invention provides a modified oligo(dT) primer that is stabilized by substitution of discrete thymine bases within the contiguous oligo(dT) span.
- discrete thymine bases are substituted with non-thymine bases.
- non-thymine bases deliberately mismatch, that is they are not complementary to and do not form Watson-Crick base-pairs with the adenine bases of the poly(A) tail of the target RNA, also referred to herein as the RNA template.
- cDNA libraries created using modified oligo(dT) primers according to the present invention contain fewer sequence reads lost to PCR artifacts and provide for easier detection of the end position of each sequence read.
- the invention provides modified oligo(dT) primers comprising 1-5 or 1-3 non-thymine bases, preferably guanine or cytosine, each replacing a thymine base within a span of about 10-40 otherwise contiguous thymine bases at the 3' end of the primer, referred to herein as the “oligo(dT) span”.
- the oligo(dT) span may comprise about 30 thymine bases, or from 12-30 or from 18-30 thymine bases.
- the modified oligo(dT) primer contains 2 or 3 non-thymine bases, preferably guanine or cytosine, within an oligo(dT) span of about 30 thymine bases.
- the non-thymine bases are located at least 2-5 or at least 4-5 bases from the 3' terminal end of the primer.
- the non-thymine bases are placed within the oligo(dT) span at least about 2-20 nucleotides from the 3' end of the primer.
- the non-thymine bases are placed at least about 2-20 nucleotides from each other where the modified oligo(dT) primer contains more than one non-thymine base within the oligo(dT) span.
- the non-thymine bases are placed about 5, about 10, about 15, or about 20 nucleotides from each other. In embodiments, the spacing between the non-thymine bases may be variable.
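The placement rules above can be illustrated by constructing a span from a list of substitution positions. This is a sketch under stated assumptions: the function name `build_span` is hypothetical, positions are counted from the 3' end of the span with 1 being the 3'-most base (the source does not fix a counting convention), and the example values are illustrative only.

```python
def build_span(length: int, substitutions: dict[int, str]) -> str:
    """Build an oligo(dT) span, written 5'->3', with substituted bases.

    `substitutions` maps a position counted from the 3' end of the span
    (1 = the 3'-most base; an assumed convention) to the replacing base.
    """
    bases = ["T"] * length
    for pos_from_3p, base in substitutions.items():
        if not 1 <= pos_from_3p <= length:
            raise ValueError(f"position {pos_from_3p} is outside the span")
        if base not in ("A", "C", "G"):
            raise ValueError(f"{base!r} is not a single non-thymine base")
        # Convert the 3'-based position to a 0-based index from the 5' end.
        bases[length - pos_from_3p] = base
    return "".join(bases)


# A 30-base span with G at position 10 and C at position 20 from the 3' end,
# so the two substitutions sit 10 positions apart.
example_span = build_span(30, {10: "G", 20: "C"})
```

Here `example_span` is ten thymines, a cytosine, nine thymines, a guanine, and nine thymines, read 5' to 3'.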
- the modified oligo(dT) primer may further comprise a biotin moiety covalently attached to the 5' end of the primer.
- the modified oligo(dT) primers described here may incorporate one or two random bases at the 3' end of the primer as an "anchor" complementary to the last one or two non-adenine bases of a template RNA in order to avoid priming cDNA synthesis farther up the RNA's poly(A) tail.
- a modified oligo(dT) primer as described here may further comprise one or two 3' anchoring bases which hybridize to the one or two terminal bases of the target mRNA molecule preceding the poly(A) tail.
- modified oligo(dT) primers comprising a 3' terminal base V, wherein V is selected from the group consisting of adenine, cytosine, and guanine; or comprising two 3' terminal bases 3'-NV wherein N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine and V is selected from the group consisting of adenine, cytosine, and guanine.
- the invention also provides a modified oligo(dT) primer comprising a terminal 3'-V or a terminal 3'-NV anchor and from 1-5 non-thymine bases, preferably selected from cytosine and guanine, located within a 3' span of contiguous thymine bases, for example a span of 20-50 contiguous thymine bases, or a span of about 30 contiguous thymine bases at the 3' end of the primer.
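Because V and N each stand for a set of bases, an anchored primer is in practice a small pool of concrete sequences. The sketch below enumerates that pool; the function name `expand_degenerate` is hypothetical. The primer is written 5' to 3', so the two-base anchor is passed as "VN" on the assumption that the source's "3'-NV" notation is read from the 3' end inward.

```python
from itertools import product

# Base sets as defined in the text: V = A, C or G; N = any of the four bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence encoded by a primer containing V or N."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Illustrative span with two substitutions, followed by a 3' anchor.
span = "T" * 10 + "C" + "T" * 9 + "G" + "T" * 9
one_base_anchor = expand_degenerate(span + "V")    # 3 sequences
two_base_anchor = expand_degenerate(span + "VN")   # 3 * 4 = 12 sequences
```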
- the advantages of the modified oligo(dT) primers disclosed herein occur not during reverse transcription itself but at several points afterward.
- the modified oligo(dT) primers disclosed herein result in cDNA having improved stability and replicability.
- one or more duplicate DNA strands with an analogous (uracil replaced by thymine) or complementary (Watson-Crick paired DNA bases in the opposite orientation) base sequence to the original RNA template may be synthesized by a variety of methods; the synthesis of these new strands may be more efficient with the modified primers because polymerases tend to stall when they encounter a long homopolymer in the template sequence, such as a long oligo(dT) sequence.
- the modified oligo(dT) primers create an interruption in what would otherwise be such a long oligo(dT) homopolymer.
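The effect of the interruption on homopolymer length is easy to quantify. The sketch below measures the longest uninterrupted run of a base in a sequence; the function name `longest_run` and the example spans are illustrative assumptions, not sequences taken from the patent.

```python
from itertools import groupby


def longest_run(seq: str, base: str = "T") -> int:
    """Length of the longest uninterrupted run of `base` in `seq`."""
    runs = (len(list(group)) for b, group in groupby(seq) if b == base)
    return max(runs, default=0)


standard = "T" * 30                                  # uninterrupted oligo(dT)
modified = "T" * 10 + "C" + "T" * 9 + "G" + "T" * 9  # two substitutions

standard_run = longest_run(standard)  # 30
modified_run = longest_run(modified)  # 10
```

In this example two substitutions cut the longest homopolymer from 30 bases to 10.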
- the modified oligo(dT) primers disclosed herein result in fewer sequence reads lost to PCR artifacts.
- the double-stranded cDNA molecule resulting from a standard oligo(dT) primer contains a long stretch of A:T base pairs, which have lower stability than C:G pairs, rendering this portion of the molecule prone to so-called “DNA breathing”, which refers to a random fluctuating separation of the strands.
- This renders the cDNA susceptible to strand invasion, which refers to the annealing of a foreign DNA oligonucleotide to one of the separated strands.
- modified oligo(dT) primers described here interrupt the unstable A:T stretch, preferably with stabilizing C:G pairs, and therefore reduce the likelihood of strand invasion and of PCR artifacts.
- the modified oligo(dT) primers disclosed herein provide for easier detection of the end position of each sequence read.
- the sequence read may proceed all the way through the distinct cDNA sequence and continue into the oligo(A:T) as well. For many tasks it is then necessary to identify where the oligo(A:T) sequence begins in order to trim it off the sequence reads before searching for a matching sequence since the poly(A) tail is not transcribed from the genomic sequence.
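The trimming task described above can be sketched as a scan from the 3' end of a sense-strand read. This is an illustrative heuristic only, not the procedure evaluated in FIG. 5 and FIG. 6: the function name `find_tail_start`, the thresholds, and the rule that an interruption must be a single base flanked by adenines are all assumptions.

```python
def find_tail_start(read: str, max_interruptions: int = 3, min_tail: int = 10) -> int:
    """Return the index at which the A-rich tail of `read` begins.

    Scans leftward from the 3' end, accepting adenines and up to
    `max_interruptions` isolated non-A bases that have an A on both sides
    (the complements of the primer's deliberate substitutions).
    Returns len(read) when no tail of at least `min_tail` bases is found.
    """
    i = len(read)  # the candidate tail is read[i:]
    used = 0       # interruptions accepted so far
    while i > 0:
        base = read[i - 1]
        if base == "A":
            i -= 1
        elif (used < max_interruptions
              and i >= 2 and read[i - 2] == "A"        # A on the 5' side
              and i < len(read) and read[i] == "A"):   # A on the 3' side
            used += 1
            i -= 1
        else:
            break
    return i if len(read) - i >= min_tail else len(read)


body = "CCGTGACG"
read = body + "A" * 9 + "G" + "A" * 10   # tail with one interruption
trimmed = read[:find_tail_start(read)]   # "CCGTGACG"
```

A scan like this cannot tell template-encoded adenines at the end of the transcript from the tail itself, so it may over-trim by a few bases.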
- modified oligo(dT) primers described here may be substituted for standard oligo(dT) primers in various protocols requiring conversion of an RNA template to cDNA followed by amplification, including without limitation RNA-seq protocols. Incorporation of the modified oligo(dT) primers described here into RNA-seq protocols that sequence cDNA derived from the 3' end of the original RNA, nearest to the poly(A) tail, is expected to be particularly advantageous.
- the modified oligo(dT) primers described here are substituted for standard oligo(dT) primers in a Smart-3SEQ protocol for sequencing RNA (Foley et al., Genome Res. 2019 Nov; 29(11):1816-1825).
- target RNA is fragmented, for example using a divalent cation and elevated temperature, for example 80°C or 95°C in the presence of magnesium for about 1-5 minutes.
- the fragmented RNA is subjected to reverse transcription (RT).
- the RT reaction is primed by hybridizing a modified oligo(dT) primer of the invention to the fragmented RNA.
- the 5' region of the modified oligo(dT) primer includes a sequence complementary to at least a portion of a sequencing primer or adapter.
- incorporating a partial downstream sequencing adapter into the first cDNA strand eliminates the need to incorporate the adapter in a subsequent ligation reaction.
- the RT reaction is performed by an MMLV-derived reverse transcriptase which allows for incorporation of a second adapter primer into the cDNA.
- MMLV-derived reverse transcriptase typically adds several non-templated bases at the 3' end, which are primarily cytosines.
- This provides a target for hybridization with a second oligonucleotide containing a short 3' oligo(G) and the innermost portion of an upstream sequencing adapter.
- the MMLV reverse transcriptase then performs a “template switch”, further extending the cDNA strand with sequence complementary to the sequencing adapter in the second oligonucleotide.
- the reverse transcription produces a cDNA strand with adapter sequences at both ends in a single incubation.
- the adapters are extended to full length and the cDNA molecules are amplified using a PCR reaction with primers complementary to the adapter sequences.
- the amplified double stranded cDNA library may be further purified, or optionally concentrated and then purified prior to sequencing.
- FIG. 1A illustrates hybridization of a standard oligo(dT) primer 104a, 104b, 104c to the poly(A) tail of an unknown target mRNA 102 molecule.
- a standard primer may optionally include an adapter 106 sequence.
- the figure illustrates the random nature of primer hybridization when using standard oligo(dT) primers.
- the primer is depicted as hybridized in three different locations along the poly(A) tail of the target mRNA sequence. It is understood that these three locations are exemplary among many possibilities.
- the standard oligo(dT) primer is not within the scope of the present invention and is depicted for reference.
- FIG. 1B illustrates double-stranded cDNA molecules produced following primer extension of the standard oligo(dT) primers in FIG. 1A.
- the resulting cDNA sequences disadvantageously contain long and variable poly(A:T) homopolymers 108a, 108b, 108c.
- FIG. 2A illustrates a modified oligo(dT) primer according to one embodiment and its hybridization to the poly(A) tail of an unknown target mRNA 102 molecule.
- the modified oligo(dT) primer includes two non-thymine bases, both guanine (G), a 3' anchor moiety, 3'-NV, and an optional 5' adapter sequence (Y).
- the anchor portion of the modified oligo(dT) primer is shown hybridized to the two terminal bases of the target mRNA molecule preceding its poly(A) tail.
- “V” is a base selected from the group consisting of adenine, cytosine, and guanine.
- N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine.
- the optional 5' sequence (Y) is a predefined adapter sequence that may include a sequence at least partially complementary to an amplification or sequencing primer.
- cDNA molecule 206 resulting from primer extension and second-strand synthesis or amplification 202.
- the modified oligo(dT) primers interrupt the unstable A:T homopolymer of the cDNA molecule with stabilizing C:G pairs, thereby reducing the likelihood of strand invasion and reducing PCR artifacts.
- FIG. 2B illustrates disadvantageous “DNA breathing” 208 that may occur within the oligo(A:T) span of a cDNA resulting from synthesis using a standard oligo(dT) primer and the substantial reduction of this phenomenon 212 provided by an embodiment of the modified oligo(dT) primers described here.
- FIG. 3 illustrates an advantage of the modified oligo(dT) primers in accordance with one embodiment.
- Top schematic exemplifies the ambiguity in identifying the beginning of the poly(A) homopolymer and bottom schematic illustrates how the precise boundary can be identified using a modified oligo(dT) primer as described here.
- FIG. 4 shows the results of a modified Smart-3SEQ protocol (Foley et al., Genome Res. 2019 Nov; 29(11): 1816-1825) performed on 3 nanograms of Universal Human Reference RNA (Agilent Technologies) with a standard oligo(dT) primer having a 3' tail of 30 contiguous thymine bases (0G) as the reference primer and three embodiments of a modified oligo(dT) primer of the present invention, each having the same sequence as the reference primer except for 1, 2, or 3 guanine bases within the dT30 portion of the primer (1G, 2G, 3G).
- 1.67X reaction buffer (50 mM Tris-HCl, 50 mM KCl, 4 mM MgCl2, 10 mM DTT, pH 8.3 at 1X; Thermo Fisher)
- Reverse transcription reagents comprising 1 M trimethylglycine (MilliporeSigma), 4 mM additional MgCl2 (MilliporeSigma), 1 U/µL RNase inhibitor (Thermo Fisher), 1 µM template-switch oligonucleotide (as described in Foley et al., but with thymidine residues replaced by uracil; Integrated DNA Technologies), and 10 U/µL RevertAid H Minus reverse transcriptase (Thermo Fisher), were then added to the previous sample, bringing the reaction buffer down to 1X. The reaction was incubated 30 min at 42 °C followed by heat-inactivation for 30 sec at 95 °C.
- PCR reagents comprising 0.5X Fidelity Buffer and 0.02 U/µL HiFi HotStart Polymerase (Kapa Biosystems), 3 mM disodium EDTA (Thermo Fisher), additional trimethylglycine to 1 M (MilliporeSigma), 0.025 U/µL E. coli uracil-DNA glycosylase (New England Biolabs), and indexed PCR primers (as described in Foley et al.; Integrated DNA Technologies) were then added to the previous sample, doubling the volume and reducing the concentrations of previous reagents by half.
- the mixture was incubated 10 min at 37 °C for removal of the template-switch oligonucleotide followed by 45 sec at 98 °C for initial denaturation, then 19 PCR cycles comprising 15 sec at 98 °C, 30 sec at 60 °C, and 15 sec at 72 °C, followed by a final extension of 1 min at 72 °C.
- the resulting libraries were purified with a 1.8X volume of AMPure XP bead suspension (Beckman Coulter) according to the manufacturer’s instructions.
- the purified libraries were sequenced with a MiSeq Nano kit, 300 cycles (Illumina).
- results show that a greater proportion of sequence reads passed the Illumina chastity filter, which discards reads with poor quality in the first 25 bases, when using a modified oligo(dT) primer of the present invention, compared to the reference primer. This shows that the modified oligo(dT) primer causes fewer reads to be wasted on unsequenceable artifacts and therefore generates more usable data per sequencing run.
- FIG. 5 compares a modified oligo(dT) primer in accordance with the present disclosure with a standard oligo(dT) primer in the data-processing step of trimming poly(A)-derived cDNA sequence from the sequence reads.
- ERCC ExFold RNA Spike-In Mixes were processed by the standard Smart-3SEQ protocol (v1) with a 30T reverse-transcription primer (SEQ ID NO: 5) or by a modified protocol (v2) in which the primer was punctuated by two guanine substitutions (SEQ ID NO: 6), with two replicates of each ERCC mix, and sequenced on the NextSeq 500 (Illumina), as described above.
- Illumina adapter sequences were removed by the bcl2fastq software, then the poly(A) sequence was removed by CutAdapt 4.4 with default settings but different base sequences to be trimmed.
- Table 2 shows the full 30-base reverse complement of each version of the oligo(dT) section of the primer, or the first 9 bases that were the same in both versions.
- the sequence reads in which the target sequence was identified and trimmed were aligned to the ERCC reference sequences (NIST) by Novoalign 3.09.04 (Novocraft) with default settings.
- the position of the trimmed end of each aligned sequence read was then compared with the last non-A base of the reference sequence to which it aligned; the aligner was allowed to soft-clip each read but the position error was calculated from the unclipped, trimmed read end position, with a negative offset corresponding to overtrimming, i.e. removing all the poly(A) sequence as well as part of the read that should have been derived from the useful non-A transcript sequence.
- FIG. 7 shows the results of a differential expression experiment. Varying amounts of Universal Human Reference RNA (Agilent Technologies) and Human Brain Reference RNA (Thermo Fisher Scientific) were processed by a modified version of the Smart-3SEQ protocol as described above, but with the template-switch oligonucleotide removal step reduced to 6 min at 37 °C, using a reinforced reverse transcription primer with two evenly spaced thymine bases in the dT30 portion replaced by guanine, with two technical replicates per condition.
- the resulting libraries were sequenced with a NextSeq 500 High Output v2.5 kit, 75 cycles (Illumina) and aligned to the hg38 human reference genome with GENCODE transcription annotations using STAR aligner. Correctly oriented gene-aligned read counts were used to calculate differential expression between the two RNA samples using DESeq2, in a separate analysis for each amount of input RNA. The results were compared with previous data from a TaqMan qPCR assay of 999 genes in the same RNA samples (MAQC Consortium, 2006). Smart-3SEQ with the reinforced reverse transcription primer showed strong concordance with TaqMan qPCR using very low amounts of input RNA, approaching a single cell (10 pg). These results demonstrate a successful application of the modified oligo(dT) primer of the present invention in a complete RNA sequencing library preparation protocol.
- an “embodiment” may refer to an illustrative representation of a process or article or component in which a disclosed concept or feature may be provided or embodied, or to the representation of a manner in which just the concept or feature may be provided or embodied.
- illustrated embodiments are to be understood as examples (unless otherwise stated), and other manners of embodying the described concepts or features, such as may be understood by one of ordinary skill in the art upon learning the concepts or features from the present disclosure, are within the scope of the disclosure.
- the present subject matter covers such modifications and variations as come within the scope of the appended claims and their equivalents.
- complement refers to a nucleotide (e.g., RNA or DNA) or a sequence of nucleotides capable of base pairing with a complementary nucleotide or sequence of nucleotides.
- Complementarity is determined by the ability of an associated nitrogenous base of a nucleotide, also referred to as a “nucleobase” or simply a “base”, to hydrogen bond with the nitrogenous base of a different nucleotide, e.g., a nucleotide on a different nucleic acid. This interaction may also be referred to as “base pairing”.
- the base adenine binds to thymine or uracil and the base guanine binds to cytosine.
- Adenine may therefore be referred to as the complement of thymine or uracil and guanine may be referred to as the complement of cytosine, and vice versa.
- a complement may include a sequence of nucleotides that base pair with corresponding complementary nucleotides of a second nucleic acid sequence.
- barcode or “index” in the context of a subsequence of an oligonucleotide primer as described herein refers to one or more nucleotide sequences that are used to identify a cell or a plurality of cells with which the barcode is associated. Barcodes encoded in a primer may be from 4-40 nucleotides in length, including any length within these ranges, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or 40 nucleotides in length. A barcode is considered “unique” when the barcode is present in about one cell in a population of cells.
- nucleic acid refers to a polymer of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be used herein as shorthand for deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
- nucleoside refers, in the usual and customary sense, to a glycosylamine including a nitrogenous base, also referred to as a “nucleobase”, and a five-carbon sugar, i.e., ribose or deoxyribose.
- nucleosides include cytidine, uridine, adenosine, guanosine, thymidine and inosine.
- nucleotide refers, in the usual and customary sense, to the monomeric units of nucleic acids, each unit consisting of a nucleoside and a phosphate.
- base as used herein with reference to sequences of nucleic acids refers to the nucleobase moiety of the nucleoside, e.g., cytosine, adenine, guanine, thymine, and uracil.
- oligonucleotide refers to the alphabetical representation of a polynucleotide molecule; alternatively, the term may be applied to the polynucleotide molecule itself.
Abstract
The present invention provides modified oligo(dT) primers containing at least one deliberate mismatch within the dT span of the primer which provides improved stability and replicability of the resulting cDNA molecules. In the context of RNA sequencing applications, incorporation of the modified oligo(dT) primers results in fewer sequence reads lost to PCR artifacts and easier detection of the end position of each sequence read.
Description
IMPROVEMENT TO CDNA LIBRARY PRIMING
FIELD OF THE INVENTION
[0001] The present invention relates generally to methods for preparing cDNA by reverse transcription of an RNA template and more particularly to improvements to oligo(dT)-priming, which is used to initiate first-strand cDNA synthesis in the reverse transcription reaction.
US GOVERNMENT SPONSORSHIP
[0002] This invention was made with Government support under contracts CA193694 and CA233254 awarded by the National Institutes of Health. The Government has certain rights in the invention.
BACKGROUND
[0003] In many forms of research studying ribonucleic acid (RNA), each original strand of RNA must have its base sequence reverse-transcribed into a complementary DNA (cDNA) molecule, because DNA is more amenable to various standard techniques such as amplification and sequencing. Like other forms of DNA synthesis, reverse transcription begins by adding nucleotides to the 3' end of a short primer strand of DNA that has hybridized to a complementary sequence on the RNA strand. Many approaches targeted at recovering a diverse library of different cDNAs use a short sequence of deoxythymidines (oligo(dT)) as the primer, because it hybridizes to the longer tail of adenines (poly(A)) that is found on almost every mature non-ribosomal RNA. A downside of this approach is that every base sequence in the resulting cDNA library then includes a homopolymer of continuous dT bases, which creates a number of different problems. For example, the long stretch of A:T pairs in the resulting cDNA is unstable and prone to DNA breathing, strand invasion, and mispriming artifacts during PCR. In addition, the homopolymer can stall the polymerase, reducing the yield of successful molecules during PCR as well as during sequencing. Further, the precise boundaries of the homopolymer sequence are difficult to identify in sequence data. This invention modifies the oligo(dT) primer to address these problems and related disadvantages associated with standard oligo(dT) primers.
BRIEF SUMMARY
[0004] The present invention provides oligonucleotide primers and related compositions and methods that improve the stability and replicability of cDNA molecules produced during reverse transcription. When incorporated into protocols for RNA sequencing, the primers described here result in fewer sequence reads lost to artifacts during amplification and sequencing and provide easier detection of the end position of each sequence read after sequencing.
[0005] Accordingly, in embodiments, the invention provides synthetic oligonucleotide primers including a 5' region and a 3' region, where the primer includes a span of from 10-40 thymine bases, and where the span of thymine bases is contiguous except for the substitution of from 1-5 non-contiguous thymine bases with a non-thymine base.
[0006] The synthetic oligonucleotide primer may also include where the 5' region includes a sequence that is at least partially complementary to a predetermined primer or adapter sequence.
[0007] The synthetic oligonucleotide primer may also include where the primer does not contain a sequence of more than 2 contiguous non-thymine bases in its 5' region outside the span of from 10-40 thymine bases.
[0008] The synthetic oligonucleotide primer may also include where the primer is from 10 to 70 nucleotides in length, or from 10 to 60 nucleotides, or from 10 to 50 nucleotides, or from 10 to 40 nucleotides, or from 10 to 30 nucleotides, or from 10 to 20 nucleotides in length.
[0009] The synthetic oligonucleotide primer may also include where the 1-5 non-thymine bases are each independently selected from cytosine and guanine.
[0010] The synthetic oligonucleotide primer may also include where the 1-5 non-thymine bases comprise at least one adenine.
[0011] The synthetic oligonucleotide primer may also include where the 5’ region includes one or more variable sequence regions configured to be unique for each primer in a set of primers.
[0012] The synthetic oligonucleotide primer may also include where the variable sequence region is an index or barcode sequence.
[0013] The synthetic oligonucleotide primer may also further include a terminal 3' base selected from the group consisting of adenine, cytosine, and guanine.
[0014] The synthetic oligonucleotide primer may also further include two terminal 3' bases in the configuration 3'-NV, where N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine and V is a base selected from the group consisting of adenine, cytosine, and guanine.
[0015] The synthetic oligonucleotide primer may also further include a blocking group such as a biotin molecule or a non-natural nucleotide covalently attached to the 5' end of the primer.
[0016] The synthetic oligonucleotide primer may also further include where the primer is covalently attached at its 5' terminal end to a solid surface such as a bead. The primer may further include a linker sequence between the bead and the 5' terminal end.
[0017] Other technical features of the synthetic oligonucleotide primers described here may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.
[0018] The invention also provides methods for preparing complementary deoxyribonucleic acid (cDNA), the methods comprising hybridizing a synthetic oligonucleotide primer as described herein to a target ribonucleic acid (RNA) and synthesizing a first cDNA strand complementary to at least a portion of the RNA molecule. In embodiments, the 5' region of the synthetic oligonucleotide primer includes a sequence complementary to at least a portion of a sequencing primer or adapter. In embodiments, the 5' terminal end of the synthetic oligonucleotide primer is covalently attached to a blocking group such as a biotin molecule or a non-natural nucleotide. In embodiments, the target RNA is one of a plurality of fragmented RNA molecules. In embodiments, the synthesizing a first cDNA strand is performed by a reverse transcriptase, optionally a recombinant Moloney murine leukemia virus (MMLV) derived reverse transcriptase. In embodiments, the recombinant MMLV derived reverse transcriptase lacks an RNase H domain when compared to the native MMLV enzyme, for example the RevertAid H Minus reverse transcriptase available from ThermoFisher Scientific, and similar MMLV enzymes.
[0019] The invention also provides a kit of parts for preparing complementary deoxyribonucleic acid (cDNA) comprising a synthetic oligonucleotide primer as described herein and optionally one or more of a reactant mixture, the reactant mixture including deoxynucleoside triphosphates and a source of magnesium, a reverse transcriptase, a template-switch oligonucleotide, and PCR reagents, the PCR reagents including primers of which one is at least partially complementary to the synthetic oligonucleotide primer as described herein and of which the other is optionally at least partially complementary to an adapter sequence to be added to the 3' end of the nascent cDNA strand or to a target sequence within the cDNA, a DNA polymerase, and a buffer solution. Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] FIG. 1A illustrates hybridization of a standard oligo(dT) primer to the poly(A) tail of an unknown target mRNA molecule. This primer is not part of the present invention and is illustrated for reference.
[0021] FIG. 1B illustrates double-stranded cDNA molecules produced following primer extension of the primers in FIG. 1A.
[0022] FIG. 2A illustrates a modified oligo(dT) primer of the invention which includes a 5' anchor and its hybridization to the poly(A) tail of an unknown target mRNA molecule as well as the double-stranded cDNA resulting from primer extension.
[0023] FIG. 2B illustrates reduction of “DNA breathing” provided by an embodiment of the modified oligo(dT) primers of the invention.
[0024] FIG. 3 illustrates an advantage of the modified oligo(dT) primers in accordance with one embodiment.
[0025] FIG. 4 depicts the results of an experiment demonstrating the higher quality sequence reads obtained with embodiments of modified oligo(dT) primers of the invention which include a 5' anchor, compared to a reference standard anchor oligo(dT) primer.
[0026] FIG. 5 illustrates the performance of a modified oligo(dT) primer in accordance with the present disclosure compared with that of a standard oligo(dT) primer in the data-processing step of trimming poly(A)-derived cDNA sequence from the sequence reads.
[0027] FIG. 6 shows only the proportions of correctly trimmed reads, zero position error, for the modified and standard oligo(dT) primers of FIG. 5.
[0028] FIG. 7 shows the results of a differential expression experiment utilizing modified oligo(dT) primers of the invention which include a 5' anchor in accordance with one embodiment.
DETAILED DESCRIPTION
[0029] In order for reverse transcription to proceed, it is necessary to introduce a short DNA primer that can hybridize to the target RNA strand. This is typically a short sequence of deoxythymidines, referred to as an “oligo(dT)” primer or first strand cDNA primer. Oligo(dT) primers hybridize to the polyadenosine sequence found on most mature non-ribosomal RNAs. However, it may be undesirable for the cDNA molecules of the resulting library to contain long stretches of A:T pairs because their instability renders them prone to DNA breathing, strand invasion, and mispriming artifacts during amplification. In addition, the homopolymer can stall the polymerase, reducing the yield both during the initial amplification of the library and during sequencing. Finally, the precise boundaries of the homopolymer sequences are difficult to identify in sequence data.
[0030] To address these issues, the present invention provides a modified oligo(dT) primer that is stabilized by substitution of discrete thymine bases within the contiguous oligo(dT) span. In particular, discrete thymine bases are substituted with non-thymine bases. These non-thymine bases are deliberate mismatches; that is, they are not complementary to, and do not form Watson-Crick base pairs with, the adenine bases of the poly(A) tail of the target RNA, also referred to herein as the RNA template. Although each mismatch potentially weakens the affinity of the oligo(dT) primer for its poly(A) target, which is disadvantageous, the present inventors found, unexpectedly, that a certain number of deliberate mismatches provides benefits in the form of an improvement in the stability and replicability of the resulting cDNA molecules. In addition, cDNA libraries created using modified oligo(dT) primers according to the present invention contain fewer sequence reads lost to PCR artifacts and provide for easier detection of the end position of each sequence read.
[0031] Accordingly, in embodiments, the invention provides modified oligo(dT) primers comprising 1-5 or 1-3 non-thymine bases, preferably guanine or cytosine, each replacing a thymine base within a span of about 10-40 otherwise contiguous thymine bases at the 3' end of the primer, referred to herein as the “oligo(dT) span”. In embodiments, the oligo(dT) span may comprise about 30 thymine bases, or from 12-30 or from 18-30 thymine bases. In embodiments, the modified oligo(dT) primer contains 2 or 3 non-thymine bases, preferably guanine or cytosine, within an oligo(dT) span of about 30 thymine bases. Preferably, the non-thymine bases are located at least 2-5 or at least 4-5 bases from the 3' terminal end of the primer. In embodiments, the non-thymine bases are placed within the oligo(dT) span at least about 2-20 nucleotides from the 3' end of the primer. In embodiments, the non-thymine bases are placed at least about 2-20 nucleotides from each other where the modified oligo(dT) primer contains more than one non-thymine base within the oligo(dT) span. In embodiments where the modified oligo(dT) primer contains more than one non-thymine base within the oligo(dT) span, the non-thymine bases are placed about 5, about 10, about 15, or about 20 nucleotides from each other. In embodiments, the spacing between the non-thymine bases may be variable.
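The span layout described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_modified_oligo_dt`, the 0-based position convention (counted from the 5' end of the span), and the default substitution positions are hypothetical and are not taken from the sequence listing.

```python
def build_modified_oligo_dt(span_len=30, sub_positions=(9, 19), sub_base="G"):
    """Return an oligo(dT) span, written 5'->3', with thymines replaced at sub_positions.

    Positions are 0-based from the 5' end of the span (an illustrative convention,
    not one defined in the disclosure).
    """
    if not 10 <= span_len <= 40:
        raise ValueError("span length must be 10-40 bases")
    if sub_base not in ("A", "C", "G"):
        raise ValueError("substituted base must be a single non-thymine base")
    positions = sorted(set(sub_positions))
    if not 1 <= len(positions) <= 5:
        raise ValueError("between 1 and 5 substitutions are expected")
    if any(p < 0 or p >= span_len for p in positions):
        raise ValueError("substitution position outside the span")
    # Substituted thymines must be non-contiguous (no two adjacent positions).
    if any(b - a == 1 for a, b in zip(positions, positions[1:])):
        raise ValueError("substituted positions must be non-contiguous")
    span = ["T"] * span_len
    for p in positions:
        span[p] = sub_base
    return "".join(span)


print(build_modified_oligo_dt())
# TTTTTTTTTGTTTTTTTTTGTTTTTTTTTT
```

With the default arguments, both guanines sit at least 10 bases from the 3' end of the span, which is consistent with the stated preference for keeping substitutions several bases away from the 3' terminus.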
[0032] In embodiments, the modified oligo(dT) primer may further comprise a biotin moiety covalently attached to the 5' end of the primer.
[0033] In embodiments, the modified oligo(dT) primers described here may incorporate one or two random bases at the 3' end of the primer as an "anchor" complementary to the last one or two non-adenine bases of a template RNA in order to avoid priming cDNA synthesis farther up the RNA's poly(A) tail. Thus, in embodiments, a modified oligo(dT) primer as described here may further comprise one or two 3' anchoring bases which hybridize to the one or two terminal bases of the target mRNA molecule preceding the poly(A) tail. Accordingly, the invention provides modified oligo(dT) primers comprising a 3' terminal base V, wherein V is selected from the group consisting of adenine, cytosine, and guanine; or comprising two 3' terminal bases 3'-NV wherein N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine and V is selected from the group consisting of adenine, cytosine, and guanine.
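Because N and V are degenerate positions, a single anchored primer design corresponds to a small pool of concrete sequences. The sketch below enumerates that pool. It assumes, as a reading of the 3'-NV notation, that when the primer is written 5'->3' the V base sits directly after the oligo(dT) span and N is the 3'-terminal base; the function name and the example span are hypothetical.

```python
from itertools import product

V_BASES = "ACG"   # V: any base except thymine
N_BASES = "ACGT"  # N: any base


def expand_anchored_primers(span):
    """List every concrete 5'->3' sequence for span + V + N (3'-NV anchor)."""
    return [span + v + n for v, n in product(V_BASES, N_BASES)]


# Illustrative 30-base span with two guanine substitutions.
example_span = "T" * 9 + "G" + "T" * 9 + "G" + "T" * 10
pool = expand_anchored_primers(example_span)
print(len(pool))  # 12
print(pool[0][-5:], pool[-1][-5:])  # TTTAA TTTGT
```

Since V can never be thymine, no member of the pool can simply extend the run of thymines, which is what keeps the primer from annealing farther up the poly(A) tail.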
[0034] In embodiments, the invention also provides a modified oligo(dT) primer comprising a terminal 3'-V or a terminal 3'-NV anchor and from 1-5 non-thymine bases, preferably selected from cytosine and guanine, located within a 3' span of contiguous thymine bases, for example a span of 20-50 contiguous thymine bases, or a span of about 30 contiguous thymine bases at the 3' end of the primer.
[0035] The benefits of utilizing the modified oligo(dT) primers disclosed herein occur not during reverse transcription itself but at several points afterward. First, the modified oligo(dT) primers disclosed herein result in cDNA having improved stability and replicability. Without being bound by any particular theory, after the first strand of cDNA has been synthesized by reverse transcription, one or more duplicate DNA strands with an analogous (uracil replaced by thymine) or complementary (Watson-Crick paired DNA bases in the opposite orientation) base sequence to the original RNA template may be synthesized by a variety of methods. Polymerases tend to stall when they encounter a long homopolymer in the template sequence, such as a long oligo(dT) sequence. Because the modified oligo(dT) primers create an interruption in what would otherwise be such a long oligo(dT) homopolymer, the synthesis of these new strands may be more efficient. This gain in efficiency accrues exponentially during the polymerase chain reaction (PCR) amplification of the cDNA because it is repeated each time the cDNA sequence is replicated, i.e. in every cycle of PCR or every cycle of cluster generation on a sequencing flow cell.
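The effect of the substitutions on homopolymer length can be made concrete with a short calculation. The following sketch compares the longest uninterrupted thymine run in a standard 30-base oligo(dT) span with that of a span carrying two guanines; the function name and the substitution positions are illustrative assumptions, not values from the disclosure.

```python
from itertools import groupby


def longest_run(seq, base="T"):
    """Length of the longest uninterrupted run of `base` in `seq` (0 if absent)."""
    return max((sum(1 for _ in group) for b, group in groupby(seq) if b == base),
               default=0)


standard_span = "T" * 30
modified_span = "T" * 9 + "G" + "T" * 9 + "G" + "T" * 10  # illustrative positions

print(longest_run(standard_span))  # 30
print(longest_run(modified_span))  # 10
```

In this example, two evenly spaced substitutions cut the longest homopolymer that a polymerase must traverse from 30 bases to 10.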
[0036] Second, the modified oligo(dT) primers disclosed herein result in fewer sequence reads lost to PCR artifacts. Without being bound by any particular theory, the double-stranded cDNA molecule resulting from a standard oligo(dT) primer contains a long stretch of A:T base pairs, which have lower stability than C:G pairs, rendering this portion of the molecule prone to so-called “DNA breathing”, which refers to a random fluctuating separation of the strands. This renders the cDNA susceptible to strand invasion, which refers to the annealing of a foreign DNA oligonucleotide to one of the separated strands. If this occurs in the product of an ongoing polymerization reaction, it may result in priming the synthesis of the foreign cDNA strand, thereby creating various byproducts. The modified oligo(dT) primers described here interrupt the unstable A:T stretch, preferably with stabilizing C:G pairs, and therefore reduce the likelihood of strand invasion and reduce PCR artifacts.
[0037] Third, the modified oligo(dT) primers disclosed herein provide for easier detection of the end position of each sequence read. Without being bound by any particular theory, if the cDNA is sequenced, the sequence read may proceed all the way through the distinct cDNA sequence and continue into the oligo(A:T) as well. For many tasks it is then necessary to identify where the oligo(A:T) sequence begins in order to trim it off the sequence reads before searching for a matching sequence since the poly(A) tail is not transcribed from the genomic sequence. It can be difficult to identify the true site where the homopolymer sequence begins in a sequence read because errors in replication or sequencing may cause an adenine derived originally from the poly(A) tail to be misread as a different base, or a different base derived from distinct cDNA sequence to be misread as adenine. The presence of non-thymine bases at known positions in the oligo(dT) sequence of the amplified cDNA molecules, and therefore non-adenine bases at known positions in the complementary oligo(dA) sequence, allows any standard matching algorithm to precisely identify the beginning of that sequence even in the presence of errors.
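The boundary-finding idea in this paragraph can be illustrated with a simple mismatch-tolerant search. This is a minimal sketch, not the trimming tool used in the experiments: the function names, the mismatch and overlap thresholds, the toy transcript sequence, and the substitution positions are all assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tail_start(read, tail, max_mismatches=2, min_overlap=12):
    """Return the read index where the primer-derived tail best matches, or None.

    Every start position leaving at least `min_overlap` bases is scored by its
    number of mismatches against the expected tail; the lowest score wins and
    ties go to the leftmost position.
    """
    best_pos, best_mm = None, max_mismatches + 1
    for i in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - i, len(tail))
        mismatches = sum(a != b for a, b in zip(read[i:i + overlap], tail))
        if mismatches < best_mm:
            best_pos, best_mm = i, mismatches
    return best_pos


transcript = "GATTACAGGCTTA"  # toy transcript end; note its final base is an A
modified_tail = reverse_complement("T" * 9 + "G" + "T" * 9 + "G" + "T" * 10)
standard_tail = "A" * 30

print(find_tail_start(transcript + modified_tail, modified_tail))  # 13 (true boundary)
print(find_tail_start(transcript + standard_tail, standard_tail))  # 12 (one base overtrimmed)
```

In this toy case the plain poly(A) pattern cannot tell the transcript's own terminal adenine from the tail, so the read is overtrimmed by one base, whereas the cytosines at known positions pin the boundary to a single offset.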
[0038] The modified oligo(dT) primers described here may be substituted for standard oligo(dT) primers in various protocols requiring conversion of an RNA template to cDNA followed by amplification, including without limitation RNA-seq protocols. Incorporation of the modified oligo(dT) primers described here into RNA-seq protocols that sequence cDNA derived from the 3' end of the original RNA, nearest to the poly(A) tail, is expected to be particularly advantageous.
[0039] In an exemplary embodiment, the modified oligo(dT) primers described here are substituted for standard oligo(dT) primers in a Smart-3SEQ protocol for sequencing RNA (Foley et al., Genome Res. 2019 Nov; 29(11): 1816-1825). In an initial step, target RNA is fragmented, for example using a divalent cation and elevated temperature, which may be for example 80 °C or 95 °C in the presence of magnesium for about 1-5 minutes. Without further purification or enrichment, the fragmented RNA is subjected to reverse transcription (RT). The RT reaction is primed by hybridizing a modified oligo(dT) primer of the invention to the fragmented RNA. In embodiments, the 5' region of the modified oligo(dT) primer includes a sequence complementary to at least a portion of a sequencing primer or adapter. For example, incorporating a partial downstream sequencing adapter into the first cDNA strand eliminates the need to incorporate the adapter in a subsequent ligation reaction. The RT reaction is performed by an MMLV-derived reverse transcriptase, which allows for incorporation of a second adapter primer into the cDNA. Thus, after extending the first cDNA strand, MMLV-derived reverse transcriptase typically adds several non-templated bases at the 3' end, which are primarily cytosines. This provides a target for hybridization with a second oligonucleotide containing a short 3' oligo(G) and the innermost portion of an upstream sequencing adapter. The MMLV reverse transcriptase then performs a “template switch”, further extending the cDNA strand with sequence complementary to the sequencing adapter in the second oligonucleotide. Using this protocol, the reverse transcription produces a cDNA strand with adapter sequences at both ends in a single incubation. Next, the adapters are extended to full length and the cDNA molecules are amplified using a PCR reaction with primers complementary to the adapter sequences. The amplified double-stranded cDNA library may be further purified, or optionally concentrated and then purified, prior to sequencing.
[0040] FIG. 1A illustrates hybridization of a standard oligo(dT) primer 104a, 104b, 104c to the poly(A) tail of an unknown target mRNA 102 molecule. A standard primer may optionally include an adapter 106 sequence. The figure illustrates the random nature of primer hybridization when using standard oligo(dT) primers. Thus, the primer is depicted as hybridized in three different locations along the poly(A) tail of the target mRNA sequence. It is understood that these three locations are exemplary among many possibilities. The standard oligo(dT) primer is not within the scope of the present invention and is depicted for reference.
[0041] FIG. 1B illustrates double-stranded cDNA molecules produced following primer extension of the standard oligo(dT) primers in FIG. 1A. As shown graphically in the figure, the resulting cDNA sequences disadvantageously contain long and variable poly(A:T) homopolymers 108a, 108b, 108c.
[0042] FIG. 2A illustrates a modified oligo(dT) primer according to one embodiment and its hybridization to the poly(A) tail of an unknown target mRNA 102 molecule. As illustrated, the modified oligo(dT) primer includes two non-thymine bases, both guanine (G), a 3' anchor moiety, 3'-NV, and an optional 5' adapter sequence (Y). The anchor portion of the modified oligo(dT) primer is shown hybridized to the two terminal bases of the target mRNA molecule preceding its poly(A) tail. “V” is a base selected from the group consisting of adenine, cytosine, and guanine. “N” is a base selected from the group consisting of thymine, adenine, cytosine, and guanine. The optional 5' sequence (Y) is a predefined adapter sequence that may include a sequence at least partially complementary to an amplification or sequencing primer.
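The degenerate anchor bases defined above can be made concrete with a short sketch. It expands the codes V (adenine, cytosine, or guanine) and N (any of the four bases) into every sequence they encode. The example span is hypothetical and much shorter than a real primer, and writing the anchor 5'→3' as V followed by N is an assumption about how the 3'-NV notation maps onto a written sequence.

```python
from itertools import product

# Degenerate codes as defined in the description:
# V = adenine, cytosine or guanine; N = any of the four bases.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}


def expand_degenerate(seq: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(DEGENERATE[b] for b in seq))]


# Hypothetical, shortened 3' region written 5'->3': a thymine span with two
# guanine substitutions followed by the two anchor positions (assumed V, N).
example = "TTTTGTTTTGTTTTVN"
variants = expand_degenerate(example)

for variant in variants:
    print(variant)
```

Each printed variant shares the same interrupted thymine span and differs only in the two anchor positions, which is why a single degenerate primer design corresponds to a pool of synthesized molecules.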
[0043] Also shown is the cDNA molecule 206 resulting from primer extension and second-strand synthesis or amplification 202. The modified oligo(dT) primers interrupt the unstable A:T homopolymer of the cDNA molecule with stabilizing C:G pairs, thereby reducing the likelihood of strand invasion and reducing PCR artifacts.
[0044] FIG. 2B illustrates the disadvantageous “DNA breathing” 208 that may occur within the oligo(A:T) span of a cDNA synthesized using a standard oligo(dT) primer, and the substantial reduction of this phenomenon 212 provided by an embodiment of the modified oligo(dT) primers described here.
[0045] FIG. 3 illustrates an advantage of the modified oligo(dT) primers in accordance with one embodiment. The top schematic exemplifies the ambiguity in identifying the beginning of the poly(A) homopolymer, and the bottom schematic illustrates how the precise boundary can be identified using a modified oligo(dT) primer as described here.
[0046] FIG. 4 shows the results of a modified Smart-3SEQ protocol (Foley et al., Genome Res. 2019 Nov; 29(11): 1816-1825) performed on 3 nanograms of Universal Human Reference RNA (Agilent Technologies) with a standard oligo(dT) primer having a 3' tail of 30 contiguous thymine bases (0G) as the reference primer and three embodiments of a modified oligo(dT) primer of the present invention, each having the same sequence as the reference primer except for 1, 2, or 3 guanine bases within the dT30 portion of the primer (1G, 2G, 3G). Briefly, total RNA was fragmented by incubating 1 min at 95 °C in 1.67X reaction buffer (50 mM Tris-HCl, 50 mM KCl, 4 mM MgCl2, 10 mM DTT, pH 8.3 at 1X; Thermo Fisher) in the presence of 1 mM dNTPs (Kapa Biosystems) and 583 nM oligo(dT) primer of the specified design, followed by 1 min at 25 °C to hybridize the primer. Reverse transcription reagents, comprising 1 M trimethylglycine (MilliporeSigma), 4 mM additional MgCl2 (MilliporeSigma), 1 U/µL RNase inhibitor (Thermo Fisher), 1 µM template-switch oligonucleotide (as described in Foley et al., but with thymidine residues replaced by uracil; Integrated DNA Technologies), and 10 U/µL RevertAid H Minus reverse transcriptase (Thermo Fisher), were then added to the previous sample, bringing the reaction buffer down to 1X. The reaction was incubated 30 min at 42 °C followed by heat-inactivation for 30 sec at 95 °C. PCR reagents, comprising 0.5X Fidelity Buffer and 0.02 U/µL HiFi HotStart Polymerase (Kapa Biosystems), 3 mM disodium EDTA (Thermo Fisher), additional trimethylglycine to 1 M (MilliporeSigma), 0.025 U/µL E. coli uracil-DNA glycosylase (New England Biolabs), and indexed PCR primers (as described in Foley et al.; Integrated DNA Technologies), were then added to the previous sample, doubling the volume and reducing the concentrations of previous reagents by half. The mixture was incubated 10 min at 37 °C for removal of the template-switch oligonucleotide, followed by 45 sec at 98 °C for initial denaturation, then 19 PCR cycles comprising 15 sec at 98 °C, 30 sec at 60 °C, and 15 sec at 72 °C, followed by a final extension of 1 min at 72 °C. The resulting libraries were purified with a 1.8X volume of AMPure XP bead suspension (Beckman Coulter) according to the manufacturer’s instructions. The purified libraries were sequenced with a MiSeq Nano kit, 300 cycles (Illumina).
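The successive dilutions in this protocol can be followed with a small arithmetic sketch. It assumes that the 583 nM primer and 1 mM dNTP figures refer to the fragmentation mixture at 1.67X buffer, that adding the reverse transcription reagents is a 1.67-fold dilution (1.67X to 1X), and that adding the PCR reagents is a 2-fold dilution (volume doubled). The function and variable names are illustrative only.

```python
def dilute(concentration: float, fold: float) -> float:
    """Concentration after a `fold`-fold dilution (C1 * V1 = C2 * V2)."""
    return concentration / fold


RT_FOLD = 1.67   # assumed: buffer goes from 1.67X to 1X when RT reagents are added
PCR_FOLD = 2.0   # PCR reagents double the reaction volume

primer_nM = 583.0  # oligo(dT) primer in the fragmentation mixture (assumed stage)
dntp_mM = 1.0      # dNTPs in the fragmentation mixture (assumed stage)

primer_rt = dilute(primer_nM, RT_FOLD)
primer_pcr = dilute(primer_rt, PCR_FOLD)
dntp_rt = dilute(dntp_mM, RT_FOLD)
dntp_pcr = dilute(dntp_rt, PCR_FOLD)

print(f"primer: {primer_rt:.0f} nM during RT, {primer_pcr:.0f} nM during PCR")
print(f"dNTPs:  {dntp_rt:.2f} mM during RT, {dntp_pcr:.2f} mM during PCR")
```

Under these assumptions the primer is present at roughly 350 nM during reverse transcription and roughly 175 nM during PCR; the figures would differ if the stated concentrations were intended as final concentrations at a later step.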
[0047] Technical replicates of the same oligo(dT) primer design are grouped vertically. The sequences of the primers are shown in Table 1. Each of the primers also contained a biotin moiety linked to the 5' end of the molecule to prevent concatenation of additional adapters by template-switching.
[0048] The results show that a greater proportion of sequence reads passed the Illumina chastity filter, which discards reads with poor quality in the first 25 bases, when using a modified oligo(dT) primer of the present invention, compared to the reference primer. This shows that the modified oligo(dT) primer causes fewer reads to be wasted on unsequenceable artifacts and therefore generates more usable data per sequencing run.
[0049] Table 1: Sequences of primers used in validation experiment
Seq Identifier | Name | SEQUENCE |
---|---|---|
SEQ ID NO: 1 | 0G RT Primer | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTV |
SEQ ID NO: 2 | 1G RT Primer | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTTTTTTGTTTTTTTTTTTTTTV |
SEQ ID NO: 3 | 2G RT Primer | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTTGTTTTTTTTGTTTTTTTTTV |
SEQ ID NO: 4 | 3G RT Primer | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTGTTTTTTTGTTTTTTTGTTTTTTV |
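As a sanity check on the primer designs in Table 1, the following sketch counts the non-thymine substitutions in the oligo(dT) portion of each primer and confirms that no two substitutions are adjacent. Treating the shared 5' sequence ending in GATCT as the adapter is an assumption made for this sketch, since the table does not mark the boundary; the helper name `describe` is hypothetical.

```python
# Shared 5' portion of every Table 1 primer. Treating this as the adapter is an
# assumption of this sketch; the table itself does not mark the boundary.
ADAPTER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

# Sequences as listed in Table 1 (each entry spans two lines in the table).
PRIMERS = {
    "0G": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTT" "TTTTTTTTTTTTTTTTTTTTTTTV",
    "1G": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTT" "TTTTTTTTGTTTTTTTTTTTTTTV",
    "2G": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTT" "TTTTGTTTTTTTTGTTTTTTTTTV",
    "3G": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTT" "GTTTTTTTGTTTTTTTGTTTTTTV",
}


def describe(seq: str) -> dict:
    """Summarize the oligo(dT) span between the assumed adapter and the 3' anchor."""
    if not seq.startswith(ADAPTER):
        raise ValueError("primer does not start with the assumed adapter")
    span = seq[len(ADAPTER):-1]  # drop the adapter and the single 3' anchor base
    subs = [i for i, base in enumerate(span) if base != "T"]
    return {
        "span_length": len(span),
        "substitutions": len(subs),
        "substituted_bases": "".join(span[i] for i in subs),
        # True when no two substituted positions are adjacent
        "non_contiguous": all(b - a > 1 for a, b in zip(subs, subs[1:])),
        "anchor": seq[-1],
    }


results = {name: describe(seq) for name, seq in PRIMERS.items()}
for name, info in results.items():
    print(name, info)
```

The reference primer reports zero substitutions, while the 1G, 2G, and 3G designs report one, two, and three isolated guanines respectively, each followed by the degenerate V anchor.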
[0050] The following example demonstrates the advantages of the compositions and methods described here in bioinformatic analyses of the sequence reads. FIG. 5 compares a modified oligo(dT) primer in accordance with the present disclosure with a standard oligo(dT) primer in the data-processing step of trimming poly(A)-derived cDNA sequence from the sequence reads. ERCC ExFold RNA Spike-In Mixes (Thermo Fisher Scientific) were processed by the standard Smart-3SEQ protocol (v1) with a 30T reverse-transcription primer (SEQ ID NO: 5) or by a modified protocol (v2) in which the primer was punctuated by two guanine substitutions (SEQ ID NO: 6), with two replicates of each ERCC mix, and the libraries were sequenced on the NextSeq 500 (Illumina), as described above. Illumina adapter sequences were removed by the bcl2fastq software, then the poly(A) sequence was removed by CutAdapt 4.4 with default settings but different base sequences to be trimmed. Table 2 shows the sequences that were trimmed: the full 30-base reverse complement of the oligo(dT) section of each version of the primer, and the first 9 bases that are the same in both versions. The sequence reads in which the target sequence was identified and trimmed were aligned to the ERCC reference sequences (NIST) by Novoalign 3.09.04 (Novocraft) with default settings. The position of the trimmed end of each aligned sequence read was then compared with the last non-A base of the reference sequence to which it aligned. The aligner was allowed to soft-clip each read, but the position error was calculated from the unclipped, trimmed read end position, with a negative offset corresponding to overtrimming, i.e., removing all the poly(A) sequence as well as part of the read that should have been derived from the useful non-A transcript sequence.
[0051] Table 2: Sequences of primers used in validation experiment
Seq Identifier | Name | SEQUENCE |
---|---|---|
SEQ ID NO: 5 | Long trim (v1) | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA |
SEQ ID NO: 6 | Long trim (v2) | AAAAAAAAACAAAAAAAAACAAAAAAAAAA |
SEQ ID NO: 7 | Short trim | AAAAAAAAA |
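The relationship between the trim sequences in Table 2 and the oligo(dT) section of the primers can be illustrated with a reverse-complement sketch. The sequences are copied from Table 2; the helper names `reverse_complement` and `common_prefix` are hypothetical, and the sketch is not the trimming tool itself.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence made of A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def common_prefix(a: str, b: str) -> str:
    """Return the longest prefix shared by two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


# Trim sequences as listed in Table 2.
LONG_TRIM_V1 = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
LONG_TRIM_V2 = "AAAAAAAAACAAAAAAAAACAAAAAAAAAA"

# Reverse-complementing a trim sequence recovers the oligo(dT) section it came from.
oligo_dt_v1 = reverse_complement(LONG_TRIM_V1)
oligo_dt_v2 = reverse_complement(LONG_TRIM_V2)
shared = common_prefix(LONG_TRIM_V1, LONG_TRIM_V2)

print("v1 oligo(dT) section:", oligo_dt_v1)
print("v2 oligo(dT) section:", oligo_dt_v2)
print("shared leading bases:", shared, len(shared))
```

The cytosines in the v2 trim sequence correspond to the guanine substitutions in the primer, and the shared leading run of adenines is what a short trim sequence common to both versions has to match.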
[0052] With both versions of the primer, a large proportion of reads were trimmed to an incorrect end position due to the low complexity of the poly(A) target sequence, and overtrimming was more frequent than undertrimming. However, the modified primer (v2) resulted in substantially more reads that were trimmed to the correct end position, even when using only the short trim sequence common to both primers. FIG. 6 compares only the proportions of correctly trimmed reads, i.e., reads with zero position error. These results show how the modified oligo(dT) primer enables more accurate bioinformatic analysis of data from poly(A) RNA-seq libraries.
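The position-error metric used in this comparison can be sketched as follows, under the sign convention given above (a negative offset means overtrimming). The coordinates are hypothetical values chosen for illustration, not data from the experiment, and the function names are not taken from any tool named in the text.

```python
from collections import Counter


def position_error(trimmed_end: int, last_non_a: int) -> int:
    """Signed offset of the trimmed read end from the last non-A reference base."""
    return trimmed_end - last_non_a


def classify(error: int) -> str:
    """Label a read by the sign of its position error."""
    if error < 0:
        return "overtrimmed"   # useful transcript sequence was removed
    if error > 0:
        return "undertrimmed"  # some poly(A)-derived sequence remains
    return "correct"


# Hypothetical (trimmed_end, last_non_a) reference coordinates for five reads.
reads = [(498, 500), (500, 500), (503, 500), (500, 500), (490, 500)]

summary = Counter(classify(position_error(end, ref)) for end, ref in reads)
fraction_correct = summary["correct"] / len(reads)

print(dict(summary))
print(f"fraction correctly trimmed: {fraction_correct:.2f}")
```

A comparison such as the one in FIG. 6 corresponds to computing the fraction of reads with zero position error separately for each primer version and trim sequence.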
[0053] FIG. 7 shows the results of a differential expression experiment. Varying amounts of Universal Human Reference RNA (Agilent Technologies) and Human Brain Reference RNA (Thermo Fisher Scientific) were processed by a modified version of the Smart-3SEQ protocol as described above, but with the removal of template-switch oligonucleotide reduced to 6 min at 37 °C, using a reinforced reverse transcription primer with two evenly spaced thymine bases in the dT30 portion replaced by guanine, with two technical replicates per condition.
[0054] The resulting libraries were sequenced with a NextSeq 500 High Output v2.5 kit, 75 cycles (Illumina) and aligned to the hg38 human reference genome with GENCODE transcription annotations using STAR aligner. Correctly oriented gene-aligned read counts were used to calculate differential expression between the two RNA samples using DESeq2, in a separate analysis for each amount of input RNA. The results were compared with previous data from a TaqMan qPCR assay of 999 genes in the same RNA samples (MAQC Consortium, 2006). Smart-3SEQ with the reinforced reverse transcription primer showed strong concordance with TaqMan qPCR using very low amounts of input RNA, approaching a single cell (10 pg). These results demonstrate a successful application of the modified oligo(dT) primer of the present invention in a complete RNA sequencing library preparation protocol.
[0055] While various embodiments and aspects of the present invention are shown and described herein, it will be obvious to those skilled in the art that such embodiments and aspects are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention.
[0056] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described. All documents, or portions of documents, cited in the application including, without limitation, patents, patent applications, articles, books, manuals, and treatises are hereby expressly incorporated by reference in their entirety for any purpose.
[0057] The abbreviations used herein have their conventional meaning within the chemical and biological arts. The chemical structures and formulae set forth herein are constructed according to the standard rules of chemical valency known in the chemical arts.
[0058] While the invention herein disclosed has been described by means of specific embodiments and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope set forth in the claims.
[0059] It will be appreciated that the present invention is set forth in various levels of detail in this application. In certain instances, details that are not necessary for one of ordinary skill in the art to understand the invention, or that render other details difficult to perceive, may have been omitted. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting beyond the scope of the appended claims. Unless defined otherwise, technical terms used herein are to be understood as commonly understood by one of ordinary skill in the art to which the disclosure belongs.
[0060] It should be understood that, as described herein, an “embodiment” (such as illustrated in the accompanying Figures) may refer to an illustrative representation of a process or article or component in which a disclosed concept or feature may be provided or embodied, or to the representation of a manner in which just the concept or feature may be provided or embodied. However such illustrated embodiments are to be understood as examples (unless otherwise stated), and other manners of embodying the described concepts or features, such as may be understood by one of ordinary skill in the art upon learning the concepts or features from the present disclosure, are within the scope of the disclosure. Thus, it is intended that the present subject matter covers such modifications and variations as come within the scope of the appended claims and their equivalents.
[0061] The presently disclosed embodiments are to be considered in all respects as illustrative and not restrictive, the scope of the claimed subject matter being indicated by the appended claims, and not limited to the foregoing description or particular embodiments or arrangements described or illustrated herein.
[0062] In the foregoing description and the following claims, the following will be appreciated. The phrases “at least one”, “one or more”, and “and/or”, as used herein, are open-ended expressions that are both conjunctive and disjunctive in operation. The terms “a”, “an”, “the”, “first”, “second”, etc., do not preclude a plurality. For example, the term “a” or “an” entity, as used herein, refers to one or more of that entity. As such, the terms “a” (or “an”), “one or more” and “at least one” can be used interchangeably herein.
[0063] The term “about” when used before a numerical designation, e.g., temperature, time, amount, concentration, and such other, including a range, indicates approximations which may vary by (+) or (-) 10%, 5%, 1%, or any subrange or subvalue therebetween. Preferably, the term “about” means that the value may vary by +/- 10%.
[0064] The term “comprises/comprising” does not exclude the presence of other elements, components, features, regions, integers, steps, operations, etc. Additionally, although individual features may be included in different claims, these may possibly advantageously be combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. By contrast, the transitional phrase “consisting of” excludes any element, step, or ingredient not specified in the claim. The transitional phrase “consisting essentially of” limits the scope of a claim to the specified materials or steps “and those that do not materially affect the basic and novel characteristic(s)” of the claimed invention.
[0065] The term “complement” refers to a nucleotide (e.g., RNA or DNA) or a sequence of nucleotides capable of base pairing with a complementary nucleotide or sequence of nucleotides. Complementarity is determined by the ability of an associated nitrogenous base of a nucleotide, also referred to as a “nucleobase” or simply a “base”, to hydrogen bond with the nitrogenous base of a different nucleotide, e.g., a nucleotide on a different nucleic acid. This interaction may also be referred to as “base pairing”. The base adenine binds to thymine or uracil and the base guanine binds to cytosine. Adenine may therefore be referred to as the complement of thymine or uracil and guanine may be referred to as the complement of cytosine, and vice versa. Thus, a complement may include a sequence of nucleotides that base pair with corresponding complementary nucleotides of a second nucleic acid sequence.
[0066] The term “barcode” or “index” in the context of a subsequence of an oligonucleotide primer as described herein refers to one or more nucleotide sequences that are used to identify a cell or a plurality of cells with which the barcode is associated. Barcodes encoded in a primer may be from 4-40 nucleotides in length, including any length within this range, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or 40 nucleotides in length. A barcode is considered “unique” when the barcode is present in about one cell in a population of cells.
[0067] The term “nucleic acid” refers to a polymer of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be used herein as shorthand for deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
[0068] The term “nucleoside” refers, in the usual and customary sense, to a glycosylamine including a nitrogenous base, also referred to as a “nucleobase”, and a five-carbon sugar, i.e., ribose or deoxyribose. Non limiting examples of nucleosides include cytidine, uridine, adenosine, guanosine, thymidine and inosine.
[0069] The term “nucleotide” refers, in the usual and customary sense, to the monomeric units of nucleic acids, each unit consisting of a nucleoside and a phosphate.
[0070] The term “base” as used herein with reference to sequences of nucleic acids refers to the nucleobase moiety of the nucleoside, e.g., cytosine, adenine, guanine, thymine, and uracil.
[0071] The terms “oligonucleotide,” “nucleic acid sequence,” and “polynucleotide” are used interchangeably and are intended to include a polymeric form of nucleotides covalently linked together that may have various lengths, either deoxyribonucleotides or ribonucleotides, or analogs, derivatives or modifications thereof. An oligonucleotide is typically composed of a sequence of nucleotides comprising nucleobases selected from adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U). Thus, the term “polynucleotide sequence” may refer to the alphabetical representation of a polynucleotide molecule; alternatively, the term may be applied to the polynucleotide molecule itself.
Claims
1. A synthetic oligonucleotide primer comprising a 5' region and a 3' region, wherein the primer comprises a span of from 10-40 thymine bases, wherein the span of thymine bases is contiguous except for the substitution of from 1-5 non-contiguous thymine bases with a non-thymine base.
2. The synthetic oligonucleotide primer of claim 1, wherein the 5' region comprises a defined sequence.
3. The synthetic oligonucleotide primer of claim 2, wherein the defined sequence comprises a sequence that is at least partially complementary to the sequence of an amplification or sequencing primer.
4. The synthetic oligonucleotide primer of claim 1, wherein the primer does not contain a sequence of more than 2 contiguous non-thymine bases in its 5' region outside the span of from 10-40 thymine bases.
5. The synthetic oligonucleotide primer of any one of claims 1 to 4, wherein the primer is from 10 to 70 nucleotides in length, or from 10 to 60 nucleotides, or from 10 to 50 nucleotides, or from 10 to 40 nucleotides, or from 10 to 30 nucleotides, or from 10 to 20 nucleotides in length.
6. The synthetic oligonucleotide primer of any one of claims 1 to 5, wherein each of the 1-5 non-thymine bases is independently selected from cytosine and guanine.
7. The synthetic oligonucleotide primer of any one of claims 1 to 5, wherein the 1-5 non-thymine bases comprise at least one adenine.
8. The synthetic oligonucleotide primer of any one of claims 1 to 7, wherein the 5' region includes one or more variable sequence regions configured to be unique for each primer in a set of primers.
9. The synthetic oligonucleotide primer of claim 8, wherein the variable sequence region is an index or barcode sequence.
10. The synthetic oligonucleotide primer of any one of claims 1 to 9, further comprising a terminal 3' base selected from the group consisting of adenine, cytosine, and guanine.
11. The synthetic oligonucleotide primer of any one of claims 1 to 10, further comprising two 3' terminal bases in the configuration 3'-NV, where N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine and V is a base selected from the group consisting of adenine, cytosine, and guanine.
12. The synthetic oligonucleotide primer of any one of claims 1 to 11, further comprising a blocking group such as a biotin molecule or a non-natural nucleotide covalently attached to the 5' end of the primer.
13. The synthetic oligonucleotide primer of any one of claims 1 to 12, wherein the primer is covalently attached at its 5' terminal end to a solid surface such as a bead.
14. A method for preparing complementary deoxyribonucleic acid (cDNA) comprising hybridizing the synthetic oligonucleotide primer of any one of claims 1 to 12 to a target ribonucleic acid (RNA) and synthesizing a first cDNA strand complementary to at least a portion of the RNA molecule.
15. The method of claim 14, wherein the 5' region of the synthetic oligonucleotide primer comprises a sequence complementary to at least a portion of a sequencing primer or adapter.
16. The method of claim 14, wherein the 5' terminal end of the synthetic oligonucleotide primer is covalently attached to a blocking group such as a biotin molecule or a non-natural nucleotide.
17. The method of claim 14, wherein the target RNA is one of a plurality of fragmented RNA molecules.
18. The method of claim 14, wherein the synthesizing a first cDNA strand is performed by a reverse transcriptase, optionally a recombinant Moloney murine leukemia virus (MMLV) reverse transcriptase.
19. A kit of parts for preparing complementary deoxyribonucleic acid (cDNA) comprising the synthetic oligonucleotide primer of any one of claims 1 to 12, and optionally one or more of a reactant mixture, the reactant mixture including deoxynucleoside triphosphates and a source of magnesium, a reverse transcriptase, a template-switch oligonucleotide, and further optionally, reagents for performing a PCR reaction.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263414979P | 2022-10-11 | 2022-10-11 | |
US63/414,979 | 2022-10-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024081622A1 (en) | 2024-04-18 |
Family
ID=88833960
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/076438 WO2024081622A1 (en) | 2022-10-11 | 2023-10-10 | Improvement to cdna library priming |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024081622A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009125001A (en) * | 2007-11-22 | 2009-06-11 | Japan Science & Technology Agency | OLIGO(dT) PRIMER, cDNA LIBRARY-FORMING KIT AND METHOD FOR FORMING cDNA LIBRARY |
WO2014201273A1 (en) * | 2013-06-12 | 2014-12-18 | The Broad Institute, Inc. | High-throughput rna-seq |
US20190161793A1 (en) * | 2015-02-17 | 2019-05-30 | Bio-Rad Laboratories, Inc. | Small nucleic acid quantification using split cycle amplification |
Non-Patent Citations (6)
Title |
---|
CAVANAUGH NISHA A. ET AL: "Herpes Simplex Virus-1 Helicase-Primase: Roles of Each Subunit in DNA Binding and Phosphodiester Bond Formation", BIOCHEMISTRY, vol. 48, no. 43, 12 October 2009 (2009-10-12), pages 10199 - 10207, XP093126264, ISSN: 0006-2960, DOI: 10.1021/bi9010144 * |
FOLEY ET AL., GENOME RES., vol. 29, no. 11, November 2019 (2019-11-01), pages 1816 - 1825 |
PATRIK L. STÅHL ET AL: "Supplementary Materials for Visualization and analysis of gene expression in tissue sections by spatial transcriptomics", SCIENCE, vol. 353, no. 6294, 30 June 2016 (2016-06-30), US, pages 1 - 40, XP055653296, ISSN: 0036-8075, Retrieved from the Internet <URL:www.sciencemag.org/content/353/6294/78/suppl/DC1> DOI: 10.1126/science.aaf2403 * |
PRABIN BAJGAIN ET AL: "Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)", BMC GENOMICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 12, no. 1, 18 July 2011 (2011-07-18), pages 370, XP021104723, ISSN: 1471-2164, DOI: 10.1186/1471-2164-12-370 * |
SOO HYUNG EO ET AL: "Comparative transcriptomics and gene expression in larval tiger salamander () gill and lung tissues as revealed by pyrosequencing", GENE, ELSEVIER AMSTERDAM, NL, vol. 492, no. 2, 10 November 2011 (2011-11-10), pages 329 - 338, XP028354448, ISSN: 0378-1119, [retrieved on 20111123], DOI: 10.1016/J.GENE.2011.11.018 * |
STAHL PATRIK I ET AL: "Visualization and analysis of gene expression in tissue sections by spatial transcriptomics", SCIENCE, vol. 353, no. 6294, 1 July 2016 (2016-07-01), US, pages 78 - 83, XP093007918, ISSN: 0036-8075, Retrieved from the Internet <URL:https://www.science.org/doi/epdf/10.1126/science.aaf2403?adobe_mc=MCMID%3D07070798116302081861430570191610448449%7CMCORGID%3D242B6472541199F70A4C98A6%2540AdobeOrg%7CTS%3D1670947593> * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23806118 Country of ref document: EP Kind code of ref document: A1 |