WO2024081622A1

WO2024081622A1 - Improvement to cDNA library priming

Info

Publication number: WO2024081622A1
Application number: PCT/US2023/076438
Authority: WO
Inventors: Joseph W. Foley
Original assignee: The Board Of Trustees Of The Leland Stanford Junior University
Priority date: 2022-10-11
Filing date: 2023-10-10
Publication date: 2024-04-18

Abstract

The present invention provides modified oligo(dT) primers containing at least one deliberate mismatch within the dT span of the primer which provides improved stability and replicability of the resulting cDNA molecules. In the context of RNA sequencing applications, incorporation of the modified oligo(dT) primers results in fewer sequence reads lost to PCR artifacts and easier detection of the end position of each sequence read.

Description

IMPROVEMENT TO CDNA LIBRARY PRIMING

FIELD OF THE INVENTION

[0001] The present invention relates generally to methods for preparing cDNA by reverse transcription of an RNA template and more particularly to improvements to oligo(dT)-priming, which is used to initiate first-strand cDNA synthesis in the reverse transcription reaction.

US GOVERNMENT SPONSORSHIP

[0002] This invention was made with Government support under contracts CA193694 and CA233254 awarded by the National Institutes of Health. The Government has certain rights in the invention.

BACKGROUND

[0003] In many forms of research studying ribonucleic acid (RNA), each original strand of RNA must have its base sequence reverse-transcribed into a complementary DNA (cDNA) molecule, because DNA is more amenable to various standard techniques such as amplification and sequencing. Like other forms of DNA synthesis, reverse transcription begins by adding nucleotides to the 3' end of a short primer strand of DNA that has hybridized to a complementary sequence on the RNA strand. Many approaches targeted at recovering a diverse library of different cDNAs use a short sequence of deoxythymidines (oligo(dT)) as the primer, because it hybridizes to the longer tail of adenines (poly(A)) that is found on almost every mature non-ribosomal RNA. A downside of this approach is that every base sequence in the resulting cDNA library then includes a homopolymer of continuous dT bases, which creates a number of different problems. For example, the long stretch of A:T pairs in the resulting cDNA is unstable and prone to DNA breathing, strand invasion, and mispriming artifacts during PCR. In addition, the homopolymer can stall the polymerase, reducing the yield of successful molecules during PCR as well as during sequencing. Further, the precise boundaries of the homopolymer sequence are difficult to identify in sequence data. This invention modifies the oligo(dT) primer to address these problems and related disadvantages associated with standard oligo(dT) primers.

BRIEF SUMMARY

[0004] The present invention provides oligonucleotide primers and related compositions and methods that improve the stability and replicability of cDNA molecules produced during reverse transcription. When incorporated into protocols for RNA sequencing, the primers described here result in fewer sequence reads lost to artifacts during amplification and sequencing and provide easier detection of the end position of each sequence read after sequencing.

[0005] Accordingly, in embodiments, the invention provides synthetic oligonucleotide primers including a 5' region and a 3' region, where the primer includes a span of from 10-40 thymine bases, and where the span of thymine bases is contiguous except for the substitution of from 1-5 non-contiguous thymine bases with a non-thymine base.

[0006] The synthetic oligonucleotide primer may also include where the 5' region includes a sequence that is at least partially complementary to a predetermined primer or adapter sequence.

[0007] The synthetic oligonucleotide primer may also include where the primer does not contain a sequence of more than 2 contiguous non-thymine bases in its 5' region outside the span of from 10-40 thymine bases.

[0008] The synthetic oligonucleotide primer may also include where the primer is from 10 to 70 nucleotides in length, or from 10 to 60 nucleotides, or from 10 to 50 nucleotides, or from 10 to 40 nucleotides, or from 10 to 30 nucleotides, or from 10 to 20 nucleotides in length.

[0009] The synthetic oligonucleotide primer may also include where the 1-5 non-thymine bases are each independently selected from cytosine and guanine.

[0010] The synthetic oligonucleotide primer may also include where the 1-5 non-thymine bases comprise at least one adenine.

[0011] The synthetic oligonucleotide primer may also include where the 5’ region includes one or more variable sequence regions configured to be unique for each primer in a set of primers.

[0012] The synthetic oligonucleotide primer may also include where the variable sequence region is an index or barcode sequence.

[0013] The synthetic oligonucleotide primer may also further include a terminal 3' base selected from the group consisting of adenine, cytosine, and guanine.

[0014] The synthetic oligonucleotide primer may also further include two terminal 3' bases in the configuration 3'-NV, where N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine and V is a base selected from the group consisting of adenine, cytosine, and guanine.

[0015] The synthetic oligonucleotide primer may also further include a blocking group such as a biotin molecule or a non-natural nucleotide covalently attached to the 5' end of the primer.

[0016] The synthetic oligonucleotide primer may also further include where the primer is covalently attached at its 5' terminal end to a solid surface such as a bead. The primer may further include a linker sequence between the bead and the 5' terminal end.

[0017] Other technical features of the synthetic oligonucleotide primers described here may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.

[0018] The invention also provides methods for preparing complementary deoxyribonucleic acid (cDNA), the methods comprising hybridizing a synthetic oligonucleotide primer as described herein to a target ribonucleic acid (RNA) and synthesizing a first cDNA strand complementary to at least a portion of the RNA molecule. In embodiments, the 5' region of the synthetic oligonucleotide primer includes a sequence complementary to at least a portion of a sequencing primer or adapter. In embodiments, the 5' terminal end of the synthetic oligonucleotide primer is covalently attached to a blocking group such as a biotin molecule or a non-natural nucleotide. In embodiments, the target RNA is one of a plurality of fragmented RNA molecules. In embodiments, the synthesizing a first cDNA strand is performed by a reverse transcriptase, optionally a recombinant Moloney murine leukemia virus (MMLV) derived reverse transcriptase. In embodiments, the recombinant MMLV derived reverse transcriptase lacks an RNase H domain when compared to the native MMLV enzyme, for example the RevertAid H Minus reverse transcriptase available from ThermoFisher Scientific, and similar MMLV enzymes.

[0019] The invention also provides a kit of parts for preparing complementary deoxyribonucleic acid (cDNA) comprising a synthetic oligonucleotide primer as described herein and optionally one or more of a reactant mixture, the reactant mixture including deoxynucleoside triphosphates and a source of magnesium, a reverse transcriptase, a template-switch oligonucleotide, and PCR reagents, the PCR reagents including primers of which one is at least partially complementary to the synthetic oligonucleotide primer as described herein and of which the other is optionally at least partially complementary to an adapter sequence to be added to the 3' end of the nascent cDNA strand or to a target sequence within the cDNA, a DNA polymerase, and a buffer solution. Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] FIG. 1A illustrates hybridization of a standard oligo(dT) primer to the poly(A) tail of an unknown target mRNA molecule. This primer is not part of the present invention and is illustrated for reference.

[0021] FIG. 1B illustrates double-stranded cDNA molecules produced following primer extension of the primers in FIG. 1A.

[0022] FIG. 2A illustrates a modified oligo(dT) primer of the invention which includes a 5' anchor and its hybridization to the poly(A) tail of an unknown target mRNA molecule as well as the double-stranded cDNA resulting from primer extension.

[0023] FIG. 2B illustrates reduction of “DNA breathing” provided by an embodiment of the modified oligo(dT) primers of the invention.

[0024] FIG. 3 illustrates an advantage of the modified oligo(dT) primers in accordance with one embodiment.

[0025] FIG. 4 depicts the results of an experiment demonstrating the higher quality sequence reads obtained with embodiments of modified oligo(dT) primers of the invention which include a 5' anchor, compared to a reference standard anchor oligo(dT) primer.

[0026] FIG. 5 illustrates the performance of a modified oligo(dT) primer in accordance with the present disclosure compared with that of a standard oligo(dT) primer in the data-processing step of trimming poly(A)-derived cDNA sequence from the sequence reads.

[0027] FIG. 6 shows only the proportions of correctly trimmed reads, zero position error, for the modified and standard oligo(dT) primers of FIG. 5.

[0028] FIG. 7 shows the results of a differential expression experiment utilizing modified oligo(dT) primers of the invention which include a 5' anchor in accordance with one embodiment.

DETAILED DESCRIPTION

[0029] In order for reverse transcription to proceed, it is necessary to introduce a short DNA primer that can hybridize to the target RNA strand. This is typically a short sequence of deoxythymidines, referred to as an “oligo(dT)” primer or first strand cDNA primer. Oligo(dT) primers hybridize to the polyadenosine sequence found on most mature non-ribosomal RNAs. However, it may be undesirable for the cDNA molecules of the resulting library to contain long stretches of A:T pairs because their instability renders them prone to DNA breathing, strand invasion, and mispriming artifacts during amplification. In addition, the homopolymer can stall the polymerase, reducing the yield both during the initial amplification of the library and during sequencing. Finally, the precise boundaries of the homopolymer sequences are difficult to identify in sequence data.

[0030] To address these issues, the present invention provides a modified oligo(dT) primer that is stabilized by substitution of discrete thymine bases within the contiguous oligo(dT) span. In particular, discrete thymine bases are substituted with non-thymine bases. These non-thymine bases deliberately mismatch, that is, they are not complementary to and do not form Watson-Crick base-pairs with the adenine bases of the poly(A) tail of the target RNA, also referred to herein as the RNA template. Although each mismatch potentially weakens the affinity of the oligo(dT) primer for its poly(A) target, which is disadvantageous, the present inventors found, unexpectedly, that a certain number of deliberate mismatches provides benefits in the form of an improvement in the stability and replicability of the resulting cDNA molecules. In addition, cDNA libraries created using modified oligo(dT) primers according to the present invention contain fewer sequence reads lost to PCR artifacts and provide for easier detection of the end position of each sequence read.

[0031] Accordingly, in embodiments, the invention provides modified oligo(dT) primers comprising 1-5 or 1-3 non-thymine bases, preferably guanine or cytosine, each replacing a thymine base within a span of about 10-40 otherwise contiguous thymine bases at the 3' end of the primer, referred to herein as the “oligo(dT) span”. In embodiments, the oligo(dT) span may comprise about 30 thymine bases, or from 12-30 or from 18-30 thymine bases. In embodiments, the modified oligo(dT) primer contains 2 or 3 non-thymine bases, preferably guanine or cytosine, within an oligo(dT) span of about 30 thymine bases. Preferably, the non-thymine bases are located at least 2-5 or at least 4-5 bases from the 3' terminal end of the primer. In embodiments, the non-thymine bases are placed within the oligo(dT) span at least about 2-20 nucleotides from the 3' end of the primer. In embodiments, the non-thymine bases are placed at least about 2-20 nucleotides from each other where the modified oligo(dT) primer contains more than one non-thymine base within the oligo(dT) span. In embodiments where the modified oligo(dT) primer contains more than one non-thymine base within the oligo(dT) span, the non-thymine bases are placed about 5, about 10, about 15, or about 20 nucleotides from each other. In embodiments, the spacing between the non-thymine bases may be variable.
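The placement rules in the preceding paragraph can be made concrete with a short sketch. The function below is illustrative only and is not part of the disclosure: the name `modified_dt_span`, its parameters, and the convention of counting positions from the 3' end are assumptions chosen for this example. It enforces the 10-40 base span, the 1-5 substitutions, and the requirement that substituted positions not be adjacent; it does not enforce the preferred minimum distance from the 3' end.

```python
def modified_dt_span(length=30, sub_positions=(10, 20), sub_base="G"):
    """Return an oligo(dT) span, written 5'->3', with selected T bases replaced.

    sub_positions are 1-based distances from the 3' end of the span
    (1 = the 3'-terminal base of the span). All names are illustrative.
    """
    if not 10 <= length <= 40:
        raise ValueError("span length must be 10-40 bases")
    positions = sorted(set(sub_positions))
    if not 1 <= len(positions) <= 5:
        raise ValueError("expected 1-5 substituted positions")
    if sub_base not in ("A", "C", "G"):
        raise ValueError("substitute must be a non-thymine base")
    if positions[0] < 1 or positions[-1] > length:
        raise ValueError("position outside the span")
    # Substitutions must be non-contiguous: no two on adjacent bases.
    if any(b - a < 2 for a, b in zip(positions, positions[1:])):
        raise ValueError("substituted positions must not be adjacent")
    span = ["T"] * length
    for p in positions:
        span[length - p] = sub_base  # convert 3'-end distance to a 5'->3' index
    return "".join(span)


# Example: a 30-base span with guanines 10 and 20 bases from the 3' end.
two_g = modified_dt_span(30, (10, 20), "G")
print(two_g)
```

With the example arguments, the result is ten thymines, a guanine, nine thymines, a guanine, and nine thymines, reading 5' to 3'.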

[0032] In embodiments, the modified oligo(dT) primer may further comprise a biotin moiety covalently attached to the 5' end of the primer.

[0033] In embodiments, the modified oligo(dT) primers described here may incorporate one or two random bases at the 3' end of the primer as an "anchor" complementary to the last one or two non-adenine bases of a template RNA in order to avoid priming cDNA synthesis farther up the RNA's poly(A) tail. Thus, in embodiments, a modified oligo(dT) primer as described here may further comprise one or two 3' anchoring bases which hybridize to the one or two terminal bases of the target mRNA molecule preceding the poly(A) tail. Accordingly, the invention provides modified oligo(dT) primers comprising a 3' terminal base V, wherein V is selected from the group consisting of adenine, cytosine, and guanine; or comprising two 3' terminal bases 3'-NV wherein N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine and V is selected from the group consisting of adenine, cytosine, and guanine.
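As an illustration of the anchor described above, the following sketch enumerates the concrete primers represented by a degenerate anchored sequence. It assumes the standard IUPAC meanings of N (any base) and V (any base except thymine), and it assumes that the configuration 3'-NV, read from the 3' end, corresponds to a sequence ending in "VN" when written 5' to 3'. The function name and the 12-base example span are hypothetical.

```python
from itertools import product

# IUPAC degenerate codes used for the anchor: N = any base, V = not T.
IUPAC = {"N": "ACGT", "V": "ACG"}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a primer containing N or V."""
    choices = [IUPAC.get(base, base) for base in primer]
    return ["".join(combo) for combo in product(*choices)]


# A short illustrative span followed by the two-base anchor (5'->3').
anchored = "T" * 12 + "VN"
variants = expand_degenerate(anchored)
print(len(variants))  # 3 choices for V times 4 choices for N = 12
```

Because V excludes thymine, none of the expanded primers can extend the dT run by one more T, which is what keeps the primer from annealing farther up the poly(A) tail.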

[0034] In embodiments, the invention also provides a modified oligo(dT) primer comprising a terminal 3'-V or a terminal 3'-NV anchor and from 1-5 non-thymine bases, preferably selected from cytosine and guanine, located within a 3' span of contiguous thymine bases, for example a span of 20-50 contiguous thymine bases, or a span of about 30 contiguous thymine bases at the 3' end of the primer.

[0035] The benefits of utilizing the modified oligo(dT) primers disclosed herein occur not during reverse transcription itself but at several points afterward. First, the modified oligo(dT) primers disclosed herein result in cDNA having improved stability and replicability. Without being bound by any particular theory, after the first strand of cDNA has been synthesized by reverse transcription, one or more duplicate DNA strands with an analogous (uracil replaced by thymine) or complementary (Watson-Crick paired DNA bases in the opposite orientation) base sequence to the original RNA template may be synthesized by a variety of methods; the synthesis of these new strands may be more efficient because polymerases tend to stall when they encounter a long homopolymer in the template sequence, such as a long oligo(dT) sequence. The modified oligo(dT) primers create an interruption in what would otherwise be such a long oligo(dT) homopolymer. This efficiency accrues exponentially during the polymerase chain reaction (PCR) amplification of the cDNA because it is repeated each time the cDNA sequence is replicated, i.e. in every cycle of PCR or every cycle of cluster generation on a sequencing flow cell.
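The interruption of the homopolymer can be quantified as the longest run of identical bases. The sketch below is illustrative: the helper name is hypothetical, and the modified span shown (guanines 10 and 20 bases from the 3' end of a 30-base span) is an assumed example layout, not a sequence copied from the tables.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base in seq."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


standard = "T" * 30
modified = "T" * 10 + "G" + "T" * 9 + "G" + "T" * 9  # assumed example layout

print(longest_homopolymer(standard))  # 30
print(longest_homopolymer(modified))  # 10
```

In this example two substitutions reduce the longest uninterrupted dT run from 30 bases to 10.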

[0036] Second, the modified oligo(dT) primers disclosed herein result in fewer sequence reads lost to PCR artifacts. Without being bound by any particular theory, the double-stranded cDNA molecule resulting from a standard oligo(dT) primer contains a long stretch of A:T base pairs, which have lower stability than C:G pairs, rendering this portion of the molecule prone to so-called “DNA breathing”, which refers to a random fluctuating separation of the strands. This renders the cDNA susceptible to strand invasion, which refers to the annealing of a foreign DNA oligonucleotide to one of the separated strands. If this occurs in the product of an ongoing polymerization reaction, it may result in priming the synthesis of the foreign cDNA strand, thereby creating various byproducts. The modified oligo(dT) primers described here interrupt the unstable A:T stretch, preferably with stabilizing C:G pairs, and therefore reduce the likelihood of strand invasion and reduce PCR artifacts.

[0037] Third, the modified oligo(dT) primers disclosed herein provide for easier detection of the end position of each sequence read. Without being bound by any particular theory, if the cDNA is sequenced, the sequence read may proceed all the way through the distinct cDNA sequence and continue into the oligo(A:T) as well. For many tasks it is then necessary to identify where the oligo(A:T) sequence begins in order to trim it off the sequence reads before searching for a matching sequence since the poly(A) tail is not transcribed from the genomic sequence. It can be difficult to identify the true site where the homopolymer sequence begins in a sequence read because errors in replication or sequencing may cause an adenine derived originally from the poly(A) tail to be misread as a different base, or a different base derived from distinct cDNA sequence to be misread as adenine. The presence of non-thymine bases at known positions in the oligo(dT) sequence of the amplified cDNA molecules, and therefore non-adenine bases at known positions in the complementary oligo(dA) sequence, allows any standard matching algorithm to precisely identify the beginning of that sequence even in the presence of errors.

[0038] The modified oligo(dT) primers described here may be substituted for standard oligo(dT) primers in various protocols requiring conversion of an RNA template to cDNA followed by amplification, including without limitation RNA-seq protocols. Incorporation of the modified oligo(dT) primers described here into RNA-seq protocols that sequence cDNA derived from the 3' end of the original RNA, nearest to the poly(A) tail, is expected to be particularly advantageous.

[0039] In an exemplary embodiment, the modified oligo(dT) primers described here are substituted for standard oligo(dT) primers in a Smart-3SEQ protocol for sequencing RNA (Foley et al., Genome Res. 2019 Nov; 29(11): 1816-1825). In an initial step, target RNA is fragmented, for example using a divalent cation and elevated temperature, which may be for example 80 °C or 95 °C in the presence of magnesium for about 1-5 minutes. Without further purification or enrichment, the fragmented RNA is subjected to reverse transcription (RT). The RT reaction is primed by hybridizing a modified oligo(dT) primer of the invention to the fragmented RNA. In embodiments, the 5' region of the modified oligo(dT) primer includes a sequence complementary to at least a portion of a sequencing primer or adapter. For example, incorporating a partial downstream sequencing adapter into the first cDNA strand eliminates the need to incorporate the adapter in a subsequent ligation reaction. The RT reaction is performed by an MMLV-derived reverse transcriptase which allows for incorporation of a second adapter primer into the cDNA. Thus, after extending the first cDNA strand, MMLV-derived reverse transcriptase typically extends several non-template bases at the 3' end, which are primarily cytosines. This provides a target for hybridization with a second oligonucleotide containing a short 3' oligo(G) and the innermost portion of an upstream sequencing adapter. The MMLV reverse transcriptase then performs a “template switch”, further extending the cDNA strand with sequence complementary to the sequencing adapter in the second oligonucleotide. Using this protocol, the reverse transcription produces a cDNA strand with adapter sequences at both ends in a single incubation. Next, the adapters are extended to full length and the cDNA molecules are amplified using a PCR reaction with primers complementary to the adapter sequences. The amplified double stranded cDNA library may be further purified, or optionally concentrated and then purified prior to sequencing.

[0040] FIG. 1A illustrates hybridization of a standard oligo(dT) primer 104a, 104b, 104c to the poly(A) tail of an unknown target mRNA 102 molecule. A standard primer may optionally include an adapter 106 sequence. The figure illustrates the random nature of primer hybridization when using standard oligo(dT) primers. Thus, the primer is depicted as hybridized in three different locations along the poly(A) tail of the target mRNA sequence. It is understood that these three locations are exemplary among many possibilities. The standard oligo(dT) primer is not within the scope of the present invention and is depicted for reference.

[0041] FIG. 1B illustrates double-stranded cDNA molecules produced following primer extension of the standard oligo(dT) primers in FIG. 1A. As shown graphically in the figure, the resulting cDNA sequences disadvantageously contain long and variable poly(A:T) homopolymers 108a, 108b, 108c.

[0042] FIG. 2A illustrates a modified oligo(dT) primer according to one embodiment and its hybridization to the poly(A) tail of an unknown target mRNA 102 molecule. As illustrated, the modified oligo(dT) primer includes two non-thymine bases, both guanine (G), a 3' anchor moiety, 3'-NV, and an optional 5' adapter sequence (Y). The anchor portion of the modified oligo(dT) primer is shown hybridized to the two terminal bases of the target mRNA molecule preceding its poly(A) tail. “V” is a base selected from the group consisting of adenine, cytosine, and guanine. “N” is a base selected from the group consisting of thymine, adenine, cytosine, and guanine. The optional 5' sequence (Y) is a predefined adapter sequence that may include a sequence at least partially complementary to an amplification or sequencing primer.

[0043] Also shown is the cDNA molecule 206 resulting from primer extension and second-strand synthesis or amplification 202. The modified oligo(dT) primers interrupt the unstable A:T homopolymer of the cDNA molecule with stabilizing C:G pairs, thereby reducing the likelihood of strand invasion and reducing PCR artifacts.

[0044] FIG. 2B illustrates disadvantageous “DNA breathing” 208 that may occur within the oligo(A:T) span of a cDNA resulting from synthesis using a standard oligo(dT) primer and the substantial reduction of this phenomenon 212 provided by an embodiment of the modified oligo(dT) primers described here.

[0045] FIG. 3 illustrates an advantage of the modified oligo(dT) primers in accordance with one embodiment. Top schematic exemplifies the ambiguity in identifying the beginning of the poly(A) homopolymer and bottom schematic illustrates how the precise boundary can be identified using a modified oligo(dT) primer as described here.
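The boundary ambiguity can be illustrated with a toy matcher. This is a minimal sketch under stated assumptions, not the algorithm of any particular trimming tool: `find_tail_start`, its mismatch and overlap thresholds, the leftmost tie-break, the transcript sequence, and the simulated sequencing error are all hypothetical. The matcher slides the expected oligo(dA) pattern along the read and keeps the leftmost position with the fewest mismatches.

```python
def find_tail_start(read, pattern, max_mismatches=2, min_overlap=12):
    """Return the leftmost read index where pattern matches best, or None.

    The pattern may run off the 3' end of the read; at least min_overlap
    bases must be compared. Ties are resolved toward the leftmost start.
    """
    best_start, best_mm = None, max_mismatches + 1
    for start in range(len(read) - min_overlap + 1):
        window = read[start:start + len(pattern)]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches < best_mm:
            best_start, best_mm = start, mismatches
    return best_start


PLAIN = "A" * 30
PUNCTUATED = "A" * 9 + "C" + "A" * 9 + "C" + "A" * 10  # C at known positions


def with_error(tail, pos=3, base="G"):
    """Simulate one sequencing error inside the tail."""
    return tail[:pos] + base + tail[pos + 1:]


transcript = "GATTACAGGCTCGT"  # hypothetical transcript-derived sequence
true_start = len(transcript)

plain_hit = find_tail_start(transcript + with_error(PLAIN), PLAIN)
punct_hit = find_tail_start(transcript + with_error(PUNCTUATED), PUNCTUATED)
print(plain_hit - true_start)  # 4: the homopolymer match slips past the error
print(punct_hit - true_start)  # 0: the known C positions fix the alignment
```

With a pure homopolymer, every shifted alignment downstream of the error matches perfectly, so the matcher cannot tell where the tail began. With the punctuated pattern, a shifted alignment misplaces the known non-adenine positions and scores worse, except at a shift equal to the spacing between substitutions, which here ties with the true position and is resolved by the leftmost tie-break.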

[0046] FIG. 4 shows the results of a modified Smart-3SEQ protocol (Foley et al., Genome Res. 2019 Nov; 29(11): 1816-1825) performed on 3 nanograms of Universal Human Reference RNA (Agilent Technologies) with a standard oligo(dT) primer having a 3' tail of 30 contiguous thymine bases (0G) as the reference primer and three embodiments of a modified oligo(dT) primer of the present invention, each having the same sequence as the reference primer except for 1, 2, or 3 guanine bases within the dT30 portion of the primer (1G, 2G, 3G). Briefly, total RNA was fragmented by incubating 1 min at 95 °C in 1.67X reaction buffer (50 mM Tris-HCl, 50 mM KCl, 4 mM MgCl2, 10 mM DTT, pH 8.3 at 1X; Thermo Fisher) in the presence of 1 mM dNTPs (Kapa Biosystems) and 583 nM oligo(dT) primer of the specified design, followed by 1 min at 25 °C to hybridize the primer. Reverse transcription reagents, comprising 1 M trimethylglycine (MilliporeSigma), 4 mM additional MgCl2 (MilliporeSigma), 1 U/µL RNase inhibitor (Thermo Fisher), 1 µM template-switch oligonucleotide (as described in Foley et al., but with thymidine residues replaced by uracil; Integrated DNA Technologies), and 10 U/µL RevertAid H Minus reverse transcriptase (Thermo Fisher), were then added to the previous sample, bringing down the reaction buffer to 1X. The reaction was incubated 30 min at 42 °C followed by heat-inactivation 30 sec at 95 °C. PCR reagents, comprising 0.5X Fidelity Buffer and 0.02 U/µL HiFi HotStart Polymerase (Kapa Biosystems), 3 mM disodium EDTA (Thermo Fisher), additional trimethylglycine to 1 M (MilliporeSigma), 0.025 U/µL E. coli uracil-DNA glycosylase (New England Biolabs), and indexed PCR primers (as described in Foley et al.; Integrated DNA Technologies) were then added to the previous sample, doubling the volume and reducing the concentrations of previous reagents by half. The mixture was incubated 10 min at 37 °C for removal of the template-switch oligonucleotide followed by 45 sec at 98 °C for initial denaturation, then 19 PCR cycles comprising 15 sec at 98 °C, 30 sec at 60 °C, and 15 sec at 72 °C, followed by a final extension of 1 min at 72 °C. The resulting libraries were purified with a 1.8X volume of AMPure XP bead suspension (Beckman Coulter) according to the manufacturer’s instructions. The purified libraries were sequenced with a MiSeq Nano kit, 300 cycles (Illumina).
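The two dilution steps in the protocol above can be followed with simple arithmetic. The sketch below is illustrative and rests on an assumption that the text does not state explicitly: that adding the reverse transcription reagents dilutes every component of the fragmentation mix by the same factor that brings the buffer from 1.67X to 1X. The twofold dilution at the PCR step is stated in the text. The resulting concentrations are derived values, not figures reported in the disclosure.

```python
def dilute(concentration, fold):
    """Concentration after diluting by the given fold factor."""
    return concentration / fold


RT_FOLD = 1.67   # assumed: same factor that takes the buffer from 1.67X to 1X
PCR_FOLD = 2.0   # stated: PCR reagents double the reaction volume

# Concentrations in the fragmentation mix, as given in the protocol.
fragmentation_mix = {"oligo(dT) primer (nM)": 583.0, "dNTPs (mM)": 1.0}

for name, conc in fragmentation_mix.items():
    in_rt = dilute(conc, RT_FOLD)
    in_pcr = dilute(in_rt, PCR_FOLD)
    print(f"{name}: {in_rt:.2f} during RT, {in_pcr:.2f} during PCR")
```

Under this assumption the primer is present at roughly 350 nM during reverse transcription and roughly 175 nM during PCR.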

[0047] Technical replicates of the same oligo(dT) primer design are grouped vertically. The sequences of the primers are shown in Table 1. Each of the primers also contained a biotin moiety linked to the 5' end of the molecule to prevent concatenation of additional adapters by template-switching.

[0048] The results show that a greater proportion of sequence reads passed the Illumina chastity filter, which discards reads with poor quality in the first 25 bases, when using a modified oligo(dT) primer of the present invention, compared to the reference primer. This shows that the modified oligo(dT) primer causes fewer reads to be wasted on unsequenceable artifacts and therefore generates more usable data per sequencing run.

[0049] Table 1: Sequences of primers used in validation experiment

Seq Identifier   Name           SEQUENCE

SEQ ID NO: 1     0G RT Primer   GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTT
                                TTTTTTTTTTTTTTTTTTTTTTTV

SEQ ID NO: 2     1G RT Primer   GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTT
                                TTTTTTTTGTTTTTTTTTTTTTTV

SEQ ID NO: 3     2G RT Primer   GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTT
                                TTTTGTTTTTTTTGTTTTTTTTTV

SEQ ID NO: 4     3G RT Primer   GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTT
                                GTTTTTTTGTTTTTTTGTTTTTTV

[0050] The following example demonstrates the advantages of the compositions and methods described here in bioinformatic analyses of the sequence reads. FIG. 5 compares a modified oligo(dT) primer in accordance with the present disclosure with a standard oligo(dT) primer in the data-processing step of trimming poly(A)-derived cDNA sequence from the sequence reads. ERCC ExFold RNA Spike-In Mixes (Thermo Fisher Scientific) were processed by the standard Smart-3SEQ protocol (v1) with a 30T reverse-transcription primer (SEQ ID NO: 5) or by a modified protocol (v2) in which the primer was punctuated by two guanine substitutions (SEQ ID NO: 6), with two replicates of each ERCC mix, and sequenced on the NextSeq 500 (Illumina), as described above. Illumina adapter sequences were removed by the bcl2fastq software, then the poly(A) sequence was removed by CutAdapt 4.4 with default settings but different base sequences to be trimmed. Table 2 shows the full 30-base reverse complement of each version of the oligo(dT) section of the primer, or the first 9 bases that were the same in both versions. The sequence reads in which the target sequence was identified and trimmed were aligned to the ERCC reference sequences (NIST) by Novoalign 3.09.04 (Novocraft) with default settings. The position of the trimmed end of each aligned sequence read was then compared with the last non-A base of the reference sequence to which it aligned; the aligner was allowed to soft-clip each read but the position error was calculated from the unclipped, trimmed read end position, with a negative offset corresponding to overtrimming, i.e. removing all the poly(A) sequence as well as part of the read that should have been derived from the useful non-A transcript sequence.

[0051] Table 2: Sequences of primers used in validation experiment

Seq Identifier   Name             SEQUENCE

SEQ ID NO: 5     Long trim (v1)   AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

SEQ ID NO: 6     Long trim (v2)   AAAAAAAAACAAAAAAAAACAAAAAAAAAA

SEQ ID NO: 7     Short trim       AAAAAAAAA
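The relationship between the oligo(dT) section of a primer and the trim sequences in Table 2 is a reverse complement. The following sketch shows the derivation. The function name is hypothetical, and the v2 oligo(dT) section used here is inferred by reverse-complementing SEQ ID NO: 6 rather than taken from a primer listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# oligo(dT) sections, 5'->3' (the v2 layout is inferred, see lead-in)
dt_v1 = "T" * 30
dt_v2 = "T" * 10 + "G" + "T" * 9 + "G" + "T" * 9

long_trim_v1 = reverse_complement(dt_v1)
long_trim_v2 = reverse_complement(dt_v2)
short_trim = long_trim_v2[:9]  # the leading bases shared by both versions

print(long_trim_v1)
print(long_trim_v2)
print(short_trim)
```

The first nine bases of both long trim sequences are adenines, which is why a single short trim sequence can be applied to reads from either version of the primer.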

[0052] With both versions of the primer, a large proportion of reads were trimmed to an incorrect end position due to the low complexity of the poly(A) target sequence, and overtrimming was more frequent than undertrimming. However, the modified primer (v2) resulted in substantially more reads that were trimmed to the correct end position, even when using only the short trim sequence common to both primers. FIG. 6 compares only the proportions of correctly trimmed reads, zero position error. These results show how the modified oligo(dT) primer enables more accurate bioinformatic analysis of data from poly(A) RNA-seq libraries.
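The position error used in this comparison can be sketched as follows. This is an illustrative reading of the description, not the analysis code that produced FIG. 5 and FIG. 6: the function names, the 0-based coordinate convention, and the example reference sequence are assumptions. The error is the trimmed read end minus the position of the last non-A base of the reference, so that negative values correspond to overtrimming.

```python
def last_non_a_index(reference):
    """0-based index of the last non-A base before the poly(A) tail."""
    return len(reference.rstrip("A")) - 1


def position_error(reference, read_end):
    """Signed offset of the trimmed read end (0-based reference index).

    Negative: overtrimmed (transcript bases lost).
    Positive: undertrimmed (poly(A) bases retained).
    """
    return read_end - last_non_a_index(reference)


def fraction_correct(errors):
    """Proportion of reads trimmed to exactly the right position."""
    return sum(e == 0 for e in errors) / len(errors)


reference = "GATTACAGGCTCGT" + "A" * 20  # hypothetical reference with poly(A)
read_ends = [13, 13, 10, 15]             # hypothetical trimmed end positions
errors = [position_error(reference, end) for end in read_ends]
print(errors)                     # [0, 0, -3, 2]
print(fraction_correct(errors))   # 0.5
```

The proportion of zero-error reads is the quantity compared between the two primer versions in FIG. 6.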

[0053] FIG. 7 shows the results of a differential expression experiment. Varying amounts of Universal Human Reference RNA (Agilent Technologies) and Human Brain Reference RNA (Thermo Fisher Scientific) were processed by a modified version of the Smart-3SEQ protocol as described above, but with the removal of the template-switch oligonucleotide reduced to 6 min at 37 °C, using a reinforced reverse transcription primer in which two evenly spaced thymine bases in the dT30 portion were replaced by guanine, with two technical replicates per condition.

[0054] The resulting libraries were sequenced with a NextSeq 500 High Output v2.5 kit, 75 cycles (Illumina) and aligned to the hg38 human reference genome with GENCODE transcription annotations using STAR aligner. Correctly oriented gene-aligned read counts were used to calculate differential expression between the two RNA samples using DESeq2, in a separate analysis for each amount of input RNA. The results were compared with previous data from a TaqMan qPCR assay of 999 genes in the same RNA samples (MAQC Consortium, 2006). Smart-3SEQ with the reinforced reverse transcription primer showed strong concordance with TaqMan qPCR using very low amounts of input RNA, approaching a single cell (10 pg). These results demonstrate a successful application of the modified oligo(dT) primer of the present invention in a complete RNA sequencing library preparation protocol.

[0055] While various embodiments and aspects of the present invention are shown and described herein, it will be obvious to those skilled in the art that such embodiments and aspects are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention.

[0056] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described. All documents, or portions of documents, cited in the application including, without limitation, patents, patent applications, articles, books, manuals, and treatises are hereby expressly incorporated by reference in their entirety for any purpose.

[0057] The abbreviations used herein have their conventional meaning within the chemical and biological arts. The chemical structures and formulae set forth herein are constructed according to the standard rules of chemical valency known in the chemical arts.

[0058] While the invention herein disclosed has been described by means of specific embodiments and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope set forth in the claims.

[0059] It will be appreciated that the present invention is set forth in various levels of detail in this application. In certain instances, details that are not necessary for one of ordinary skill in the art to understand the invention, or that render other details difficult to perceive may have been omitted. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting beyond the scope of the appended claims. Unless defined otherwise, technical terms used herein are to be understood as commonly understood by one of ordinary skill in the art to which the disclosure belongs.

[0060] It should be understood that, as described herein, an “embodiment” (such as illustrated in the accompanying Figures) may refer to an illustrative representation of a process or article or component in which a disclosed concept or feature may be provided or embodied, or to the representation of a manner in which just the concept or feature may be provided or embodied. However such illustrated embodiments are to be understood as examples (unless otherwise stated), and other manners of embodying the described concepts or features, such as may be understood by one of ordinary skill in the art upon learning the concepts or features from the present disclosure, are within the scope of the disclosure. Thus, it is intended that the present subject matter covers such modifications and variations as come within the scope of the appended claims and their equivalents.

[0061] The presently disclosed embodiments are to be considered in all respects as illustrative and not restrictive, the scope of the claimed subject matter being indicated by the appended claims, and not limited to the foregoing description or particular embodiments or arrangements described or illustrated herein.

[0062] In the foregoing description and the following claims, the following will be appreciated. The phrases “at least one”, “one or more”, and “and/or”, as used herein, are open-ended expressions that are both conjunctive and disjunctive in operation. The terms “a”, “an”, “the”, “first”, “second”, etc., do not preclude a plurality. For example, the term “a” or “an” entity, as used herein, refers to one or more of that entity. As such, the terms “a” (or “an”), “one or more” and “at least one” can be used interchangeably herein.

[0063] The term “about” when used before a numerical designation, e.g., temperature, time, amount, concentration, and such other, including a range, indicates approximations which may vary by (+) or (-) 10%, 5%, 1%, or any subrange or subvalue therebetween. Preferably, the term “about” means that the value may vary by +/- 10%.

[0064] The term “comprises/comprising” does not exclude the presence of other elements, components, features, regions, integers, steps, operations, etc. Additionally, although individual features may be included in different claims, these may possibly advantageously be combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. By contrast, the transitional phrase “consisting of” excludes any element, step, or ingredient not specified in the claim. The transitional phrase “consisting essentially of” limits the scope of a claim to the specified materials or steps “and those that do not materially affect the basic and novel characteristic(s)” of the claimed invention.

[0065] The term “complement” refers to a nucleotide (e.g., RNA or DNA) or a sequence of nucleotides capable of base pairing with a complementary nucleotide or sequence of nucleotides. Complementarity is determined by the ability of an associated nitrogenous base of a nucleotide, also referred to as a “nucleobase” or simply a “base”, to hydrogen bond with the nitrogenous base of a different nucleotide, e.g., a nucleotide on a different nucleic acid. This interaction may also be referred to as “base pairing”. The base adenine binds to thymine or uracil and the base guanine binds to cytosine. Adenine may therefore be referred to as the complement of thymine or uracil and guanine may be referred to as the complement of cytosine, and vice versa. Thus, a complement may include a sequence of nucleotides that base pair with corresponding complementary nucleotides of a second nucleic acid sequence.
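
By way of illustration only, the base-pairing rules stated above (adenine with thymine or uracil, guanine with cytosine) may be sketched in Python as follows. The function names and the convention of writing every sequence 5' to 3' are assumptions made for this example and do not appear elsewhere in this disclosure.

```python
# Illustrative sketch only: pairing rules as stated in the text
# (A pairs with T or U; G pairs with C). Names are hypothetical.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the DNA strand complementary to a DNA sequence (both 5'->3')."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna.upper()))


def cdna_of_rna(rna: str) -> str:
    """Return the DNA strand complementary to an RNA sequence (both 5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


if __name__ == "__main__":
    # An RNA ending in a short run of adenines pairs with a DNA strand
    # beginning with a run of thymines.
    print(cdna_of_rna("GCAAAA"))
```

Under these assumptions, a DNA strand complementary to an RNA ending in a run of adenines begins with a run of thymines, which is the relationship exploited by oligo(dT) priming.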

[0066] The term “barcode” or “index” in the context of a subsequence of an oligonucleotide primer as described herein refers to one or more nucleotide sequences that are used to identify a cell or a plurality of cells with which the barcode is associated. Barcodes encoded in a primer may be from 4-40 nucleotides in length, including any length within this range, such as 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or 40 nucleotides in length. A barcode is considered “unique” when the barcode is present in about one cell in a population of cells.

[0067] The term “nucleic acid” refers to a polymer of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be used herein as shorthand for deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

[0068] The term “nucleoside” refers, in the usual and customary sense, to a glycosylamine including a nitrogenous base, also referred to as a “nucleobase”, and a five-carbon sugar, i.e., ribose or deoxyribose. Non-limiting examples of nucleosides include cytidine, uridine, adenosine, guanosine, thymidine and inosine.

[0069] The term “nucleotide” refers, in the usual and customary sense, to the monomeric units of nucleic acids, each unit consisting of a nucleoside and a phosphate.

[0070] The term “base” as used herein with reference to sequences of nucleic acids refers to the nucleobase moiety of the nucleoside, e.g., cytosine, adenine, guanine, thymine, and uracil.

[0071] The terms “oligonucleotide,” “nucleic acid sequence,” and “polynucleotide” are used interchangeably and are intended to include a polymeric form of nucleotides covalently linked together that may have various lengths, either deoxyribonucleotides or ribonucleotides, or analogs, derivatives or modifications thereof. An oligonucleotide is typically composed of a sequence of nucleotides comprising nucleobases selected from adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U). Thus, the term “polynucleotide sequence” may refer to the alphabetical representation of a polynucleotide molecule; alternatively, the term may be applied to the polynucleotide molecule itself.

Claims

CLAIMS

What is claimed is:

1. A synthetic oligonucleotide primer comprising a 5' region and a 3' region, wherein the primer comprises a span of from 10-40 thymine bases, wherein the span of thymine bases is contiguous except for the substitution of from 1-5 non-contiguous thymine bases with a non-thymine base.

2. The synthetic oligonucleotide primer of claim 1, wherein the 5' region comprises a defined sequence.

3. The synthetic oligonucleotide primer of claim 2, wherein the defined sequence comprises a sequence that is at least partially complementary to the sequence of an amplification or sequencing primer.

4. The synthetic oligonucleotide primer of claim 1, wherein the primer does not contain a sequence of more than 2 contiguous non-thymine bases in its 5' region outside the span of from 10-40 thymine bases.

5. The synthetic oligonucleotide primer of any one of claims 1 to 4, wherein the primer is from 10 to 70 nucleotides in length, or from 10 to 60 nucleotides, or from 10 to 50 nucleotides, or from 10 to 40 nucleotides, or from 10 to 30 nucleotides, or from 10 to 20 nucleotides in length.

6. The synthetic oligonucleotide primer of any one of claims 1 to 5, wherein each of the 1-5 non-thymine bases is independently selected from cytosine and guanine.

7. The synthetic oligonucleotide primer of any one of claims 1 to 5, wherein the 1-5 non-thymine bases comprise at least one adenine.

8. The synthetic oligonucleotide primer of any one of claims 1 to 8, wherein the 5' region includes one or more variable sequence regions configured to be unique for each primer in a set of primers.

9. The synthetic oligonucleotide primer of claim 8, wherein the variable sequence region is an index or barcode sequence.

10. The synthetic oligonucleotide primer of any one of claims 1 to 9, further comprising a terminal 3' base selected from the group consisting of adenine, cytosine, and guanine.

11. The synthetic oligonucleotide primer of any one of claims 1 to 10, further comprising two 3' terminal bases in the configuration 3'-NV, where N is a base selected from the group consisting of thymine, adenine, cytosine, and guanine and V is a base selected from the group consisting of adenine, cytosine, and guanine.

12. The synthetic oligonucleotide primer of any one of claims 1 to 11, further comprising a blocking group such as a biotin molecule or a non-natural nucleotide covalently attached to the 5' end of the primer.

13. The synthetic oligonucleotide primer of any one of claims 1 to 12, wherein the primer is covalently attached at its 5' terminal end to a solid surface such as a bead.

14. A method for preparing complementary deoxyribonucleic acid (cDNA) comprising hybridizing the synthetic oligonucleotide primer of any one of claims 1 to 12 to a target ribonucleic acid (RNA) and synthesizing a first cDNA strand complementary to at least a portion of the RNA molecule.

15. The method of claim 14, wherein the 5' region of the synthetic oligonucleotide primer comprises a sequence complementary to at least a portion of a sequencing primer or adapter.

16. The method of claim 14, wherein the 5' terminal end of the synthetic oligonucleotide primer is covalently attached to a blocking group such as a biotin molecule or a non-natural nucleotide.

17. The method of claim 14, wherein the target RNA is one of a plurality of fragmented RNA molecules.

18. The method of claim 14, wherein the synthesizing a first cDNA strand is performed by a reverse transcriptase, optionally a recombinant Moloney murine leukemia virus (MMLV) reverse transcriptase.

19. A kit of parts for preparing complementary deoxyribonucleic acid (cDNA) comprising the synthetic oligonucleotide primer of any one of claims 1 to 12, and optionally one or more of a reactant mixture, the reactant mixture including deoxynucleoside triphosphates and a source of magnesium, a reverse transcriptase, a template-switch oligonucleotide, and further optionally, reagents for performing a PCR reaction.
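
For illustration only, and without limiting or interpreting the claims, the structural constraint recited for the thymine span (a span of 10-40 bases that is thymine throughout except for 1-5 non-contiguous substitutions) may be checked with a short Python sketch. The function name, the choice to count substituted positions toward the span length, and the requirement that the span begin and end with thymine are assumptions made for this example; the example sequences are hypothetical.

```python
# Illustrative sketch only; not a definition of the claimed subject matter.
def is_modified_dt_span(span: str,
                        min_len: int = 10, max_len: int = 40,
                        min_subs: int = 1, max_subs: int = 5) -> bool:
    """Check a candidate dT span written 5'->3' using the letters A, C, G, T."""
    span = span.upper()
    # Assumption: substituted positions count toward the span length.
    if not (min_len <= len(span) <= max_len):
        return False
    if set(span) - set("ACGT"):
        return False
    # Positions where thymine has been replaced by another base.
    subs = [i for i, base in enumerate(span) if base != "T"]
    if not (min_subs <= len(subs) <= max_subs):
        return False
    # Assumption: the span itself begins and ends with thymine.
    if span[0] != "T" or span[-1] != "T":
        return False
    # Non-contiguous: no two substituted positions are adjacent.
    return all(later - earlier > 1 for earlier, later in zip(subs, subs[1:]))


if __name__ == "__main__":
    print(is_modified_dt_span("TTTTTCTTTTTGTTTTT"))  # two separated substitutions
    print(is_modified_dt_span("T" * 20))             # unmodified oligo(dT)
```

In this sketch an unmodified oligo(dT) span is rejected because it contains no substitution, and a span with two adjacent non-thymine bases is rejected because the substitutions are contiguous.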